[GPU] Implement CDNA block intrinsics (4/6) by nirvedhmeshram · Pull Request #23977 · iree-org/iree

nirvedhmeshram · 2026-03-31T18:01:52Z

Intrinsics with a single-element accumulator (e.g. 4x4 f64 with 4 blocks) require the accumulator to be extracted to a scalar before it is passed to amdgpu.mfma, and the scalar result to be broadcast back to the vector type. Otherwise there is no valid result type under the op definition.
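The extract, call, and broadcast steps described above can be sketched as plain Python. This is only an illustration of the data flow, not the IREE lowering: `mfma_scalar_stub` and `lower_single_element_acc` are hypothetical names, the one-element list stands in for a single-element vector, and the multiply-add is a placeholder rather than real MFMA semantics.

```python
from typing import Callable, List


def mfma_scalar_stub(a: float, b: float, acc: float) -> float:
    """Hypothetical stand-in for an op that only accepts a scalar accumulator.

    The arithmetic is a placeholder multiply-add, not real MFMA behavior.
    """
    return a * b + acc


def lower_single_element_acc(
    a: float,
    b: float,
    acc_vec: List[float],
    mfma: Callable[[float, float, float], float] = mfma_scalar_stub,
) -> List[float]:
    """Wrap a scalar-accumulator op so callers can keep using a 1-element vector."""
    if len(acc_vec) != 1:
        raise ValueError("this path only applies to single-element accumulators")
    acc_scalar = acc_vec[0]                 # extract the only element to a scalar
    result_scalar = mfma(a, b, acc_scalar)  # the op sees a scalar acc and returns a scalar
    return [result_scalar]                  # broadcast the scalar back to the vector shape


result = lower_single_element_acc(2.0, 3.0, [1.0])
```

The caller still passes and receives a one-element vector; only the wrapped op sees the scalar form.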

Part of #23941

Block intrinsics with a single-element accumulator (e.g. 4x4 with 16 blocks) require the acc to be extracted to a scalar before passing to amdgpu.mfma, and the scalar result broadcast back to the vector type.

Part of iree-org#23941

Signed-off-by: Nirvedh Meshram <nirvedh@gmail.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

nirvedhmeshram requested review from Groverkss, Max191, krzysz00 and qedawkins as code owners March 31, 2026 18:01

nirvedhmeshram mentioned this pull request Mar 31, 2026

[GPU] Add CDNA block intrinsics #23941

Open

6 tasks

qedawkins approved these changes Apr 1, 2026

nirvedhmeshram merged commit 37dce38 into iree-org:main Apr 1, 2026
63 checks passed

nirvedhmeshram deleted the block_intrinsics_task4 branch April 3, 2026 17:40

nirvedhmeshram merged 1 commit into iree-org:main from nirvedhmeshram:block_intrinsics_task4
