[GPU] Add CDNA block intrinsics #23941
Open
Description
Block intrinsics allow subgroups to work on multiple blocks (the batch dimension) in parallel. They often have smaller sizes, which makes them a good fit for skinny shapes. This issue tracks the steps needed to add them. The tasks are split out from test PR #23934, which verified e2e correctness.
- Add intrinsics and layout
- Add Pack to intrinsic support
- Add ConfigureTensorLayout support (required by VectorDistribute)
- Fix inner tiled to amdgpu.mfma lowering for the single-element result seen in block intrinsics
- Add e2e batch matmul tests for both TileAndFuse and VectorDistribute
- Add block intrinsics to the target details and update the heuristic to use them in appropriate situations with the correct lowering configs
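
For readers unfamiliar with the term, the "multiple blocks in parallel" behavior described above can be sketched as a plain-Python reference model: one block intrinsic computes several small, independent matmuls, one per block along the batch dimension. This is only an illustration of the semantics, not the compiler's implementation. The `block_mma` helper and the 16-block 4x4x1 shape are assumptions chosen for the example.

```python
# Reference semantics of a block intrinsic (illustrative sketch only):
# for every block index blk, D[blk] = C[blk] + A[blk] @ B[blk].
# `block_mma` is a hypothetical helper, not an existing compiler API.

def block_mma(a_blocks, b_blocks, c_blocks):
    """Compute one small matmul-accumulate per block, independently."""
    out = []
    for a, b, c in zip(a_blocks, b_blocks, c_blocks):
        m, k, n = len(a), len(b), len(b[0])
        out.append([
            [c[i][j] + sum(a[i][p] * b[p][j] for p in range(k))
             for j in range(n)]
            for i in range(m)
        ])
    return out

# Assumed example shape: 16 blocks, each an M=4, N=4, K=1 matmul.
BLOCKS, M, N, K = 16, 4, 4, 1

# A[blk] is M x K, B[blk] is K x N, C[blk] is M x N (zero accumulator).
a = [[[float(blk + i)] for i in range(M)] for blk in range(BLOCKS)]
b = [[[float(j + 1) for j in range(N)]] for _ in range(BLOCKS)]
c = [[[0.0] * N for _ in range(M)] for _ in range(BLOCKS)]

d = block_mma(a, b, c)
```

Because each block is small and independent, a skinny batch matmul maps onto the block dimension without padding a single large tile, which is the motivation given in the description.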