Support FA MXFP8 with HEAD-DIM=128 #942
Open
njriasan wants to merge 27 commits into facebookexperimental:main
Conversation
njriasan (Contributor, Author)
This is blocked on a bug in the buffer reuse implementation where I was treating bytes as the unit of TMEM instead of columns. This resulted in aliasing 4x as many columns for the scale as I should have (because the scale buffer, in bytes, is 1/4th the size of the other buffer).
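A minimal sketch of that unit mix-up, assuming TMEM columns hold 32-bit cells (4 bytes per row); the helper names below are illustrative, not the kernel's actual allocation code:

```python
# Illustrative only: BYTES_PER_TMEM_CELL and columns_needed are assumptions,
# not names from this repository.
BYTES_PER_TMEM_CELL = 4  # each TMEM column stores one 32-bit cell per row


def columns_needed(elems_per_row: int, elem_bytes: int) -> int:
    """TMEM columns required for `elems_per_row` elements of `elem_bytes` bytes each."""
    total_bytes = elems_per_row * elem_bytes
    return (total_bytes + BYTES_PER_TMEM_CELL - 1) // BYTES_PER_TMEM_CELL


qk_cols = columns_needed(128, 4)     # 128 fp32 values per row -> 128 columns
scale_cols = columns_needed(128, 1)  # 128 one-byte scales per row -> 32 columns

# The bug: sizing the scale region in bytes but indexing TMEM in columns
# reserves 128 "columns" instead of 32, aliasing 4x more of QK than intended.
buggy_scale_cols = 128 * 1
assert buggy_scale_cols == 4 * scale_cols
```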
njriasan (Contributor, Author)
Okay, after fixing that there is still a deadlock issue.
njriasan (Contributor, Author)
Okay, the deadlock is fixed. Now it's all accuracy issues, because I see all NaNs.
njriasan (Contributor, Author)
Okay, I believe I have an issue with qk_empties. It's not enough for QK to be empty to override it; I need the other data (e.g. alpha) to be done as well. I think I need to modify the code to handle this.
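A toy sketch of that extra dependency, using Python threading events as stand-ins for the kernel's barriers (qk_empty and alpha_done are hypothetical names, not the actual synchronization objects):

```python
import threading

# Stand-ins for the kernel's barriers; names are hypothetical.
qk_empty = threading.Event()    # set once the QK consumer releases the TMEM columns
alpha_done = threading.Event()  # set once alpha, derived from QK, is finished


def write_scales_into_shared_columns():
    # Waiting on qk_empty alone is not enough: because the scales alias part
    # of QK's TMEM region, everything computed from QK (e.g. alpha) must also
    # be complete before those columns are overwritten.
    qk_empty.wait()
    alpha_done.wait()
    # ... safe to overwrite the shared columns with the scales here ...
```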
njriasan (Contributor, Author)
This depends on #943.
Implements the buffer sharing strategy described in https://docs.google.com/document/d/16QRLjx0a_KJWkZamD7l5qQpRaHqBpaz-aYUwTI_yCxc/edit?tab=t.0 to allow overlapping the scales with QK. This lets us keep BLOCK_N=128 for HEAD-DIM=128.
This is not optimized for performance; I will analyze that next. In addition, I will add a follow-up to support BLOCK_N=64 (which should be blocked on P's quantization) so I can compare the performance of the two strategies.
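As a rough illustration of the overlap (offsets and names below are assumptions, not the layout from the linked document): the 1-byte scales pack 4 per 32-bit TMEM cell, so a row of 128 scales fits in 32 columns that can be carved out of QK's 128-column allocation rather than being given columns of their own.

```python
# Hypothetical layout sketch only; the real offsets come from the design doc.
QK_BASE_COL = 0
QK_NUM_COLS = 128                       # 128 fp32 values per row -> 128 columns

SCALE_NUM_COLS = 128 // 4               # 128 one-byte scales pack into 32 columns
SCALE_BASE_COL = QK_BASE_COL + QK_NUM_COLS - SCALE_NUM_COLS  # alias QK's last 32 columns
```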