
Support FA MXFP8 with HEAD-DIM=128 #942

Open
njriasan wants to merge 27 commits into facebookexperimental:main from njriasan:njriasan/support_head_dim_128

Conversation

@njriasan
Contributor

Implements the buffer sharing strategy described in https://docs.google.com/document/d/16QRLjx0a_KJWkZamD7l5qQpRaHqBpaz-aYUwTI_yCxc/edit?tab=t.0 to allow overlapping the scales with QK. This lets us keep BLOCK_N=128 for HEAD-DIM=128.

This is not yet optimized for performance; I will analyze that next. In addition, I will add a follow-up to support BLOCK_N=64 (which should be blocked on P's quantization) so I can compare the performance of the two strategies.

@meta-cla meta-cla bot added the CLA Signed label Feb 21, 2026
@njriasan
Contributor Author

This is blocked on a bug in the buffer reuse implementation: I was treating bytes, rather than columns, as the unit of TMEM. As a result, the scale aliased 4x as many columns as it should (because the scale byte is 1/4th the size of the other buffer's elements).
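To make the 4x factor concrete, here is a minimal sketch of the unit mix-up. It is a simplified model, not the code in this PR: it assumes one TMEM column holds 4 bytes per lane, and the helper names and the scale count are hypothetical.

```python
# Simplified model (assumption): one TMEM column holds 4 bytes per lane.
TMEM_COLUMN_BYTES = 4


def columns_needed(elems_per_row: int, elem_bytes: int) -> int:
    """Columns required to hold one row of elements, rounded up."""
    row_bytes = elems_per_row * elem_bytes
    return -(-row_bytes // TMEM_COLUMN_BYTES)  # ceiling division


def buggy_columns(elems_per_row: int, elem_bytes: int) -> int:
    """The mistake: report the byte count as if it were a column count."""
    return elems_per_row * elem_bytes


scale_elems = 16  # hypothetical number of 1-byte scales per row
correct = columns_needed(scale_elems, elem_bytes=1)
buggy = buggy_columns(scale_elems, elem_bytes=1)
print(correct, buggy)  # 4 16
```

Under these assumptions the byte-based count reserves four times as many columns for the scales as the column-based count does.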

@njriasan
Contributor Author

Okay, after fixing that, there is still a deadlock issue.

@njriasan
Contributor Author

Okay, the deadlock is fixed. Now it's all accuracy issues: the output is all NaNs.

@njriasan
Contributor Author

Okay, I believe I have an issue with qk_empties. It's not enough for QK to be empty before overwriting it; the other data (e.g. alpha) needs to be done as well. I think I need to modify the code to handle this.
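The condition described above can be sketched as a slot that may be overwritten only once every reader of the aliased region has released it. This is an illustration only: it assumes alpha shares the aliased region with QK, and `SharedSlot` and the reader names are hypothetical, not identifiers from this PR.

```python
class SharedSlot:
    """Tracks which readers still need the data in an aliased buffer."""

    def __init__(self, readers):
        self._readers = frozenset(readers)
        self._pending = set()

    def publish(self):
        """Producer writes new data; every reader must now consume it."""
        if self._pending:
            raise RuntimeError("overwrite while readers are still pending")
        self._pending = set(self._readers)

    def release(self, reader):
        """A reader signals that it is done with the current data."""
        self._pending.discard(reader)

    def can_overwrite(self):
        """Safe to overwrite only when no reader is still pending."""
        return not self._pending


slot = SharedSlot({"qk", "alpha"})
slot.publish()
slot.release("qk")
qk_only = slot.can_overwrite()   # False: alpha has not finished
slot.release("alpha")
all_done = slot.can_overwrite()  # True: both readers released
```

Gating the overwrite on QK alone corresponds to checking only the `"qk"` release, which is the case where `qk_only` is still `False` here.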

@njriasan njriasan changed the title from "[WIP] Support FA MXFP8 with HEAD-DIM=128" to "Support FA MXFP8 with HEAD-DIM=128" Feb 21, 2026
@njriasan
Contributor Author

This depends on #943

