Skip to content

Update buffer reuse to use Columns as the TMEM unit instead of bytes#943

Closed
njriasan wants to merge 1 commit intofacebookexperimental:mainfrom
njriasan:njriasan/fix_tmem_buffer_reuse
Closed

Update buffer reuse to use Columns as the TMEM unit instead of bytes#943
njriasan wants to merge 1 commit intofacebookexperimental:mainfrom
njriasan:njriasan/fix_tmem_buffer_reuse

Conversation

@njriasan
Copy link
Contributor

Fixes a bug where by failing to use columns as the fundamental unit of TMEM reuse/scaling scales will allocate too many buffers, breaking reuse.

@meta-cla meta-cla bot added the CLA Signed This label is managed by the Meta Open Source bot. label Feb 21, 2026
@njriasan njriasan changed the title [WIP] Update buffer reuse to use Columns as the TMEM unit instead of bytes Update buffer reuse to use Columns as the TMEM unit instead of bytes Feb 21, 2026
int m = shapePerCTA[shapePerCTA.size() - 2];
int numBuffers = shape.size() == 3 ? shape[0] : 1;
int numColumn = ceil<int>(m, 32) * ceil<int>(k, 4) * numBuffers;
int numColumn = getTmemScalesColumnsPerBuffer(m, k) * numBuffers;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This refactoring is to share code with the dummy layout.

@njriasan njriasan requested a review from htyu February 21, 2026 17:23
@njriasan
Copy link
Contributor Author

I hit this issue when trying to share with scales and data in the MXFP8 kernel (it tried to pad too many columns). With this change the kernel is working now.

@meta-codesync
Copy link
Contributor

meta-codesync bot commented Feb 21, 2026

@njriasan has imported this pull request. If you are a Meta employee, you can view this in D93986455.

Copy link
Contributor

@htyu htyu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Value element, int64_t currentOffset, int64_t bytesBetweenBufferGroups,
int64_t alignment, int64_t currentGroupSize,
DenseMap<Value, std::tuple<int64_t, int64_t, int64_t>> &offsetMap) {
DenseMap<Value, std::tuple<int64_t, int64_t, int64_t>> &offsetMap,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: add a comment about what fields mean?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is actually in the comment above the function (line 25).

@njriasan njriasan force-pushed the njriasan/fix_tmem_buffer_reuse branch from d43e900 to 5975f18 Compare February 25, 2026 20:56
njriasan added a commit to njriasan/fb-experimental-triton that referenced this pull request Feb 25, 2026
…facebookexperimental#943)

Summary:
Fixes a bug where by failing to use columns as the fundamental unit of TMEM reuse/scaling scales will allocate too many buffers, breaking reuse.


Reviewed By: htyu

Differential Revision: D93986455

Pulled By: njriasan
@meta-codesync
Copy link
Contributor

meta-codesync bot commented Feb 25, 2026

@njriasan has exported this pull request. If you are a Meta employee, you can view the originating Diff in D93986455.

…facebookexperimental#943)

Summary:
Fixes a bug where by failing to use columns as the fundamental unit of TMEM reuse/scaling scales will allocate too many buffers, breaking reuse.


Reviewed By: htyu

Differential Revision: D93986455

Pulled By: njriasan
@njriasan njriasan force-pushed the njriasan/fix_tmem_buffer_reuse branch from 5975f18 to 9cc003e Compare February 25, 2026 22:29
@meta-codesync meta-codesync bot closed this in 1d49e96 Feb 26, 2026
@meta-codesync
Copy link
Contributor

meta-codesync bot commented Feb 26, 2026

@njriasan merged this pull request in 1d49e96.

tissue3 pushed a commit to tissue3/triton that referenced this pull request Feb 27, 2026
…acebookexperimental#943)

Summary:
Fixes a bug where by failing to use columns as the fundamental unit of TMEM reuse/scaling scales will allocate too many buffers, breaking reuse.

Pull Request resolved: facebookexperimental#943

Reviewed By: htyu

Differential Revision: D93986455

Pulled By: njriasan

fbshipit-source-id: 4c7a68d6428b3143ba708f7332082ba4f35cde5b
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CLA Signed This label is managed by the Meta Open Source bot. fb-exported Merged meta-exported

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants