[Performance] Optimize NSA Indexer K/S Buffer Access with Fused Triton Kernels #13812
Summary of Changes
Hello @Johnsonms, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request significantly enhances the performance of the NSA Indexer's K/S buffer access within the SGLang framework. It achieves this by migrating from existing
Highlights
Code Review
This pull request introduces significant performance optimizations by implementing fused Triton kernels for accessing K/S buffers in the NSA Indexer. The changes are well-structured and effectively reduce kernel launch overhead. My review includes a few minor suggestions to enhance code clarity and maintainability, such as removing commented-out debug statements and refactoring a small part of a Triton kernel to avoid redundant computations.
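As a rough illustration of the fusion idea described in this review, the pure-Python sketch below compares two separate gathers (one for K, one for S) against a single fused pass over a paged buffer. It is not the PR's Triton code: the page layout, `PAGE_SIZE`, and every function name here (`make_pages`, `gather_k`, `gather_s`, `gather_k_and_s_fused`) are hypothetical and chosen only to show the pattern, including how a fused path can be checked against an unfused reference.

```python
# Toy model of fused K/S access. All names and the layout are assumptions
# made for illustration; they do not mirror the real SGLang buffers.
PAGE_SIZE = 4  # tokens per page (hypothetical)


def make_pages(num_pages, head_dim):
    """Build toy pages; each page is a (k_rows, scales) pair."""
    pages = []
    for p in range(num_pages):
        base = p * PAGE_SIZE
        k_rows = [[float(base + t) + 0.25 * d for d in range(head_dim)]
                  for t in range(PAGE_SIZE)]
        scales = [1.0 + base + t for t in range(PAGE_SIZE)]
        pages.append((k_rows, scales))
    return pages


def gather_k(pages, page_table, seq_len):
    """Unfused reference: one pass that resolves pages and reads only K."""
    return [pages[page_table[tok // PAGE_SIZE]][0][tok % PAGE_SIZE]
            for tok in range(seq_len)]


def gather_s(pages, page_table, seq_len):
    """Unfused reference: a second pass that repeats the lookup for S."""
    return [pages[page_table[tok // PAGE_SIZE]][1][tok % PAGE_SIZE]
            for tok in range(seq_len)]


def gather_k_and_s_fused(pages, page_table, seq_len):
    """Fused variant: resolve each token's page once, read K and S together."""
    k_out, s_out = [], []
    for tok in range(seq_len):
        k_rows, scales = pages[page_table[tok // PAGE_SIZE]]
        offset = tok % PAGE_SIZE
        k_out.append(k_rows[offset])
        s_out.append(scales[offset])
    return k_out, s_out


if __name__ == "__main__":
    pages = make_pages(num_pages=3, head_dim=2)
    page_table = [2, 0, 1]  # logical page -> physical page
    k, s = gather_k_and_s_fused(pages, page_table, seq_len=10)
    # Correctness check: fused output must match the two-pass reference.
    assert k == gather_k(pages, page_table, 10)
    assert s == gather_s(pages, page_table, 10)
```

In this toy version the fused function does the page-table lookup once per token instead of twice, which is loosely analogous to replacing several kernel launches with one. The same compare-against-reference structure is a common shape for kernel correctness tests.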
c55a480 to 21e94e4
Please add correctness tests for the get_k_and_s kernel (can put them under |
fd63f10 to c13e826
f971d49 to 5d84e0f
/tag-and-rerun-ci |
@Johnsonms Can you verify the correctness of this PR on AIME/GPQA? |
Removed commented print statements for Triton functions.
6d38869 to 3caef92
…n Kernels (sgl-project#13812) Co-authored-by: Johnsonms <johnson@together.ai>




Motivation
#13811
Modifications
Implement fused Triton kernels that:
torch_fast to optimized Triton implementations
Accuracy Tests
Before:

After:

Benchmarking and Profiling
Before:


After:


Checklist