optimize get_topk_ragged by fusing get k and k_scale triton kernel #16043
Fridge003 merged 6 commits into sgl-project:main from
Conversation
The above commit splits the for loop for getting k and k_scale into two separate kernels. Judging solely from my test results:
command:
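The split-vs-fused trade-off discussed above can be sketched in plain Python (illustrative only — the real code is Triton kernels over a paged KV buffer, and the names `gather_split`/`gather_fused` here are hypothetical): the split variant walks the same index list twice, once per output, while the fused variant reads it once and fills both outputs in a single pass.

```python
import numpy as np

def gather_split(k_fp8, k_scale, indices):
    # Two passes, standing in for two separate kernels:
    # pass 1 gathers the quantized keys ...
    out_k = np.stack([k_fp8[i] for i in indices])
    # ... and pass 2 re-reads the same indices for the scales.
    out_s = np.array([k_scale[i] for i in indices])
    return out_k, out_s

def gather_fused(k_fp8, k_scale, indices):
    # One pass fills both outputs, reading `indices` only once.
    out_k = np.empty((len(indices), k_fp8.shape[1]), dtype=k_fp8.dtype)
    out_s = np.empty(len(indices), dtype=k_scale.dtype)
    for pos, i in enumerate(indices):
        out_k[pos] = k_fp8[i]
        out_s[pos] = k_scale[i]
    return out_k, out_s
```

Both variants produce identical outputs; on a GPU the difference is an extra kernel launch and a second read of the index list.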
Please help me review the code. Thanks!
@BJWang-ant Please fix the merge conflict.
Please post the GPQA/AIME2025 results following the instructions here.
I will test again and post the GPQA result.
Please fix lint with |
I tested AIME2025 again, and the same result was obtained.
/tag-and-rerun-ci
…gl-project#16043) Co-authored-by: abing <wangbingjia.wbj@alibaba-inc.com>
Hi, after this PR was merged, DeepSeek-v3.2 encounters the following error when running with a long context (> 65,536 tokens): Server: Client:
I will push the code either today or tomorrow.
@BJWang-ant Thanks for the update. The snapshot looks like it is from another PR. Could you share the PR link so I can run some tests once it's merged?
I'm not quite sure which PR you are referring to. |
Hi @BJWang-ant, I also found that this PR causes errors on some extremely long contexts, so we need to revert it temporarily. Please combine this change with the fix in your next PR.
OK. Could you please give me a failing case?
It can easily be hit when the input length is 128k.
@BJWang-ant The original code has a bug in the CP (context parallel) scenario. Could you please include this fix as well? |
OK |

Optimize the get_index_k_scale_buffer function to reduce the number of concatenation operations of k_fp8 and k_scale.
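A minimal sketch of that idea (hypothetical names throughout — `gather_k_and_scale_fused`, `page_table`, and `seq_lens` are illustrative stand-ins, not the actual `get_index_k_scale_buffer` signature): instead of building the per-request `k_fp8` and `k_scale` outputs with separate rounds of concatenation, one pass over the ragged layout writes both into preallocated buffers.

```python
import numpy as np

def gather_k_and_scale_fused(k_fp8, k_scale, page_table, seq_lens):
    """One pass over the ragged token indices fills both output buffers."""
    total = int(seq_lens.sum())
    out_k = np.empty((total, k_fp8.shape[1]), dtype=k_fp8.dtype)
    out_s = np.empty(total, dtype=k_scale.dtype)
    offset = 0
    for req in range(len(seq_lens)):
        length = int(seq_lens[req])
        idx = page_table[req, :length]   # token slots for this request
        # Write both outputs from the same index read -- no intermediate
        # per-request arrays and no np.concatenate calls.
        out_k[offset:offset + length] = k_fp8[idx]
        out_s[offset:offset + length] = k_scale[idx]
        offset += length
    return out_k, out_s
```

The design point is that the ragged offsets are known up front from `seq_lens`, so both destination buffers can be allocated once and filled in place, which is what makes the Triton single-kernel formulation possible.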
