[CPU] Support chunk_gated_delta_rule kernel for Qwen3-Next #12441
FlamingoPg merged 11 commits into sgl-project:main
Force-pushed 420d09f to b4f8e40
mingfeima left a comment:
Let's finish the minor issues first and then dig into the performance-related stuff.
@Valentine233 How much does this kernel contribute to the e2e benchmarks right now?
This kernel accounts for about 13.67% of e2e time in the Qwen3-Next prefill phase with BS=1, 1k length, and TP=2 on GNR.
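To put the 13.67% figure in perspective, Amdahl's law bounds the end-to-end speedup that optimizing this kernel alone can deliver. This is a minimal sketch that assumes the kernel's share stays fixed at the measured configuration (BS=1, 1k length, TP=2 on GNR); the helper name e2e_speedup is hypothetical.

```python
def e2e_speedup(kernel_fraction: float, kernel_speedup: float) -> float:
    """Amdahl's law: overall speedup when only `kernel_fraction` of the runtime
    is accelerated by `kernel_speedup` and the rest is unchanged."""
    if not 0.0 <= kernel_fraction <= 1.0:
        raise ValueError("kernel_fraction must be in [0, 1]")
    if kernel_speedup <= 0.0:
        raise ValueError("kernel_speedup must be positive")
    return 1.0 / ((1.0 - kernel_fraction) + kernel_fraction / kernel_speedup)


share = 0.1367  # measured share of e2e prefill time for this kernel
print(f"2x kernel speedup: {e2e_speedup(share, 2.0):.3f}x e2e")
# An infinite kernel speedup models removing the kernel's cost entirely.
print(f"upper bound: {e2e_speedup(share, float('inf')):.3f}x e2e")
```

Even if the kernel's cost dropped to zero, the prefill gain in that configuration would be capped at roughly 1.16x. That bound is why the review settles the minor issues first and treats kernel performance as a follow-up.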
Force-pushed b4f8e40 to 462242a
mingfeima left a comment:
We can continue to simplify the code a little bit.
@Valentine233 Need to update
@Valentine233 Update this check util according to #12324 (comment).
Force-pushed 678c26d to 4f55760
Fix CI failures.
Force-pushed 1769d45 to 6cf79bf
@Valentine233 Hi, could you please fix the lint errors? I will help you merge this PR.
Force-pushed 78ac30c to 03432e7
Thanks @FlamingoPg, the previous lint issue has been fixed. The remaining lint issue is in test/srt/test_priority_scheduling.py, which is unrelated to this PR.
Force-pushed 03432e7 to 6654b2e
Force-pushed fa64e15 to a71afa3
Hi @FlamingoPg, I have rebased again. There are no related CI issues now.
Motivation
This PR adds the chunk_gated_delta_rule kernel for Qwen3-Next.
Test Plan:
test/srt/cpu/test_mamba.py -k test_chunk_gated_delta_rule
Modifications
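The description does not spell out what the kernel computes. The following is a hedged reference: a naive per-token form of the gated delta rule, following the Gated DeltaNet formulation as we understand it. Per token, the state is updated as S_t = exp(g_t) * (I - beta_t * k_t k_t^T) * S_{t-1} + beta_t * k_t v_t^T, and the output is o_t = scale * S_t^T q_t. A chunked kernel such as chunk_gated_delta_rule is assumed to compute the same result up to floating-point error, regrouping tokens into chunks so most of the work becomes matrix multiplies. Several details are illustrative assumptions and not the PR's API: the function name gated_delta_rule_recurrent, the single-head list-of-lists layout, the log-space gate g, and the default scale of K^-0.5. The sketch also omits details such as any q/k normalization the real model applies.

```python
import math
from typing import List, Optional, Tuple

Matrix = List[List[float]]


def gated_delta_rule_recurrent(
    q: Matrix,          # T x K queries
    k: Matrix,          # T x K keys
    v: Matrix,          # T x V values
    g: List[float],     # length T, log-space decay per token (assumed layout)
    beta: List[float],  # length T, write strength per token
    scale: Optional[float] = None,
) -> Tuple[Matrix, Matrix]:
    """Hypothetical naive single-head gated delta rule, one token at a time.

    Returns (outputs as T x V, final state as K x V).
    """
    T, K, V = len(q), len(k[0]), len(v[0])
    if scale is None:
        scale = K ** -0.5  # assumed default query scaling
    S = [[0.0] * V for _ in range(K)]  # recurrent state, K x V
    outputs: Matrix = []
    for t in range(T):
        # 1) Gated decay: S <- exp(g_t) * S
        alpha = math.exp(g[t])
        S = [[alpha * S[i][j] for j in range(V)] for i in range(K)]
        # 2) Delta rule: predict v_t from the decayed state and write back the error
        pred = [sum(S[i][j] * k[t][i] for i in range(K)) for j in range(V)]
        delta = [beta[t] * (v[t][j] - pred[j]) for j in range(V)]
        S = [[S[i][j] + k[t][i] * delta[j] for j in range(V)] for i in range(K)]
        # 3) Read out: o_t = scale * S^T q_t
        outputs.append([scale * sum(S[i][j] * q[t][i] for i in range(K)) for j in range(V)])
    return outputs, S


e1 = [1.0, 0.0]
out, state = gated_delta_rule_recurrent(
    q=[e1, e1], k=[e1, e1], v=[[1.0, 2.0], [3.0, 5.0]],
    g=[0.0, 0.0], beta=[1.0, 1.0], scale=1.0,
)
# out == [[1.0, 2.0], [3.0, 5.0]]: with beta=1, the second write to key e1
# replaces the first value instead of adding to it.
```

The overwrite in the usage example is the property that distinguishes the delta rule from plain linear attention, which would have returned [4.0, 7.0] for the second token. A per-token loop like this is easy to audit but slow, which is why a chunked kernel is worth the extra complexity.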
Accuracy Tests
Benchmarking and Profiling
Checklist