[CPU] Support chunk_gated_delta_rule kernel for Qwen3-Next#12441

Merged
FlamingoPg merged 11 commits into sgl-project:main from Valentine233:chunk_gated_delta_rule
Dec 3, 2025
@Valentine233 Valentine233 commented Oct 31, 2025

Motivation

This PR adds a CPU chunk_gated_delta_rule kernel for Qwen3-Next.
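
For readers unfamiliar with the operator, here is a minimal sketch of the gated delta rule as a naive token-by-token recurrence for a single head. It is not the kernel added in this PR: the function name `recurrent_gated_delta_rule`, the argument layout, and the `1/sqrt(d_k)` query scaling are assumptions made for illustration. A chunked kernel is expected to produce the same outputs while processing the sequence in blocks.

```python
import math


def recurrent_gated_delta_rule(q, k, v, g, beta):
    """Naive single-head reference (hypothetical helper, not the PR's API).

    q, k : lists of T vectors of length d_k
    v    : list of T vectors of length d_v
    g    : list of T log-decay values (typically <= 0)
    beta : list of T write strengths (typically in [0, 1])
    Returns (outputs, final_state); the state has shape [d_k][d_v].
    """
    d_k, d_v = len(k[0]), len(v[0])
    scale = 1.0 / math.sqrt(d_k)  # assumed query scaling
    state = [[0.0] * d_v for _ in range(d_k)]
    outputs = []
    for q_t, k_t, v_t, g_t, b_t in zip(q, k, v, g, beta):
        # 1. Gate: decay the whole state by exp(g_t).
        decay = math.exp(g_t)
        state = [[decay * s for s in row] for row in state]
        # 2. Read what the decayed state currently stores for key k_t.
        pred = [sum(k_t[i] * state[i][j] for i in range(d_k)) for j in range(d_v)]
        # 3. Delta rule: write only the beta-weighted prediction error.
        delta = [b_t * (v_t[j] - pred[j]) for j in range(d_v)]
        for i in range(d_k):
            for j in range(d_v):
                state[i][j] += k_t[i] * delta[j]
        # 4. Output: query the updated state.
        outputs.append(
            [scale * sum(q_t[i] * state[i][j] for i in range(d_k)) for j in range(d_v)]
        )
    return outputs, state
```

With a unit-norm key, `beta = 1`, and no decay (`g = 0`), writing a second value under the same key overwrites the first one, which is the behavior the delta rule is designed to give.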

Test Plan:
test/srt/cpu/test_mamba.py -k test_chunk_gated_delta_rule

Modifications

Accuracy Tests

Benchmarking and Profiling

Checklist

@Valentine233 Valentine233 marked this pull request as draft October 31, 2025 07:00
@Valentine233 Valentine233 force-pushed the chunk_gated_delta_rule branch 3 times, most recently from 420d09f to b4f8e40 Compare November 6, 2025 02:14
Collaborator

@mingfeima mingfeima left a comment

Let's finish the minor issues first and then dig into the performance-related stuff.

@mingfeima mingfeima added cpu cpu backend performance optimization intel sgl-kernel labels Nov 7, 2025
@mingfeima
Collaborator

@Valentine233 how much does this kernel contribute in e2e benchmarks right now?

@Valentine233
Contributor Author

> @Valentine233 how much does this kernel contribute in e2e benchmarks right now?

This kernel accounts for about 13.67% of end-to-end time in the Qwen3-Next prefill phase with BS=1, 1k input length, TP=2 on GNR.

@Valentine233 Valentine233 force-pushed the chunk_gated_delta_rule branch from b4f8e40 to 462242a Compare November 10, 2025 09:01
Collaborator

@mingfeima mingfeima left a comment

We can continue to simplify the code a little bit.

@mingfeima
Collaborator

@Valentine233 you need to update https://github.com/sgl-project/sglang/blob/main/test/srt/run_suite.py#L493-L510 so that CI actually launches the test.

@mingfeima
Collaborator

@Valentine233 please update this check util according to #12324 (comment).

@Valentine233 Valentine233 force-pushed the chunk_gated_delta_rule branch 2 times, most recently from 678c26d to 4f55760 Compare November 12, 2025 06:06
Collaborator

@mingfeima mingfeima left a comment

LGTM

@mingfeima mingfeima marked this pull request as ready for review November 12, 2025 06:11
@mingfeima
Collaborator

Please fix the CI failures.

@Valentine233 Valentine233 force-pushed the chunk_gated_delta_rule branch 2 times, most recently from 1769d45 to 6cf79bf Compare November 13, 2025 03:05
@FlamingoPg FlamingoPg self-assigned this Nov 20, 2025
@FlamingoPg
Collaborator

@Valentine233 Hi, could you please fix the lint errors? I will help you merge this PR.

@Valentine233 Valentine233 force-pushed the chunk_gated_delta_rule branch from 78ac30c to 03432e7 Compare November 24, 2025 03:12
@Valentine233
Contributor Author

Thanks @FlamingoPg, the previous lint issue has been fixed. The remaining lint failure is unrelated to this PR; it comes from test/srt/test_priority_scheduling.py.

@Valentine233 Valentine233 force-pushed the chunk_gated_delta_rule branch from 03432e7 to 6654b2e Compare November 25, 2025 07:46
@Valentine233 Valentine233 force-pushed the chunk_gated_delta_rule branch from fa64e15 to a71afa3 Compare November 26, 2025 01:50
@Valentine233
Contributor Author

Hi @FlamingoPg, I have rebased again. There is no related CI issue now.

Collaborator

@FlamingoPg FlamingoPg left a comment

LGTM

@FlamingoPg FlamingoPg merged commit c233e9d into sgl-project:main Dec 3, 2025
221 of 231 checks passed
tom-jerr pushed a commit to tom-jerr/sglang that referenced this pull request Dec 4, 2025
yingluosanqian pushed a commit to yingluosanqian/sglang that referenced this pull request Dec 4, 2025
tonyluj pushed a commit to openanolis/sglang that referenced this pull request Dec 5, 2025
tonyluj pushed a commit to openanolis/sglang that referenced this pull request Dec 5, 2025
yuchengz816-bot pushed a commit to yuchengz816-bot/sglang that referenced this pull request Dec 8, 2025
Kevin-XiongC pushed a commit to novitalabs/sglang that referenced this pull request Dec 9, 2025
Labels

cpu cpu backend performance optimization intel run-ci sgl-kernel

3 participants