
[Spec Decoding] Support MTP for dsv3.2 #11652

Merged
hnyls2002 merged 12 commits into sgl-project:main from Paiiiiiiiiiiiiii:mtp
Oct 19, 2025

@Paiiiiiiiiiiiiii
Contributor

Paiiiiiiiiiiiiii commented Oct 15, 2025

Motivation

Based on #11109.
We implemented MTP support for DeepSeek V3.2, together with CUDA graph support, in our in-house version of SGLang.
Now that the community has completed the MTP modification, we are ready to contribute this feature back.

python -m sglang.launch_server --model deepseek-ai/DeepSeek-V3.2-Exp --tp 8 --attention-backend nsa --nsa-prefill flashmla_prefill --nsa-decode flashmla_decode --speculative-algorithm EAGLE --speculative-num-steps 3 --speculative-eagle-topk 1 --speculative-num-draft-tokens 4 --cuda-graph-max-bs 32 --max-running-requests 32

Modifications

Todo

  • Accuracy test

Accuracy Tests

python3 /gsm8k/bench_sglang.py --num-questions 200
[image]

Benchmarking and Profiling

[image]

Checklist

@Paiiiiiiiiiiiiii
Contributor Author

@Fridge003 here

@Fridge003
Collaborator

Accuracy results:
#11596 (comment)

@Fridge003
Collaborator

cc @hnyls2002 Can you also take a look?

Fridge003 added the ready-to-merge label Oct 18, 2025
hnyls2002 merged commit efa4733 into sgl-project:main Oct 19, 2025
122 of 126 checks passed
@wwj-2017-1117

--nsa-prefill flashmla_prefill

launch_server.py: error: argument --nsa-prefill-backend: invalid choice: 'flashmla_prefill' (choose from flashmla_sparse, flashmla_kv, fa3, tilelang, aiter)

@Fridge003
Collaborator

@wwj-2017-1117 The server argument names changed recently. Please use flashmla_sparse instead of flashmla_prefill.
Ref: https://docs.sglang.ai/basic_usage/deepseek_v32.html
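
As a minimal sketch of the rename described above, the following Python helper rewrites a legacy launch command to use the new value. Only the flashmla_prefill to flashmla_sparse rename is confirmed in this thread; the helper name update_launch_command and the RENAMED_VALUES table are hypothetical, and any other value (for example flashmla_decode) is left untouched and should be checked against the linked documentation.

```python
import shlex

# Only the rename confirmed in this thread. Other backend values may also have
# changed; they are deliberately not guessed here.
RENAMED_VALUES = {"flashmla_prefill": "flashmla_sparse"}


def update_launch_command(command: str) -> str:
    """Return the command with every known legacy value replaced by its new name."""
    tokens = shlex.split(command)
    return shlex.join(RENAMED_VALUES.get(token, token) for token in tokens)


# Shortened version of the launch command from the PR description.
old_command = (
    "python -m sglang.launch_server --model deepseek-ai/DeepSeek-V3.2-Exp "
    "--tp 8 --attention-backend nsa --nsa-prefill flashmla_prefill"
)
print(update_launch_command(old_command))
```

The helper works on whole tokens, so it does not touch flag names or values that merely contain the legacy string.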

@llc-kc
Contributor

llc-kc commented Nov 4, 2025

@Paiiiiiiiiiiiiii Can we use MTP in PD? Should the MTP-related parameters be added to the prefill launch parameters, or only to the decode?
Thank you very much.

@Paiiiiiiiiiiiiii
Contributor Author

@Paiiiiiiiiiiiiii Can we use MTP in PD? Should the MTP-related parameters be added to the prefill launch parameters, or only to the decode? Thank you very much.

Yes, it can be used in PD. You need to add the MTP-related parameters to the P (prefill) node as well as to the D (decode) node.
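
A rough sketch of what this answer implies: both the prefill and the decode launch commands carry the same speculative-decoding flags. The speculative flags are copied from the launch command in the PR description. The --disaggregation-mode flag is an assumption about how the PD roles are selected in SGLang, the helper build_pd_command is hypothetical, and a real PD deployment needs further flags (tensor parallelism, attention backend, transfer settings) that are omitted here.

```python
# Speculative-decoding (MTP) flags copied from the PR description.
MTP_ARGS = [
    "--speculative-algorithm", "EAGLE",
    "--speculative-num-steps", "3",
    "--speculative-eagle-topk", "1",
    "--speculative-num-draft-tokens", "4",
]


def build_pd_command(mode: str, model: str = "deepseek-ai/DeepSeek-V3.2-Exp") -> list[str]:
    """Build an argv list for one PD node; the same MTP flags go to both roles."""
    if mode not in ("prefill", "decode"):
        raise ValueError(f"unknown PD mode: {mode!r}")
    return [
        "python", "-m", "sglang.launch_server",
        "--model", model,
        # Assumed flag for choosing the PD role; verify against the SGLang docs.
        "--disaggregation-mode", mode,
        *MTP_ARGS,
    ]


prefill_cmd = build_pd_command("prefill")
decode_cmd = build_pd_command("decode")
print(" ".join(prefill_cmd))
print(" ".join(decode_cmd))
```

Keeping the MTP flags in one shared list makes it harder for the P and D nodes to drift apart in their speculative settings.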

Labels

high priority, ready-to-merge, run-ci
