[Spec Decoding] Support MTP for dsv3.2 by Paiiiiiiiiiiiiii · Pull Request #11652 · sgl-project/sglang

Paiiiiiiiiiiiiii · 2025-10-15T05:09:52Z

Motivation

Based on #11109
We implemented MTP support for DeepSeek V3.2, including CUDA graph support, in our internally maintained fork of sglang.
Now that the community has completed the MTP changes this work builds on, we are contributing the feature back upstream.

python -m sglang.launch_server --model deepseek-ai/DeepSeek-V3.2-Exp --tp 8 --attention-backend nsa --nsa-prefill flashmla_prefill --nsa-decode flashmla_decode --speculative-algorithm EAGLE --speculative-num-steps 3 --speculative-eagle-topk 1 --speculative-num-draft-tokens 4 --cuda-graph-max-bs 32 --max-running-requests 32
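The speculative flags in this command depend on each other. Here is a minimal sketch. It assumes that with `--speculative-eagle-topk 1` the draft is a single chain, and that each verify pass covers one root token plus one token per draft step. Under that assumption, 3 draft steps pair with 4 draft tokens. `speculative_args` is a hypothetical helper that rebuilds this part of the command and rejects inconsistent values. It is not sglang's own validation code.

```python
import shlex
from typing import List


def speculative_args(num_steps: int = 3, topk: int = 1, num_draft_tokens: int = 4) -> List[str]:
    """Hypothetical helper: the EAGLE/MTP flags from the launch command above.

    Assumption: with topk == 1 the draft is a single chain, so the verify pass
    covers one root token plus one token per draft step.
    """
    if topk == 1 and num_draft_tokens != num_steps + 1:
        raise ValueError(
            f"topk=1 expects num_draft_tokens={num_steps + 1}, got {num_draft_tokens}"
        )
    return [
        "--speculative-algorithm", "EAGLE",
        "--speculative-num-steps", str(num_steps),
        "--speculative-eagle-topk", str(topk),
        "--speculative-num-draft-tokens", str(num_draft_tokens),
    ]


if __name__ == "__main__":
    # Prints the speculative portion of the command in the PR description.
    print(shlex.join(speculative_args()))
```

The sketch only checks the topk=1 case. With a larger topk the draft becomes a tree, and the draft-token budget is not tied to the step count in the same simple way.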

Modifications

Todo

Accuracy test

Accuracy Tests

python3 /gsm8k/bench_sglang.py --num-questions 200
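For an MTP change, this test checks that speculative decoding does not materially change answer quality. GSM8K is scored by comparing the number in the model's final answer against the reference. The sketch below shows one common scoring approach: take the last integer in each completion. This is an illustration under that assumption, not the code in `bench_sglang.py`. `extract_last_int` and `accuracy` are hypothetical names.

```python
import re
from typing import List, Optional

INT_RE = re.compile(r"-?\d+")


def extract_last_int(text: str) -> Optional[int]:
    """Return the last integer in `text` (thousands separators removed), or None."""
    matches = INT_RE.findall(text.replace(",", ""))
    return int(matches[-1]) if matches else None


def accuracy(preds: List[str], labels: List[int]) -> float:
    """Fraction of completions whose last integer equals the reference answer."""
    if len(preds) != len(labels):
        raise ValueError("preds and labels must have the same length")
    if not labels:
        return 0.0
    correct = sum(extract_last_int(p) == y for p, y in zip(preds, labels))
    return correct / len(labels)


if __name__ == "__main__":
    preds = ["3 * 4 = 12, so the answer is 12.", "The total is 1,250", "no idea"]
    print(accuracy(preds, [12, 1250, 7]))
```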

Benchmarking and Profiling

Checklist

Format your code following the "Format code with pre-commit" guide.
Add unit tests following the "Run and add unit tests" guide.
Update documentation following the "Write documentations" guide.
Provide accuracy and speed benchmark results following the "Test the accuracy" and "Benchmark the speed" guides.

gemini-code-assist · 2025-10-15T05:09:55Z

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

Paiiiiiiiiiiiiii · 2025-10-15T05:13:35Z

@Fridge003 here

Fridge003 · 2025-10-15T05:27:06Z

Accuracy results:
#11596 (comment)

python/sglang/srt/speculative/eagle_draft_extend_cuda_graph_runner.py

python/sglang/srt/layers/attention/nsa/nsa_indexer.py

python/sglang/srt/layers/attention/nsa_backend.py

Fridge003 · 2025-10-17T22:16:57Z

cc @hnyls2002 Can you also take a look?

wwj-2017-1117 · 2025-10-28T11:23:42Z

--nsa-prefill flashmla_prefill

launch_server.py: error: argument --nsa-prefill-backend: invalid choice: 'flashmla_prefill' (choose from flashmla_sparse, flashmla_kv, fa3, tilelang, aiter)

Fridge003 · 2025-10-28T17:15:56Z

@wwj-2017-1117 The server argument names changed recently. Please use flashmla_sparse instead of flashmla_prefill.
Ref: https://docs.sglang.ai/basic_usage/deepseek_v32.html
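The error above is consistent with standard `argparse` behavior. The old `--nsa-prefill` spelling resolved to `--nsa-prefill-backend`, because argparse accepts unambiguous prefixes of long options by default (sglang may instead register an explicit alias). The value `flashmla_prefill` is no longer an allowed choice. The stripped-down parser below reproduces this. It declares only this one argument, with choices copied from the error message, and is not sglang's actual `server_args` code.

```python
import argparse

# Choices copied from the error message above; the real sglang parser
# defines many more arguments.
PREFILL_CHOICES = ["flashmla_sparse", "flashmla_kv", "fa3", "tilelang", "aiter"]


def make_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="launch_server.py")
    parser.add_argument("--nsa-prefill-backend", choices=PREFILL_CHOICES)
    return parser


if __name__ == "__main__":
    parser = make_parser()
    # allow_abbrev (on by default) lets the old prefix match the new flag.
    ns = parser.parse_args(["--nsa-prefill", "flashmla_sparse"])
    print(ns.nsa_prefill_backend)
    try:
        # The old value fails choice validation; argparse prints the error and exits.
        parser.parse_args(["--nsa-prefill", "flashmla_prefill"])
    except SystemExit as exc:
        print("rejected with exit status", exc.code)
```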

llc-kc · 2025-11-04T03:30:48Z

@Paiiiiiiiiiiiiii Can we use MTP with PD disaggregation? Should the MTP-related parameters be added to the prefill server's launch parameters as well, or only to the decode server's?
Thank you very much.

Paiiiiiiiiiiiiii · 2025-11-04T07:41:43Z


Yes, it can be used with PD. Add the MTP-related parameters to the prefill (P) node, the same as on the decode (D) node.
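In other words, the prefill and decode servers share the MTP flags; they are not set on the decode side only. The sketch below derives both command lines from one shared list. It assumes sglang's `--disaggregation-mode prefill|decode` flag selects each server's role. It also omits the other disaggregation settings a real deployment needs, such as transfer backend, bootstrap, and router. `pd_commands` is a hypothetical helper.

```python
import shlex
from typing import Dict, List

# MTP flags from the launch command in the PR description.
MTP_ARGS: List[str] = [
    "--speculative-algorithm", "EAGLE",
    "--speculative-num-steps", "3",
    "--speculative-eagle-topk", "1",
    "--speculative-num-draft-tokens", "4",
]

BASE_ARGS: List[str] = [
    "python", "-m", "sglang.launch_server",
    "--model", "deepseek-ai/DeepSeek-V3.2-Exp",
    "--tp", "8",
]


def pd_commands() -> Dict[str, List[str]]:
    """Hypothetical helper: prefill (P) and decode (D) argv with identical MTP flags."""
    return {
        role: BASE_ARGS + ["--disaggregation-mode", role] + MTP_ARGS
        for role in ("prefill", "decode")
    }


if __name__ == "__main__":
    for role, argv in pd_commands().items():
        print(f"{role}: {shlex.join(argv)}")
```

Keeping the flags in one list means the P and D nodes cannot drift apart on the speculative settings.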

Paiiiiiiiiiiiiii added 2 commits October 15, 2025 13:05

[Spec Decoding] Support MTP for dsv3.2

25ae072

[Spec Decoding] Support MTP for dsv3.2

db92765

Paiiiiiiiiiiiiii requested review from BBuf, Edwardf0t1, HaiShaw, Ying1123, ch-wan, ispobock, kssteven418, kushanam, merrymercy and zhyncs as code owners October 15, 2025 05:09

Paiiiiiiiiiiiiii mentioned this pull request Oct 15, 2025

[Spec Decoding] Support MTP for dsv3.2 #11596

Closed

5 tasks

Fridge003 assigned Fridge003 and hnyls2002 Oct 15, 2025

Fridge003 added run-ci high priority labels Oct 15, 2025

Fridge003 reviewed Oct 15, 2025


python/sglang/srt/speculative/eagle_draft_extend_cuda_graph_runner.py (outdated, resolved)

python/sglang/srt/layers/attention/nsa/nsa_indexer.py (outdated, resolved)

[Spec Decoding] Support MTP for dsv3.2

7fb1b34

Paiiiiiiiiiiiiii requested a review from Fridge003 October 16, 2025 03:52

Fridge003 reviewed Oct 16, 2025


[Spec Decoding] Support MTP for dsv3.2

e8209cb

Paiiiiiiiiiiiiii requested a review from Fridge003 October 16, 2025 08:04

Paiiiiiiiiiiiiii and others added 3 commits October 17, 2025 02:18

[Spec Decoding] Support MTP for dsv3.2

b35b639

Merge branch 'main' into mtp

a73c997

Merge branch 'main' into mtp

7ccf51c

Fridge003 approved these changes Oct 17, 2025


Fridge003 mentioned this pull request Oct 17, 2025

[Tracking] DeepSeek-V3.2-Exp Day 0 Support #11060

Closed

9 tasks

Fridge003 added 2 commits October 17, 2025 13:25

Merge branch 'main' into mtp

8f05a81

Merge branch 'main' into mtp

a729d39

Merge branch 'main' into mtp

e75f273

hnyls2002 approved these changes Oct 18, 2025


Merge branch 'main' into mtp

909f657

Fridge003 added the ready-to-merge label (The PR is ready to merge after the CI is green.) Oct 18, 2025

Merge branch 'main' into mtp

49f2e5e

hnyls2002 merged commit efa4733 into sgl-project:main Oct 19, 2025
122 of 126 checks passed

Eva20150932-atlascloud mentioned this pull request Dec 8, 2025

feat: DeepSeek new v3.2 encoding #14249

Merged

5 tasks

Fridge003 mentioned this pull request Dec 13, 2025

[Roadmap] DeepSeek v3.2 (GLM 5) Optimization #15025

Open

34 tasks

benchislett mentioned this pull request Feb 21, 2026

[Bug]: [H200] DeepSeek V3.2 MTP > 1 run into error (FLASHMLA_SPARSE backend) vllm-project/vllm#31845

Open

1 task

5 participants
