[Spec Decoding] Support MTP for dsv3.2#11652
@Fridge003 here |
|
Accuracy results: |
python/sglang/srt/speculative/eagle_draft_extend_cuda_graph_runner.py
|
cc @hnyls2002 Can you also take a look? |
|
Launching with --nsa-prefill flashmla_prefill fails: launch_server.py: error: argument --nsa-prefill-backend: invalid choice: 'flashmla_prefill' (choose from flashmla_sparse, flashmla_kv, fa3, tilelang, aiter) |
|
@wwj-2017-1117 The server argument names changed recently. Please use
|
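The error quoted above is standard argparse behavior and can be reproduced in isolation. The sketch below is a stand-in parser, not sglang's actual argument definition: only the option name and the list of choices are taken from the error message, and `flashmla_sparse` is picked arbitrarily as a value that the parser accepts. It also shows why the message names `--nsa-prefill-backend` even though `--nsa-prefill` was typed: argparse expands unambiguous option prefixes by default.

```python
import argparse

# Choices copied from the error message above. The parser itself is a
# hypothetical stand-in, not sglang's real argument definition.
NSA_PREFILL_CHOICES = ["flashmla_sparse", "flashmla_kv", "fa3", "tilelang", "aiter"]


def build_parser() -> argparse.ArgumentParser:
    # exit_on_error=False (Python 3.9+) raises ArgumentError instead of exiting.
    parser = argparse.ArgumentParser(prog="launch_server.py", exit_on_error=False)
    parser.add_argument("--nsa-prefill-backend", choices=NSA_PREFILL_CHOICES)
    return parser


def try_parse(argv):
    """Return (value, None) on success or (None, error message) on failure."""
    try:
        args = build_parser().parse_args(argv)
        return args.nsa_prefill_backend, None
    except argparse.ArgumentError as err:
        return None, str(err)


# "--nsa-prefill" is an unambiguous prefix of "--nsa-prefill-backend",
# so argparse resolves it to the full option name.
ok_value, ok_err = try_parse(["--nsa-prefill", "flashmla_sparse"])
bad_value, bad_err = try_parse(["--nsa-prefill", "flashmla_prefill"])

print(ok_value)
print(bad_err)
```

The second call is rejected because `flashmla_prefill` is not among the listed choices; any value passed to this option has to come from that list.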
@Paiiiiiiiiiiiiii Can we use MTP in PD? Should the MTP-related parameters be added to the prefill launch parameters, or only to the decode ones? |
Yes, it can be used in PD. You need to add the MTP-related parameters to the P node as well as the D node.
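As a rough illustration of that answer, the sketch below builds the launch arguments for a prefill node and a decode node from one shared list of MTP flags, so the two cannot drift apart. The speculative flags and the model name come from the launch command in this PR. The `--disaggregation-mode` flag and the `build_node_argv` helper are assumptions made for illustration, and a real PD deployment will need further arguments that are not shown here.

```python
# Speculative-decoding (MTP) flags taken from the launch command in this PR.
MTP_FLAGS = [
    "--speculative-algorithm", "EAGLE",
    "--speculative-num-steps", "3",
    "--speculative-eagle-topk", "1",
    "--speculative-num-draft-tokens", "4",
]


def build_node_argv(mode: str) -> list[str]:
    """Build an argv list for one PD node (hypothetical helper)."""
    if mode not in ("prefill", "decode"):
        raise ValueError(f"unknown PD mode: {mode!r}")
    return [
        "python", "-m", "sglang.launch_server",
        "--model", "deepseek-ai/DeepSeek-V3.2-Exp",
        "--tp", "8",
        # Assumption: the flag used to select the PD role of this node.
        "--disaggregation-mode", mode,
        # The same MTP flags go on both the P node and the D node.
        *MTP_FLAGS,
    ]


prefill_argv = build_node_argv("prefill")
decode_argv = build_node_argv("decode")

print(" ".join(prefill_argv))
print(" ".join(decode_argv))
```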
Motivation
Based on #11109
We have implemented MTP support for DeepSeek V3.2, including CUDA graph support, in our in-house version of sglang.
Now that the community has completed the MTP modification, we are ready to contribute this feature back.
python -m sglang.launch_server --model deepseek-ai/DeepSeek-V3.2-Exp --tp 8 --attention-backend nsa --nsa-prefill flashmla_prefill --nsa-decode flashmla_decode --speculative-algorithm EAGLE --speculative-num-steps 3 --speculative-eagle-topk 1 --speculative-num-draft-tokens 4 --cuda-graph-max-bs 32 --max-running-requests 32
Modifications
Todo
Accuracy Tests
python3 /gsm8k/bench_sglang.py --num-questions 200
Benchmarking and Profiling
Checklist