
Support Dp and Dp attn for MTP#297

Open
ZhangLirong-amd wants to merge 2 commits into main from zlr/mtp_dp

Conversation


@ZhangLirong-amd ZhangLirong-amd commented Mar 10, 2026

Motivation

  1. Support the MTP draft model running in DP dummy decode/prefill.
  2. Fix several seqlen and mtp_k issues under DP.

Launch command used for testing:

python3 -m atom.entrypoints.openai_server --model /data/DeepSeek-R1-0528/ -tp 8 --port 5678 --server-port 7777 --kv_cache_dtype fp8 --torch-profiler-dir ./log --method mtp --num-speculative-tokens 3 --block-size 10000 --gpu-memory-utilization 0.41 --enable-dp-attention --enable-expert-parallel

Technical Details

Test Plan

Test Result

Submission Checklist

Copilot AI review requested due to automatic review settings March 10, 2026 09:16
@ZhangLirong-amd ZhangLirong-amd changed the title Support Dp and Dp attn for DS MTP Support Dp and Dp attn for MTP Mar 10, 2026

Copilot AI left a comment


Pull request overview

This PR extends Data Parallel (DP) support and DP attention to the DeepSeek MTP (Multi-Token Prediction) speculative decoding path. It fixes crashes and incorrect behavior that occur when dummy runs, which are used for DP synchronization, are executed alongside a drafter model.

Changes:

  • Added is_dummy_run guards around slot mapping computation and spec decode metadata calculation to prevent crashes during DP synchronization
  • Updated dummy_execution and dummy_prefill_execution to capture and propagate hidden_states through the drafter model for CUDA graph capture
  • Added defensive guards in SpecStats._log() to prevent division-by-zero in DP edge cases, and initialized num_rejected/num_bonus in tokenIDProcessor.clean()
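The is_dummy_run guard pattern described above can be sketched as follows. This is a minimal, hypothetical illustration: the names DecodeMetadata, compute_slot_mapping, and the shape of prepare_decode are assumptions for clarity, not atom's actual APIs.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class DecodeMetadata:
    # Simplified stand-in for the attention metadata built per batch.
    seq_lens: List[int]
    is_dummy_run: bool = False
    slot_mapping: Optional[List[int]] = None
    sum_tokens: int = 0

def compute_slot_mapping(seq_lens: List[int], block_size: int) -> List[int]:
    # Toy slot computation: map each sequence's last token to a
    # position within its current KV-cache block.
    return [(s - 1) % block_size for s in seq_lens]

def prepare_decode(meta: DecodeMetadata, block_size: int = 16) -> DecodeMetadata:
    # During DP synchronization, ranks with no real requests still execute
    # a dummy batch so collectives stay aligned. Skipping the slot-mapping
    # and token-sum computation on those ranks avoids indexing into
    # sequences that do not exist.
    if not meta.is_dummy_run:
        meta.slot_mapping = compute_slot_mapping(meta.seq_lens, block_size)
        meta.sum_tokens = sum(meta.seq_lens)
    return meta
```

The key point is that the dummy path leaves the metadata in a safe default state rather than computing values from an empty or fake batch.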

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.

Files changed:

  • atom/model_ops/attentions/aiter_mla.py: Updated max_seqlen_qo to account for MTP tokens; wrapped slot mapping and sum-token computation with is_dummy_run guards in prepare_decode.
  • atom/model_engine/model_runner.py: Captures hidden_states from run_model in dummy execution paths; runs the drafter model in dummy runs for CUDA graph capture; skips calc_spec_decode_metadata during dummy runs; initializes num_rejected/num_bonus in clean().
  • atom/model_engine/scheduler.py: Added ts == 0 and iv_steps == 0 early returns in SpecStats._log() to prevent division-by-zero in DP scenarios.
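The early-return guards in SpecStats._log() can be illustrated with the sketch below. SpecStats here is a simplified stand-in with assumed field names (ts, iv_steps, num_accepted); only the guard pattern reflects the change described above.

```python
class SpecStats:
    """Toy speculative-decoding stats tracker (illustrative only)."""

    def __init__(self) -> None:
        self.ts = 0            # total speculated tokens in the interval
        self.iv_steps = 0      # verification steps in the interval
        self.num_accepted = 0  # accepted speculated tokens

    def _log(self):
        # In DP scenarios a rank may reach the logging path having done
        # only dummy work, so both counters can still be zero. Returning
        # early prevents the divisions below from raising
        # ZeroDivisionError.
        if self.ts == 0 or self.iv_steps == 0:
            return None
        accept_rate = self.num_accepted / self.ts
        tokens_per_step = self.num_accepted / self.iv_steps
        return accept_rate, tokens_per_step
```

The guard is deliberately an early return rather than a clamped denominator, so idle DP ranks simply skip the log entry instead of emitting misleading zero-rate metrics.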


