
[10/N] MoE Refactor: reorganize deepgemm runner in DeepEPMoE#12054

Merged
ch-wan merged 2 commits into main from cheng/refactor/deepep-deepgemm
Oct 25, 2025

Conversation

Collaborator

@ch-wan ch-wan commented Oct 24, 2025

Motivation

Deprecate forward_deepgemm_contiguous and forward_deepgemm_masked, and finalize the deep_gemm backend.

Modifications

Accuracy Tests

Benchmarking and Profiling

Server:

python3 -m sglang.launch_server --model-path deepseek-ai/DeepSeek-V3-0324 --trust-remote-code --tp 8 --enable-dp-attention --dp 8 --moe-dense-tp-size 1 --enable-dp-lm-head --moe-a2a-backend deepep --enable-two-batch-overlap --ep-num-redundant-experts 32 --ep-dispatch-algorithm dynamic --eplb-algorithm deepseek --cuda-graph-bs 256 --max-running-requests 2048 --disable-radix-cache --device cuda

Benchmark:

python3 -m sglang.bench_serving --backend sglang --dataset-name random --num-prompt 512 --random-input 1000 --random-output 1000 --random-range-ratio 1 --host 127.0.0.1 --port 30000 --max-concurrency 128

Token throughput is around 2120 tokens/s both before and after this change.
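As a back-of-envelope sanity check on the numbers above (a sketch, not how bench_serving computes its metrics; here throughput is illustratively taken as output-token throughput over the whole run):

```python
# Hypothetical arithmetic only: relates the benchmark flags above
# (--num-prompt 512, --random-output 1000) to the ~2120 tok/s figure.
num_prompts = 512           # --num-prompt 512
output_len = 1000           # --random-output 1000
throughput_tok_s = 2120     # reported throughput

total_output_tokens = num_prompts * output_len
run_seconds = total_output_tokens / throughput_tok_s
print(f"{run_seconds:.0f}")  # prints 242: ~4 minutes of decode work
```

At concurrency 128 this implies each request's 1000 output tokens decode in well under the total run time, which is consistent with the two configurations performing equivalently.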

Checklist

@gemini-code-assist
Contributor

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

@ch-wan ch-wan mentioned this pull request Oct 24, 2025
66 tasks
@ch-wan ch-wan force-pushed the cheng/refactor/deepep-deepgemm branch 3 times, most recently from 6c197cf to 38f98e3 Compare October 24, 2025 21:02
Collaborator Author

ch-wan commented Oct 24, 2025

/gemini review

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request refactors the Mixture-of-Experts (MoE) layer by reorganizing the deepgemm runner logic. The core change is deprecating forward_deepgemm_contiguous and forward_deepgemm_masked from DeepEPMoE and moving their functionality into a more modular, runner-based architecture within deep_gemm.py. This is a solid architectural improvement that enhances code clarity and maintainability. The introduction of CombineInput wrappers and renaming of DispatchOutput classes also contribute to better code readability. I've identified a couple of potential memory leaks due to missing tensor disposals and a minor point about using assert False for deprecation warnings. Overall, this is a great step towards a cleaner MoE implementation.
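The runner-based shape the review describes can be sketched roughly as follows. The class and method names below are hypothetical illustrations, not the actual sglang API; the point is that backend-specific GEMM logic moves out of DeepEPMoE into a pluggable runner, with DispatchOutput/CombineInput wrappers at the boundaries:

```python
# Illustrative sketch only -- names are assumptions, not sglang's real API.
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any


@dataclass
class DispatchOutput:
    """Tokens routed to local experts; exact layout depends on the backend."""
    hidden_states: Any
    topk_ids: Any


@dataclass
class CombineInput:
    """Wrapper handed back to the dispatcher for the combine step."""
    hidden_states: Any


class MoeRunner(ABC):
    """Backend interface replacing per-backend forward_* methods on the layer."""

    @abstractmethod
    def run(self, dispatch_output: DispatchOutput) -> CombineInput: ...


class DeepGemmRunner(MoeRunner):
    """deep_gemm-specific grouped-GEMM logic lives here, not in DeepEPMoE."""

    def run(self, dispatch_output: DispatchOutput) -> CombineInput:
        # The real runner would launch grouped GEMMs (contiguous or masked)
        # over the dispatched tokens; this stub just passes data through.
        return CombineInput(hidden_states=dispatch_output.hidden_states)


class DeepEPMoE:
    """The MoE layer delegates to whichever runner it was constructed with."""

    def __init__(self, runner: MoeRunner):
        self.runner = runner

    def forward(self, dispatch_output: DispatchOutput) -> CombineInput:
        return self.runner.run(dispatch_output)
```

Under this design, deprecating forward_deepgemm_contiguous and forward_deepgemm_masked amounts to folding both code paths into the one runner, so adding or swapping a backend no longer touches the layer class.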

@ch-wan ch-wan changed the title [10/N] [wip] MoE Refactor: reorganize deepgemm runner in DeepEPMoE [10/N] MoE Refactor: reorganize deepgemm runner in DeepEPMoE Oct 24, 2025
@ch-wan ch-wan force-pushed the cheng/refactor/deepep-deepgemm branch from 412a6c4 to ed71fae Compare October 25, 2025 00:48
@ch-wan ch-wan merged commit 6499498 into main Oct 25, 2025
46 of 70 checks passed
@ch-wan ch-wan deleted the cheng/refactor/deepep-deepgemm branch October 25, 2025 02:16
