[10/N] MoE Refactor: reorganize deepgemm runner in DeepEPMoE #12054
Conversation
Force-pushed from 6c197cf to 38f98e3.
/gemini review
Code Review
This pull request refactors the Mixture-of-Experts (MoE) layer by reorganizing the deepgemm runner logic. The core change deprecates forward_deepgemm_contiguous and forward_deepgemm_masked in DeepEPMoE and moves their functionality into a more modular, runner-based architecture within deep_gemm.py. This is a solid architectural improvement that enhances code clarity and maintainability. The introduction of CombineInput wrappers and the renaming of the DispatchOutput classes also improve readability. I've identified a couple of potential memory leaks due to missing tensor disposals, as well as a minor concern about using assert False for deprecation warnings. Overall, this is a great step toward a cleaner MoE implementation.
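On the reviewer's point about assert False: a minimal sketch of a deprecation stub that raises an explicit exception instead, so it still fires when Python is run with -O (which strips assertions). The class body and messages below are illustrative only, not the PR's actual code.

```python
class DeepEPMoE:  # sketch only; the real class lives in sglang's MoE layer
    def forward_deepgemm_contiguous(self, *args, **kwargs):
        # Deprecated path: fail loudly even when assertions are disabled (-O).
        raise NotImplementedError(
            "forward_deepgemm_contiguous is deprecated; "
            "use the deep_gemm runner backend instead."
        )

    def forward_deepgemm_masked(self, *args, **kwargs):
        raise NotImplementedError(
            "forward_deepgemm_masked is deprecated; "
            "use the deep_gemm runner backend instead."
        )
```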
Force-pushed from 412a6c4 to ed71fae.
Motivation
Deprecate forward_deepgemm_contiguous and forward_deepgemm_masked. Finalize the deep_gemm backend.
Modifications
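A minimal sketch of the kind of runner-based layout the review describes, under the assumption that a single runner object owns both deepgemm execution paths. The names DeepGemmMoeRunner, run, _run_contiguous, _run_masked, and the CombineInput fields are hypothetical; the actual identifiers and signatures in deep_gemm.py may differ.

```python
from dataclasses import dataclass

import torch


@dataclass
class CombineInput:
    # Hypothetical wrapper around the per-expert outputs handed back to the
    # dispatcher for the combine step.
    hidden_states: torch.Tensor


class DeepGemmMoeRunner:
    """Illustrative runner owning both deepgemm execution paths.

    Only shows the shape of a runner-based split; not the real backend code.
    """

    def run(self, dispatch_output) -> CombineInput:
        # Pick the kernel path from the dispatch format instead of branching
        # inside DeepEPMoE's forward_* methods.
        if getattr(dispatch_output, "masked", False):
            return self._run_masked(dispatch_output)
        return self._run_contiguous(dispatch_output)

    def _run_contiguous(self, dispatch_output) -> CombineInput:
        # Grouped GEMM over contiguously packed tokens.
        raise NotImplementedError("sketch only")

    def _run_masked(self, dispatch_output) -> CombineInput:
        # Masked grouped GEMM used by the low-latency path.
        raise NotImplementedError("sketch only")
```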
Accuracy Tests
Benchmarking and Profiling
Server:
Benchmark:
Token throughput is around 2120 tokens/s for both the baseline and this PR.
Checklist