support eplb for qwen3 by yizhang2077 · Pull Request #6533 · sgl-project/sglang

yizhang2077 · 2025-05-22T18:03:20Z

Motivation

Support EPLB for Qwen3 MoE. ~~Need to merge #6120 first, then make the other modifications (add ExpertLocationDispatchInfo).~~

simple test

python3 -m sglang.launch_server --model-path Qwen/Qwen3-235B-A22B-FP8 --tp 8 --dp 2 --enable-dp-attention --trust-remote --enable-deepep-moe --deepep-mode normal --enable-eplb --expert-distribution-recorder-buffer-size 50  --expert-distribution-recorder-mode stat --disable-radix-cache --eplb-rebalance-num-iterations 50 --ep-dispatch-algorithm static --ep-num-redundant-experts 32

python3 benchmark/gsm8k/bench_sglang.py --num-shots 8 --num-questions 1319 --parallel 1319

Accuracy: 0.949
Invalid: 0.000
Latency: 216.240 s
Output throughput: 941.167 token/s
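
As a rough sanity check on the numbers above, the sketch below derives the implied total output tokens and the average per question. It assumes (this is not stated in the PR) that the reported throughput is total output tokens divided by the reported latency; the variable names are illustrative only.

```python
# Figures copied from the benchmark output above.
num_questions = 1319
accuracy = 0.949
latency_s = 216.240
output_throughput_tok_s = 941.167

# Assumption: throughput = total output tokens / end-to-end latency.
total_output_tokens = output_throughput_tok_s * latency_s
tokens_per_question = total_output_tokens / num_questions

# Accuracy is a fraction of the question count, so this is an approximate
# count of correctly answered questions.
approx_correct = round(accuracy * num_questions)

print(f"total output tokens ~ {total_output_tokens:,.0f}")
print(f"output tokens per question ~ {tokens_per_question:.1f}")
print(f"correct answers ~ {approx_correct} of {num_questions}")
```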

Modifications

Checklist

Format your code according to the Code Formatting with Pre-Commit.
Add unit tests as outlined in the Running Unit Tests.
Update documentation / docstrings / example tutorials as needed, according to Writing Documentation.
Provide throughput / latency benchmark results and accuracy evaluation results as needed, according to Benchmark and Profiling and Accuracy Results.
For reviewers: If you haven't made any contributions to this PR and are only assisting with merging the main branch, please remove yourself as a co-author when merging the PR.
Please feel free to join our Slack channel at https://slack.sglang.ai to discuss your PR.

fzyzcjy · 2025-05-22T23:52:12Z

LGTM as long as tests pass

libratiger · 2025-05-27T09:58:28Z

This PR seems to have broken pipeline parallelism for the Qwen MoE model.

We can reproduce it with the following command.

python3 -m sglang.launch_server --model Qwen/Qwen3-30B-A3B --pp 2

[2025-05-27 09:50:45 PP1] Scheduler hit an exception: Traceback (most recent call last):
  File "/home/user/miniconda/envs/sglang/lib/python3.12/site-packages/sglang/srt/managers/scheduler.py", line 2322, in run_scheduler_process
    scheduler = Scheduler(server_args, port_args, gpu_id, tp_rank, pp_rank, dp_rank)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/miniconda/envs/sglang/lib/python3.12/site-packages/sglang/srt/managers/scheduler.py", line 280, in __init__
    self.tp_worker = TpWorkerClass(
                     ^^^^^^^^^^^^^^
  File "/home/user/miniconda/envs/sglang/lib/python3.12/site-packages/sglang/srt/managers/tp_worker.py", line 78, in __init__
    self.model_runner = ModelRunner(
                        ^^^^^^^^^^^^
  File "/home/user/miniconda/envs/sglang/lib/python3.12/site-packages/sglang/srt/model_executor/model_runner.py", line 234, in __init__
    self.initialize(min_per_gpu_memory)
  File "/home/user/miniconda/envs/sglang/lib/python3.12/site-packages/sglang/srt/model_executor/model_runner.py", line 269, in initialize
    self.load_model()
  File "/home/user/miniconda/envs/sglang/lib/python3.12/site-packages/sglang/srt/model_executor/model_runner.py", line 553, in load_model
    self.model = get_model(
                 ^^^^^^^^^^
  File "/home/user/miniconda/envs/sglang/lib/python3.12/site-packages/sglang/srt/model_loader/__init__.py", line 23, in get_model
    return loader.load_model(
           ^^^^^^^^^^^^^^^^^^
  File "/home/user/miniconda/envs/sglang/lib/python3.12/site-packages/sglang/srt/model_loader/loader.py", line 386, in load_model
    self.load_weights_and_postprocess(
  File "/home/user/miniconda/envs/sglang/lib/python3.12/site-packages/sglang/srt/model_loader/loader.py", line 394, in load_weights_and_postprocess
    model.load_weights(weights)
  File "/home/user/miniconda/envs/sglang/lib/python3.12/site-packages/sglang/srt/models/qwen3_moe.py", line 799, in load_weights
    if isinstance(layer.mlp, Qwen3MoeSparseMoeBlock)
                  ^^^^^^^^^
  File "/home/user/miniconda/envs/sglang/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1928, in __getattr__
    raise AttributeError(
AttributeError: 'PPMissingLayer' object has no attribute 'mlp'

[2025-05-27 09:50:45] Received sigquit from a child process. It usually means the child failed.
Loading safetensors checkpoint shards: 100% Completed | 16/16 [00:07<00:00,  2.26it/s]
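
The traceback points at `load_weights` reading `layer.mlp` on every layer, while a pipeline-parallel rank holds placeholder layers for the stages it does not own. The sketch below reproduces that failure mode with plain Python stand-ins and shows one possible guard. All classes and helper functions here are hypothetical illustrations, not SGLang code, and the actual fix in #6709 may differ.

```python
class PPMissingLayer:
    """Stand-in for a placeholder layer owned by another pipeline stage."""


class SparseMoeBlock:
    """Stand-in for Qwen3MoeSparseMoeBlock."""


class DecoderLayer:
    """Stand-in for a real decoder layer that has an MoE `mlp` attribute."""

    def __init__(self, layer_id):
        self.layer_id = layer_id
        self.mlp = SparseMoeBlock()


def build_layers(num_layers, start, end):
    # Layers outside [start, end) belong to other pipeline ranks, so this
    # rank only holds placeholders for them.
    return [
        DecoderLayer(i) if start <= i < end else PPMissingLayer()
        for i in range(num_layers)
    ]


def moe_layer_ids_unguarded(layers):
    # Mirrors the failing pattern: `.mlp` is read on every layer.
    return [
        i for i, layer in enumerate(layers)
        if isinstance(layer.mlp, SparseMoeBlock)
    ]


def moe_layer_ids_guarded(layers):
    # One possible guard: skip placeholders before touching `.mlp`.
    return [
        i for i, layer in enumerate(layers)
        if not isinstance(layer, PPMissingLayer)
        and isinstance(layer.mlp, SparseMoeBlock)
    ]


# Second of two pipeline ranks over four layers: it owns layers 2 and 3.
layers = build_layers(num_layers=4, start=2, end=4)

try:
    moe_layer_ids_unguarded(layers)
except AttributeError as exc:
    print(f"unguarded: {exc}")

print(f"guarded: {moe_layer_ids_guarded(layers)}")
```

The guard keeps the per-layer check local to the loop. An alternative would be to iterate only over the layer index range owned by the current rank, if the model exposes one.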

jinyouzhi · 2025-05-28T13:10:22Z

Hi, I can reproduce this and am trying to fix it in #6709. I would very much appreciate any comments or guidance. Thanks!

yizhang2077 requested review from ByronHsu, Ying1123, hnyls2002, ispobock, merrymercy, zhaochenyang20 and zhyncs as code owners May 22, 2025 18:03

yizhang2077 changed the title ~~support eplb for qwen3~~ [WIP] support eplb for qwen3 May 22, 2025

support eplb for qwen3

df8c6cf

yizhang2077 force-pushed the qwen3-support-eplb branch from 7d53b13 to df8c6cf Compare May 22, 2025 18:24

yizhang2077 changed the title ~~[WIP] support eplb for qwen3~~ support eplb for qwen3 May 22, 2025

yizhang2077 assigned fzyzcjy May 22, 2025

fzyzcjy approved these changes May 22, 2025

fix some bugs

f23bd19

yizhang2077 force-pushed the qwen3-support-eplb branch from bd63b4b to f23bd19 Compare May 23, 2025 03:22

yizhang2077 requested review from BBuf, HaiShaw and ch-wan as code owners May 23, 2025 03:22

yizhang2077 added 3 commits May 23, 2025 05:01

fix some bugs

008c3f0

fix some bugs

f217ecd

fix some bugs

3b40e74

yizhang2077 force-pushed the qwen3-support-eplb branch from d971c70 to 3b40e74 Compare May 23, 2025 06:41

yizhang2077 requested a review from xiezhq-hermann as a code owner May 23, 2025 06:41

yizhang2077 and others added 3 commits May 23, 2025 14:42

Merge branch 'main' into qwen3-support-eplb

19589f6

Merge branch 'main' into qwen3-support-eplb

a0b08af

Merge branch 'main' into qwen3-support-eplb

a30c38b

zhyncs merged commit e6f1135 into main May 24, 2025
1 of 42 checks passed

zhyncs deleted the qwen3-support-eplb branch May 24, 2025 01:31

jinyouzhi mentioned this pull request May 28, 2025

Fix PP for Qwen3 MoE #6709

Merged

6 tasks

Layssy pushed a commit to Layssy/sglang-iaas that referenced this pull request Jun 9, 2025

support eplb for qwen3 (sgl-project#6533)

4b1578d

xwu-intel pushed a commit to xwu-intel/sglang that referenced this pull request Jun 17, 2025

support eplb for qwen3 (sgl-project#6533)

29f1b98

zhyncs merged 8 commits into main from qwen3-support-eplb

5 participants
