
support eplb for qwen3 #6533

Merged
zhyncs merged 8 commits into main from qwen3-support-eplb
May 24, 2025

Conversation

@yizhang2077
Collaborator

@yizhang2077 yizhang2077 commented May 22, 2025

Motivation

Support EPLB for Qwen3 MoE. #6120 needs to be merged first; the remaining modifications (adding ExpertLocationDispatchInfo) follow after that.

simple test

python3 -m sglang.launch_server \
  --model-path Qwen/Qwen3-235B-A22B-FP8 \
  --tp 8 --dp 2 --enable-dp-attention --trust-remote \
  --enable-deepep-moe --deepep-mode normal \
  --enable-eplb \
  --expert-distribution-recorder-buffer-size 50 \
  --expert-distribution-recorder-mode stat \
  --disable-radix-cache \
  --eplb-rebalance-num-iterations 50 \
  --ep-dispatch-algorithm static \
  --ep-num-redundant-experts 32

python3 benchmark/gsm8k/bench_sglang.py --num-shots 8 --num-questions 1319 --parallel 1319

Accuracy: 0.949
Invalid: 0.000
Latency: 216.240 s
Output throughput: 941.167 token/s

Modifications

Checklist

@yizhang2077 yizhang2077 changed the title from "support eplb for qwen3" to "[WIP] support eplb for qwen3" May 22, 2025
@yizhang2077 yizhang2077 force-pushed the qwen3-support-eplb branch from 7d53b13 to df8c6cf Compare May 22, 2025 18:24
@yizhang2077 yizhang2077 changed the title from "[WIP] support eplb for qwen3" to "support eplb for qwen3" May 22, 2025
@fzyzcjy
Collaborator

fzyzcjy commented May 22, 2025

LGTM as long as tests pass

@zhyncs zhyncs merged commit e6f1135 into main May 24, 2025
1 of 42 checks passed
@zhyncs zhyncs deleted the qwen3-support-eplb branch May 24, 2025 01:31
@libratiger
Contributor

This PR seems to have broken pipeline parallelism for the Qwen MoE model.

We can reproduce it with the following command:

python3 -m sglang.launch_server --model Qwen/Qwen3-30B-A3B --pp 2
[2025-05-27 09:50:45 PP1] Scheduler hit an exception: Traceback (most recent call last):
  File "/home/user/miniconda/envs/sglang/lib/python3.12/site-packages/sglang/srt/managers/scheduler.py", line 2322, in run_scheduler_process
    scheduler = Scheduler(server_args, port_args, gpu_id, tp_rank, pp_rank, dp_rank)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/miniconda/envs/sglang/lib/python3.12/site-packages/sglang/srt/managers/scheduler.py", line 280, in __init__
    self.tp_worker = TpWorkerClass(
                     ^^^^^^^^^^^^^^
  File "/home/user/miniconda/envs/sglang/lib/python3.12/site-packages/sglang/srt/managers/tp_worker.py", line 78, in __init__
    self.model_runner = ModelRunner(
                        ^^^^^^^^^^^^
  File "/home/user/miniconda/envs/sglang/lib/python3.12/site-packages/sglang/srt/model_executor/model_runner.py", line 234, in __init__
    self.initialize(min_per_gpu_memory)
  File "/home/user/miniconda/envs/sglang/lib/python3.12/site-packages/sglang/srt/model_executor/model_runner.py", line 269, in initialize
    self.load_model()
  File "/home/user/miniconda/envs/sglang/lib/python3.12/site-packages/sglang/srt/model_executor/model_runner.py", line 553, in load_model
    self.model = get_model(
                 ^^^^^^^^^^
  File "/home/user/miniconda/envs/sglang/lib/python3.12/site-packages/sglang/srt/model_loader/__init__.py", line 23, in get_model
    return loader.load_model(
           ^^^^^^^^^^^^^^^^^^
  File "/home/user/miniconda/envs/sglang/lib/python3.12/site-packages/sglang/srt/model_loader/loader.py", line 386, in load_model
    self.load_weights_and_postprocess(
  File "/home/user/miniconda/envs/sglang/lib/python3.12/site-packages/sglang/srt/model_loader/loader.py", line 394, in load_weights_and_postprocess
    model.load_weights(weights)
  File "/home/user/miniconda/envs/sglang/lib/python3.12/site-packages/sglang/srt/models/qwen3_moe.py", line 799, in load_weights
    if isinstance(layer.mlp, Qwen3MoeSparseMoeBlock)
                  ^^^^^^^^^
  File "/home/user/miniconda/envs/sglang/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1928, in __getattr__
    raise AttributeError(
AttributeError: 'PPMissingLayer' object has no attribute 'mlp'

[2025-05-27 09:50:45] Received sigquit from a child process. It usually means the child failed.
Loading safetensors checkpoint shards: 100% Completed | 16/16 [00:07<00:00,  2.26it/s]

@jinyouzhi jinyouzhi mentioned this pull request May 28, 2025
6 tasks
@jinyouzhi
Contributor


Hi, I can reproduce this and am trying to fix it in #6709. I would very much appreciate any comments or guidance. Thanks!

Layssy pushed a commit to Layssy/sglang-iaas that referenced this pull request Jun 9, 2025
xwu-intel pushed a commit to xwu-intel/sglang that referenced this pull request Jun 17, 2025