
[Feature] Support DeepEP normal & Redundant Experts on NPU#9881

Merged
zhyncs merged 20 commits into sgl-project:main from iforgetmyname:feature/deepep_normal
Sep 11, 2025

Conversation

@iforgetmyname (Collaborator) commented Sep 1, 2025:

Motivation

This PR adds support on Ascend NPU for:

  1. DeepEP normal mode
  2. Redundant Experts

Along with the previously merged #8355, both prefill and decode can now run with expert parallelism on Atlas 800I A3. This also means running large-scale MoE models without PD disaggregation is possible if HBM capacity allows.

Check out our DeepEP-Ascend roadmap here.
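
As a rough illustration of the "if HBM capacity allows" caveat, here is a back-of-envelope sketch. The 671B parameter count (DeepSeek-R1 scale) and the one-byte-per-weight assumption for w8a8_int8 are my own assumptions, not figures from this PR:

```python
# Back-of-envelope HBM check for weight storage only (activations, KV cache,
# and communication buffers come on top). All numbers are assumptions.
NUM_PARAMS = 671e9      # assumed DeepSeek-R1-scale model
BYTES_PER_PARAM = 1     # int8 weights under w8a8_int8 quantization
NUM_RANKS = 16          # matches --tp-size 16 in the launch commands below

weight_bytes_per_rank = NUM_PARAMS * BYTES_PER_PARAM / NUM_RANKS
print(f"~{weight_bytes_per_rank / 2**30:.1f} GiB of weights per rank")
```

This is only a sanity check on weight memory per device; the real budget must also cover the KV cache, which is why `--mem-fraction-static` is tuned in the commands below.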

Modifications

  • Fix a bug where the FIA kernel rejects input_seq_len smaller than tp_size
  • Remove AscendDeepEPLLOutput, since DeepEP-Ascend now aligns its output variables with DeepSeek's DeepEP
  • Support intranode dispatch/combine (DeepEP normal mode)
  • Support the expert distribution recorder & redundant experts
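
As a sketch of the redundant-experts idea (the function name and the round-robin replication policy are illustrative assumptions, not SGLang's actual implementation): every logical expert gets one physical slot, the extra redundant slots replicate some experts, and routing may then pick any replica of the target logical expert to balance load.

```python
# Illustrative static redundant-expert layout: physical slot -> logical expert id.
# Hypothetical sketch; SGLang's real mapping is produced by its EPLB machinery.
def build_physical_to_logical(num_logical: int, num_redundant: int) -> list[int]:
    # One slot per logical expert...
    mapping = list(range(num_logical))
    # ...then redundant slots replicate logical experts round-robin.
    mapping += [i % num_logical for i in range(num_redundant)]
    return mapping

# 256 logical experts + 16 redundant slots, as in --ep-num-redundant-experts 16.
mapping = build_physical_to_logical(num_logical=256, num_redundant=16)
assert len(mapping) == 272                  # total physical expert slots
assert mapping[256:] == list(range(16))     # experts 0..15 each gain one replica
```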

Accuracy Tests

[accuracy-test results screenshot]

Benchmarking and Profiling

# Prefill
# NOTE: the number of P instances should be increased, as a single P cannot keep D fully utilized
export HCCL_BUFFSIZE=1536
python3 -m sglang.launch_server \
    --model-path <deepseek-model-path> \
    --trust-remote-code \
    --attention-backend ascend \
    --mem-fraction-static 0.85 \
    --quantization w8a8_int8 \
    --disable-radix-cache \
    --chunked-prefill-size 32768 \
    --tp-size 16 \
    --dp-size 1 \
    --ep-size 16 \
    --moe-a2a-backend deepep \
    --deepep-mode normal \
    --nnodes 1 \
    --node-rank 0 \
    --disaggregation-mode prefill \
    --disaggregation-transfer-backend ascend \
    --ep-num-redundant-experts 16 \
    --ep-dispatch-algorithm static \
    --init-expert-location <location-file>

# Decode
export HCCL_BUFFSIZE=500
export SGLANG_DEEPEP_NUM_MAX_DISPATCH_TOKENS_PER_RANK=32
python3 -m sglang.launch_server \
    --model-path <deepseek-model-path> \
    --max-running-requests 512 \
    --trust-remote-code \
    --attention-backend ascend \
    --mem-fraction-static 0.9 \
    --quantization w8a8_int8 \
    --disable-radix-cache \
    --chunked-prefill-size 32768 \
    --cuda-graph-bs 8 16 24 32 \
    --tp-size 16 \
    --dp-size 2 \
    --enable-dp-attention \
    --ep-size 16 \
    --moe-a2a-backend deepep \
    --deepep-mode low_latency \
    --nnodes 1 \
    --node-rank 0 \
    --disaggregation-mode decode \
    --disaggregation-transfer-backend ascend \
    --ep-num-redundant-experts 16 \
    --ep-dispatch-algorithm static \
    --init-expert-location <location-file>
[benchmark results screenshot]
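
Both launch commands pass `--ep-dispatch-algorithm static` together with `--init-expert-location <location-file>`. A hedged sketch of generating such a file is below; the `physical_to_logical_map` key and JSON layout are assumptions about the expected schema, so check SGLang's expert-location loader before relying on this shape:

```python
import json
import tempfile

# Hypothetical generator for a static expert-location file; the schema
# (key name, JSON container) is an assumption, not taken from this PR.
num_logical, num_redundant = 256, 16
physical_to_logical = list(range(num_logical)) + [
    i % num_logical for i in range(num_redundant)  # replicate experts 0..15
]

with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
    json.dump({"physical_to_logical_map": physical_to_logical}, f)
    path = f.name

with open(path) as f:
    loaded = json.load(f)
assert len(loaded["physical_to_logical_map"]) == num_logical + num_redundant
```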

Checklist

@iforgetmyname iforgetmyname marked this pull request as ready for review September 2, 2025 07:36
@iforgetmyname iforgetmyname marked this pull request as draft September 2, 2025 09:43
@iforgetmyname iforgetmyname marked this pull request as ready for review September 2, 2025 13:53
@iforgetmyname iforgetmyname changed the title [Feature] Support DeepEP normal & EPLB [Feature] Support DeepEP normal & Redundant Experts on NPU Sep 2, 2025
@iforgetmyname (Collaborator, Author) commented:

/gemini review

@gemini-code-assist bot (Contributor) left a comment:
Code Review

This pull request introduces support for DeepEP normal mode and redundant experts on Ascend NPUs. The changes are well-organized and align with the goal of extending hardware support. Key modifications include making parts of the codebase device-agnostic by replacing CUDA-specific calls with generic device-aware functions, and refactoring the NPU forward pass to handle both normal and low-latency DeepEP modes. The removal of AscendDeepEPLLOutput in favor of the standard DeepEPLLOutput is a good step towards unifying the implementation. Additionally, new tests for Ascend DeepEP have been added, which is great for ensuring correctness.

I have a couple of suggestions to improve maintainability and robustness:

  1. Refactoring the forward_npu method in python/sglang/srt/layers/moe/ep_moe/layer.py to reduce code duplication.
  2. Improving the robustness of the CI script scripts/ci/npu_ci_install_dependency.sh for locating the site-packages directory.

### Install sgl-kernel-npu
SGL_KERNEL_NPU_TAG="20250901"
git clone --depth 1 https://github.com/sgl-project/sgl-kernel-npu.git --branch ${SGL_KERNEL_NPU_TAG}
(cd sgl-kernel-npu && bash ./build.sh -a deepep && pip install output/deep_ep*.whl && cd "$(pip show deep-ep | grep -E '^Location:' | awk '{print $2}')" && ln -s deep_ep/deep_ep_cpp*.so)
gemini-code-assist bot (Contributor) commented:

Severity: medium

The command to find the site-packages directory using pip show | grep | awk is a bit fragile and might break if the output format of pip show changes in the future. A more robust approach would be to use Python's site module to get the site-packages path directly. This avoids parsing command-line tool output.

Suggested change
(cd sgl-kernel-npu && bash ./build.sh -a deepep && pip install output/deep_ep*.whl && cd "$(pip show deep-ep | grep -E '^Location:' | awk '{print $2}')" && ln -s deep_ep/deep_ep_cpp*.so)
(cd sgl-kernel-npu && bash ./build.sh -a deepep && pip install output/deep_ep*.whl && cd "$(python3 -c 'import site; print(site.getsitepackages()[0])')" && ln -s deep_ep/deep_ep_cpp*.so)
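
For reference, a minimal standard-library check of the suggested approach: `site.getsitepackages()` returns the interpreter's site-packages directories directly, with no dependence on `pip show`'s output format.

```python
import site

# Resolve site-packages without parsing `pip show` output, so the result
# is unaffected by future changes to pip's text formatting.
paths = site.getsitepackages()
print(paths[0])  # first site-packages directory of this interpreter
```

Note that inside some virtual environments the first entry may differ from where pip installed the wheel, so the suggestion is more robust but still worth verifying in the CI image.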

push: ${{ github.repository == 'sgl-project/sglang' && github.event_name != 'pull_request' }}
provenance: false
build-args: |
SGLANG_KERNEL_NPU_TAG=20250901
A Contributor commented:

Why is this hard-coded here? Is it because this NPU version is not released?

A Collaborator replied:

No, you can see this link.

iforgetmyname (Author) replied:

This link instead; it is a release tag for the sgl-kernel-npu repo.

)

TEST_MODEL_MATRIX = {
"/root/.cache/modelscope/hub/models/vllm-ascend/DeepSeek-R1-0528-W8A8": {
A Contributor commented:

This path is hard-coded.

A Collaborator replied:

Because we want to accelerate the CI tests by using a cached file.

iforgetmyname (Author) replied:

It's still a bug with ModelScope: we still can't launch models downloaded with ModelScope using the namespace/model format.

@zhyncs (Collaborator) commented Sep 8, 2025:

@fzyzcjy @hnyls2002 please help review this pr. thanks.

@zhyncs zhyncs merged commit 5b64f00 into sgl-project:main Sep 11, 2025
191 of 212 checks passed
@iforgetmyname iforgetmyname deleted the feature/deepep_normal branch September 13, 2025 07:29
@ping1jing2 ping1jing2 self-assigned this Dec 16, 2025