[Feature] Support DeepEP normal & Redundant Experts on NPU #9881
zhyncs merged 20 commits into sgl-project:main
* support deepep normal * change forward_npu return * fix linting * bugfixes * linting happy * lingting happy
* EPLB support * linting
/gemini review
Code Review
This pull request introduces support for DeepEP normal mode and redundant experts on Ascend NPUs. The changes are well-organized and align with the goal of extending hardware support. Key modifications include making parts of the codebase device-agnostic by replacing CUDA-specific calls with generic device-aware functions, and refactoring the NPU forward pass to handle both normal and low-latency DeepEP modes. The removal of AscendDeepEPLLOutput in favor of the standard DeepEPLLOutput is a good step towards unifying the implementation. Additionally, new tests for Ascend DeepEP have been added, which is great for ensuring correctness.
I have a couple of suggestions to improve maintainability and robustness:
- Refactoring the forward_npu method in python/sglang/srt/layers/moe/ep_moe/layer.py to reduce code duplication.
- Improving the robustness of the CI script scripts/ci/npu_ci_install_dependency.sh for locating the site-packages directory.
### Install sgl-kernel-npu
SGL_KERNEL_NPU_TAG="20250901"
git clone --depth 1 https://github.com/sgl-project/sgl-kernel-npu.git --branch ${SGL_KERNEL_NPU_TAG}
(cd sgl-kernel-npu && bash ./build.sh -a deepep && pip install output/deep_ep*.whl && cd "$(pip show deep-ep | grep -E '^Location:' | awk '{print $2}')" && ln -s deep_ep/deep_ep_cpp*.so)
The command to find the site-packages directory using pip show | grep | awk is a bit fragile and might break if the output format of pip show changes in the future. A more robust approach would be to use Python's site module to get the site-packages path directly. This avoids parsing command-line tool output.
Suggested change:
- (cd sgl-kernel-npu && bash ./build.sh -a deepep && pip install output/deep_ep*.whl && cd "$(pip show deep-ep | grep -E '^Location:' | awk '{print $2}')" && ln -s deep_ep/deep_ep_cpp*.so)
+ (cd sgl-kernel-npu && bash ./build.sh -a deepep && pip install output/deep_ep*.whl && cd "$(python3 -c 'import site; print(site.getsitepackages()[0])')" && ln -s deep_ep/deep_ep_cpp*.so)
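As a further illustration of avoiding `pip show` parsing, here is a minimal Python sketch that asks the import system where a package lives and returns the directory containing it. The helper name `package_parent_dir` is hypothetical and not part of this PR; the sketch assumes the wheel installs an importable package directory named `deep_ep`, as the `deep_ep/deep_ep_cpp*.so` path above implies. Note that `site.getsitepackages()[0]` returns the first global site-packages directory, which is not necessarily where pip placed a given package (for example with user installs), so resolving the package itself may be a closer match to the original `Location:` lookup.

```python
import importlib.util
from pathlib import Path


def package_parent_dir(package_name: str) -> Path:
    """Return the directory that contains an installed package's folder.

    Hypothetical helper for illustration only.
    """
    spec = importlib.util.find_spec(package_name)
    if spec is None or not spec.submodule_search_locations:
        raise ModuleNotFoundError(f"{package_name!r} is not an installed package")
    # The first search location is the package directory itself;
    # its parent is the site-packages (or equivalent) directory.
    package_dir = Path(list(spec.submodule_search_locations)[0]).resolve()
    return package_dir.parent


if __name__ == "__main__":
    # "json" is a stdlib stand-in; the CI script would pass "deep_ep" (assumption).
    print(package_parent_dir("json"))
```

In the CI script this could replace the command substitution, e.g. by running the helper with `python3` and `cd`-ing into its output, provided `deep_ep` is importable in that interpreter.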
push: ${{ github.repository == 'sgl-project/sglang' && github.event_name != 'pull_request' }}
provenance: false
build-args: |
  SGLANG_KERNEL_NPU_TAG=20250901
Why hard-code this here? Is it because this NPU version is not released?
See this link instead; this is a release tag for the sgl-kernel-npu repo.
)

TEST_MODEL_MATRIX = {
    "/root/.cache/modelscope/hub/models/vllm-ascend/DeepSeek-R1-0528-W8A8": {
Because we want to speed up the CI tests by loading the model from cached files.
There is also still a bug with modelscope: we can't launch models downloaded with modelscope using the namespace/model format.
@fzyzcjy @hnyls2002 please help review this PR. Thanks.
Motivation
This PR adds support on Ascend NPU for:
Together with the previously merged #8355, both prefill and decode can now run with expert parallelism on Atlas 800I A3. This also means that large-scale MoE models can run without PD disaggregation if HBM capacity allows.
Check out our roadmap for DeepEP-Ascend here
Modifications
- input_seq_len smaller than tp_size
- AscendDeepEPLLOutput, because DeepEP-Ascend now aligns its output variables with DeepSeek's DeepEP
Accuracy Tests
Benchmarking and Profiling
Checklist