[NPU][1/N] NPU basic functions refactor and new modelslim quant type by iforgetmyname · Pull Request #13359 · sgl-project/sglang

iforgetmyname · 2025-11-16T02:47:01Z

Motivation

Because of the underlying structural differences between GPGPUs and NPUs, previous commits have introduced many is_npu branches into the current repository. Although this has helped the out-of-the-box experience for our end users and matched our rapid development pace, organizing code this way hurts the readability and, of course, the maintainability of the whole sglang project. We believe it is neither a long-term solution nor a healthy and robust way to continuously maintain multi-hardware support. Starting with this PR, we are therefore refactoring NPU-related code into a dedicated folder that hides hardware differences and exposes only simplified interfaces that different models can call.

Modifications

- Trigger transfer_to_npu when the engine starts; it mocks all supported torch.cuda calls to torch.npu
  - https://github.com/sgl-project/sglang/pull/13359/files#diff-1f1b431bcaea00cda2a62efe5d5f702f8bc0445982c1ec0b44d31d025f8faf68R72
- Introduce a helper function npu_format_cast that wraps data format casting on NPU together with the NPU-only condition checks; this data format casting gives a 5% performance improvement on NPU
- Refactor out the NPU-related TokenToKVPool, PagedTokenToKVPoolAllocator, top_k forward, and Ascend attention backend
- Introduce a new quant type, modelslim, that supports the NPU-specific W8A8, W4A8-MoE, and W4A16-MoE mixed quant methods
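To make the first two items concrete, here is a minimal sketch of the two patterns they describe: redirecting calls from one device namespace to another, and wrapping a device-only cast behind a guarded helper. It does not reproduce the PR's code. `redirect_calls`, `make_format_cast`, the stand-in `cuda`/`npu` namespaces, and the format name are all hypothetical and chosen for illustration; the real implementation operates on torch.cuda and torch.npu.

```python
from types import SimpleNamespace

# Stand-ins for two device namespaces (assumption: real code would target
# torch.cuda and torch.npu; these objects only imitate the shape of the idea).
cuda = SimpleNamespace(device_count=lambda: 0, current_device=lambda: -1)
npu = SimpleNamespace(device_count=lambda: 8, current_device=lambda: 0)


def redirect_calls(src, dst, names):
    """Point each named attribute of `src` at the same-named attribute of `dst`.

    Names that `dst` does not provide are left untouched on `src`.
    """
    for name in names:
        if hasattr(dst, name):
            setattr(src, name, getattr(dst, name))


def make_format_cast(on_npu, cast_fn):
    """Build a helper that applies `cast_fn` only on NPU and is a no-op elsewhere.

    Callers can then invoke the helper unconditionally, without their own
    is_npu checks.
    """
    def format_cast(tensor, fmt):
        if not on_npu:
            return tensor
        return cast_fn(tensor, fmt)

    return format_cast


def fake_cast(tensor, fmt):
    """Hypothetical cast that just records the requested format."""
    return (tensor, fmt)


# After redirection, code written against `cuda` transparently reaches `npu`.
redirect_calls(cuda, npu, ["device_count", "current_device"])

cast_on_npu = make_format_cast(True, fake_cast)
cast_elsewhere = make_format_cast(False, fake_cast)
```

With this shape, model code calls the helper in one place and the hardware check lives inside it, which is the kind of simplified interface the motivation section asks for.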

Accuracy Tests

CI should cover all.

Benchmarking and Profiling

Checklist

Format your code according to the Format code with pre-commit.
Add unit tests according to the Run and add unit tests.
Update documentation according to Write documentations.
Provide accuracy and speed benchmark results according to Test the accuracy and Benchmark the speed.

sglang-bot · 2025-11-16T03:56:26Z

python/sglang/srt/_custom_ops.py

        logger.warning("Failed to import from custom_ar with %r", e)


-if not is_hip() and not is_npu():
+if not CUSTOM_ALLREDUCE_AVAILABLE:


help us clean the code

if is_cuda: ... elif is_hip: ... elif is_npu: ... else:
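The reviewer's suggestion and the diff above can be read together as two small patterns: derive an availability flag from whether an import succeeded, and select platform-specific code through one explicit branch chain rather than scattered negative checks. The sketch below illustrates both under stated assumptions; `probe_module` and `select_ops` are hypothetical names, and the returned strings are placeholders rather than anything defined in sglang.

```python
import importlib
import logging

logger = logging.getLogger(__name__)


def probe_module(module_name):
    """Return True if `module_name` imports cleanly, otherwise log and return False."""
    try:
        importlib.import_module(module_name)
        return True
    except ImportError as e:
        logger.warning("Failed to import %s with %r", module_name, e)
        return False


def select_ops(is_cuda, is_hip, is_npu):
    """Pick an implementation label with one explicit chain, in the order the reviewer wrote."""
    if is_cuda:
        return "cuda_ops"
    elif is_hip:
        return "hip_ops"
    elif is_npu:
        return "npu_ops"
    else:
        return "fallback_ops"


# A flag computed once at import time can replace repeated platform checks.
JSON_AVAILABLE = probe_module("json")
MISSING_AVAILABLE = probe_module("module_that_does_not_exist_for_this_sketch")
```

Keeping the chain exhaustive, with a final else, makes the behavior on an unlisted platform visible at the call site.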


iforgetmyname · 2025-12-02T11:49:53Z

/tag-and-rerun-ci

iforgetmyname · 2025-12-03T07:23:03Z

/tag-and-rerun-ci


github-actions bot added lora deepseek labels Nov 16, 2025

sglang-bot added the run-ci label Nov 16, 2025

sglang-bot approved these changes Nov 16, 2025

View reviewed changes

iforgetmyname mentioned this pull request Nov 25, 2025

[Ascend] qwen optimization #12078

Merged

4 tasks

iforgetmyname added 22 commits November 27, 2025 15:37

add set_default_server_args

b7208d4

add init_npu_backend

f1c806e

first remove of is_npu

3fac40f

NPUPagedTokenToKVPoolAllocator, NPUMHATokenToKVPool and NPUMLATokenTo…

4d416a1

…KVPool

fix missing import

f0e2a5a

second remove of is_npu

9de1ad3

refactor topk

db7675c

refactor ascend llm backend

b192c23

fix missing import

0e6e557

fix missing import

5bd1985

fix missing import

f043976

NPUW8A8LinearMethod & NPUW8A8DynamicLinearMethod

7d11eed

fix caller

03e2897

fix load warning and shape error

73cd2aa

fix warning msg typo

c572c17

renaming

472fad0

NPUW8A8Int8DynamicMoEMethod, NPUW4A8Int4DynamicMoEMethod, NPUW4A16Int…

e360b40

…4DynamicMoEMethod

fix import error

affcee9

fix import error

e72a47d

add modelslim

12edd98

refactor mla prepare&core

fccd5e6

fix import error

e01b705

iforgetmyname force-pushed the npu_refactor branch from e7d8c38 to a19ba11 Compare December 2, 2025 01:12

iforgetmyname marked this pull request as ready for review December 2, 2025 01:13

iforgetmyname requested a review from merrymercy as a code owner December 2, 2025 01:13

github-actions bot added the run-ci label Dec 2, 2025

change quant type and CODEOWNERS

bd9ec29

iforgetmyname requested a review from Kangyan-Zhou as a code owner December 2, 2025 13:28

fix typo

e434a34

iforgetmyname changed the title ~~[WIP] Ascend NPU refactor~~ [1/N] NPU basic functions refactor and new modelslim quant type Dec 2, 2025

iforgetmyname changed the title ~~[1/N] NPU basic functions refactor and new modelslim quant type~~ [NPU][1/N] NPU basic functions refactor and new modelslim quant type Dec 2, 2025

ping1jing2 self-assigned this Dec 2, 2025

iforgetmyname removed the run-ci label Dec 3, 2025

iforgetmyname added 3 commits December 3, 2025 12:44

fix deepseek_v2 lite accuracy

b66f31b

revert back topk.py change

6ae7e66

fix prefixcache start args

e4154c1

github-actions bot added the run-ci label Dec 3, 2025

iforgetmyname added 3 commits December 3, 2025 15:23

Merge branch 'main' into npu_refactor

84acb7f

fix unquantized mtp layer breaks mlapo

60a1482

Merge remote-tracking branch 'upstream/main' into npu_refactor

43f8286

hnyls2002 approved these changes Dec 4, 2025

View reviewed changes

iforgetmyname merged commit 894c0dc into sgl-project:main Dec 4, 2025
168 of 174 checks passed

iforgetmyname deleted the npu_refactor branch December 4, 2025 08:15

tom-jerr pushed a commit to tom-jerr/sglang that referenced this pull request Dec 4, 2025

[NPU][1/N] NPU basic functions refactor and new modelslim quant type (s…

3d24ee9

…gl-project#13359)

OrangeRedeng mentioned this pull request Dec 4, 2025

[Feature] Ascend NPU quantization refactoring & more quantization formats support #14424

Open

30 tasks

yingluosanqian pushed a commit to yingluosanqian/sglang that referenced this pull request Dec 4, 2025

[NPU][1/N] NPU basic functions refactor and new modelslim quant type (s…

bc62a2c

…gl-project#13359)

tonyluj pushed a commit to openanolis/sglang that referenced this pull request Dec 5, 2025

[NPU][1/N] NPU basic functions refactor and new modelslim quant type (s…

4046652

…gl-project#13359)

tonyluj pushed a commit to openanolis/sglang that referenced this pull request Dec 5, 2025

[NPU][1/N] NPU basic functions refactor and new modelslim quant type (s…

81fbd62

…gl-project#13359)

yuchengz816-bot pushed a commit to yuchengz816-bot/sglang that referenced this pull request Dec 8, 2025

[NPU][1/N] NPU basic functions refactor and new modelslim quant type (s…

02085a0

…gl-project#13359)

OrangeRedeng mentioned this pull request Dec 8, 2025

[NPU] NPU quantization refactoring & more quantization formats support #14504

Merged

17 tasks

Kevin-XiongC pushed a commit to novitalabs/sglang that referenced this pull request Dec 9, 2025

[NPU][1/N] NPU basic functions refactor and new modelslim quant type (s…

fda79cf

…gl-project#13359)

thxCode mentioned this pull request Dec 16, 2025

Failed to deploy Qwen3-8B-W8A8 with SGLang 0.5.6.post2 gpustack/gpustack#3886

Closed

iforgetmyname mentioned this pull request Jan 23, 2026

[Roadmap] Ascend NPU Development (2026 Q1) #13664

Open

28 tasks

iforgetmyname merged 40 commits into sgl-project:main from iforgetmyname:npu_refactor


4 participants
