[feat] npu support enable_torch_compile by XDaoHong · Pull Request #12371 · sgl-project/sglang

XDaoHong · 2025-10-30T04:01:24Z

Motivation

TorchAir (Torch Ascend Intermediate Representation) is an extension library that provides graph-mode capabilities for torch_npu, letting users run graph-mode inference on NPU with PyTorch and torch_npu. TorchAir exposes a torch.compile backend for NPU that interfaces with torch._dynamo. The features below optimize the torch FX graph and extend its capabilities.

Main Features:

Basic Features:

Enable NPU kernels that depend on host-value tiling operators (e.g., FIA) to support npugraph
Graph input copy optimization
Memory reuse across multi-graphs

FX Pass:

In-place optimization
Redundant operator elimination
NPU fused operator passes

Advanced Features:

Static shape kernel compilation
Multi-stream within single graphs
Compilation caching

Modifications

Rewrite the capture function
Encapsulate the kvcache input (the input must include all kvcache)
Pad the block table to the max length
Prepare the TorchAir inputs
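
Of the steps above, padding the block table is the easiest to illustrate in isolation. The following is a minimal sketch in plain Python, not the PR's implementation. It assumes a block table is a list of per-request lists of KV-cache block ids and that unused slots are filled with 0. The names `pad_block_table`, `max_blocks`, and `pad_id` are hypothetical and do not come from the PR.

```python
from typing import List


def pad_block_table(
    block_table: List[List[int]], max_blocks: int, pad_id: int = 0
) -> List[List[int]]:
    """Right-pad every row to max_blocks so all requests share one fixed shape.

    Hypothetical helper for illustration; pad_id=0 is an assumed filler value.
    """
    padded = []
    for row in block_table:
        if len(row) > max_blocks:
            raise ValueError(f"row has {len(row)} blocks, max is {max_blocks}")
        # Append filler ids until the row reaches the fixed width.
        padded.append(row + [pad_id] * (max_blocks - len(row)))
    return padded


# Three requests holding 2, 1, and 3 blocks, padded to a fixed width of 4.
table = [[3, 7], [5], [1, 2, 4]]
padded = pad_block_table(table, max_blocks=4)
print(padded)
```

A fixed-width table keeps the input shape constant across batches, which is generally what a captured graph needs in order to be replayed.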

The calling process is as follows.

Accuracy Tests

python3 few_shot_gsm8k.py --data-path "/path/to/model/test.jsonl.txt" --parallel 32 --num-questions 200

Accuracy: 0.865
Invalid: 0.000
Latency: 43.077 s
Output throughput: 795.877 token/s
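
As a rough sanity check on the figures above, the reported numbers can be turned back into counts. This sketch assumes that accuracy is the fraction of the 200 questions answered correctly and that output throughput is generated tokens divided by latency. Both are assumptions about how the script reports its results, not something stated in this PR.

```python
num_questions = 200
accuracy = 0.865
latency_s = 43.077
throughput_tok_s = 795.877

# Number of correctly answered questions implied by the accuracy.
correct = round(accuracy * num_questions)

# Total generated tokens implied by throughput * latency.
output_tokens = round(throughput_tok_s * latency_s)

# Average generated tokens per question.
tokens_per_question = output_tokens / num_questions

print(correct, output_tokens, round(tokens_per_question, 1))
```

Under those assumptions, the run corresponds to 173 correct answers out of 200 and about 34,284 generated tokens, roughly 171 tokens per question.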

Benchmarking and Profiling

Future roadmaps

In torch_npu 7.2.0, the reduce-overhead mode of the torchair backend will support torch.compile(model, dynamic=True). This mode will become the default in get_compile_backend(), enabling support for methods wrapped by the @torch.compile() decorator.
In torch_npu 7.3.0, the NPUGraph capture and replay currently integrated in the torchair backend will become optional. The torchair backend will then perform only optimizations such as FX passes and static kernel compilation, while NPUGraph capture and replay will be implemented independently. This design is closer to the implementation of CudaGraphRunner and decouples FX graph optimization from graph offloading.

Checklist

Format your code according to the Format code with pre-commit.
Add unit tests according to the Run and add unit tests.
Update documentation according to Write documentations.
Provide accuracy and speed benchmark results according to Test the accuracy and Benchmark the speed.

gemini-code-assist · 2025-10-30T04:01:27Z

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

ssshinigami · 2025-10-31T08:46:40Z

Could you please add a description and motivation, as well as accuracy and performance measurements?

iforgetmyname · 2025-11-05T12:39:20Z

python/sglang/srt/utils/common.py

-        for k, v in predefined_config.items():
-            setattr(compiler_config.experimental_config, k, v)
-
+        compiler_config.mode = "max-autotune" if mode is None else mode


Please add a comment "TODO(iforgetmyname): Change this default value once CANN version 8.3.RC1" to help me remember to change this default value to reduce-overhead.
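
A minimal sketch of how the requested comment could sit next to the default-mode line from the diff above. `make_compiler_config` is a hypothetical wrapper, and `SimpleNamespace` stands in for the real compiler config object so that the snippet runs without torch_npu. Only the assignment itself and the TODO text come from this thread.

```python
from types import SimpleNamespace
from typing import Optional


def make_compiler_config(mode: Optional[str] = None) -> SimpleNamespace:
    """Hypothetical wrapper; SimpleNamespace stands in for the real config."""
    compiler_config = SimpleNamespace()
    # TODO(iforgetmyname): Change this default value once CANN version 8.3.RC1
    compiler_config.mode = "max-autotune" if mode is None else mode
    return compiler_config


default_cfg = make_compiler_config()
explicit_cfg = make_compiler_config("reduce-overhead")
print(default_cfg.mode, explicit_cfg.mode)
```

An explicit `mode` always wins. The default applies only when the caller passes nothing, so changing it later touches one line.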

ZhengdQin · 2025-11-08T09:34:53Z

python/sglang/srt/layers/communicator.py

-                    not get_moe_a2a_backend().is_none()
-                    or should_use_flashinfer_cutlass_moe_fp4_allgather()
+                    (
+                        not get_moe_a2a_backend().is_none()


revert this part

eshoguli

General comment about the NPUGraphRunner updates: NPUGraphRunner inherits from CudaGraphRunner and reuses its capture path (ForwardBatch instantiation, initialization, and capturing functionality) and part of its replay pipeline. The NPUGraph Python type is used.

Your changes don't use any of that. You have a custom capture implementation and don't use the NPUGraph Python type for inference. As a result, you should implement a separate runner: TorchAirRunner or something similar.

Could you please explain, offline or online, why you need to use NpuGraphRunner? Thanks!


XDaoHong requested review from BBuf, Edwardf0t1, HaiShaw, Ying1123, ch-wan, hnyls2002, ispobock, kushanam, merrymercy, ping1jing2, xiezhq-hermann and zhyncs as code owners October 30, 2025 04:01

ping1jing2 marked this pull request as draft October 30, 2025 09:39

XDaoHong force-pushed the main branch from 326c3fc to ed4571a Compare October 31, 2025 03:35

XDaoHong force-pushed the main branch 4 times, most recently from 0069c10 to cd25a86 Compare November 3, 2025 02:06

iforgetmyname marked this pull request as ready for review November 5, 2025 03:12

iforgetmyname added the run-ci label Nov 5, 2025

iforgetmyname marked this pull request as draft November 5, 2025 06:47

iforgetmyname reviewed Nov 5, 2025


XDaoHong force-pushed the main branch 2 times, most recently from e14ec76 to 9f46b8a Compare November 7, 2025 04:07

ZhengdQin reviewed Nov 8, 2025


XDaoHong force-pushed the main branch from 9f46b8a to 34a8f39 Compare November 8, 2025 09:46

eshoguli suggested changes Nov 14, 2025


eshoguli mentioned this pull request Nov 20, 2025

[NPU]ACLGraph Compilation support and PassManager with AddRmsNorm & Quantize fuse. TorchAir compiler backend support. #11104

Open

4 tasks

ping1jing2 self-assigned this Dec 2, 2025

eshoguli mentioned this pull request Dec 17, 2025

[NPU] Piecewise Graph for decode with PassManager & fuses #15332

Draft

4 tasks

[Bugfix] fix npu get kv_item_lens in PD separation when use ASCEND_US…

1f3e1e1

…E_FIA Co-authored-by: ZhengdQin <zhengdqin@gmail.com>

XDaoHong force-pushed the main branch from 34a8f39 to 1f3e1e1 Compare December 26, 2025 01:44

github-actions bot added the npu label Dec 26, 2025

XDaoHong closed this Dec 26, 2025

iforgetmyname mentioned this pull request Jan 23, 2026

[Roadmap] Ascend NPU Development (2026 Q1) #13664

Open

28 tasks

XDaoHong wants to merge 1 commit into sgl-project:main from XDaoHong:main

XDaoHong commented Oct 30, 2025 • edited by ZhengdQin


6 participants
