
Conversation

@Makcum888e (Contributor) commented on Sep 25, 2025

Motivation

Using the torch_npu implementation of mrope, instead of the native torch implementation composed of many small operations, gives a significant performance increase.

Modifications

forward renamed to forward_native
forward_cpu and forward_cuda overridden to delegate to forward_native instead of falling back to the base-class implementations
forward_npu overridden with the torch_npu implementation (a sketch of the intended structure follows below)
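
Roughly, the resulting structure looks like the sketch below. This is illustrative only: the method names come from this PR description and the review comments, while the argument lists, the helper attributes, and the exact torch_npu.npu_mrope call signature are assumptions rather than code taken from the diff.

```python
import torch


class MRotaryEmbedding:
    # In SGLang this subclasses RotaryEmbedding; the base class and the
    # attributes used below (cos_sin_cache, head_size, mrope_section) are
    # assumed from context, not copied from the diff.

    def forward_native(self, positions: torch.Tensor, query: torch.Tensor,
                       key: torch.Tensor):
        # The old `forward`: composes the multimodal rotary embedding from many
        # small torch ops (per-section slicing, cos/sin gathers, concatenation).
        ...

    def forward_cpu(self, positions, query, key):
        # Keep the CPU path on the native implementation instead of inheriting
        # the base-class forward_cpu.
        return self.forward_native(positions, query, key)

    def forward_cuda(self, positions, query, key):
        # Same for CUDA: behaviour unchanged, just routed through forward_native.
        return self.forward_native(positions, query, key)

    def forward_npu(self, positions, query, key):
        # One fused Ascend kernel instead of many small ops. The exact
        # npu_mrope argument names and order here are an assumption.
        import torch_npu

        query, key = torch_npu.npu_mrope(
            positions,
            query.contiguous(),
            key.contiguous(),
            self.cos_sin_cache.contiguous(),
            self.head_size,
            mrope_section=self.mrope_section,
            rotary_mode="half",
        )
        return query, key
```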

Accuracy Tests

Tested on Qwen3-VL-4B-Instruct.

GSM8k (before → after):
Accuracy: 0.729 → 0.728
Invalid: 0.122 → 0.124
Latency: 216.923 s → 178.438 s
Output throughput: 961.137 token/s → 1155.518 token/s

MMMU (before → after):
Benchmark time: 221.995 s → 208.065 s
Overall accuracy: 0.499 → 0.496

Benchmarking and Profiling

bench_serving with 64 prompts, random input length 2048, random output length 2048 (before → after):

Mean TTFT (ms): 3667.45 → 3494.33
Mean ITL (ms): 49.65 → 43.61


@gemini-code-assist (Contributor) commented

Summary of Changes

Hello @Makcum888e, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces an optimization for the MRotaryEmbedding component by integrating the torch_npu implementation of mrope. This change aims to leverage specialized hardware acceleration on Ascend NPUs, leading to improved computational efficiency and reduced latency for AI workloads. The core logic has been refactored to accommodate this NPU-specific path while maintaining existing CPU and CUDA functionalities.

Highlights

  • Performance Optimization: The primary goal of this PR is to significantly improve the performance of MRotaryEmbedding by utilizing the torch_npu implementation of mrope, which is optimized for Ascend NPUs.
  • Code Refactoring: The existing forward method has been renamed to forward_native. New methods, forward_cpu and forward_cuda, have been introduced to explicitly call forward_native, ensuring compatibility while allowing forward_npu to be used from the base class for NPU-specific execution.
  • Accuracy and Performance Benchmarks: Accuracy tests on GSM8k and MMMU show minimal changes in accuracy (0.729 -> 0.728 for GSM8k, 0.499 -> 0.496 for MMMU), while demonstrating notable performance gains. GSM8k latency improved from 216.923s to 178.438s, and output throughput increased from 961.137 token/s to 1155.518 token/s. MMMU benchmark time decreased from 221.995s to 208.065s.
  • Serving Benchmarks: Benchmarking with bench_serving (64 prompts, random input/output length 2048) shows improvements in Mean TTFT (3667.45ms -> 3494.33ms) and Mean ITL (49.65ms -> 43.61ms).

@gemini-code-assist bot left a comment

Code Review

This pull request aims to optimize MRotaryEmbedding for NPU by using torch_npu.npu_mrope. The changes involve renaming forward to forward_native and adding device-specific forward_cpu and forward_cuda methods. However, I've found some critical issues. The new forward_cpu and forward_cuda methods have incorrect signatures, which will lead to runtime errors. More importantly, the NPU implementation for MRotaryEmbedding seems to be missing. It inherits forward_npu from the base class, which is incorrect for MRotaryEmbedding and will not work as intended. I've provided suggestions to fix these issues.

@ssshinigami left a comment

LGTM


@hnyls2002 merged commit c2e56da into sgl-project:main on Nov 12, 2025
71 of 84 checks passed