Fix dpsk-r1-fp4 launching crash #12063

Merged
hnyls2002 merged 1 commit into sgl-project:main from Qiaolin-Yu:fix_moe
Oct 24, 2025

@Qiaolin-Yu
Collaborator

Motivation

Closes #12059, which was introduced by #11795.

This PR partially reverts the changes from #11795 as a quick fix. The backend selection may be refined in the future.

Modifications

Accuracy Tests

Benchmarking and Profiling

Checklist

@gemini-code-assist
Contributor

Summary of Changes

Hello @Qiaolin-Yu, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses a critical launching crash identified in dpsk-r1-fp4 configurations. The fix involves streamlining the selection mechanism for the Mixture of Experts (MoE) runner backend by removing an automatic detection step. Instead, the backend is now explicitly chosen based on a configuration flag, effectively stabilizing the system and preventing the crash.

Highlights

  • Crash Fix: This pull request resolves a launching crash specifically affecting dpsk-r1-fp4 models, which was introduced by a previous change (PR #11795, "Refactor Triton-kernel MoE runner integration").
  • MoE Runner Backend Selection: The dynamic auto-detection logic for the Mixture of Experts (MoE) runner backend has been removed. The backend is now directly assigned to either TRITON_KERNELS or TRITON based on the self.use_triton_kernels flag.
  • Partial Revert: To provide a quick fix for the crash, this PR partially reverts some changes that were part of PR #11795 ("Refactor Triton-kernel MoE runner integration"), simplifying the backend initialization process.
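
The flag-based selection described in the highlights can be sketched as a small standalone function. This is only an illustration, not the sglang source: the enum below is a stand-in that contains just the two members named in this PR (its string values are assumptions), and `select_backend` is a hypothetical helper.

```python
from enum import Enum


class MoeRunnerBackend(Enum):
    # Stand-in for the real enum; only the two members named in this PR.
    # The string values are assumptions made for this sketch.
    TRITON = "triton"
    TRITON_KERNELS = "triton_kernels"


def select_backend(use_triton_kernels: bool) -> MoeRunnerBackend:
    """Hypothetical helper mirroring the flag-based selection in this PR."""
    return (
        MoeRunnerBackend.TRITON_KERNELS
        if use_triton_kernels
        else MoeRunnerBackend.TRITON
    )


if __name__ == "__main__":
    print(select_backend(True).name)
    print(select_backend(False).name)
```

Because the choice depends on a single boolean, the result is always one of the two backends the method is known to handle, which is what makes this a safe stopgap.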

@JustinTong0323 added the express-lane label (a PR may be merged without a full CI check) on Oct 24, 2025

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request fixes a crash in dpsk-r1-fp4 launching by partially reverting a previous change. The modification hardcodes the MoE runner backend selection in Mxfp4MoEMethod and UnquantizedFusedMoEMethod to either TRITON or TRITON_KERNELS. This prevents a NotImplementedError when an unsupported backend is configured.

While this is an effective quick fix as intended, I've suggested adding comments to the code to document this temporary solution and link to the relevant issue. This will help future developers understand the context and the need for a more robust refactoring.
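
The failure mode mentioned in the review can be illustrated with a toy model. The removed auto-detection logic is not shown in this conversation, so everything here is an assumption for illustration: `create_runner`, `runner_from_configured_backend`, `runner_from_flag`, and the `OTHER` enum member are hypothetical names, not sglang APIs.

```python
from enum import Enum


class MoeRunnerBackend(Enum):
    # Stand-in enum. OTHER is a hypothetical placeholder for any backend
    # that the quantization method has no runner for.
    TRITON = "triton"
    TRITON_KERNELS = "triton_kernels"
    OTHER = "other"


SUPPORTED = {MoeRunnerBackend.TRITON, MoeRunnerBackend.TRITON_KERNELS}


def create_runner(backend: MoeRunnerBackend) -> str:
    """Hypothetical factory that rejects backends it cannot serve."""
    if backend not in SUPPORTED:
        raise NotImplementedError(f"Unsupported runner backend: {backend}")
    return f"runner[{backend.name}]"


def runner_from_configured_backend(configured: MoeRunnerBackend) -> str:
    # Assumed pre-fix shape: whatever backend was configured or detected
    # is passed straight through, so an unsupported one raises at launch.
    return create_runner(configured)


def runner_from_flag(use_triton_kernels: bool) -> str:
    # Post-fix shape: only the two supported backends can ever be chosen.
    backend = (
        MoeRunnerBackend.TRITON_KERNELS
        if use_triton_kernels
        else MoeRunnerBackend.TRITON
    )
    return create_runner(backend)
```

In this model, `runner_from_configured_backend(MoeRunnerBackend.OTHER)` raises `NotImplementedError`, while `runner_from_flag` cannot reach that branch for either value of the flag.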

Comment on lines +590 to +594
backend = (
    MoeRunnerBackend.TRITON_KERNELS
    if self.use_triton_kernels
    else MoeRunnerBackend.TRITON
)

medium

This hardcodes the backend selection to avoid a crash with unsupported backends. As noted in the PR description, this is a temporary fix. It would be beneficial to add a comment here to inform future developers about this, and that it should be refactored for more flexible backend selection.

Suggested change
- backend = (
-     MoeRunnerBackend.TRITON_KERNELS
-     if self.use_triton_kernels
-     else MoeRunnerBackend.TRITON
- )
+ # TODO(https://github.com/sgl-project/sglang/issues/12059): This is a temporary fix to avoid a crash with unsupported backends.
+ # This should be refactored for more flexible backend selection in the future.
+ backend = (
+     MoeRunnerBackend.TRITON_KERNELS
+     if self.use_triton_kernels
+     else MoeRunnerBackend.TRITON
+ )

Comment on lines +223 to +227
backend = (
    MoeRunnerBackend.TRITON_KERNELS
    if self.use_triton_kernels
    else MoeRunnerBackend.TRITON
)

medium

This hardcodes the backend selection to avoid a crash with unsupported backends. As noted in the PR description, this is a temporary fix. It would be beneficial to add a comment here to inform future developers about this, and that it should be refactored for more flexible backend selection.

Suggested change
- backend = (
-     MoeRunnerBackend.TRITON_KERNELS
-     if self.use_triton_kernels
-     else MoeRunnerBackend.TRITON
- )
+ # TODO(https://github.com/sgl-project/sglang/issues/12059): This is a temporary fix to avoid a crash with unsupported backends.
+ # This should be refactored for more flexible backend selection in the future.
+ backend = (
+     MoeRunnerBackend.TRITON_KERNELS
+     if self.use_triton_kernels
+     else MoeRunnerBackend.TRITON
+ )

@FlamingoPg
Collaborator

Looks good!

@hnyls2002 hnyls2002 merged commit 71d4121 into sgl-project:main Oct 24, 2025
19 of 159 checks passed

Labels

express-lane (a PR may be merged without a full CI check), high priority, run-ci

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug] DeepSeek FP4 Launching fail

8 participants
