fix: add server args check for moe_dense_tp_size#6663
UnlceYang wants to merge 1 commit into sgl-project:main
Conversation
Hello @UnlceYang, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
Summary of Changes
Hello team, Gemini here with a summary of this pull request. This PR, authored by UnlceYang, aims to address a potential issue in the server configuration related to Mixture-of-Experts (MoE) models. Specifically, it fixes a bug where setting moe_dense_tp_size to 1 without also enabling enable_dp_lm_head could lead to invalid memory access, particularly in large batch request scenarios, even if the server starts successfully. The proposed solution is to automatically enable enable_dp_lm_head whenever moe_dense_tp_size is set to 1, preventing this configuration error and the subsequent runtime issue.
Highlights
- Server Configuration Safety: Improves the robustness of server argument parsing by automatically enforcing a required configuration (`enable_dp_lm_head=True`) when a specific setting (`moe_dense_tp_size=1`) is used.
Changelog
- python/sglang/srt/server_args.py
  - Added a check within the `__post_init__` method of the server arguments class.
  - If `self.moe_dense_tp_size` is found to be 1, `self.enable_dp_lm_head` is automatically set to `True`.
  - A warning message is logged to inform the user that `enable_dp_lm_head` was automatically enabled due to the `moe_dense_tp_size` setting.
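A minimal sketch of what such a `__post_init__` check could look like. This is not the actual patch from the PR; the field names follow the summary above, while the stripped-down `ServerArgs` dataclass and the warning text are assumptions for illustration:

```python
import logging
from dataclasses import dataclass
from typing import Optional

logger = logging.getLogger(__name__)


@dataclass
class ServerArgs:
    # Stand-in for sglang's real ServerArgs; only the two fields
    # touched by this PR are modeled here.
    moe_dense_tp_size: Optional[int] = None
    enable_dp_lm_head: bool = False

    def __post_init__(self):
        # If the dense-layer TP size is 1, the LM head must run
        # data-parallel; otherwise large-batch requests can hit an
        # invalid memory access even though the server starts fine.
        if self.moe_dense_tp_size == 1 and not self.enable_dp_lm_head:
            self.enable_dp_lm_head = True
            logger.warning(
                "enable_dp_lm_head is automatically enabled because "
                "moe_dense_tp_size is set to 1."
            )
```

With this check in place, `ServerArgs(moe_dense_tp_size=1)` yields `enable_dp_lm_head=True` without any explicit flag, while other TP sizes leave the flag untouched.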
Config files can hide,
A bug that makes the server chide.
Check the size is one,
Let DP run,
Memory access now won't slide.
Code Review
This pull request addresses a potential invalid memory access issue when moe_dense_tp_size is 1 by automatically enabling enable_dp_lm_head. This is a good defensive measure. The core logic is sound and effectively prevents the original problem by either configuring correctly or failing fast with an assertion if related flags are inconsistent.
I have a couple of suggestions to improve clarity and robustness:
- Enhance the warning message to be more informative about why the flag is being set and its dependency on `enable_dp_attention`.
- Add unit tests for this new conditional logic in `ServerArgs`.
Summary of Findings
- Clarity of Warning Message: The warning message issued when `enable_dp_lm_head` is automatically enabled could be more explicit about the reason for the change, the fact that it might override user settings, and its crucial dependency on `enable_dp_attention`.
- Unit Testing: The new conditional logic in `ServerArgs.__post_init__` that automatically sets `enable_dp_lm_head` based on `moe_dense_tp_size` would benefit from dedicated unit tests to cover various input combinations and ensure the intended behavior, including interactions with `enable_dp_attention`.
- Documentation Update (Low Severity - Not Commented): Consider updating the CLI help strings or other relevant documentation for `moe_dense_tp_size` and/or `enable_dp_lm_head` to inform users about this new automatic behavior. This was not added as a direct review comment due to its low severity and the current review settings.
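The suggested unit tests could look roughly like the following. The `ServerArgs` stand-in below is an assumption modeled on the PR summary (only the two relevant fields, with the auto-enable logic in `__post_init__`), not the real sglang class; the test names are illustrative:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ServerArgs:
    # Hypothetical minimal version of sglang's ServerArgs for testing
    # the auto-enable behavior described in this PR.
    moe_dense_tp_size: Optional[int] = None
    enable_dp_lm_head: bool = False

    def __post_init__(self):
        if self.moe_dense_tp_size == 1 and not self.enable_dp_lm_head:
            self.enable_dp_lm_head = True


def test_auto_enables_dp_lm_head():
    # moe_dense_tp_size=1 should force the flag on.
    assert ServerArgs(moe_dense_tp_size=1).enable_dp_lm_head is True


def test_explicit_flag_is_preserved():
    # A user who already enabled the flag sees no change.
    args = ServerArgs(moe_dense_tp_size=1, enable_dp_lm_head=True)
    assert args.enable_dp_lm_head is True


def test_other_tp_sizes_unchanged():
    # Any other (or unset) TP size leaves the flag at its default.
    assert ServerArgs(moe_dense_tp_size=2).enable_dp_lm_head is False
    assert ServerArgs().enable_dp_lm_head is False
```

These cases cover the combinations the review asks about; a fuller suite would also exercise the interaction with `enable_dp_attention`, which is not modeled here.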
Merge Readiness
The pull request effectively addresses the reported issue. However, to improve clarity for users and ensure long-term maintainability, I recommend addressing the suggested improvements to the warning message and adding unit tests for the new logic. Therefore, I am requesting changes before this PR is merged. As an AI, I am not authorized to approve pull requests; please ensure further review and approval from team members.
@UnlceYang Thank you for your contribution. This design is intentional. We need to allow our users to control how to parallelize lm heads. See discussion here.
…//issues/6644)
Motivation
When moe-dense-tp-size = 1, users sometimes forget to turn enable-dp-lm-head on. This causes an invalid memory access in large-batch request scenarios, even though the sglang server launches successfully.
Modifications
When moe-dense-tp-size is set to 1, enable the enable-dp-lm-head option by default to avoid the invalid memory access.
Checklist