
fix: add server args check for moe_dense_tp_size#6663

Open
UnlceYang wants to merge 1 commit into sgl-project:main from UnlceYang:yzd/fix-6644

Conversation


@UnlceYang UnlceYang commented May 27, 2025


Motivation

When moe-dense-tp-size = 1, users sometimes forget to turn on enable-dp-lm-head. This causes an invalid memory access in large-batch request scenarios, even though the sglang server launches successfully.

Modifications

When moe-dense-tp-size is set to 1, enable the enable-dp-lm-head option by default to avoid the invalid memory access.
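
The modification can be sketched as follows. The reduced ServerArgs below is a hypothetical stand-in for the full class in python/sglang/srt/server_args.py, keeping only the two fields involved in this change:

```python
import logging
from dataclasses import dataclass
from typing import Optional

logger = logging.getLogger(__name__)

@dataclass
class ServerArgs:
    # Hypothetical reduced version of sglang's ServerArgs, keeping only
    # the two fields relevant to this change.
    moe_dense_tp_size: Optional[int] = None
    enable_dp_lm_head: bool = False

    def __post_init__(self):
        # If the MoE dense layers run with a tensor-parallel size of 1,
        # force the data-parallel LM head on so that large-batch requests
        # do not hit the invalid memory access described above.
        if self.moe_dense_tp_size == 1 and not self.enable_dp_lm_head:
            self.enable_dp_lm_head = True
            logger.warning(
                "enable_dp_lm_head is enabled automatically because "
                "moe_dense_tp_size is set to 1."
            )
```

With this check in place, `ServerArgs(moe_dense_tp_size=1)` comes up with `enable_dp_lm_head=True` and logs a warning, while other values leave the flag untouched.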

Checklist

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Hello @UnlceYang, I'm Gemini Code Assist[1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

Summary of Changes

Hello team, Gemini here with a summary of this pull request. This PR, authored by UnlceYang, aims to address a potential issue in the server configuration related to Mixture-of-Experts (MoE) models. Specifically, it fixes a bug where setting moe_dense_tp_size to 1 without also enabling enable_dp_lm_head could lead to invalid memory access, particularly in large batch request scenarios, even if the server starts successfully. The proposed solution is to automatically enable enable_dp_lm_head whenever moe_dense_tp_size is set to 1, preventing this configuration error and the subsequent runtime issue.

Highlights

  • Server Configuration Safety: Improves the robustness of server argument parsing by automatically enforcing a required configuration (enable_dp_lm_head=True) when a specific setting (moe_dense_tp_size=1) is used.

Changelog

  • python/sglang/srt/server_args.py
    • Added a check within the __post_init__ method of the server arguments class.
    • If self.moe_dense_tp_size is found to be 1, self.enable_dp_lm_head is automatically set to True.
    • A warning message is logged to inform the user that enable_dp_lm_head was automatically enabled due to the moe_dense_tp_size setting.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point in your pull request by creating an issue comment (i.e., a comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.

  • Code Review (/gemini review): Performs a code review for the current pull request in its current state.
  • Pull Request Summary (/gemini summary): Provides a summary of the current pull request in its current state.
  • Comment (@gemini-code-assist): Responds in comments when explicitly tagged, both in issue comments and review comments.
  • Help (/gemini help): Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist is currently in preview and may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments to provide feedback.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.


Config files can hide,
A bug that makes the server chide.
Check the size is one,
Let DP run,
Memory access now won't slide.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request addresses a potential invalid memory access issue when moe_dense_tp_size is 1 by automatically enabling enable_dp_lm_head. This is a good defensive measure. The core logic is sound and effectively prevents the original problem by either configuring correctly or failing fast with an assertion if related flags are inconsistent.

I have a couple of suggestions to improve clarity and robustness:

  1. Enhance the warning message to be more informative about why the flag is being set and its dependency on enable_dp_attention.
  2. Add unit tests for this new conditional logic in ServerArgs.

Summary of Findings

  • Clarity of Warning Message: The warning message issued when enable_dp_lm_head is automatically enabled could be more explicit about the reason for the change, the fact that it might override user settings, and its crucial dependency on enable_dp_attention.
  • Unit Testing: The new conditional logic in ServerArgs.__post_init__ that automatically sets enable_dp_lm_head based on moe_dense_tp_size would benefit from dedicated unit tests to cover various input combinations and ensure the intended behavior, including interactions with enable_dp_attention.
  • Documentation Update (Low Severity - Not Commented): Consider updating the CLI help strings or other relevant documentation for moe_dense_tp_size and/or enable_dp_lm_head to inform users about this new automatic behavior. This was not added as a direct review comment due to its low severity and the current review settings.
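
The suggested unit tests could look like the sketch below. The reduced ServerArgs is a hypothetical stand-in for the real class in python/sglang/srt/server_args.py, and the test names are illustrative:

```python
import unittest
from dataclasses import dataclass
from typing import Optional

@dataclass
class ServerArgs:
    # Hypothetical stand-in for sglang's ServerArgs, reduced to the two
    # fields this PR touches.
    moe_dense_tp_size: Optional[int] = None
    enable_dp_lm_head: bool = False

    def __post_init__(self):
        # The behavior proposed by this PR.
        if self.moe_dense_tp_size == 1:
            self.enable_dp_lm_head = True

class TestMoeDenseTpSizeCheck(unittest.TestCase):
    def test_tp_size_one_forces_dp_lm_head(self):
        self.assertTrue(ServerArgs(moe_dense_tp_size=1).enable_dp_lm_head)

    def test_other_tp_sizes_leave_flag_alone(self):
        self.assertFalse(ServerArgs(moe_dense_tp_size=2).enable_dp_lm_head)
        self.assertFalse(ServerArgs().enable_dp_lm_head)

    def test_explicit_enable_is_preserved(self):
        self.assertTrue(
            ServerArgs(moe_dense_tp_size=2, enable_dp_lm_head=True).enable_dp_lm_head
        )
```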

Merge Readiness

The pull request effectively addresses the reported issue. However, to improve clarity for users and ensure long-term maintainability, I recommend addressing the suggested improvements to the warning message and adding unit tests for the new logic. Therefore, I am requesting changes before this PR is merged. As an AI, I am not authorized to approve pull requests; please ensure further review and approval from team members.

@UnlceYang UnlceYang changed the title from "fix: add server args check for moe_dense_tp_size (#6644)" to "fix: add server args check for moe_dense_tp_size" May 27, 2025
@ch-wan
Collaborator

ch-wan commented May 27, 2025

@UnlceYang Thank you for your contribution. This design is intentional. We need to allow our users to control how to parallelize LM heads. See discussion here.
