fix: add server args check for moe_dense_tp_size#6663
UnlceYang wants to merge 1 commit into sgl-project:main
Conversation
Hello @UnlceYang, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
Summary of Changes
Hello team, Gemini here with a summary of this pull request. This PR, authored by UnlceYang, aims to address a potential issue in the server configuration related to Mixture-of-Experts (MoE) models. Specifically, it fixes a bug where setting moe_dense_tp_size to 1 without also enabling enable_dp_lm_head could lead to invalid memory access, particularly in large batch request scenarios, even if the server starts successfully. The proposed solution is to automatically enable enable_dp_lm_head whenever moe_dense_tp_size is set to 1, preventing this configuration error and the subsequent runtime issue.
Highlights
- Server Configuration Safety: Improves the robustness of server argument parsing by automatically enforcing a required configuration (`enable_dp_lm_head=True`) when a specific setting (`moe_dense_tp_size=1`) is used.
Changelog
- python/sglang/srt/server_args.py
  - Added a check within the `__post_init__` method of the server arguments class.
  - If `self.moe_dense_tp_size` is found to be 1, `self.enable_dp_lm_head` is automatically set to `True`.
  - A warning message is logged to inform the user that `enable_dp_lm_head` was automatically enabled due to the `moe_dense_tp_size` setting.
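A minimal sketch of what such a `__post_init__` check could look like. This is not the actual patch from the PR; the field names follow the summary above, while the stripped-down `ServerArgs` dataclass and the warning text are assumptions for illustration:

```python
import logging
from dataclasses import dataclass
from typing import Optional

logger = logging.getLogger(__name__)


@dataclass
class ServerArgs:
    # Stand-in for sglang's real ServerArgs; only the two fields
    # touched by this PR are modeled here.
    moe_dense_tp_size: Optional[int] = None
    enable_dp_lm_head: bool = False

    def __post_init__(self):
        # If the dense-layer TP size is 1, the LM head must run
        # data-parallel; otherwise large-batch requests can hit an
        # invalid memory access even though the server starts fine.
        if self.moe_dense_tp_size == 1 and not self.enable_dp_lm_head:
            self.enable_dp_lm_head = True
            logger.warning(
                "enable_dp_lm_head is automatically enabled because "
                "moe_dense_tp_size is set to 1."
            )
```

With this check in place, `ServerArgs(moe_dense_tp_size=1)` yields `enable_dp_lm_head=True` without any explicit flag, while other TP sizes leave the flag untouched.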
Config files can hide,
A bug that makes the server chide.
Check the size is one,
Let DP run,
Memory access now won't slide.
Code Review
This pull request addresses a potential invalid memory access issue when moe_dense_tp_size is 1 by automatically enabling enable_dp_lm_head. This is a good defensive measure. The core logic is sound and effectively prevents the original problem by either configuring correctly or failing fast with an assertion if related flags are inconsistent.
I have a couple of suggestions to improve clarity and robustness:
- Enhance the warning message to be more informative about why the flag is being set and its dependency on `enable_dp_attention`.
- Add unit tests for this new conditional logic in `ServerArgs`.
Summary of Findings
- Clarity of Warning Message: The warning message issued when `enable_dp_lm_head` is automatically enabled could be more explicit about the reason for the change, the fact that it might override user settings, and its crucial dependency on `enable_dp_attention`.
- Unit Testing: The new conditional logic in `ServerArgs.__post_init__` that automatically sets `enable_dp_lm_head` based on `moe_dense_tp_size` would benefit from dedicated unit tests to cover various input combinations and ensure the intended behavior, including interactions with `enable_dp_attention`.
- Documentation Update (Low Severity - Not Commented): Consider updating the CLI help strings or other relevant documentation for `moe_dense_tp_size` and/or `enable_dp_lm_head` to inform users about this new automatic behavior. This was not added as a direct review comment due to its low severity and the current review settings.
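The suggested unit tests could look roughly like the following. The `ServerArgs` stand-in below is an assumption modeled on the PR summary (only the two relevant fields, with the auto-enable logic in `__post_init__`), not the real sglang class; the test names are illustrative:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ServerArgs:
    # Hypothetical minimal version of sglang's ServerArgs for testing
    # the auto-enable behavior described in this PR.
    moe_dense_tp_size: Optional[int] = None
    enable_dp_lm_head: bool = False

    def __post_init__(self):
        if self.moe_dense_tp_size == 1 and not self.enable_dp_lm_head:
            self.enable_dp_lm_head = True


def test_auto_enables_dp_lm_head():
    # moe_dense_tp_size=1 should force the flag on.
    assert ServerArgs(moe_dense_tp_size=1).enable_dp_lm_head is True


def test_explicit_flag_is_preserved():
    # A user who already enabled the flag sees no change.
    args = ServerArgs(moe_dense_tp_size=1, enable_dp_lm_head=True)
    assert args.enable_dp_lm_head is True


def test_other_tp_sizes_unchanged():
    # Any other (or unset) TP size leaves the flag at its default.
    assert ServerArgs(moe_dense_tp_size=2).enable_dp_lm_head is False
    assert ServerArgs().enable_dp_lm_head is False
```

These cases cover the combinations the review asks about; a fuller suite would also exercise the interaction with `enable_dp_attention`, which is not modeled here.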
Merge Readiness
The pull request effectively addresses the reported issue. However, to improve clarity for users and ensure long-term maintainability, I recommend addressing the suggested improvements to the warning message and adding unit tests for the new logic. Therefore, I am requesting changes before this PR is merged. As an AI, I am not authorized to approve pull requests; please ensure further review and approval from team members.
@UnlceYang Thank you for your contribution. This design is intentional. We need to allow our users to control how to parallelize lm heads. See discussion here.
…//issues/6644)
Motivation
When moe-dense-tp-size = 1, users sometimes forget to turn enable-dp-lm-head on. This causes an invalid memory access in large-batch request scenarios, even though the sglang server launches successfully.
Modifications
When moe-dense-tp-size is set to 1, enable the enable-dp-lm-head option by default to avoid the invalid memory access.
Checklist