Skip to content

Conversation

@stmatengss
Copy link
Collaborator

Motivation

Support user-defined per-TP IB devicecs config.

"device_name": "{0: "ib0,ib1", 1: "ib2", 2: "ib3,ib4"}"

Modifications

Accuracy Tests

Benchmarking and Profiling

Checklist

@github-actions github-actions bot added the documentation Improvements or additions to documentation label Nov 12, 2025
@gemini-code-assist
Copy link
Contributor

Summary of Changes

Hello @stmatengss, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request enhances the Mooncake storage backend by providing a more flexible and granular way to configure InfiniBand devices, especially in tensor parallel deployments. It allows users to explicitly define which IB devices should be used by each tensor parallel rank through a JSON configuration, improving resource management and potentially optimizing data transfer performance in distributed environments. The changes include both the implementation of the parsing logic and comprehensive documentation for the new feature.

Highlights

  • Per-TP IB Device Configuration: Introduced support for user-defined per-Tensor Parallel (TP) InfiniBand (IB) device configurations, allowing specific IB devices to be mapped to individual TP ranks using a JSON format.
  • Dynamic Device Name Resolution: Implemented logic in mooncake_store.py to parse the JSON device_name configuration and dynamically select the appropriate IB device(s) based on the current tp_rank.
  • Documentation Update: Updated the README.md to include detailed instructions and examples for configuring device_name using the new JSON format, both directly and via environment variables.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request adds support for user-defined, per-tensor-parallel-rank InfiniBand device configurations for the Mooncake storage backend. This is achieved by allowing the device_name parameter to be a JSON string that maps ranks to device names. The implementation correctly parses this configuration to select the appropriate device for the current rank, and the documentation has been updated accordingly.

My review includes a correction for an invalid JSON example in the README and a suggestion to simplify the device lookup logic in the Python code to remove a redundant check. Overall, the changes are well-implemented and introduce a valuable feature.

- For tensor parallel deployments where different ranks should use different devices, you can specify device configurations using JSON format:
```json
{
"device_name": "{0: \"ib0,ib1\", 1: \"ib2,ib3\", 2: \"ib4,ib5\"}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The JSON example for device_name is invalid. The value for device_name should be a string containing valid JSON, which requires keys to be double-quoted strings. The current example is a string representation of a Python dictionary and will fail to parse with json.loads.

For consistency with the environment variable example, please update the JSON example.

Suggested change
"device_name": "{0: \"ib0,ib1\", 1: \"ib2,ib3\", 2: \"ib4,ib5\"}"
"device_name": "{\"0\": \"ib0,ib1\", \"1\": \"ib2,ib3\", \"2\": \"ib4,ib5\"}"

Comment on lines +201 to +203
device_name = device_config.get(tp_rank, "")
if not device_name:
device_name = device_config.get(str(tp_rank), "")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The json.loads function always produces a dictionary with string keys. Therefore, looking up with an integer tp_rank will always fail. You can simplify this logic to only use str(tp_rank) for the lookup.

Additionally, the comment on line 200 is misleading and should be updated or removed.

Suggested change
device_name = device_config.get(tp_rank, "")
if not device_name:
device_name = device_config.get(str(tp_rank), "")
device_name = device_config.get(str(tp_rank), "")

@stmatengss stmatengss added the hicache Hierarchical Caching for SGLang label Nov 12, 2025
@huangtingwei9988
Copy link
Collaborator

cc @ykwd

@ykwd
Copy link
Contributor

ykwd commented Nov 13, 2025

cc @ykwd

Thanks

@stmatengss stmatengss added the ready-to-merge The PR is ready to merge after the CI is green. label Nov 13, 2025
@ShangmingCai ShangmingCai merged commit 290fcd8 into sgl-project:main Nov 17, 2025
307 of 360 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

documentation Improvements or additions to documentation hicache Hierarchical Caching for SGLang ready-to-merge The PR is ready to merge after the CI is green. run-ci

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants