[HiCache] add GPU id to IB dev topo for mooncake storage backend #13112

stmatengss · 2025-11-12T03:13:33Z

Motivation

Support user-defined per-TP IB devicecs config.

"device_name": "{0: "ib0,ib1", 1: "ib2", 2: "ib3,ib4"}"

Modifications

Accuracy Tests

Benchmarking and Profiling

Checklist

Format your code according to the Format code with pre-commit.
Add unit tests according to the Run and add unit tests.
Update documentation according to Write documentations.
Provide accuracy and speed benchmark results according to Test the accuracy and Benchmark the speed.

gemini-code-assist · 2025-11-12T03:13:46Z

Summary of Changes

Hello @stmatengss, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request enhances the Mooncake storage backend by providing a more flexible and granular way to configure InfiniBand devices, especially in tensor parallel deployments. It allows users to explicitly define which IB devices should be used by each tensor parallel rank through a JSON configuration, improving resource management and potentially optimizing data transfer performance in distributed environments. The changes include both the implementation of the parsing logic and comprehensive documentation for the new feature.

Highlights

Per-TP IB Device Configuration: Introduced support for user-defined per-Tensor Parallel (TP) InfiniBand (IB) device configurations, allowing specific IB devices to be mapped to individual TP ranks using a JSON format.
Dynamic Device Name Resolution: Implemented logic in mooncake_store.py to parse the JSON device_name configuration and dynamically select the appropriate IB device(s) based on the current tp_rank.
Documentation Update: Updated the README.md to include detailed instructions and examples for configuring device_name using the new JSON format, both directly and via environment variables.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature	Command	Description
Code Review	`/gemini review`	Performs a code review for the current pull request in its current state.
Pull Request Summary	`/gemini summary`	Provides a summary of the current pull request in its current state.
Comment	@gemini-code-assist	Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help	`/gemini help`	Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩

gemini-code-assist

Code Review

This pull request adds support for user-defined, per-tensor-parallel-rank InfiniBand device configurations for the Mooncake storage backend. This is achieved by allowing the device_name parameter to be a JSON string that maps ranks to device names. The implementation correctly parses this configuration to select the appropriate device for the current rank, and the documentation has been updated accordingly.

My review includes a correction for an invalid JSON example in the README and a suggestion to simplify the device lookup logic in the Python code to remove a redundant check. Overall, the changes are well-implemented and introduce a valuable feature.

gemini-code-assist · 2025-11-12T03:15:10Z

python/sglang/srt/mem_cache/storage/mooncake_store/README.md

+  - For tensor parallel deployments where different ranks should use different devices, you can specify device configurations using JSON format:
+    ```json
+    {
+    "device_name": "{0: \"ib0,ib1\", 1: \"ib2,ib3\", 2: \"ib4,ib5\"}"


The JSON example for device_name is invalid. The value for device_name should be a string containing valid JSON, which requires keys to be double-quoted strings. The current example is a string representation of a Python dictionary and will fail to parse with json.loads.

For consistency with the environment variable example, please update the JSON example.

Suggested change

"device_name": "{0: \"ib0,ib1\", 1: \"ib2,ib3\", 2: \"ib4,ib5\"}"

"device_name": "{\"0\": \"ib0,ib1\", \"1\": \"ib2,ib3\", \"2\": \"ib4,ib5\"}"

gemini-code-assist · 2025-11-12T03:15:10Z

python/sglang/srt/mem_cache/storage/mooncake_store/mooncake_store.py

+                        device_name = device_config.get(tp_rank, "")
+                        if not device_name:
+                            device_name = device_config.get(str(tp_rank), "")


The json.loads function always produces a dictionary with string keys. Therefore, looking up with an integer tp_rank will always fail. You can simplify this logic to only use str(tp_rank) for the lookup.

Additionally, the comment on line 200 is misleading and should be updated or removed.

Suggested change

device_name = device_config.get(tp_rank, "")

if not device_name:

device_name = device_config.get(str(tp_rank), "")

device_name = device_config.get(str(tp_rank), "")

huangtingwei9988 · 2025-11-13T05:39:57Z

cc @ykwd

ykwd · 2025-11-13T06:02:33Z

cc @ykwd

Thanks

stmatengss added 3 commits November 12, 2025 10:55

[HiCache] add GPU id to IB dev topo for Mooncake

92aa4ec

update doc

4f9cebc

update doc

62f6f45

stmatengss requested review from Ying1123, hnyls2002, merrymercy and xiezhq-hermann as code owners November 12, 2025 03:13

stmatengss requested a review from huangtingwei9988 November 12, 2025 03:13

github-actions bot added the documentation Improvements or additions to documentation label Nov 12, 2025

gemini-code-assist bot reviewed Nov 12, 2025

View reviewed changes

stmatengss added the hicache Hierarchical Caching for SGLang label Nov 12, 2025

lint

615f510

stmatengss assigned xiezhq-hermann Nov 12, 2025

stmatengss added the run-ci label Nov 13, 2025

huangtingwei9988 approved these changes Nov 13, 2025

View reviewed changes

stmatengss added the ready-to-merge The PR is ready to merge after the CI is green. label Nov 13, 2025

xiezhq-hermann approved these changes Nov 14, 2025

View reviewed changes

ShangmingCai merged commit 290fcd8 into sgl-project:main Nov 17, 2025
307 of 360 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[HiCache] add GPU id to IB dev topo for mooncake storage backend #13112

[HiCache] add GPU id to IB dev topo for mooncake storage backend #13112

Uh oh!

stmatengss commented Nov 12, 2025

Uh oh!

gemini-code-assist bot commented Nov 12, 2025

Uh oh!

gemini-code-assist bot left a comment

Uh oh!

gemini-code-assist bot Nov 12, 2025

Uh oh!

gemini-code-assist bot Nov 12, 2025

Uh oh!

huangtingwei9988 commented Nov 13, 2025

Uh oh!

ykwd commented Nov 13, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

	"device_name": "{0: \"ib0,ib1\", 1: \"ib2,ib3\", 2: \"ib4,ib5\"}"
	"device_name": "{\"0\": \"ib0,ib1\", \"1\": \"ib2,ib3\", \"2\": \"ib4,ib5\"}"

[HiCache] add GPU id to IB dev topo for mooncake storage backend #13112

[HiCache] add GPU id to IB dev topo for mooncake storage backend #13112

Uh oh!

Conversation

stmatengss commented Nov 12, 2025

Motivation

Modifications

Accuracy Tests

Benchmarking and Profiling

Checklist

Uh oh!

gemini-code-assist bot commented Nov 12, 2025

Summary of Changes

Highlights

Footnotes

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist bot Nov 12, 2025

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Nov 12, 2025

Choose a reason for hiding this comment

Uh oh!

huangtingwei9988 commented Nov 13, 2025

Uh oh!

ykwd commented Nov 13, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants