Clean match_prefix and prepare_for_extend for mem cache V2 #11200

Merged
cctry merged 8 commits into main from shiyang/mem_v2/clean on Oct 8, 2025

Conversation

@cctry
Collaborator

@cctry cctry commented Oct 3, 2025

Motivation

Preparation for mem_cache V2.
This PR cleans the existing code to separate memory operations from other procedures.

Modifications

Two function implementations are changed:

  1. init_next_round: remove adjust_max_prefix_ids so that most field setting happens before match_prefix

  2. prepare_for_extend:

  • Move all memory-related code together.
  • Change the write to req_to_token_pool to cover both prefix_indices and out_cache_loc.
  • As a result, the get_last_loc function is replaced by a simpler implementation that does not search req_to_token_pool.

Also

  • For req.prefix_indices, the default value is changed from [] to torch.empty((0,), dtype=torch.int64) to respect the type hint.
  • Fix chunk_cache to use the correct token-index dtype (int64).
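A minimal sketch of the default-value change above; the Req class here is a hypothetical stand-in for the real request object, showing only why the empty int64 tensor is preferable to a Python list:

```python
import torch

class Req:
    def __init__(self):
        # Old default was the Python list []; the new default is an empty
        # int64 tensor, matching the torch.Tensor type hint.
        self.prefix_indices = torch.empty((0,), dtype=torch.int64)

req = Req()
# Tensor ops now work uniformly whether or not any prefix was matched,
# with no list/tensor type checks on the consumer side.
combined = torch.cat([req.prefix_indices, torch.arange(3, dtype=torch.int64)])
```

With a list default, every consumer would need an isinstance check or an ad hoc conversion before calling tensor ops such as torch.cat.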

Accuracy Tests

Benchmarking and Profiling

Checklist

@gemini-code-assist
Contributor

Summary of Changes

Hello @cctry, I'm Gemini Code Assist[1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request focuses on internal code cleanup and refactoring, primarily within the schedule_batch.py and schedule_policy.py files. The main goal is to prepare the codebase for mem_cache V2 by separating memory allocation operations from other procedures. This involves streamlining prefix matching logic, centralizing memory writing functions, and reorganizing the prepare_for_extend method for improved clarity and maintainability.

Highlights

  • Refactored Prefix Matching Logic: The logic for calculating the maximum prefix length and token IDs has been moved directly into the init_next_round_input method, eliminating the separate adjust_max_prefix_ids function for cleaner code and better flow.
  • Centralized Memory Writing: A new write_cache_indices method has been introduced to centralize the process of writing both prefix tokens and allocated output cache locations (out_cache_loc) to the req_to_token_pool, simplifying memory management within prepare_for_extend.
  • Streamlined prepare_for_extend: The prepare_for_extend method has been significantly refactored to consolidate memory allocation operations and delegate the complex req_to_token_pool writing logic to the new write_cache_indices method, improving modularity and readability.
  • Enhanced Triton Kernel for Prefix Handling: The write_req_to_token_pool_triton kernel has been updated to explicitly handle and write prefix tensors, ensuring consistent behavior across different memory writing paths (Triton and non-Triton).
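The consolidated write described in the highlights can be sketched as follows. write_cache_indices, req_to_token_pool, prefix_indices, and out_cache_loc are named in the PR, but the signature, shapes, and the small driver below are hypothetical:

```python
import torch

def write_cache_indices(req_to_token_pool, req_idx, prefix_indices, out_cache_loc):
    """Write both the matched prefix and the newly allocated cache
    locations into one request's row of the token pool, in one place."""
    prefix_len = prefix_indices.numel()
    req_to_token_pool[req_idx, :prefix_len] = prefix_indices
    req_to_token_pool[req_idx, prefix_len : prefix_len + out_cache_loc.numel()] = out_cache_loc
    # Because the prefix is written here, the "last location" is simply the
    # tail of prefix_indices (or -1 when empty) -- no pool search needed.
    return prefix_indices[-1].item() if prefix_len > 0 else -1

pool = torch.full((2, 8), -1, dtype=torch.int64)
last = write_cache_indices(
    pool, 0,
    torch.tensor([10, 11], dtype=torch.int64),
    torch.tensor([20, 21, 22], dtype=torch.int64),
)
```

Keeping both writes in one function is what lets get_last_loc shrink to a tail lookup instead of scanning req_to_token_pool.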

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@gemini-code-assist gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request refactors init_next_round and prepare_for_extend to improve code structure and separate memory operations. The changes are generally good and improve code clarity. I've provided a few suggestions to further enhance maintainability and performance, including simplifying tensor creation logic and optimizing a Triton kernel by pre-computing a cumulative sum.
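The reviewer's cumulative-sum suggestion can be illustrated in plain PyTorch (names and shapes here are hypothetical): pre-compute each request's start offset into out_cache_loc once on the host, so a kernel instance can slice its segment directly instead of re-summing extend lengths inside the kernel.

```python
import torch

# Per-request number of newly extended tokens (hypothetical values).
extend_lens = torch.tensor([3, 1, 2], dtype=torch.int64)
# Exclusive prefix sum: start offset of each request's segment.
starts = torch.cumsum(extend_lens, dim=0) - extend_lens
# Flat buffer of allocated cache locations for the whole batch.
out_cache_loc = torch.arange(100, 106, dtype=torch.int64)

# Each "kernel instance" reads only its own segment via (start, length).
segments = [
    out_cache_loc[s : s + l]
    for s, l in zip(starts.tolist(), extend_lens.tolist())
]
```

The prefix sum costs one O(batch) pass up front and removes an O(batch) reduction from every kernel program, which is the usual trade the reviewer is pointing at.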

@cctry cctry requested a review from zhyncs as a code owner October 3, 2025 22:37
@cctry cctry changed the title from "Clean match_prefix and prepare_for_extend" to "Clean match_prefix and prepare_for_extend for mem cache V2" on Oct 3, 2025
@cctry cctry merged commit f3764c2 into main Oct 8, 2025
96 of 110 checks passed
@cctry cctry deleted the shiyang/mem_v2/clean branch October 8, 2025 00:54
ch-tiger1 pushed a commit to ch-tiger1/sglang that referenced this pull request Oct 9, 2025
