feat: basic support for server-level multimodal cache#10775

Merged
hnyls2002 merged 4 commits into sgl-project:main from mickqian:server-level-mm-cache
Nov 9, 2025
Conversation

@mickqian
Collaborator

Motivation

Previously, the MultimodalCache was used only for chunked_prefill, and the embedding's lifetime ended with prefill.

Modifications

This PR provides basic support for a server-level multimodal cache, with LRU as the eviction policy.
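A minimal sketch of the idea described above: embeddings keyed by a hash of the raw multimodal input, evicted in least-recently-used order once a size budget is exceeded. The class and method names here are hypothetical, not the PR's actual implementation.

```python
import hashlib
from collections import OrderedDict


class ServerLevelMMCache:
    """Hypothetical sketch of a server-level multimodal embedding cache
    with LRU eviction; not the PR's actual class."""

    def __init__(self, max_items: int = 128):
        self.max_items = max_items
        self._store: OrderedDict[str, object] = OrderedDict()

    @staticmethod
    def key_for(data: bytes) -> str:
        # Hash the raw multimodal payload (e.g. image bytes) to a stable key,
        # so identical inputs across requests hit the same entry.
        return hashlib.sha256(data).hexdigest()

    def get(self, key: str):
        if key not in self._store:
            return None
        # Mark the entry as most recently used.
        self._store.move_to_end(key)
        return self._store[key]

    def put(self, key: str, embedding) -> None:
        self._store[key] = embedding
        self._store.move_to_end(key)
        # Evict least-recently-used entries once over budget.
        while len(self._store) > self.max_items:
            self._store.popitem(last=False)
```

Because the cache outlives a single prefill pass, a repeated image in a later request can reuse its embedding instead of re-encoding it.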

TODOs

  1. Make the cache size and eviction policy configurable.

Accuracy Tests

Tested with Qwen/Qwen2.5-VL-7B-Instruct.

Before

Benchmark time: 119.34484012098983
Overall accuracy: 0.514

After

Benchmark time: 119.47181225300301
Overall accuracy: 0.514


@merrymercy
Contributor

@mickqian please resolve the conflicts.

@mickqian force-pushed the server-level-mm-cache branch from 4471af1 to f5c32c5 on October 24, 2025 01:15
@mickqian requested a review from zhyncs as a code owner on October 24, 2025 02:14
@hnyls2002 merged commit f5b3ccd into sgl-project:main on Nov 9, 2025
99 of 107 checks passed
@zhangml

zhangml commented Nov 13, 2025

Hello, could you please provide a command for using server-level multimodal cache?

