Encoder Global Cache Manager #16137

Merged

Kangyan-Zhou merged 21 commits into sgl-project:main from liusy58:global_cache on Feb 25, 2026

Collaborator

liusy58 commented Dec 30, 2025 (edited)

Motivation

This PR introduces a multi-level multimodal embedding cache powered by Mooncake distributed store, enabling cross-instance sharing of Vision Transformer (ViT) embeddings to avoid redundant GPU computation for previously processed images.

Modifications

Accuracy Tests

Benchmarking and Profiling

Checklist

Format your code according to the Format code with pre-commit.
Add unit tests according to the Run and add unit tests.
Update documentation according to Write documentations.
Provide accuracy and speed benchmark results according to Test the accuracy and Benchmark the speed.
Follow the SGLang code style guidance.

Review Process

Ping Merge Oncalls to start the PR flow. See the PR Merge Process.
Get approvals from CODEOWNERS and other reviewers.
Trigger CI tests with comments (/tag-run-ci-label, /rerun-failed-ci, /tag-and-rerun-ci) or contact authorized users to do so.
After green CI and required approvals, ask Merge Oncalls to merge.


          global cache

4654ca6

liusy58 requested review from ByronHsu, ShangmingCai, Ying1123, hanming-lu, hnyls2002, merrymercy, xiezhq-hermann and yizhang2077 as code owners

December 30, 2025 06:11

Contributor

gemini-code-assist bot commented Dec 30, 2025

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

Collaborator

stmatengss commented Dec 30, 2025

/tag-and-rerun-ci

github-actions bot added the run-ci label

XucSh mentioned this pull request

[Feature] Introduce an Encoder Cache Manager #16957

Closed

huangtingwei9988 self-assigned this

liusy58 added 3 commits

January 16, 2026 23:25


          Merge branch 'main' into global_cache

586339e

fix

42a3f5a

add

4bbce2f

liusy58 changed the title ~~[WIP] Encoder Global Cache Manager~~ Encoder Global Cache Manager

stmatengss requested a review from Copilot

January 18, 2026 06:22

Copilot started reviewing on behalf of stmatengss

January 18, 2026 06:22

Copilot AI reviewed

Contributor

Copilot AI left a comment

Pull request overview

This PR introduces an Encoder Global Cache Manager feature that enables caching of encoder embeddings using Mooncake distributed storage backend. The implementation aims to reduce redundant GPU encoding by caching computed embeddings across multiple requests and nodes.

Changes:

Added --enable-mm-global-cache CLI argument to enable the global cache feature
Implemented MooncakeEmbeddingStore for distributed storage of embeddings
Created EmbeddingCacheController to manage local memory allocation and coordinate with Mooncake backend
Integrated cache checking and prefetching into the encode server workflow
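The flow summarized above (check the cache, encode only on a miss, publish the result for other instances) can be sketched as follows. This is a minimal illustration and not the PR's implementation: `TwoLevelEmbeddingCache`, `encode_with_cache`, and `fake_encoder` are hypothetical names, a plain dict stands in for the Mooncake distributed store, and keying by a SHA-256 content hash is an assumption.

```python
import hashlib
from collections import OrderedDict
from typing import Callable, Dict, List, Optional

Embedding = List[float]  # stand-in for a tensor of ViT embeddings


class TwoLevelEmbeddingCache:
    """Hypothetical sketch: a small local LRU in front of a shared store."""

    def __init__(self, shared_store: Dict[str, Embedding], local_capacity: int = 2):
        self.local: "OrderedDict[str, Embedding]" = OrderedDict()
        self.shared = shared_store  # assumption: dict in place of the distributed store
        self.local_capacity = local_capacity

    @staticmethod
    def key_for(image_bytes: bytes) -> str:
        # Assumption: identical image bytes map to the same cache key.
        return hashlib.sha256(image_bytes).hexdigest()

    def get(self, key: str) -> Optional[Embedding]:
        if key in self.local:  # level 1: local memory
            self.local.move_to_end(key)
            return self.local[key]
        emb = self.shared.get(key)  # level 2: shared store
        if emb is not None:
            self._put_local(key, emb)
        return emb

    def put(self, key: str, emb: Embedding) -> None:
        self._put_local(key, emb)
        self.shared[key] = emb  # make the result visible to other instances

    def _put_local(self, key: str, emb: Embedding) -> None:
        self.local[key] = emb
        self.local.move_to_end(key)
        while len(self.local) > self.local_capacity:
            self.local.popitem(last=False)  # evict the least recently used entry


def encode_with_cache(
    cache: TwoLevelEmbeddingCache,
    image_bytes: bytes,
    encoder: Callable[[bytes], Embedding],
) -> Embedding:
    key = cache.key_for(image_bytes)
    emb = cache.get(key)
    if emb is None:  # miss on both levels: run the encoder once
        emb = encoder(image_bytes)
        cache.put(key, emb)
    return emb


encoder_calls: List[bytes] = []


def fake_encoder(data: bytes) -> Embedding:
    encoder_calls.append(data)
    return [float(len(data))]


shared: Dict[str, Embedding] = {}
instance_a = TwoLevelEmbeddingCache(shared)
instance_b = TwoLevelEmbeddingCache(shared)
first = encode_with_cache(instance_a, b"image-1", fake_encoder)   # miss: encoder runs
second = encode_with_cache(instance_b, b"image-1", fake_encoder)  # hit via shared store
```

In this sketch the second instance never calls the encoder, because the first one already published the embedding to the shared store. That is the cross-instance sharing the PR description refers to.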

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 29 comments.

File	Description
python/sglang/srt/server_args.py	Adds new CLI flag to enable multimodal global cache
python/sglang/srt/mem_cache/storage/mooncake_store/mooncake_embedding_store.py	New file implementing Mooncake-based distributed embedding storage
python/sglang/srt/managers/embedding_cache_controller.py	New file implementing cache controller with memory management and async I/O
python/sglang/srt/managers/scheduler.py	Integrates cache controller into scheduler initialization
python/sglang/srt/managers/schedule_batch.py	Adds debug print statement (should be removed)
python/sglang/srt/disaggregation/encode_server.py	Implements cache-aware encoding workflow with hit/miss handling

python/sglang/srt/mem_cache/storage/mooncake_store/embedding_cache_controller.py Outdated

Comment on lines +19 to +27

    
    import logging
    import threading
    from typing import List, Optional

    import torch

    logger = logging.getLogger(__name__)

Copilot AI Jan 18, 2026

This import of module threading is redundant, as it was previously imported on line 2.

Suggested change

      
    import logging
    import threading
    from typing import List, Optional

    import torch

    logger = logging.getLogger(__name__)
    from typing import Optional

Copilot uses AI. Check for mistakes.

python/sglang/srt/mem_cache/storage/mooncake_store/embedding_cache_controller.py

    
        op.mark_done(all(results))
        self.prefetch_queue.task_done()
        processed_any = True
    except Empty:

Copilot AI Jan 18, 2026

'except' clause does nothing but pass and there is no explanatory comment.

python/sglang/srt/mem_cache/storage/mooncake_store/embedding_cache_controller.py

    
        )
        self.insert_queue.task_done()
        processed_any = True
    except Empty:

Copilot AI Jan 18, 2026

'except' clause does nothing but pass and there is no explanatory comment.
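One way to address both comments is to give the empty-queue case an explicit, commented exit instead of a bare `pass`. The sketch below is illustrative only: `drain_once` and `handle` are hypothetical names and not the controller's actual loop. It uses only the standard `queue` module.

```python
import queue
from typing import Callable, List


def drain_once(q: "queue.Queue[int]", handle: Callable[[int], None]) -> bool:
    """Process everything currently queued; return True if any item was handled."""
    processed_any = False
    while True:
        try:
            item = q.get_nowait()
        except queue.Empty:
            # Expected condition: nothing is queued right now, so end this pass.
            break
        try:
            handle(item)
        finally:
            q.task_done()  # always balance the successful get, even if handle() raises
        processed_any = True
    return processed_any


seen: List[int] = []
work: "queue.Queue[int]" = queue.Queue()
for i in range(3):
    work.put(i)

first_pass = drain_once(work, seen.append)   # handles the three queued items
second_pass = drain_once(work, seen.append)  # queue is empty, nothing to do
```

Returning `processed_any` lets a caller decide whether to sleep before polling again, which appears to be the role of the flag in the reviewed snippets.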

hzh0425 self-assigned this


          Merge branch 'main' into global_cache

666e5ba

hzh0425 assigned xiezhq-hermann

ZhengWG reviewed

ZhengWG reviewed

stmatengss self-assigned this

stmatengss reviewed

liusy58 added 2 commits

January 24, 2026 00:36


          Merge branch 'main' of https://github.com/sgl-project/sglang into global_cache

2cb3052


          fix conflict

1572ec8

liusy58 added 3 commits

February 12, 2026 00:15


          Merge branch 'global_cache' of github.com:liusy58/sglang into global_cache

7000ff3

fix

bda58a8

fix

17f133b

Collaborator Author

liusy58 commented Feb 12, 2026

/rerun-failed-ci

3 similar comments


          Merge branch 'main' into global_cache

b78d6c2

Collaborator Author

liusy58 commented Feb 13, 2026

/rerun-failed-ci

7 similar comments

liusy58 requested a review from ShangmingCai

February 24, 2026 01:40

liusy58 added 2 commits

February 24, 2026 14:24


          clean code

571e518


          clean code

09ef470

Collaborator Author

liusy58 commented Feb 24, 2026

/rerun-failed-ci

5 similar comments

Kangyan-Zhou merged commit 245430e into sgl-project:main

204 of 220 checks passed

stmatengss mentioned this pull request

[HiCache] Support DeepSeek V3.2 L3 offloading #18637

Closed

magicYang1573 pushed a commit to magicYang1573/sglang that referenced this pull request


          Encoder Global Cache Manager (sgl-project#16137)

25c6957

Co-authored-by: Zheng Wengang <zwg0606@gmail.com>
Co-authored-by: Teng Ma <sima.mt@alibaba-inc.com>

Wangzheee pushed a commit to Wangzheee/sglang that referenced this pull request


          Encoder Global Cache Manager (sgl-project#16137)

ec916df

Co-authored-by: Zheng Wengang <zwg0606@gmail.com>
Co-authored-by: Teng Ma <sima.mt@alibaba-inc.com>

JustinTong0323 pushed a commit to JustinTong0323/sglang that referenced this pull request


          Encoder Global Cache Manager (sgl-project#16137)

35b7dc9

Co-authored-by: Zheng Wengang <zwg0606@gmail.com>
Co-authored-by: Teng Ma <sima.mt@alibaba-inc.com>

Reviewers

Copilot (code review): left review comments
stmatengss: approved these changes
xiezhq-hermann: approved these changes
hzh0425: approved these changes
merrymercy (code owner): awaiting requested review
Ying1123 (code owner): awaiting requested review
hnyls2002 (code owner): awaiting requested review
hanming-lu (code owner): awaiting requested review
yizhang2077 (code owner): awaiting requested review
ByronHsu (code owner): awaiting requested review
ZhengWG: awaiting requested review
ShangmingCai (code owner): awaiting requested review

Labels