Encoder Global Cache Manager #16137

Merged
Kangyan-Zhou merged 21 commits into sgl-project:main from liusy58:global_cache
Feb 25, 2026

@liusy58
Collaborator

@liusy58 liusy58 commented Dec 30, 2025

Motivation

This PR introduces a multi-level multimodal embedding cache backed by the Mooncake distributed store. It lets instances share Vision Transformer (ViT) embeddings, so an image that has already been processed does not need to be re-encoded on the GPU.
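
To make the multi-level idea concrete, here is a minimal sketch of a two-level lookup: a per-instance local cache in front of a shared store. It is not the PR's implementation. `TwoLevelEmbeddingCache`, `content_key`, and `get_or_encode` are hypothetical names, a plain `dict` stands in for the Mooncake store, and a list of floats stands in for an embedding tensor.

```python
import hashlib
from typing import Callable, Dict, List, Optional

Embedding = List[float]  # stand-in for a real embedding tensor (assumption)


def content_key(image_bytes: bytes) -> str:
    """Derive a cache key from the raw image content (hypothetical scheme)."""
    return hashlib.sha256(image_bytes).hexdigest()


class TwoLevelEmbeddingCache:
    """Local per-instance cache in front of a shared store.

    The shared store is a plain dict here; in a real deployment it would be
    a distributed backend reachable from every instance.
    """

    def __init__(self, shared_store: Dict[str, Embedding]) -> None:
        self.local: Dict[str, Embedding] = {}
        self.shared = shared_store

    def get(self, key: str) -> Optional[Embedding]:
        if key in self.local:  # level 1: local hit
            return self.local[key]
        value = self.shared.get(key)  # level 2: shared hit
        if value is not None:
            self.local[key] = value  # promote so the next lookup stays local
        return value

    def put(self, key: str, value: Embedding) -> None:
        self.local[key] = value
        self.shared[key] = value


def get_or_encode(
    cache: TwoLevelEmbeddingCache,
    image_bytes: bytes,
    encode: Callable[[bytes], Embedding],
) -> Embedding:
    """Return a cached embedding, running the encoder only on a full miss."""
    key = content_key(image_bytes)
    cached = cache.get(key)
    if cached is not None:
        return cached
    embedding = encode(image_bytes)
    cache.put(key, embedding)
    return embedding
```

With two `TwoLevelEmbeddingCache` objects built over the same shared dict, the second instance finds the embedding the first one stored and never calls the encoder, which is the cross-instance effect the motivation describes.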

Modifications

Accuracy Tests

Benchmarking and Profiling

Checklist

Review Process

  1. Ping Merge Oncalls to start the PR flow. See the PR Merge Process.
  2. Get approvals from CODEOWNERS and other reviewers.
  3. Trigger CI tests with comments (/tag-run-ci-label, /rerun-failed-ci, /tag-and-rerun-ci) or contact authorized users to do so.
  4. After green CI and required approvals, ask Merge Oncalls to merge.

@gemini-code-assist
Contributor

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

@stmatengss
Collaborator

/tag-and-rerun-ci

@liusy58 liusy58 changed the title from "[WIP] Encoder Global Cache Manager" to "Encoder Global Cache Manager" Jan 17, 2026
@stmatengss stmatengss requested a review from Copilot January 18, 2026 06:22
Contributor

Copilot AI left a comment

Pull request overview

This PR introduces an Encoder Global Cache Manager that caches encoder embeddings in the Mooncake distributed storage backend. The goal is to reduce redundant GPU encoding by reusing computed embeddings across requests and nodes.

Changes:

  • Added --enable-mm-global-cache CLI argument to enable the global cache feature
  • Implemented MooncakeEmbeddingStore for distributed storage of embeddings
  • Created EmbeddingCacheController to manage local memory allocation and coordinate with Mooncake backend
  • Integrated cache checking and prefetching into the encode server workflow
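
As an illustration of the first item, an opt-in boolean switch such as `--enable-mm-global-cache` can be expressed with the standard library's `argparse`. This is only a sketch: the thread does not show how `server_args.py` registers the flag, and `build_parser` and the help text are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Toy parser; the real server defines many more arguments."""
    parser = argparse.ArgumentParser(description="toy launcher (illustrative only)")
    parser.add_argument(
        "--enable-mm-global-cache",
        action="store_true",  # flag is off unless passed explicitly
        help="Enable the global multimodal embedding cache (help text is assumed).",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    # argparse maps dashes to underscores in the attribute name.
    print("global cache enabled:", args.enable_mm_global_cache)
```

Because the flag uses `store_true`, the feature stays disabled by default and existing launch commands keep their behaviour.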

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 29 comments.

| File | Description |
| --- | --- |
| python/sglang/srt/server_args.py | Adds new CLI flag to enable multimodal global cache |
| python/sglang/srt/mem_cache/storage/mooncake_store/mooncake_embedding_store.py | New file implementing Mooncake-based distributed embedding storage |
| python/sglang/srt/managers/embedding_cache_controller.py | New file implementing cache controller with memory management and async I/O |
| python/sglang/srt/managers/scheduler.py | Integrates cache controller into scheduler initialization |
| python/sglang/srt/managers/schedule_batch.py | Adds debug print statement (should be removed) |
| python/sglang/srt/disaggregation/encode_server.py | Implements cache-aware encoding workflow with hit/miss handling |
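
The "hit/miss handling" mentioned for `encode_server.py` can be sketched roughly as follows: split a batch into cached and uncached items, encode only the misses, then store the fresh results. This is a simplified stand-in rather than the PR's code. `encode_with_cache` is a hypothetical helper, the cache is a plain `dict`, and embeddings are lists of floats.

```python
from typing import Callable, Dict, List, Optional, Sequence

Embedding = List[float]  # stand-in for a real embedding tensor (assumption)


def encode_with_cache(
    keys: Sequence[str],
    items: Sequence[bytes],
    cache: Dict[str, Embedding],
    encode_batch: Callable[[List[bytes]], List[Embedding]],
) -> List[Optional[Embedding]]:
    """Return one embedding per item, calling the encoder only for misses."""
    results: List[Optional[Embedding]] = [None] * len(keys)
    miss_indices: List[int] = []

    # Pass 1: serve hits from the cache and remember which positions missed.
    for i, key in enumerate(keys):
        hit = cache.get(key)
        if hit is not None:
            results[i] = hit
        else:
            miss_indices.append(i)

    # Pass 2: encode the misses in a single batch, then fill the cache.
    if miss_indices:
        fresh = encode_batch([items[i] for i in miss_indices])
        for i, embedding in zip(miss_indices, fresh):
            cache[keys[i]] = embedding
            results[i] = embedding

    return results
```

Batching the misses keeps the encoder call count at one per request batch, and the output order matches the input order regardless of which items were cached.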

Comment on lines +19 to +27
import logging
import threading
from typing import List, Optional

import torch

logger = logging.getLogger(__name__)


Copilot AI Jan 18, 2026

This import of module threading is redundant, as it was previously imported on line 2.

Suggested change
- import logging
- import threading
- from typing import List, Optional
- import torch
- logger = logging.getLogger(__name__)
+ from typing import Optional

    op.mark_done(all(results))
    self.prefetch_queue.task_done()
    processed_any = True
except Empty:
Copilot AI Jan 18, 2026

'except' clause does nothing but pass and there is no explanatory comment.
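
One way to address this kind of finding is to keep the non-blocking poll but narrow the `try` to the call that can raise and say why `Empty` is expected. The sketch below uses only the standard `queue` module. `drain` and `handle` are hypothetical names, and the loop structure is an assumption, since only a fragment of the original worker loop is quoted above.

```python
from queue import Empty, Queue
from typing import Any, Callable


def drain(q: Queue, handle: Callable[[Any], None]) -> bool:
    """Process everything currently in the queue without blocking.

    Returns True if at least one item was handled.
    """
    processed_any = False
    while True:
        try:
            item = q.get_nowait()
        except Empty:
            # Expected case: nothing is pending right now, so stop polling.
            break
        handle(item)
        q.task_done()  # pair every successful get with task_done
        processed_any = True
    return processed_any
```

Limiting the `try` block to `get_nowait()` also means an unrelated exception raised inside `handle` cannot be mistaken for an empty queue.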

    )
    self.insert_queue.task_done()
    processed_any = True
except Empty:
Copilot AI Jan 18, 2026

'except' clause does nothing but pass and there is no explanatory comment.

@hzh0425 hzh0425 self-assigned this Jan 19, 2026
@stmatengss stmatengss self-assigned this Jan 21, 2026
@liusy58 (Collaborator, Author) commented /rerun-failed-ci 12 times from Feb 12 to Feb 18, 2026

@liusy58 liusy58 requested a review from ShangmingCai February 24, 2026 01:40
@liusy58 (Collaborator, Author) commented /rerun-failed-ci 6 times from Feb 24 to Feb 25, 2026

@Kangyan-Zhou Kangyan-Zhou merged commit 245430e into sgl-project:main Feb 25, 2026
204 of 220 checks passed
magicYang1573 pushed a commit to magicYang1573/sglang that referenced this pull request Mar 9, 2026
Co-authored-by: Zheng Wengang <zwg0606@gmail.com>
Co-authored-by: Teng Ma <sima.mt@alibaba-inc.com>
Wangzheee pushed a commit to Wangzheee/sglang that referenced this pull request Mar 21, 2026
Co-authored-by: Zheng Wengang <zwg0606@gmail.com>
Co-authored-by: Teng Ma <sima.mt@alibaba-inc.com>
JustinTong0323 pushed a commit to JustinTong0323/sglang that referenced this pull request Apr 7, 2026
Co-authored-by: Zheng Wengang <zwg0606@gmail.com>
Co-authored-by: Teng Ma <sima.mt@alibaba-inc.com>
9 participants