[1/N][Sparse With Hicache]: Add Sparse Interface by hzh0425 · Pull Request #14741 · sgl-project/sglang

hzh0425 · 2025-12-09T13:33:16Z

Motivation

This PR primarily introduces a generic Retrievable Sparse Algorithm interface.
Upstream PR: #14619

Modifications

Accuracy Tests

Benchmarking and Profiling

Checklist

Format your code according to the Format code with pre-commit.
Add unit tests according to the Run and add unit tests.
Update documentation according to Write documentations.
Provide accuracy and speed benchmark results according to Test the accuracy and Benchmark the speed.
Follow the SGLang code style guidance.
Work with maintainers to merge your PR. See the PR Merge Process

gemini-code-assist · 2025-12-09T13:33:34Z

Summary of Changes

Hello @hzh0425, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request establishes a foundational interface for integrating diverse sparse attention algorithms within the system. By introducing a BaseSparseAlgorithm and SparseMode enum, it provides a standardized way to implement and manage KV cache compression techniques. The PR also includes two initial algorithm implementations, DeepSeekNSAAlgorithm and KnormPageAlgorithm, demonstrating the extensibility of this new framework. This change is crucial for enabling more efficient memory usage and potentially faster inference by selectively attending to important KV cache entries.

Highlights

Introduced BaseSparseAlgorithm: A new abstract base class for defining various retrievable KV cache compression algorithms, supporting both page-wise and token-wise sparsity.
Defined SparseMode Enum: An enumeration to categorize sparse attention granularity modes, including PAGE_WISE, TOKEN_WISE, and DEEPSEEK_TOKEN_WISE.
Added DeepSeekNSAAlgorithm: A concrete implementation for handling DeepSeek NSA specific sparse attention logic.
Implemented KnormPageAlgorithm: An experimental page-wise sparse attention algorithm utilizing ChunkKV-style scoring based on key L2 norms.
Included Unit Tests: Comprehensive unit tests for the KnormPageAlgorithm to ensure its correctness and functionality.
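The key-norm page scoring described for KnormPageAlgorithm can be sketched roughly as follows. This is a hypothetical illustration in plain Python, not the PR's code (which was later removed). It assumes each page is scored by the mean L2 norm of its keys and, by default, that pages with lower key norms are the more important ones. The aggregation and selection direction actually used in the PR are not stated here, and all function names are illustrative.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def l2_norm(v: Vector) -> float:
    """Euclidean (L2) norm of one key vector."""
    return math.sqrt(sum(x * x for x in v))


def score_pages_by_key_norm(keys: List[Vector], page_size: int) -> List[float]:
    """Hypothetical page score: mean L2 norm of the keys in each page.

    The last page may be only partially filled.
    """
    scores = []
    for start in range(0, len(keys), page_size):
        page = keys[start:start + page_size]
        scores.append(sum(l2_norm(k) for k in page) / len(page))
    return scores


def select_pages(scores: List[float], k: int, prefer_low_norm: bool = True) -> List[int]:
    """Return the indices of the k selected pages in ascending order.

    prefer_low_norm=True assumes low key norms mark important tokens;
    this direction is an assumption, not something taken from the PR.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=not prefer_low_norm)
    return sorted(order[:k])
```

For example, with `page_size=2` and keys `[[3, 4], [0, 1], [1, 0], [6, 8], [0.6, 0.8]]`, the page scores are 3.0, 5.5 and 1.0. Keeping two pages then selects pages 0 and 2.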


gemini-code-assist

Code Review

This pull request introduces a valuable abstraction for sparse attention algorithms by adding a generic BaseSparseAlgorithm interface. The changes include the base class, two initial implementations (DeepSeekNSAAlgorithm and KnormPageAlgorithm), and unit tests for one of them. My review identifies a critical issue in DeepSeekNSAAlgorithm that would lead to a runtime error due to an incorrect return value. I've also pointed out several medium-severity issues in KnormPageAlgorithm related to hardcoded values, dead code, incorrect type hints, and verbose logging that could impact maintainability and performance. Overall, this is a solid architectural improvement, and addressing these points will enhance the robustness and quality of the new sparse attention framework.

python/sglang/srt/mem_cache/sparsity/algorithms/deepseek_nsa.py

python/sglang/srt/mem_cache/sparsity/algorithms/page_wise_algorithm.py

Co-authored-by: 晟海 <huangtingwei.htw@antgroup.com>

python/sglang/srt/mem_cache/sparsity/algorithms/base_algorithm.py

python/sglang/srt/mem_cache/sparsity/algorithms/deepseek_nsa.py

hzh0425 · 2025-12-20T12:47:31Z

We refactored the interface to remove the distinction between PageWise and TokenWise, since TokenWise is essentially a special case of PageWise with PageSize = 1.
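The hypothetical helper below shows why token-wise sparsity is a special case of page-wise sparsity. It expands selected page indices into the token positions they cover. With `page_size = 1`, each page holds exactly one token, so page indices and token indices coincide. This is an illustrative sketch, not code from the PR.

```python
from typing import List


def pages_to_token_indices(page_ids: List[int], page_size: int, num_tokens: int) -> List[int]:
    """Expand selected page indices into the token positions they cover.

    The final page may be partial, so positions are clipped to num_tokens.
    """
    tokens = []
    for p in sorted(page_ids):
        start = p * page_size
        end = min(start + page_size, num_tokens)
        tokens.extend(range(start, end))
    return tokens
```

With `page_size=4` and 10 tokens, selecting pages 0 and 2 yields tokens `[0, 1, 2, 3, 8, 9]`. With `page_size=1`, selecting pages `[5, 1, 3]` yields tokens `[1, 3, 5]`, so a single page-wise code path also covers the token-wise case.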

In base_algorithm.py, I implemented a BaseSparseAlgorithmImpl as a common base class. Subclasses can integrate into the pipeline by implementing a few key methods, and they also have the flexibility to override critical methods such as retrieve_topk and update_representations to customize their specific logic.
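The extension pattern described above can be sketched as a template-method class: a common base class whose `retrieve_topk` and `update_representations` have default behavior that subclasses may override. Everything below is hypothetical. The class names, the `summarize_page`/`score_page` hooks and all signatures are illustrative assumptions, not the actual API in `base_algorithm.py`.

```python
from abc import ABC, abstractmethod
from typing import List, Sequence

Vector = Sequence[float]


class SparseAlgorithmSketch(ABC):
    """Hypothetical stand-in for a common sparse-algorithm base class."""

    def __init__(self, page_size: int):
        self.page_size = page_size  # page_size=1 gives token-wise sparsity
        self.representations: List[Vector] = []

    def update_representations(self, keys: List[Vector]) -> None:
        """Default: summarize each page of keys into one representation."""
        self.representations = [
            self.summarize_page(keys[i:i + self.page_size])
            for i in range(0, len(keys), self.page_size)
        ]

    def retrieve_topk(self, query: Vector, k: int) -> List[int]:
        """Default: score every page against the query and keep the k best."""
        scores = [self.score_page(query, rep) for rep in self.representations]
        best = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
        return sorted(best[:k])

    @abstractmethod
    def summarize_page(self, page_keys: List[Vector]) -> Vector:
        """Subclass hook: build a page representation from its keys."""

    @abstractmethod
    def score_page(self, query: Vector, rep: Vector) -> float:
        """Subclass hook: estimate how relevant a page is to the query."""


class MeanKeySketch(SparseAlgorithmSketch):
    """Example subclass: page = mean key, score = dot product with the query."""

    def summarize_page(self, page_keys):
        n = len(page_keys)
        return [sum(col) / n for col in zip(*page_keys)]

    def score_page(self, query, rep):
        return sum(q * r for q, r in zip(query, rep))
```

In this design a subclass only implements the two small hooks. It can still override `update_representations` or `retrieve_topk` wholesale when its logic does not fit the default per-page loop.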

Additionally, we plan to improve the performance of BaseSparseAlgorithmImpl in the next PR.
This PR focuses on defining the foundational interfaces and implementing a reasonable sparse algorithm.

Could you please review it again when you have time? @xiezhq-hermann

Accuracy tests for the Quest sparse algorithm (implemented by @magicYang1573):

GSM8K

Llama 8B test (the original model's accuracy is 0.94):

| Compression rate | 0.3 | 0.5 | 0.7 |
|---|---|---|---|
| Accuracy | 0.92 | 0.92 | 0.92 |

aime25

Qwen32B Test:

Original mode:

Quest sparse mode with 0.5 compression rate:
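The Quest algorithm evaluated above is query-aware. For each page it keeps the element-wise minimum and maximum of the page's keys. It then uses these to upper-bound the query-key dot product for any token in the page, and attends only to the top-k pages by that bound. The PR's `quest_algorithm.py` is not shown here, so the sketch below follows the published Quest scoring rule in plain Python. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def page_min_max(page_keys: List[Vector]) -> Tuple[List[float], List[float]]:
    """Per-channel minimum and maximum over the keys in one page."""
    channels = list(zip(*page_keys))
    return [min(c) for c in channels], [max(c) for c in channels]


def quest_upper_bound(query: Vector, k_min: Vector, k_max: Vector) -> float:
    """Upper bound on q.k for any key k inside the page's min/max box.

    For each channel, the larger of q_i * min_i and q_i * max_i is the
    best case, so their sum bounds the true dot product from above.
    """
    return sum(max(q * lo, q * hi) for q, lo, hi in zip(query, k_min, k_max))


def quest_select(query: Vector, keys: List[Vector], page_size: int, k: int) -> List[int]:
    """Return the indices (ascending) of the k pages with the highest bound."""
    bounds = []
    for start in range(0, len(keys), page_size):
        lo, hi = page_min_max(keys[start:start + page_size])
        bounds.append(quest_upper_bound(query, lo, hi))
    best = sorted(range(len(bounds)), key=lambda i: bounds[i], reverse=True)
    return sorted(best[:k])
```

Because the bound never underestimates a page's best attainable score, a page that holds a high-attention token is not ranked below a page whose bound is lower than that token's actual score.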

hzh0425 · 2025-12-23T08:25:53Z

/rerun-failed-ci

Co-authored-by: 晟海 <huangtingwei.htw@antgroup.com> Co-authored-by: huangtingwei <141888744+huangtingwei9988@users.noreply.github.com> Co-authored-by: MagicYang1573 <1328657938@qq.com>

Co-authored-by: 晟海 <huangtingwei.htw@antgroup.com> Co-authored-by: huangtingwei <141888744+huangtingwei9988@users.noreply.github.com> Co-authored-by: MagicYang1573 <1328657938@qq.com> (cherry picked from commit a89e85e)


github-actions bot added the deepseek label Dec 9, 2025

gemini-code-assist bot reviewed Dec 9, 2025


hzh0425 added the run-ci label Dec 9, 2025

hzh0425 force-pushed the sparse/algorithm_interface1 branch from 199f115 to 9c9d83a December 9, 2025 13:45

Init interface

9c9d83a

Co-authored-by: 晟海 <huangtingwei.htw@antgroup.com>

xiezhq-hermann self-assigned this Dec 10, 2025

Merge branch 'main' into sparse/algorithm_interface1

5dd81ed

hzh0425 marked this pull request as ready for review December 11, 2025 07:47

hzh0425 requested review from Ying1123, hanming-lu, hnyls2002, merrymercy, xiezhq-hermann and yizhang2077 as code owners December 11, 2025 07:47

hzh0425 force-pushed the sparse/algorithm_interface1 branch from 37c9b5c to 4d1a182 December 11, 2025 08:45

Refactor page wise algo

4d1a182

xiezhq-hermann reviewed Dec 12, 2025


python/sglang/srt/mem_cache/sparsity/algorithms/base_algorithm.py (outdated, resolved)

xiezhq-hermann reviewed Dec 12, 2025


python/sglang/srt/mem_cache/sparsity/algorithms/deepseek_nsa.py (outdated, resolved)

xiezhq-hermann reviewed Dec 12, 2025


python/sglang/srt/mem_cache/sparsity/algorithms/deepseek_nsa.py (outdated, resolved)

Refactor structure

205c2bc

magicYang1573 requested review from BBuf, Edwardf0t1, FlamingoPg, Fridge003, HaiShaw, ch-wan, fzyzcjy, ispobock and zhyncs as code owners December 17, 2025 10:25

github-actions bot added the sgl-kernel label Dec 17, 2025

hzh0425 force-pushed the sparse/algorithm_interface1 branch from a07afd8 to 205c2bc December 17, 2025 10:47

hzh0425 removed request for BBuf, Edwardf0t1, FlamingoPg, Fridge003, HaiShaw, ch-wan, fzyzcjy, ispobock and zhyncs December 17, 2025 10:48

hzh0425 added the hicache label (Hierarchical Caching for SGLang) and removed the deepseek and sgl-kernel labels Dec 17, 2025

Add Implementation of quest_algorithm.py

0d52ca1

github-actions bot added the deepseek label Dec 17, 2025

Remove KnormPageAlgorithm

86e3a48

xiezhq-hermann added the ready-to-merge label (The PR is ready to merge after the CI is green.) Dec 23, 2025

xiezhq-hermann approved these changes Dec 23, 2025


Merge branch 'main' into sparse/algorithm_interface1

2eba47f

hzh0425 mentioned this pull request Dec 23, 2025

[Sparse & HICache]: Enables hierarchical sparse KV cache management and scheduling for DeepSeek V32. #14619 (Open, 12 tasks)

xiezhq-hermann merged commit a89e85e into sgl-project:main Dec 25, 2025
177 of 189 checks passed

hzh0425 mentioned this pull request Dec 29, 2025

[3/N][Sparse With Hicache]: Init sparse coordinator #16086 (Merged, 6 tasks)

xiezhq-hermann merged 7 commits into sgl-project:main from hzh0425:sparse/algorithm_interface1
4 participants