Checklist
Describe the bug
I'd like to report a severe CPU memory allocation bug in the v0.5.10 feature Hisparse (DSA CPU-offload decoding).
During Hisparse decoding, each newly generated token's KV cache is backed up to CPU, causing additional CPU memory usage. This differs from non-Hisparse backends, which do not back up newly generated KV cache to CPU unless a sequence is retracted.
However, the memory pool system still uses the non-Hisparse admission check, which accepts a request as long as `input_length + <a small headroom> < available CPU memory`.
So when the CPU memory pool is nearly full and the generation is long, the CPU pool can overflow as the decode length grows.
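To make the mismatch concrete, here is a minimal sketch in Python. All names (`can_admit_current`, `can_admit_hisparse`, `HEADROOM`, the slot counts) are illustrative assumptions, not sglang's actual API; the point is only that the current check ignores the per-token CPU backup that Hisparse adds during decode.

```python
# Hypothetical sketch of the admission-check mismatch; names and numbers
# are illustrative, not sglang internals.

HEADROOM = 64  # the "small number" of spare slots the current check assumes


def can_admit_current(input_len: int, free_cpu_slots: int) -> bool:
    """Current (non-Hisparse) check: only the prompt length is counted."""
    return input_len + HEADROOM < free_cpu_slots


def can_admit_hisparse(input_len: int, max_new_tokens: int,
                       free_cpu_slots: int) -> bool:
    """Hisparse-aware check: every decoded token is also backed up to CPU,
    so the worst case must reserve input plus the full generation budget."""
    return input_len + max_new_tokens + HEADROOM < free_cpu_slots


# Example: a 60k-token prompt with an 8k generation budget in a pool with
# 64k free slots. The current check admits it, but peak CPU usage would
# exceed 68k slots, overflowing the pool mid-decode.
assert can_admit_current(60_000, 64_000) is True
assert can_admit_hisparse(60_000, 8_000, 64_000) is False
```

Under this reading, the fix would be for the scheduler to reserve `input_length + max_new_tokens` (rather than `input_length` alone) against the CPU pool whenever Hisparse is active.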
Reproduction
Though the overflow can happen under any setting with low probability, the following steps increase the chance of reproducing it:
1. Use a DSA model (Deepseek-V3.2-Speciale/GLM-5.1) with PD disaggregation, and set a small `host_to_device_ratio` in the Hisparse config, for example `--hisparse-config "{'top_k':2048,'device_buffer_size':4096,'host_to_device_ratio':1}"`.
2. Prepare long, hard prompts that make the model think for many tokens (inputs in the 50-100k range work best; each generation should exceed 4k tokens).
3. Run many prefill servers against a single decode server to put more pressure on the decode side.
The decode server is then likely to crash from CPU memory exhaustion because of the bug described above.
Environment
sglang-0.5.10