[Bug] [Hisparse] Hisparse CPU Memory Allocation Checker #23704

@xz-keg

Checklist

  • I searched related issues but found no solution.
  • The bug persists in the latest version.
  • Issues without environment info and a minimal reproducible demo are hard to resolve and may receive no feedback.
  • If this is not a bug report but a general question, please start a discussion at https://github.com/sgl-project/sglang/discussions. Otherwise, it will be closed.
  • Please use English. Otherwise, it will be closed.

Describe the bug

I'd like to report a severe CPU memory allocation bug in Hisparse (DSA CPU-offload decoding), a feature in v0.5.10.

With Hisparse, the KV cache of every newly generated token is backed up to CPU, so CPU memory usage grows as decoding proceeds. Non-Hisparse methods behave differently: they do not back up newly generated KV cache to CPU unless a sequence is retracted.

However, the memory pool system still uses the non-Hisparse admission check, which accepts a request if input_length + a small number < CPU memory.

So when the CPU memory pool is nearly full and the generation length is long, the CPU pool can overflow as the decode length grows.
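To make the gap concrete, here is a minimal sketch of the two checks. It is not sglang's actual code: `HostPool`, `admit_input_only`, `admit_with_decode_reserve`, the `margin` value, and counting memory in token slots are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class HostPool:
    """Toy stand-in for the CPU-side KV pool (hypothetical, not an sglang class)."""
    capacity_tokens: int
    used_tokens: int = 0

    @property
    def free_tokens(self) -> int:
        return self.capacity_tokens - self.used_tokens


def admit_input_only(pool: HostPool, input_len: int, margin: int = 16) -> bool:
    """Check as described in this report: only the prompt plus a small margin is counted."""
    return input_len + margin < pool.free_tokens


def admit_with_decode_reserve(pool: HostPool, input_len: int, max_new_tokens: int) -> bool:
    """One possible stricter check: also reserve a host slot for every token that may be decoded."""
    return input_len + max_new_tokens <= pool.free_tokens


pool = HostPool(capacity_tokens=100_000, used_tokens=40_000)  # 60k slots free

# A 58k-token prompt passes the input-only check...
loose = admit_input_only(pool, input_len=58_000)
# ...but 58k prompt + 4096 decoded tokens need more than the 60k free slots.
strict = admit_with_decode_reserve(pool, input_len=58_000, max_new_tokens=4_096)
print(loose, strict)
```

Reserving the full `max_new_tokens` up front is conservative and may lower throughput; it is shown only as one possible direction, not as the intended fix.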

Reproduction

Although the overflow can occur under any setting, it is rare. To make it more likely to reproduce:

1: Use a DSA model (Deepseek-V3.2-Speciale/GLM-5.1) with PD disaggregation, and set a small host_to_device_ratio in hisparse-config, for example --hisparse-config "{'top_k':2048,'device_buffer_size':4096,'host_to_device_ratio':1}"

2: Prepare long, hard prompts that make the model think for many tokens (inputs in the 50-100k range work best, and each generation should exceed 4k tokens).

3: Run many prefill servers and only one decode server to put more pressure on the decode side.

The decode server is then likely to crash from running out of CPU memory because of the bug described above.
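A rough way to see why long generations trigger the crash is to count host slots per decode step. The sketch below is a toy model, not a measurement: `steps_until_host_overflow` is a hypothetical helper, and it assumes each running request backs up exactly one token to CPU per decode step and that nothing is freed in the meantime.

```python
from typing import List, Optional


def steps_until_host_overflow(
    free_tokens: int, input_lens: List[int], max_new_tokens: int
) -> Optional[int]:
    """Return the first decode step at which host usage exceeds free_tokens, or None."""
    used = sum(input_lens)
    if used > free_tokens:
        return 0  # the prompts alone do not fit
    for step in range(1, max_new_tokens + 1):
        # Each running request backs up one newly generated token per step.
        used += len(input_lens)
        if used > free_tokens:
            return step
    return None


# Three 65k-token prompts admitted into 200k free host slots (195k used at admission).
overflow_step = steps_until_host_overflow(
    free_tokens=200_000, input_lens=[65_000, 65_000, 65_000], max_new_tokens=4_096
)
print(overflow_step)  # 1667
```

In this toy setup all three requests pass an input-only check, yet the pool overflows well before the 4k-token generation target, which matches the pattern in the reproduction steps above.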

Environment

sglang-0.5.10
