Closed
Description
Checklist
- 1. If the issue you raised is not a feature but a question, please raise a discussion at https://github.com/sgl-project/sglang/discussions/new/choose instead; otherwise, it will be closed.
- 2. Please use English, otherwise it will be closed.
Motivation
In production, we observed intermittent throughput drops and tail-latency spikes during bursts of long-context requests. We lacked visibility into whether the regressions were compute-bound or due to KV cache movement between GPU and CPU. To address this, we added four metrics to improve diagnosis and tuning:
- sglang:new_token_ratio (Gauge): Tracks the proportion of newly generated tokens relative to reused tokens, helping distinguish steady decoding vs. frequent context rebuilds or cache misses.
- sglang:eviction_duration_seconds (Histogram): Time spent evicting memory from GPU to CPU, exposing memory-pressure and paging overhead.
- sglang:load_back_duration_seconds (Histogram): Time spent loading memory from CPU back to GPU, highlighting thrash and hot-cache misses.
- sglang:chunked_prefill_loop_count (Histogram): Number of loops in chunked prefill, indicating how fragmented or oversized prefill segments are under load.
With these metrics, we were able to pinpoint periods where eviction and load-back dominated latency (e.g., P95 eviction ~80 ms and P95 load-back ~120 ms during spikes) and periods where prefill loop counts frequently exceeded what we expected, signaling suboptimal chunk sizing. After tuning chunk sizes and eviction thresholds, we saw improved throughput and reduced P99 latency in our internal deployments.
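As a rough illustration of how the two duration histograms could be recorded, here is a minimal stdlib-only sketch. It is not SGLang's implementation: the `DurationHistogram` class, the bucket boundaries, and the `evict_to_cpu` placeholder are all hypothetical, and a real deployment would more likely use a Prometheus client library.

```python
import time
from bisect import bisect_left
from contextlib import contextmanager


class DurationHistogram:
    """Tiny stand-in for a Prometheus-style histogram (illustrative only)."""

    def __init__(self, name, buckets):
        self.name = name
        self.bounds = sorted(buckets)  # finite upper bounds, in seconds
        # One counter per finite bucket, plus a final overflow (+Inf) slot.
        self.counts = [0] * (len(self.bounds) + 1)
        self.total_seconds = 0.0

    def observe(self, seconds):
        # bisect_left returns the first bucket whose upper bound is >= seconds.
        self.counts[bisect_left(self.bounds, seconds)] += 1
        self.total_seconds += seconds

    @contextmanager
    def timed(self):
        # Record the wall-clock duration of the wrapped block, even on error.
        start = time.perf_counter()
        try:
            yield
        finally:
            self.observe(time.perf_counter() - start)


# Assumed bucket boundaries; tune them to the latencies you actually see.
eviction = DurationHistogram(
    "sglang:eviction_duration_seconds",
    [0.005, 0.01, 0.025, 0.05, 0.1, 0.25],
)


def evict_to_cpu():
    """Hypothetical placeholder for the real GPU-to-CPU eviction step."""
    time.sleep(0.001)


with eviction.timed():
    evict_to_cpu()
```

The same wrapper would apply to the load-back path under a second histogram named `sglang:load_back_duration_seconds`.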
Example usages:
- Alert if eviction_duration_seconds or load_back_duration_seconds P95 > 50 ms sustained for N minutes.
- Track when new_token_ratio drops below an expected band (e.g., < 0.3) during bursts, which signals excessive cache rebuilds.
- Watch for chunked_prefill_loop_count > 5 as a heuristic for suboptimal prefill chunking.
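The checks above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the bucket boundaries and counts are made-up sample data, the helper names are hypothetical, and the quantile estimate uses linear interpolation within a bucket, which is similar in spirit to how Prometheus estimates quantiles from histograms. In practice these rules would normally live in the monitoring system rather than in application code.

```python
import math


def quantile_from_buckets(q, bounds, counts):
    """Estimate quantile q from per-bucket counts by linear interpolation.

    bounds: ascending finite upper bounds (seconds).
    counts: len(bounds) + 1 per-bucket counts; the last is the overflow bucket.
    """
    total = sum(counts)
    if total == 0:
        return math.nan
    rank = q * total
    seen = 0
    for i, count in enumerate(counts):
        if count > 0 and seen + count >= rank:
            if i == len(bounds):
                # Overflow bucket: fall back to the highest finite bound.
                return bounds[-1]
            lower = bounds[i - 1] if i > 0 else 0.0
            return lower + (bounds[i] - lower) * (rank - seen) / count
        seen += count
    return bounds[-1]


def sustained_above(samples, threshold, window):
    """True if the most recent `window` samples all exceed `threshold`."""
    return len(samples) >= window and all(s > threshold for s in samples[-window:])


# Made-up sample data: eviction durations bucketed over one scrape interval.
bounds = [0.01, 0.025, 0.05, 0.1, 0.25]
counts = [50, 30, 10, 6, 4, 0]
p95 = quantile_from_buckets(0.95, bounds, counts)

# One P95 estimate per minute (sample values); alert if the last 3 exceed 50 ms.
p95_history = [0.04, 0.06, 0.07, p95]
eviction_alert = sustained_above(p95_history, threshold=0.05, window=3)

# Thresholds taken from the example usages above; inputs are sample values.
ratio_alert = 0.22 < 0.3   # new_token_ratio below the expected band
loops_alert = 7 > 5        # chunked_prefill_loop_count above the heuristic
```

Here the window of 3 stands in for the unspecified "N minutes"; pick it to match your scrape interval and tolerance for flapping.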
Related resources
No response
hzh0425 and Edenzzzz