[model-gateway] reduce cpu overhead by slin1237 · Pull Request #15316 · sgl-project/sglang

slin1237 · 2025-12-17T08:18:10Z

Checklist

Format your code according to the Format code with pre-commit.
Add unit tests according to the Run and add unit tests.
Update documentation according to Write documentations.
Provide accuracy and speed benchmark results according to Test the accuracy and Benchmark the speed.
Follow the SGLang code style guidance.
Work with maintainers to merge your PR. See the PR Merge Process

gemini-code-assist · 2025-12-17T08:18:34Z

Summary of Changes

Hello @slin1237, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request focuses on enhancing the performance and reducing the CPU overhead of the sgl-model-gateway. It achieves this by refactoring the circuit breaker to use lock-free atomic operations for its state, which eliminates contention from read-write locks. Additionally, it introduces a comprehensive optimization for metric label handling, ensuring that frequently used string labels are represented by static references rather than being allocated dynamically for every metric event, leading to fewer memory allocations and improved efficiency across the system.

Highlights

Circuit Breaker Optimization: The circuit breaker implementation has been refactored to use lock-free atomic operations (AtomicU8, AtomicU64) instead of RwLock for managing its state and timestamps. This change significantly reduces contention and CPU overhead on hot paths, improving the performance and scalability of the circuit breaker.
Metric Label String Optimization: Introduced new helper functions and constants to handle metric labels more efficiently. Common HTTP methods, status codes, and boolean flags (like 'streaming') are now converted to static string references or Cow<'static, str> to avoid dynamic string allocations for every metric event, leading to reduced memory allocations and improved performance.
Performance Improvements: Applied #[inline] attributes to critical functions within the circuit breaker and metric handling to encourage the Rust compiler to inline these functions, potentially leading to further performance gains by reducing function call overhead.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature	Command	Description
Code Review	`/gemini review`	Performs a code review for the current pull request in its current state.
Pull Request Summary	`/gemini summary`	Provides a summary of the current pull request in its current state.
Comment	@gemini-code-assist	Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help	`/gemini help`	Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩

gemini-code-assist

Code Review

This pull request significantly refactors the circuit breaker implementation to improve performance by replacing Arc<RwLock<...>> with lock-free atomic operations (AtomicU8, AtomicU32, AtomicU64) for state management and time tracking. This change affects how the circuit breaker's state, last failure time, and last state change time are stored and accessed, making hot paths more efficient through atomic loads, stores, and compare-and-swap operations. Additionally, the pull request introduces extensive optimizations for metric recording across the codebase. This includes new utility functions like bool_to_static_str, status_code_to_static_str, method_to_static_str, and status_code_to_cow to convert frequently used string labels (e.g., HTTP methods, status codes, boolean flags) into static string references or Cow types, thereby reducing heap allocations during metric collection. These metric-related changes are applied to HTTP request/response recording, router requests, upstream responses, and worker retry backoff metrics, ensuring that common label values avoid dynamic string allocations.

The get_model! macro was ineffective as both branches performed clone(). Simplify to straightforward clones where needed, with a move on final use. This addresses code review feedback about confusing dead code. Optimize HTTP metrics with static strings Reduce allocations in hot paths by using static strings: - Add method_to_static_str() for HTTP methods (GET, POST, etc.) - Update record_http_request/record_http_duration to use &'static str - Optimize record_router_upstream_response with status_code_to_cow - Optimize record_worker_retry_backoff for common attempt values (1-5) - Update HttpMetricsMiddleware to use static method strings This eliminates string allocations on every HTTP request for method and status_class labels. Replace CircuitBreaker RwLock with lock-free atomics Convert CircuitBreaker to use atomic operations instead of RwLock for significantly reduced contention under high concurrency: - Replace Arc<RwLock<CircuitState>> with AtomicU8 - Replace Arc<RwLock<Instant>> with AtomicU64 (ms timestamps) - Use compare-and-swap for state transitions - Hot path (can_execute/state) is now fully lock-free Also update CPU_OVERHEAD_ANALYSIS.md with high-concurrency analysis documenting lock contention issues in metrics, TokenBucket, and WorkerRegistry. Estimated improvement: 20-30ms latency reduction under high concurrency. Reduce CPU overhead in metrics system Key optimizations: - Add bool_to_static_str() for zero-cost bool-to-string conversion - Add status_code_to_static_str() lookup table for common HTTP codes - Change record_router_request() streaming param from bool to &'static str - Optimize record_streaming_metrics() to minimize string clones - Update all router call sites to use static strings This eliminates allocations for the streaming label (was bool.to_string()) and reduces allocations in record_streaming_metrics from 5 clones to max 3. Common HTTP status codes (200, 400, 500, etc.) now use static strings. Add CPU overhead analysis for gateway 0.2.3 vs 0.2.4 Detailed analysis comparing metrics, middleware, and other changes between gateway versions. Key findings: - String allocations in metrics (HIGH impact): Every metric call with model_id allocates via .to_string() - High label cardinality (MEDIUM): 5-6 labels per router metric - New HTTP metrics layer (MEDIUM): Additional per-request processing - OTEL tracing (MEDIUM when enabled): Trace context propagation - WASM plugin support (LOW): Only when plugins loaded Includes recommendations for addressing overhead issues.

slin1237 requested review from CatherineSue and key4ng as code owners December 17, 2025 08:18

github-actions bot added the model-gateway label Dec 17, 2025

slin1237 added the run-ci label Dec 17, 2025

gemini-code-assist bot reviewed Dec 17, 2025

View reviewed changes

slin1237 force-pushed the metric-n/13 branch 2 times, most recently from f7d702d to 7cf5a5b Compare December 17, 2025 17:37

slin1237 force-pushed the metric-n/13 branch from 7cf5a5b to 2fb2d3a Compare December 17, 2025 17:40

slin1237 merged commit d747147 into main Dec 17, 2025
50 of 57 checks passed

slin1237 deleted the metric-n/13 branch December 17, 2025 17:42

Prozac614 pushed a commit to Prozac614/sglang that referenced this pull request Dec 23, 2025

[model-gateway] reduce cpu overhead (sgl-project#15316)

76e2a6a

jiaming1130 pushed a commit to zhuyijie88/sglang that referenced this pull request Dec 25, 2025

[model-gateway] reduce cpu overhead (sgl-project#15316)

68db651

GuoYechang pushed a commit to GuoYechang/sglang that referenced this pull request Jan 13, 2026

[model-gateway] reduce cpu overhead (sgl-project#15316)

ab7cf5d

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[model-gateway] reduce cpu overhead#15316

[model-gateway] reduce cpu overhead#15316
slin1237 merged 1 commit intomainfrom
metric-n/13

slin1237 commented Dec 17, 2025

Uh oh!

gemini-code-assist bot commented Dec 17, 2025

Uh oh!

gemini-code-assist bot left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Comments

Conversation

slin1237 commented Dec 17, 2025

Checklist

Uh oh!

gemini-code-assist bot commented Dec 17, 2025

Summary of Changes

Highlights

Footnotes

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Comments