Skip to content

[model-gateway] reduce cpu overhead#15316

Merged
slin1237 merged 1 commit intomainfrom
metric-n/13
Dec 17, 2025
Merged

[model-gateway] reduce cpu overhead#15316
slin1237 merged 1 commit intomainfrom
metric-n/13

Conversation

@slin1237
Copy link
Collaborator

Checklist

@gemini-code-assist
Copy link
Contributor

Summary of Changes

Hello @slin1237, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request focuses on enhancing the performance and reducing the CPU overhead of the sgl-model-gateway. It achieves this by refactoring the circuit breaker to use lock-free atomic operations for its state, which eliminates contention from read-write locks. Additionally, it introduces a comprehensive optimization for metric label handling, ensuring that frequently used string labels are represented by static references rather than being allocated dynamically for every metric event, leading to fewer memory allocations and improved efficiency across the system.

Highlights

  • Circuit Breaker Optimization: The circuit breaker implementation has been refactored to use lock-free atomic operations (AtomicU8, AtomicU64) instead of RwLock for managing its state and timestamps. This change significantly reduces contention and CPU overhead on hot paths, improving the performance and scalability of the circuit breaker.
  • Metric Label String Optimization: Introduced new helper functions and constants to handle metric labels more efficiently. Common HTTP methods, status codes, and boolean flags (like 'streaming') are now converted to static string references or Cow<'static, str> to avoid dynamic string allocations for every metric event, leading to reduced memory allocations and improved performance.
  • Performance Improvements: Applied #[inline] attributes to critical functions within the circuit breaker and metric handling to encourage the Rust compiler to inline these functions, potentially leading to further performance gains by reducing function call overhead.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request significantly refactors the circuit breaker implementation to improve performance by replacing Arc<RwLock<...>> with lock-free atomic operations (AtomicU8, AtomicU32, AtomicU64) for state management and time tracking. This change affects how the circuit breaker's state, last failure time, and last state change time are stored and accessed, making hot paths more efficient through atomic loads, stores, and compare-and-swap operations. Additionally, the pull request introduces extensive optimizations for metric recording across the codebase. This includes new utility functions like bool_to_static_str, status_code_to_static_str, method_to_static_str, and status_code_to_cow to convert frequently used string labels (e.g., HTTP methods, status codes, boolean flags) into static string references or Cow types, thereby reducing heap allocations during metric collection. These metric-related changes are applied to HTTP request/response recording, router requests, upstream responses, and worker retry backoff metrics, ensuring that common label values avoid dynamic string allocations.

@slin1237 slin1237 force-pushed the metric-n/13 branch 2 times, most recently from f7d702d to 7cf5a5b Compare December 17, 2025 17:37
The get_model! macro was ineffective as both branches performed clone().
Simplify to straightforward clones where needed, with a move on final use.

This addresses code review feedback about confusing dead code.

Optimize HTTP metrics with static strings

Reduce allocations in hot paths by using static strings:

- Add method_to_static_str() for HTTP methods (GET, POST, etc.)
- Update record_http_request/record_http_duration to use &'static str
- Optimize record_router_upstream_response with status_code_to_cow
- Optimize record_worker_retry_backoff for common attempt values (1-5)
- Update HttpMetricsMiddleware to use static method strings

This eliminates string allocations on every HTTP request for method
and status_class labels.

Replace CircuitBreaker RwLock with lock-free atomics

Convert CircuitBreaker to use atomic operations instead of RwLock for
significantly reduced contention under high concurrency:

- Replace Arc<RwLock<CircuitState>> with AtomicU8
- Replace Arc<RwLock<Instant>> with AtomicU64 (ms timestamps)
- Use compare-and-swap for state transitions
- Hot path (can_execute/state) is now fully lock-free

Also update CPU_OVERHEAD_ANALYSIS.md with high-concurrency analysis
documenting lock contention issues in metrics, TokenBucket, and
WorkerRegistry.

Estimated improvement: 20-30ms latency reduction under high concurrency.

Reduce CPU overhead in metrics system

Key optimizations:
- Add bool_to_static_str() for zero-cost bool-to-string conversion
- Add status_code_to_static_str() lookup table for common HTTP codes
- Change record_router_request() streaming param from bool to &'static str
- Optimize record_streaming_metrics() to minimize string clones
- Update all router call sites to use static strings

This eliminates allocations for the streaming label (was bool.to_string())
and reduces allocations in record_streaming_metrics from 5 clones to max 3.
Common HTTP status codes (200, 400, 500, etc.) now use static strings.

Add CPU overhead analysis for gateway 0.2.3 vs 0.2.4

Detailed analysis comparing metrics, middleware, and other changes
between gateway versions. Key findings:

- String allocations in metrics (HIGH impact): Every metric call
  with model_id allocates via .to_string()
- High label cardinality (MEDIUM): 5-6 labels per router metric
- New HTTP metrics layer (MEDIUM): Additional per-request processing
- OTEL tracing (MEDIUM when enabled): Trace context propagation
- WASM plugin support (LOW): Only when plugins loaded

Includes recommendations for addressing overhead issues.
@slin1237 slin1237 merged commit d747147 into main Dec 17, 2025
50 of 57 checks passed
@slin1237 slin1237 deleted the metric-n/13 branch December 17, 2025 17:42
Prozac614 pushed a commit to Prozac614/sglang that referenced this pull request Dec 23, 2025
jiaming1130 pushed a commit to zhuyijie88/sglang that referenced this pull request Dec 25, 2025
GuoYechang pushed a commit to GuoYechang/sglang that referenced this pull request Jan 13, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant

Comments