[router][grpc] disable health check generation and increase timeout#11353

Merged
slin1237 merged 1 commit into main from grpc-fix on Oct 9, 2025

Conversation

@slin1237 (Collaborator) commented on Oct 9, 2025

Checklist

@gemini-code-assist (Contributor)

Summary of Changes

Hello @slin1237, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This PR simplifies the gRPC health check on the server side by removing the generation-based validation, making it a simple 'server is alive' check. Concurrently, it dramatically increases the gRPC client's channel timeout from 30 seconds to 3600 seconds, likely to accommodate longer-running operations without premature disconnections.

Highlights

  • Intent: This pull request aims to modify the gRPC health check mechanism and increase the gRPC client timeout. Specifically, it disables the active generation-based health check in the server and extends the client-side gRPC channel timeout.
  • Changes: The changes involve two main areas:
  1. gRPC Server Health Check (python/sglang/srt/entrypoints/grpc_server.py): The HealthCheck function has been simplified. Previously, it performed a full generation request to validate the server's health, including submitting a TokenizedGenerateReqInput and waiting for a response. This logic has been removed, and the function now immediately returns a healthy=True response, effectively making it a passive 'server is up' check rather than an active 'server can generate' check (see the sketch after this list).
  2. gRPC Client Timeout (sgl-router/src/grpc_client/sglang_scheduler.rs): The gRPC channel timeout in the SglangSchedulerClient has been significantly increased from 30 seconds to 3600 seconds (1 hour). This change allows gRPC calls to have a much longer duration before timing out.
  • Reviewer Activity: No reviewer activity has been recorded for this pull request yet.
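
For reference, the simplified server-side handler reduces to roughly the following. This is a minimal sketch based on the description above, not the exact diff: the servicer class name and the import path for the generated protobuf module are assumptions.

import grpc

# Assumed import path for the generated protobuf module.
from sglang.srt.grpc import sglang_scheduler_pb2


class SglangSchedulerServicer:  # hypothetical class name, for illustration only
    async def HealthCheck(
        self,
        request: sglang_scheduler_pb2.HealthCheckRequest,
        context: grpc.aio.ServicerContext,
    ) -> sglang_scheduler_pb2.HealthCheckResponse:
        # The previous generation-based probe (submitting a TokenizedGenerateReqInput
        # and waiting for output) is gone; the RPC now only reports liveness.
        return sglang_scheduler_pb2.HealthCheckResponse(
            healthy=True, message="Health check passed"
        )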

slin1237 merged commit 368fd20 into main on Oct 9, 2025 (15 of 51 checks passed).
slin1237 deleted the grpc-fix branch on October 9, 2025 at 02:23.
@gemini-code-assist (bot) left a comment


Code Review

This pull request simplifies the gRPC health check to a liveness probe and increases the client-side timeout. The changes are logical, but there are a few areas for improvement regarding robustness and configurability. I've left some suggestions to address these points.

Comment on lines +317 to +319 of python/sglang/srt/entrypoints/grpc_server.py:
return sglang_scheduler_pb2.HealthCheckResponse(
healthy=True, message="Health check passed"
)

Severity: high

The simplified health check always returns healthy=True. This can be problematic during a graceful shutdown, as a load balancer or service orchestrator might continue to send traffic to a server that is about to terminate. The previous implementation correctly handled this by checking self.request_manager.gracefully_exit. This check should be restored to ensure the server reports as unhealthy when it's in the process of shutting down.

        if self.request_manager.gracefully_exit:
            return sglang_scheduler_pb2.HealthCheckResponse(
                healthy=False, message="Server is shutting down"
            )
        return sglang_scheduler_pb2.HealthCheckResponse(
            healthy=True, message="Health check passed"
        )
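
For context, here is a minimal sketch of how a caller (e.g., a load balancer's health prober) could consume this flag. The generated stub module and class names, the import path, and the endpoint/port are assumptions, not verified APIs.

import asyncio

import grpc

# Import path and stub class name are inferred from the proto file name;
# adjust to the actual generated modules.
from sglang.srt.grpc import sglang_scheduler_pb2, sglang_scheduler_pb2_grpc


async def is_scheduler_healthy(endpoint: str = "localhost:30000") -> bool:
    async with grpc.aio.insecure_channel(endpoint) as channel:
        stub = sglang_scheduler_pb2_grpc.SglangSchedulerStub(channel)
        resp = await stub.HealthCheck(sglang_scheduler_pb2.HealthCheckRequest())
        # With the suggestion above applied, healthy is False while the server
        # is draining, so the prober can stop routing new traffic to it.
        return resp.healthy


if __name__ == "__main__":
    print(asyncio.run(is_scheduler_healthy()))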

Comment on lines 313 to 314 of python/sglang/srt/entrypoints/grpc_server.py:
request: sglang_scheduler_pb2.HealthCheckRequest,
context: grpc.aio.ServicerContext,

Severity: medium

The request and context arguments are no longer used in the HealthCheck method. According to PEP 8, it's good practice to prefix unused arguments with an underscore to make this explicit.

Suggested change
- request: sglang_scheduler_pb2.HealthCheckRequest,
- context: grpc.aio.ServicerContext,
+ _request: sglang_scheduler_pb2.HealthCheckRequest,
+ _context: grpc.aio.ServicerContext,


In sgl-router/src/grpc_client/sglang_scheduler.rs:
  let channel = Channel::from_shared(http_endpoint)?
-     .timeout(Duration::from_secs(30))
+     .timeout(Duration::from_secs(3600))

Severity: medium

Hardcoding the timeout to 3600 seconds (1 hour) makes the client inflexible. While a long timeout might be necessary for some use cases, it can also mask problems with an unresponsive server.

It would be better to make this value configurable, for example, by reading it from an environment variable. This would allow operators to tune the timeout to their specific needs.

Example:

use std::env;
use std::time::Duration;

// In connect(): read the timeout from an environment variable, falling back
// to the 1-hour default when the variable is unset or fails to parse.
const DEFAULT_GRPC_TIMEOUT_SECS: u64 = 3600;
let timeout_secs: u64 = env::var("SGLANG_GRPC_CLIENT_TIMEOUT")
    .ok()
    .and_then(|s| s.parse().ok())
    .unwrap_or(DEFAULT_GRPC_TIMEOUT_SECS);

let channel = Channel::from_shared(http_endpoint)?
    .timeout(Duration::from_secs(timeout_secs))
    // ...
