[router][grpc] disable health check generation and increase timeout#11353

Merged
slin1237 merged 1 commit into main from grpc-fix on Oct 9, 2025

Conversation

@slin1237 (Collaborator) commented on Oct 9, 2025

Checklist

@gemini-code-assist (Contributor)

Summary of Changes

Hello @slin1237, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This PR simplifies the gRPC health check on the server side by removing the generation-based validation, making it a simple 'server is alive' check. Concurrently, it dramatically increases the gRPC client's channel timeout from 30 seconds to 3600 seconds, likely to accommodate longer-running operations without premature disconnections.

Highlights

  • Intent: This pull request aims to modify the gRPC health check mechanism and increase the gRPC client timeout. Specifically, it disables the active generation-based health check in the server and extends the client-side gRPC channel timeout.
  • Changes: The changes involve two main areas:
  1. gRPC Server Health Check (python/sglang/srt/entrypoints/grpc_server.py): The HealthCheck function has been simplified. Previously, it performed a full generation request to validate the server's health, including submitting a TokenizedGenerateReqInput and waiting for a response. This logic has been removed, and the function now immediately returns a healthy=True response, effectively making it a passive 'server is up' check rather than an active 'server can generate' check (see the sketch after this list).
  2. gRPC Client Timeout (sgl-router/src/grpc_client/sglang_scheduler.rs): The gRPC channel timeout in the SglangSchedulerClient has been significantly increased from 30 seconds to 3600 seconds (1 hour). This change allows gRPC calls to have a much longer duration before timing out.
  • Reviewer Activity: No reviewer activity has been recorded for this pull request yet.
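
For reference, the simplified server-side handler reduces to roughly the following. This is a minimal sketch based on the description above, not the exact diff: the servicer class name and the import path for the generated protobuf module are assumptions.

import grpc

# Assumed import path for the generated protobuf module.
from sglang.srt.grpc import sglang_scheduler_pb2


class SglangSchedulerServicer:  # hypothetical class name, for illustration only
    async def HealthCheck(
        self,
        request: sglang_scheduler_pb2.HealthCheckRequest,
        context: grpc.aio.ServicerContext,
    ) -> sglang_scheduler_pb2.HealthCheckResponse:
        # The previous generation-based probe (submitting a TokenizedGenerateReqInput
        # and waiting for output) is gone; the RPC now only reports liveness.
        return sglang_scheduler_pb2.HealthCheckResponse(
            healthy=True, message="Health check passed"
        )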

slin1237 merged commit 368fd20 into main on Oct 9, 2025 (15 of 51 checks passed).
slin1237 deleted the grpc-fix branch on October 9, 2025 at 02:23.
@gemini-code-assist (bot) left a comment


Code Review

This pull request simplifies the gRPC health check to a liveness probe and increases the client-side timeout. The changes are logical, but there are a few areas for improvement regarding robustness and configurability. I've left some suggestions to address these points.

Comment on lines +317 to +319 of python/sglang/srt/entrypoints/grpc_server.py:
return sglang_scheduler_pb2.HealthCheckResponse(
healthy=True, message="Health check passed"
)

Severity: high

The simplified health check always returns healthy=True. This can be problematic during a graceful shutdown, as a load balancer or service orchestrator might continue to send traffic to a server that is about to terminate. The previous implementation correctly handled this by checking self.request_manager.gracefully_exit. This check should be restored to ensure the server reports as unhealthy when it's in the process of shutting down.

        if self.request_manager.gracefully_exit:
            return sglang_scheduler_pb2.HealthCheckResponse(
                healthy=False, message="Server is shutting down"
            )
        return sglang_scheduler_pb2.HealthCheckResponse(
            healthy=True, message="Health check passed"
        )
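
For context, here is a minimal sketch of how a caller (e.g., a load balancer's health prober) could consume this flag. The generated stub module and class names, the import path, and the endpoint/port are assumptions, not verified APIs.

import asyncio

import grpc

# Import path and stub class name are inferred from the proto file name;
# adjust to the actual generated modules.
from sglang.srt.grpc import sglang_scheduler_pb2, sglang_scheduler_pb2_grpc


async def is_scheduler_healthy(endpoint: str = "localhost:30000") -> bool:
    async with grpc.aio.insecure_channel(endpoint) as channel:
        stub = sglang_scheduler_pb2_grpc.SglangSchedulerStub(channel)
        resp = await stub.HealthCheck(sglang_scheduler_pb2.HealthCheckRequest())
        # With the suggestion above applied, healthy is False while the server
        # is draining, so the prober can stop routing new traffic to it.
        return resp.healthy


if __name__ == "__main__":
    print(asyncio.run(is_scheduler_healthy()))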

Comment on lines 313 to 314 of python/sglang/srt/entrypoints/grpc_server.py:
request: sglang_scheduler_pb2.HealthCheckRequest,
context: grpc.aio.ServicerContext,

Severity: medium

The request and context arguments are no longer used in the HealthCheck method. According to PEP 8, it's good practice to prefix unused arguments with an underscore to make this explicit.

Suggested change
- request: sglang_scheduler_pb2.HealthCheckRequest,
- context: grpc.aio.ServicerContext,
+ _request: sglang_scheduler_pb2.HealthCheckRequest,
+ _context: grpc.aio.ServicerContext,


In sgl-router/src/grpc_client/sglang_scheduler.rs:
  let channel = Channel::from_shared(http_endpoint)?
-     .timeout(Duration::from_secs(30))
+     .timeout(Duration::from_secs(3600))

Severity: medium

Hardcoding the timeout to 3600 seconds (1 hour) makes the client inflexible. While a long timeout might be necessary for some use cases, it can also mask problems with an unresponsive server.

It would be better to make this value configurable, for example, by reading it from an environment variable. This would allow operators to tune the timeout to their specific needs.

Example:

use std::env;
use std::time::Duration;

// In connect(): read the timeout from an environment variable, falling back
// to the 1-hour default when the variable is unset or fails to parse.
const DEFAULT_GRPC_TIMEOUT_SECS: u64 = 3600;
let timeout_secs: u64 = env::var("SGLANG_GRPC_CLIENT_TIMEOUT")
    .ok()
    .and_then(|s| s.parse().ok())
    .unwrap_or(DEFAULT_GRPC_TIMEOUT_SECS);

let channel = Channel::from_shared(http_endpoint)?
    .timeout(Duration::from_secs(timeout_secs))
    // ...
