
[grpc] Support gRPC standard health check #11955

Merged: slin1237 merged 3 commits into main from chang/health-check on Oct 22, 2025



@CatherineSue (Collaborator) commented on Oct 22, 2025

Motivation

This PR adds a health check servicer that implements gRPC's standard Health Checking Protocol (grpc.health.v1.Health).

This enables native Kubernetes gRPC health probes without requiring custom scripts or tools.

Reference: https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/#define-a-grpc-liveness-probe

Health Check Services

The gRPC server exposes two health check services:

  1. Overall Server Health (service: "")

    • Checks if the gRPC server process is running
    • Fast, lightweight check
    • Recommended for liveness probes
  2. SGLang Service Health (service: "sglang.grpc.scheduler.SglangScheduler")

    • Checks if the model is loaded and ready to serve
    • Checks if the scheduler is responsive
    • Recommended for readiness probes
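As a rough illustration of how the two service names above could map to protocol statuses, here is a minimal Python sketch. The ServingStatus values mirror HealthCheckResponse.ServingStatus in the grpc.health.v1 proto; check_status, its model_loaded and scheduler_responsive flags, and the use of LookupError in place of the NOT_FOUND gRPC error are hypothetical stand-ins, not SGLangHealthServicer's actual code.

```python
from enum import IntEnum


class ServingStatus(IntEnum):
    # Numeric values mirror HealthCheckResponse.ServingStatus in grpc.health.v1
    UNKNOWN = 0
    SERVING = 1
    NOT_SERVING = 2
    SERVICE_UNKNOWN = 3  # used by Watch for unregistered services


OVERALL_SERVICE = ""
SGLANG_SERVICE = "sglang.grpc.scheduler.SglangScheduler"


def check_status(service: str, model_loaded: bool, scheduler_responsive: bool) -> ServingStatus:
    """Hypothetical Check logic for the two services described above.

    Raises LookupError for unknown names, standing in for the NOT_FOUND
    gRPC status that the protocol specifies for Check.
    """
    if service == OVERALL_SERVICE:
        # Liveness: being able to answer at all means the process is up.
        return ServingStatus.SERVING
    if service == SGLANG_SERVICE:
        # Readiness: both conditions must hold before accepting traffic.
        if model_loaded and scheduler_responsive:
            return ServingStatus.SERVING
        return ServingStatus.NOT_SERVING
    raise LookupError(f"unknown service: {service!r}")


print(check_status(SGLANG_SERVICE, model_loaded=True, scheduler_responsive=False).name)  # NOT_SERVING
```

In a real servicer, Check would wrap this result in a HealthCheckResponse and, for an unknown service name, abort the RPC with NOT_FOUND as the protocol requires. Keeping the liveness answer independent of model state is what lets the pod stay alive (not restarted) while the model is still loading.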

Modifications

  • Add grpcio-health-checking to pyproject.toml; it is required for HealthServicer.
    • NOTE: a custom exec probe would require a grpcurl dependency and would be slower, so it offers no advantage over the current approach.
  • Add SGLangHealthServicer, which inherits from the standard health_pb2_grpc.HealthServicer.
  • Move _run_scheduler_with_signal_handling and _launch_scheduler_process_only to grpc/scheduler_handler.py, since grpc_server.py was getting too long.
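The standard HealthServicer base class includes a unary Check RPC and a server-streaming Watch RPC. The PR text does not say how SGLangHealthServicer handles Watch, so the sketch below only illustrates the protocol's Watch semantics: send the current status immediately, send a new message only when the status changes, and report SERVICE_UNKNOWN (without ending the stream) while the service is not registered. watch_updates and its polled input are hypothetical.

```python
from enum import IntEnum
from typing import Iterable, Iterator, Optional


class ServingStatus(IntEnum):
    # Numeric values mirror HealthCheckResponse.ServingStatus in grpc.health.v1
    UNKNOWN = 0
    SERVING = 1
    NOT_SERVING = 2
    SERVICE_UNKNOWN = 3


def watch_updates(observed: Iterable[Optional[ServingStatus]]) -> Iterator[ServingStatus]:
    """Turn a stream of polled statuses into Watch messages (hypothetical helper).

    None marks a poll where the service is not registered, which the
    protocol reports as SERVICE_UNKNOWN without terminating the stream.
    """
    last: Optional[ServingStatus] = None
    for status in observed:
        current = ServingStatus.SERVICE_UNKNOWN if status is None else status
        # The first message is always sent; later ones only on a change.
        if current != last:
            yield current
            last = current


updates = watch_updates([ServingStatus.NOT_SERVING, ServingStatus.NOT_SERVING, ServingStatus.SERVING])
print([s.name for s in updates])  # ['NOT_SERVING', 'SERVING']
```

Deduplicating on the server side keeps long-lived Watch streams quiet while the status is stable; clients that only use Check, such as the Kubernetes gRPC probe, are unaffected by this choice.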

Kubernetes Configuration

Basic Example

apiVersion: v1
kind: Pod
metadata:
  name: sglang-server
spec:
  containers:
  - name: sglang
    image: your-sglang-image:latest
    ports:
    - containerPort: 30000
      name: grpc

    # Liveness: Restart if process dies
    livenessProbe:
      grpc:
        port: 30000
        service: ""  # Overall server health
      initialDelaySeconds: 60
      periodSeconds: 10
      timeoutSeconds: 5
      failureThreshold: 3

    # Readiness: Don't send traffic until ready
    readinessProbe:
      grpc:
        port: 30000
        service: "sglang.grpc.scheduler.SglangScheduler"
      initialDelaySeconds: 30
      periodSeconds: 5
      timeoutSeconds: 3
      failureThreshold: 3
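To see what the probe settings in this manifest imply, the sketch below models the kubelet's consecutive-failure counting and gives a back-of-envelope bound on detection time. It assumes a failure begins just after a successful probe, that the final failing probe runs until timeoutSeconds, and it ignores kubelet scheduling jitter; the function names are illustrative only.

```python
from typing import List, Optional


def probe_trip_index(results: List[bool], failure_threshold: int) -> Optional[int]:
    """Index of the probe after which the kubelet acts, or None if it never does.

    A success resets the counter; only consecutive failures count.
    """
    consecutive_failures = 0
    for i, ok in enumerate(results):
        consecutive_failures = 0 if ok else consecutive_failures + 1
        if consecutive_failures >= failure_threshold:
            return i
    return None


def worst_case_detection_seconds(period: int, timeout: int, failure_threshold: int) -> int:
    """Rough upper bound on the time from a failure to the kubelet acting.

    Assumes the failure starts just after a successful probe and that the
    last failing probe runs until its timeout; ignores jitter.
    """
    return failure_threshold * period + timeout


# Values taken from the example manifest above
liveness = worst_case_detection_seconds(period=10, timeout=5, failure_threshold=3)
readiness = worst_case_detection_seconds(period=5, timeout=3, failure_threshold=3)
print(liveness, readiness)  # 35 18
```

With these numbers, a dead process is restarted within roughly 35 seconds, and a pod whose scheduler stops responding is removed from Service endpoints within roughly 18 seconds. Raising failureThreshold or periodSeconds tolerates more transient hiccups at the cost of slower detection.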



@slin1237 (Collaborator) commented: tested manually

@slin1237 merged commit 6ade6a0 into main on Oct 22, 2025 (29 of 58 checks passed).
@slin1237 deleted the chang/health-check branch on October 22, 2025 at 23:59.
@slin1237 mentioned this pull request on Oct 23, 2025.