[router] Add mTLS Support for Router-to-Worker Communication #12019
Summary of Changes
This pull request introduces mutual TLS (mTLS) support for secure communication between the SGLang router and its worker nodes. This enhancement is crucial for production environments, particularly where a service mesh is not used: it allows the router to authenticate itself to workers using client certificates and to verify worker identities against specified Certificate Authorities. The changes add new configuration options in Python and integrate the mTLS setup into the Rust core, specifically within the HTTP client builder.
Code Review
This pull request introduces mTLS support for secure communication between the router and workers, which is a valuable security enhancement. The implementation is well-structured, adding new CLI arguments in Python and handling certificate loading and client configuration in Rust. My review identified a critical issue in the Python argument parsing that prevents CA certificates from being loaded correctly. Additionally, I've pointed out two high-severity error handling issues in the Rust code that could cause the server to panic on startup if certificate files are malformed. Addressing these points will significantly improve the robustness of this new feature.
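The review does not reproduce the argument-parsing issue, so the following is only a minimal sketch of how a multi-value flag such as `--ca-cert-path` can be declared with `argparse` so that every path is collected into a list. The flag names come from the PR description; the `build_parser` helper, the program name, and the defaults are assumptions for illustration and may not match `router_args.py`.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser exposing only the three mTLS flags."""
    parser = argparse.ArgumentParser(prog="launch_router_sketch")
    parser.add_argument(
        "--client-cert-path",
        type=str,
        default=None,
        help="Path to the router's client certificate (PEM)",
    )
    parser.add_argument(
        "--client-key-path",
        type=str,
        default=None,
        help="Path to the private key matching the client certificate (PEM)",
    )
    # nargs="+" gathers one or more space-separated values into a list,
    # so several CA files can follow a single --ca-cert-path flag.
    parser.add_argument(
        "--ca-cert-path",
        type=str,
        nargs="+",
        default=[],
        help="Path(s) to CA certificate(s) used to verify workers (PEM)",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(
        ["--ca-cert-path", "/path/to/ca1.pem", "/path/to/ca2.pem"]
    )
    print(args.ca_cert_path)
```

Without `nargs`, `argparse` would accept a single value for the flag and reject the second path as an unrecognized argument, which is one common way a multi-CA option ends up misparsed.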
Motivation
This PR adds mutual TLS (mTLS) authentication support for secure communication between the SGLang Model Gateway and worker nodes. This enables the router to authenticate itself to workers using client certificates and verify worker identities using custom Certificate Authorities (CAs).
In production deployments, especially in enterprise or multi-tenant environments, secure communication between the gateway and workers is critical. Typically this is handled by a service mesh. This feature is intended for deployments where a service mesh is not an option.
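As a conceptual illustration of the two halves of mTLS described above (the client presents its own certificate and verifies the server against chosen CAs), here is a minimal sketch using Python's standard `ssl` module. It is not the router's implementation, which lives in Rust; the function name and parameters are hypothetical.

```python
import ssl
from typing import Optional, Sequence


def build_client_context(
    client_cert_path: Optional[str] = None,
    client_key_path: Optional[str] = None,
    ca_cert_paths: Sequence[str] = (),
) -> ssl.SSLContext:
    """Build a client-side TLS context; all parameter names are hypothetical."""
    # A certificate without its key (or vice versa) cannot form an identity.
    if (client_cert_path is None) != (client_key_path is None):
        raise ValueError("client certificate and key must be provided together")

    # Server certificate verification and hostname checking are enabled
    # by default for this purpose.
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH)

    # Trust each extra CA in addition to the system trust store.
    for ca_path in ca_cert_paths:
        ctx.load_verify_locations(cafile=ca_path)

    # Present the client certificate so the server can authenticate the caller.
    if client_cert_path is not None:
        ctx.load_cert_chain(certfile=client_cert_path, keyfile=client_key_path)

    return ctx
```

Rejecting a half-specified identity up front gives a clear configuration error instead of a handshake failure later.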
Modification
Configuration (Python)
Added three new CLI arguments in router_args.py:
- --client-cert-path: Path to the router's client certificate for authenticating to workers
- --client-key-path: Path to the router's private key corresponding to the client certificate
- --ca-cert-path: Path(s) to CA certificate(s) for verifying worker TLS certificates (supports multiple CAs)
Core Implementation (Rust)
Configuration (config/types.rs, lib.rs):
- client_identity field to store combined certificate + private key (PEM format)
- ca_certificates field to store one or more CA certificates (PEM format)
HTTP Client (server.rs):
- reqwest::Identity::from_pem()
- add_root_certificate()
Usage Example

    python -m sglang_router.launch_router \
      --worker-urls https://worker1:8000 https://worker2:8000 \
      --client-cert-path /path/to/router-cert.pem \
      --client-key-path /path/to/router-key.pem \
      --ca-cert-path /path/to/ca1.pem /path/to/ca2.pem

Known Limitations
The current implementation creates a single HTTP client for all workers, which works well for
deployments where:
For multi-domain deployments (e.g., different model families with different CAs), the
architecture would need refactoring to support per-worker HTTP clients. See the detailed FIXME
comment in server.rs:801 for the required changes.
Checklist