[docs] major SGL Model Gateway documentation update by slin1237 · Pull Request #15715 · sgl-project/sglang

slin1237 · 2025-12-24T02:51:32Z

Rename docs/advanced_features/router.md to sgl_model_gateway.md
Add comprehensive documentation for new features:
- Tokenization endpoints (/v1/tokenize, /v1/detokenize)
- Tokenizer management APIs (/v1/tokenizers)
- Parser endpoints (/parse/reasoning, /parse/function_call)
- gRPC embedding support
- Expanded metrics documentation (40+ Prometheus metrics)
- OpenTelemetry tracing integration
Update sgl-model-gateway/README.md with:
- New feature highlights for tokenization and parsing APIs
- Detailed tokenization endpoint examples
- Parser endpoint documentation with supported parsers
- Expanded observability section with metric categories
- OpenTelemetry configuration details
Update docs/index.rst to reference new filename
Add TLS (HTTPS) documentation for gateway server
- --tls-cert-path and --tls-key-path configuration
- rustls with ring crypto provider details
Add mTLS documentation for worker communication
- --client-cert-path, --client-key-path, --ca-cert-path flags
- Multiple CA certificate support
- TCP keepalive configuration
Update both README.md and sgl_model_gateway.md with:
- Full TLS configuration examples
- Parameter reference tables
- Security configuration guidance
Add TLS Configuration section to Configuration Reference
Update Table of Contents with TLS subsections
Security: TLS checklist and production security best practices
High Availability:
- Multi-replica architecture diagram
- Trade-offs table (radix tree, circuit breaker, rate limiting)
- Cache hit reduction (10-20%) with multiple replicas
- Horizontal vs vertical scaling guidance
- Session affinity recommendations
Performance:
- gRPC mode recommendation for high throughput
- Performance tuning table with parameter recommendations
- Benefits of native Rust tokenization
Kubernetes Deployment:
- Pod labeling examples for service discovery
- Regular and PD mode worker deployments
- RBAC configuration for pod watching
- PD mode with bootstrap port annotations
Monitoring with PromQL:
- Request rate and latency queries
- Worker health monitoring
- Circuit breaker status tracking
- Inference performance metrics (TTFT, TPOT)
- Rate limiting and queuing metrics
- MCP tool execution monitoring
- Example Prometheus alerting rules
Add comprehensive documentation for writing custom WASM middleware modules
including authentication, rate limiting, and request logging examples.
Documents the WIT interface, deployment process, and runtime configuration.

Checklist

Format your code according to the Format code with pre-commit.
Add unit tests according to the Run and add unit tests.
Update documentation according to Write documentations.
Provide accuracy and speed benchmark results according to Test the accuracy and Benchmark the speed.
Follow the SGLang code style guidance.
Work with maintainers to merge your PR. See the PR Merge Process

gemini-code-assist · 2025-12-24T02:51:35Z

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

- Rename docs/advanced_features/router.md to sgl_model_gateway.md - Add comprehensive documentation for new features: - Tokenization endpoints (/v1/tokenize, /v1/detokenize) - Tokenizer management APIs (/v1/tokenizers) - Parser endpoints (/parse/reasoning, /parse/function_call) - gRPC embedding support - Expanded metrics documentation (40+ Prometheus metrics) - OpenTelemetry tracing integration - Update sgl-model-gateway/README.md with: - New feature highlights for tokenization and parsing APIs - Detailed tokenization endpoint examples - Parser endpoint documentation with supported parsers - Expanded observability section with metric categories - OpenTelemetry configuration details - Update docs/index.rst to reference new filename - Add TLS (HTTPS) documentation for gateway server - --tls-cert-path and --tls-key-path configuration - rustls with ring crypto provider details - Add mTLS documentation for worker communication - --client-cert-path, --client-key-path, --ca-cert-path flags - Multiple CA certificate support - TCP keepalive configuration - Update both README.md and sgl_model_gateway.md with: - Full TLS configuration examples - Parameter reference tables - Security configuration guidance - Add TLS Configuration section to Configuration Reference - Update Table of Contents with TLS subsections Add new Production Recommendations section covering: - Security: TLS checklist and production security best practices - High Availability: - Multi-replica architecture diagram - Trade-offs table (radix tree, circuit breaker, rate limiting) - Cache hit reduction (10-20%) with multiple replicas - Horizontal vs vertical scaling guidance - Session affinity recommendations - Performance: - gRPC mode recommendation for high throughput - Performance tuning table with parameter recommendations - Benefits of native Rust tokenization - Kubernetes Deployment: - Pod labeling examples for service discovery - Regular and PD mode worker deployments - RBAC configuration for pod watching - PD mode with bootstrap port annotations - Monitoring with PromQL: - Request rate and latency queries - Worker health monitoring - Circuit breaker status tracking - Inference performance metrics (TTFT, TPOT) - Rate limiting and queuing metrics - MCP tool execution monitoring - Example Prometheus alerting rules Add comprehensive documentation for writing custom WASM middleware modules including authentication, rate limiting, and request logging examples. Documents the WIT interface, deployment process, and runtime configuration. Add comprehensive documentation for the Python and Go bindings: Python Bindings: - Installation (development/production/PyPI) - Basic usage with Router and RouterArgs - CLI commands (smg launch, smg server) - Full RouterArgs configuration reference - PD disaggregation and K8s service discovery examples Go Bindings: - Two-layer architecture explanation (Go API + Rust FFI) - Installation and build requirements - Non-streaming and streaming usage examples - ClientConfig and ChatCompletionRequest options - OpenAI-compatible server example - Testing instructions

slin1237 requested a review from CatherineSue as a code owner December 24, 2025 02:51

github-actions bot added documentation Improvements or additions to documentation model-gateway labels Dec 24, 2025

slin1237 added the run-ci label Dec 24, 2025

slin1237 force-pushed the smg-docs branch 3 times, most recently from bbf0d35 to da3e991 Compare December 24, 2025 04:10

slin1237 force-pushed the smg-docs branch from da3e991 to a870c7e Compare December 24, 2025 04:23

slin1237 merged commit 9665574 into main Dec 24, 2025
55 of 61 checks passed

slin1237 deleted the smg-docs branch December 24, 2025 04:26

jiaming1130 pushed a commit to zhuyijie88/sglang that referenced this pull request Dec 25, 2025

[docs] major SGL Model Gateway documentation update (sgl-project#15715)

686fc16

GuoYechang pushed a commit to GuoYechang/sglang that referenced this pull request Jan 13, 2026

[docs] major SGL Model Gateway documentation update (sgl-project#15715)

55e2349

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[docs] major SGL Model Gateway documentation update#15715

[docs] major SGL Model Gateway documentation update#15715
slin1237 merged 1 commit intomainfrom
smg-docs

slin1237 commented Dec 24, 2025

Uh oh!

gemini-code-assist bot commented Dec 24, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Comments

Conversation

slin1237 commented Dec 24, 2025

Checklist

Uh oh!

gemini-code-assist bot commented Dec 24, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Comments