
[docs] major SGL Model Gateway documentation update #15715

Merged: slin1237 merged 1 commit into main from smg-docs on Dec 24, 2025



@slin1237 (Collaborator)

  • Rename docs/advanced_features/router.md to sgl_model_gateway.md

  • Add comprehensive documentation for new features:

    • Tokenization endpoints (/v1/tokenize, /v1/detokenize)
    • Tokenizer management APIs (/v1/tokenizers)
    • Parser endpoints (/parse/reasoning, /parse/function_call)
    • gRPC embedding support
    • Expanded metrics documentation (40+ Prometheus metrics)
    • OpenTelemetry tracing integration
  • Update sgl-model-gateway/README.md with:

    • New feature highlights for tokenization and parsing APIs
    • Detailed tokenization endpoint examples
    • Parser endpoint documentation with supported parsers
    • Expanded observability section with metric categories
    • OpenTelemetry configuration details
  • Update docs/index.rst to reference new filename

  • Add TLS (HTTPS) documentation for gateway server

    • --tls-cert-path and --tls-key-path configuration
    • rustls with ring crypto provider details
  • Add mTLS documentation for worker communication

    • --client-cert-path, --client-key-path, --ca-cert-path flags
    • Multiple CA certificate support
    • TCP keepalive configuration
  • Update both README.md and sgl_model_gateway.md with:

    • Full TLS configuration examples
    • Parameter reference tables
    • Security configuration guidance
  • Add TLS Configuration section to Configuration Reference

  • Update Table of Contents with TLS subsections

Add new Production Recommendations section covering:

  • Security: TLS checklist and production security best practices

  • High Availability:

    • Multi-replica architecture diagram
    • Trade-offs table (radix tree, circuit breaker, rate limiting)
    • Cache hit rate reduction (10-20%) with multiple replicas
    • Horizontal vs vertical scaling guidance
    • Session affinity recommendations
  • Performance:

    • gRPC mode recommendation for high throughput
    • Performance tuning table with parameter recommendations
    • Benefits of native Rust tokenization
  • Kubernetes Deployment:

    • Pod labeling examples for service discovery
    • Regular and PD (prefill-decode disaggregation) mode worker deployments
    • RBAC configuration for pod watching
    • PD mode with bootstrap port annotations
  • Monitoring with PromQL:

    • Request rate and latency queries
    • Worker health monitoring
    • Circuit breaker status tracking
    • Inference performance metrics (TTFT, TPOT)
    • Rate limiting and queuing metrics
    • MCP tool execution monitoring
    • Example Prometheus alerting rules

Add comprehensive documentation for writing custom WASM middleware modules
including authentication, rate limiting, and request logging examples.
Documents the WIT interface, deployment process, and runtime configuration.

Checklist


@github-actions bot added the documentation and model-gateway labels Dec 24, 2025
@slin1237 force-pushed the smg-docs branch 3 times, most recently from bbf0d35 to da3e991 (December 24, 2025 04:10)

Add comprehensive documentation for the Python and Go bindings:

Python Bindings:
- Installation (development/production/PyPI)
- Basic usage with Router and RouterArgs
- CLI commands (smg launch, smg server)
- Full RouterArgs configuration reference
- PD disaggregation and K8s service discovery examples

Go Bindings:
- Two-layer architecture explanation (Go API + Rust FFI)
- Installation and build requirements
- Non-streaming and streaming usage examples
- ClientConfig and ChatCompletionRequest options
- OpenAI-compatible server example
- Testing instructions
@slin1237 merged commit 9665574 into main Dec 24, 2025
55 of 61 checks passed
@slin1237 deleted the smg-docs branch December 24, 2025 04:26
jiaming1130 pushed a commit to zhuyijie88/sglang that referenced this pull request Dec 25, 2025
GuoYechang pushed a commit to GuoYechang/sglang that referenced this pull request Jan 13, 2026

Labels: documentation, model-gateway, run-ci

1 participant
