# SGLang Autonomous Model Gateway Roadmap (#13098)

## Gateway 2.4 and 2.5 Final Releases
- [x] MCP labeling support for built-in tools
- [x] HA support for data synchronization
- [x] WASM support for custom plugins
- [x] Bug fixes for gRPC models
- [x] Additional model support (MiniMax-M2, Kimi-K2-Thinking)
- [ ] Support background mode for the Responses API across all models and the OpenAI (OAI) router

## SGLang Autonomous Model Gateway 3.0

### Multimodality
- [x] Support multimodality and an image processor, preferably by using PyO3 to bind the existing SGLang image processor
- [x] Support both URL and raw data image content
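
The two image content forms above can be told apart before any preprocessing runs. The sketch below assumes OpenAI-style `image_url` strings, where inline data arrives as a `data:<mime>;base64,<payload>` URL; the `ImageSource` enum and `parse_image_source` function are hypothetical names, not the gateway's actual types, and base64 decoding and URL fetching are left out.

```rust
/// Hypothetical representation of an image content part in a chat request.
#[derive(Debug, PartialEq)]
enum ImageSource {
    /// Remote image that must be fetched before preprocessing.
    Url(String),
    /// Inline image passed as a base64 data URL.
    Inline { mime_type: String, base64_data: String },
}

/// Classifies an `image_url` string as a remote URL or an inline base64 data URL.
fn parse_image_source(url: &str) -> Result<ImageSource, String> {
    if let Some(rest) = url.strip_prefix("data:") {
        // Split "<mime>;base64" from the payload at the first comma.
        let (meta, payload) = rest
            .split_once(',')
            .ok_or_else(|| "data URL is missing ','".to_string())?;
        // This sketch only accepts base64-encoded data URLs.
        let mime_type = meta
            .strip_suffix(";base64")
            .ok_or_else(|| "only base64 data URLs are handled here".to_string())?;
        if mime_type.is_empty() || payload.is_empty() {
            return Err("empty MIME type or payload".to_string());
        }
        Ok(ImageSource::Inline {
            mime_type: mime_type.to_string(),
            base64_data: payload.to_string(),
        })
    } else if url.starts_with("http://") || url.starts_with("https://") {
        Ok(ImageSource::Url(url.to_string()))
    } else {
        Err(format!("unsupported image reference: {url}"))
    }
}

fn main() {
    for input in [
        "https://example.com/cat.png",
        "data:image/png;base64,iVBORw0KGgo=",
        "ftp://example.com/cat.png",
    ] {
        println!("{input} -> {:?}", parse_image_source(input));
    }
}
```

Classifying the source once at the gateway lets URL images be fetched and inline images be decoded along separate paths before both reach the same image processor.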

### Semantic Routing
- [ ] Support PII detection and a classify API for classifying the intent and complexity of the input
- [ ] Train a new model for semantic routing
- [ ] Publish training library for customers to train on their own data
- [ ] Publish models to HuggingFace
- [ ] Support automatic routing in multi-router mode, using Candle to run the routing models
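
One way the classifier output could drive automatic routing is an ordered rule table. This is a minimal sketch under stated assumptions: the `Classification` fields, the normalized complexity score in [0, 1], the rule table, and all model names are hypothetical, and the classifier itself (which the roadmap plans to run with Candle) is not shown.

```rust
/// Hypothetical result of the classify API for one request.
struct Classification {
    intent: String,
    /// Complexity score, assumed to be normalized to [0.0, 1.0].
    complexity: f32,
    /// Whether the PII detector flagged the input.
    contains_pii: bool,
}

/// Hypothetical routing rule; rules are evaluated in order.
struct Route {
    /// Intent this rule applies to; `None` matches any intent.
    intent: Option<&'static str>,
    /// Highest complexity score this rule accepts.
    max_complexity: f32,
    model: &'static str,
}

/// PII-bearing input is pinned to `pii_model`; otherwise the first matching
/// rule wins, and `fallback` is used when no rule matches.
fn select_model(
    c: &Classification,
    routes: &[Route],
    pii_model: &'static str,
    fallback: &'static str,
) -> &'static str {
    if c.contains_pii {
        return pii_model;
    }
    routes
        .iter()
        .find(|r| r.intent.map_or(true, |i| i == c.intent) && c.complexity <= r.max_complexity)
        .map(|r| r.model)
        .unwrap_or(fallback)
}

fn main() {
    let routes = [
        Route { intent: Some("code"), max_complexity: 1.0, model: "code-model" },
        Route { intent: None, max_complexity: 0.4, model: "small-model" },
        Route { intent: None, max_complexity: 1.0, model: "large-model" },
    ];
    let request = Classification {
        intent: "chat".to_string(),
        complexity: 0.2,
        contains_pii: false,
    };
    // A low-complexity chat request matches the "small-model" rule.
    println!("{}", select_model(&request, &routes, "private-model", "default-model"));
}
```

Ordered first-match rules keep the routing decision cheap and easy to audit; a learned router could replace the table without changing the classify API contract.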

### SLO-Based Routing
- [ ] Allow the gateway to actively listen to the SGLang server's KV cache events so it can make better routing decisions in gRPC mode
- [ ] Define SLO criteria such as latency, accuracy, cost, and preference, along with a set of APIs (preferably HTTP headers) that clients use to steer the routing decision
- [ ] Allow the SGLang server to start with both gRPC and HTTP servers
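
Header-driven SLO routing could look like the sketch below. Everything here is an assumption for illustration: the `x-slo-target` and `x-slo-max-latency-ms` header names, the per-worker statistics, and the policy of filtering by a latency budget and then optimizing a single target. Header keys are assumed to be lowercased already, as HTTP header names are case-insensitive.

```rust
use std::collections::HashMap;

/// Hypothetical optimization target a client can request.
#[derive(Debug, Clone, Copy, PartialEq)]
enum SloTarget {
    Latency,
    Cost,
    Accuracy,
}

/// Hypothetical per-worker statistics the gateway might track.
struct WorkerStats {
    name: &'static str,
    p50_latency_ms: f64,
    cost_per_1k_tokens: f64,
    /// For example an offline evaluation score in [0, 1].
    accuracy: f64,
}

/// Reads the hypothetical `x-slo-target` header, defaulting to latency.
fn slo_target(headers: &HashMap<String, String>) -> SloTarget {
    match headers.get("x-slo-target").map(|v| v.trim().to_ascii_lowercase()).as_deref() {
        Some("cost") => SloTarget::Cost,
        Some("accuracy") => SloTarget::Accuracy,
        _ => SloTarget::Latency,
    }
}

/// Drops workers over the optional latency budget, then optimizes the target.
fn pick_worker<'a>(
    headers: &HashMap<String, String>,
    workers: &'a [WorkerStats],
) -> Option<&'a WorkerStats> {
    let budget: Option<f64> = headers
        .get("x-slo-max-latency-ms")
        .and_then(|v| v.trim().parse().ok());
    let eligible = workers
        .iter()
        .filter(|w| budget.map_or(true, |b| w.p50_latency_ms <= b));
    match slo_target(headers) {
        SloTarget::Latency => eligible.min_by(|a, b| a.p50_latency_ms.total_cmp(&b.p50_latency_ms)),
        SloTarget::Cost => eligible.min_by(|a, b| a.cost_per_1k_tokens.total_cmp(&b.cost_per_1k_tokens)),
        SloTarget::Accuracy => eligible.max_by(|a, b| a.accuracy.total_cmp(&b.accuracy)),
    }
}

fn main() {
    let workers = [
        WorkerStats { name: "fast", p50_latency_ms: 80.0, cost_per_1k_tokens: 0.5, accuracy: 0.70 },
        WorkerStats { name: "smart", p50_latency_ms: 900.0, cost_per_1k_tokens: 2.0, accuracy: 0.90 },
    ];
    let mut headers = HashMap::new();
    headers.insert("x-slo-target".to_string(), "accuracy".to_string());
    if let Some(w) = pick_worker(&headers, &workers) {
        println!("routing to {}", w.name);
    }
}
```

Treating the latency budget as a hard filter and the target as the objective keeps the header semantics simple; a weighted score across criteria is an alternative if clients need to trade several SLOs against each other.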

### Gateway UI
- [ ] Terminal UI with components such as router metrics, worker metrics, worker metadata, router metadata, and active logs
- [ ] Reactive UI to launch workers on both the local machine and remote hosts, with SSH-based remote launching as a beta feature

### Messages API Support
- [ ] Natively support the Anthropic Messages API in gRPC mode instead of wrapping the Chat Completions API
- [ ] In HTTP mode, routing falls back to wrapping the Chat Completions API
- [ ] Natively support MCP calls and multi-turn conversations in the Anthropic Messages API
- [ ] Add continuous integration tests for the Messages API; the critical model to support is M2
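
The HTTP-mode fallback amounts to translating a Messages request into Chat Completions messages. The sketch below covers only plain-text turns and assumes hypothetical `Turn`, `MessagesRequest`, and `ChatMessage` types; content blocks such as images, `tool_use`, and MCP results, which the native path is meant to handle, are out of scope here. The one structural difference it shows is real: the Messages API carries the system prompt in a top-level `system` field, while Chat Completions expects it as a message with role `system`.

```rust
/// One plain-text turn in a Messages API request (hypothetical type).
struct Turn {
    role: String,
    text: String,
}

/// Minimal, text-only subset of a Messages API request body.
struct MessagesRequest {
    system: Option<String>,
    messages: Vec<Turn>,
}

/// Chat Completions-style message (hypothetical type).
#[derive(Debug, PartialEq)]
struct ChatMessage {
    role: String,
    content: String,
}

/// Moves the top-level `system` prompt into a leading "system" message and
/// copies the user/assistant turns in order.
fn to_chat_messages(req: &MessagesRequest) -> Result<Vec<ChatMessage>, String> {
    let mut out = Vec::with_capacity(req.messages.len() + 1);
    if let Some(system) = &req.system {
        out.push(ChatMessage { role: "system".to_string(), content: system.clone() });
    }
    for turn in &req.messages {
        match turn.role.as_str() {
            // Only user and assistant turns are valid inside `messages`.
            "user" | "assistant" => out.push(ChatMessage {
                role: turn.role.clone(),
                content: turn.text.clone(),
            }),
            other => return Err(format!("unexpected role in messages: {other}")),
        }
    }
    Ok(out)
}

fn main() {
    let req = MessagesRequest {
        system: Some("Answer briefly.".to_string()),
        messages: vec![Turn { role: "user".to_string(), text: "What is SGLang?".to_string() }],
    };
    for m in to_chat_messages(&req).unwrap() {
        println!("{}: {}", m.role, m.content);
    }
}
```

The response path needs the reverse mapping as well, which this sketch does not show.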

### Build and Language Support Improvement
- [x] Binding to Go
- [ ] Binding to Node.js
- [x] Better organization for bindings across all three languages
- [x] Restructure project as Cargo workspace to streamline multi-crate development and dependency management
- [x] Publish Rust crate during CI
- [x] Optimize the build and its configuration to use ccache effectively
- [x] Update Docker build for multi-architecture support

### gRPC Multi-Model Gateway Support
- [x] Introduce model card data structure to worker, which includes metadata such as tokenizer, chat template, reasoning parser, tool parser, DP size, TP size, etc.
- [x] Add gRPC endpoint to fetch tokenizer, chat template, and remote Python code for multimodality support
- [x] Add a tokenizer registry that maps each model family to its tokenizer
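
A model card and a family-keyed tokenizer registry could fit together as sketched below. The `ModelCard` here is a trimmed, hypothetical subset of the fields listed above, the two tokenizer types are placeholders that only describe themselves, and matching a family as the longest lowercase prefix of the model name (after the last `/`) is an assumed lookup rule, not the gateway's documented behavior.

```rust
use std::collections::HashMap;

/// Hypothetical, trimmed-down model card; a fuller one would also carry the
/// chat template, reasoning/tool parsers, and TP/DP sizes.
struct ModelCard {
    model_id: String,
    tokenizer_path: String,
}

/// Placeholder tokenizer interface; a real one would expose encode/decode.
trait Tokenizer {
    fn describe(&self) -> String;
}

/// Placeholder tokenizer types standing in for real implementations.
struct HfJsonTokenizer { path: String }
struct TiktokenTokenizer { path: String }

impl Tokenizer for HfJsonTokenizer {
    fn describe(&self) -> String { format!("tokenizer.json from {}", self.path) }
}
impl Tokenizer for TiktokenTokenizer {
    fn describe(&self) -> String { format!("tiktoken file from {}", self.path) }
}

fn hf_json_factory(card: &ModelCard) -> Box<dyn Tokenizer> {
    Box::new(HfJsonTokenizer { path: card.tokenizer_path.clone() })
}
fn tiktoken_factory(card: &ModelCard) -> Box<dyn Tokenizer> {
    Box::new(TiktokenTokenizer { path: card.tokenizer_path.clone() })
}

/// Constructor that builds a tokenizer from a model card.
type Factory = fn(&ModelCard) -> Box<dyn Tokenizer>;

#[derive(Default)]
struct TokenizerRegistry {
    factories: HashMap<String, Factory>,
}

impl TokenizerRegistry {
    fn register(&mut self, family: &str, factory: Factory) {
        self.factories.insert(family.to_ascii_lowercase(), factory);
    }

    /// Picks the longest registered family that prefixes the model name.
    fn create(&self, card: &ModelCard) -> Option<Box<dyn Tokenizer>> {
        let base = card.model_id.rsplit('/').next().unwrap_or(&card.model_id).to_ascii_lowercase();
        self.factories
            .iter()
            .filter(|(family, _)| base.starts_with(family.as_str()))
            .max_by_key(|(family, _)| family.len())
            .map(|(_, factory)| factory(card))
    }
}

fn main() {
    let mut registry = TokenizerRegistry::default();
    registry.register("qwen", hf_json_factory);
    registry.register("kimi", tiktoken_factory);
    let card = ModelCard {
        model_id: "Qwen/Qwen3-8B".to_string(),
        tokenizer_path: "/models/qwen3/tokenizer.json".to_string(),
    };
    match registry.create(&card) {
        Some(tok) => println!("{}", tok.describe()),
        None => println!("no tokenizer registered for {}", card.model_id),
    }
}
```

Storing plain `fn` pointers keeps the registry free of generics; new families can be supported by registering another constructor without touching the lookup code.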

## Metrics and Observability Framework

### Core Metrics Improvements
- [x] **Model-Specific Metrics**
  - [x] Add TTFT (time to first token) tracking per model instance, labeled with `model_id` and `worker_id`
  - [x] Implement token throughput metrics per model (input/output tokens per second)
  - [x] Track generation speed metrics (tokens/second) during streaming per model
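
The three model-level metrics above can be derived from a few timestamps per streamed response. This is a sketch under assumptions: timings are recorded as offsets from request arrival, "generation speed" is taken to mean tokens after the first divided by the first-to-last-token window (one common definition), and TTFT samples are kept in a hypothetical in-memory map keyed by the `model_id`/`worker_id` labels rather than exported as a histogram.

```rust
use std::collections::HashMap;
use std::time::Duration;

/// Timing of one streamed response, as offsets from request arrival
/// (for example captured with `Instant::elapsed()` as tokens are forwarded).
struct StreamTiming {
    first_token_at: Duration,
    last_token_at: Duration,
    output_tokens: u64,
}

impl StreamTiming {
    /// Time to first token.
    fn ttft(&self) -> Duration {
        self.first_token_at
    }

    /// Generation speed while streaming: tokens after the first one divided
    /// by the time between the first and last token.
    fn decode_tokens_per_sec(&self) -> Option<f64> {
        let window = self.last_token_at.checked_sub(self.first_token_at)?.as_secs_f64();
        if self.output_tokens < 2 || window <= 0.0 {
            return None;
        }
        Some((self.output_tokens - 1) as f64 / window)
    }

    /// End-to-end output throughput: all output tokens over the whole request.
    fn output_tokens_per_sec(&self) -> Option<f64> {
        let total = self.last_token_at.as_secs_f64();
        if total <= 0.0 {
            return None;
        }
        Some(self.output_tokens as f64 / total)
    }
}

/// In-memory TTFT samples keyed by the (model_id, worker_id) label pair.
#[derive(Default)]
struct TtftSamples {
    by_label: HashMap<(String, String), Vec<Duration>>,
}

impl TtftSamples {
    fn observe(&mut self, model_id: &str, worker_id: &str, ttft: Duration) {
        self.by_label
            .entry((model_id.to_string(), worker_id.to_string()))
            .or_default()
            .push(ttft);
    }

    fn mean(&self, model_id: &str, worker_id: &str) -> Option<Duration> {
        let samples = self.by_label.get(&(model_id.to_string(), worker_id.to_string()))?;
        if samples.is_empty() {
            return None;
        }
        let total: Duration = samples.iter().sum();
        Some(total / samples.len() as u32)
    }
}

fn main() {
    let timing = StreamTiming {
        first_token_at: Duration::from_millis(200),
        last_token_at: Duration::from_millis(2200),
        output_tokens: 101,
    };
    let mut samples = TtftSamples::default();
    samples.observe("model-a", "worker-1", timing.ttft());
    println!("ttft={:?} decode={:?} tok/s", timing.ttft(), timing.decode_tokens_per_sec());
    println!("throughput={:?} tok/s", timing.output_tokens_per_sec());
    println!("mean ttft={:?}", samples.mean("model-a", "worker-1"));
}
```

Excluding the first token from the decode rate keeps prefill time (already captured by TTFT) from dragging down the streaming speed figure.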

### OpenTelemetry Integration
- [x] **Distributed Tracing**
  - [x] Integrate OpenTelemetry SDK with proper span creation and propagation
  - [x] Add trace context propagation between router and workers (W3C TraceContext)
  - [x] Implement span attributes for `model_id`, `worker_id`, `request_type`, and `batch_size`
  - [x] Create custom spans for routing decisions, queue operations, and retries
  - [x] Add OTLP exporter support for Jaeger, Tempo, and other backends
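
The W3C TraceContext propagation listed above comes down to reading the incoming `traceparent` header and forwarding a new one to the worker that keeps the trace ID and flags but carries the router's own span ID. The sketch below hand-parses version `00` headers only and takes the child span ID as a parameter (a real router would generate a random one, typically through the OpenTelemetry SDK's propagator); `TraceParent`, `parse_traceparent`, and `child_traceparent` are hypothetical names.

```rust
/// Parsed `traceparent` header (W3C Trace Context, version 00).
#[derive(Debug, Clone, PartialEq)]
struct TraceParent {
    trace_id: String,  // 32 lowercase hex characters
    parent_id: String, // 16 lowercase hex characters
    flags: u8,
}

fn is_lower_hex(s: &str, len: usize) -> bool {
    s.len() == len && s.bytes().all(|b| b.is_ascii_digit() || (b'a'..=b'f').contains(&b))
}

/// Returns `None` for malformed headers, unsupported versions, or all-zero IDs.
fn parse_traceparent(header: &str) -> Option<TraceParent> {
    let parts: Vec<&str> = header.trim().split('-').collect();
    if parts.len() != 4 || parts[0] != "00" {
        return None; // only version 00 is handled in this sketch
    }
    let (trace_id, parent_id, flags) = (parts[1], parts[2], parts[3]);
    if !is_lower_hex(trace_id, 32) || !is_lower_hex(parent_id, 16) || !is_lower_hex(flags, 2) {
        return None;
    }
    // All-zero trace IDs and parent IDs are invalid per the spec.
    if trace_id.bytes().all(|b| b == b'0') || parent_id.bytes().all(|b| b == b'0') {
        return None;
    }
    Some(TraceParent {
        trace_id: trace_id.to_string(),
        parent_id: parent_id.to_string(),
        flags: u8::from_str_radix(flags, 16).ok()?,
    })
}

/// Builds the header to forward to a worker: same trace ID and flags,
/// with the router's span ID as the new parent ID.
fn child_traceparent(parent: &TraceParent, span_id: &str) -> Option<String> {
    if !is_lower_hex(span_id, 16) || span_id.bytes().all(|b| b == b'0') {
        return None;
    }
    Some(format!("00-{}-{}-{:02x}", parent.trace_id, span_id, parent.flags))
}

fn main() {
    let incoming = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01";
    if let Some(tp) = parse_traceparent(incoming) {
        println!("incoming parent span: {}", tp.parent_id);
        // Fixed placeholder span ID; a real router would generate a random one.
        println!("{:?}", child_traceparent(&tp, "53995c3f42cd8ad8"));
    }
}
```

Rejecting uppercase hex and all-zero IDs follows the spec's validation rules, so a malformed header starts a fresh trace instead of propagating a broken one.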

### Dashboard and Visualization
- [ ] **Observability UI**
  - [ ] Create Grafana dashboard templates for standard deployments
  - [ ] Add real-time metrics streaming to the terminal UI
