-
Notifications
You must be signed in to change notification settings - Fork 4.5k
Closed
1 / 11 of 1 issue completedDescription
Gateway 2.4 and 2.5 Final Releases
- MCP labeling support for built-in tools
- HA support for data synchronization
- WASM support for custom plugins
- Bug fixes for gRPC models
- Additional model support (min-m2, kimi-k2 thinking)
- Support background mode for responses API across all models and OAI router
SGLang Autonomous Model Gateway 3.0
Multimodality
- Support multimodality and image processor, preferably using PyO3 binding existing SGLang image processor
- Support both URL and raw data image content
Semantic Routing
- Support PII and classify API for classifying intent and complexity of the input
- Training new model for semantic routing
- Publish training library for customers to train on their own data
- Publish models to HuggingFace
- Support automatic routing in multi-router mode (use Candle to execute those models)
SLO-Based Routing
- Allow Gateway to actively listen to SGLang server's KV cache events to better handle routing decisions in gRPC mode
- Define SLO criteria, such as latency, accuracy, cost, and preference; define set of APIs, preferably HTTP headers to decide the best routing decision
- Allow SGLang server to start with both gRPC and HTTP server
Gateway UI
- Terminal UI which includes components such as router metrics, worker metrics, worker metadata, router metadata, and active logs
- Reactive UI to launch workers remotely; this should support both local machine and remote, with SSH as a beta feature for remote support
Message API Support
- Natively support Anthropic Message API instead of wrapping around chat completion in gRPC mode
- HTTP mode routing will fall back to wrapping around chat completion
- Natively support MCP calls and multi-turn in Anthropic Message API
- Add continuous integration test for Message API; critical model to support is M2
Build and Language Support Improvement
- Binding to Go
- Binding to Node.js
- Better organization for bindings across all three languages
- Restructure project as Cargo workspace to streamline multi-crate development and dependency management
- Publish Rust crate during CI
- Optimize build and config to leverage ccache properly
- Update Docker build for multi-architecture support
gRPC Multi-Model Gateway Support
- Introduce model card data structure to worker, which includes metadata such as tokenizer, chat template, reasoning parser, tool parser, DP size, TP size, etc.
- Add gRPC endpoint to fetch tokenizer, chat template, and remote Python code for multimodality support
- Add registry pattern to tokenizer which maps model family to tokenizer
Metrics and Observability Framework
Core Metrics Improvements
- Model-Specific Metrics
- Add TTFT (Time to First Token) tracking per model instance with labels for model_id, worker_id
- Implement token throughput metrics per model (input/output tokens per second)
- Track generation speed metrics (tokens/second) during streaming per model
OpenTelemetry Integration
- Distributed Tracing
- Integrate OpenTelemetry SDK with proper span creation and propagation
- Add trace context propagation between router and workers (W3C TraceContext)
- Implement span attributes for model_id, worker_id, request_type, batch_size
- Create custom spans for routing decisions, queue operations, and retries
- Add OTLP exporter support for Jaeger, Tempo, and other backends
Dashboard and Visualization
- Observability UI
- Create Grafana dashboard templates for standard deployments
- Add real-time metrics streaming to terminal UI
Reactions are currently unavailable