-
Notifications
You must be signed in to change notification settings - Fork 4.5k
Closed
Description
🚀 SGLang Model Gateway - New Release!
✨ Headline Features
⚡ Bucket Mode Routing - 20-30% Performance Boost
Introducing our new bucket-based routing algorithm that dramatically improves performance in prefix-disabled (PD) mode. See up to 20-30% improvements in TTFT (Time To First Token) and overall throughput – making your inference workloads faster and more efficient than ever!
💾 PostgreSQL Support for Chat History Management
Flexibility in data storage! We now support PostgreSQL alongside OracleDB and in-memory storage for chat history management. Choose the database solution that best fits your infrastructure and scale requirements.
🛠️ Enhanced Tool & Structured Output Support
- MinMax M2 model reasoning and function calling support
- Structured model output for OpenAI and gRPC router
- Streaming parsing with Tool Choice in chat completions API
- Tool_choice support for Responses API
- OutputItemDone events with output item array storage for better observability
🐛 Stability & Quality Improvements
Multiple bug fixes for model validation, streaming logic, reasoning content indexing, and CI stability enhancements.
🔧 Code Quality Enhancements
Refactored builders for chat and responses, restructured modules for better maintainability, and consolidated error handling.
Features
- [router] bucket policy #11719
- [router] add postgres databases data connector #12218
- [router] Support structured model output for openai and grpc router #12431
- [router][grpc] Support streaming parsing with Tool Choice in chat completions API #12677
- [router][grpc] Implement tool_choice support for Responses API #12668
- [router][grpc] Emit OutputItemDone event and store output item array #12656
- [router] minmax-m2 xml tool parser #13148
- [router] add minmax m2 reasoning parser #13137
Bug Fixes
- Revert "fix: display served_model_name in /v1/models" #13093
- [router][ci] Quick Improvement to make CI more stable #12869
- [router][ci] Fix maturin build #13012
- [router] Switch MCP tests from DeepWiki to self-hosted Brave search server #12849
- fix ci #12760
- [router][ci] Disable cache #12752
- Revert "[router] web_search_preview tool basic implementation" #12716
- Revert "[ci] fix permission" #12732
- [router][grpc] Make harmony parser checks recipient first before channel #12713
- [router][grpc] Fix index issues in reasoning content and missing streaming events #12650
- [router][grpc] Fix model validation, tool call check, streaming logic and misc in responses #12616
Enhancement
- [router][grpc] Move all error logs to their call sites #12859
- [router][grpc] Refactor: Add builders for chat and responses #12852
- [router][grpc] Add more mcp test cases to responses api #12749
- [router] add basic ci tests for gpt-oss model support #12651
- [router][quick fix] Add minimal option for reasoning effort in spec #12711
- [router][ci] speed up python binding to 1.5 min #12673
- [router][grpc] Restructure modules and code clean up #12598
- [router][grpc] Consolidate error messages build in error.rs #12301
- [router] remove worker url requirement #13172
Reactions are currently unavailable