[diffusion] CI: use unified sampling_params for CI #14045

Merged
mickqian merged 4 commits into sgl-project:main from mickqian:flux-2 on Dec 10, 2025

Conversation

@mickqian
Collaborator

@mickqian mickqian commented Nov 27, 2025

Motivation

Modifications

  1. Unify and improve the sampling params used in CI
  2. Clean up duplicated CLI tests
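Conceptually, unifying sampling params for CI means keeping one shared set of defaults that every diffusion test uses, with per-test overrides layered on top. A minimal sketch of that pattern is below; the names `DEFAULT_CI_SAMPLING_PARAMS` and `build_sampling_params` and all values are hypothetical illustrations, not taken from this PR:

```python
# Hypothetical sketch: one shared default set of sampling params for all
# diffusion CI tests, with per-test overrides applied on top.
DEFAULT_CI_SAMPLING_PARAMS = {
    "num_inference_steps": 4,   # few steps keep CI runs fast
    "guidance_scale": 3.5,
    "height": 512,
    "width": 512,
    "seed": 42,                 # fixed seed for reproducible outputs
}


def build_sampling_params(**overrides):
    """Return the CI sampling params with per-test overrides applied."""
    params = dict(DEFAULT_CI_SAMPLING_PARAMS)
    params.update(overrides)
    return params


# Example: a test that only needs a smaller resolution overrides just that,
# inheriting the shared defaults for everything else.
params = build_sampling_params(height=256, width=256)
```

This removes the duplication of each test hard-coding its own full parameter dict, which is one plausible reading of what "unify" buys here.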

Accuracy Tests

Benchmarking and Profiling

Checklist

@mickqian mickqian requested a review from yhyang201 November 27, 2025 07:04
@gemini-code-assist
Contributor

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

@github-actions github-actions bot added the `diffusion` (SGLang Diffusion) label on Nov 27, 2025
@mickqian mickqian changed the title from "Flux 2" to "[diffusion] CI: improve sampling_params for CI" on Nov 27, 2025
@mickqian
Collaborator Author

/tag-and-rerun-ci

@mickqian mickqian changed the title from "[diffusion] CI: improve sampling_params for CI" to "[diffusion] CI: use unified sampling_params for CI" on Dec 10, 2025
@mickqian mickqian merged commit d659873 into sgl-project:main on Dec 10, 2025
76 of 79 checks passed
Prozac614 pushed a commit to Prozac614/sglang that referenced this pull request Dec 17, 2025
GuoYechang pushed a commit to GuoYechang/sglang that referenced this pull request Jan 13, 2026
ZhengdQin added a commit to ZhengdQin/sglang that referenced this pull request Jan 16, 2026
* [model-gateway] extract conversation out of oai router (sgl-project#14440)

Co-authored-by: key4ng <rukeyang@gmail.com>

* [DeepseekV3.2][NSA][Indexer] Fix PAGED top-k transform for NSA indexer chunked execution on H200 (sgl-project#14325)

* [model-gateway] move oai header util to router header util (sgl-project#14441)

Co-authored-by: key4ng <rukeyang@gmail.com>

* [FIX] trtllm-moe-fp4-renorm for Qwen series models (sgl-project#14350)

* add doc for quantized kv cache (sgl-project#14348)

Co-authored-by: Brayden Zhong <b8zhong@users.noreply.github.com>
Co-authored-by: Ho-Ren (Jack) Chuang <horenchuang@bytedance.com>

* fix: Correct environment variable syntax in docker-compose configuration (sgl-project#8287)

Signed-off-by: Kay Yan <kay.yan@daocloud.io>

* [model-gateway] move all responses api event from oai to proto (sgl-project#14446)

Co-authored-by: key4ng <rukeyang@gmail.com>

* [model-gateway] add mistral 3 image processor (sgl-project#14445)

Co-authored-by: Chang Su <chang.s.su@oracle.com>

* [model-gateway] grpc to leverage event type (sgl-project#14450)

Co-authored-by: Chang Su <chang.s.su@oracle.com>

* ministral3 (sgl-project#14251)

Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
Co-authored-by: Yueming Yuan <yy28@illinois.edu>

* [Bug] fix not desired disable fused share experts caused by rocm logic (sgl-project#14432)

* Rename secrets.WHL_TOKEN -> secrets.GH_PAT_FOR_WHL_RELEASE (sgl-project#14421)

Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com>

* [diffusion] improve: further optimize model load (sgl-project#13836)

* Add CI permissions for user 'yushengsu-thu' (sgl-project#14468)

* [ez] Fix typing (sgl-project#14473)

* Add AMD stage support to /rerun-stage command and fix related bugs (sgl-project#14463)

* Add YAMY1234 to CI Permission (sgl-project#14475)

* clean up gemlite usage (sgl-project#14444)

* [diffusion] chore: further improve model searching logic (sgl-project#14484)

* [diffusion] fix: fix bug about pin memory when offloading (sgl-project#14472)

* [diffusion] cli: add argument --adjust-frames and --override-protected-fields (sgl-project#13996)

Co-authored-by: dev <devnull@example.com>
Co-authored-by: Mick <mickjagger19@icloud.com>

* dockerfile: add runtime stage + ubuntu 24.04 (sgl-project#13861)

* [diffusion] fix: fix CLIP text encoder attention mask not used (sgl-project#14364)

Co-authored-by: niehen6174 <niehen.6174@gmail.com>
Co-authored-by: Mick <mickjagger19@icloud.com>

* Enable RadixCache for Mamba2 models (sgl-project#13584)

* [diffusion] fix: Fix profiler trace missing Python stack in diffusion pipeline (sgl-project#14499)

* support GLM-V vision model dp  (sgl-project#14097)

* [misc] add model arch and type to server info and use it for harmony (sgl-project#14456)

* Add Mistral Large 3 Eagle Support (sgl-project#14466)

Co-authored-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com>

* Add Mistral Large 3 to nightly CI tests (sgl-project#14459)

* [diffusion] chore: set allowing overriding protected fields of sampling params as default behavior (sgl-project#14471)

* [model-gateway] move conversation to first class routing (sgl-project#14506)

Co-authored-by: key4ng <rukeyang@gmail.com>

* [Spec] Mamba2 support in target models (sgl-project#13434)

* [diffusion] feat: support cache-dit integration (sgl-project#14234)

Co-authored-by: shuxiguo <shuxiguo@meituan.com>
Co-authored-by: DefTruth <qiustudent_r@163.com>
Co-authored-by: Mick <mickjagger19@icloud.com>

* Add fused FP8 KV cache write kernel for TRTLLM MHA backend (sgl-project#14093)

Co-authored-by: Qiaolin Yu <liin1211@outlook.com>

* [model-gateway] Add WASM support for middleware (sgl-project#12471)

Signed-off-by: Tony Lu <tonylu@linux.alibaba.com>

* [model-gateway] reorganized conversation handler (sgl-project#14507)

Co-authored-by: key4ng <rukeyang@gmail.com>

* tiny remove deprecated endpoint call (sgl-project#13607)

* [model-gateway] fix server info comment (sgl-project#14508)

* Add Mistral Large 3 basic test to PR CI (sgl-project#14460)

* Fix removing worker will make it healthy forever in prometheus metrics (sgl-project#14420)

* [model-gateway] Make Tokenizer Builder Aware of Env Vars Like HF_ENDPOINT (sgl-project#14405)

* [model-gateway] change sgl-router to sgl-model-gateway (sgl-project#14312)

* [model-gateway] fix left over sgl-router names to sgl-model-gateway (sgl-project#14512)

* [model-gateway] fix logs in smg workflow (sgl-project#14513)

* [model-gateway] fix left over sgl-router names in wasm (sgl-project#14514)

* [model-gateway] fix code owner for wasm (sgl-project#14516)

* chore: bump sgl-kernel version to 0.3.18.post3 (sgl-project#14427)

Co-authored-by: sglang-bot <sglang-bot@users.noreply.github.com>

* Tiny use trtllm_mha as default when possible (sgl-project#14291)

* [Docs] Add /rerun-stage command to contribution guide (sgl-project#14521)

* Fix safetensors validation to catch corruption after download (sgl-project#14465)

* [CODEOWNER] update codeowner for qwen3-next related (sgl-project#14522)

* fix: fix rmsnorm -> layernorm in qwen3 omni (sgl-project#11791)

Co-authored-by: Brayden Zhong <b8zhong@users.noreply.github.com>

* [diffusion] chore: temporarily upgrade diffusers to make Z-image compatible with Cache-DiT (sgl-project#14530)

* [bug] fix notebook to include new keys from model_info (sgl-project#14528)

* Revise DP Multi-Modal Encoder Document (sgl-project#14290)

* [CPU] add mamba fla kernels for Qwen3-next (sgl-project#12324)

* Revert "tiny remove deprecated endpoint call" (sgl-project#14533)

* support mtp with deepseek r1 nvfp4 model (sgl-project#13115)

Co-authored-by: Trevor Morris <tmorris@nvidia.com>

* [diffusion] refactor: simplify sampling params' override logic (sgl-project#14539)

* [diffusion] perf: add QKV fusion optimization for Flux models (sgl-project#14505)

Co-authored-by: Mick <mickjagger19@icloud.com>

* [model-gateway][tracing]: implement request tracing using OpenTelemetry with trace context propagation (HTTP) (sgl-project#13897)

* [diffusion] lora: fix LoRA dtype handling and weight attribute access for z-image model (sgl-project#14543)

Co-authored-by: niehen6174 <nihen6174@gmail.com>

* fix "GrammarMatcher has terminated after accepting the stop token, but is trying to find the next token mask" when both reasoning and spec are enabled (sgl-project#14464)

* [1/n] Fix hanging during DeepGemm Warmup (sgl-project#14493)

* [Bug fix] Add /model_info endpoint to mini_lb (sgl-project#14535)

* [Qwen3-next] remove heuristics and add radix cache kl test (sgl-project#14520)

* [Misc]Register and refactor some environs for dpsk-fp4 and DeepEp (sgl-project#14538)

* chore: bump sgl-kernel version to 0.3.18.post3 (sgl-project#14518)

* Update CI_PERMISSIONS.json (sgl-project#14552)

* Update DeepSeek V3 docs to use B200 (sgl-project#14447)

* [Doc] Add short explanation on page size (sgl-project#14557)

* [docs] Add missing word in argument description (sgl-project#14205)

* support piecewise cuda graph for Olmo models (sgl-project#14476)

* Enhance prefill PP node robustness (sgl-project#14494)

* DOC update nemo-skills in docs (sgl-project#14555)

Signed-off-by: George Armstrong <georgea@nvidia.com>
Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>

* remove unecessary dual stream token threshold from the rest of models (qwen moe, kimi linear, etc.) (sgl-project#14337)

* feat(ci): add framework target to release-docker workflows (sgl-project#14559)

* Fix attention backend logic for Qwen3-Next on SM100 (sgl-project#14560)

* [FLA] Add explicit kernel arguments to kda.py for Kimi Linear support (sgl-project#14561)

* Add CUDA kernel size analysis tool for sgl-kernel optimization (sgl-project#14544)

* [DLLM] feat: Add threshold based parallel decoding support (sgl-project#14412)

Co-authored-by: Jinwei Yao <jinweiy@illinois.edu>
Co-authored-by: 赵晨阳 <zhaochen20@outlook.com>

* Add unit-test-backend-8-gpu-b200 to rerun-stage command (sgl-project#14569)

* [apply][2/2] Fused qk_norm_rope for Qwen3-MoE (sgl-project#13998)

Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>

* Add Expert Parallelism (EP) support for kimi-k2-thinking (sgl-project#13725)

* Tiny remove wrong import from `python.sglang` (sgl-project#14577)

* Add small model test for spec v2 + dp + trtllm_mla (sgl-project#14576)

* [diffusion] cli: profiling utilities support (sgl-project#14185)

Co-authored-by: jianyingzhu <53300651@qq.com>
Co-authored-by: Jianying <53503712+jianyingzhu@users.noreply.github.com>
Co-authored-by: Mick <mickjagger19@icloud.com>

* [NPU]LoRA: Adding Torch Native backend (sgl-project#14132)

* [BugFix] fix prefixcache performance and accuracy on ascend (sgl-project#13573)

* Fix FP8 KV Triton type issue and add regression test (sgl-project#14553)

* Rename TensorRT Model Optimizer to Model Optimizer (sgl-project#14455)

Signed-off-by: Zhiyu Cheng <zhiyuc@nvidia.com>

* [CI] Tiny speed up VLM CI (sgl-project#14517)

Co-authored-by: Brayden Zhong <b8zhong@users.noreply.github.com>

* [Minor] Temporarily skipping deepep large mtp test (sgl-project#14586)

* [model-gateway] extra accumulator and tool handler in oai router (sgl-project#14587)

* [model-gateway] Fixed WASM Security Vulnerability - Execution Timeout (sgl-project#14588)

* [model-gateway] reorganize metrics, logging, and otel to its own module (sgl-project#14590)

* Refactor tuning block wise kernel and opt Qwen/Qwen3-VL-32B-Instruct-FP8 (sgl-project#14141)

* [CI]Unblock and split spec v2+dp test (sgl-project#14551)

* [Tool Call] Fix DeepSeekV32Detector skipping functions with no params in streaming mode (sgl-project#14573)

* [feat] use cachebuffer to store mm feature to speedup hash (sgl-project#14386)

* [CI] Fix unit-test-backend-8-gpu-b200 running on every /rerun-stage (sgl-project#14591)

* [model-gateway] fix WASM memory limit per module (sgl-project#14600)

* Tiny fix missing policy decision recording (sgl-project#14605)

* Super tiny remove unneeded policy flag (sgl-project#14608)

* [model-gateway] refactor otel to be more efficient (sgl-project#14604)

* Super tiny remove unused select_worker_pair (sgl-project#14609)

* [model-gateway] fix WASM unbounded request/response body read vuln (sgl-project#14612)

* [2/2] Add rope kernel in sgl-kernel  (sgl-project#14452)

* [DLLM] Add initial cuda graph support (sgl-project#14203)

* Super tiny fix unused code in router (sgl-project#14618)

* [Glm46v] Bug fix for accuracy drop and unable to launch server (sgl-project#14585)

Co-authored-by: yhyang201 <yhyang201@gmail.com>
Co-authored-by: zRzRzRzRzRzRzR <2448370773@qq.com>
Co-authored-by: Minglei Zhu <mingleizhu1122@gmail.com>

* Fix amd rope definition (sgl-project#14556)

* modify the sgl-kernel to be compatible with transformers 5.x. (sgl-project#14625)

* [Reasoning + Structured Output] make reasoning compatible with structured output (sgl-project#12551)

Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
Co-authored-by: Xinyuan Tong <xinyuantong.cs@gmail.com>

* [diffusion] feat: add support for LoRA layers in transformer_2 within LoRAPipeline (sgl-project#14606)

* chore: bump sgl-kernel version to 0.3.19 (sgl-project#14632)

* [cpu] Implement all gather/reduce for arm64 cpu (sgl-project#12527)

* [diffusion] chore: further refine output resolution adjustment logic (sgl-project#14558)

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* Fix dp-aware incompatible with service-discovery (sgl-project#14629)

* update transformers package version to 5.0.0rc0 (sgl-project#14356)

* chore: bump sgl-kernel version to 0.3.19 (sgl-project#14649)

* chore: bump SGLang version to 0.5.6.post1 (sgl-project#14651)

* [AMD] change fused rms quant interface for aiter upgrade (sgl-project#14497)

* [model-gateway] reducing cpu overhead in various of places (sgl-project#14658)

* [model-gateway] reduce cpu overhead in grpc router (sgl-project#14663)

* [model-gateway] fix WASM arbitrary file read security vol (sgl-project#14664)

* vlm: Use fa3 as the default backend for qwen3 vl (sgl-project#14634)

* [model-gateway] Optimize memory usage in HTTP router (sgl-project#14667)

* fix: use .get() when accessing strict mem-check env variable (sgl-project#14657)

* improve default glm mtp setting (sgl-project#14457)

Signed-off-by: Brayden Zhong <b8zhong@users.noreply.github.com>

* Fix cache-aware router should pick min load instead of min tenant size (sgl-project#14650)

* Bump up diffusers to latest official release version (sgl-project#14670)

* [model-gateway] add OTEL integration to grpc router (sgl-project#14671)

* [CI] Increase max-parallel to 15 for high priority PRs (sgl-project#14675)

* [HiCache] fix condition check when use decode offload (sgl-project#14489)

* [RadixTree] Optimize the Time Complexity of Node Retrieval Operation from O(n*m) to O(n) (sgl-project#13334)

Signed-off-by: CLFutureX <chenyongqyl@163.com>
Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu>

* Tiny support printing requests in bench_serving for observability (sgl-project#14652)

* Aiter fp8 kv cache (sgl-project#13147)

* [SMG]feat: implement TokenGuardBody for managing token return (sgl-project#14653)

* [NPU] chore: bump basic software version to 8.3.rc2 (sgl-project#14614)

* [CI] Unblock gb200 cutedsl test (sgl-project#14469)

* Add ffmpeg into sglang docker - required by transformers multimodal V… (sgl-project#14679)

* [Bugfix] Fix KeyError for Mistral-Large-3 rope_scaling config (sgl-project#14627)

* Tiny support sgl-router http response status code metrics (sgl-project#14689)

* [CI] Migrate Eagle 1-GPU tests to test/registered/ (sgl-project#14529)

* Revert "[Bug] fix not desired disable fused share experts caused by r… (sgl-project#14676)

* Add per-request decode tp size (sgl-project#14678)

Co-authored-by: Byron Hsu <byronhsu1230@gmail.com>

* [ci][smg] fix docker release ci and add it to pr test (sgl-project#14683)

* Tiny extract select_worker_min_load (sgl-project#14648)

* Fix dp-aware incompatible with completions and chat completions APIs (sgl-project#14647)

* [CI] Fix Llama 3.1 8B FP4 CI (sgl-project#14699)

* fix: make override DeepseekV2Model work (sgl-project#14707)

* chore: add code owners for deepseek_v2.py (sgl-project#14714)

* [CI] Move mistral large 3 basic to nightly (sgl-project#14622)

* fix the deepep 8 gpu unit test (sgl-project#14601)

* Add fuse_marlin_moe test to ci and add new ep test (sgl-project#14686)

* [Bugfix] Fix environ error in scheduler_runtime_checker_mixin.py (sgl-project#14461)

Signed-off-by: Kun(llfl) <i@imux.top>

* [Feat] Add received_time in serving_base (sgl-project#13432)

Signed-off-by: zhanghaotong <zhanghaotong.zht@antgroup.com>

* fix: prevent HugginqFace access when SGLANG_USE_MODELSCOPE is enabled (sgl-project#12039)

* [Test] Skip STANDALONE speculative decoding tests for different hidden sizes (sgl-project#14733)

* [diffusion] feat: support comparing batch perf (sgl-project#14738)

Co-authored-by: shuxiguo <shuxiguo@meituan.com>
Co-authored-by: Mick <mickjagger19@icloud.com>

* Revert "[Feat] Add received_time in serving_base" (sgl-project#14743)

* [Model] Add PaddleOCR-VL Model Support (sgl-project#12953)

Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>

* fix rope parameter initialization error caused by transformers v5.0 update (sgl-project#14745)

* [model-gateway] optimize core modules (sgl-project#14751)

* [SMG] perf: optimize tokenizer for reduced CPU and memory overhead (sgl-project#14752)

* Add FP8 Blockwise GEMM Backend Flag `--fp8-gemm-backend` (sgl-project#14379)

* fix: checking if tokenizer is in cache before downloading from HF (sgl-project#14698)

* fix: making rate limit a warning instead of error  (sgl-project#14753)

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* move multi-item scoring functions in tokenizer manager into a separate file (sgl-project#14740)

* Improve CI by trying a warmup before unit tests (sgl-project#14669)

* [Perf] Optimize radix tree for cache-aware load balancin (sgl-project#14758)

* [Feature] Add LoRA support for embedding layers (sgl-project#14177)

Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>
Co-authored-by: Beichen-Ma <bm685@cornell.edu>

* [model-gateway] release gateway 0.2.4 (sgl-project#14763)

* [ci]: Enable the new hf API (sgl-project#14687)

* Re-add the API serving timing metrics. (sgl-project#14744)

Signed-off-by: zhanghaotong <zhanghaotong.zht@antgroup.com>
Co-authored-by: zhanghaotong <zhanghaotong.zht@antgroup.com>

* fix: adding rate limit warning at verify token permission stage (sgl-project#14756)

* Disable 8-gpu-b200 runner in PR tests (sgl-project#14768)

* [fix] Fix issues for in-flight weight updates (sgl-project#14064)

Co-authored-by: 赵晨阳 <zhaochen20@outlook.com>

* [Auto Sync] Update data_parallel_controller.py, detokenizer... (20251209) (sgl-project#14759)

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

* fix: race condition between validation and download locks (sgl-project#14761)

* Fix VLM accuracy thresholds for nightly tests (sgl-project#14777)

* fix server args bug (sgl-project#14725)

* handling incomplete rope_scaling config ci after transformers upgrade (sgl-project#14784)

* fix b200 ci (sgl-project#14786)

* [RL] support weight reload for low-bit rollout (sgl-project#9650)

Co-authored-by: Hecate0821 <hec4te0821@gmail.com>
Co-authored-by: eternally-z <zzywzj@gmail.com>
Co-authored-by: Wilboludriver <wilbolu@outlook.com>
Co-authored-by: Wilbolu <81792854+Wilboludriver@users.noreply.github.com>
Co-authored-by: Ke Bao <ispobaoke@gmail.com>

* fix: add missing logic for SGLANG_USE_MODELSCOPE variable (sgl-project#14794)

* fix b200 fa4 ci (sgl-project#14788)

* [diffusion] profile: early exit when enough steps are captured to reduce the size of the trace file (sgl-project#14803)

* [GLM-4.6V] Support Pipeline Parallelism for GLM-4.6V & GLM-4.1V (sgl-project#14720)

Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>

* [diffusion] CI: Add LoRA support to diffusion server configuration and test cases (sgl-project#14697)

* Revert "fix: checking if tokenizer is in cache before downloading from HF" (sgl-project#14808)

* [diffusion] performance: refactor diffusion fuse qkv and apply to qwen-image (sgl-project#14793)

* [SMG-GO] implement a Go SGLang Model Gateway - OpenAI Compatible API Server (sgl-project#14770)

* [model-gateway] Dynamically Populate Tool Call Parser Choices (sgl-project#14807)

* Support HTTP response status code prometheus metrics (sgl-project#14710)

* Fix router keep nonzero metrics after worker is deleted (sgl-project#14819)

* Tiny fix incorrect worker removal command (sgl-project#14822)

* [NPU] bug fix for mtp and w4a8 (sgl-project#14806)

* [CI] fix UT success check in `test_eagle_infer_beta_dp_attention.py` (sgl-project#14831)

* Fix CI registry scan to only check test/registered directory (sgl-project#14812)

* [model-gateway] add anthropic message api spec (sgl-project#14834)

* [diffusion] doc: fix tiny typo in multimodal_gen/README.md (sgl-project#14830)

* [model-gateway] support customizing Prometheus duration buckets (sgl-project#14716)

* [model-gateway] support engine response http status statistics in router (sgl-project#14712)

* [CI] Reduce stage-b auto-partition from 4 to 2 (sgl-project#14769)

Co-authored-by: Liangsheng Yin <lsyincs@gmail.com>

* Apply back moe_sum_reduce for fused_marlin_moe (sgl-project#14829)

* [diffusion] parallel: pad tokens for video models under sp (sgl-project#14833)

* [diffusion] CI: use unified sampling_params for CI (sgl-project#14045)

* [Auto Sync] Update tool_chat_template_deepseekv31.jinja (20251210) (sgl-project#14837)

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Jue Wang <zjuwangjue@gmail.com>

* Revert transformers to 4.57.1 (sgl-project#14801)

* [model-gateway] Fix incompatible metric comparison in` PowerOfTwo` policy (sgl-project#14823)

* [bugfix] qwen25-VL support lora (sgl-project#14638)

* fix lora target all + csgmv backend (sgl-project#14796)

* [model-gateway] adds default implementations to RouterTrait in mod.rs (sgl-project#14841)

* [AMD] Add model to AMD nightly test (sgl-project#14442)

* Treat unittest SkipTest exception as pass instead of as failure (sgl-project#14847)

* [model-gateway] code clean up on oai router (sgl-project#14850)

* [model-gateway] fix import order in oai conversation (sgl-project#14851)

* fix fp8 gemm nightly CI (sgl-project#14844)

Co-authored-by: Brayden Zhong <b8zhong@users.noreply.github.com>

* fix: restrict cache validation behaviors to CI only (sgl-project#14849)

* Fix CUDA version handling in ci_install_deepep.sh (sgl-project#14854)

* Fix TestGLM41VPPAccuracy test flakiness (sgl-project#14848)

* Minor code style fix for dllm (sgl-project#14836)

* Enable TP for Mamba-based models (sgl-project#14811)

Signed-off-by: Roi Koren <roik@nvidia.com>

* [CI] Temp disable gb200 test (sgl-project#14865)

* Refactor Marlin MoeRunner (sgl-project#14554)

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Cheng Wan <54331508+ch-wan@users.noreply.github.com>

* [6/n] Fix `num_token_non_padded` computation in prefill (sgl-project#14313)

Co-authored-by: Cheng Wan <54331508+ch-wan@users.noreply.github.com>
Co-authored-by: Runkai Tao <rt572@physics.rutger.edu>

* Remove myself to test CI gate issue (sgl-project#14871)

* fix: creating blobs only once for publish trace retries (sgl-project#14845)

* Move and update MindSpore docs, make it appear on the online documentation (sgl-project#14861)

Co-authored-by: wangtiance <tiancew@qq.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* fix nightly vlm ci : restore original eval for requests without regex (sgl-project#14875)

* Only count limitations for previous runs  that reaches the test stages (sgl-project#14856)

* [CI][BUG] fix ib setup for disaggregation hicache test (sgl-project#14877)

Signed-off-by: lukotong-7 <shicanwei.scw@alibaba-inc.com>

* [Fix] Remove unused import from test_disaggregation_hicache.py (sgl-project#14880)

* fix: adding temporary bypass for nightly tests (sgl-project#14876)

* Avoid deleting entire cache for missing shards (sgl-project#14754 follow-up) (sgl-project#14853)

* Tiny add more error info for bench_serving (sgl-project#14827)

* Tiny support range ratio in GSP in bench serving (sgl-project#14828)

* [diffusion] feat: enable torch compile to eliminate GPU bubble (sgl-project#13641)

Co-authored-by: jianyingzhu <53300651@qq.com>
Co-authored-by: Jianying <53503712+jianyingzhu@users.noreply.github.com>
Co-authored-by: root <root@2u2g-spr-0417.ipp4a1.colossus.nvidia.com>
Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com>

* [NPU] adapt dsv3.2 nsa prefill context parallel (sgl-project#14541)

* [diffusion] feat: support sageattn & sageattn3 backend (sgl-project#14878)

* dsv32 multistream opt

* clean code

* delete renormalize in topk

* dsv32 use batch_matmul_transpose in MTP

* modify comment

* Support dynamic w8a8

* dsv3 support ascend_fuseep

* rebase modify

---------

Signed-off-by: Kay Yan <kay.yan@daocloud.io>
Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
Signed-off-by: Tony Lu <tonylu@linux.alibaba.com>
Signed-off-by: George Armstrong <georgea@nvidia.com>
Signed-off-by: Zhiyu Cheng <zhiyuc@nvidia.com>
Signed-off-by: Brayden Zhong <b8zhong@users.noreply.github.com>
Signed-off-by: CLFutureX <chenyongqyl@163.com>
Signed-off-by: Kun(llfl) <i@imux.top>
Signed-off-by: zhanghaotong <zhanghaotong.zht@antgroup.com>
Signed-off-by: Roi Koren <roik@nvidia.com>
Signed-off-by: lukotong-7 <shicanwei.scw@alibaba-inc.com>
Co-authored-by: Simo Lin <linsimo.mark@gmail.com>
Co-authored-by: key4ng <rukeyang@gmail.com>
Co-authored-by: YAMY <74099316+YAMY1234@users.noreply.github.com>
Co-authored-by: Sam <lsam@nvidia.com>
Co-authored-by: b8zhong <b8zhong@uwaterloo.ca>
Co-authored-by: Brayden Zhong <b8zhong@users.noreply.github.com>
Co-authored-by: Ho-Ren (Jack) Chuang <horenchuang@bytedance.com>
Co-authored-by: Kay Yan <kay.yan@daocloud.io>
Co-authored-by: Chang Su <chang.s.su@oracle.com>
Co-authored-by: Xinyuan Tong <115166877+JustinTong0323@users.noreply.github.com>
Co-authored-by: Yueming Yuan <yy28@illinois.edu>
Co-authored-by: Junrong Lin <33685709+ocss884@users.noreply.github.com>
Co-authored-by: sglang-bot <sglangbot@gmail.com>
Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com>
Co-authored-by: zyksir <zhuyikai.zyk@gmail.com>
Co-authored-by: Alison Shao <54658187+alisonshao@users.noreply.github.com>
Co-authored-by: Yinghai Lu <yinghai@thinkingmachines.ai>
Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>
Co-authored-by: Minglei Zhu <mingleizhu1122@gmail.com>
Co-authored-by: Mick <mickjagger19@icloud.com>
Co-authored-by: GMI Xiao Jin <xiao.j@gmicloud.ai>
Co-authored-by: dev <devnull@example.com>
Co-authored-by: ishandhanani <82981111+ishandhanani@users.noreply.github.com>
Co-authored-by: WenhaoZhang <42087078+niehen6174@users.noreply.github.com>
Co-authored-by: niehen6174 <niehen.6174@gmail.com>
Co-authored-by: roikoren755 <26850796+roikoren755@users.noreply.github.com>
Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com>
Co-authored-by: Yuxuan Zhang <2448370773@qq.com>
Co-authored-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
Co-authored-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com>
Co-authored-by: blahblah <28567807+Brain97@users.noreply.github.com>
Co-authored-by: shuxiguo <shuxiguo@meituan.com>
Co-authored-by: DefTruth <qiustudent_r@163.com>
Co-authored-by: Hudson Xing <77495133+harvenstar@users.noreply.github.com>
Co-authored-by: Qiaolin Yu <liin1211@outlook.com>
Co-authored-by: Tony Lu <tonylu@linux.alibaba.com>
Co-authored-by: fzyzcjy <5236035+fzyzcjy@users.noreply.github.com>
Co-authored-by: Wenyi Xu <wenyixu101@gmail.com>
Co-authored-by: sglang-bot <sglang-bot@users.noreply.github.com>
Co-authored-by: Hanming Lu <69857889+hanming-lu@users.noreply.github.com>
Co-authored-by: Vincent Zhong <207368749+vincentzed@users.noreply.github.com>
Co-authored-by: Yuhao Yang <47235274+yhyang201@users.noreply.github.com>
Co-authored-by: blzheng <beilei.zheng@intel.com>
Co-authored-by: Rain Jiang <96632942+rainj-me@users.noreply.github.com>
Co-authored-by: Trevor Morris <tmorris@nvidia.com>
Co-authored-by: Feng Su <sufeng@linux.alibaba.com>
Co-authored-by: niehen6174 <nihen6174@gmail.com>
Co-authored-by: gongwei-130 <56567052+gongwei-130@users.noreply.github.com>
Co-authored-by: harrisonlimh <97203667+harrisonlimh@users.noreply.github.com>
Co-authored-by: Lee Nau <lnau@nvidia.com>
Co-authored-by: almaslof <187766901+almaslof@users.noreply.github.com>
Co-authored-by: Rain H <2510421000@qq.com>
Co-authored-by: George Armstrong <georgea@nvidia.com>
Co-authored-by: Chen1022 <jincong.cjc@ant-intl.com>
Co-authored-by: Tiwei Bie <tiwei.btw@antgroup.com>
Co-authored-by: Jinwei Yao <jinweiy@illinois.edu>
Co-authored-by: 赵晨阳 <zhaochen20@outlook.com>
Co-authored-by: Yuan Luo <yuan.luo@hotmail.com>
Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>
Co-authored-by: Liangsheng Yin <hnyls2002@gmail.com>
Co-authored-by: AichenF <aichenf@nvidia.com>
Co-authored-by: jianyingzhu <53300651@qq.com>
Co-authored-by: Jianying <53503712+jianyingzhu@users.noreply.github.com>
Co-authored-by: Vladimir Serov <serov.vladimir.zser@gmail.com>
Co-authored-by: khalilzhk <khalilzhk@gmail.com>
Co-authored-by: Zhiyu <zhiyuc@nvidia.com>
Co-authored-by: wentx <3843588+momaek@users.noreply.github.com>
Co-authored-by: Nicholas <45984215+liusy58@users.noreply.github.com>
Co-authored-by: Binyao Jiang <byjiang1996@gmail.com>
Co-authored-by: yhyang201 <yhyang201@gmail.com>
Co-authored-by: Muqi Li <muqi1029@gmail.com>
Co-authored-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
Co-authored-by: Prozac614 <dwt614707404@163.com>
Co-authored-by: Yibo Cai <yibo.cai@arm.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: yctseng0211 <yctseng@amd.com>
Co-authored-by: Francis <38564764+ssssnow@users.noreply.github.com>

* add doc for quantized kv cache (sgl-project#14348)

Co-authored-by: Brayden Zhong <b8zhong@users.noreply.github.com>
Co-authored-by: Ho-Ren (Jack) Chuang <horenchuang@bytedance.com>

* fix: Correct environment variable syntax in docker-compose configuration (sgl-project#8287)

Signed-off-by: Kay Yan <kay.yan@daocloud.io>

* [model-gateway] move all responses api event from oai to proto (sgl-project#14446)

Co-authored-by: key4ng <rukeyang@gmail.com>

* [model-gateway] add mistral 3 image processor (sgl-project#14445)

Co-authored-by: Chang Su <chang.s.su@oracle.com>

* [model-gateway] grpc to leverage event type (sgl-project#14450)

Co-authored-by: Chang Su <chang.s.su@oracle.com>

* ministral3 (sgl-project#14251)

Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
Co-authored-by: Yueming Yuan <yy28@illinois.edu>

* [Bug] fix not desired disable fused share experts caused by rocm logic (sgl-project#14432)

* Rename secrets.WHL_TOKEN -> secrets.GH_PAT_FOR_WHL_RELEASE (sgl-project#14421)

Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com>

* [diffusion] improve: further optimize model load (sgl-project#13836)

* Add CI permissions for user 'yushengsu-thu' (sgl-project#14468)

* [ez] Fix typing (sgl-project#14473)

* Add AMD stage support to /rerun-stage command and fix related bugs (sgl-project#14463)

* Add YAMY1234 to CI Permission (sgl-project#14475)

* clean up gemlite usage (sgl-project#14444)

* [diffusion] chore: further improve model searching logic (sgl-project#14484)

* [diffusion] fix: fix bug about pin memory when offloading (sgl-project#14472)

* [diffusion] cli: add argument --adjust-frames and --override-protected-fields (sgl-project#13996)

Co-authored-by: dev <devnull@example.com>
Co-authored-by: Mick <mickjagger19@icloud.com>

* dockerfile: add runtime stage + ubuntu 24.04 (sgl-project#13861)

* [diffusion] fix: fix CLIP text encoder attention mask not used (sgl-project#14364)

Co-authored-by: niehen6174 <niehen.6174@gmail.com>
Co-authored-by: Mick <mickjagger19@icloud.com>

* Enable RadixCache for Mamba2 models (sgl-project#13584)

* [diffusion] fix: Fix profiler trace missing Python stack in diffusion pipeline (sgl-project#14499)

* support GLM-V vision model dp  (sgl-project#14097)

* [misc] add model arch and type to server info and use it for harmony (sgl-project#14456)

* Add Mistral Large 3 Eagle Support (sgl-project#14466)

Co-authored-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com>

* Add Mistral Large 3 to nightly CI tests (sgl-project#14459)

* [diffusion] chore: set allowing overriding protected fields of sampling params as default behavior (sgl-project#14471)

* [model-gateway] move conversation to first class routing (sgl-project#14506)

Co-authored-by: key4ng <rukeyang@gmail.com>

* [Spec] Mamba2 support in target models (sgl-project#13434)

* [diffusion] feat: support cache-dit integration (sgl-project#14234)

Co-authored-by: shuxiguo <shuxiguo@meituan.com>
Co-authored-by: DefTruth <qiustudent_r@163.com>
Co-authored-by: Mick <mickjagger19@icloud.com>

* Add fused FP8 KV cache write kernel for TRTLLM MHA backend (sgl-project#14093)

Co-authored-by: Qiaolin Yu <liin1211@outlook.com>

* [model-gateway] Add WASM support for middleware (sgl-project#12471)

Signed-off-by: Tony Lu <tonylu@linux.alibaba.com>

* [model-gateway] reorganized conversation handler (sgl-project#14507)

Co-authored-by: key4ng <rukeyang@gmail.com>

* tiny remove deprecated endpoint call (sgl-project#13607)

* [model-gateway] fix server info comment (sgl-project#14508)

* Add Mistral Large 3 basic test to PR CI (sgl-project#14460)

* Fix removing worker will make it healthy forever in prometheus metrics (sgl-project#14420)

* [model-gateway] Make Tokenizer Builder Aware of Env Vars Like HF_ENDPOINT (sgl-project#14405)

* [model-gateway] change sgl-router to sgl-model-gateway (sgl-project#14312)

* [model-gateway] fix left over sgl-router names to sgl-model-gateway (sgl-project#14512)

* [model-gateway] fix logs in smg workflow (sgl-project#14513)

* [model-gateway] fix left over sgl-router names in wasm (sgl-project#14514)

* [model-gateway] fix code owner for wasm (sgl-project#14516)

* chore: bump sgl-kernel version to 0.3.18.post3 (sgl-project#14427)

Co-authored-by: sglang-bot <sglang-bot@users.noreply.github.com>

* Tiny use trtllm_mha as default when possible (sgl-project#14291)

* [Docs] Add /rerun-stage command to contribution guide (sgl-project#14521)

* Fix safetensors validation to catch corruption after download (sgl-project#14465)

* [CODEOWNER] update codeowner for qwen3-next related (sgl-project#14522)

* fix: fix rmsnorm -> layernorm in qwen3 omni (sgl-project#11791)

Co-authored-by: Brayden Zhong <b8zhong@users.noreply.github.com>

* [diffusion] chore: temporarily upgrade diffusers to make Z-image compatible with Cache-DiT (sgl-project#14530)

* [bug] fix notebook to include new keys from model_info (sgl-project#14528)

* Revise DP Multi-Modal Encoder Document (sgl-project#14290)

* [CPU] add mamba fla kernels for Qwen3-next (sgl-project#12324)

* Revert "tiny remove deprecated endpoint call" (sgl-project#14533)

* support mtp with deepseek r1 nvfp4 model (sgl-project#13115)

Co-authored-by: Trevor Morris <tmorris@nvidia.com>

* [diffusion] refactor: simplify sampling params' override logic (sgl-project#14539)

* [diffusion] perf: add QKV fusion optimization for Flux models (sgl-project#14505)

Co-authored-by: Mick <mickjagger19@icloud.com>

* [model-gateway][tracing]: implement request tracing using OpenTelemetry with trace context propagation (HTTP) (sgl-project#13897)

* [diffusion] lora: fix LoRA dtype handling and weight attribute access for z-image model (sgl-project#14543)

Co-authored-by: niehen6174 <nihen6174@gmail.com>

* fix "GrammarMatcher has terminated after accepting the stop token, but is trying to find the next token mask" when both reasoning and spec are enabled (sgl-project#14464)

* [1/n] Fix hanging during DeepGemm Warmup (sgl-project#14493)

* [Bug fix] Add /model_info endpoint to mini_lb (sgl-project#14535)

* [Qwen3-next] remove heuristics and add radix cache kl test (sgl-project#14520)

* [Misc]Register and refactor some environs for dpsk-fp4 and DeepEp (sgl-project#14538)

* chore: bump sgl-kernel version to 0.3.18.post3 (sgl-project#14518)

* Update CI_PERMISSIONS.json (sgl-project#14552)

* Update DeepSeek V3 docs to use B200 (sgl-project#14447)

* [Doc] Add short explanation on page size (sgl-project#14557)

* [docs] Add missing word in argument description (sgl-project#14205)

* support piecewise cuda graph for Olmo models (sgl-project#14476)

* Enhance prefill PP node robustness (sgl-project#14494)

* DOC update nemo-skills in docs (sgl-project#14555)

Signed-off-by: George Armstrong <georgea@nvidia.com>
Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>

* remove unecessary dual stream token threshold from the rest of models (qwen moe, kimi linear, etc.) (sgl-project#14337)

* feat(ci): add framework target to release-docker workflows (sgl-project#14559)

* Fix attention backend logic for Qwen3-Next on SM100 (sgl-project#14560)

* [FLA] Add explicit kernel arguments to kda.py for Kimi Linear support (sgl-project#14561)

* Add CUDA kernel size analysis tool for sgl-kernel optimization (sgl-project#14544)

* [DLLM] feat: Add threshold based parallel decoding support (sgl-project#14412)

Co-authored-by: Jinwei Yao <jinweiy@illinois.edu>
Co-authored-by: 赵晨阳 <zhaochen20@outlook.com>

* Add unit-test-backend-8-gpu-b200 to rerun-stage command (sgl-project#14569)

* [apply][2/2] Fused qk_norm_rope for Qwen3-MoE (sgl-project#13998)

Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>

* Add Expert Parallelism (EP) support for kimi-k2-thinking (sgl-project#13725)

* Tiny remove wrong import from `python.sglang` (sgl-project#14577)

* Add small model test for spec v2 + dp + trtllm_mla (sgl-project#14576)

* [diffusion] cli: profiling utilities support (sgl-project#14185)

Co-authored-by: jianyingzhu <53300651@qq.com>
Co-authored-by: Jianying <53503712+jianyingzhu@users.noreply.github.com>
Co-authored-by: Mick <mickjagger19@icloud.com>

* [NPU]LoRA: Adding Torch Native backend (sgl-project#14132)

* [BugFix] fix prefixcache performance and accuracy on ascend (sgl-project#13573)

* Fix FP8 KV Triton type issue and add regression test (sgl-project#14553)

* Rename TensorRT Model Optimizer to Model Optimizer (sgl-project#14455)

Signed-off-by: Zhiyu Cheng <zhiyuc@nvidia.com>

* [CI] Tiny speed up VLM CI (sgl-project#14517)

Co-authored-by: Brayden Zhong <b8zhong@users.noreply.github.com>

* [Minor] Temporarily skipping deepep large mtp test (sgl-project#14586)

* [model-gateway] extra accumulator and tool handler in oai router (sgl-project#14587)

* [model-gateway] Fixed WASM Security Vulnerability - Execution Timeout (sgl-project#14588)

* [model-gateway] reorganize metrics, logging, and otel to its own module (sgl-project#14590)

* Refactor tuning block wise kernel and opt Qwen/Qwen3-VL-32B-Instruct-FP8 (sgl-project#14141)

* [CI]Unblock and split spec v2+dp test (sgl-project#14551)

* [Tool Call] Fix DeepSeekV32Detector skipping functions with no params in streaming mode (sgl-project#14573)

* [feat] use cachebuffer to store mm feature to speedup hash (sgl-project#14386)

* [CI] Fix unit-test-backend-8-gpu-b200 running on every /rerun-stage (sgl-project#14591)

* [model-gateway] fix WASM memory limit per module (sgl-project#14600)

* Tiny fix missing policy decision recording (sgl-project#14605)

* Super tiny remove unneeded policy flag (sgl-project#14608)

* [model-gateway] refactor otel to be more efficient (sgl-project#14604)

* Super tiny remove unused select_worker_pair (sgl-project#14609)

* [model-gateway] fix WASM unbounded request/response body read vuln (sgl-project#14612)

* [2/2] Add rope kernel in sgl-kernel  (sgl-project#14452)

* [DLLM] Add initial cuda graph support (sgl-project#14203)

* Super tiny fix unused code in router (sgl-project#14618)

* [Glm46v] Bug fix for accuracy drop and unable to launch server (sgl-project#14585)

Co-authored-by: yhyang201 <yhyang201@gmail.com>
Co-authored-by: zRzRzRzRzRzRzR <2448370773@qq.com>
Co-authored-by: Minglei Zhu <mingleizhu1122@gmail.com>

* Fix amd rope definition (sgl-project#14556)

* modify the sgl-kernel to be compatible with transformers 5.x. (sgl-project#14625)

* [Reasoning + Structured Output] make reasoning compatible with structured output (sgl-project#12551)

Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
Co-authored-by: Xinyuan Tong <xinyuantong.cs@gmail.com>

* [diffusion] feat: add support for LoRA layers in transformer_2 within LoRAPipeline (sgl-project#14606)

* chore: bump sgl-kernel version to 0.3.19 (sgl-project#14632)

* [cpu] Implement all gather/reduce for arm64 cpu (sgl-project#12527)

* [diffusion] chore: further refine output resolution adjustment logic (sgl-project#14558)

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* Fix dp-aware incompatible with service-discovery (sgl-project#14629)

* update transformers package version to 5.0.0rc0 (sgl-project#14356)

* chore: bump sgl-kernel version to 0.3.19 (sgl-project#14649)

* chore: bump SGLang version to 0.5.6.post1 (sgl-project#14651)

* [AMD] change fused rms quant interface for aiter upgrade (sgl-project#14497)

* [model-gateway] reducing cpu overhead in various of places (sgl-project#14658)

* [model-gateway] reduce cpu overhead in grpc router (sgl-project#14663)

* [model-gateway] fix WASM arbitrary file read security vol (sgl-project#14664)

* vlm: Use fa3 as the default backend for qwen3 vl (sgl-project#14634)

* [model-gateway] Optimize memory usage in HTTP router (sgl-project#14667)

* fix: use .get() when accessing strict mem-check env variable (sgl-project#14657)

* improve default glm mtp setting (sgl-project#14457)

Signed-off-by: Brayden Zhong <b8zhong@users.noreply.github.com>

* Fix cache-aware router should pick min load instead of min tenant size (sgl-project#14650)

* Bump up diffusers to latest official release version (sgl-project#14670)

* [model-gateway] add OTEL integration to grpc router (sgl-project#14671)

* [CI] Increase max-parallel to 15 for high priority PRs (sgl-project#14675)

* [HiCache] fix condition check when use decode offload (sgl-project#14489)

* [RadixTree] Optimize the Time Complexity of Node Retrieval Operation from O(n*m) to O(n) (sgl-project#13334)

Signed-off-by: CLFutureX <chenyongqyl@163.com>
Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu>

* Tiny support printing requests in bench_serving for observability (sgl-project#14652)

* Aiter fp8 kv cache (sgl-project#13147)

* [SMG]feat: implement TokenGuardBody for managing token return (sgl-project#14653)

* [NPU] chore: bump basic software version to 8.3.rc2 (sgl-project#14614)

* [CI] Unblock gb200 cutedsl test (sgl-project#14469)

* Add ffmpeg into sglang docker - required by transformers multimodal V… (sgl-project#14679)

* [Bugfix] Fix KeyError for Mistral-Large-3 rope_scaling config (sgl-project#14627)

* Tiny support sgl-router http response status code metrics (sgl-project#14689)

* [CI] Migrate Eagle 1-GPU tests to test/registered/ (sgl-project#14529)

* Revert "[Bug] fix not desired disable fused share experts caused by r… (sgl-project#14676)

* Add per-request decode tp size (sgl-project#14678)

Co-authored-by: Byron Hsu <byronhsu1230@gmail.com>

* [ci][smg] fix docker release ci and add it to pr test (sgl-project#14683)

* Tiny extract select_worker_min_load (sgl-project#14648)

* Fix dp-aware incompatible with completions and chat completions APIs (sgl-project#14647)

* [CI] Fix Llama 3.1 8B FP4 CI (sgl-project#14699)

* fix: make override DeepseekV2Model work (sgl-project#14707)

* chore: add code owners for deepseek_v2.py (sgl-project#14714)

* [CI] Move mistral large 3 basic to nightly (sgl-project#14622)

* fix the deepep 8 gpu unit test (sgl-project#14601)

* Add fuse_marlin_moe test to ci and add new ep test (sgl-project#14686)

* [Bugfix] Fix environ error in scheduler_runtime_checker_mixin.py (sgl-project#14461)

Signed-off-by: Kun(llfl) <i@imux.top>

* [Feat] Add received_time in serving_base (sgl-project#13432)

Signed-off-by: zhanghaotong <zhanghaotong.zht@antgroup.com>

* fix: prevent HuggingFace access when SGLANG_USE_MODELSCOPE is enabled (sgl-project#12039)

* [Test] Skip STANDALONE speculative decoding tests for different hidden sizes (sgl-project#14733)

* [diffusion] feat: support comparing batch perf (sgl-project#14738)

Co-authored-by: shuxiguo <shuxiguo@meituan.com>
Co-authored-by: Mick <mickjagger19@icloud.com>

* Revert "[Feat] Add received_time in serving_base" (sgl-project#14743)

* [Model] Add PaddleOCR-VL Model Support (sgl-project#12953)

Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>

* fix rope parameter initialization error caused by transformers v5.0 update (sgl-project#14745)

* [model-gateway] optimize core modules (sgl-project#14751)

* [SMG] perf: optimize tokenizer for reduced CPU and memory overhead (sgl-project#14752)

* Add FP8 Blockwise GEMM Backend Flag `--fp8-gemm-backend` (sgl-project#14379)

* fix: checking if tokenizer is in cache before downloading from HF (sgl-project#14698)

* fix: making rate limit a warning instead of error  (sgl-project#14753)

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* move multi-item scoring functions in tokenizer manager into a separate file (sgl-project#14740)

* Improve CI by trying a warmup before unit tests (sgl-project#14669)

* [Perf] Optimize radix tree for cache-aware load balancing (sgl-project#14758)

* [Feature] Add LoRA support for embedding layers (sgl-project#14177)

Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>
Co-authored-by: Beichen-Ma <bm685@cornell.edu>

* [model-gateway] release gateway 0.2.4 (sgl-project#14763)

* [ci]: Enable the new hf API (sgl-project#14687)

* Re-add the API serving timing metrics. (sgl-project#14744)

Signed-off-by: zhanghaotong <zhanghaotong.zht@antgroup.com>
Co-authored-by: zhanghaotong <zhanghaotong.zht@antgroup.com>

* fix: adding rate limit warning at verify token permission stage (sgl-project#14756)

* Disable 8-gpu-b200 runner in PR tests (sgl-project#14768)

* [fix] Fix issues for in-flight weight updates (sgl-project#14064)

Co-authored-by: 赵晨阳 <zhaochen20@outlook.com>

* [Auto Sync] Update data_parallel_controller.py, detokenizer... (20251209) (sgl-project#14759)

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

* fix: race condition between validation and download locks (sgl-project#14761)

* Fix VLM accuracy thresholds for nightly tests (sgl-project#14777)

* fix server args bug (sgl-project#14725)

* handling incomplete rope_scaling config ci after transformers upgrade (sgl-project#14784)

* fix b200 ci (sgl-project#14786)

* [RL] support weight reload for low-bit rollout (sgl-project#9650)

Co-authored-by: Hecate0821 <hec4te0821@gmail.com>
Co-authored-by: eternally-z <zzywzj@gmail.com>
Co-authored-by: Wilboludriver <wilbolu@outlook.com>
Co-authored-by: Wilbolu <81792854+Wilboludriver@users.noreply.github.com>
Co-authored-by: Ke Bao <ispobaoke@gmail.com>

* fix: add missing logic for SGLANG_USE_MODELSCOPE variable (sgl-project#14794)

* fix b200 fa4 ci (sgl-project#14788)

* [diffusion] profile: early exit when enough steps are captured to reduce the size of the trace file (sgl-project#14803)

* [GLM-4.6V] Support Pipeline Parallelism for GLM-4.6V & GLM-4.1V (sgl-project#14720)

Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>

* [diffusion] CI: Add LoRA support to diffusion server configuration and test cases (sgl-project#14697)

* Revert "fix: checking if tokenizer is in cache before downloading from HF" (sgl-project#14808)

* [diffusion] performance: refactor diffusion fuse qkv and apply to qwen-image (sgl-project#14793)

* [SMG-GO] implement a Go SGLang Model Gateway - OpenAI Compatible API Server (sgl-project#14770)

* [model-gateway] Dynamically Populate Tool Call Parser Choices (sgl-project#14807)

* Support HTTP response status code prometheus metrics (sgl-project#14710)

* Fix router keep nonzero metrics after worker is deleted (sgl-project#14819)

* Tiny fix incorrect worker removal command (sgl-project#14822)

* [NPU] bug fix for mtp and w4a8 (sgl-project#14806)

* [CI] fix UT success check in `test_eagle_infer_beta_dp_attention.py` (sgl-project#14831)

* Fix CI registry scan to only check test/registered directory (sgl-project#14812)

* [model-gateway] add anthropic message api spec (sgl-project#14834)

* [diffusion] doc: fix tiny typo in multimodal_gen/README.md (sgl-project#14830)

* [model-gateway] support customizing Prometheus duration buckets (sgl-project#14716)

* [model-gateway] support engine response http status statistics in router (sgl-project#14712)

* [CI] Reduce stage-b auto-partition from 4 to 2 (sgl-project#14769)

Co-authored-by: Liangsheng Yin <lsyincs@gmail.com>

* Apply back moe_sum_reduce for fused_marlin_moe (sgl-project#14829)

* [diffusion] parallel: pad tokens for video models under sp (sgl-project#14833)

* [diffusion] CI: use unified sampling_params for CI (sgl-project#14045)

* [Auto Sync] Update tool_chat_template_deepseekv31.jinja (20251210) (sgl-project#14837)

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Jue Wang <zjuwangjue@gmail.com>

* Revert transformers to 4.57.1 (sgl-project#14801)

* [model-gateway] Fix incompatible metric comparison in` PowerOfTwo` policy (sgl-project#14823)

* [bugfix] qwen25-VL support lora (sgl-project#14638)

* fix lora target all + csgmv backend (sgl-project#14796)

* [model-gateway] adds default implementations to RouterTrait in mod.rs (sgl-project#14841)

* [AMD] Add model to AMD nightly test (sgl-project#14442)

* Treat unittest SkipTest exception as pass instead of as failure (sgl-project#14847)

* [model-gateway] code clean up on oai router (sgl-project#14850)

* [model-gateway] fix import order in oai conversation (sgl-project#14851)

* fix fp8 gemm nightly CI (sgl-project#14844)

Co-authored-by: Brayden Zhong <b8zhong@users.noreply.github.com>

* fix: restrict cache validation behaviors to CI only (sgl-project#14849)

* Fix CUDA version handling in ci_install_deepep.sh (sgl-project#14854)

* Fix TestGLM41VPPAccuracy test flakiness (sgl-project#14848)

* Minor code style fix for dllm (sgl-project#14836)

* Enable TP for Mamba-based models (sgl-project#14811)

Signed-off-by: Roi Koren <roik@nvidia.com>

* [CI] Temp disable gb200 test (sgl-project#14865)

* Refactor Marlin MoeRunner (sgl-project#14554)

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Cheng Wan <54331508+ch-wan@users.noreply.github.com>

* [6/n] Fix `num_token_non_padded` computation in prefill (sgl-project#14313)

Co-authored-by: Cheng Wan <54331508+ch-wan@users.noreply.github.com>
Co-authored-by: Runkai Tao <rt572@physics.rutger.edu>

* Remove myself to test CI gate issue (sgl-project#14871)

* fix: creating blobs only once for publish trace retries (sgl-project#14845)

* Move and update MindSpore docs, make it appear on the online documentation (sgl-project#14861)

Co-authored-by: wangtiance <tiancew@qq.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* fix nightly vlm ci : restore original eval for requests without regex (sgl-project#14875)

* Only count limitations for previous runs that reach the test stages (sgl-project#14856)

* [CI][BUG] fix ib setup for disaggregation hicache test (sgl-project#14877)

Signed-off-by: lukotong-7 <shicanwei.scw@alibaba-inc.com>

* [Fix] Remove unused import from test_disaggregation_hicache.py (sgl-project#14880)

* fix: adding temporary bypass for nightly tests (sgl-project#14876)

* Avoid deleting entire cache for missing shards (sgl-project#14754 follow-up) (sgl-project#14853)

* Tiny add more error info for bench_serving (sgl-project#14827)

* Tiny support range ratio in GSP in bench serving (sgl-project#14828)

* [diffusion] feat: enable torch compile to eliminate GPU bubble (sgl-project#13641)

Co-authored-by: jianyingzhu <53300651@qq.com>
Co-authored-by: Jianying <53503712+jianyingzhu@users.noreply.github.com>
Co-authored-by: root <root@2u2g-spr-0417.ipp4a1.colossus.nvidia.com>
Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com>

* [NPU] adapt dsv3.2 nsa prefill context parallel (sgl-project#14541)

* [diffusion] feat: support sageattn & sageattn3 backend (sgl-project#14878)

* dsv32 multistream opt

* clean code

* delete renormalize in topk

* dsv32 use batch_matmul_transpose in MTP

* modify comment

* Support dynamic w8a8

* dsv3 support ascend_fuseep

* rebase modify

---------

Signed-off-by: Kay Yan <kay.yan@daocloud.io>
Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
Signed-off-by: Tony Lu <tonylu@linux.alibaba.com>
Signed-off-by: George Armstrong <georgea@nvidia.com>
Signed-off-by: Zhiyu Cheng <zhiyuc@nvidia.com>
Signed-off-by: Brayden Zhong <b8zhong@users.noreply.github.com>
Signed-off-by: CLFutureX <chenyongqyl@163.com>
Signed-off-by: Kun(llfl) <i@imux.top>
Signed-off-by: zhanghaotong <zhanghaotong.zht@antgroup.com>
Signed-off-by: Roi Koren <roik@nvidia.com>
Signed-off-by: lukotong-7 <shicanwei.scw@alibaba-inc.com>
Co-authored-by: Simo Lin <linsimo.mark@gmail.com>
Co-authored-by: key4ng <rukeyang@gmail.com>
Co-authored-by: YAMY <74099316+YAMY1234@users.noreply.github.com>
Co-authored-by: Sam <lsam@nvidia.com>
Co-authored-by: b8zhong <b8zhong@uwaterloo.ca>
Co-authored-by: Brayden Zhong <b8zhong@users.noreply.github.com>
Co-authored-by: Ho-Ren (Jack) Chuang <horenchuang@bytedance.com>
Co-authored-by: Kay Yan <kay.yan@daocloud.io>
Co-authored-by: Chang Su <chang.s.su@oracle.com>
Co-authored-by: Xinyuan Tong <115166877+JustinTong0323@users.noreply.github.com>
Co-authored-by: Yueming Yuan <yy28@illinois.edu>
Co-authored-by: Junrong Lin <33685709+ocss884@users.noreply.github.com>
Co-authored-by: sglang-bot <sglangbot@gmail.com>
Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com>
Co-authored-by: zyksir <zhuyikai.zyk@gmail.com>
Co-authored-by: Alison Shao <54658187+alisonshao@users.noreply.github.com>
Co-authored-by: Yinghai Lu <yinghai@thinkingmachines.ai>
Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>
Co-authored-by: Minglei Zhu <mingleizhu1122@gmail.com>
Co-authored-by: Mick <mickjagger19@icloud.com>
Co-authored-by: GMI Xiao Jin <xiao.j@gmicloud.ai>
Co-authored-by: dev <devnull@example.com>
Co-authored-by: ishandhanani <82981111+ishandhanani@users.noreply.github.com>
Co-authored-by: WenhaoZhang <42087078+niehen6174@users.noreply.github.com>
Co-authored-by: niehen6174 <niehen.6174@gmail.com>
Co-authored-by: roikoren755 <26850796+roikoren755@users.noreply.github.com>
Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com>
Co-authored-by: Yuxuan Zhang <2448370773@qq.com>
Co-authored-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
Co-authored-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com>
Co-authored-by: blahblah <28567807+Brain97@users.noreply.github.com>
Co-authored-by: shuxiguo <shuxiguo@meituan.com>
Co-authored-by: DefTruth <qiustudent_r@163.com>
Co-authored-by: Hudson Xing <77495133+harvenstar@users.noreply.github.com>
Co-authored-by: Qiaolin Yu <liin1211@outlook.com>
Co-authored-by: Tony Lu <tonylu@linux.alibaba.com>
Co-authored-by: fzyzcjy <5236035+fzyzcjy@users.noreply.github.com>
Co-authored-by: Wenyi Xu <wenyixu101@gmail.com>
Co-authored-by: sglang-bot <sglang-bot@users.noreply.github.com>
Co-authored-by: Hanming Lu <69857889+hanming-lu@users.noreply.github.com>
Co-authored-by: Vincent Zhong <207368749+vincentzed@users.noreply.github.com>
Co-authored-by: Yuhao Yang <47235274+yhyang201@users.noreply.github.com>
Co-authored-by: blzheng <beilei.zheng@intel.com>
Co-authored-by: Rain Jiang <96632942+rainj-me@users.noreply.github.com>
Co-authored-by: Trevor Morris <tmorris@nvidia.com>
Co-authored-by: Feng Su <sufeng@linux.alibaba.com>
Co-authored-by: niehen6174 <nihen6174@gmail.com>
Co-authored-by: gongwei-130 <56567052+gongwei-130@users.noreply.github.com>
Co-authored-by: harrisonlimh <97203667+harrisonlimh@users.noreply.github.com>
Co-authored-by: Lee Nau <lnau@nvidia.com>
Co-authored-by: almaslof <187766901+almaslof@users.noreply.github.com>
Co-authored-by: Rain H <2510421000@qq.com>
Co-authored-by: George Armstrong <georgea@nvidia.com>
Co-authored-by: Chen1022 <jincong.cjc@ant-intl.com>
Co-authored-by: Tiwei Bie <tiwei.btw@antgroup.com>
Co-authored-by: Jinwei Yao <jinweiy@illinois.edu>
Co-authored-by: 赵晨阳 <zhaochen20@outlook.com>
Co-authored-by: Yuan Luo <yuan.luo@hotmail.com>
Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>
Co-authored-by: Liangsheng Yin <hnyls2002@gmail.com>
Co-authored-by: AichenF <aichenf@nvidia.com>
Co-authored-by: jianyingzhu <53300651@qq.com>
Co-authored-by: Jianying <53503712+jianyingzhu@users.noreply.github.com>
Co-authored-by: Vladimir Serov <serov.vladimir.zser@gmail.com>
Co-authored-by: khalilzhk <khalilzhk@gmail.com>
Co-authored-by: Zhiyu <zhiyuc@nvidia.com>
Co-authored-by: wentx <3843588+momaek@users.noreply.github.com>
Co-authored-by: Nicholas <45984215+liusy58@users.noreply.github.com>
Co-authored-by: Binyao Jiang <byjiang1996@gmail.com>
Co-authored-by: yhyang201 <yhyang201@gmail.com>
Co-authored-by: Muqi Li <muqi1029@gmail.com>
Co-authored-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
Co-authored-by: Prozac614 <dwt614707404@163.com>
Co-authored-by: Yibo Cai <yibo.cai@arm.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: yctseng0211 <yctseng@amd.com>
Co-authored-by: Francis <38564764+ssssnow@users.noreply.github.com>
Co-authored-by: PiteXChen <44110731+CLFutureX@users.noreply.github.com>
Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu>
Co-authored-by: kk <43161300+kkHuang-amd@users.noreply.github.com>
Co-authored-by: Jimmy <29097382+jimmy-evo@users.noreply.github.com>
Co-authored-by: Even Zhou <even.y.zhou@outlook.com>
Co-authored-by: Yineng Zhang <me@zhyncs.com>
Co-authored-by: Byron Hsu <byronhsu1230@gmail.com>
Co-authored-by: kun-llfl <i@imux.top>
Co-authored-by: zhanghaotong <zhanghaotong.zht@antgroup.com>
Co-authored-by: yrk111222 <2493404415@qq.com>
Co-authored-by: yudian0504 <138860534+yudian0504@users.noreply.github.com>
Co-authored-by: Douglas Yang <dyang@college.harvard.edu>
Co-authored-by: Ethan (Yusheng) Su <yushengsu.thu@gmail.com>
Co-authored-by: Beichen-Ma <bm685@cornell.edu>
Co-authored-by: MingxuZh <109504044+MingxuZh@users.noreply.github.com>
Co-authored-by: ShawnY112358 <61113840+ShawnY112358@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: TomerBN-Nvidia <tbarnatan@nvidia.com>
Co-authored-by: Peng Zhang <aniz1905@gmail.com>
Co-authored-by: Hecate0821 <hec4te0821@gmail.com>
Co-authored-by: eternally-z <zzywzj@gmail.com>
Co-authored-by: Wilboludriver <wilbolu@outlook.com>
Co-authored-by: Wilbolu <81792854+Wilboludriver@users.noreply.github.com>
Co-authored-by: Ke Bao <ispobaoke@gmail.com>
Co-authored-by: ybyang <10629930+whybeyoung@users.noreply.github.com>
Co-authored-by: liupeng374 <liupeng374@huawei.com>
Co-authored-by: Li Jinliang <975761915@qq.com>
Co-authored-by: Liangsheng Yin <lsyincs@gmail.com>
Co-authored-by: Jue Wang <zjuwangjue@gmail.com>
Co-authored-by: Praneth Paruchuri <pranethparuchuri@gmail.com>
Co-authored-by: Siyuan Chen <41201609+SYChen123@users.noreply.github.com>
Co-authored-by: michael-amd <Michael.Zhang@amd.com>
Co-authored-by: Trang Do <200224632+trangdough@users.noreply.github.com>
Co-authored-by: Cheng Wan <54331508+ch-wan@users.noreply.github.com>
Co-authored-by: yuchengz816-bot <yuchengz816@gmail.com>
Co-authored-by: Runkai Tao <rt572@physics.rutger.edu>
Co-authored-by: Kangyan-Zhou <zky314343421@gmail.com>
Co-authored-by: Tiance Wang <wangtiance@gmail.com>
Co-authored-by: wangtiance <tiancew@qq.com>
Co-authored-by: shicanwei.scw <shicanwei.scw@alibaba-inc.com>
Co-authored-by: Shangming Cai <csmthu@gmail.com>
Co-authored-by: root <root@2u2g-spr-0417.ipp4a1.colossus.nvidia.com>
Co-authored-by: liupeng374 <782420244@qq.com>