Add sgl-kernel CI test for Blackwell (B200) #13301

Merged
Kangyan-Zhou merged 5 commits into main from feat/add-b200-kernel-tests on Nov 21, 2025

@alisonshao
Collaborator

Summary

Add a CI test for sgl-kernel on Blackwell (B200) GPUs to ensure test coverage across different GPU architectures.

Changes

  • Add an sgl-kernel-b200-test job to pr-test.yml
  • Run the job on the 4-gpu-b200 runner with the IS_BLACKWELL=1 environment variable set
  • Execute all sgl-kernel unit tests via pytest
  • Add the job to the pr-test-finish dependency list so that it gates CI
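
For illustration only, here is a minimal sketch of how test code could read the IS_BLACKWELL flag that the new job sets. The helper name `is_blackwell_ci` and the exact-match rule on "1" are assumptions made for this example; they are not taken from sgl-kernel or from this PR's diff.

```python
import os


def is_blackwell_ci(env=None):
    """Return True if the IS_BLACKWELL flag is set to "1".

    Hypothetical helper: the name and the exact-match rule are assumptions
    for illustration; sgl-kernel may read the flag differently.
    """
    # Allow a plain dict to be passed in so the check is easy to test.
    env = os.environ if env is None else env
    return env.get("IS_BLACKWELL", "0") == "1"


if __name__ == "__main__":
    # Reports whether the current process sees IS_BLACKWELL=1.
    print("Blackwell CI run:", is_blackwell_ci())
```

A test module could pass such a result to `pytest.mark.skipif` to skip cases that do not apply on Blackwell hardware.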

Notes

  • Currently uses the 4-gpu-b200 runner, as suggested by @Kangyan-Zhou
  • Can be switched to a 1-gpu-b200 runner in the future if needed, since the kernel tests require only one GPU

Fixes #13233

@gemini-code-assist
Contributor

Note

Gemini is unable to generate a summary for this pull request due to the file types involved not being currently supported.

@Fridge003
Collaborator

cc @FlamingoPg

@FlamingoPg
Collaborator

Great job! Could you please help resolve the conflicts?

github-actions bot added the documentation and sgl-kernel labels on Nov 21, 2025
Kangyan-Zhou merged commit 64480ec into main on Nov 21, 2025
13 of 19 checks passed
Kangyan-Zhou deleted the feat/add-b200-kernel-tests branch on November 21, 2025 03:02
yukavio pushed a commit to yukavio/sglang that referenced this pull request on Nov 25, 2025:
* Add sgl-kernel CI test for Blackwell (B200) (sgl-project#13301)

Labels

documentation, run-ci, sgl-kernel

Development

Successfully merging this pull request may close these issues.

[Feature] Add sgl-kernel CI test for Blackwell

5 participants
