
Add nightly test CI monitor workflow #13038

Merged

Kangyan-Zhou merged 32 commits into main from add-nightly-ci-monitor on Nov 20, 2025
Conversation

@alisonshao (Collaborator) commented Nov 11, 2025

The nightly monitor:

  • Tracks all performance metrics (throughput, latency, accuracy) over time
  • Compares with historical data from the repository
  • Detects regressions >10% automatically
  • Reports everything in GitHub Actions summary with a clean table
  • Saves all data as artifacts for analysis

Update CI monitors to support new nightly workflow structure
Support tracking of new hardware-specific nightly workflows (NVIDIA, AMD, Intel) alongside existing workflows.

Changes:

  • Add new NVIDIA job names from nightly-test-nvidia.yml
  • Add AMD job names from nightly-test-amd.yml
  • Update nightly_monitor.py to fetch from multiple workflow files instead of just nightly-test.yml
  • Maintain backward compatibility with old job names

This enables the CI monitor to track failures from all new workflows, making it safe to eventually disable the old nightly-test.yml and nightly-test-b200.yml workflows.
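For context, a minimal sketch of the multi-workflow fetch described above, assuming the standard GitHub Actions endpoint for listing workflow runs; the workflow file names come from this PR, while the helper name and parameters are illustrative rather than the actual implementation:

```python
import os

import requests

GITHUB_API = "https://api.github.com"
REPO = "sgl-project/sglang"
# Workflow files tracked by the monitor (per this PR).
NIGHTLY_WORKFLOWS = [
    "nightly-test.yml",
    "nightly-test-nvidia.yml",
    "nightly-test-amd.yml",
    "nightly-test-intel.yml",
]


def fetch_nightly_runs(token: str, per_page: int = 50) -> list[dict]:
    """Collect recent runs from every monitored nightly workflow file."""
    headers = {
        "Authorization": f"Bearer {token}",
        "Accept": "application/vnd.github+json",
    }
    runs = []
    for workflow in NIGHTLY_WORKFLOWS:
        url = f"{GITHUB_API}/repos/{REPO}/actions/workflows/{workflow}/runs"
        resp = requests.get(url, headers=headers, params={"per_page": per_page})
        resp.raise_for_status()
        runs.extend(resp.json().get("workflow_runs", []))
    return runs


if __name__ == "__main__":
    all_runs = fetch_nightly_runs(os.environ["GITHUB_TOKEN"])
    print(f"Total nightly runs fetched: {len(all_runs)}")
```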

@gemini-code-assist (Contributor)

Warning: You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

Add dedicated monitoring for nightly test runs to track performance
and accuracy regressions over time:

- New nightly_monitor.py script that analyzes nightly test workflow runs
  - Tracks job success/failure rates
  - Calculates average duration per job
  - Detects high failure rates (>30%)
  - Identifies consecutive failures
  - Generates daily trend reports

- New nightly-monitor.yml workflow that runs daily at 8 AM UTC
  - Analyzes last 7 days of nightly test runs
  - Uploads detailed statistics as artifacts
  - Reports regressions via GitHub Actions output
  - Can be triggered manually with custom date range

- Updated ci_analyzer.py to include nightly-test-8-gpu-b200 job
  in tracking list for general CI monitoring
- Add regex patterns for parsing performance metrics from logs
- Add get_job_logs() method to fetch job logs from GitHub API
- Add parse_metrics_from_logs() to extract metrics using regex
- Track performance metrics (throughput, latency, ttft, accuracy) with timestamps
- Display average metrics in report output
- Only fetch metrics from successful perf/eval jobs

This is Step 1 of incremental enhancement to add day-to-day performance
comparison and anomaly detection.
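As a rough illustration of the parse_metrics_from_logs() step listed above: the metric names are the ones tracked here, but the regex patterns and log format are assumptions, since the real job log format is not shown in this thread.

```python
import re

# Illustrative patterns; the real nightly job logs may format these metrics differently.
METRIC_PATTERNS = {
    "throughput": re.compile(r"throughput[:=]\s*([\d.]+)", re.IGNORECASE),
    "latency": re.compile(r"latency[:=]\s*([\d.]+)", re.IGNORECASE),
    "ttft": re.compile(r"ttft[:=]\s*([\d.]+)", re.IGNORECASE),
    "accuracy": re.compile(r"accuracy[:=]\s*([\d.]+)", re.IGNORECASE),
}


def parse_metrics_from_logs(log_text: str) -> dict[str, float]:
    """Extract the last reported value for each tracked metric from raw job logs."""
    metrics = {}
    for name, pattern in METRIC_PATTERNS.items():
        matches = pattern.findall(log_text)
        if matches:
            metrics[name] = float(matches[-1])  # keep the final occurrence in the log
    return metrics
```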
- Add data_repo and data_branch configuration for sglang-bot/sglang-ci-data
- Add get_historical_data_paths() to list available historical data files
- Add fetch_historical_data() to fetch and decode specific data files from repo
- Add get_recent_historical_metrics() to retrieve metrics for a job over time
- Fetch up to 14 recent historical files to compare with current metrics
- Handle base64 decoding and JSON parsing with error handling

This enables day-to-day metric comparison in the next step.
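The historical fetch presumably goes through the GitHub contents API, which returns file bodies base64-encoded. A hedged sketch, assuming a simple JSON layout in sglang-bot/sglang-ci-data (the path layout and function shape are illustrations, not the repository's actual structure):

```python
import base64
import json

import requests

DATA_REPO = "sglang-bot/sglang-ci-data"
DATA_BRANCH = "main"


def fetch_historical_data(path: str, token: str) -> dict | None:
    """Fetch one stored metrics file from the data repo and decode it."""
    url = f"https://api.github.com/repos/{DATA_REPO}/contents/{path}"
    headers = {
        "Authorization": f"Bearer {token}",
        "Accept": "application/vnd.github+json",
    }
    resp = requests.get(url, headers=headers, params={"ref": DATA_BRANCH})
    if resp.status_code != 200:
        return None  # no history yet, e.g. before the first successful run on main
    try:
        raw = base64.b64decode(resp.json()["content"])  # contents API is base64-encoded
        return json.loads(raw)
    except (KeyError, ValueError):
        return None  # tolerate malformed or truncated entries
```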
- Compare current metrics with 7-day historical average
- Calculate percentage changes and classify as stable/minor/significant
- Display metrics with change indicators and percentages in report
- Detect performance regressions exceeding 10% change
- Flag throughput decreases >10% and latency increases >10%
- Add performance_regression type to regression reports
- Display regression details with current vs 7-day average
- Generate GitHub Actions step summary with regression alerts
- Display performance metrics table with percentage changes
- Highlight regressions detected section
- JSON artifact already created by workflow for all tracked data
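Putting the comparison rules above together, a minimal sketch of the classification and regression check; the 10% thresholds come from the bullet points, while the stable/minor boundary, function names, and data shapes are assumptions:

```python
def classify_change(pct: float) -> str:
    """Bucket a percentage change; the 3% stable boundary is an assumption,
    the 10% significant boundary comes from this PR."""
    if abs(pct) < 3:
        return "stable"
    if abs(pct) < 10:
        return "minor"
    return "significant"


def compare_with_history(current: dict[str, float],
                         history: list[dict[str, float]]) -> dict[str, dict]:
    """Compare today's metrics against the average of recent (e.g. 7-day) history."""
    report = {}
    for name, value in current.items():
        past = [h[name] for h in history if name in h]
        if not past:
            continue
        avg = sum(past) / len(past)
        pct = (value - avg) / avg * 100 if avg else 0.0
        # Throughput or accuracy dropping >10%, or latency/ttft rising >10%, is a regression.
        regression = pct < -10 if name in ("throughput", "accuracy") else pct > 10
        report[name] = {
            "current": value,
            "avg_7d": avg,
            "pct_change": pct,
            "classification": classify_change(pct),
            "regression": regression,
        }
    return report
```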
alisonshao force-pushed the add-nightly-ci-monitor branch from fd68914 to 45adfaf on November 15, 2025 01:05
alisonshao and others added 2 commits November 14, 2025 17:47
The GH_PAT_FOR_NIGHTLY_CI_DATA secret may not be available in PR/branch contexts. Using the built-in GITHUB_TOKEN instead for testing purposes.
@alisonshao (Collaborator, Author) commented Nov 15, 2025

The historical comparison is not working in the current run. It will work once:

  1. Changes are merged to main
  2. The workflow runs with proper write permissions (GH_PAT_FOR_NIGHTLY_CI_DATA)
  3. The first successful run creates the nightly_monitor/ directory and stores the first data file
  4. Subsequent runs can then fetch and compare with that historical data

- Integrate nightly monitoring functionality into ci_analyzer.py with --mode flag
- Update nightly job names for Nvidia, AMD, and Intel workflows
- Remove standalone nightly_monitor.py file
- Update workflow files to use ci_analyzer.py --mode nightly
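A sketch of how the --mode switch could be wired up in ci_analyzer.py; the flag name comes from this thread, while the mode values and entry-point names are placeholders:

```python
import argparse


def run_ci_analysis() -> None:
    print("Running general CI analysis...")  # placeholder body


def run_nightly_monitor(days: int) -> None:
    print(f"Running nightly monitor over the last {days} days...")  # placeholder body


def main() -> None:
    parser = argparse.ArgumentParser(description="SGLang CI analyzer (sketch)")
    parser.add_argument(
        "--mode",
        choices=["ci", "nightly"],
        default="ci",
        help="'ci' analyzes regular CI jobs; 'nightly' runs the nightly monitor",
    )
    parser.add_argument("--days", type=int, default=7,
                        help="days of nightly runs to analyze")
    args = parser.parse_args()

    if args.mode == "nightly":
        run_nightly_monitor(days=args.days)
    else:
        run_ci_analysis()


if __name__ == "__main__":
    main()
```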
Collaborator:

Can we also merge this workflow into the existing CI monitor workflow?

Nightly monitoring is now integrated into ci_analyzer.py with a --mode flag.
- Track throughput, latency, accuracy metrics from nightly test jobs
- Parse metrics from successful nightly job logs
- Detect performance trends (>10% changes)
- Display metrics table in GitHub Actions summary with trend indicators
- Save all metrics data in JSON artifacts for analysis
The GitHub API requires higher permissions when filtering by branch.
Change default from 'main' to None to avoid 403 Forbidden errors.
The custom PAT may not be available on non-main branches.
The default GITHUB_TOKEN has Actions: read permission.
Match the main branch configuration. The PAT works on the main branch; the
issue is likely related to secret access on non-main branches.
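The 403 issue is addressed by making the branch filter optional. A hedged sketch (the query parameters follow the GitHub runs-listing API; the surrounding function is illustrative):

```python
import requests


def list_workflow_runs(repo: str, workflow: str, token: str,
                       branch: str | None = None, per_page: int = 50) -> list[dict]:
    """List runs for one workflow; only filter by branch when explicitly requested,
    since branch filtering hit 403s with the default token on non-main branches here."""
    params = {"per_page": per_page}
    if branch is not None:  # default None avoids the 403 described above
        params["branch"] = branch
    url = f"https://api.github.com/repos/{repo}/actions/workflows/{workflow}/runs"
    headers = {
        "Authorization": f"Bearer {token}",
        "Accept": "application/vnd.github+json",
    }
    resp = requests.get(url, headers=headers, params=params)
    resp.raise_for_status()
    return resp.json().get("workflow_runs", [])
```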

# Nightly workflow files to monitor
self.nightly_workflows = [
    "nightly-test.yml",
Collaborator:

Shall we deprecate nightly-test.yml in this change if this is being replaced?

… analyzer

Stop tracking old nightly jobs that have been replaced by hardware-specific workflows:
- Removed nightly-test.yml from monitored workflows
- Removed old job names (nightly-test-eval-text-models, nightly-test-perf-text-models, etc.)
- Now only tracking new NVIDIA/AMD/Intel specific job names
@Kangyan-Zhou (Collaborator) left a comment:
Fetching nightly test runs from the last 2 days...
Fetching from nightly-test.yml...
Fetched 1000 runs from nightly-test.yml
Fetching from nightly-test-nvidia.yml...
Fetched 1000 runs from nightly-test-nvidia.yml
Fetching from nightly-test-amd.yml...
Fetched 1000 runs from nightly-test-amd.yml
Fetching from nightly-test-intel.yml...
Fetched 1000 runs from nightly-test-intel.yml
Total nightly runs fetched: 4000

Do we need to fetch 4k runs from nightly tests from the last 2 days? This seems a bit off

@alisonshao (Collaborator, Author) replied:

I will change this.

Nightly tests run once per day, so fetching 1000 runs per workflow was excessive.
Now limited to fetching at most (days * 5) runs per workflow with a smaller page size.
For 2 days this means ~10 runs per workflow instead of 1000.
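A small sketch of the cap described above; the factor of 5 comes from the commit message, and the page-size choice is an assumption:

```python
def runs_to_fetch(days: int, runs_per_day_margin: int = 5) -> tuple[int, int]:
    """Nightly workflows run roughly once a day, so cap the fetch at days * 5 runs
    per workflow instead of the previous 1000; also shrink the API page size."""
    max_runs = days * runs_per_day_margin
    per_page = min(max_runs, 30)  # illustrative page-size choice
    return max_runs, per_page


# For a 2-day window this yields about 10 runs per workflow instead of 1000.
print(runs_to_fetch(2))  # -> (10, 10)
```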
Now nightly analysis will appear in the GitHub Actions summary showing:
- Overall statistics with daily trends table
- Job statistics with performance metrics
- Recent failures for each job

The daily trends table shows the date and success rate for easy tracking
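Surfacing the analysis in the Actions summary works by appending Markdown to the file pointed to by GITHUB_STEP_SUMMARY, a standard GitHub Actions mechanism; the table columns below mirror the daily-trends description and are otherwise illustrative:

```python
import os


def write_step_summary(daily_trends: list[dict]) -> None:
    """Append a daily-trends table to the GitHub Actions step summary, if available."""
    summary_path = os.environ.get("GITHUB_STEP_SUMMARY")
    if not summary_path:
        return  # not running inside GitHub Actions
    lines = ["## Nightly Test Monitor", "", "| Date | Success rate |", "| --- | --- |"]
    for day in daily_trends:
        lines.append(f"| {day['date']} | {day['success_rate']:.1f}% |")
    with open(summary_path, "a", encoding="utf-8") as f:
        f.write("\n".join(lines) + "\n")


# Example with made-up numbers:
write_step_summary([{"date": "2025-11-19", "success_rate": 92.3},
                    {"date": "2025-11-20", "success_rate": 88.0}])
```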
@Kangyan-Zhou (Collaborator) commented Nov 20, 2025

@alisonshao (Collaborator, Author) commented Nov 20, 2025:

CI Monitor run: https://github.com/sgl-project/sglang/actions/runs/19515827459/job/55867179387 @Kangyan-Zhou

[image] Is this expected?

When I use the default number of runs (1000) it works fine; I only set it to 100 earlier to test faster: https://github.com/sgl-project/sglang/actions/runs/19532471855/job/55926166602

A possible explanation is temporal sampling bias: the 100 most recent runs may not contain runs that have the specific job names the analyzer is looking for. Looking at the run numbers in the 100-run output, they appear to be much more recent (#35042, #18254, etc.), while the 1000-run output includes older runs with different run numbers. The test balance data (the timing information in the format filename='...', elapsed=..., estimated_time=...) is only present in certain types of workflow runs.

Kangyan-Zhou merged commit 5a2c703 into main on Nov 20, 2025
37 checks passed
Kangyan-Zhou deleted the add-nightly-ci-monitor branch on November 20, 2025 20:55
    sys.exit(1)
else:
    print("\n✓ No significant regressions detected")
    sys.exit(0)
@alisonshao (Collaborator, Author) commented Nov 21, 2025:

Checking the issue.

@alisonshao (Collaborator, Author):

I will change it so that when a regression is detected, it is reported instead of exiting with an error.
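A sketch of that follow-up behavior (report regressions but keep the job green); the message text and function name are illustrative, not the actual fix:

```python
import sys


def finish(regressions: list[dict]) -> None:
    """Report regressions in the job output but keep the monitor job green,
    so a performance dip does not mark the nightly monitor run itself as failed."""
    if regressions:
        print(f"\n⚠ {len(regressions)} regression(s) detected; see the summary table above")
    else:
        print("\n✓ No significant regressions detected")
    sys.exit(0)  # previously sys.exit(1) when regressions were found
```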

yukavio pushed a commit to yukavio/sglang that referenced this pull request Nov 25, 2025