[CI] Fix PD mooncake dependency error #6212
Conversation
Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>
This is for the 8-GPU test only (including the DeepEP test). Do we need to move the DeepEP dependencies (pretty time-consuming) into ci_install_dependency.sh?
@zhyncs I think we should keep it; it is required by many tests. But we don't have to run it for every per-commit suite, because the CI can reuse the prepared environment. I believe that's why we switched to ci_install_dependency.sh. Also, we should not move this logic into ci_install_dependency.sh, since that would bring back the time-out problem. But we should keep this script in case we restart the container and need to prepare the environment from scratch.
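The env-reuse idea above can be sketched as an idempotent install guard: the slow setup runs once per container and later CI jobs skip it. This is a minimal illustration, not the actual ci_install_dependency.sh; the marker path, function name, and messages are all hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical marker recording that the slow DeepEP setup already ran
# in this container. A restarted container loses it, so the env is
# rebuilt from scratch exactly when needed.
MARKER="/tmp/.deepep_env_ready"

install_deepep() {
    if [ -f "${MARKER}" ]; then
        # Fast path for per-commit suites: the container env is reused.
        echo "DeepEP env already prepared; skipping reinstall."
        return 0
    fi
    # The time-consuming setup (clone, build, pip install ...) would go
    # here; keeping it out of the per-commit path avoids CI time-outs.
    touch "${MARKER}"
    echo "DeepEP env prepared from scratch."
}

rm -f "${MARKER}"   # simulate a freshly restarted container for the demo
install_deepep      # first call: full setup
install_deepep      # second call: no-op, env reused
```

The same pattern works whether the guard is a marker file, a `pip show` check, or a version probe; the point is that only the first run in a container pays the cost.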
* Use device_id in dist init to reduce NCCL communicator warmup & creation overhead (sgl-project#5728)
* [fix] fix potential bumpy throughtput with deepgemm (sgl-project#5722)
* Resolves the `404 Not Found` error when running `compile_deep_gemm.py` in multi-node setups (sgl-project#5720)
* perf: update H20 fused_moe_triton kernel config to get higher throughput during prefilling (sgl-project#5716)
* we fix the non existent access of `decrypted_config_file` (sgl-project#5685)
* CI: rewrite test_vision_chunked_prefill to speedup (sgl-project#5682)
* Fuse MLA set kv cache kernel (sgl-project#5748)
* Update amd docker image to `sglang:v0.4.5.post3-rocm630`. (sgl-project#5697)
* [feature] support for roberta embedding models (sgl-project#5730)
* [fix] fix bench_one_batch_server (sgl-project#5607)
* support for the DeepSeek model by enabling streaming response parsing (sgl-project#5592)
* fix: Use `is not None` instead of `!= None` for None checks. (sgl-project#5687)
* Add Llama 4 to FA3 test (sgl-project#5509)
* [misc] more decode step log for batch_one_batch (sgl-project#5565)
* Handle JSONDecodeError while processing request data (sgl-project#5599)
* fix(srt): check if sample_indices is not None before usage. (sgl-project#5633)
* update llguidance to 0.7.11; adds StructTag (sgl-project#4870)
* Use sgl-kernel sgl_per_token_group_quant_int8 (sgl-project#4971)
* Add memory_saver check (sgl-project#4986) Signed-off-by: Kebe <mail@kebe7jun.com>
* add switch to disable open api doc (sgl-project#3744) Signed-off-by: congcongke <zhanweidu@163.com>
* Revert "fix: import vllm_rotary_embedding error when head_size not in 64, 128, 256, 512" (sgl-project#5772)
* Fix eagle test case (sgl-project#5776)
* Split local attention test from fa3 test (sgl-project#5774)
* Revert "Revert "fix: import vllm_rotary_embedding error when head_size not in 64, 128, 256, 512"" (sgl-project#5777)
* Simplify FA3 tests (sgl-project#5779)
* Revert "[fix] fix bench_one_batch_server" (sgl-project#5785)
* Revert "Use device_id in dist init to reduce NCCL communicator warmup & creation overhead" (sgl-project#5786)
* [CI] Tune threshold (sgl-project#5787)
* [CI] fix port conflicts (sgl-project#5789)
* [CI] Fix ci tests (sgl-project#5769)
* [PD]Reduce kv transfer threads (sgl-project#5791)
* [CI] Fix test case (sgl-project#5790)
* Add 8-GPU Test for Deepseek-V3 (sgl-project#5691) Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com>
* Release v0.4.6 (sgl-project#5795)
* Update nightly-test.yml (sgl-project#5797)
* [CI] Improve github summary & enable fa3 for more models (sgl-project#5796)
* [Docs] update grafana setup guide in production metrics (sgl-project#5643) Co-authored-by: NoahM <88418672+zhudianGG@users.noreply.github.com>
* [Misc] add structure logging, write to file and log tracing for SGL Router
* Improve overlap scheduling (sgl-project#5788)
* Add Cutlass MLA attention backend (sgl-project#5390)
* chore: upgrade sgl-kernel 0.1.0 (sgl-project#5690)
* Dockerfile.dev pip scikit_build_core (sgl-project#5807)
* Add a doc to fix sgl-kernel build link error in py39 with ccache (sgl-project#5809)
* Turn on overlap scheduler for multimodal models (sgl-project#5771)
* Tiny refactor DefaultModelLoader.Source (sgl-project#5482)
* [Docs] Replace lists with tables for cleanup and readability in server_arguments (sgl-project#5276)
* Revert "Tiny refactor DefaultModelLoader.Source" (sgl-project#5825)
* Feat: add support for thinking mode via chat_template_kwargs.enable_t… (sgl-project#5551) Co-authored-by: shuaills <shishuaiuoe@gmail.com> Co-authored-by: Chayenne <zhaochen20@outlook.com> Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com> Co-authored-by: Yineng Zhang <me@zhyncs.com>
* fix: fix the error where the content is None when reasoning and tool … (sgl-project#5838)
* feat: Add fused moe triton config for qwen3 moe on h100 (sgl-project#5833)
* fused moe triton tuning script support qwen3 (sgl-project#5842)
* feat: Add fused moe triton config for qwen3bf16 moe on h20 (sgl-project#5839)
* [PD] support pd fake transfer for warmup (sgl-project#5726)
* [config] qwen3moe_tune_h20 fp8 tp4 (sgl-project#5846)
* [Doc] Recover history of server_arguments.md (sgl-project#5851)
* feat: Add fused moe triton config for qwen3-30b-fp8 moe on h20 (sgl-project#5850)
* [CI] test chunked prefill more (sgl-project#5798)
* ROCm: update AITER (sgl-project#5816)
* [Feat] QWen-1M context support[1/2]: Update block sparse attention backend utils kernel (sgl-project#5847) Co-authored-by: sighingnow <sighingnow@gmail.com>
* [Fix] Missing bootstrap_port field (sgl-project#5823)
* feat: update is_fa3_default_architecture (sgl-project#5854)
* add fused moe config for qwen3moe fp8/bf16 (sgl-project#5849)
* chore: bump v0.4.6.post1 (sgl-project#5845)
* Support `max_completion_tokens` for OpenAIChatCompletions (sgl-project#5857)
* simplify fused_moe config logging (sgl-project#5801)
* [CI] tune the test order to warmup the server (sgl-project#5860)
* Cutlass MLA decode - fix dtype error (sgl-project#5868)
* cutlass 3.9 supported to improve fp8_blockwise_gemm (sgl-project#5820)
* [Feature] support auto chat template (sgl-project#4949)
* Feat: support cuda graph for LoRA (sgl-project#4115) Co-authored-by: Beichen Ma <mabeichen12@gmail.com>
* Add qwen3 30b fused moe config (sgl-project#5859)
* [Fix] Fix a bug for flashmla to run R1 model (sgl-project#5875) Co-authored-by: pengcuo <dgpengcuo@gmail.com>
* Add A800 fused moe config for qwen3 30b (sgl-project#5880)
* [Misc] add service discovery for sgl router
* [fix]: PyO3 macOS linking and consolidate on tracing for logging
* chore: update Dockerfile (sgl-project#5894)
* [Docs] Update docs for Qwen3 and Qwen3MoE (sgl-project#5836)
* [Doc] Tables instead of bulletpoints for sampling doc (sgl-project#5841)
* chore: update CODEOWNERS (sgl-project#5895)
* [FEATURE] Enhance platform compatibility for ARM (sgl-project#5746)
* [CI] Add test_function_calling.py to run_suite.py (sgl-project#5896)
* Auto set draft model path for MTP (sgl-project#5793)
* [fix] relax mem_fraction_static for h200 (sgl-project#5893) Co-authored-by: alcanerian <alcanerian@gmail.com>
* feat: support pythonic tool call and index in tool call streaming (sgl-project#5725)
* [Bugfix]: fix missing queue_time_start for requests from grammar_queue (sgl-project#5696)
* Add AMD MI300x Nightly Testing. (sgl-project#5861)
* chore: use torch 2.6 for sgl-kernel build (sgl-project#5898)
* Fix check_env script (sgl-project#5901)
* [PD] Fix Assertion failed: /DeepEP/csrc/kernels/internode.cu:483, condition: ibgda_get_state()->num_rc_per_pe >= num_channels sgl-project#134 (sgl-project#5830)
* Bump Flashinfer to 0.2.5 (sgl-project#5870) Co-authored-by: Yuhao Chen <yxckeis8@gmail.com>
* [Fix] Unload lora in HF_Runner if needed (sgl-project#5899)
* Add A800 fused moe config for qwen3 235b (sgl-project#5900)
* Add sm_120 for blackwell (sgl-project#5903)
* [Feature] add support kimi vl model (sgl-project#5383) Co-authored-by: wenju.li <wenju.li@deepctr.cn>
* support vlm benchmark profile (sgl-project#5905)
* [fix] kimi-vl test in test_vision_openai_server.py (sgl-project#5910)
* [Misc] use parallel build for cmake in sgl-kernel (sgl-project#5919)
* [qwen3] support qwen3 ep moe (sgl-project#5917) Co-authored-by: sleepcoo <sleepcoo@gmail.com>
* Add TP2 MOE benchmarks for AMD. (sgl-project#5909)
* [Feat] Scale up fa3 kernel to sm8x arch (sgl-project#5912) Co-authored-by: zhyncs <me@zhyncs.com>
* chore: bump sgl-kernel 0.1.1 (sgl-project#5932)
* chore: upgrade sgl-kernel 0.1.1 (sgl-project#5933)
* Remove unused method `calculate_num_image_tokens` from qwen2_vl.py (sgl-project#5783)
* [PP] Add pipeline parallelism (sgl-project#5724)
* Fix lora batch processing when input lora_path contains None (sgl-project#5930)
* add Thor & Spark (sgl-project#5915)
* fix: correct stream response when enable_thinking is set to false (sgl-project#5881)
* fix: update model runner (sgl-project#5934)
* chore: bump v0.4.6.post2 (sgl-project#5939)
* Support XiaomiMiMo/MiMo model inference (sgl-project#5921)
* [PD] Vectorise group_concurrent_contiguous in NumPy (sgl-project#5834) Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>
* Remove extra contiguous (sgl-project#5953)
* Update ci test and doc for MTP api change (sgl-project#5952)
* docs: Fix Qwen model typo (sgl-project#5944) Signed-off-by: JiangJiaWei1103 <waynechuang97@gmail.com>
* Optimize a pad operation to accelerate 25us (sgl-project#5945)
* Properly return error response in vertex_generate HTTP endpoint (sgl-project#5956)
* feat: add concurrency evaluation logic in mmmu benchmark (sgl-project#5782)
* Add 1 gpu perf and 2 gpu accuracy tests for AMD MI300x CI. (sgl-project#5960)
* feat: Refactor DeepSeekV3 function call (sgl-project#5908)
* Remove token in token out in Native API (sgl-project#5967)
* Support InternVL3 (sgl-project#5350) Co-authored-by: Mick <mickjagger19@icloud.com> Co-authored-by: Chayenne <zhaochen20@outlook.com>
* Support MMMU benchmark for InternVL (sgl-project#5968)
* FA3 speed up: skip len operation and get batch size directly from forward batch (sgl-project#5969) Signed-off-by: Lifu Huang <lifu.hlf@gmail.com>
* [PD] NIXL backend Prefill TP & Decode TP+DP (sgl-project#5681)
* Fix set kv cache multi-stream (sgl-project#5975)
* Overlap qk norm with two streams (sgl-project#5977)
* fix: only upgrade nccl for cu128 (sgl-project#5986)
* Fix Phi3 serving which was broke by earlier change (sgl-project#5991) Co-authored-by: Lifu Huang <lifu.hlf@gmail.com>
* [perf] H100 DeepSeek-V3 fused moe tuned config (sgl-project#5998)
* [Fix] Suppress dynamo logging when using flashinfer backend with torch compile (sgl-project#5992)
* [Minor] Fix duplicate method definitions in conversation.py (sgl-project#6012) Signed-off-by: Lifu Huang <lifu.hlf@gmail.com>
* Fix flaky issues of lora and add multi batch tests (sgl-project#5957)
* Tool Call: Add `chat_template_kwargs` documentation (sgl-project#5679)
* fix: fix broadcast_pyobj breaking VerlEngine (sgl-project#5997)
* [PD] Allow customizing reserved tokens to avoid KV cache waste (sgl-project#6002)
* Update dev container config to support live code sync and improve docker setup guide (sgl-project#6018) Signed-off-by: Lifu Huang <lifu.hlf@gmail.com>
* [PD] Optimize disaggregation ib device help info (sgl-project#5781)
* [Test] Add flashmla attention backend test (sgl-project#5587)
* Fix "Avoid computing lse in Ragged Prefill when there's no prefix match" (sgl-project#5555)
* feat: Add a unified merge_state API (sgl-project#5428)
* feat: append more comprehensive fields in messages instead of merely role and content (sgl-project#5996)
* [Security][Bug] Prevent binding to all TCP interfaces (sgl-project#5752)
* Fix prefill OOM error in the case of large page size (sgl-project#5081)
* Fix problem of large page size with chunked prefill (sgl-project#6046)
* docs: add Google Cloud Vertex AI in Adoption and Sponsorship (sgl-project#6047)
* docs: add new blog (sgl-project#6048)
* Fix not "import os" (sgl-project#6057)
* Better PD initialization (sgl-project#5751)
* fix: deepep dockerfile, use pip install deepep. (sgl-project#5885)
* [Fix] Fix and rename flashmla CI test (sgl-project#6045)
* chore: upgrade cutlass 3.9.2 (sgl-project#6004) Co-authored-by: yizhang2077 <1109276519@qq.com>
* Fix sgl-kernel build on aarch64 platforms (sgl-project#6062)
* Add DeepEP to CI PR Test (sgl-project#5655) Co-authored-by: Jinyan Chen <jinyanc@nvidia.com>
* fix custom_allreduce namespace (sgl-project#6039)
* feat: add release workflow for SGLang kernels on aarch64 (sgl-project#6010) Co-authored-by: Qiaolin-Yu <liin1211@outlook.com> Co-authored-by: Yineng Zhang <me@zhyncs.com>
* [Feature] Support for Ascend NPU backend (sgl-project#3853) Signed-off-by: Song Zhang <gepin.zs@antgroup.com> Co-authored-by: 22dimensions <waitingwind@foxmail.com>
* Fix the timeout for 8 gpu tests (sgl-project#6084)
* Hint users DeepEP normal mode is incompatible with CUDA Graph (sgl-project#5014)
* Super tiny fix doc (sgl-project#5233)
* [Doc]Fix description for dp_size argument (sgl-project#6063)
* feat(engine): add bootstrap parameters to generate methods (dynamo) (sgl-project#6075)
* [refactor] slightly tidy fp8 module (sgl-project#5993)
* Clean up fa3 test from 8 gpus (sgl-project#6105)
* Deferring 8 GPU test (sgl-project#6102)
* Update doc for MLA attention backends (sgl-project#6034)
* Clean logs for DeepSeek-V3 launching (sgl-project#6079)
* [CI]Add performance CI for VLM (sgl-project#6038) Signed-off-by: Xinyuan Tong <justinning0323@outlook.com>
* adding Triton configs for DeepSeekV3 FusedMoE kernel on Blackwell (sgl-project#6111)
* optimize pad operations in fa3 to accelarate 100+us (sgl-project#6077)
* Overlap shared expert and routed expert computations (sgl-project#5121)
* Tiny refactor ModelConfig.from_server_args (sgl-project#5219)
* Tiny refactor weight loading logic (sgl-project#5232)
* [PD] Add control to slow down a server (sgl-project#5572)
* Change AMD test threshold (sgl-project#6091)
* DeepEP normal support deepgemm-contiguous (sgl-project#5626) Co-authored-by: Yingyi Huang <yingyihuang2000@outlook.com> Co-authored-by: Cheng Wan <54331508+ch-wan@users.noreply.github.com> Co-authored-by: Xuting Zhou <xutingz@nvidia.com> Co-authored-by: ZhengHSI <zhenghsi@qq.com>
* [fix] fix pyproject.toml dependencies (sgl-project#6119)
* [Feature] Add FlashAttention3 as a backend for VisionAttention (sgl-project#5764) Co-authored-by: othame <chenzhu_912@zju.edu.cn> Co-authored-by: Mick <mickjagger19@icloud.com> Co-authored-by: Yi Zhang <1109276519@qq.com>
* [perf] dsv3 bmm fallback to bf16 (sgl-project#5662)
* [AMD] switch to custom allreduce regardless of MSCCL setting on ROCm (sgl-project#6097)
* [sgl-kernel] fix: fix cu118 compile error (sgl-project#6123) Co-authored-by: zhyncs <me@zhyncs.com>
* upgrade xgrammar to 0.1.19 (sgl-project#6129)
* Remove unecessary is_fa3_supported check (sgl-project#6112)
* chore: bump sgl-kernel 0.1.2 (sgl-project#6131)
* docs: update README (sgl-project#6132)
* [Fix] Incorrect Memory Allocation on CUDA:0 by Non-Zero CUDA Processes in TP/DP (sgl-project#5745)
* Cutlass MLA: Disable split kv due to NVIDIA/cutlass#2274 (sgl-project#6101)
* opt flashinfer mla cat (sgl-project#5822) Co-authored-by: xuyongfei.xyf <xuyongfei.xyf@antgroup.com>
* Update amd nightly concurrency. (sgl-project#6141)
* feat: add thinking_budget (sgl-project#6089)
* [Bugfix] Fix Llama4 gibberish output with long context and CUDA graph (sgl-project#6162)
* fix bug that gpu0 occupies more memory when hicache is turned on (sgl-project#5778) Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu>
* chore: bump v0.4.6.post3 (sgl-project#6165)
* KV-Cache (MHA, MLA): add missing start_layer / end_layer fields to MHATokenToKVPoolHost and MLATokenToKVPoolHost (sgl-project#6016) Co-authored-by: 继优 <jiyou.ljy@alibaba-inc.com> Co-authored-by: chus-chus <chus-chus@users.noreply.github.com> Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu>
* [fix] fix determine_n_share_experts_fusion (sgl-project#6118)
* Fix and Clean up chat-template requirement for VLM (sgl-project#6114) Signed-off-by: Xinyuan Tong <justinning0323@outlook.com>
* [Docs]Delete duplicate content (sgl-project#6146) Co-authored-by: ximing.wxm <ximing.wxm@antgroup.com>
* Revert "feat: add thinking_budget (sgl-project#6089)" (sgl-project#6181)
* Added async_encode method to Engine (sgl-project#4701)
* Fix data parallel perf regression (sgl-project#6183)
* Fix request abortion (sgl-project#6184)
* Add typo checker in pre-commit (sgl-project#6179) Co-authored-by: Brayden Zhong <b8zhong@uwaterloo.ca>
* Remove duplicate IO Struct test (sgl-project#6180) Signed-off-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>
* [PD] Add simple unit test for disaggregation feature (sgl-project#5654) Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>
* [CI] Disabled deepep tests temporarily because it takes too much time. (sgl-project#6186)
* feat: support loogle eval (sgl-project#6190)
* [fix] remove mixtral from is_fa3_default_architecture (sgl-project#6191)
* fix: handle None multimodal_inputs during merging and filtering batches in disaggregation decode mode (sgl-project#6169)
* chore: upgrade deepgemm (sgl-project#6073)
* chore: bump sgl-kernel v0.1.2.post1 (sgl-project#6195)
* chore: upgrade sgl-kernel v0.1.2.post1 (sgl-project#6196) Co-authored-by: alcanderian <alcanderian@gmail.com>
* Handle empty input string for embedding models (sgl-project#5621) Co-authored-by: Ravi Theja Desetty <ravitheja@Ravis-MacBook-Pro.local>
* doc: fix the erroneous documents and example codes about Alibaba-NLP/gme-Qwen2-VL-2B-Instruct (sgl-project#6199)
* [Docs] minor Qwen3 and reasoning parser docs fix (sgl-project#6032)
* Improve structured outputs: fix race condition, server crash, metrics and style (sgl-project#6188)
* [CI] Reorganize the 8 gpu tests (sgl-project#6192)
* Add dev-deepep docker image (sgl-project#6198)
* Replace time.time() to time.perf_counter() for benchmarking. (sgl-project#6178) Signed-off-by: Lifu Huang <lifu.hlf@gmail.com>
* Update README.md (sgl-project#6202)
* Fix release-docs.yml to not use python 3.9 (sgl-project#6204)
* Fix start_profile does not support with_stack and record_shapes (sgl-project#6043)
* [doc] add a note for --n-share-experts-fusion args (sgl-project#6154)
* Performing Vocabulary Parallelism for LM Head across Attention TP Groups (sgl-project#5558) Co-authored-by: liusy58 <liusy58@linux.alibaba.com>
* Update AMD CI docker to v0.4.6.post3-rocm630. (sgl-project#6213)
* Log if cuda graph is used & extend cuda graph capture to cuda-graph-max-bs (sgl-project#6201) Co-authored-by: SangBin Cho <rkooo567@gmail.com>
* [CI] Fix PD mooncake dependency error (sgl-project#6212) Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>
* [CI] Re-enable pd disaggregation test (sgl-project#6231) Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>
* fix some typos (sgl-project#6209) Co-authored-by: Brayden Zhong <b8zhong@uwaterloo.ca>
* [Docs] Add docs for `SGLANG_` and `SGL_` environment variables (sgl-project#6206)
* [PP] Fix init_memory_pool desync & add PP for mixtral (sgl-project#6223)
* Revert "fix some typos" (sgl-project#6244)
* chore: add hf_xet dep (sgl-project#6243)
* Update AMD nightly deps. (sgl-project#6241)
* [PD] Add support for different TP sizes per DP rank (sgl-project#5922) Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>
* Support incremental streaming of logprob/token_ids between scheduler and detokenizer (sgl-project#6225) Co-authored-by: SangBin Cho <rkooo567@gmail.com>
* fix typo (sgl-project#6248)
* Support tuning moe for llama 4 model (sgl-project#6042)
* Skip the flaky test_stateful_custom_logit_processor (sgl-project#6251)
* [Llama4] Add docs note about enable multimodal (sgl-project#6235)
* [VERL Use Case] Add torch_memory_saver into deps (sgl-project#6247)
* Fix two issues related to `--moe-dense-tp-size=1` (sgl-project#5657) Co-authored-by: liusy58 <liusy58@linux.alibaba.com> Co-authored-by: 颉沆 <xiehang.lsy@alibaba-inc.com>
* model(vlm): pixtral (sgl-project#5084)
* [misc] deep_gemm fallback to NVRTC when NVCC not found (sgl-project#6252)
* Enable MI325X AMD CI. (sgl-project#6259)
* chore: bump v0.4.6.post4 (sgl-project#6245)
* formatting fix for the rebased commit for 4.6.0_post4 Signed-off-by: Mohit Sinha <msinha@habana.ai>
* fix issues in model runner and python packages; fix for following issues:
  > vLLM dependency for xgrammar==0.1.17
  > 'Scheduler' object has no attribute 'device'
  > 'pp_proxy_tensors' unexpected arg in HPUGraphRunner
  > TODO: Add pipeline parallelism support in HPUGraphRunner
  Signed-off-by: Mohit Sinha <msinha@habana.ai>
* fix formatting in model runner Signed-off-by: Mohit Sinha <msinha@habana.ai>
* base grammar fix for the is_terminated case
  > 'OutlinesGrammar' object has no attribute 'is_terminated'
  Signed-off-by: Mohit Sinha <msinha@habana.ai>
---------
Signed-off-by: Kebe <mail@kebe7jun.com>
Signed-off-by: congcongke <zhanweidu@163.com>
Signed-off-by: JiangJiaWei1103 <waynechuang97@gmail.com>
Signed-off-by: Lifu Huang <lifu.hlf@gmail.com>
Signed-off-by: Song Zhang <gepin.zs@antgroup.com>
Signed-off-by: Xinyuan Tong <justinning0323@outlook.com>
Signed-off-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>
Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>
Signed-off-by: Mohit Sinha <msinha@habana.ai>
Co-authored-by: Wenxuan Tan <wtan45@wisc.edu>
Co-authored-by: JieXin Liang <Alcanderian@users.noreply.github.com>
Co-authored-by: Yuhong Guo <yuhong.gyh@antgroup.com>
Co-authored-by: saltyfish66 <38240284+saltyfish66@users.noreply.github.com>
Co-authored-by: vzed <207368749+vincentzed@users.noreply.github.com>
Co-authored-by: Mick <mickjagger19@icloud.com>
Co-authored-by: Ke Bao <ISPObaoke@163.com>
Co-authored-by: saienduri <saimanas.enduri@amd.com>
Co-authored-by: DavidBao <121073073+DavidBao03@users.noreply.github.com>
Co-authored-by: Frankey_8080 <32973306+Frank-Jie@users.noreply.github.com>
Co-authored-by: Stefan He <hebiaobuaa@gmail.com>
Co-authored-by: yan97ao <580776+yan97ao@users.noreply.github.com>
Co-authored-by: aoshen524 <aoshen524@gmail.com>
Co-authored-by: Michał Moskal <michal@moskal.me>
Co-authored-by: lambert0312 <lambert80.ios@gmail.com>
Co-authored-by: Kebe <mail@kebe7jun.com>
Co-authored-by: zhanweidu <zhanweidu@163.com>
Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com>
Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>
Co-authored-by: Liangsheng Yin <hnyls2002@gmail.com>
Co-authored-by: Huapeng Zhou <73010314+PopSoda2002@users.noreply.github.com>
Co-authored-by: NoahM <88418672+zhudianGG@users.noreply.github.com>
Co-authored-by: Simo Lin <linsimo.mark@gmail.com>
Co-authored-by: Trevor Morris <tmorris@nvidia.com>
Co-authored-by: Yineng Zhang <me@zhyncs.com>
Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com>
Co-authored-by: fzyzcjy <5236035+fzyzcjy@users.noreply.github.com>
Co-authored-by: Michael Yao <haifeng.yao@daocloud.io>
Co-authored-by: mlmz <54172054+minleminzui@users.noreply.github.com>
Co-authored-by: shuaills <shishuaiuoe@gmail.com>
Co-authored-by: Chayenne <zhaochen20@outlook.com>
Co-authored-by: XinyuanTong <115166877+JustinTong0323@users.noreply.github.com>
Co-authored-by: yhyang201 <47235274+yhyang201@users.noreply.github.com>
Co-authored-by: ybyang <10629930+whybeyoung@users.noreply.github.com>
Co-authored-by: JiLi <leege233@gmail.com>
Co-authored-by: HAI <hixiao@gmail.com>
Co-authored-by: PGFLMG <1106310035@qq.com>
Co-authored-by: sighingnow <sighingnow@gmail.com>
Co-authored-by: XTY <xutianyi1999@live.com>
Co-authored-by: Yi Zhang <1109276519@qq.com>
Co-authored-by: Chang Su <chang.s.su@oracle.com>
Co-authored-by: woodx <124784234+woodx9@users.noreply.github.com>
Co-authored-by: Qiaolin Yu <qy254@cornell.edu>
Co-authored-by: Beichen Ma <mabeichen12@gmail.com>
Co-authored-by: pengcuo <pengcbupt@163.com>
Co-authored-by: pengcuo <dgpengcuo@gmail.com>
Co-authored-by: Adarsh Shirawalmath <114558126+adarshxs@users.noreply.github.com>
Co-authored-by: simveit <69345428+simveit@users.noreply.github.com>
Co-authored-by: Johnny <johnnync13@gmail.com>
Co-authored-by: alcanerian <alcanerian@gmail.com>
Co-authored-by: Yuhao Chen <yxckeis8@gmail.com>
Co-authored-by: zhjunqin <zhjunqin@users.noreply.github.com>
Co-authored-by: liwenju0 <like4hub@gmail.com>
Co-authored-by: wenju.li <wenju.li@deepctr.cn>
Co-authored-by: laixin <xielx@shanghaitech.edu.cn>
Co-authored-by: sleepcoo <sleepcoo@gmail.com>
Co-authored-by: Ying Sheng <sqy1415@gmail.com>
Co-authored-by: ryang <38470282+ryang-max@users.noreply.github.com>
Co-authored-by: Yuan Luo <yuan.luo@hotmail.com>
Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>
Co-authored-by: 江家瑋 <36886416+JiangJiaWei1103@users.noreply.github.com>
Co-authored-by: KCFindstr <shimakaze@google.com>
Co-authored-by: xm:D <38322020+xiaomin-D@users.noreply.github.com>
Co-authored-by: Lifu Huang <lifu.hlf@gmail.com>
Co-authored-by: Yongtong Wu <914554688@qq.com>
Co-authored-by: Junrong Lin <33685709+ocss884@users.noreply.github.com>
Co-authored-by: shangmingc <caishangming@linux.alibaba.com>
Co-authored-by: DefTruth <31974251+DefTruth@users.noreply.github.com>
Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu>
Co-authored-by: Hank Han <54751605+HanHan009527@users.noreply.github.com>
Co-authored-by: Qiaolin Yu <liin1211@outlook.com>
Co-authored-by: Jinyan Chen <93358689+liz-badada@users.noreply.github.com>
Co-authored-by: Jinyan Chen <jinyanc@nvidia.com>
Co-authored-by: Johnny <johnnynuca14@gmail.com>
Co-authored-by: Song Zhang <70674731+botieking98@users.noreply.github.com>
Co-authored-by: 22dimensions <waitingwind@foxmail.com>
Co-authored-by: ishandhanani <82981111+ishandhanani@users.noreply.github.com>
Co-authored-by: Cheng Wan <54331508+ch-wan@users.noreply.github.com>
Co-authored-by: Minglei Zhu <mingleizhu1122@gmail.com>
Co-authored-by: lukec <118525388+sleepcoo@users.noreply.github.com>
Co-authored-by: Yingyi Huang <yingyihuang2000@outlook.com>
Co-authored-by: Xuting Zhou <xutingz@nvidia.com>
Co-authored-by: ZhengHSI <zhenghsi@qq.com>
Co-authored-by: Zhu Chen <51010608+Othame@users.noreply.github.com>
Co-authored-by: othame <chenzhu_912@zju.edu.cn>
Co-authored-by: Hubert Lu <55214931+hubertlu-tw@users.noreply.github.com>
Co-authored-by: Yixin Dong <ubospica@gmail.com>
Co-authored-by: xu-yfei <xu_yfei@qq.com>
Co-authored-by: xuyongfei.xyf <xuyongfei.xyf@antgroup.com>
Co-authored-by: thyecust <tienhoayu@gmail.com>
Co-authored-by: huangtingwei <141888744+huangtingwei9988@users.noreply.github.com>
Co-authored-by: Simon (Jiyou) Li <Simon-Li@users.noreply.github.com>
Co-authored-by: 继优 <jiyou.ljy@alibaba-inc.com>
Co-authored-by: chus-chus <chus-chus@users.noreply.github.com>
Co-authored-by: Ximingwang-09 <72070413+Ximingwang-09@users.noreply.github.com>
Co-authored-by: ximing.wxm <ximing.wxm@antgroup.com>
Co-authored-by: Steven Shimizu <shimizust@gmail.com>
Co-authored-by: applesaucethebun <113181361+applesaucethebun@users.noreply.github.com>
Co-authored-by: Brayden Zhong <b8zhong@uwaterloo.ca>
Co-authored-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>
Co-authored-by: Yusong Gao <yusong.gao@gmail.com>
Co-authored-by: alcanderian <alcanderian@gmail.com>
Co-authored-by: Ravi Theja <ravi03071991@gmail.com>
Co-authored-by: Ravi Theja Desetty <ravitheja@Ravis-MacBook-Pro.local>
Co-authored-by: liusy58 <liusy58@linux.alibaba.com>
Co-authored-by: SangBin Cho <rkooo567@gmail.com>
Co-authored-by: 颉沆 <xiehang.lsy@alibaba-inc.com>
Co-authored-by: Kiv Chen <34561254+KivenChen@users.noreply.github.com>
Motivation
Modifications
Checklist