Fix: fix the exception 'the memory capacity is unbalanced. Some GPUs …#5426
Fix: fix the exception 'the memory capacity is unbalanced. Some GPUs …#5426hnyls2002 merged 4 commits intosgl-project:mainfrom
Conversation
…may be occupied by other processes.'
zhyncs
left a comment
There was a problem hiding this comment.
First of all, even if we want to change it to a log, it should use a warning level instead of info. Secondly, it should only take effect in RL scenarios, the normal cases should remain unchanged.
ok, I'll change it to the warning level , and maybe I can set a flag for rl scenarios |
|
ref #5416 |
|
I have simply fixed this problem yesterday. I only use the cuda device in the RL scene, and keep the same cpu device in other scenes. I think this alarm message is still necessary. Please pull the latest main code for verification. Thank you. |
do you mean this #5416 @lambert0312 |
After my discussion with @zhaochenyang20 , we decided that we could set an environment variable to make this situation more general, https://github.com/sgl-project/sglang/pull/5427/files#r2045630379 |
34d09d5 to
4b2ff4d
Compare
…1105) if we use sglang as the rollout engine, we should export SGL_DISABLE_TP_MEMORY_INBALANCE_CHECK to avoid that the memory capacity is unbalanced, please refer to [#5426 in sglang](sgl-project/sglang#5426) # why we should export SGL_DISABLE_TP_MEMORY_INBALANCE_CHECK when using SGLang as the rollout engine in verl? 1. verl initializes a SGlangRollout module during rollout, which is used to evaluate/generate samples. 2. SGLangRollout will initialize VerlEngine, further initialize a torch. Distributed. DeviceMesh, used to support the TP. 3. DeviceMesh.init () internally checks the free video memory of all participating devices, and if the difference is too large (more than about 10%), it directly reports an error, preventing initialization failures or communication deadlock. # Why might there be inconsistent graphic memory? ## Ray Distributed Actor loads the model at different times: verl uses ray multi-process multi-gpu concurrent training, and each `WorkerDict` may be called at different times: `self.rollout = SGLangRollout(...)` different workers initialize the model at different times → different memory usage. ## Delayed initialization causes memory bias Some workers enter the model loading/infer process earlier than others, such as `generate_sequences()` or `compute_log_prob()`. The early-loaded worker video memory has been eaten by the model, and the late-loaded worker video memory is still empty → the graphic memory gap is large. ## Verl+SGLang's TP initialization goes "all device broadcast", but there is no uniform release timing SGLangRollout only needs to involve the part of the graphics card used by the rollout machine, but its VerlEngine initialization calls torch.distribut.init process group() and broadcast a bunch of weights. Result in: Non-rollout cards also participate in communication; Then initialize DeviceMesh, and the error "inconsistent memory" is reported. ## Different loading modes of FSDP/TP models also cause deviations if the following parameters are set ``` actor.fsdp_config.param_offload=True ref.fsdp_config.param_offload=True ``` Some worker parameters are on the CPU, and some parameters are shard to the GPU in advance. This also creates an asymmetric distribution of video memory. --------- Co-authored-by: ocss884 <ocss.lin@gmail.com>
…erl-project#1105) if we use sglang as the rollout engine, we should export SGL_DISABLE_TP_MEMORY_INBALANCE_CHECK to avoid that the memory capacity is unbalanced, please refer to [#5426 in sglang](sgl-project/sglang#5426) # why we should export SGL_DISABLE_TP_MEMORY_INBALANCE_CHECK when using SGLang as the rollout engine in verl? 1. verl initializes a SGlangRollout module during rollout, which is used to evaluate/generate samples. 2. SGLangRollout will initialize VerlEngine, further initialize a torch. Distributed. DeviceMesh, used to support the TP. 3. DeviceMesh.init () internally checks the free video memory of all participating devices, and if the difference is too large (more than about 10%), it directly reports an error, preventing initialization failures or communication deadlock. # Why might there be inconsistent graphic memory? ## Ray Distributed Actor loads the model at different times: verl uses ray multi-process multi-gpu concurrent training, and each `WorkerDict` may be called at different times: `self.rollout = SGLangRollout(...)` different workers initialize the model at different times → different memory usage. ## Delayed initialization causes memory bias Some workers enter the model loading/infer process earlier than others, such as `generate_sequences()` or `compute_log_prob()`. The early-loaded worker video memory has been eaten by the model, and the late-loaded worker video memory is still empty → the graphic memory gap is large. ## Verl+SGLang's TP initialization goes "all device broadcast", but there is no uniform release timing SGLangRollout only needs to involve the part of the graphics card used by the rollout machine, but its VerlEngine initialization calls torch.distribut.init process group() and broadcast a bunch of weights. Result in: Non-rollout cards also participate in communication; Then initialize DeviceMesh, and the error "inconsistent memory" is reported. ## Different loading modes of FSDP/TP models also cause deviations if the following parameters are set ``` actor.fsdp_config.param_offload=True ref.fsdp_config.param_offload=True ``` Some worker parameters are on the CPU, and some parameters are shard to the GPU in advance. This also creates an asymmetric distribution of video memory. --------- Co-authored-by: ocss884 <ocss.lin@gmail.com>
…erl-project#1105) if we use sglang as the rollout engine, we should export SGL_DISABLE_TP_MEMORY_INBALANCE_CHECK to avoid that the memory capacity is unbalanced, please refer to [#5426 in sglang](sgl-project/sglang#5426) # why we should export SGL_DISABLE_TP_MEMORY_INBALANCE_CHECK when using SGLang as the rollout engine in verl? 1. verl initializes a SGlangRollout module during rollout, which is used to evaluate/generate samples. 2. SGLangRollout will initialize VerlEngine, further initialize a torch. Distributed. DeviceMesh, used to support the TP. 3. DeviceMesh.init () internally checks the free video memory of all participating devices, and if the difference is too large (more than about 10%), it directly reports an error, preventing initialization failures or communication deadlock. # Why might there be inconsistent graphic memory? ## Ray Distributed Actor loads the model at different times: verl uses ray multi-process multi-gpu concurrent training, and each `WorkerDict` may be called at different times: `self.rollout = SGLangRollout(...)` different workers initialize the model at different times → different memory usage. ## Delayed initialization causes memory bias Some workers enter the model loading/infer process earlier than others, such as `generate_sequences()` or `compute_log_prob()`. The early-loaded worker video memory has been eaten by the model, and the late-loaded worker video memory is still empty → the graphic memory gap is large. ## Verl+SGLang's TP initialization goes "all device broadcast", but there is no uniform release timing SGLangRollout only needs to involve the part of the graphics card used by the rollout machine, but its VerlEngine initialization calls torch.distribut.init process group() and broadcast a bunch of weights. Result in: Non-rollout cards also participate in communication; Then initialize DeviceMesh, and the error "inconsistent memory" is reported. ## Different loading modes of FSDP/TP models also cause deviations if the following parameters are set ``` actor.fsdp_config.param_offload=True ref.fsdp_config.param_offload=True ``` Some worker parameters are on the CPU, and some parameters are shard to the GPU in advance. This also creates an asymmetric distribution of video memory. --------- Co-authored-by: ocss884 <ocss.lin@gmail.com>
### Checklist Before Starting - [x] Search for similar PR(s). ### What does this PR do? - [x] upgrade required sglang version to 0.4.6.post1 which suports Qwen3 - [x] fix: flush_cache was never awaited - [x] remove unused env - [x] fix: add rank num to port to avoid SGLang picking the same port when random.seed being set - [x] feat: disable SGLang memory inbalance check by default sgl-project/sglang#5426 - [x] update setup.py to avoid old version pip can not resolving deps - [x] fix: tools_kwargs length mismatch with batch #1380 > Add one-line overview of what this PR aims to achieve or accomplish. ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes > List the specific changes. ### API > Demonstrate how the API changes if any. ### Usage Example > Provide usage example(s) for easier usage. ```python # Add code snippet or script demonstrating how to use this ``` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluatuion results, etc. ### Additional Info. - **Issue Number**: Fixes issue # or discussion # if any. - **Training**: [Note which backend this PR will affect: FSDP, Megatron, both, or none] - **Inference**: [Note which backend this PR will affect: vLLM, SGLang, both, or none] ### Checklist Before Submitting - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [ ] Add `[BREAKING]` to the PR title if it breaks any API. - [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add CI test(s) if neccessary.
### Checklist Before Starting - [x] Search for similar PR(s). ### What does this PR do? - [x] upgrade required sglang version to 0.4.6.post1 which suports Qwen3 - [x] fix: flush_cache was never awaited - [x] remove unused env - [x] fix: add rank num to port to avoid SGLang picking the same port when random.seed being set - [x] feat: disable SGLang memory inbalance check by default sgl-project/sglang#5426 - [x] update setup.py to avoid old version pip can not resolving deps - [x] fix: tools_kwargs length mismatch with batch verl-project#1380 > Add one-line overview of what this PR aims to achieve or accomplish. ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes > List the specific changes. ### API > Demonstrate how the API changes if any. ### Usage Example > Provide usage example(s) for easier usage. ```python # Add code snippet or script demonstrating how to use this ``` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluatuion results, etc. ### Additional Info. - **Issue Number**: Fixes issue # or discussion # if any. - **Training**: [Note which backend this PR will affect: FSDP, Megatron, both, or none] - **Inference**: [Note which backend this PR will affect: vLLM, SGLang, both, or none] ### Checklist Before Submitting - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [ ] Add `[BREAKING]` to the PR title if it breaks any API. - [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add CI test(s) if neccessary.
* fix: update pr-test-sgl-kernel (sgl-project#5399) * kernel: support slightly faster merge_state_v2 cuda kernel (sgl-project#5381) * chore: bump sgl-kernel 0.0.9 (sgl-project#5400) * chore: upgrade sgl-kernel 0.0.9 (sgl-project#5401) * Tiny fix DeepseekScalingRotaryEmbedding always use forward_native (sgl-project#5406) * Fix bench_serving with random-ids (sgl-project#5214) * [misc] fix ci flaky case (sgl-project#5352) * [FIX] Fix concatenation error in capture_bs when open --disable-cuda-graph-padding and without MTP (sgl-project#5412) * Support dynamic connection and TP 16 (sgl-project#5351) Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com> * Fix broadcast use cuda device lead to memory capacity unbalanced (sgl-project#5416) * [PD] Fix dynamic port support and MLA buffer for Mooncake (sgl-project#5415) Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com> Co-authored-by: ybyang <ybyang7@iflytek.com> * Distinguish bootstrap key only in decode server (sgl-project#5422) * [PD] Remove unused bootstrap param and fix port table type (sgl-project#5423) * [minor] cleanup cmakelists.txt (sgl-project#5420) * bugfix: fix merge_state_v2 cuda graph (sgl-project#5419) * chore: bump sgl-kernel v0.0.9.post1 (sgl-project#5430) * fix: solve release issue (sgl-project#5434) * BLackwell cutlass mla: Add check for bad page size/block num combinations (sgl-project#5431) * feat: update model_specific_adjustment (sgl-project#5344) Co-authored-by: hebiao064 <hebiaobuaa@gmail.com> * chore: upgrade sgl-kernel 0.0.9.post1 (sgl-project#5436) * Fix ignore_eos parameter when loading a chat template (sgl-project#5264) * add attention backend supporting matrix in the doc (sgl-project#5211) Co-authored-by: Stefan He <hebiaobuaa@gmail.com> * Support BNB quantization for llama/mllama (sgl-project#5038) Co-authored-by: Yuhao Yang <yyh073@foxmail.com> * [Docs] Update start/install.md (sgl-project#5398) * [Minor] Move torch.compile patch to a better place (sgl-project#5397) * [Bug fix] need record start time in pd mode (sgl-project#5425) * Support MHA with chunked prefix cache for DeepSeek chunked prefill (sgl-project#5113) * chore: bump v0.4.5.post1 (sgl-project#5445) * Fix several minor issues in PD disaggregation (sgl-project#5444) * [doc] Update benchmark_and_profiling.md (sgl-project#5449) * Update cutlass dependency. (sgl-project#5447) * add multi-lora feature in README.md (sgl-project#5463) * Clean up imports (sgl-project#5467) * [verl] Modify the update_weights func to align with verl's resharding (sgl-project#5345) Co-authored-by: Chayenne <zhaochen20@outlook.com> * [Model Support] unsloth/Phi-4-mini bnb model (sgl-project#4982) Co-authored-by: yhyang201 <yhyang201@gmail.com> Co-authored-by: Liangsheng Yin <hnyls2002@gmail.com> Co-authored-by: Chayenne <zhaochen20@outlook.com> Co-authored-by: Yineng Zhang <me@zhyncs.com> * Update attention_backend.md: plural form (sgl-project#5489) * Add test for flash_attn_varlen_func kernel (sgl-project#5484) * Deprecate disable-mla (sgl-project#5481) * Deprecate enable-flashinfer-mla and enable-flashmla (sgl-project#5480) * Feat/support encoder model (like bert) (sgl-project#4887) * Enable local attention during decode (sgl-project#5479) * Refactor DeepSeek decoder layer branches (sgl-project#5205) * Fix a link in sgl-kernel/README.md (sgl-project#5493) * [Bug fix] use correct func path in deepseek (sgl-project#5496) Signed-off-by: Xuchun Shang <xuchun.shang@linux.alibaba.com> * Doc: fix problems of the 'Execute Notebooks / run-all-notebooks' ci caused by the unstability of deepseek-ai/DeepSeek-R1-Distill-Qwen-7B (sgl-project#5503) * [Feat] Update sgl-kernel flashinfer to latest main version (sgl-project#5500) Co-authored-by: zhyncs <me@zhyncs.com> * Fix: Incorrect parameters passed to forward_batch_generation (sgl-project#5506) (sgl-project#5511) * Fix: fix the exception 'the memory capacity is unbalanced. Some GPUs … (sgl-project#5426) Co-authored-by: ocss884 <ocss.lin@gmail.com> * [docs] Fix several consistency issues in sampling_params.md (sgl-project#5373) Signed-off-by: windsonsea <haifeng.yao@daocloud.io> Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com> * Configuration qwen2_moe.py - qkv_bias now in transformers (sgl-project#5512) * Introduce moe_dense_tp_size to fix dense layer errors in DeepSeek V3 + 4x8xH100 (sgl-project#4836) * Sgl kernel fused_moe_gate support n_shared_experts (sgl-project#5440) * chore: bump sgl-kernel 0.0.9.post2 (sgl-project#5518) * use sglang_per_token_group_quant_fp8 from sgl-kernel instead of trion kernel (sgl-project#5473) Co-authored-by: Zhang Kaihong <zhangkaihong.zkh@alibaba-inc.com> * fix kimi vl running bug after rebase main (sgl-project#5461) * fix bug of VLLM_AVAILABLE not defined (sgl-project#5497) * Avoid computing lse in Ragged Prefill when there's no prefix. (sgl-project#5476) Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com> * [Model] Adding Qwen3 and Qwen3MoE (sgl-project#4693) * fix util import (sgl-project#5542) * Revert "Avoid computing lse in Ragged Prefill when there's no prefix.… (sgl-project#5544) * chore: upgrade sgl-kernel 0.0.9.post2 (sgl-project#5540) * Fix DeepGEMM masked cannot be run on groups not being multiple or 4 (sgl-project#5340) * Make profiler output file names consistent (sgl-project#5548) * [PD] Tiny fix timeout error when generate (sgl-project#5545) * [PD] Fix no cache connect for recevier (sgl-project#5534) * feat: use flashinfer jit package (sgl-project#5547) * [PD] Remove the requirement of config file for mooncake backend (sgl-project#5460) * restruct compressed_tensors_w8a8_fp8 (sgl-project#5475) * simplify the control logic for using shared experts fusion (sgl-project#5504) * Remove one kernel in per_tensor_quant_mla_fp8 (sgl-project#5549) * Fix sampler nan check when calling top_k_top_p_sampling_from_probs (sgl-project#5546) * [PD] Support page size > 1 (sgl-project#5561) * fix hicache write back (sgl-project#5543) * Minor update for ROCm variable style (sgl-project#5562) * Fix bench_one_batch producing unnatural results for expert parallel (sgl-project#5149) * [perf] introduce deep gemm group_gemm_masked as bmm (sgl-project#5432) * [PD] Fix DeepSeek cannot be run on latest master (sgl-project#5568) * Fix BumpAllocator error when no input_ids (sgl-project#5564) * enable DeepSeek V3 shared_experts_fusion in sm90 (sgl-project#5571) * [Fix] fix outlines and xgrammar (sgl-project#4947) * [Doc]Add instruction for profiling with bench_one_batch (sgl-project#5581) * Release v0.4.5.post2 (sgl-project#5582) * Fix bench_serving fail when zero warmup requests (sgl-project#5574) * Fix DeepEP cannot run on latest master (sgl-project#5567) * Fix torch memory saver not enabled in DP scenario (sgl-project#5560) * Super tiny fix typo (sgl-project#5559) * Add document for LoRA serving (sgl-project#5521) * Tiny improve error message (sgl-project#5526) * [PD] Fix server crash when using batch requests (sgl-project#5531) * [Feat] upgrade pytorch2.6 (sgl-project#5417) * Fix enable chunked prefill for Llama4 (sgl-project#5575) * fix: use fa3 for gemma2 (sgl-project#5586) * Fix ChatCompletionMessageGenericParam to allow for None content (sgl-project#5452) * [PD] Fix large page size + chunk prefill (sgl-project#5588) * Add test config yamls for Deepseek v3 (sgl-project#5433) * [Feature] Prefill assistant response - add continue_final_message parameter (sgl-project#4226) Co-authored-by: Chayenne <zhaochen20@outlook.com> * add function call parser for DeepSeek V3 (sgl-project#5224) * smaller and non gated models for docs (sgl-project#5378) * Feat: Implement JSON Mode (response_format.type="json_object") (sgl-project#4733) Co-authored-by: Kyle Pena <kylepena@kyles-macbook-pro.turkey-marlin.ts.net> * check marlin format before attempting conversion (sgl-project#4675) * compressed_tensors: port w8a16 fp8 from vllm (sgl-project#4852) * Fix one more issue reported by torchfix (sgl-project#4859) * Add sanity check for max_running_requests (sgl-project#5016) * Correct grafana heatmap. (sgl-project#5019) * Perform Batch Tokenization. (sgl-project#5141) * Speedup shared expert weight construction by avoid cloning (sgl-project#5188) * Tiny add Engine.flush_cache API (sgl-project#5241) * [misc] remove is_cuda_available (sgl-project#5319) * Fix flush cache (sgl-project#5590) * Add Speculative Decoding Eagle3 topk > 1 (sgl-project#5318) Co-authored-by: Stefan He <hebiaobuaa@gmail.com> Co-authored-by: Yubo Wang <yubowang2019@gmail.com> * upstream hicache fixes (sgl-project#5570) * Tiny add warning when cannot recognize bool env var (sgl-project#5348) * Modify metrics service endpoint (sgl-project#3443) * Update protocol.py to fix sgl-project#4589 (sgl-project#4590) * [Feat.] Enable grafana to show metrics (sgl-project#4718) Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com> * [Fix] Enhance DP Attention for IPv6 Compatibility (sgl-project#4937) * Support o1 model on Azure (sgl-project#4980) Co-authored-by: Shan Yu <shanyu1@g.ucla.edu> * Tiny remove duplicated code (sgl-project#5021) * Tiny update error hint (sgl-project#5037) * Support PD bootstrap fields on /v1/chat/completions endpoint (sgl-project#5488) * [PD] Fix generate endpoint of min_lb for PD (sgl-project#5598) Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com> * [PD] Fix edge case and simplify large page size + chunked prefill (sgl-project#5589) * [PD] Add NIXL transfer backend (sgl-project#5477) * [PD] Support decode overlap schedule (sgl-project#5608) * [PD] Support prefill overlap + Ensure no race condition (sgl-project#5609) * Enhance GPU memory settings (sgl-project#5604) * [feature] enable pre compile jit deep_gemm (sgl-project#5580) * Clean up mem settings (sgl-project#5610) * Support aiter RMSNorm in AMD (sgl-project#5510) Co-authored-by: JieXin Liang <Alcanderian@users.noreply.github.com> * chore: bump v0.4.5.post3 (sgl-project#5611) * Remove extra copy in deepseek forward absorb (sgl-project#5578) Co-authored-by: saienduri <saimanas.enduri@amd.com> * [Doc] Fix a 404 link to llama-405b (sgl-project#5615) Signed-off-by: windsonsea <haifeng.yao@daocloud.io> * [fix] force use deepgemm in compile_deep_gemm (sgl-project#5618) * [fix] fix compile_deep_gemm missing kv_b_proj (sgl-project#5620) * fix: gemma 3 not use softcap (sgl-project#5622) * Fix FA3 DeepSeek prefill performance regression (sgl-project#5624) Co-authored-by: ispobock <ispobaoke@gmail.com> * [NFC] Remove duplicate `compressed-tensors` (sgl-project#5640) * Fix shared experts fusion error without quantization (sgl-project#5632) * [feature] Add H20 fp8_w8a8 FusedMoE config for --n-share-experts-fusion=16 (sgl-project#5641) Co-authored-by: yuethe <yuethe@tencent.com> * fix flashmla bug (sgl-project#5272) * [fix] reduce dp capture bs (sgl-project#5634) Co-authored-by: alcanerian <alcanerian@gmail.com> * Remove q concat in FA3 backend for DeepSeek decode (sgl-project#5638) * Revert "Support aiter RMSNorm in AMD" (sgl-project#5646) * fix: update bench_speculative (sgl-project#5649) * Turn on DeepGemm By Default and Update Doc (sgl-project#5628) * Fuse q_a_proj and kv_a_proj (sgl-project#5619) * Remove unnecessary `torch.full` in DeepSeek (sgl-project#5601) * [1/2] Add FP8 Blockscale MoE CUTLASS kernel for Blackwell (sgl-project#5281) * fix sgl-kernel unit tests (sgl-project#5666) * fix awq_dequantize import (sgl-project#5669) * Integrating PD disaggregation with DP attention and DeepEP (sgl-project#5435) Co-authored-by: Byron Hsu <byronhsu1230@gmail.com> * fix gemma3 unit test (sgl-project#5670) * fix torchvision::nms not exist (sgl-project#5671) * [PD] Add support for dp attention with mooncake (sgl-project#5530) Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com> * tune the threshold of gemma-2-27b-it in test_nightly_gsm8k_eval.py (sgl-project#5677) * [Doc] Fix two 404 links caused by sglang typo (sgl-project#5667) Signed-off-by: windsonsea <haifeng.yao@daocloud.io> * fix: update truss bench_serving (sgl-project#5683) * fix: only compile ApplyTokenBitmaskInplace cu124+ (sgl-project#5686) * chore: bump sgl-kernel 0.1.0 (sgl-project#5688) * vlm: enable radix cache for qwen-vl models (sgl-project#5349) Co-authored-by: Xinyuan Tong <justinning0323@outlook.com> * [BugFix] Fix combination of MTP and `--n-share-experts-fusion`with R1 (sgl-project#5707) * Fix weight loading bug for Deepseek v3+nextn (sgl-project#5684) * Add example to use sgl engine with fastapi (sgl-project#5648) Co-authored-by: Ravi Theja Desetty <ravitheja@Ravis-MacBook-Pro.local> * [Doc] Fix a link to Weilin Zhao (sgl-project#5706) Signed-off-by: windsonsea <haifeng.yao@daocloud.io> * Add MMMU benchmark results (sgl-project#4491) Co-authored-by: Ravi Theja Desetty <ravitheja@Ravis-MacBook-Pro.local> * [Model] Support `ArcticForCausalLM` architecture (Snowflake/snowflake-arctic-instruct) (sgl-project#5078) Co-authored-by: vincent-4 <vincentzhongy+githubvincent4@gmail.com> * [PD] Better logs (sgl-project#5715) * [PD] Add kvargs table and thread pool for kvcache sender of mooncake (sgl-project#5738) Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com> * [PD]: Support Muti Prefill in one node (sgl-project#5704) Co-authored-by: shuaills <shishuaiuoe@gmail.com> * Fix: deepseek forward absorb (sgl-project#5723) Co-authored-by: ispobock <ispobaoke@163.com> * Pin torch audio to 2.6.0 (sgl-project#5750) * Revert "[Model] Support `ArcticForCausalLM` architecture (Snowflake/snowflake-arctic-instruct)" (sgl-project#5754) * Disable flaky eagle tests (sgl-project#5753) * update triton 3.2.0 h200 fused moe triton config and add warning about triton fused_moe_kernel performance degradation due to different Triton versions. (sgl-project#5740) * [Docs] Update runtime/engine/readme.md (sgl-project#5737) Signed-off-by: windsonsea <haifeng.yao@daocloud.io> * Reorder loop in shared expert weight loading (sgl-project#5719) * fix: fix one more bug from merging mm_inputs (sgl-project#5718) Co-authored-by: Xinyuan Tong <justinning0323@outlook.com> Co-authored-by: XinyuanTong <115166877+JustinTong0323@users.noreply.github.com> * [Fix]: support deepseek-vl2-tiny model (sgl-project#5552) Co-authored-by: bppps <zouyu.zzx@alibaba-inc.com> * Bugfix for minicpmo vision test (sgl-project#5760) * [Minor] fix documentations (sgl-project#5756) * Add an assertion to enhance the robustness of the operator (sgl-project#5736) * fix: import vllm_rotary_embedding error when head_size not in 64, 128, 256, 512 (sgl-project#5733) * Use device_id in dist init to reduce NCCL communicator warmup & creation overhead (sgl-project#5728) * [fix] fix potential bumpy throughtput with deepgemm (sgl-project#5722) * Resolves the `404 Not Found` error when running `compile_deep_gemm.py` in multi-node setups (sgl-project#5720) * perf: update H20 fused_moe_triton kernel config to get higher throughput during prefilling (sgl-project#5716) * we fix the non existent access of `decrypted_config_file` (sgl-project#5685) * CI: rewrite test_vision_chunked_prefill to speedup (sgl-project#5682) * Fuse MLA set kv cache kernel (sgl-project#5748) * Update amd docker image to `sglang:v0.4.5.post3-rocm630`. (sgl-project#5697) * [feature] support for roberta embedding models (sgl-project#5730) * [fix] fix bench_one_batch_server (sgl-project#5607) * support for the DeepSeek model by enabling streaming response parsing (sgl-project#5592) * fix: Use `is not None` instead of `!= None` for None checks. (sgl-project#5687) * Add Llama 4 to FA3 test (sgl-project#5509) * [misc] more decode step log for batch_one_batch (sgl-project#5565) * Handle JSONDecodeError while processing request data (sgl-project#5599) * fix(srt): check if sample_indices is not None before usage. (sgl-project#5633) * update llguidance to 0.7.11; adds StructTag (sgl-project#4870) * Use sgl-kernel sgl_per_token_group_quant_int8 (sgl-project#4971) * Add memory_saver check (sgl-project#4986) Signed-off-by: Kebe <mail@kebe7jun.com> * add switch to disable open api doc (sgl-project#3744) Signed-off-by: congcongke <zhanweidu@163.com> * Revert "fix: import vllm_rotary_embedding error when head_size not in 64, 128, 256, 512" (sgl-project#5772) * Fix eagle test case (sgl-project#5776) * Split local attention test from fa3 test (sgl-project#5774) * Revert "Revert "fix: import vllm_rotary_embedding error when head_size not in 64, 128, 256, 512"" (sgl-project#5777) * Simplify FA3 tests (sgl-project#5779) * Revert "[fix] fix bench_one_batch_server" (sgl-project#5785) * Revert "Use device_id in dist init to reduce NCCL communicator warmup & creation overhead" (sgl-project#5786) * [CI] Tune threshold (sgl-project#5787) * [CI] fix port conflicts (sgl-project#5789) * [CI] Fix ci tests (sgl-project#5769) * [PD]Reduce kv transfer threads (sgl-project#5791) * [CI] Fix test case (sgl-project#5790) * Add 8-GPU Test for Deepseek-V3 (sgl-project#5691) Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com> * Release v0.4.6 (sgl-project#5795) * Update nightly-test.yml (sgl-project#5797) * [CI] Improve github summary & enable fa3 for more models (sgl-project#5796) * [Docs] update grafana setup guide in production metrics (sgl-project#5643) Co-authored-by: NoahM <88418672+zhudianGG@users.noreply.github.com> * [Misc] add structure logging, write to file and log tracing for SGL Router * Improve overlap scheduling (sgl-project#5788) * Add Cutlass MLA attention backend (sgl-project#5390) * chore: upgrade sgl-kernel 0.1.0 (sgl-project#5690) * Dockerfile.dev pip scikit_build_core (sgl-project#5807) * Add a doc to fix sgl-kernel build link error in py39 with ccache (sgl-project#5809) * Turn on overlap scheduler for multimodal models (sgl-project#5771) * Tiny refactor DefaultModelLoader.Source (sgl-project#5482) * [Docs] Replace lists with tables for cleanup and readability in server_arguments (sgl-project#5276) * Revert "Tiny refactor DefaultModelLoader.Source" (sgl-project#5825) * Feat: add support for thinking mode via chat_template_kwargs.enable_t… (sgl-project#5551) Co-authored-by: shuaills <shishuaiuoe@gmail.com> Co-authored-by: Chayenne <zhaochen20@outlook.com> Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com> Co-authored-by: Yineng Zhang <me@zhyncs.com> * fix: fix the error where the content is None when reasoning and tool … (sgl-project#5838) * feat: Add fused moe triton config for qwen3 moe on h100 (sgl-project#5833) * fused moe triton tuning script support qwen3 (sgl-project#5842) * feat: Add fused moe triton config for qwen3bf16 moe on h20 (sgl-project#5839) * [PD] support pd fake transfer for warmup (sgl-project#5726) * [config] qwen3moe_tune_h20 fp8 tp4 (sgl-project#5846) * [Doc] Recover history of server_arguments.md (sgl-project#5851) * feat: Add fused moe triton config for qwen3-30b-fp8 moe on h20 (sgl-project#5850) * [CI] test chunked prefill more (sgl-project#5798) * ROCm: update AITER (sgl-project#5816) * [Feat] QWen-1M context support[1/2]: Update block sparse attention backend utils kernel (sgl-project#5847) Co-authored-by: sighingnow <sighingnow@gmail.com> * [Fix] Missing bootstrap_port field (sgl-project#5823) * feat: update is_fa3_default_architecture (sgl-project#5854) * add fused moe config for qwen3moe fp8/bf16 (sgl-project#5849) * chore: bump v0.4.6.post1 (sgl-project#5845) * fix for hpu backend in model runner and server args Signed-off-by: Mohit Sinha <msinha@habana.ai> * rebase formatting issue Signed-off-by: Mohit Sinha <msinha@habana.ai> * [SW-228218]: Fix device mismatch in frequency penalty. Ensure tensors in BatchedFrequencyPenalizer are on the same device by moving output_ids and frequency_penalties to the device of cumulated_frequency_penalties. This resolves a RuntimeError caused by tensors on cpu and hpu:0 during logits subtraction. --------- Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com> Signed-off-by: Xuchun Shang <xuchun.shang@linux.alibaba.com> Signed-off-by: windsonsea <haifeng.yao@daocloud.io> Signed-off-by: Kebe <mail@kebe7jun.com> Signed-off-by: congcongke <zhanweidu@163.com> Signed-off-by: Mohit Sinha <msinha@habana.ai> Co-authored-by: Yineng Zhang <me@zhyncs.com> Co-authored-by: DefTruth <31974251+DefTruth@users.noreply.github.com> Co-authored-by: fzyzcjy <5236035+fzyzcjy@users.noreply.github.com> Co-authored-by: Yuhong Guo <yuhong.gyh@antgroup.com> Co-authored-by: JieXin Liang <Alcanderian@users.noreply.github.com> Co-authored-by: Zhaoyang Hao <77828610+Muuuchen@users.noreply.github.com> Co-authored-by: Yuan Luo <yuan.luo@hotmail.com> Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com> Co-authored-by: lambert0312 <lambert80.ios@gmail.com> Co-authored-by: shangmingc <caishangming@linux.alibaba.com> Co-authored-by: ybyang <ybyang7@iflytek.com> Co-authored-by: Liangsheng Yin <hnyls2002@gmail.com> Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com> Co-authored-by: Trevor Morris <tmorris@nvidia.com> Co-authored-by: hebiao064 <hebiaobuaa@gmail.com> Co-authored-by: Chang Su <chang.s.su@oracle.com> Co-authored-by: mRSun15 <3150105645@zju.edu.cn> Co-authored-by: ryang <38470282+ryang-max@users.noreply.github.com> Co-authored-by: Yuhao Yang <yyh073@foxmail.com> Co-authored-by: Michael Yao <haifeng.yao@daocloud.io> Co-authored-by: ybyang <10629930+whybeyoung@users.noreply.github.com> Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com> Co-authored-by: Cheng Wan <54331508+ch-wan@users.noreply.github.com> Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com> Co-authored-by: Elfie Guo <164945471+elfiegg@users.noreply.github.com> Co-authored-by: Ying Sheng <sqy1415@gmail.com> Co-authored-by: BearBiscuit <55008898+BearBiscuit05@users.noreply.github.com> Co-authored-by: Chayenne <zhaochen20@outlook.com> Co-authored-by: eigen <52445717+yyihuang@users.noreply.github.com> Co-authored-by: yhyang201 <yhyang201@gmail.com> Co-authored-by: Didier Durand <durand.didier@gmail.com> Co-authored-by: woodx <124784234+woodx9@users.noreply.github.com> Co-authored-by: Xuchun Shang <xuchun.shang@linux.alibaba.com> Co-authored-by: mlmz <54172054+minleminzui@users.noreply.github.com> Co-authored-by: PGFLMG <1106310035@qq.com> Co-authored-by: u4lr451 <u4lr451@gmail.com> Co-authored-by: ocss884 <ocss.lin@gmail.com> Co-authored-by: Michael Feil <63565275+michaelfeil@users.noreply.github.com> Co-authored-by: strgrb <zhangkaihong.zkh@antgroup.com> Co-authored-by: Zhang Kaihong <zhangkaihong.zkh@alibaba-inc.com> Co-authored-by: liwenju0 <like4hub@gmail.com> Co-authored-by: Wenxuan Tan <wtan45@wisc.edu> Co-authored-by: yhyang201 <47235274+yhyang201@users.noreply.github.com> Co-authored-by: Yubo Wang <yubowang2019@gmail.com> Co-authored-by: Byron Hsu <byronhsu1230@gmail.com> Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu> Co-authored-by: Zhaoyi Li <36555117+Lzy17@users.noreply.github.com> Co-authored-by: lukec <118525388+sleepcoo@users.noreply.github.com> Co-authored-by: tarinkk <129432511+tarinkk@users.noreply.github.com> Co-authored-by: AmadeusW <41280211+Amadeus-Winarto@users.noreply.github.com> Co-authored-by: Adarsh Shirawalmath <114558126+adarshxs@users.noreply.github.com> Co-authored-by: Yi Zhou <zhouyi920521@gmail.com> Co-authored-by: simveit <69345428+simveit@users.noreply.github.com> Co-authored-by: kyle-pena-kuzco <kyle.pena@kuzco.xyz> Co-authored-by: Kyle Pena <kylepena@kyles-macbook-pro.turkey-marlin.ts.net> Co-authored-by: Enrique Shockwave <33002121+qeternity@users.noreply.github.com> Co-authored-by: Juwan Yoo <ryan@tmfi.us> Co-authored-by: Brayden Zhong <b8zhong@uwaterloo.ca> Co-authored-by: mac0ne <mac0ne@users.noreply.github.com> Co-authored-by: Sundara Raman Ramachandran <sundar24295@gmail.com> Co-authored-by: Qingquan Song <ustcsqq@gmail.com> Co-authored-by: moontidef <53668275+relic-yuexi@users.noreply.github.com> Co-authored-by: Huapeng Zhou <73010314+PopSoda2002@users.noreply.github.com> Co-authored-by: Lucius <souzou@foxmail.com> Co-authored-by: Chuyue Sun <33578456+ChuyueSun@users.noreply.github.com> Co-authored-by: Shan Yu <shanyu1@g.ucla.edu> Co-authored-by: Yongtong Wu <914554688@qq.com> Co-authored-by: michael-amd <Michael.Zhang@amd.com> Co-authored-by: Ke Bao <ISPObaoke@163.com> Co-authored-by: saienduri <saimanas.enduri@amd.com> Co-authored-by: ispobock <ispobaoke@gmail.com> Co-authored-by: Connector Switch <c8ef@outlook.com> Co-authored-by: saltyfish66 <38240284+saltyfish66@users.noreply.github.com> Co-authored-by: yuethe <yuethe@tencent.com> Co-authored-by: alcanerian <alcanerian@gmail.com> Co-authored-by: HAI <hixiao@gmail.com> Co-authored-by: Mick <mickjagger19@icloud.com> Co-authored-by: Xinyuan Tong <justinning0323@outlook.com> Co-authored-by: Ravi Theja <ravi03071991@gmail.com> Co-authored-by: Ravi Theja Desetty <ravitheja@Ravis-MacBook-Pro.local> Co-authored-by: vincent-4 <vincentzhongy+githubvincent4@gmail.com> Co-authored-by: IAN <50618241+hcyz33@users.noreply.github.com> Co-authored-by: shuaills <shishuaiuoe@gmail.com> Co-authored-by: XinyuanTong <115166877+JustinTong0323@users.noreply.github.com> Co-authored-by: ZXN <44322223+bppps@users.noreply.github.com> Co-authored-by: bppps <zouyu.zzx@alibaba-inc.com> Co-authored-by: Yi Zhang <1109276519@qq.com> Co-authored-by: Kyungmin Lee <30465912+lkm2835@users.noreply.github.com> Co-authored-by: vzed <207368749+vincentzed@users.noreply.github.com> Co-authored-by: DavidBao <121073073+DavidBao03@users.noreply.github.com> Co-authored-by: Frankey_8080 <32973306+Frank-Jie@users.noreply.github.com> Co-authored-by: yan97ao <580776+yan97ao@users.noreply.github.com> Co-authored-by: aoshen524 <aoshen524@gmail.com> Co-authored-by: Michał Moskal <michal@moskal.me> Co-authored-by: Kebe <mail@kebe7jun.com> Co-authored-by: zhanweidu <zhanweidu@163.com> Co-authored-by: NoahM <88418672+zhudianGG@users.noreply.github.com> Co-authored-by: Simo Lin <linsimo.mark@gmail.com> Co-authored-by: JiLi <leege233@gmail.com> Co-authored-by: sighingnow <sighingnow@gmail.com> Co-authored-by: XTY <xutianyi1999@live.com> Co-authored-by: vikram singh shekhawat <vshekhawat@habana.ai>
* clean codes (#1219)
Signed-off-by: zhanluxianshen <zhanluxianshen@163.com>
* Update the ray debug tutorial (#1204)
## Motivation
The existing Ray tutorial is difficult to follow and doesn’t explain how
to debug across multiple breakpoints.
## Modifications
- Updated `multinode.rst`
## Checklist
- [x] Created independent `ray_debugger.rst` with step‑by‑step
instructions
* fix util reward_score/math_dapo.py notes. (#1185)
Signed-off-by: zhanluxianshen <zhanluxianshen@163.com>
* fixt: typo (#1217)
Alternatively, we should properly expand on the role of the parameter
`mapping`
* docker: update Dockerfile.sglang (#1207)
Install ray[default] to include missing components
* Update ray_debug_tutorial.rst (#1228)
* [vllm] update moe patch for megatron and fsdp (#1200)
## Motivation
This is a fix for the issue where the `weight_loader` in FusedMoe of the
vLLM code could not be used correctly during the resharding phase,
addressed in #923, #1137, and #1139 respectively. Currently, the results
of these PRs can be used together, allow both FSDP and Megatron to use
the same function, reducing code maintenance costs.
* [mcore] refactor: remove the mcore patches (#1229)
* Fix docs about config page. (#1236)
Signed-off-by: zhanluxianshen <zhanluxianshen@163.com>
* Migrate to new image with FlashInfer 0.2.2 + vLLM 0.8.3 + SGLang 0.4.5 + MCore 0.12.0 + TE 2.2 + cuDNN 9.8.0 (#1237)
As support both, we let TE to choose attention backend now.
New Image:
`whatcanyousee/verl:ngc-cu124-vllm0.8.3-sglang0.4.5-mcore0.12.0-te2.2`
* fix: validation top_p=0.7 for DAPO full (#1241)
* [misc] refactor moe bash (#1245)
* [logging] feat: Add Rollout and Validation dumps to file (#916)
Co-authored-by: Mert Unsal <mertunsal1905@gmail.com>
* [AMD] Add AMD performance tuning documentation (#1240)
* [logging] feat: Add step and epoch metrics (#1250)
Solves #1251
Right now the current global step and current epoch are not being
logged. This would be a useful feature.
* [SGLang] feat: upgrade to 0.4.5.post3 & fix ipv6 (#1203)
The ipv6 part is picked from
https://github.com/volcengine/verl/pull/1184 cc @BearBiscuit05
---------
Co-authored-by: BearBiscuit05 <xiangyongan@bytedance.com>
Co-authored-by: Gelee-Q <leege233@gmail.com>
* [proto] feat: Add bool-type index selection for DataProto (#1082)
After the last change, current DataProto cannot use bool-type index due
to hard-coded batch_size equal to idxs.shape[0].
This patch changes the new batch_size for bool-type idx to idxs.sum().
It's useful when users filter the batch with bool-type masks.
* [rollout] feat: introduce vLLM AsyncLLM to support multi-turn rollout (#1138)
### Summary
Introduce vLLM AsyncLLM to support multi-turn rollout and #385 #398 #710
### Architecture

**New Components**:
- AsyncLLMWorker: standalone vllm server instance
- FastAPI: provide OpenAI-compatible HTTP server
- AsyncLLM: async LLMEngine for online serving, for more details:
[AsyncLLM](https://github.com/vllm-project/vllm/pull/9826),
[LLMEngine](https://docs.vllm.ai/en/latest/design/arch_overview.html#llmengine)
- ExternalRayDistributedExecutor: custom executor backend manages
workers in worker group, it grabs corresponding workers by actor names
- AsyncLLManager: manages a group of vllm server
instances(AsyncLLMWorker)
- AsyncLLM lifecycle: initialization, wake_up, sleep.
- FastAPI service discovery
- ChatScheduler: schedule multiple chat completion requests with
multiple server instances
- Least requests load balance
- Sticky session with prefix caching
- Chat completion callback: tools calling
### TODO
- [x] AsyncLLM: intialization/wake_up/sleep
- [x] OpenAI API: support `/v1/chat/completions`
- [x] RayPPOTrainer integration: replace `generate_sequences` to http
call `/v1/chat/completions`
- [x] GSM8K e2e training
- [ ] Add document
---------
Co-authored-by: shengguangming <shengguangming@bytedance.com>
* [AMD] Update AMD performance tuning documentation (#1256)
Update AMD performance tuning documentation according to
@yushengsu-thu's suggestion.
1. fix git branch and link
2. fix tab
* fix: remove deprecated remove_previous_ckpt key in prime_ray_trainer.py (#1254)
deprecated remove_previous_ckpt key cause save checkpoint crash.
See: https://github.com/volcengine/verl/issues/1183
* fix: Correct sampling params setting in sglang evaluation (#1181)
This PR fixes an issue where parameters in `val_kwargs` are not
effectively passed during sglang evaluation when `do_sample=True` is
set. Additionally, since the validation data has already been repeated
in `ray_trainer`, the `n` parameter in `sampling_params` needs to be
correctly configured to prevent errors caused by dimension mismatches.
* distro: clean req packages. (#1253)
Signed-off-by: zhanluxianshen <zhanluxianshen@163.com>
* [rollout] feat: support rollout.n > 1 in hf_rollout (#1199)
Currently, the hf rollout backend only support `rollout.n == 1`, when
`rollout.n > 1` it will lead to an error
(https://github.com/volcengine/verl/issues/1134)
This PR make hf rollout support `do_sample` and `is_validate` to make it
consistent with vllm and sglang backend, and correctly support
`rollout.n > 1`.
* [bugfix] fix: add `await` for `_validate()` (#1269)
As titled.
* [profile] add profile for megatron train (#1146)
## Motivation
This is a new feature that adds the functionality of collecting profiles
during the training phase. Since the RL process repeatedly enters the
training process, by default, the profile temporarily captures the
results of the first `update_policy`. Moreover, this modification should
be seamlessly integrated into other training frameworks.
* [mcore] add offload param and opt function for magetron (#1162)
## Motivation
This is a PR that supports offload in Megatron. Currently, parameters,
gradients, and optimizers can be offloaded to the CPU when not needed. I
have successfully tested the feasibility of the function using the
memory snap tool. Further accuracy testing is still in progress.
## TODO
- [x] Accuracy testing
* [CI] feat: only test for push to main (#1271)
* [misc] add offload and profile doc, add validate in profile (#1272)
* Adding GUI-R1 to the Awesome work (#1275)
* feat: move AsyncLLM ChatCompletionScheduler to separate thread (#1274)
Move AsyncLLM ChatCompletionScheduler to separate thread to avoid making
PPOTrainer async class.
* [profile] print cuda system memory and offload actor model after init (#1118)
Co-authored-by: hiyouga <hiyouga@buaa.edu.cn>
* [Lint] fix: linting errors in all files (#1280)
This PR enables checking on all files after fixing all the errors:
```
examples/data_preprocess/geo3k.py:41:121: E501 Line too long (121 > 120)
examples/data_preprocess/multiturn.py:54:121: E501 Line too long (185 > 120)
examples/data_preprocess/multiturn.py:59:121: E501 Line too long (210 > 120)
examples/data_preprocess/multiturn.py:73:121: E501 Line too long (229 > 120)
examples/data_preprocess/multiturn.py:78:121: E501 Line too long (211 > 120)
examples/ray/tutorial.ipynb:cell 9:1:121: E501 Line too long (179 > 120)
examples/ray/tutorial.ipynb:cell 15:1:121: E501 Line too long (143 > 120)
examples/ray/tutorial.ipynb:cell 42:14:1: E402 Module level import not at top of cell
recipe/prime/prime_dp_rm.py:145:121: E501 Line too long (153 > 120)
recipe/prime/prime_dp_rm.py:156:121: E501 Line too long (137 > 120)
recipe/prime/prime_dp_rm.py:292:121: E501 Line too long (148 > 120)
recipe/r1/data_process.py:56:121: E501 Line too long (289 > 120)
recipe/r1/data_process.py:113:121: E501 Line too long (166 > 120)
recipe/r1/data_process.py:118:121: E501 Line too long (137 > 120)
recipe/r1/data_process.py:123:121: E501 Line too long (297 > 120)
recipe/r1/data_process.py:131:9: E722 Do not use bare `except`
recipe/r1/tasks/livecodebench.py:61:5: E722 Do not use bare `except`
scripts/diagnose.py:55:9: F841 Local variable `ip` is assigned to but never used
scripts/diagnose.py:165:13: B028 No explicit `stacklevel` keyword argument found
scripts/model_merger.py:42:121: E501 Line too long (184 > 120)
scripts/model_merger.py:146:13: E722 Do not use bare `except`
tests/e2e/arithmetic_sequence/model/create_model_tokenizer.py:28:121: E501 Line too long (440 > 120)
tests/gpu_utility/test_memory_buffers.py:42:5: F841 Local variable `model_named_params` is assigned to but never used
tests/gpu_utility/test_memory_buffers.py:43:5: F841 Local variable `model_copy_named_params` is assigned to but never used
tests/gpu_utility/test_memory_buffers.py:53:5: F841 Local variable `model_wrapper` is assigned to but never used
tests/model/test_transformers_ulysses.py:102:5: F841 Local variable `response_length` is assigned to but never used
tests/model/test_transformers_ulysses.py:181:5: F841 Local variable `response_length` is assigned to but never used
tests/ray/detached_worker/server.py:83:13: F841 Local variable `vpp_rank` is assigned to but never used
tests/ray/test_check_worker_alive.py:37:121: E501 Line too long (121 > 120)
tests/rollout/run_fsdp_vllm.py:22:64: F811 Redefinition of unused `ShardingStrategy` from line 20
tests/rollout/test_sglang_spmd.py:210:121: E501 Line too long (157 > 120)
tests/rollout/test_vllm_spmd.py:20:64: F811 Redefinition of unused `ShardingStrategy` from line 18
tests/sandbox/test_sandbox.py:86:121: E501 Line too long (1615 > 120)
tests/sandbox/test_sandbox.py:87:121: E501 Line too long (1596 > 120)
tests/sanity/check_license.py:22:1: E402 Module level import not at top of file
tests/sanity/check_license.py:23:1: E402 Module level import not at top of file
tests/verl/utils/dataset/test_rl_dataset.py:23:5: F841 Local variable `url` is assigned to but never used
tests/verl/utils/dataset/test_rm_dataset.py:22:5: F841 Local variable `url` is assigned to but never used
tests/verl/utils/dataset/test_rm_dataset.py:36:12: E721 Use `is` and `is not` for type comparisons, or `isinstance()` for isinstance checks
tests/verl/utils/dataset/test_sft_dataset.py:22:5: F841 Local variable `url` is assigned to but never used
tests/verl/utils/dataset/test_sft_dataset.py:50:12: E721 Use `is` and `is not` for type comparisons, or `isinstance()` for isinstance checks
tests/verl/utils/dataset/test_sft_dataset.py:75:12: E721 Use `is` and `is not` for type comparisons, or `isinstance()` for isinstance checks
verl/__init__.py:22:1: E402 Module level import not at top of file
verl/__init__.py:24:1: E402 Module level import not at top of file
verl/__init__.py:25:1: E402 Module level import not at top of file
verl/__init__.py:29:1: E402 Module level import not at top of file
verl/__init__.py:29:15: F401 `.single_controller` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/models/llama/megatron/__init__.py:16:5: F401 `.modeling_llama_megatron.ParallelLlamaForCausalLM` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/models/llama/megatron/__init__.py:18:5: F401 `.modeling_llama_megatron.ParallelLlamaForCausalLMRmPad` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/models/llama/megatron/__init__.py:20:5: F401 `.modeling_llama_megatron.ParallelLlamaForCausalLMRmPadPP` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/models/llama/megatron/__init__.py:21:5: F401 `.modeling_llama_megatron.ParallelLlamaForValueRmPad` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/models/llama/megatron/__init__.py:22:5: F401 `.modeling_llama_megatron.ParallelLlamaForValueRmPadPP` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/models/llama/megatron/__init__.py:24:5: F401 `.modeling_llama_megatron.ParallelLlamaModel` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/models/llama/megatron/checkpoint_utils/llama_loader.py:92:121: E501 Line too long (168 > 120)
verl/models/llama/megatron/checkpoint_utils/llama_loader_depracated.py:92:121: E501 Line too long (168 > 120)
verl/models/llama/megatron/checkpoint_utils/llama_loader_depracated.py:274:121: E501 Line too long (127 > 120)
verl/models/llama/megatron/checkpoint_utils/llama_saver.py:170:9: F841 Local variable `tp_rank` is assigned to but never used
verl/models/llama/megatron/checkpoint_utils/llama_saver.py:211:9: F841 Local variable `tp_rank` is assigned to but never used
verl/models/llama/megatron/checkpoint_utils/llama_saver.py:261:9: F841 Local variable `tp_rank` is assigned to but never used
verl/models/llama/megatron/layers/__init__.py:15:33: F401 `.parallel_attention.ParallelLlamaAttention` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/models/llama/megatron/layers/__init__.py:16:31: F401 `.parallel_decoder.ParallelLlamaDecoderLayer` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/models/llama/megatron/layers/__init__.py:16:58: F401 `.parallel_decoder.ParallelLlamaDecoderLayerRmPad` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/models/llama/megatron/layers/__init__.py:17:27: F401 `.parallel_mlp.ParallelLlamaMLP` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/models/llama/megatron/layers/__init__.py:18:31: F401 `.parallel_rmsnorm.ParallelLlamaRMSNorm` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/models/llama/megatron/layers/parallel_attention.py:196:121: E501 Line too long (134 > 120)
verl/models/llama/megatron/layers/parallel_attention.py:341:1: E402 Module level import not at top of file
verl/models/llama/megatron/layers/parallel_attention.py:342:1: E402 Module level import not at top of file
verl/models/llama/megatron/layers/parallel_attention.py:343:1: E402 Module level import not at top of file
verl/models/llama/megatron/layers/parallel_attention.py:366:1: E402 Module level import not at top of file
verl/models/llama/megatron/layers/parallel_attention.py:420:121: E501 Line too long (122 > 120)
verl/models/llama/megatron/layers/parallel_linear.py:82:1: E402 Module level import not at top of file
verl/models/mcore/loader.py:273:121: E501 Line too long (134 > 120)
verl/models/mcore/util.py:26:121: E501 Line too long (202 > 120)
verl/models/qwen2/megatron/__init__.py:16:5: F401 `.modeling_qwen2_megatron.ParallelQwen2ForCausalLM` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/models/qwen2/megatron/__init__.py:18:5: F401 `.modeling_qwen2_megatron.ParallelQwen2ForCausalLMRmPad` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/models/qwen2/megatron/__init__.py:20:5: F401 `.modeling_qwen2_megatron.ParallelQwen2ForCausalLMRmPadPP` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/models/qwen2/megatron/__init__.py:21:5: F401 `.modeling_qwen2_megatron.ParallelQwen2ForValueRmPad` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/models/qwen2/megatron/__init__.py:22:5: F401 `.modeling_qwen2_megatron.ParallelQwen2ForValueRmPadPP` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/models/qwen2/megatron/__init__.py:24:5: F401 `.modeling_qwen2_megatron.ParallelQwen2Model` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/models/qwen2/megatron/checkpoint_utils/qwen2_loader.py:90:121: E501 Line too long (169 > 120)
verl/models/qwen2/megatron/checkpoint_utils/qwen2_loader.py:256:121: E501 Line too long (172 > 120)
verl/models/qwen2/megatron/checkpoint_utils/qwen2_loader_depracated.py:90:121: E501 Line too long (169 > 120)
verl/models/qwen2/megatron/checkpoint_utils/qwen2_loader_depracated.py:272:121: E501 Line too long (127 > 120)
verl/models/qwen2/megatron/checkpoint_utils/qwen2_saver.py:170:9: F841 Local variable `tp_rank` is assigned to but never used
verl/models/qwen2/megatron/checkpoint_utils/qwen2_saver.py:211:9: F841 Local variable `tp_rank` is assigned to but never used
verl/models/qwen2/megatron/checkpoint_utils/qwen2_saver.py:261:9: F841 Local variable `tp_rank` is assigned to but never used
verl/models/qwen2/megatron/layers/__init__.py:15:33: F401 `.parallel_attention.ParallelQwen2Attention` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/models/qwen2/megatron/layers/__init__.py:16:31: F401 `.parallel_decoder.ParallelQwen2DecoderLayer` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/models/qwen2/megatron/layers/__init__.py:16:58: F401 `.parallel_decoder.ParallelQwen2DecoderLayerRmPad` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/models/qwen2/megatron/layers/__init__.py:17:27: F401 `.parallel_mlp.ParallelQwen2MLP` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/models/qwen2/megatron/layers/__init__.py:18:31: F401 `.parallel_rmsnorm.ParallelQwen2RMSNorm` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/models/qwen2/megatron/layers/parallel_attention.py:163:121: E501 Line too long (134 > 120)
verl/models/qwen2/megatron/layers/parallel_attention.py:282:1: E402 Module level import not at top of file
verl/models/qwen2/megatron/layers/parallel_attention.py:283:1: E402 Module level import not at top of file
verl/models/qwen2/megatron/layers/parallel_attention.py:284:1: E402 Module level import not at top of file
verl/models/qwen2/megatron/layers/parallel_attention.py:307:1: E402 Module level import not at top of file
verl/models/qwen2/megatron/layers/parallel_attention.py:361:121: E501 Line too long (122 > 120)
verl/models/qwen2/megatron/modeling_qwen2_megatron.py:630:121: E501 Line too long (130 > 120)
verl/models/transformers/llama.py:106:121: E501 Line too long (180 > 120)
verl/models/transformers/llama.py:214:121: E501 Line too long (128 > 120)
verl/models/transformers/llama.py:215:121: E501 Line too long (135 > 120)
verl/models/transformers/monkey_patch.py:145:1: E402 Module level import not at top of file
verl/models/transformers/monkey_patch.py:146:1: E402 Module level import not at top of file
verl/models/transformers/monkey_patch.py:148:1: E402 Module level import not at top of file
verl/models/transformers/monkey_patch.py:157:9: B904 Within an `except` clause, raise exceptions with `raise ... from err` or `raise ... from None` to distinguish them from errors in exception handling
verl/models/transformers/qwen2.py:215:121: E501 Line too long (128 > 120)
verl/models/transformers/qwen2.py:216:121: E501 Line too long (135 > 120)
verl/protocol.py:303:121: E501 Line too long (125 > 120)
verl/protocol.py:352:121: E501 Line too long (171 > 120)
verl/protocol.py:578:121: E501 Line too long (142 > 120)
verl/protocol.py:580:121: E501 Line too long (150 > 120)
verl/protocol.py:583:121: E501 Line too long (167 > 120)
verl/protocol.py:715:1: E402 Module level import not at top of file
verl/protocol.py:725:121: E501 Line too long (121 > 120)
verl/protocol.py:766:1: E402 Module level import not at top of file
verl/protocol.py:768:1: E402 Module level import not at top of file
verl/single_controller/__init__.py:23:1: E402 Module level import not at top of file
verl/single_controller/__init__.py:24:1: E402 Module level import not at top of file
verl/single_controller/base/decorator.py:149:16: E721 Use `is` and `is not` for type comparisons, or `isinstance()` for isinstance checks
verl/single_controller/base/decorator.py:198:121: E501 Line too long (134 > 120)
verl/single_controller/base/decorator.py:310:12: E721 Use `is` and `is not` for type comparisons, or `isinstance()` for isinstance checks
verl/single_controller/base/worker.py:137:121: E501 Line too long (131 > 120)
verl/single_controller/base/worker_group.py:89:33: G003 Logging statement uses `+`
verl/single_controller/base/worker_group.py:202:21: B904 Within an `except` clause, raise exceptions with `raise ... from err` or `raise ... from None` to distinguish them from errors in exception handling
verl/single_controller/ray/__init__.py:15:19: F401 `.base.RayClassWithInitArgs` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/single_controller/ray/__init__.py:15:41: F401 `.base.RayResourcePool` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/single_controller/ray/__init__.py:15:58: F401 `.base.RayWorkerGroup` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/single_controller/ray/__init__.py:15:74: F401 `.base.create_colocated_worker_cls` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/third_party/sglang/parallel_state.py:135:5: F841 Local variable `rank` is assigned to but never used
verl/third_party/vllm/__init__.py:40:40: F401 `.vllm_v_0_6_3.llm.LLMEngine` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/third_party/vllm/__init__.py:45:22: F401 `vllm.LLM` imported but unused
verl/third_party/vllm/__init__.py:46:34: F401 `vllm.distributed.parallel_state` imported but unused
verl/third_party/vllm/__init__.py:50:121: E501 Line too long (141 > 120)
verl/third_party/vllm/vllm_v_0_5_4/dtensor_weight_loaders.py:189:1: E402 Module level import not at top of file
verl/third_party/vllm/vllm_v_0_5_4/llm.py:136:121: E501 Line too long (132 > 120)
verl/third_party/vllm/vllm_v_0_5_4/llm.py:196:121: E501 Line too long (161 > 120)
verl/third_party/vllm/vllm_v_0_5_4/megatron_weight_loaders.py:174:5: F811 Redefinition of unused `llama_megatron_core_te_weight_loader` from line 90
verl/third_party/vllm/vllm_v_0_5_4/megatron_weight_loaders.py:205:5: F811 Redefinition of unused `llama_megatron_core_weight_loader` from line 121
verl/third_party/vllm/vllm_v_0_5_4/megatron_weight_loaders.py:254:121: E501 Line too long (150 > 120)
verl/third_party/vllm/vllm_v_0_5_4/model_loader.py:36:21: F811 Redefinition of unused `LoadConfig` from line 24
verl/third_party/vllm/vllm_v_0_5_4/model_loader.py:36:45: F811 Redefinition of unused `ModelConfig` from line 26
verl/third_party/vllm/vllm_v_0_5_4/model_loader.py:323:1: E402 Module level import not at top of file
verl/third_party/vllm/vllm_v_0_5_4/parallel_state.py:127:5: F841 Local variable `rank` is assigned to but never used
verl/third_party/vllm/vllm_v_0_5_4/parallel_state.py:245:5: F841 Local variable `rank` is assigned to but never used
verl/third_party/vllm/vllm_v_0_5_4/spmd_gpu_executor.py:147:121: E501 Line too long (144 > 120)
verl/third_party/vllm/vllm_v_0_5_4/spmd_gpu_executor.py:152:121: E501 Line too long (143 > 120)
verl/third_party/vllm/vllm_v_0_5_4/spmd_gpu_executor.py:232:5: F841 Local variable `port` is assigned to but never used
verl/third_party/vllm/vllm_v_0_5_4/worker.py:220:121: E501 Line too long (127 > 120)
verl/third_party/vllm/vllm_v_0_6_3/config.py:46:92: B026 Star-arg unpacking after a keyword argument is strongly discouraged
verl/third_party/vllm/vllm_v_0_6_3/dtensor_weight_loaders.py:225:1: E402 Module level import not at top of file
verl/third_party/vllm/vllm_v_0_6_3/llm.py:141:121: E501 Line too long (132 > 120)
verl/third_party/vllm/vllm_v_0_6_3/llm.py:169:121: E501 Line too long (161 > 120)
verl/third_party/vllm/vllm_v_0_6_3/llm_engine_sp.py:52:24: F811 Redefinition of unused `EngineArgs` from line 35
verl/third_party/vllm/vllm_v_0_6_3/llm_engine_sp.py:53:21: F811 Redefinition of unused `LoadConfig` from line 25
verl/third_party/vllm/vllm_v_0_6_3/llm_engine_sp.py:53:33: F811 Redefinition of unused `ModelConfig` from line 27
verl/third_party/vllm/vllm_v_0_6_3/llm_engine_sp.py:354:9: F841 Local variable `distributed_executor_backend` is assigned to but never used
verl/third_party/vllm/vllm_v_0_6_3/llm_engine_sp.py:360:121: E501 Line too long (152 > 120)
verl/third_party/vllm/vllm_v_0_6_3/megatron_weight_loaders.py:199:5: F841 Local variable `params_mapping` is assigned to but never used
verl/third_party/vllm/vllm_v_0_6_3/megatron_weight_loaders.py:229:121: E501 Line too long (150 > 120)
verl/third_party/vllm/vllm_v_0_6_3/model_loader.py:28:21: F811 Redefinition of unused `LoadConfig` from line 22
verl/third_party/vllm/vllm_v_0_6_3/model_loader.py:28:45: F811 Redefinition of unused `ModelConfig` from line 22
verl/third_party/vllm/vllm_v_0_6_3/model_loader.py:312:1: E402 Module level import not at top of file
verl/third_party/vllm/vllm_v_0_6_3/model_runner.py:44:21: F811 Redefinition of unused `LoadConfig` from line 27
verl/third_party/vllm/vllm_v_0_6_3/model_runner.py:44:33: F811 Redefinition of unused `ModelConfig` from line 29
verl/third_party/vllm/vllm_v_0_6_3/parallel_state.py:129:5: F841 Local variable `rank` is assigned to but never used
verl/third_party/vllm/vllm_v_0_6_3/parallel_state.py:247:5: F841 Local variable `rank` is assigned to but never used
verl/third_party/vllm/vllm_v_0_6_3/spmd_gpu_executor.py:147:121: E501 Line too long (144 > 120)
verl/third_party/vllm/vllm_v_0_6_3/spmd_gpu_executor.py:152:121: E501 Line too long (143 > 120)
verl/third_party/vllm/vllm_v_0_6_3/spmd_gpu_executor.py:232:5: F841 Local variable `port` is assigned to but never used
verl/third_party/vllm/vllm_v_0_6_3/worker.py:217:121: E501 Line too long (127 > 120)
verl/trainer/fsdp_sft_trainer.py:298:121: E501 Line too long (158 > 120)
verl/trainer/fsdp_sft_trainer.py:501:121: E501 Line too long (121 > 120)
verl/trainer/fsdp_sft_trainer.py:550:1: E402 Module level import not at top of file
verl/trainer/fsdp_sft_trainer.py:551:1: E402 Module level import not at top of file
verl/trainer/fsdp_sft_trainer.py:553:1: E402 Module level import not at top of file
verl/trainer/fsdp_sft_trainer.py:553:43: F811 Redefinition of unused `FSDPSFTTrainer` from line 82
verl/trainer/fsdp_sft_trainer.py:554:1: E402 Module level import not at top of file
verl/utils/__init__.py:16:24: F401 `.tokenizer.hf_processor` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/utils/__init__.py:16:38: F401 `.tokenizer.hf_tokenizer` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/utils/checkpoint/checkpoint_manager.py:48:37: B006 Do not use mutable data structures for argument defaults
verl/utils/checkpoint/fsdp_checkpoint_manager.py:51:37: B006 Do not use mutable data structures for argument defaults
verl/utils/checkpoint/fsdp_checkpoint_manager.py:56:13: B028 No explicit `stacklevel` keyword argument found
verl/utils/checkpoint/fsdp_checkpoint_manager.py:81:121: E501 Line too long (121 > 120)
verl/utils/checkpoint/fsdp_checkpoint_manager.py:98:121: E501 Line too long (124 > 120)
verl/utils/checkpoint/megatron_checkpoint_manager.py:64:37: B006 Do not use mutable data structures for argument defaults
verl/utils/checkpoint/megatron_checkpoint_manager.py:219:121: E501 Line too long (124 > 120)
verl/utils/dataset/__init__.py:15:25: F401 `.rl_dataset.RLHFDataset` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/utils/dataset/__init__.py:16:25: F401 `.rm_dataset.RMDataset` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/utils/dataset/__init__.py:17:26: F401 `.sft_dataset.SFTDataset` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/utils/dataset/multiturn_sft_dataset.py:96:9: F841 Local variable `current_length` is assigned to but never used
verl/utils/dataset/sft_dataset.py:95:79: B023 Function definition does not bind loop variable `key`
verl/utils/dataset/sft_dataset.py:103:83: B023 Function definition does not bind loop variable `key`
verl/utils/debug/__init__.py:15:26: F401 `.performance.GPUMemoryLogger` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/utils/debug/__init__.py:15:43: F401 `.performance.log_gpu_memory_usage` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/utils/debug/performance.py:68:121: E501 Line too long (127 > 120)
verl/utils/debug/performance.py:71:121: E501 Line too long (126 > 120)
verl/utils/debug/profile.py:15:1: I001 [*] Import block is un-sorted or un-formatted
verl/utils/debug/profile.py:19:15: UP039 [*] Unnecessary parentheses after class definition
verl/utils/debug/profile.py:50:23: F541 [*] f-string without any placeholders
verl/utils/debug/profile.py:52:49: F541 [*] f-string without any placeholders
verl/utils/debug/profile.py:53:47: F541 [*] f-string without any placeholders
verl/utils/debug/profile.py:54:67: F541 [*] f-string without any placeholders
verl/utils/debug/profile.py:54:121: E501 Line too long (122 > 120)
verl/utils/flops_counter.py:175:121: E501 Line too long (124 > 120)
verl/utils/hdfs_io.py:135:32: G004 Logging statement uses f-string
verl/utils/import_utils.py:78:9: B904 Within an `except` clause, raise exceptions with `raise ... from err` or `raise ... from None` to distinguish them from errors in exception handling
verl/utils/logger/aggregate_logger.py:46:121: E501 Line too long (131 > 120)
verl/utils/logger/aggregate_logger.py:64:41: G004 Logging statement uses f-string
verl/utils/megatron/tensor_parallel.py:152:121: E501 Line too long (123 > 120)
verl/utils/megatron_utils.py:17:1: I001 [*] Import block is un-sorted or un-formatted
verl/utils/megatron_utils.py:22:20: F401 [*] `torch.nn` imported but unused
verl/utils/megatron_utils.py:34:38: F401 [*] `verl.utils.memory_buffer.build_memory_reference_from_module` imported but unused
verl/utils/megatron_utils.py:332:30: B009 [*] Do not call `getattr` with a constant attribute value. It is not any safer than normal property access.
verl/utils/megatron_utils.py:366:27: B009 [*] Do not call `getattr` with a constant attribute value. It is not any safer than normal property access.
verl/utils/model.py:464:121: E501 Line too long (124 > 120)
verl/utils/rendezvous/ray_backend.py:39:25: G004 Logging statement uses f-string
verl/utils/rendezvous/ray_backend.py:41:22: G004 Logging statement uses f-string
verl/utils/rendezvous/ray_backend.py:63:30: G004 Logging statement uses f-string
verl/utils/rendezvous/ray_backend.py:65:30: G004 Logging statement uses f-string
verl/utils/rendezvous/ray_backend.py:72:26: G004 Logging statement uses f-string
verl/utils/reward_score/gsm8k.py:47:121: E501 Line too long (201 > 120)
verl/utils/reward_score/math.py:213:121: E501 Line too long (142 > 120)
verl/utils/reward_score/prime_code/__init__.py:16:8: F401 `re` imported but unused
verl/utils/reward_score/prime_code/testing_util.py:131:121: E501 Line too long (688 > 120)
verl/utils/reward_score/prime_code/testing_util.py:168:13: E722 Do not use bare `except`
verl/utils/reward_score/prime_code/testing_util.py:222:9: E722 Do not use bare `except`
verl/utils/reward_score/prime_code/testing_util.py:254:13: E722 Do not use bare `except`
verl/utils/reward_score/prime_code/testing_util.py:255:17: B018 Found useless expression. Either assign it to a variable or remove it.
verl/utils/reward_score/prime_code/testing_util.py:259:13: E722 Do not use bare `except`
verl/utils/reward_score/prime_code/testing_util.py:260:17: B018 Found useless expression. Either assign it to a variable or remove it.
verl/utils/reward_score/prime_code/testing_util.py:264:13: E722 Do not use bare `except`
verl/utils/reward_score/prime_code/testing_util.py:265:17: B018 Found useless expression. Either assign it to a variable or remove it.
verl/utils/reward_score/prime_code/testing_util.py:269:121: E501 Line too long (132 > 120)
verl/utils/reward_score/prime_code/testing_util.py:293:21: E722 Do not use bare `except`
verl/utils/reward_score/prime_code/testing_util.py:294:25: B018 Found useless expression. Either assign it to a variable or remove it.
verl/utils/reward_score/prime_code/testing_util.py:335:121: E501 Line too long (165 > 120)
verl/utils/reward_score/prime_code/testing_util.py:386:121: E501 Line too long (209 > 120)
verl/utils/reward_score/prime_code/testing_util.py:390:121: E501 Line too long (183 > 120)
verl/utils/reward_score/prime_code/testing_util.py:455:121: E501 Line too long (211 > 120)
verl/utils/reward_score/prime_code/testing_util.py:459:121: E501 Line too long (185 > 120)
verl/utils/reward_score/prime_code/testing_util.py:582:121: E501 Line too long (197 > 120)
verl/utils/reward_score/prime_code/testing_util.py:586:121: E501 Line too long (171 > 120)
verl/utils/reward_score/prime_math/__init__.py:106:5: E722 Do not use bare `except`
verl/utils/reward_score/prime_math/__init__.py:119:5: E722 Do not use bare `except`
verl/utils/reward_score/prime_math/__init__.py:246:5: E722 Do not use bare `except`
verl/utils/reward_score/prime_math/__init__.py:315:121: E501 Line too long (128 > 120)
verl/utils/reward_score/prime_math/__init__.py:331:5: E722 Do not use bare `except`
verl/utils/reward_score/prime_math/__init__.py:407:1: E402 Module level import not at top of file
verl/utils/reward_score/prime_math/__init__.py:429:5: E722 Do not use bare `except`
verl/utils/reward_score/prime_math/grader.py:302:21: B005 Using `.strip()` with multi-character strings is misleading
verl/utils/reward_score/prime_math/grader.py:302:21: B005 Using `.strip()` with multi-character strings is misleading
verl/utils/reward_score/prime_math/math_normalize.py:54:5: E722 Do not use bare `except`
verl/utils/reward_score/prime_math/math_normalize.py:70:17: E722 Do not use bare `except`
verl/utils/reward_score/prime_math/math_normalize.py:101:5: E722 Do not use bare `except`
verl/utils/reward_score/prime_math/math_normalize.py:181:121: E501 Line too long (142 > 120)
verl/utils/tokenizer.py:30:9: B028 No explicit `stacklevel` keyword argument found
verl/utils/tokenizer.py:33:9: B028 No explicit `stacklevel` keyword argument found
verl/utils/tokenizer.py:55:9: B028 No explicit `stacklevel` keyword argument found
verl/utils/torch_functional.py:86:72: E741 Ambiguous variable name: `l`
verl/utils/torch_functional.py:177:5: F841 Local variable `total_params` is assigned to but never used
verl/utils/torch_functional.py:397:1: E402 Module level import not at top of file
verl/utils/torch_functional.py:399:1: E402 Module level import not at top of file
verl/utils/torch_functional.py:400:1: E402 Module level import not at top of file
verl/utils/ulysses.py:246:5: F841 Local variable `sp_size` is assigned to but never used
verl/workers/actor/dp_actor.py:244:13: F841 Local variable `response_mask` is assigned to but never used
verl/workers/actor/megatron_actor.py:22:1: I001 [*] Import block is un-sorted or un-formatted
verl/workers/actor/megatron_actor.py:85:121: E501 Line too long (122 > 120)
verl/workers/actor/megatron_actor.py:86:121: E501 Line too long (128 > 120)
verl/workers/actor/megatron_actor.py:89:121: E501 Line too long (133 > 120)
verl/workers/actor/megatron_actor.py:96:121: E501 Line too long (126 > 120)
verl/workers/actor/megatron_actor.py:175:121: E501 Line too long (135 > 120)
verl/workers/actor/megatron_actor.py:237:121: E501 Line too long (150 > 120)
verl/workers/actor/megatron_actor.py:243:121: E501 Line too long (144 > 120)
verl/workers/actor/megatron_actor.py:245:121: E501 Line too long (130 > 120)
verl/workers/actor/megatron_actor.py:247:121: E501 Line too long (122 > 120)
verl/workers/actor/megatron_actor.py:286:9: F841 Local variable `input_shapes` is assigned to but never used
verl/workers/critic/dp_critic.py:227:21: F841 Local variable `input_ids` is assigned to but never used
verl/workers/critic/dp_critic.py:230:21: F841 Local variable `position_ids` is assigned to but never used
verl/workers/megatron_workers.py:18:1: I001 [*] Import block is un-sorted or un-formatted
verl/workers/reward_manager/__init__.py:15:20: F401 `.batch.BatchRewardManager` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/workers/reward_manager/__init__.py:16:19: F401 `.dapo.DAPORewardManager` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/workers/reward_manager/__init__.py:17:20: F401 `.naive.NaiveRewardManager` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/workers/reward_manager/__init__.py:18:20: F401 `.prime.PrimeRewardManager` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/workers/reward_manager/prime.py:61:121: E501 Line too long (217 > 120)
verl/workers/reward_model/__init__.py:15:19: F401 `.base.BasePPORewardModel` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/workers/reward_model/megatron/__init__.py:15:27: F401 `.reward_model.MegatronRewardModel` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/workers/reward_model/megatron/reward_model.py:65:9: F841 Local variable `ori_bs` is assigned to but never used
verl/workers/reward_model/megatron/reward_model.py:89:121: E501 Line too long (132 > 120)
verl/workers/reward_model/megatron/reward_model.py:215:9: F841 Local variable `input_shapes` is assigned to but never used
verl/workers/rollout/naive/__init__.py:15:28: F401 `.naive_rollout.NaiveRollout` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/workers/rollout/sglang_rollout/__init__.py:14:29: F401 `.sglang_rollout.SGLangRollout` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/workers/rollout/vllm_rollout/fire_vllm_rollout.py:22:121: E501 Line too long (129 > 120)
verl/workers/rollout/vllm_rollout/fire_vllm_rollout.py:51:121: E501 Line too long (157 > 120)
verl/workers/rollout/vllm_rollout/fire_vllm_rollout.py:153:13: F841 Local variable `log_probs` is assigned to but never used
verl/workers/rollout/vllm_rollout/vllm_rollout.py:22:121: E501 Line too long (129 > 120)
verl/workers/rollout/vllm_rollout/vllm_rollout.py:60:121: E501 Line too long (157 > 120)
verl/workers/sharding_manager/__init__.py:16:5: F401 `verl.utils.import_utils.is_megatron_core_available` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/workers/sharding_manager/__init__.py:17:5: F401 `verl.utils.import_utils.is_sglang_available` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/workers/sharding_manager/__init__.py:21:19: F401 `.base.BaseShardingManager` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/workers/sharding_manager/__init__.py:22:27: F401 `.fsdp_ulysses.FSDPUlyssesShardingManager` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/workers/sharding_manager/__init__.py:29:121: E501 Line too long (149 > 120)
verl/workers/sharding_manager/__init__.py:32:121: E501 Line too long (126 > 120)
verl/workers/sharding_manager/fsdp_sglang.py:99:9: F841 Local variable `load_format` is assigned to but never used
verl/workers/sharding_manager/fsdp_sglang.py:123:121: E501 Line too long (178 > 120)
verl/workers/sharding_manager/fsdp_ulysses.py:59:13: F841 Local variable `sp_size` is assigned to but never used
Found 305 errors.
```
---------
Co-authored-by: Haibin Lin <haibin.lin@bytedance.com>
* [logging] fix: typo of fsdp_checkpoint_manager saving optim path (#1276)
fix a minor typo of printing optim saving path in
fsdp_checkpoint_manager.py
* [doc] fix: fix 2 minor issues in installation and reward explanation (#1215)
close
- #1214
- #1213
Co-authored-by: HL <linhaibin.eric@gmail.com>
* [merger] fix: merged generation config is inconsistent with hf pre-trained model (#1277)
https://github.com/volcengine/verl/blob/afeac9a0230a0980e990a3c59e08e8e0890baaa4/scripts/model_merger.py#L195-L200
Model created by `from_config` won't load the `generation_config.json`
from `args.hf_model_path`, instead it create a generation config
separately.
This inconsistency will lead to strange generating error when user using
vllm/hf rollout without carefully override
sampling_params/generation_config, see issue here:
https://github.com/volcengine/verl/issues/1246
This PR introduce a `patch_model_generation_config` function which patch
the model from config to correctly use the pretrained generation config.
Fix https://github.com/volcengine/verl/issues/1246.
* Option to make model private when pushing to hub, pushing the tokenizer for convenience (#1259)
Very small changes to `model_merger.py` so that tokenizer is pushed to
hub and model can be pushed privately.
* [CI] feat: only check changed files (#1294)
* [example] chore: remove verl_getting_started.ipynb (#1281)
remove the out-dated notebook
* [doc] add the multi modal doc (#1292)
## Motivation
There is currently no docs support for multimodal task on verl, so I
think we need to add a related document.
* docs: add DeepWiki and ICLR links (#1283)
* [docs] add pr template (#1287)
# What does this PR do?
add the PR template to improve the readability of PR.
## Before submitting
- [x] Did you read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide)
and finish the [code format
check](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting)?
- [ ] Did you make sure to update the documentations with your changes
in the [docs](https://github.com/volcengine/verl/tree/main/docs)
especially for breaking config etc?
- [ ] Did you write any test cases if neccessary? Please add CI tests to
your new feature.
* fix: catch any error in math reward function (#1312)
# What does this PR do?
This PR fixes collapse in the math reward function by catch any possible
errors.
## Before submitting
- [x] Did you read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide)
and finish the [code format
check](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting)?
- [x] Did you make sure to update the documentations with your changes
in the [docs](https://github.com/volcengine/verl/tree/main/docs)
especially for breaking config etc?
- [x] Did you write any test cases if neccessary? Please add CI tests to
your new feature.
# Additional Info:
- **Issue Number**: None
- **Training**: None
- **Inference**: None
* [vllm] add moe patch for qwen3-moe (#1316)
# What does this PR do?
Add moe patch for qwen3-moe. Fix the weight loader issue in vLLM MoE
models. This isn’t a permanent solution, and we may need to contribute
code to vLLM to address the problem caused by FusedMoE. I’m already
seeking suggestions for this.
# ChangeLog:
- Add Qwen3MoeForCausalLM class for moe_patch
* fix reward model and add CI test (#1252)
Fix bugs related to #1165 .
Megatron backend reward model has no CI test, add to current ppo
trainer.
Fix `micro_batch_size_per_gpu` but not sure whether it is right for
reward config.
The output format is also not right with current `forward_micro_batch`
implementation.
* [sglang] feat: Add SGLang async multi-turn rollout with tool support (#1037)
A redesigned version of #917
## Current Status
[Develop log &
Tracker](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/issues/113)
**What Has Been Done**
- Async Rollout Refactoring: Integrate with the tool server to
coordinate tool calls during generation, leveraging request IDs for
state and progress tracking, support async multi-turn conversations in
Agentic RL training (with Tool support).
- Async Request Management: Encapsulate rollout requests into a unified
structure, enabling efficient tracking and handling of concurrent
multi-turn dialogues with chatml style messages.
- Extensible Tools: A modular design for adapt tools in
OpenAIFunctionTool format which is both support by SGLang and vLLM, with
create separate instance, execute when tool call, calc score according
to tool env state and release resource.
- Multi-turn support has been implemented for the GSM8K task (new
version working on). However, training has not yet converged, and we
hope the community could join to investigate the issue.
**What Is WIP**
- [x] Merge loss mask to training process from last version
- [x] Add more user friendly tool config and e2e tests for gsm8k with
tool training
- [ ] We are going to validate our multiturn feature in open-source
sandbox environments.
## Key Features will be introduced in future version
- Integrate a Ray-based agent trainer to enable explicit separation of
the rollout and training pipeline. Provide support for partial rollout
handling and fine-grained request state management.
- Extend the framework to support simulated user interactions (e.g.,
roleplay, interactive feedback) and more complex environment-in-the-loop
RL tasks.
**Future Plan**
[Discussion
Thread](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/issues/74#issuecomment-2763192625)
[RFC
doc](https://github.com/SwordFaith/verl-sglang-dev-log/blob/main/rlhf/verl/multi-turn/veRL-multiturn-rollout-RFC.md)
will be updated soon.
## Contributors & Acknowledgement
- Xiang Long [mid.of.change@gmail.com](mailto:mid.of.change@gmail.com)
@SwordFaith (Design RFC & core-dev of refactor part)
- Yuzhen Zhou [zyzshishui@gmail.com](mailto:zyzshishui@gmail.com)
@zyzshishui (Core-dev)
- Chenyang Zhao [zhaochen20@outlook.com](mailto:zhaochen20@outlook.com)
@zhaochenyang20 (PM)
- Guanhua Wang @WANG-GH
- Junrong Lin @ocss884 (verl-sglang support)
- Hanchen Zhang
[zhanghanchen77@gmail.com](mailto:zhanghanchen77@gmail.com)
- Haoran Wang [ubecwang@gmail.com](mailto:ubecwang@gmail.com)
- Rui Lu [learningrate1@gmail.com](mailto:learningrate1@gmail.com)
- Yujiang Li [liyujiang2020@gmail.com](mailto:liyujiang2020@gmail.com)
- Jiajun Li [guapisolo@gmail.com](mailto:guapisolo@gmail.com)
- Jin Pan [jpan236@wisc.edu](mailto:jpan236@wisc.edu)
- Zhi Zheng [zhengzhi@modelbest.cn](mailto:zhengzhi@modelbest.cn)
@zh-zheng
---------
Co-authored-by: zyzshishui <492129152@qq.com>
Co-authored-by: guanhua <281484683@qq.com>
Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>
Co-authored-by: ocss884 <ocss.lin@gmail.com>
Co-authored-by: Shawn/Yuxuan Tong <tongyuxuan361@gmail.com>
Co-authored-by: HL <linhaibin.eric@gmail.com>
* [fix] Remove grad_offload in rloo example script (#1323)
# What does this PR do?
`grad_offload` option was removed in #284 for fsdp backend, current
script will error out due to this.
# ChangeLog:
- Remove grad_offload in rloo example script
# Usage
- Run the changed script
## Before submitting
- [X] Did you read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide)
and finish the [code format
check](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting)?
- [X] Did you make sure to update the documentations with your changes
in the [docs](https://github.com/volcengine/verl/tree/main/docs)
especially for breaking config etc?
- [X] Did you write any test cases if neccessary? Please add CI tests to
your new feature.
# Additional Info:
- **Issue Number**: N/A
- **Training**: FSDP
- **Inference**: None
Signed-off-by: Hollow Man <hollowman@opensuse.org>
* cancel bootstrapping for n=n_samples (#1320)
# What does this PR do?
The validation metrics currently bootstraps its estimates by randomly
sampling 1,2,4,8,16,...,n_samples results out of n_samples results.
However, this bootstrapping doesn't make sense for `n=n_samples` as you
cannot have more information about the estimate for `pass@n_samples` if
you only have `n_samples` samples.
This results in weird results when doing RL with only one problem in the
validation set (best@N is a value between 0 and 1 instead of 0 or 1)
This PR turns off bootstrapping for n=n_samples case and leaves rest of
the computations the same.
* docs: add community blogs and fix link rendering (#1324)
# What does this PR do?
Add one-line overview of what this PR aims to achieve or accomplish.
# ChangeLog:
- Add two reference blogs to README
# Usage
None
## Before submitting
- [x] Did you read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide)
and finish the [code format
check](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting)?
- [x] Did you make sure to update the documentations with your changes
in the [docs](https://github.com/volcengine/verl/tree/main/docs)
especially for breaking config etc?
- [] Did you write any test cases if neccessary? No tests needed
* [doc] fix dataset path for gsm8k and url error (#1327)
# What does this PR do?
fix dataset path for gsm8k and some url error.
# ChangeLog:
change the readme file to fix gsm8k download path.
# Usage
- You can add one use example below.
```python
# Add code snippet or script demonstrating how to use this
```
- For algorithm implementation and new model support, you can add
training curve plots and evaluatuion results below.
## Before submitting
- [ ] Did you read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide)
and finish the [code format
check](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting)?
- [ ] Did you make sure to update the documentations with your changes
in the [docs](https://github.com/volcengine/verl/tree/main/docs)
especially for breaking config etc?
- [ ] Did you write any test cases if neccessary? Please add CI tests to
your new feature.
# Additional Info:
- **Issue Number**: Fixes issue # or discussion # if any.
- **Training**: [Note which backend this PR will affect: FSDP, Megatron,
both, or none]
- **Inference**: [Note which backend this PR will affect: vLLM, SGLang,
both, or none]
* [feat] add FusedWorker (#1278)
on behalf of @zw0610
FusedWorker is designed to enhance the ability of colocated workers.
FusedWorker keeps most of the interfaces as colocated workers: Users
shall use `create_colocated_worker_cls_fused` to create colocated worker
class, use `spawn` to split FusedWorker to dict of workers.
In colocated workers, access the methods of child workers is done by
using `spawn` then access via worker dict or calling
`{worker_group}.{worker}_{method}`. In FusedWorker, the first method was
preserved, while the latter was change to a new way: First use
`{worker_group}.fuse(prefixes)` to bind workers to the worker group,
then use `{worker_group}.{worker}.foo()` to access child workers.
* [test] fix: test arithmetic_sequence failed to run (#1333)
# What does this PR do?
e2e test `arithmetic_sequence` is currently broken, with error
`TypeError: not a string` thrown on code `tokenizer =
AutoTokenizer.from_pretrained(local_path)` when running
`tests/e2e/run_ray_trainer.sh`. This PR aims to fix it.
In the `arithmetic_sequence` task, `tests.e2e.envs.digit_completion`
module was imported in the beginning but not used. This import seems
meaningless. However, when this library is imported,
`AutoTokenizer.register()` will be called to set configurations for
`AutoTokenizer`. Only after that can `AutoTokenizer` be successfully
initialized in test code to perform subsequent tasks.
## Timeline
- In #934 , to improve CI efficiency, the CI corresponding to
`arithmetic_sequence` was removed.
- In #1010 , according to the `unused_import` rule, this import was
deleted, triggering the bug.
# ChangeLog
- `AutoTokenizer.register` was added explicitly, which ensures the
configurations were set before initialization of `AutoTokenizer`.
# Usage
- the original code `tests/e2e/run_ray_trainer.sh` is available for
tests.
```python
bash tests/e2e/run_ray_trainer.sh
```
## Before submitting
- [x] Did you read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide)
and finish the [code format
check](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting)?
- [x] Did you make sure to update the documentations with your changes
in the [docs](https://github.com/volcengine/verl/tree/main/docs)
especially for breaking config etc?
- [x] Did you write any test cases if neccessary? Please add CI tests to
your new feature.
# Additional Info:
- **Issue Number**: none
- **Training**: none
- **Inference**: none
* [FIX] metric_utils log best, worst, maj only for n_resps > 1 (#1248)
Solves #1249
Instead of logging best@1/mean and worst@1/mean, which is identical to
mean@1, just do not log it when there is only one validation response
per prompt (`n_resps == 1`). Same applies to std.
Otherwise we get many duplicated plots that show the same thing.
The only change is the addition of the `if n_resps > 1:` statement.
* [dev] feat: improve PR template (#1343)
This PR tries to imporve the PR template itself.
* [recipe] feat: latest reproduction of DAPO (#1336)
# What does this PR do?
This PR updates the latest reproduction results of DAPO.
## Before submitting
- [x] Did you read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide)
and finish the [code format
check](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting)?
- [x] Did you make sure to update the documentations with your changes
in the [docs](https://github.com/volcengine/verl/tree/main/docs)
especially for breaking config etc?
- [x] Did you write any test cases if neccessary? Please add CI tests to
your new feature.
# Additional Info:
- **Issue Number**: none
- **Training**: none
- **Inference**: none
* [docs] fix: typo (#1351)
* [installation] doc: Fix pip install instructions (#1353)
### Checklist Before Starting
- [X] Search for similar PR(s).
### What does this PR do?
There should be no space between `.` and `[vllm]` or `[sglang]`, or it
will result in error:
```logs
ERROR: Invalid requirement: '[vllm]': Expected package name at the start of dependency specifier
[vllm]
```
In addition, I rewrite this part to make the instructions more clear (as
`.. or ..` can't be executed by bash directly)
### Additional Info.
- **Issue Number**: none
- **Training**: none
- **Inference**: none
### Checklist Before Submitting
- [X] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [X] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [X] Add `[BREAKING]` to the PR title if it breaks any API.
- [X] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [X] Add CI test(s) if neccessary.
Signed-off-by: Hollow Man <hollowman@opensuse.org>
* [fsdp] feat: support fsdp2 training and inference in fsdp_workers (#1026)
# What does this PR do?
This PR supports fsdp2 for fsdp_worker. Torch version 2.4 or higher is
required.
# Usage Example
```
sh examples/grpo_trainer/run_qwen2-7b.sh \
actor_rollout_ref.ref.strategy=fsdp2 \
actor_rollout_ref.actor.strategy=fsdp2
```
To save more memory, you can add the parameter below to enable the fsdp2
OffloadPolicy:
```
actor_rollout_ref.actor.offload_policy=True
```
You can see the profile comparison between fsdp1 and fsdp2 here:
https://github.com/volcengine/verl/pull/1026#issuecomment-2824343860
---------
Co-authored-by: lixiaoguang12 <lixiaoguang12@meituan.com>
Co-authored-by: shengguangming <shengguangming@bytedance.com>
* [docs] fix: Fix Arxiv Link (#1364)
Arxiv link is not rendering on github or
https://verl.readthedocs.io/en/latest/index.html#
### Checklist Before Starting
- [x ] Search for similar PR(s).
### What does this PR do?
Makes external link to arxiv paper resolve properly.
### High-Level Design
N/A
### Specific Changes
Single line doc change
### API
N/A
### Usage Example
N/A
### Test
N/A
### Additional Info.
### Checklist Before Submitting
All N/A
* [dataproto] feat: Add auto padding for DataProto (#1356)
### Checklist Before Starting
- [x] Search for similar PR(s).
Coming from #577 , credit to @zw0610
### What does this PR do?
Today, users must manually duplicate (repeat) a DataProto so its batch
size matches the data‑parallel (dp) size of the target WorkerGroup. This
PR enables `auto_padding` to pad the `DataProto` when chunk is called.
### Specific Changes
* Enriched the `DataProto` so that it can have context of padding during
chunking;
* Modified the `decorator.py` that a DataProto can be automatically
padded and chunked with `dispatch_dp_compute_data_proto`;
* Added unit tests under `tests/ray/test_auto_padding.py`.
### API
Two new API under `DataProto` are introduced, which are `padding` and
`is_padding_enabled`
### Test
Tests added to `tests/ray/test_auto_padding.py`
### Additional Info.
- **Issue Number**: Fixes issue # or discussion # if any.
- **Training**: [Note which backend this PR will affect: FSDP, Megatron,
both, or none]
- **Inference**: [Note which backend this PR will affect: vLLM, SGLang,
both, or none]
### Checklist Before Submitting
- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title if it breaks any API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add CI test(s) if neccessary.
---------
Signed-off-by: Hongpeng Guo <hg5@illinois.edu>
Co-authored-by: Wang Zhang <zhangwang.nozomi@bytedance.com>
Co-authored-by: Wang Zhang <zw199006@gmail.com>
* [ray] feat: Making decorator register available for async function (#1370)
### Checklist Before Starting
- [x] Search for similar PR(s).
### What does this PR do?
This PR enables the decorators to be able to be applied onto async
functions.
### High-Level Design
* Simply added a inner wrapper function available for async func inside
the `register` function.
### Usage Example
```python
@register(dispatch_mode=Dispatch.ONE_TO_ALL, blocking=False)
async def async_fn(self, sleep_time):
return await asyncio.sleep(sleep_time * 0.1)
```
### Test
* `tests/ray/test_decorator.py`
### Additional Info.
- **Issue Number**: Fixes issue # or discussion # if any.
- **Training**: [Note which backend this PR will affect: FSDP, Megatron,
both, or none]
- **Inference**: [Note which backend this PR will affect: vLLM, SGLang,
both, or none]
### Checklist Before Submitting
- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title if it breaks any API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add CI test(s) if neccessary.
---------
Signed-off-by: Hongpeng Guo <hg5@illinois.edu>
* docs: Add runllm widget for VeRL Doc sites (#1366)
### Checklist Before Starting
- [ ] Search for similar PR(s).
### What does this PR do?
Add runllm widget for https://app.readthedocs.org/projects/verl/
### High-Level Design
> Demonstrate the high-level design if this PR is complex.
### Specific Changes
> List the specific changes.
### API
> Demonstrate how the API changes if any.
### Usage Example
> Provide usage example(s) for easier usage.
```python
# Add code snippet or script demonstrating how to use this
```
### Test
> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluatuion results, etc.
### Additional Info.
- **Issue Number**: Fixes issue # or discussion # if any.
- **Training**: [Note which backend this PR will affect: FSDP, Megatron,
both, or none]
- **Inference**: [Note which backend this PR will affect: vLLM, SGLang,
both, or none]
### Checklist Before Submitting
- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title if it breaks any API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add CI test(s) if neccessary.
* [trainer] breaking: pass dataset as required args to SFTTrainer; also change ppo ray trainer to take custom datasets as inputs (#1282)
* [ci][fix] Enable part of ray test to be run on CPU machine (#1372)
* [fix][ci] fix two pipelines that fails on the main branch (#1378)
* [feat] Enable `update_model_config` to take nested dict to update `AutoConfig` of transformers (#1379)
### Checklist Before Starting
- [x] Search for similar PR(s).
### What does this PR do?
* Enable `update_model_config` to take nested dict to update
`AutoConfig` of transformers
* Added a test pipeline for all the tests under `tests/utils`, Any
future unit tests for `verl/utils` should be added here
* Re-organized the tests file structure.
### Usage Example
For the new `update_model_config`, an example looks like below:
```python
override_config_kwargs = {
"bos_token_id": self.tokenizer.bos_token_id,
...
"nested_config": {k1: v1, k2, v2},
}
update_model_config(actor_model_config, override_config_kwargs=override_config_kwargs)
```
### Test
Added `tests/verl/utils/test_model.py::test_update_model_config`
### Additional Info.
- **Issue Number**: Fixes issue # or discussion # if any.
- **Training**: [Note which backend this PR will affect: FSDP, Megatron,
both, or none]
- **Inference**: [Note which backend this PR will affect: vLLM, SGLang,
both, or none]
### Checklist Before Submitting
- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title if it breaks any API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add CI test(s) if neccessary.
---------
Signed-off-by: Hongpeng Guo <hg5@illinois.edu>
* [rollout] misc: add demo chat completion scheduler described in ReTool paper (#1297)
Co-authored-by: shengguangming <shengguangming@bytedance.com>
* [dev] fix: validation metrics (#1374)
### Checklist Before Starting
- [x] Search for similar PR(s).
### What does this PR do?
1. Fix the error that `metric` is not added when `n == 1`.
2. Remove `std@1`.
3. Add assertation for doing initial validation but `val_metrics` is
empty.
### Additional Info.
- **Issue Number**: none
- **Training**: none
- **Inference**: none
### Checklist Before Submitting
- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [x] Add `[BREAKING]` to the PR title if it breaks any API.
- [x] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add CI test(s) if necessary.
* [sglang] Upgrade sglang to 0.4.6.post1 & misc fixes (#1385)
### Checklist Before Starting
- [x] Search for similar PR(s).
### What does this PR do?
- [x] upgrade required sglang version to 0.4.6.post1 which suports Qwen3
- [x] fix: flush_cache was never awaited
- [x] remove unused env
- [x] fix: add rank num to port to avoid SGLang picking the same port
when random.seed being set
- [x] feat: disable SGLang memory inbalance check by default
https://github.com/sgl-project/sglang/pull/5426
- [x] update setup.py to avoid old version pip can not resolving deps
- [x] fix: tools_kwargs length mismatch with batch #1380
> Add one-line overview of what this PR aims to achieve or accomplish.
### High-Level Design
> Demonstrate the high-level design if this PR is complex.
### Specific Changes
> List the specific changes.
### API
> Demonstrate how the API changes if any.
### Usage Example
> Provide usage example(s) for easier usage.
```python
# Add code snippet or script demonstrating how to use this
```
### Test
> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluatuion results, etc.
### Additional Info.
- **Issue Number**: Fixes issue # or discussion # if any.
- **Training**: [Note which backend this PR will affect: FSDP, Megatron,
both, or none]
- **Inference**: [Note which backend this PR will affect: vLLM, SGLang,
both, or none]
### Checklist Before Submitting
- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title …
…erl-project#1105) if we use sglang as the rollout engine, we should export SGL_DISABLE_TP_MEMORY_INBALANCE_CHECK to avoid that the memory capacity is unbalanced, please refer to [#5426 in sglang](sgl-project/sglang#5426) # why we should export SGL_DISABLE_TP_MEMORY_INBALANCE_CHECK when using SGLang as the rollout engine in verl? 1. verl initializes a SGlangRollout module during rollout, which is used to evaluate/generate samples. 2. SGLangRollout will initialize VerlEngine, further initialize a torch. Distributed. DeviceMesh, used to support the TP. 3. DeviceMesh.init () internally checks the free video memory of all participating devices, and if the difference is too large (more than about 10%), it directly reports an error, preventing initialization failures or communication deadlock. # Why might there be inconsistent graphic memory? ## Ray Distributed Actor loads the model at different times: verl uses ray multi-process multi-gpu concurrent training, and each `WorkerDict` may be called at different times: `self.rollout = SGLangRollout(...)` different workers initialize the model at different times → different memory usage. ## Delayed initialization causes memory bias Some workers enter the model loading/infer process earlier than others, such as `generate_sequences()` or `compute_log_prob()`. The early-loaded worker video memory has been eaten by the model, and the late-loaded worker video memory is still empty → the graphic memory gap is large. ## Verl+SGLang's TP initialization goes "all device broadcast", but there is no uniform release timing SGLangRollout only needs to involve the part of the graphics card used by the rollout machine, but its VerlEngine initialization calls torch.distribut.init process group() and broadcast a bunch of weights. Result in: Non-rollout cards also participate in communication; Then initialize DeviceMesh, and the error "inconsistent memory" is reported. ## Different loading modes of FSDP/TP models also cause deviations if the following parameters are set ``` actor.fsdp_config.param_offload=True ref.fsdp_config.param_offload=True ``` Some worker parameters are on the CPU, and some parameters are shard to the GPU in advance. This also creates an asymmetric distribution of video memory. --------- Co-authored-by: ocss884 <ocss.lin@gmail.com>
### Checklist Before Starting - [x] Search for similar PR(s). ### What does this PR do? - [x] upgrade required sglang version to 0.4.6.post1 which suports Qwen3 - [x] fix: flush_cache was never awaited - [x] remove unused env - [x] fix: add rank num to port to avoid SGLang picking the same port when random.seed being set - [x] feat: disable SGLang memory inbalance check by default sgl-project/sglang#5426 - [x] update setup.py to avoid old version pip can not resolving deps - [x] fix: tools_kwargs length mismatch with batch verl-project#1380 > Add one-line overview of what this PR aims to achieve or accomplish. ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes > List the specific changes. ### API > Demonstrate how the API changes if any. ### Usage Example > Provide usage example(s) for easier usage. ```python # Add code snippet or script demonstrating how to use this ``` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluatuion results, etc. ### Additional Info. - **Issue Number**: Fixes issue # or discussion # if any. - **Training**: [Note which backend this PR will affect: FSDP, Megatron, both, or none] - **Inference**: [Note which backend this PR will affect: vLLM, SGLang, both, or none] ### Checklist Before Submitting - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [ ] Add `[BREAKING]` to the PR title if it breaks any API. - [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add CI test(s) if neccessary.
…(#1105) if we use sglang as the rollout engine, we should export SGL_DISABLE_TP_MEMORY_INBALANCE_CHECK to avoid that the memory capacity is unbalanced, please refer to [#5426 in sglang](sgl-project/sglang#5426) # why we should export SGL_DISABLE_TP_MEMORY_INBALANCE_CHECK when using SGLang as the rollout engine in verl? 1. verl initializes a SGlangRollout module during rollout, which is used to evaluate/generate samples. 2. SGLangRollout will initialize VerlEngine, further initialize a torch. Distributed. DeviceMesh, used to support the TP. 3. DeviceMesh.init () internally checks the free video memory of all participating devices, and if the difference is too large (more than about 10%), it directly reports an error, preventing initialization failures or communication deadlock. # Why might there be inconsistent graphic memory? ## Ray Distributed Actor loads the model at different times: verl uses ray multi-process multi-gpu concurrent training, and each `WorkerDict` may be called at different times: `self.rollout = SGLangRollout(...)` different workers initialize the model at different times → different memory usage. ## Delayed initialization causes memory bias Some workers enter the model loading/infer process earlier than others, such as `generate_sequences()` or `compute_log_prob()`. The early-loaded worker video memory has been eaten by the model, and the late-loaded worker video memory is still empty → the graphic memory gap is large. ## Verl+SGLang's TP initialization goes "all device broadcast", but there is no uniform release timing SGLangRollout only needs to involve the part of the graphics card used by the rollout machine, but its VerlEngine initialization calls torch.distribut.init process group() and broadcast a bunch of weights. Result in: Non-rollout cards also participate in communication; Then initialize DeviceMesh, and the error "inconsistent memory" is reported. ## Different loading modes of FSDP/TP models also cause deviations if the following parameters are set ``` actor.fsdp_config.param_offload=True ref.fsdp_config.param_offload=True ``` Some worker parameters are on the CPU, and some parameters are shard to the GPU in advance. This also creates an asymmetric distribution of video memory. --------- Co-authored-by: ocss884 <ocss.lin@gmail.com>
### Checklist Before Starting - [x] Search for similar PR(s). ### What does this PR do? - [x] upgrade required sglang version to 0.4.6.post1 which suports Qwen3 - [x] fix: flush_cache was never awaited - [x] remove unused env - [x] fix: add rank num to port to avoid SGLang picking the same port when random.seed being set - [x] feat: disable SGLang memory inbalance check by default sgl-project/sglang#5426 - [x] update setup.py to avoid old version pip can not resolving deps - [x] fix: tools_kwargs length mismatch with batch #1380 > Add one-line overview of what this PR aims to achieve or accomplish. ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes > List the specific changes. ### API > Demonstrate how the API changes if any. ### Usage Example > Provide usage example(s) for easier usage. ```python # Add code snippet or script demonstrating how to use this ``` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluatuion results, etc. ### Additional Info. - **Issue Number**: Fixes issue # or discussion # if any. - **Training**: [Note which backend this PR will affect: FSDP, Megatron, both, or none] - **Inference**: [Note which backend this PR will affect: vLLM, SGLang, both, or none] ### Checklist Before Submitting - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [ ] Add `[BREAKING]` to the PR title if it breaks any API. - [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add CI test(s) if neccessary.
…erl-project#1105) if we use sglang as the rollout engine, we should export SGL_DISABLE_TP_MEMORY_INBALANCE_CHECK to avoid that the memory capacity is unbalanced, please refer to [#5426 in sglang](sgl-project/sglang#5426) # why we should export SGL_DISABLE_TP_MEMORY_INBALANCE_CHECK when using SGLang as the rollout engine in verl? 1. verl initializes a SGlangRollout module during rollout, which is used to evaluate/generate samples. 2. SGLangRollout will initialize VerlEngine, further initialize a torch. Distributed. DeviceMesh, used to support the TP. 3. DeviceMesh.init () internally checks the free video memory of all participating devices, and if the difference is too large (more than about 10%), it directly reports an error, preventing initialization failures or communication deadlock. # Why might there be inconsistent graphic memory? ## Ray Distributed Actor loads the model at different times: verl uses ray multi-process multi-gpu concurrent training, and each `WorkerDict` may be called at different times: `self.rollout = SGLangRollout(...)` different workers initialize the model at different times → different memory usage. ## Delayed initialization causes memory bias Some workers enter the model loading/infer process earlier than others, such as `generate_sequences()` or `compute_log_prob()`. The early-loaded worker video memory has been eaten by the model, and the late-loaded worker video memory is still empty → the graphic memory gap is large. ## Verl+SGLang's TP initialization goes "all device broadcast", but there is no uniform release timing SGLangRollout only needs to involve the part of the graphics card used by the rollout machine, but its VerlEngine initialization calls torch.distribut.init process group() and broadcast a bunch of weights. Result in: Non-rollout cards also participate in communication; Then initialize DeviceMesh, and the error "inconsistent memory" is reported. ## Different loading modes of FSDP/TP models also cause deviations if the following parameters are set ``` actor.fsdp_config.param_offload=True ref.fsdp_config.param_offload=True ``` Some worker parameters are on the CPU, and some parameters are shard to the GPU in advance. This also creates an asymmetric distribution of video memory. --------- Co-authored-by: ocss884 <ocss.lin@gmail.com>
### Checklist Before Starting - [x] Search for similar PR(s). ### What does this PR do? - [x] upgrade required sglang version to 0.4.6.post1 which suports Qwen3 - [x] fix: flush_cache was never awaited - [x] remove unused env - [x] fix: add rank num to port to avoid SGLang picking the same port when random.seed being set - [x] feat: disable SGLang memory inbalance check by default sgl-project/sglang#5426 - [x] update setup.py to avoid old version pip can not resolving deps - [x] fix: tools_kwargs length mismatch with batch verl-project#1380 > Add one-line overview of what this PR aims to achieve or accomplish. ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes > List the specific changes. ### API > Demonstrate how the API changes if any. ### Usage Example > Provide usage example(s) for easier usage. ```python # Add code snippet or script demonstrating how to use this ``` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluatuion results, etc. ### Additional Info. - **Issue Number**: Fixes issue # or discussion # if any. - **Training**: [Note which backend this PR will affect: FSDP, Megatron, both, or none] - **Inference**: [Note which backend this PR will affect: vLLM, SGLang, both, or none] ### Checklist Before Submitting - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [ ] Add `[BREAKING]` to the PR title if it breaks any API. - [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add CI test(s) if neccessary.
* clean codes (#1219)
Signed-off-by: zhanluxianshen <zhanluxianshen@163.com>
* Update the ray debug tutorial (#1204)
## Motivation
The existing Ray tutorial is difficult to follow and doesn’t explain how
to debug across multiple breakpoints.
## Modifications
- Updated `multinode.rst`
## Checklist
- [x] Created independent `ray_debugger.rst` with step‑by‑step
instructions
* fix util reward_score/math_dapo.py notes. (#1185)
Signed-off-by: zhanluxianshen <zhanluxianshen@163.com>
* fixt: typo (#1217)
Alternatively, we should properly expand on the role of the parameter
`mapping`
* docker: update Dockerfile.sglang (#1207)
Install ray[default] to include missing components
* Update ray_debug_tutorial.rst (#1228)
* [vllm] update moe patch for megatron and fsdp (#1200)
## Motivation
This is a fix for the issue where the `weight_loader` in FusedMoe of the
vLLM code could not be used correctly during the resharding phase,
addressed in #923, #1137, and #1139 respectively. Currently, the results
of these PRs can be used together, allow both FSDP and Megatron to use
the same function, reducing code maintenance costs.
* [mcore] refactor: remove the mcore patches (#1229)
* Fix docs about config page. (#1236)
Signed-off-by: zhanluxianshen <zhanluxianshen@163.com>
* Migrate to new image with FlashInfer 0.2.2 + vLLM 0.8.3 + SGLang 0.4.5 + MCore 0.12.0 + TE 2.2 + cuDNN 9.8.0 (#1237)
As support both, we let TE to choose attention backend now.
New Image:
`whatcanyousee/verl:ngc-cu124-vllm0.8.3-sglang0.4.5-mcore0.12.0-te2.2`
* fix: validation top_p=0.7 for DAPO full (#1241)
* [misc] refactor moe bash (#1245)
* [logging] feat: Add Rollout and Validation dumps to file (#916)
Co-authored-by: Mert Unsal <mertunsal1905@gmail.com>
* [AMD] Add AMD performance tuning documentation (#1240)
* [logging] feat: Add step and epoch metrics (#1250)
Solves #1251
Right now the current global step and current epoch are not being
logged. This would be a useful feature.
* [SGLang] feat: upgrade to 0.4.5.post3 & fix ipv6 (#1203)
The ipv6 part is picked from
https://github.com/volcengine/verl/pull/1184 cc @BearBiscuit05
---------
Co-authored-by: BearBiscuit05 <xiangyongan@bytedance.com>
Co-authored-by: Gelee-Q <leege233@gmail.com>
* [proto] feat: Add bool-type index selection for DataProto (#1082)
After the last change, current DataProto cannot use bool-type index due
to hard-coded batch_size equal to idxs.shape[0].
This patch changes the new batch_size for bool-type idx to idxs.sum().
It's useful when users filter the batch with bool-type masks.
* [rollout] feat: introduce vLLM AsyncLLM to support multi-turn rollout (#1138)
### Summary
Introduce vLLM AsyncLLM to support multi-turn rollout and #385 #398 #710
### Architecture

**New Components**:
- AsyncLLMWorker: standalone vllm server instance
- FastAPI: provide OpenAI-compatible HTTP server
- AsyncLLM: async LLMEngine for online serving, for more details:
[AsyncLLM](https://github.com/vllm-project/vllm/pull/9826),
[LLMEngine](https://docs.vllm.ai/en/latest/design/arch_overview.html#llmengine)
- ExternalRayDistributedExecutor: custom executor backend manages
workers in worker group, it grabs corresponding workers by actor names
- AsyncLLManager: manages a group of vllm server
instances(AsyncLLMWorker)
- AsyncLLM lifecycle: initialization, wake_up, sleep.
- FastAPI service discovery
- ChatScheduler: schedule multiple chat completion requests with
multiple server instances
- Least requests load balance
- Sticky session with prefix caching
- Chat completion callback: tools calling
### TODO
- [x] AsyncLLM: intialization/wake_up/sleep
- [x] OpenAI API: support `/v1/chat/completions`
- [x] RayPPOTrainer integration: replace `generate_sequences` to http
call `/v1/chat/completions`
- [x] GSM8K e2e training
- [ ] Add document
---------
Co-authored-by: shengguangming <shengguangming@bytedance.com>
* [AMD] Update AMD performance tuning documentation (#1256)
Update AMD performance tuning documentation according to
@yushengsu-thu's suggestion.
1. fix git branch and link
2. fix tab
* fix: remove deprecated remove_previous_ckpt key in prime_ray_trainer.py (#1254)
deprecated remove_previous_ckpt key cause save checkpoint crash.
See: https://github.com/volcengine/verl/issues/1183
* fix: Correct sampling params setting in sglang evaluation (#1181)
This PR fixes an issue where parameters in `val_kwargs` are not
effectively passed during sglang evaluation when `do_sample=True` is
set. Additionally, since the validation data has already been repeated
in `ray_trainer`, the `n` parameter in `sampling_params` needs to be
correctly configured to prevent errors caused by dimension mismatches.
* distro: clean req packages. (#1253)
Signed-off-by: zhanluxianshen <zhanluxianshen@163.com>
* [rollout] feat: support rollout.n > 1 in hf_rollout (#1199)
Currently, the hf rollout backend only support `rollout.n == 1`, when
`rollout.n > 1` it will lead to an error
(https://github.com/volcengine/verl/issues/1134)
This PR make hf rollout support `do_sample` and `is_validate` to make it
consistent with vllm and sglang backend, and correctly support
`rollout.n > 1`.
* [bugfix] fix: add `await` for `_validate()` (#1269)
As titled.
* [profile] add profile for megatron train (#1146)
## Motivation
This is a new feature that adds the functionality of collecting profiles
during the training phase. Since the RL process repeatedly enters the
training process, by default, the profile temporarily captures the
results of the first `update_policy`. Moreover, this modification should
be seamlessly integrated into other training frameworks.
* [mcore] add offload param and opt function for magetron (#1162)
## Motivation
This is a PR that supports offload in Megatron. Currently, parameters,
gradients, and optimizers can be offloaded to the CPU when not needed. I
have successfully tested the feasibility of the function using the
memory snap tool. Further accuracy testing is still in progress.
## TODO
- [x] Accuracy testing
* [CI] feat: only test for push to main (#1271)
* [misc] add offload and profile doc, add validate in profile (#1272)
* Adding GUI-R1 to the Awesome work (#1275)
* feat: move AsyncLLM ChatCompletionScheduler to separate thread (#1274)
Move AsyncLLM ChatCompletionScheduler to separate thread to avoid making
PPOTrainer async class.
* [profile] print cuda system memory and offload actor model after init (#1118)
Co-authored-by: hiyouga <hiyouga@buaa.edu.cn>
* [Lint] fix: linting errors in all files (#1280)
This PR enables checking on all files after fixing all the errors:
```
examples/data_preprocess/geo3k.py:41:121: E501 Line too long (121 > 120)
examples/data_preprocess/multiturn.py:54:121: E501 Line too long (185 > 120)
examples/data_preprocess/multiturn.py:59:121: E501 Line too long (210 > 120)
examples/data_preprocess/multiturn.py:73:121: E501 Line too long (229 > 120)
examples/data_preprocess/multiturn.py:78:121: E501 Line too long (211 > 120)
examples/ray/tutorial.ipynb:cell 9:1:121: E501 Line too long (179 > 120)
examples/ray/tutorial.ipynb:cell 15:1:121: E501 Line too long (143 > 120)
examples/ray/tutorial.ipynb:cell 42:14:1: E402 Module level import not at top of cell
recipe/prime/prime_dp_rm.py:145:121: E501 Line too long (153 > 120)
recipe/prime/prime_dp_rm.py:156:121: E501 Line too long (137 > 120)
recipe/prime/prime_dp_rm.py:292:121: E501 Line too long (148 > 120)
recipe/r1/data_process.py:56:121: E501 Line too long (289 > 120)
recipe/r1/data_process.py:113:121: E501 Line too long (166 > 120)
recipe/r1/data_process.py:118:121: E501 Line too long (137 > 120)
recipe/r1/data_process.py:123:121: E501 Line too long (297 > 120)
recipe/r1/data_process.py:131:9: E722 Do not use bare `except`
recipe/r1/tasks/livecodebench.py:61:5: E722 Do not use bare `except`
scripts/diagnose.py:55:9: F841 Local variable `ip` is assigned to but never used
scripts/diagnose.py:165:13: B028 No explicit `stacklevel` keyword argument found
scripts/model_merger.py:42:121: E501 Line too long (184 > 120)
scripts/model_merger.py:146:13: E722 Do not use bare `except`
tests/e2e/arithmetic_sequence/model/create_model_tokenizer.py:28:121: E501 Line too long (440 > 120)
tests/gpu_utility/test_memory_buffers.py:42:5: F841 Local variable `model_named_params` is assigned to but never used
tests/gpu_utility/test_memory_buffers.py:43:5: F841 Local variable `model_copy_named_params` is assigned to but never used
tests/gpu_utility/test_memory_buffers.py:53:5: F841 Local variable `model_wrapper` is assigned to but never used
tests/model/test_transformers_ulysses.py:102:5: F841 Local variable `response_length` is assigned to but never used
tests/model/test_transformers_ulysses.py:181:5: F841 Local variable `response_length` is assigned to but never used
tests/ray/detached_worker/server.py:83:13: F841 Local variable `vpp_rank` is assigned to but never used
tests/ray/test_check_worker_alive.py:37:121: E501 Line too long (121 > 120)
tests/rollout/run_fsdp_vllm.py:22:64: F811 Redefinition of unused `ShardingStrategy` from line 20
tests/rollout/test_sglang_spmd.py:210:121: E501 Line too long (157 > 120)
tests/rollout/test_vllm_spmd.py:20:64: F811 Redefinition of unused `ShardingStrategy` from line 18
tests/sandbox/test_sandbox.py:86:121: E501 Line too long (1615 > 120)
tests/sandbox/test_sandbox.py:87:121: E501 Line too long (1596 > 120)
tests/sanity/check_license.py:22:1: E402 Module level import not at top of file
tests/sanity/check_license.py:23:1: E402 Module level import not at top of file
tests/verl/utils/dataset/test_rl_dataset.py:23:5: F841 Local variable `url` is assigned to but never used
tests/verl/utils/dataset/test_rm_dataset.py:22:5: F841 Local variable `url` is assigned to but never used
tests/verl/utils/dataset/test_rm_dataset.py:36:12: E721 Use `is` and `is not` for type comparisons, or `isinstance()` for isinstance checks
tests/verl/utils/dataset/test_sft_dataset.py:22:5: F841 Local variable `url` is assigned to but never used
tests/verl/utils/dataset/test_sft_dataset.py:50:12: E721 Use `is` and `is not` for type comparisons, or `isinstance()` for isinstance checks
tests/verl/utils/dataset/test_sft_dataset.py:75:12: E721 Use `is` and `is not` for type comparisons, or `isinstance()` for isinstance checks
verl/__init__.py:22:1: E402 Module level import not at top of file
verl/__init__.py:24:1: E402 Module level import not at top of file
verl/__init__.py:25:1: E402 Module level import not at top of file
verl/__init__.py:29:1: E402 Module level import not at top of file
verl/__init__.py:29:15: F401 `.single_controller` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/models/llama/megatron/__init__.py:16:5: F401 `.modeling_llama_megatron.ParallelLlamaForCausalLM` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/models/llama/megatron/__init__.py:18:5: F401 `.modeling_llama_megatron.ParallelLlamaForCausalLMRmPad` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/models/llama/megatron/__init__.py:20:5: F401 `.modeling_llama_megatron.ParallelLlamaForCausalLMRmPadPP` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/models/llama/megatron/__init__.py:21:5: F401 `.modeling_llama_megatron.ParallelLlamaForValueRmPad` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/models/llama/megatron/__init__.py:22:5: F401 `.modeling_llama_megatron.ParallelLlamaForValueRmPadPP` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/models/llama/megatron/__init__.py:24:5: F401 `.modeling_llama_megatron.ParallelLlamaModel` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/models/llama/megatron/checkpoint_utils/llama_loader.py:92:121: E501 Line too long (168 > 120)
verl/models/llama/megatron/checkpoint_utils/llama_loader_depracated.py:92:121: E501 Line too long (168 > 120)
verl/models/llama/megatron/checkpoint_utils/llama_loader_depracated.py:274:121: E501 Line too long (127 > 120)
verl/models/llama/megatron/checkpoint_utils/llama_saver.py:170:9: F841 Local variable `tp_rank` is assigned to but never used
verl/models/llama/megatron/checkpoint_utils/llama_saver.py:211:9: F841 Local variable `tp_rank` is assigned to but never used
verl/models/llama/megatron/checkpoint_utils/llama_saver.py:261:9: F841 Local variable `tp_rank` is assigned to but never used
verl/models/llama/megatron/layers/__init__.py:15:33: F401 `.parallel_attention.ParallelLlamaAttention` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/models/llama/megatron/layers/__init__.py:16:31: F401 `.parallel_decoder.ParallelLlamaDecoderLayer` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/models/llama/megatron/layers/__init__.py:16:58: F401 `.parallel_decoder.ParallelLlamaDecoderLayerRmPad` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/models/llama/megatron/layers/__init__.py:17:27: F401 `.parallel_mlp.ParallelLlamaMLP` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/models/llama/megatron/layers/__init__.py:18:31: F401 `.parallel_rmsnorm.ParallelLlamaRMSNorm` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/models/llama/megatron/layers/parallel_attention.py:196:121: E501 Line too long (134 > 120)
verl/models/llama/megatron/layers/parallel_attention.py:341:1: E402 Module level import not at top of file
verl/models/llama/megatron/layers/parallel_attention.py:342:1: E402 Module level import not at top of file
verl/models/llama/megatron/layers/parallel_attention.py:343:1: E402 Module level import not at top of file
verl/models/llama/megatron/layers/parallel_attention.py:366:1: E402 Module level import not at top of file
verl/models/llama/megatron/layers/parallel_attention.py:420:121: E501 Line too long (122 > 120)
verl/models/llama/megatron/layers/parallel_linear.py:82:1: E402 Module level import not at top of file
verl/models/mcore/loader.py:273:121: E501 Line too long (134 > 120)
verl/models/mcore/util.py:26:121: E501 Line too long (202 > 120)
verl/models/qwen2/megatron/__init__.py:16:5: F401 `.modeling_qwen2_megatron.ParallelQwen2ForCausalLM` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/models/qwen2/megatron/__init__.py:18:5: F401 `.modeling_qwen2_megatron.ParallelQwen2ForCausalLMRmPad` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/models/qwen2/megatron/__init__.py:20:5: F401 `.modeling_qwen2_megatron.ParallelQwen2ForCausalLMRmPadPP` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/models/qwen2/megatron/__init__.py:21:5: F401 `.modeling_qwen2_megatron.ParallelQwen2ForValueRmPad` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/models/qwen2/megatron/__init__.py:22:5: F401 `.modeling_qwen2_megatron.ParallelQwen2ForValueRmPadPP` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/models/qwen2/megatron/__init__.py:24:5: F401 `.modeling_qwen2_megatron.ParallelQwen2Model` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/models/qwen2/megatron/checkpoint_utils/qwen2_loader.py:90:121: E501 Line too long (169 > 120)
verl/models/qwen2/megatron/checkpoint_utils/qwen2_loader.py:256:121: E501 Line too long (172 > 120)
verl/models/qwen2/megatron/checkpoint_utils/qwen2_loader_depracated.py:90:121: E501 Line too long (169 > 120)
verl/models/qwen2/megatron/checkpoint_utils/qwen2_loader_depracated.py:272:121: E501 Line too long (127 > 120)
verl/models/qwen2/megatron/checkpoint_utils/qwen2_saver.py:170:9: F841 Local variable `tp_rank` is assigned to but never used
verl/models/qwen2/megatron/checkpoint_utils/qwen2_saver.py:211:9: F841 Local variable `tp_rank` is assigned to but never used
verl/models/qwen2/megatron/checkpoint_utils/qwen2_saver.py:261:9: F841 Local variable `tp_rank` is assigned to but never used
verl/models/qwen2/megatron/layers/__init__.py:15:33: F401 `.parallel_attention.ParallelQwen2Attention` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/models/qwen2/megatron/layers/__init__.py:16:31: F401 `.parallel_decoder.ParallelQwen2DecoderLayer` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/models/qwen2/megatron/layers/__init__.py:16:58: F401 `.parallel_decoder.ParallelQwen2DecoderLayerRmPad` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/models/qwen2/megatron/layers/__init__.py:17:27: F401 `.parallel_mlp.ParallelQwen2MLP` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/models/qwen2/megatron/layers/__init__.py:18:31: F401 `.parallel_rmsnorm.ParallelQwen2RMSNorm` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/models/qwen2/megatron/layers/parallel_attention.py:163:121: E501 Line too long (134 > 120)
verl/models/qwen2/megatron/layers/parallel_attention.py:282:1: E402 Module level import not at top of file
verl/models/qwen2/megatron/layers/parallel_attention.py:283:1: E402 Module level import not at top of file
verl/models/qwen2/megatron/layers/parallel_attention.py:284:1: E402 Module level import not at top of file
verl/models/qwen2/megatron/layers/parallel_attention.py:307:1: E402 Module level import not at top of file
verl/models/qwen2/megatron/layers/parallel_attention.py:361:121: E501 Line too long (122 > 120)
verl/models/qwen2/megatron/modeling_qwen2_megatron.py:630:121: E501 Line too long (130 > 120)
verl/models/transformers/llama.py:106:121: E501 Line too long (180 > 120)
verl/models/transformers/llama.py:214:121: E501 Line too long (128 > 120)
verl/models/transformers/llama.py:215:121: E501 Line too long (135 > 120)
verl/models/transformers/monkey_patch.py:145:1: E402 Module level import not at top of file
verl/models/transformers/monkey_patch.py:146:1: E402 Module level import not at top of file
verl/models/transformers/monkey_patch.py:148:1: E402 Module level import not at top of file
verl/models/transformers/monkey_patch.py:157:9: B904 Within an `except` clause, raise exceptions with `raise ... from err` or `raise ... from None` to distinguish them from errors in exception handling
verl/models/transformers/qwen2.py:215:121: E501 Line too long (128 > 120)
verl/models/transformers/qwen2.py:216:121: E501 Line too long (135 > 120)
verl/protocol.py:303:121: E501 Line too long (125 > 120)
verl/protocol.py:352:121: E501 Line too long (171 > 120)
verl/protocol.py:578:121: E501 Line too long (142 > 120)
verl/protocol.py:580:121: E501 Line too long (150 > 120)
verl/protocol.py:583:121: E501 Line too long (167 > 120)
verl/protocol.py:715:1: E402 Module level import not at top of file
verl/protocol.py:725:121: E501 Line too long (121 > 120)
verl/protocol.py:766:1: E402 Module level import not at top of file
verl/protocol.py:768:1: E402 Module level import not at top of file
verl/single_controller/__init__.py:23:1: E402 Module level import not at top of file
verl/single_controller/__init__.py:24:1: E402 Module level import not at top of file
verl/single_controller/base/decorator.py:149:16: E721 Use `is` and `is not` for type comparisons, or `isinstance()` for isinstance checks
verl/single_controller/base/decorator.py:198:121: E501 Line too long (134 > 120)
verl/single_controller/base/decorator.py:310:12: E721 Use `is` and `is not` for type comparisons, or `isinstance()` for isinstance checks
verl/single_controller/base/worker.py:137:121: E501 Line too long (131 > 120)
verl/single_controller/base/worker_group.py:89:33: G003 Logging statement uses `+`
verl/single_controller/base/worker_group.py:202:21: B904 Within an `except` clause, raise exceptions with `raise ... from err` or `raise ... from None` to distinguish them from errors in exception handling
verl/single_controller/ray/__init__.py:15:19: F401 `.base.RayClassWithInitArgs` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/single_controller/ray/__init__.py:15:41: F401 `.base.RayResourcePool` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/single_controller/ray/__init__.py:15:58: F401 `.base.RayWorkerGroup` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/single_controller/ray/__init__.py:15:74: F401 `.base.create_colocated_worker_cls` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/third_party/sglang/parallel_state.py:135:5: F841 Local variable `rank` is assigned to but never used
verl/third_party/vllm/__init__.py:40:40: F401 `.vllm_v_0_6_3.llm.LLMEngine` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/third_party/vllm/__init__.py:45:22: F401 `vllm.LLM` imported but unused
verl/third_party/vllm/__init__.py:46:34: F401 `vllm.distributed.parallel_state` imported but unused
verl/third_party/vllm/__init__.py:50:121: E501 Line too long (141 > 120)
verl/third_party/vllm/vllm_v_0_5_4/dtensor_weight_loaders.py:189:1: E402 Module level import not at top of file
verl/third_party/vllm/vllm_v_0_5_4/llm.py:136:121: E501 Line too long (132 > 120)
verl/third_party/vllm/vllm_v_0_5_4/llm.py:196:121: E501 Line too long (161 > 120)
verl/third_party/vllm/vllm_v_0_5_4/megatron_weight_loaders.py:174:5: F811 Redefinition of unused `llama_megatron_core_te_weight_loader` from line 90
verl/third_party/vllm/vllm_v_0_5_4/megatron_weight_loaders.py:205:5: F811 Redefinition of unused `llama_megatron_core_weight_loader` from line 121
verl/third_party/vllm/vllm_v_0_5_4/megatron_weight_loaders.py:254:121: E501 Line too long (150 > 120)
verl/third_party/vllm/vllm_v_0_5_4/model_loader.py:36:21: F811 Redefinition of unused `LoadConfig` from line 24
verl/third_party/vllm/vllm_v_0_5_4/model_loader.py:36:45: F811 Redefinition of unused `ModelConfig` from line 26
verl/third_party/vllm/vllm_v_0_5_4/model_loader.py:323:1: E402 Module level import not at top of file
verl/third_party/vllm/vllm_v_0_5_4/parallel_state.py:127:5: F841 Local variable `rank` is assigned to but never used
verl/third_party/vllm/vllm_v_0_5_4/parallel_state.py:245:5: F841 Local variable `rank` is assigned to but never used
verl/third_party/vllm/vllm_v_0_5_4/spmd_gpu_executor.py:147:121: E501 Line too long (144 > 120)
verl/third_party/vllm/vllm_v_0_5_4/spmd_gpu_executor.py:152:121: E501 Line too long (143 > 120)
verl/third_party/vllm/vllm_v_0_5_4/spmd_gpu_executor.py:232:5: F841 Local variable `port` is assigned to but never used
verl/third_party/vllm/vllm_v_0_5_4/worker.py:220:121: E501 Line too long (127 > 120)
verl/third_party/vllm/vllm_v_0_6_3/config.py:46:92: B026 Star-arg unpacking after a keyword argument is strongly discouraged
verl/third_party/vllm/vllm_v_0_6_3/dtensor_weight_loaders.py:225:1: E402 Module level import not at top of file
verl/third_party/vllm/vllm_v_0_6_3/llm.py:141:121: E501 Line too long (132 > 120)
verl/third_party/vllm/vllm_v_0_6_3/llm.py:169:121: E501 Line too long (161 > 120)
verl/third_party/vllm/vllm_v_0_6_3/llm_engine_sp.py:52:24: F811 Redefinition of unused `EngineArgs` from line 35
verl/third_party/vllm/vllm_v_0_6_3/llm_engine_sp.py:53:21: F811 Redefinition of unused `LoadConfig` from line 25
verl/third_party/vllm/vllm_v_0_6_3/llm_engine_sp.py:53:33: F811 Redefinition of unused `ModelConfig` from line 27
verl/third_party/vllm/vllm_v_0_6_3/llm_engine_sp.py:354:9: F841 Local variable `distributed_executor_backend` is assigned to but never used
verl/third_party/vllm/vllm_v_0_6_3/llm_engine_sp.py:360:121: E501 Line too long (152 > 120)
verl/third_party/vllm/vllm_v_0_6_3/megatron_weight_loaders.py:199:5: F841 Local variable `params_mapping` is assigned to but never used
verl/third_party/vllm/vllm_v_0_6_3/megatron_weight_loaders.py:229:121: E501 Line too long (150 > 120)
verl/third_party/vllm/vllm_v_0_6_3/model_loader.py:28:21: F811 Redefinition of unused `LoadConfig` from line 22
verl/third_party/vllm/vllm_v_0_6_3/model_loader.py:28:45: F811 Redefinition of unused `ModelConfig` from line 22
verl/third_party/vllm/vllm_v_0_6_3/model_loader.py:312:1: E402 Module level import not at top of file
verl/third_party/vllm/vllm_v_0_6_3/model_runner.py:44:21: F811 Redefinition of unused `LoadConfig` from line 27
verl/third_party/vllm/vllm_v_0_6_3/model_runner.py:44:33: F811 Redefinition of unused `ModelConfig` from line 29
verl/third_party/vllm/vllm_v_0_6_3/parallel_state.py:129:5: F841 Local variable `rank` is assigned to but never used
verl/third_party/vllm/vllm_v_0_6_3/parallel_state.py:247:5: F841 Local variable `rank` is assigned to but never used
verl/third_party/vllm/vllm_v_0_6_3/spmd_gpu_executor.py:147:121: E501 Line too long (144 > 120)
verl/third_party/vllm/vllm_v_0_6_3/spmd_gpu_executor.py:152:121: E501 Line too long (143 > 120)
verl/third_party/vllm/vllm_v_0_6_3/spmd_gpu_executor.py:232:5: F841 Local variable `port` is assigned to but never used
verl/third_party/vllm/vllm_v_0_6_3/worker.py:217:121: E501 Line too long (127 > 120)
verl/trainer/fsdp_sft_trainer.py:298:121: E501 Line too long (158 > 120)
verl/trainer/fsdp_sft_trainer.py:501:121: E501 Line too long (121 > 120)
verl/trainer/fsdp_sft_trainer.py:550:1: E402 Module level import not at top of file
verl/trainer/fsdp_sft_trainer.py:551:1: E402 Module level import not at top of file
verl/trainer/fsdp_sft_trainer.py:553:1: E402 Module level import not at top of file
verl/trainer/fsdp_sft_trainer.py:553:43: F811 Redefinition of unused `FSDPSFTTrainer` from line 82
verl/trainer/fsdp_sft_trainer.py:554:1: E402 Module level import not at top of file
verl/utils/__init__.py:16:24: F401 `.tokenizer.hf_processor` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/utils/__init__.py:16:38: F401 `.tokenizer.hf_tokenizer` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/utils/checkpoint/checkpoint_manager.py:48:37: B006 Do not use mutable data structures for argument defaults
verl/utils/checkpoint/fsdp_checkpoint_manager.py:51:37: B006 Do not use mutable data structures for argument defaults
verl/utils/checkpoint/fsdp_checkpoint_manager.py:56:13: B028 No explicit `stacklevel` keyword argument found
verl/utils/checkpoint/fsdp_checkpoint_manager.py:81:121: E501 Line too long (121 > 120)
verl/utils/checkpoint/fsdp_checkpoint_manager.py:98:121: E501 Line too long (124 > 120)
verl/utils/checkpoint/megatron_checkpoint_manager.py:64:37: B006 Do not use mutable data structures for argument defaults
verl/utils/checkpoint/megatron_checkpoint_manager.py:219:121: E501 Line too long (124 > 120)
verl/utils/dataset/__init__.py:15:25: F401 `.rl_dataset.RLHFDataset` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/utils/dataset/__init__.py:16:25: F401 `.rm_dataset.RMDataset` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/utils/dataset/__init__.py:17:26: F401 `.sft_dataset.SFTDataset` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/utils/dataset/multiturn_sft_dataset.py:96:9: F841 Local variable `current_length` is assigned to but never used
verl/utils/dataset/sft_dataset.py:95:79: B023 Function definition does not bind loop variable `key`
verl/utils/dataset/sft_dataset.py:103:83: B023 Function definition does not bind loop variable `key`
verl/utils/debug/__init__.py:15:26: F401 `.performance.GPUMemoryLogger` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/utils/debug/__init__.py:15:43: F401 `.performance.log_gpu_memory_usage` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/utils/debug/performance.py:68:121: E501 Line too long (127 > 120)
verl/utils/debug/performance.py:71:121: E501 Line too long (126 > 120)
verl/utils/debug/profile.py:15:1: I001 [*] Import block is un-sorted or un-formatted
verl/utils/debug/profile.py:19:15: UP039 [*] Unnecessary parentheses after class definition
verl/utils/debug/profile.py:50:23: F541 [*] f-string without any placeholders
verl/utils/debug/profile.py:52:49: F541 [*] f-string without any placeholders
verl/utils/debug/profile.py:53:47: F541 [*] f-string without any placeholders
verl/utils/debug/profile.py:54:67: F541 [*] f-string without any placeholders
verl/utils/debug/profile.py:54:121: E501 Line too long (122 > 120)
verl/utils/flops_counter.py:175:121: E501 Line too long (124 > 120)
verl/utils/hdfs_io.py:135:32: G004 Logging statement uses f-string
verl/utils/import_utils.py:78:9: B904 Within an `except` clause, raise exceptions with `raise ... from err` or `raise ... from None` to distinguish them from errors in exception handling
verl/utils/logger/aggregate_logger.py:46:121: E501 Line too long (131 > 120)
verl/utils/logger/aggregate_logger.py:64:41: G004 Logging statement uses f-string
verl/utils/megatron/tensor_parallel.py:152:121: E501 Line too long (123 > 120)
verl/utils/megatron_utils.py:17:1: I001 [*] Import block is un-sorted or un-formatted
verl/utils/megatron_utils.py:22:20: F401 [*] `torch.nn` imported but unused
verl/utils/megatron_utils.py:34:38: F401 [*] `verl.utils.memory_buffer.build_memory_reference_from_module` imported but unused
verl/utils/megatron_utils.py:332:30: B009 [*] Do not call `getattr` with a constant attribute value. It is not any safer than normal property access.
verl/utils/megatron_utils.py:366:27: B009 [*] Do not call `getattr` with a constant attribute value. It is not any safer than normal property access.
verl/utils/model.py:464:121: E501 Line too long (124 > 120)
verl/utils/rendezvous/ray_backend.py:39:25: G004 Logging statement uses f-string
verl/utils/rendezvous/ray_backend.py:41:22: G004 Logging statement uses f-string
verl/utils/rendezvous/ray_backend.py:63:30: G004 Logging statement uses f-string
verl/utils/rendezvous/ray_backend.py:65:30: G004 Logging statement uses f-string
verl/utils/rendezvous/ray_backend.py:72:26: G004 Logging statement uses f-string
verl/utils/reward_score/gsm8k.py:47:121: E501 Line too long (201 > 120)
verl/utils/reward_score/math.py:213:121: E501 Line too long (142 > 120)
verl/utils/reward_score/prime_code/__init__.py:16:8: F401 `re` imported but unused
verl/utils/reward_score/prime_code/testing_util.py:131:121: E501 Line too long (688 > 120)
verl/utils/reward_score/prime_code/testing_util.py:168:13: E722 Do not use bare `except`
verl/utils/reward_score/prime_code/testing_util.py:222:9: E722 Do not use bare `except`
verl/utils/reward_score/prime_code/testing_util.py:254:13: E722 Do not use bare `except`
verl/utils/reward_score/prime_code/testing_util.py:255:17: B018 Found useless expression. Either assign it to a variable or remove it.
verl/utils/reward_score/prime_code/testing_util.py:259:13: E722 Do not use bare `except`
verl/utils/reward_score/prime_code/testing_util.py:260:17: B018 Found useless expression. Either assign it to a variable or remove it.
verl/utils/reward_score/prime_code/testing_util.py:264:13: E722 Do not use bare `except`
verl/utils/reward_score/prime_code/testing_util.py:265:17: B018 Found useless expression. Either assign it to a variable or remove it.
verl/utils/reward_score/prime_code/testing_util.py:269:121: E501 Line too long (132 > 120)
verl/utils/reward_score/prime_code/testing_util.py:293:21: E722 Do not use bare `except`
verl/utils/reward_score/prime_code/testing_util.py:294:25: B018 Found useless expression. Either assign it to a variable or remove it.
verl/utils/reward_score/prime_code/testing_util.py:335:121: E501 Line too long (165 > 120)
verl/utils/reward_score/prime_code/testing_util.py:386:121: E501 Line too long (209 > 120)
verl/utils/reward_score/prime_code/testing_util.py:390:121: E501 Line too long (183 > 120)
verl/utils/reward_score/prime_code/testing_util.py:455:121: E501 Line too long (211 > 120)
verl/utils/reward_score/prime_code/testing_util.py:459:121: E501 Line too long (185 > 120)
verl/utils/reward_score/prime_code/testing_util.py:582:121: E501 Line too long (197 > 120)
verl/utils/reward_score/prime_code/testing_util.py:586:121: E501 Line too long (171 > 120)
verl/utils/reward_score/prime_math/__init__.py:106:5: E722 Do not use bare `except`
verl/utils/reward_score/prime_math/__init__.py:119:5: E722 Do not use bare `except`
verl/utils/reward_score/prime_math/__init__.py:246:5: E722 Do not use bare `except`
verl/utils/reward_score/prime_math/__init__.py:315:121: E501 Line too long (128 > 120)
verl/utils/reward_score/prime_math/__init__.py:331:5: E722 Do not use bare `except`
verl/utils/reward_score/prime_math/__init__.py:407:1: E402 Module level import not at top of file
verl/utils/reward_score/prime_math/__init__.py:429:5: E722 Do not use bare `except`
verl/utils/reward_score/prime_math/grader.py:302:21: B005 Using `.strip()` with multi-character strings is misleading
verl/utils/reward_score/prime_math/grader.py:302:21: B005 Using `.strip()` with multi-character strings is misleading
verl/utils/reward_score/prime_math/math_normalize.py:54:5: E722 Do not use bare `except`
verl/utils/reward_score/prime_math/math_normalize.py:70:17: E722 Do not use bare `except`
verl/utils/reward_score/prime_math/math_normalize.py:101:5: E722 Do not use bare `except`
verl/utils/reward_score/prime_math/math_normalize.py:181:121: E501 Line too long (142 > 120)
verl/utils/tokenizer.py:30:9: B028 No explicit `stacklevel` keyword argument found
verl/utils/tokenizer.py:33:9: B028 No explicit `stacklevel` keyword argument found
verl/utils/tokenizer.py:55:9: B028 No explicit `stacklevel` keyword argument found
verl/utils/torch_functional.py:86:72: E741 Ambiguous variable name: `l`
verl/utils/torch_functional.py:177:5: F841 Local variable `total_params` is assigned to but never used
verl/utils/torch_functional.py:397:1: E402 Module level import not at top of file
verl/utils/torch_functional.py:399:1: E402 Module level import not at top of file
verl/utils/torch_functional.py:400:1: E402 Module level import not at top of file
verl/utils/ulysses.py:246:5: F841 Local variable `sp_size` is assigned to but never used
verl/workers/actor/dp_actor.py:244:13: F841 Local variable `response_mask` is assigned to but never used
verl/workers/actor/megatron_actor.py:22:1: I001 [*] Import block is un-sorted or un-formatted
verl/workers/actor/megatron_actor.py:85:121: E501 Line too long (122 > 120)
verl/workers/actor/megatron_actor.py:86:121: E501 Line too long (128 > 120)
verl/workers/actor/megatron_actor.py:89:121: E501 Line too long (133 > 120)
verl/workers/actor/megatron_actor.py:96:121: E501 Line too long (126 > 120)
verl/workers/actor/megatron_actor.py:175:121: E501 Line too long (135 > 120)
verl/workers/actor/megatron_actor.py:237:121: E501 Line too long (150 > 120)
verl/workers/actor/megatron_actor.py:243:121: E501 Line too long (144 > 120)
verl/workers/actor/megatron_actor.py:245:121: E501 Line too long (130 > 120)
verl/workers/actor/megatron_actor.py:247:121: E501 Line too long (122 > 120)
verl/workers/actor/megatron_actor.py:286:9: F841 Local variable `input_shapes` is assigned to but never used
verl/workers/critic/dp_critic.py:227:21: F841 Local variable `input_ids` is assigned to but never used
verl/workers/critic/dp_critic.py:230:21: F841 Local variable `position_ids` is assigned to but never used
verl/workers/megatron_workers.py:18:1: I001 [*] Import block is un-sorted or un-formatted
verl/workers/reward_manager/__init__.py:15:20: F401 `.batch.BatchRewardManager` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/workers/reward_manager/__init__.py:16:19: F401 `.dapo.DAPORewardManager` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/workers/reward_manager/__init__.py:17:20: F401 `.naive.NaiveRewardManager` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/workers/reward_manager/__init__.py:18:20: F401 `.prime.PrimeRewardManager` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/workers/reward_manager/prime.py:61:121: E501 Line too long (217 > 120)
verl/workers/reward_model/__init__.py:15:19: F401 `.base.BasePPORewardModel` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/workers/reward_model/megatron/__init__.py:15:27: F401 `.reward_model.MegatronRewardModel` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/workers/reward_model/megatron/reward_model.py:65:9: F841 Local variable `ori_bs` is assigned to but never used
verl/workers/reward_model/megatron/reward_model.py:89:121: E501 Line too long (132 > 120)
verl/workers/reward_model/megatron/reward_model.py:215:9: F841 Local variable `input_shapes` is assigned to but never used
verl/workers/rollout/naive/__init__.py:15:28: F401 `.naive_rollout.NaiveRollout` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/workers/rollout/sglang_rollout/__init__.py:14:29: F401 `.sglang_rollout.SGLangRollout` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/workers/rollout/vllm_rollout/fire_vllm_rollout.py:22:121: E501 Line too long (129 > 120)
verl/workers/rollout/vllm_rollout/fire_vllm_rollout.py:51:121: E501 Line too long (157 > 120)
verl/workers/rollout/vllm_rollout/fire_vllm_rollout.py:153:13: F841 Local variable `log_probs` is assigned to but never used
verl/workers/rollout/vllm_rollout/vllm_rollout.py:22:121: E501 Line too long (129 > 120)
verl/workers/rollout/vllm_rollout/vllm_rollout.py:60:121: E501 Line too long (157 > 120)
verl/workers/sharding_manager/__init__.py:16:5: F401 `verl.utils.import_utils.is_megatron_core_available` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/workers/sharding_manager/__init__.py:17:5: F401 `verl.utils.import_utils.is_sglang_available` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/workers/sharding_manager/__init__.py:21:19: F401 `.base.BaseShardingManager` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/workers/sharding_manager/__init__.py:22:27: F401 `.fsdp_ulysses.FSDPUlyssesShardingManager` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
verl/workers/sharding_manager/__init__.py:29:121: E501 Line too long (149 > 120)
verl/workers/sharding_manager/__init__.py:32:121: E501 Line too long (126 > 120)
verl/workers/sharding_manager/fsdp_sglang.py:99:9: F841 Local variable `load_format` is assigned to but never used
verl/workers/sharding_manager/fsdp_sglang.py:123:121: E501 Line too long (178 > 120)
verl/workers/sharding_manager/fsdp_ulysses.py:59:13: F841 Local variable `sp_size` is assigned to but never used
Found 305 errors.
```
---------
Co-authored-by: Haibin Lin <haibin.lin@bytedance.com>
* [logging] fix: typo of fsdp_checkpoint_manager saving optim path (#1276)
fix a minor typo of printing optim saving path in
fsdp_checkpoint_manager.py
* [doc] fix: fix 2 minor issues in installation and reward explanation (#1215)
close
- #1214
- #1213
Co-authored-by: HL <linhaibin.eric@gmail.com>
* [merger] fix: merged generation config is inconsistent with hf pre-trained model (#1277)
https://github.com/volcengine/verl/blob/afeac9a0230a0980e990a3c59e08e8e0890baaa4/scripts/model_merger.py#L195-L200
Model created by `from_config` won't load the `generation_config.json`
from `args.hf_model_path`, instead it create a generation config
separately.
This inconsistency will lead to strange generating error when user using
vllm/hf rollout without carefully override
sampling_params/generation_config, see issue here:
https://github.com/volcengine/verl/issues/1246
This PR introduce a `patch_model_generation_config` function which patch
the model from config to correctly use the pretrained generation config.
Fix https://github.com/volcengine/verl/issues/1246.
* Option to make model private when pushing to hub, pushing the tokenizer for convenience (#1259)
Very small changes to `model_merger.py` so that tokenizer is pushed to
hub and model can be pushed privately.
* [CI] feat: only check changed files (#1294)
* [example] chore: remove verl_getting_started.ipynb (#1281)
remove the out-dated notebook
* [doc] add the multi modal doc (#1292)
## Motivation
There is currently no docs support for multimodal task on verl, so I
think we need to add a related document.
* docs: add DeepWiki and ICLR links (#1283)
* [docs] add pr template (#1287)
# What does this PR do?
add the PR template to improve the readability of PR.
## Before submitting
- [x] Did you read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide)
and finish the [code format
check](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting)?
- [ ] Did you make sure to update the documentations with your changes
in the [docs](https://github.com/volcengine/verl/tree/main/docs)
especially for breaking config etc?
- [ ] Did you write any test cases if neccessary? Please add CI tests to
your new feature.
* fix: catch any error in math reward function (#1312)
# What does this PR do?
This PR fixes collapse in the math reward function by catch any possible
errors.
## Before submitting
- [x] Did you read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide)
and finish the [code format
check](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting)?
- [x] Did you make sure to update the documentations with your changes
in the [docs](https://github.com/volcengine/verl/tree/main/docs)
especially for breaking config etc?
- [x] Did you write any test cases if neccessary? Please add CI tests to
your new feature.
# Additional Info:
- **Issue Number**: None
- **Training**: None
- **Inference**: None
* [vllm] add moe patch for qwen3-moe (#1316)
# What does this PR do?
Add moe patch for qwen3-moe. Fix the weight loader issue in vLLM MoE
models. This isn’t a permanent solution, and we may need to contribute
code to vLLM to address the problem caused by FusedMoE. I’m already
seeking suggestions for this.
# ChangeLog:
- Add Qwen3MoeForCausalLM class for moe_patch
* fix reward model and add CI test (#1252)
Fix bugs related to #1165 .
Megatron backend reward model has no CI test, add to current ppo
trainer.
Fix `micro_batch_size_per_gpu` but not sure whether it is right for
reward config.
The output format is also not right with current `forward_micro_batch`
implementation.
* [sglang] feat: Add SGLang async multi-turn rollout with tool support (#1037)
A redesigned version of #917
## Current Status
[Develop log &
Tracker](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/issues/113)
**What Has Been Done**
- Async Rollout Refactoring: Integrate with the tool server to
coordinate tool calls during generation, leveraging request IDs for
state and progress tracking, support async multi-turn conversations in
Agentic RL training (with Tool support).
- Async Request Management: Encapsulate rollout requests into a unified
structure, enabling efficient tracking and handling of concurrent
multi-turn dialogues with chatml style messages.
- Extensible Tools: A modular design for adapt tools in
OpenAIFunctionTool format which is both support by SGLang and vLLM, with
create separate instance, execute when tool call, calc score according
to tool env state and release resource.
- Multi-turn support has been implemented for the GSM8K task (new
version working on). However, training has not yet converged, and we
hope the community could join to investigate the issue.
**What Is WIP**
- [x] Merge loss mask to training process from last version
- [x] Add more user friendly tool config and e2e tests for gsm8k with
tool training
- [ ] We are going to validate our multiturn feature in open-source
sandbox environments.
## Key Features will be introduced in future version
- Integrate a Ray-based agent trainer to enable explicit separation of
the rollout and training pipeline. Provide support for partial rollout
handling and fine-grained request state management.
- Extend the framework to support simulated user interactions (e.g.,
roleplay, interactive feedback) and more complex environment-in-the-loop
RL tasks.
**Future Plan**
[Discussion
Thread](https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/issues/74#issuecomment-2763192625)
[RFC
doc](https://github.com/SwordFaith/verl-sglang-dev-log/blob/main/rlhf/verl/multi-turn/veRL-multiturn-rollout-RFC.md)
will be updated soon.
## Contributors & Acknowledgement
- Xiang Long [mid.of.change@gmail.com](mailto:mid.of.change@gmail.com)
@SwordFaith (Design RFC & core-dev of refactor part)
- Yuzhen Zhou [zyzshishui@gmail.com](mailto:zyzshishui@gmail.com)
@zyzshishui (Core-dev)
- Chenyang Zhao [zhaochen20@outlook.com](mailto:zhaochen20@outlook.com)
@zhaochenyang20 (PM)
- Guanhua Wang @WANG-GH
- Junrong Lin @ocss884 (verl-sglang support)
- Hanchen Zhang
[zhanghanchen77@gmail.com](mailto:zhanghanchen77@gmail.com)
- Haoran Wang [ubecwang@gmail.com](mailto:ubecwang@gmail.com)
- Rui Lu [learningrate1@gmail.com](mailto:learningrate1@gmail.com)
- Yujiang Li [liyujiang2020@gmail.com](mailto:liyujiang2020@gmail.com)
- Jiajun Li [guapisolo@gmail.com](mailto:guapisolo@gmail.com)
- Jin Pan [jpan236@wisc.edu](mailto:jpan236@wisc.edu)
- Zhi Zheng [zhengzhi@modelbest.cn](mailto:zhengzhi@modelbest.cn)
@zh-zheng
---------
Co-authored-by: zyzshishui <492129152@qq.com>
Co-authored-by: guanhua <281484683@qq.com>
Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>
Co-authored-by: ocss884 <ocss.lin@gmail.com>
Co-authored-by: Shawn/Yuxuan Tong <tongyuxuan361@gmail.com>
Co-authored-by: HL <linhaibin.eric@gmail.com>
* [fix] Remove grad_offload in rloo example script (#1323)
# What does this PR do?
`grad_offload` option was removed in #284 for fsdp backend, current
script will error out due to this.
# ChangeLog:
- Remove grad_offload in rloo example script
# Usage
- Run the changed script
## Before submitting
- [X] Did you read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide)
and finish the [code format
check](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting)?
- [X] Did you make sure to update the documentations with your changes
in the [docs](https://github.com/volcengine/verl/tree/main/docs)
especially for breaking config etc?
- [X] Did you write any test cases if neccessary? Please add CI tests to
your new feature.
# Additional Info:
- **Issue Number**: N/A
- **Training**: FSDP
- **Inference**: None
Signed-off-by: Hollow Man <hollowman@opensuse.org>
* cancel bootstrapping for n=n_samples (#1320)
# What does this PR do?
The validation metrics currently bootstraps its estimates by randomly
sampling 1,2,4,8,16,...,n_samples results out of n_samples results.
However, this bootstrapping doesn't make sense for `n=n_samples` as you
cannot have more information about the estimate for `pass@n_samples` if
you only have `n_samples` samples.
This results in weird results when doing RL with only one problem in the
validation set (best@N is a value between 0 and 1 instead of 0 or 1)
This PR turns off bootstrapping for n=n_samples case and leaves rest of
the computations the same.
* docs: add community blogs and fix link rendering (#1324)
# What does this PR do?
Add one-line overview of what this PR aims to achieve or accomplish.
# ChangeLog:
- Add two reference blogs to README
# Usage
None
## Before submitting
- [x] Did you read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide)
and finish the [code format
check](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting)?
- [x] Did you make sure to update the documentations with your changes
in the [docs](https://github.com/volcengine/verl/tree/main/docs)
especially for breaking config etc?
- [] Did you write any test cases if neccessary? No tests needed
* [doc] fix dataset path for gsm8k and url error (#1327)
# What does this PR do?
fix dataset path for gsm8k and some url error.
# ChangeLog:
change the readme file to fix gsm8k download path.
# Usage
- You can add one use example below.
```python
# Add code snippet or script demonstrating how to use this
```
- For algorithm implementation and new model support, you can add
training curve plots and evaluatuion results below.
## Before submitting
- [ ] Did you read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide)
and finish the [code format
check](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting)?
- [ ] Did you make sure to update the documentations with your changes
in the [docs](https://github.com/volcengine/verl/tree/main/docs)
especially for breaking config etc?
- [ ] Did you write any test cases if neccessary? Please add CI tests to
your new feature.
# Additional Info:
- **Issue Number**: Fixes issue # or discussion # if any.
- **Training**: [Note which backend this PR will affect: FSDP, Megatron,
both, or none]
- **Inference**: [Note which backend this PR will affect: vLLM, SGLang,
both, or none]
* [feat] add FusedWorker (#1278)
on behalf of @zw0610
FusedWorker is designed to enhance the ability of colocated workers.
FusedWorker keeps most of the interfaces as colocated workers: Users
shall use `create_colocated_worker_cls_fused` to create colocated worker
class, use `spawn` to split FusedWorker to dict of workers.
In colocated workers, access the methods of child workers is done by
using `spawn` then access via worker dict or calling
`{worker_group}.{worker}_{method}`. In FusedWorker, the first method was
preserved, while the latter was change to a new way: First use
`{worker_group}.fuse(prefixes)` to bind workers to the worker group,
then use `{worker_group}.{worker}.foo()` to access child workers.
* [test] fix: test arithmetic_sequence failed to run (#1333)
# What does this PR do?
e2e test `arithmetic_sequence` is currently broken, with error
`TypeError: not a string` thrown on code `tokenizer =
AutoTokenizer.from_pretrained(local_path)` when running
`tests/e2e/run_ray_trainer.sh`. This PR aims to fix it.
In the `arithmetic_sequence` task, `tests.e2e.envs.digit_completion`
module was imported in the beginning but not used. This import seems
meaningless. However, when this library is imported,
`AutoTokenizer.register()` will be called to set configurations for
`AutoTokenizer`. Only after that can `AutoTokenizer` be successfully
initialized in test code to perform subsequent tasks.
## Timeline
- In #934 , to improve CI efficiency, the CI corresponding to
`arithmetic_sequence` was removed.
- In #1010 , according to the `unused_import` rule, this import was
deleted, triggering the bug.
# ChangeLog
- `AutoTokenizer.register` was added explicitly, which ensures the
configurations were set before initialization of `AutoTokenizer`.
# Usage
- the original code `tests/e2e/run_ray_trainer.sh` is available for
tests.
```python
bash tests/e2e/run_ray_trainer.sh
```
## Before submitting
- [x] Did you read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide)
and finish the [code format
check](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting)?
- [x] Did you make sure to update the documentations with your changes
in the [docs](https://github.com/volcengine/verl/tree/main/docs)
especially for breaking config etc?
- [x] Did you write any test cases if neccessary? Please add CI tests to
your new feature.
# Additional Info:
- **Issue Number**: none
- **Training**: none
- **Inference**: none
* [FIX] metric_utils log best, worst, maj only for n_resps > 1 (#1248)
Solves #1249
Instead of logging best@1/mean and worst@1/mean, which is identical to
mean@1, just do not log it when there is only one validation response
per prompt (`n_resps == 1`). Same applies to std.
Otherwise we get many duplicated plots that show the same thing.
The only change is the addition of the `if n_resps > 1:` statement.
* [dev] feat: improve PR template (#1343)
This PR tries to imporve the PR template itself.
* [recipe] feat: latest reproduction of DAPO (#1336)
# What does this PR do?
This PR updates the latest reproduction results of DAPO.
## Before submitting
- [x] Did you read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide)
and finish the [code format
check](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting)?
- [x] Did you make sure to update the documentations with your changes
in the [docs](https://github.com/volcengine/verl/tree/main/docs)
especially for breaking config etc?
- [x] Did you write any test cases if neccessary? Please add CI tests to
your new feature.
# Additional Info:
- **Issue Number**: none
- **Training**: none
- **Inference**: none
* [docs] fix: typo (#1351)
* [installation] doc: Fix pip install instructions (#1353)
### Checklist Before Starting
- [X] Search for similar PR(s).
### What does this PR do?
There should be no space between `.` and `[vllm]` or `[sglang]`, or it
will result in error:
```logs
ERROR: Invalid requirement: '[vllm]': Expected package name at the start of dependency specifier
[vllm]
```
In addition, I rewrite this part to make the instructions more clear (as
`.. or ..` can't be executed by bash directly)
### Additional Info.
- **Issue Number**: none
- **Training**: none
- **Inference**: none
### Checklist Before Submitting
- [X] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [X] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [X] Add `[BREAKING]` to the PR title if it breaks any API.
- [X] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [X] Add CI test(s) if neccessary.
Signed-off-by: Hollow Man <hollowman@opensuse.org>
* [fsdp] feat: support fsdp2 training and inference in fsdp_workers (#1026)
# What does this PR do?
This PR supports fsdp2 for fsdp_worker. Torch version 2.4 or higher is
required.
# Usage Example
```
sh examples/grpo_trainer/run_qwen2-7b.sh \
actor_rollout_ref.ref.strategy=fsdp2 \
actor_rollout_ref.actor.strategy=fsdp2
```
To save more memory, you can add the parameter below to enable the fsdp2
OffloadPolicy:
```
actor_rollout_ref.actor.offload_policy=True
```
You can see the profile comparison between fsdp1 and fsdp2 here:
https://github.com/volcengine/verl/pull/1026#issuecomment-2824343860
---------
Co-authored-by: lixiaoguang12 <lixiaoguang12@meituan.com>
Co-authored-by: shengguangming <shengguangming@bytedance.com>
* [docs] fix: Fix Arxiv Link (#1364)
Arxiv link is not rendering on github or
https://verl.readthedocs.io/en/latest/index.html#
### Checklist Before Starting
- [x ] Search for similar PR(s).
### What does this PR do?
Makes external link to arxiv paper resolve properly.
### High-Level Design
N/A
### Specific Changes
Single line doc change
### API
N/A
### Usage Example
N/A
### Test
N/A
### Additional Info.
### Checklist Before Submitting
All N/A
* [dataproto] feat: Add auto padding for DataProto (#1356)
### Checklist Before Starting
- [x] Search for similar PR(s).
Coming from #577 , credit to @zw0610
### What does this PR do?
Today, users must manually duplicate (repeat) a DataProto so its batch
size matches the data‑parallel (dp) size of the target WorkerGroup. This
PR enables `auto_padding` to pad the `DataProto` when chunk is called.
### Specific Changes
* Enriched the `DataProto` so that it can have context of padding during
chunking;
* Modified the `decorator.py` that a DataProto can be automatically
padded and chunked with `dispatch_dp_compute_data_proto`;
* Added unit tests under `tests/ray/test_auto_padding.py`.
### API
Two new API under `DataProto` are introduced, which are `padding` and
`is_padding_enabled`
### Test
Tests added to `tests/ray/test_auto_padding.py`
### Additional Info.
- **Issue Number**: Fixes issue # or discussion # if any.
- **Training**: [Note which backend this PR will affect: FSDP, Megatron,
both, or none]
- **Inference**: [Note which backend this PR will affect: vLLM, SGLang,
both, or none]
### Checklist Before Submitting
- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title if it breaks any API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add CI test(s) if neccessary.
---------
Signed-off-by: Hongpeng Guo <hg5@illinois.edu>
Co-authored-by: Wang Zhang <zhangwang.nozomi@bytedance.com>
Co-authored-by: Wang Zhang <zw199006@gmail.com>
* [ray] feat: Making decorator register available for async function (#1370)
### Checklist Before Starting
- [x] Search for similar PR(s).
### What does this PR do?
This PR enables the decorators to be able to be applied onto async
functions.
### High-Level Design
* Simply added a inner wrapper function available for async func inside
the `register` function.
### Usage Example
```python
@register(dispatch_mode=Dispatch.ONE_TO_ALL, blocking=False)
async def async_fn(self, sleep_time):
return await asyncio.sleep(sleep_time * 0.1)
```
### Test
* `tests/ray/test_decorator.py`
### Additional Info.
- **Issue Number**: Fixes issue # or discussion # if any.
- **Training**: [Note which backend this PR will affect: FSDP, Megatron,
both, or none]
- **Inference**: [Note which backend this PR will affect: vLLM, SGLang,
both, or none]
### Checklist Before Submitting
- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title if it breaks any API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add CI test(s) if neccessary.
---------
Signed-off-by: Hongpeng Guo <hg5@illinois.edu>
* docs: Add runllm widget for VeRL Doc sites (#1366)
### Checklist Before Starting
- [ ] Search for similar PR(s).
### What does this PR do?
Add runllm widget for https://app.readthedocs.org/projects/verl/
### High-Level Design
> Demonstrate the high-level design if this PR is complex.
### Specific Changes
> List the specific changes.
### API
> Demonstrate how the API changes if any.
### Usage Example
> Provide usage example(s) for easier usage.
```python
# Add code snippet or script demonstrating how to use this
```
### Test
> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluatuion results, etc.
### Additional Info.
- **Issue Number**: Fixes issue # or discussion # if any.
- **Training**: [Note which backend this PR will affect: FSDP, Megatron,
both, or none]
- **Inference**: [Note which backend this PR will affect: vLLM, SGLang,
both, or none]
### Checklist Before Submitting
- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title if it breaks any API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add CI test(s) if neccessary.
* [trainer] breaking: pass dataset as required args to SFTTrainer; also change ppo ray trainer to take custom datasets as inputs (#1282)
* [ci][fix] Enable part of ray test to be run on CPU machine (#1372)
* [fix][ci] fix two pipelines that fails on the main branch (#1378)
* [feat] Enable `update_model_config` to take nested dict to update `AutoConfig` of transformers (#1379)
### Checklist Before Starting
- [x] Search for similar PR(s).
### What does this PR do?
* Enable `update_model_config` to take nested dict to update
`AutoConfig` of transformers
* Added a test pipeline for all the tests under `tests/utils`, Any
future unit tests for `verl/utils` should be added here
* Re-organized the tests file structure.
### Usage Example
For the new `update_model_config`, an example looks like below:
```python
override_config_kwargs = {
"bos_token_id": self.tokenizer.bos_token_id,
...
"nested_config": {k1: v1, k2, v2},
}
update_model_config(actor_model_config, override_config_kwargs=override_config_kwargs)
```
### Test
Added `tests/verl/utils/test_model.py::test_update_model_config`
### Additional Info.
- **Issue Number**: Fixes issue # or discussion # if any.
- **Training**: [Note which backend this PR will affect: FSDP, Megatron,
both, or none]
- **Inference**: [Note which backend this PR will affect: vLLM, SGLang,
both, or none]
### Checklist Before Submitting
- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title if it breaks any API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add CI test(s) if neccessary.
---------
Signed-off-by: Hongpeng Guo <hg5@illinois.edu>
* [rollout] misc: add demo chat completion scheduler described in ReTool paper (#1297)
Co-authored-by: shengguangming <shengguangming@bytedance.com>
* [dev] fix: validation metrics (#1374)
### Checklist Before Starting
- [x] Search for similar PR(s).
### What does this PR do?
1. Fix the error that `metric` is not added when `n == 1`.
2. Remove `std@1`.
3. Add assertation for doing initial validation but `val_metrics` is
empty.
### Additional Info.
- **Issue Number**: none
- **Training**: none
- **Inference**: none
### Checklist Before Submitting
- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [x] Add `[BREAKING]` to the PR title if it breaks any API.
- [x] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add CI test(s) if necessary.
* [sglang] Upgrade sglang to 0.4.6.post1 & misc fixes (#1385)
### Checklist Before Starting
- [x] Search for similar PR(s).
### What does this PR do?
- [x] upgrade required sglang version to 0.4.6.post1 which suports Qwen3
- [x] fix: flush_cache was never awaited
- [x] remove unused env
- [x] fix: add rank num to port to avoid SGLang picking the same port
when random.seed being set
- [x] feat: disable SGLang memory inbalance check by default
https://github.com/sgl-project/sglang/pull/5426
- [x] update setup.py to avoid old version pip can not resolving deps
- [x] fix: tools_kwargs length mismatch with batch #1380
> Add one-line overview of what this PR aims to achieve or accomplish.
### High-Level Design
> Demonstrate the high-level design if this PR is complex.
### Specific Changes
> List the specific changes.
### API
> Demonstrate how the API changes if any.
### Usage Example
> Provide usage example(s) for easier usage.
```python
# Add code snippet or script demonstrating how to use this
```
### Test
> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluatuion results, etc.
### Additional Info.
- **Issue Number**: Fixes issue # or discussion # if any.
- **Training**: [Note which backend this PR will affect: FSDP, Megatron,
both, or none]
- **Inference**: [Note which backend this PR will affect: vLLM, SGLang,
both, or none]
### Checklist Before Submitting
- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title …
…erl-project#1105) if we use sglang as the rollout engine, we should export SGL_DISABLE_TP_MEMORY_INBALANCE_CHECK to avoid that the memory capacity is unbalanced, please refer to [#5426 in sglang](sgl-project/sglang#5426) # why we should export SGL_DISABLE_TP_MEMORY_INBALANCE_CHECK when using SGLang as the rollout engine in verl? 1. verl initializes a SGlangRollout module during rollout, which is used to evaluate/generate samples. 2. SGLangRollout will initialize VerlEngine, further initialize a torch. Distributed. DeviceMesh, used to support the TP. 3. DeviceMesh.init () internally checks the free video memory of all participating devices, and if the difference is too large (more than about 10%), it directly reports an error, preventing initialization failures or communication deadlock. # Why might there be inconsistent graphic memory? ## Ray Distributed Actor loads the model at different times: verl uses ray multi-process multi-gpu concurrent training, and each `WorkerDict` may be called at different times: `self.rollout = SGLangRollout(...)` different workers initialize the model at different times → different memory usage. ## Delayed initialization causes memory bias Some workers enter the model loading/infer process earlier than others, such as `generate_sequences()` or `compute_log_prob()`. The early-loaded worker video memory has been eaten by the model, and the late-loaded worker video memory is still empty → the graphic memory gap is large. ## Verl+SGLang's TP initialization goes "all device broadcast", but there is no uniform release timing SGLangRollout only needs to involve the part of the graphics card used by the rollout machine, but its VerlEngine initialization calls torch.distribut.init process group() and broadcast a bunch of weights. Result in: Non-rollout cards also participate in communication; Then initialize DeviceMesh, and the error "inconsistent memory" is reported. ## Different loading modes of FSDP/TP models also cause deviations if the following parameters are set ``` actor.fsdp_config.param_offload=True ref.fsdp_config.param_offload=True ``` Some worker parameters are on the CPU, and some parameters are shard to the GPU in advance. This also creates an asymmetric distribution of video memory. --------- Co-authored-by: ocss884 <ocss.lin@gmail.com>
### Checklist Before Starting - [x] Search for similar PR(s). ### What does this PR do? - [x] upgrade required sglang version to 0.4.6.post1 which suports Qwen3 - [x] fix: flush_cache was never awaited - [x] remove unused env - [x] fix: add rank num to port to avoid SGLang picking the same port when random.seed being set - [x] feat: disable SGLang memory inbalance check by default sgl-project/sglang#5426 - [x] update setup.py to avoid old version pip can not resolving deps - [x] fix: tools_kwargs length mismatch with batch verl-project#1380 > Add one-line overview of what this PR aims to achieve or accomplish. ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes > List the specific changes. ### API > Demonstrate how the API changes if any. ### Usage Example > Provide usage example(s) for easier usage. ```python # Add code snippet or script demonstrating how to use this ``` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluatuion results, etc. ### Additional Info. - **Issue Number**: Fixes issue # or discussion # if any. - **Training**: [Note which backend this PR will affect: FSDP, Megatron, both, or none] - **Inference**: [Note which backend this PR will affect: vLLM, SGLang, both, or none] ### Checklist Before Submitting - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [ ] Add `[BREAKING]` to the PR title if it breaks any API. - [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add CI test(s) if neccessary.
…may be occupied by other processes.'
Motivation
fix #4233 #3842
Modifications
The fact that 'the memory capacity is unbalanced' appears in some scenes doesn't seem to be enough of an Exception. In #3842, @FrankLeeeee suggestion remove the check at
sglang/python/sglang/srt/model_executor/model_runner.py
Line 378 in 838fa0f
it's probably better off than just deleting
if min_per_gpu_memory < local_gpu_memory * 0.9:Checklist