first pr to support deepspeed in verl#5313
first pr to support deepspeed in verl#5313NenoL2001 wants to merge 18 commits intoverl-project:mainfrom
Conversation
This reverts commit 0f9dca9.
|
Ubuntu seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account. You have signed the CLA already but the status is still pending? Let us recheck it. |
There was a problem hiding this comment.
Code Review
This pull request introduces comprehensive support for DeepSpeed as a training backend for PPO and GRPO, including benchmarking scripts, documentation, and necessary code integrations. The changes are extensive and well-structured. The addition of DeepSpeed support is a significant feature. My review focuses on potential security and correctness issues. I've identified a security concern in the benchmark scripts related to the creation of temporary directories, which should be addressed to ensure safe execution in multi-user environments. The rest of the changes, including the complex workarounds for DeepSpeed's API, appear robust and well-implemented.
| local ray_tmp="/tmp/r$(printf '%s' "${case_name}_a${attempt}" | md5sum | cut -c1-12)" | ||
| rm -rf "$ray_tmp" | ||
| mkdir -p "$ray_tmp" |
There was a problem hiding this comment.
Using predictable names in /tmp is insecure. The rm -rf followed by mkdir -p is particularly dangerous as it can be exploited via a symlink attack (Time-of-check to time-of-use vulnerability). Please use mktemp to create temporary directories securely.
| local ray_tmp="/tmp/r$(printf '%s' "${case_name}_a${attempt}" | md5sum | cut -c1-12)" | |
| rm -rf "$ray_tmp" | |
| mkdir -p "$ray_tmp" | |
| local ray_tmp | |
| ray_tmp=$(mktemp -d "/tmp/r${case_name}_a${attempt}.XXXXXX") |
| local short_id | ||
| short_id=$(printf '%s' "$case_name" | md5sum | cut -c1-8) | ||
| local ray_tmp="/tmp/ray_${short_id}" |
There was a problem hiding this comment.
Creating temporary directories in /tmp with predictable names can lead to security vulnerabilities in a multi-user environment. Another user could predict the directory name and create it before this script does, potentially leading to data leakage or corruption. It's safer to use mktemp to create a secure temporary directory.
| local short_id | |
| short_id=$(printf '%s' "$case_name" | md5sum | cut -c1-8) | |
| local ray_tmp="/tmp/ray_${short_id}" | |
| local ray_tmp | |
| ray_tmp=$(mktemp -d "/tmp/ray_${case_name}.XXXXXX") |
| local short_id | ||
| short_id=$(printf '%s' "$case_name" | md5sum | cut -c1-8) |
There was a problem hiding this comment.
Creating temporary directories in /tmp with predictable names can lead to security vulnerabilities in a multi-user environment. Another user could predict the directory name and create it before this script does, potentially leading to data leakage or corruption. It's safer to use mktemp to create a secure temporary directory.
| local short_id | |
| short_id=$(printf '%s' "$case_name" | md5sum | cut -c1-8) | |
| local ray_tmp | |
| ray_tmp=$(mktemp -d "/tmp/rgrpo_${case_name}.XXXXXX") |
|
Thanks for you contribution, since DeepSpeed has much overlap with FSDP, we're not going to support DeepSpeed backend. |
What does this PR do?
This PR makes DeepSpeed a first-class RL training backend in
verlfor PPO/GRPO, with production-ready scripts/docs for 6-case benchmarking and PR reporting.Scope:
GRPO-30+PPO-60).Related:
Checklist Before Starting
[{modules}] {type}: {description}Suggested title:
[worker,trainer,rollout,ckpt,perf,doc,tool] feat: deepspeed rl backend hardening with ppo/grpo six-case benchesNo API-breaking change, so no
[BREAKING].Test
Bench setup:
Qwen/Qwen3-0.6BPPO 60-step (6 cases, all PASS)
Source:
outputs/ppo_zero_six_60_seed7777_lt_fix_20260213_023008/summary.tsvGRPO 30-step aligned snapshot (6 cases)
Source:
outputs/pr_curves_20260213_100545/grpo_zero_six_30step_metrics.tsvCurves:
GRPO

PPO

API and Usage Example
New/updated scripts:
scripts/bench/run_ds_zero_six_case_bench.sh(PPO 6-case)scripts/bench/run_ds_zero_six_case_grpo_bench.sh(GRPO 6-case)scripts/bench/run_ds_ppo_grpo_twelve_case_bench.sh(12-case end-to-end)scripts/bench/export_pr_curves.sh(export PR curves)New plotting option:
--max-stepin:scripts/bench/plot_ds_zero_six_curves.pyscripts/bench/plot_ds_zero_six_grpo_curves.pyUsage: