Skip to content

Comments

first pr to support deepspeed in verl#5313

Open
NenoL2001 wants to merge 18 commits intoverl-project:mainfrom
NenoL2001:first-pr-ds-backend-v1-20260212
Open

first pr to support deepspeed in verl#5313
NenoL2001 wants to merge 18 commits intoverl-project:mainfrom
NenoL2001:first-pr-ds-backend-v1-20260212

Conversation

@NenoL2001
Copy link

@NenoL2001 NenoL2001 commented Feb 13, 2026

What does this PR do?

This PR makes DeepSpeed a first-class RL training backend in verl for PPO/GRPO, with production-ready scripts/docs for 6-case benchmarking and PR reporting.

Scope:

  • Stabilize DeepSpeed worker path for ZeRO1/2/3 + DP + offload + checkpoint.
  • Add GRPO 6-case bench tooling and unified PPO+GRPO 12-case orchestration.
  • Add reproducible curve export for PR evidence (GRPO-30 + PPO-60).
  • Improve docs and user-facing launch scripts for PPO/GRPO DeepSpeed usage.

Related:


Checklist Before Starting

Suggested title:
[worker,trainer,rollout,ckpt,perf,doc,tool] feat: deepspeed rl backend hardening with ppo/grpo six-case benches

No API-breaking change, so no [BREAKING].


Test

Bench setup:

  • Model: Qwen/Qwen3-0.6B
  • Data: GSM8K parquet
  • Backend: vLLM async rollout
  • Cases: ZeRO1/2/3 × {no_offload, cpu_offload}

PPO 60-step (6 cases, all PASS)

case status score@60 throughput@60 peak mem reserved (GB)
zero1_no_offload PASS 0.6719 571.47 20.35
zero2_no_offload PASS 0.7031 543.08 21.11
zero3_no_offload PASS 0.5938 200.41 21.91
zero1_cpu_offload PASS 0.5938 526.45 20.33
zero2_cpu_offload PASS 0.5625 394.61 13.81
zero3_cpu_offload PASS 0.6406 115.48 10.69

Source: outputs/ppo_zero_six_60_seed7777_lt_fix_20260213_023008/summary.tsv

GRPO 30-step aligned snapshot (6 cases)

case score@30 throughput@30 mem reserved@30 (GB)
zero1_no_offload 0.6906 400.00 15.87
zero2_no_offload 0.6660 386.59 16.78
zero3_no_offload 0.6977 217.42 13.56
zero1_cpu_offload 0.7188 398.31 15.76
zero2_cpu_offload 0.5254 349.72 12.87
zero3_cpu_offload 0.7117 176.64 8.49

Source: outputs/pr_curves_20260213_100545/grpo_zero_six_30step_metrics.tsv

Curves:

GRPO
image

PPO
image


API and Usage Example

New/updated scripts:

  • scripts/bench/run_ds_zero_six_case_bench.sh (PPO 6-case)
  • scripts/bench/run_ds_zero_six_case_grpo_bench.sh (GRPO 6-case)
  • scripts/bench/run_ds_ppo_grpo_twelve_case_bench.sh (12-case end-to-end)
  • scripts/bench/export_pr_curves.sh (export PR curves)

New plotting option:

  • --max-step in:
    • scripts/bench/plot_ds_zero_six_curves.py
    • scripts/bench/plot_ds_zero_six_grpo_curves.py

Usage:

# export PR curves from existing runs
./scripts/bench/export_pr_curves.sh

# custom run roots
PPO_RUN_ROOT=outputs/<ppo_run_dir> \
GRPO_RUN_ROOT=outputs/<grpo_run_dir> \
OUT_DIR=outputs/pr_curves_manual \
./scripts/bench/export_pr_curves.sh

@CLAassistant
Copy link

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
1 out of 2 committers have signed the CLA.

✅ NenoL2001
❌ Ubuntu


Ubuntu seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces comprehensive support for DeepSpeed as a training backend for PPO and GRPO, including benchmarking scripts, documentation, and necessary code integrations. The changes are extensive and well-structured. The addition of DeepSpeed support is a significant feature. My review focuses on potential security and correctness issues. I've identified a security concern in the benchmark scripts related to the creation of temporary directories, which should be addressed to ensure safe execution in multi-user environments. The rest of the changes, including the complex workarounds for DeepSpeed's API, appear robust and well-implemented.

Comment on lines +113 to +115
local ray_tmp="/tmp/r$(printf '%s' "${case_name}_a${attempt}" | md5sum | cut -c1-12)"
rm -rf "$ray_tmp"
mkdir -p "$ray_tmp"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Using predictable names in /tmp is insecure. The rm -rf followed by mkdir -p is particularly dangerous as it can be exploited via a symlink attack (Time-of-check to time-of-use vulnerability). Please use mktemp to create temporary directories securely.

Suggested change
local ray_tmp="/tmp/r$(printf '%s' "${case_name}_a${attempt}" | md5sum | cut -c1-12)"
rm -rf "$ray_tmp"
mkdir -p "$ray_tmp"
local ray_tmp
ray_tmp=$(mktemp -d "/tmp/r${case_name}_a${attempt}.XXXXXX")

Comment on lines +92 to +94
local short_id
short_id=$(printf '%s' "$case_name" | md5sum | cut -c1-8)
local ray_tmp="/tmp/ray_${short_id}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Creating temporary directories in /tmp with predictable names can lead to security vulnerabilities in a multi-user environment. Another user could predict the directory name and create it before this script does, potentially leading to data leakage or corruption. It's safer to use mktemp to create a secure temporary directory.

Suggested change
local short_id
short_id=$(printf '%s' "$case_name" | md5sum | cut -c1-8)
local ray_tmp="/tmp/ray_${short_id}"
local ray_tmp
ray_tmp=$(mktemp -d "/tmp/ray_${case_name}.XXXXXX")

Comment on lines +115 to +116
local short_id
short_id=$(printf '%s' "$case_name" | md5sum | cut -c1-8)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Creating temporary directories in /tmp with predictable names can lead to security vulnerabilities in a multi-user environment. Another user could predict the directory name and create it before this script does, potentially leading to data leakage or corruption. It's safer to use mktemp to create a secure temporary directory.

Suggested change
local short_id
short_id=$(printf '%s' "$case_name" | md5sum | cut -c1-8)
local ray_tmp
ray_tmp=$(mktemp -d "/tmp/rgrpo_${case_name}.XXXXXX")

@wuxibin89
Copy link
Collaborator

Thanks for you contribution, since DeepSpeed has much overlap with FSDP, we're not going to support DeepSpeed backend.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants