[config] remove deprecated actor_rollout_ref.rollout.mode config field by dhruvnigam93 · Pull Request #5352 · verl-project/verl

dhruvnigam93 · 2026-02-19T06:42:22Z

What does this PR do?

The mode field in RolloutConfig (i.e. actor_rollout_ref.rollout.mode) only ever supported "async" — sync mode was removed in v0.7. Every internal code path already hard-coded async behaviour, leaving ~50+ lines of dead conditional branches throughout the codebase.

This PR removes all internal branching on mode / async_rollout_mode, simplifying the codebase to always use the async code path, while keeping the field in the config schema as Optional[str] = None so that old configs with mode: async continue to load without errors and display a clear deprecation message.

Related issue: #4604

Checklist Before Starting

Search for similar PRs. No existing PR found for this specific cleanup. Closest: [misc] refactor: delete remaining batch-mode code in single controller #5319 (single-controller batch-mode removal, same pattern of dead-code cleanup).

Test

All changes are pure refactoring (no algorithmic or behavioral change). Tested as follows:

Pre-commit hooks — all pass:
- ruff (lint) ✅
- ruff format ✅
- mypy ✅
- autogen-trainer-cfg (generated YAMLs regenerated — mode field no longer appears) ✅
- check-docstring-coverage ✅
- check-license ✅
- compile-python ✅
CPU unit tests — pytest -s --asyncio-mode=auto tests/ — 295 passed, 16 failed, 27 skipped. The 16 failures are pre-existing (unrelated huggingface_hub validation error on absolute local paths). Zero new failures introduced.
End-to-end PPO training — a full training job ran successfully through multiple steps with healthy metrics (actor loss, critic loss, grad norms, ~1482 tokens/sec throughput) and no DeprecationWarning for rollout.mode, confirming the production config does not set mode.

API and Usage Example

Old configs (still work, emit a deprecation warning):

actor_rollout_ref:
  rollout:
    mode: async   # DeprecationWarning shown; field has no effect
    name: vllm
    ...

New configs (no warning):

actor_rollout_ref:
  rollout:
    name: vllm
    ...

Deprecation warning text (emitted from RolloutConfig.__post_init__):

DeprecationWarning: The 'mode' field in RolloutConfig is deprecated and has no effect.
All rollouts now use async mode. Please remove 'mode: async' from your configuration file.
This field will be removed in a future version.

Rollout class lookup — simplified API (internal):

# Before
from verl.workers.rollout.base import get_rollout_class
cls = get_rollout_class("vllm", mode="async")

# After
cls = get_rollout_class("vllm")

Design & Code Changes

High-level design

Keep the mode field in the dataclass as Optional[str] = None (backward-compatible default) but never read it anywhere in the code. A DeprecationWarning is emitted in __post_init__ when the field is explicitly set, guiding users to remove it from their configs. All internal code always uses the async path directly.

Specific changes

File	Change
`verl/workers/config/rollout.py`	`mode: str = "async"` → `mode: Optional[str] = None`; `__post_init__` emits `DeprecationWarning` when field is set; removed old `ValueError` for sync mode
`verl/workers/rollout/base.py`	`_ROLLOUT_REGISTRY` keys changed from `(name, mode)` tuples to plain `name` strings; `get_rollout_class(name, mode)` → `get_rollout_class(name)`
`verl/workers/fsdp_workers.py`	`get_rollout_class(name, mode)` → `get_rollout_class(name)`
`verl/workers/megatron_workers.py`	Same
`verl/workers/engine_workers.py`	Same
`verl/checkpoint_engine/base.py`	Same
`verl/trainer/ppo/ray_trainer.py`	Removed `self.async_rollout_mode = True` assignment and associated dead comment
`verl/experimental/fully_async_policy/fully_async_rollouter.py`	Removed `assert rollout.mode == "async"` and `self.async_rollout_mode = True`
`verl/experimental/fully_async_policy/fully_async_trainer.py`	Same
`verl/experimental/one_step_off_policy/ray_trainer.py`	Same
`verl/experimental/transfer_queue/ray_trainer.py`	Removed `if rollout.mode == "async":` conditional; always sets up async manager directly
`verl/experimental/vla/rob_ray_trainer.py`	Removed mode check; always uses `EnvLoop` directly
`verl/experimental/vla/sac/sac_ray_trainer.py`	Same
`verl/trainer/config/rollout/rollout.yaml`	Removed `mode: async` default line
`verl/trainer/config/_generated_ppo_trainer.yaml`	Regenerated (no `mode` field)
`verl/trainer/config/_generated_ppo_megatron_trainer.yaml`	Regenerated
`verl/trainer/config/_generated_ppo_veomni_trainer.yaml`	Regenerated
`tests/experimental/agent_loop/agent_utils.py`	Always uses `AsyncActorRolloutRefWorker`; removed conditional and unused `ActorRolloutRefWorker` import
`tests/trainer/config/test_legacy_config_on_cpu.py`	Added `"mode"` to `ignored_keys` (intentional schema field removal)

Checklist Before Submitting

Read the Contribute Guide.
Applied pre-commit checks: pre-commit run --all-files — all hooks pass ✅
Add / Update the documentation — deferred - will make the changes if they are good to go by maitainers. see below.
Unit tests pass on CPU (pytest --asyncio-mode=auto tests/); no new failures
GPU CI requested — pending (will notify via Slack/Feishu once PR is open)
recipe submodule — not affected by this PR

Open Questions for Maintainers

The following decisions are not blocking this PR, but I will make more changes based on the answers.

Q1: Deprecation timeline — when to remove `mode` entirely?

The mode field is now Optional[str] = None with a DeprecationWarning so that this is not a breaking change. No behavior depends on it. Should we remove it now or in the next minor/major release?

Q2: Should example scripts and tutorial configs remove `mode: async` now?

I have not changed the examples and docs yet, but this is a straightforward change. Let me know if you want me to include it in this PR or defer to a follow-up.

Q3: Should the 10 GPU test files that set `mode = "async"` be cleaned up?

The following test files still set config.actor_rollout_ref.rollout.mode = "async" and will emit a DeprecationWarning in CI logs:

tests/workers/rollout/rollout_vllm/test_vllm_abort.py
tests/workers/rollout/perf/vllm_async_rollout.py
tests/workers/rollout/rollout_trtllm/test_adapter.py
tests/workers/rollout/rollout_trtllm/test_async_server.py
tests/experimental/agent_loop/test_basic_agent_loop.py
tests/experimental/agent_loop/test_multi_modal.py
tests/experimental/agent_loop/test_standalone_rollout.py
tests/experimental/reward_loop/test_agent_reward_loop_colocate.py
tests/experimental/reward_loop/test_agent_reward_loop_standalone.py
tests/experimental/reward_loop/test_math_verify.py

Additionally, tests/experimental/reward_loop/test_agent_reward_loop_colocate.py:83 has a dead code branch that still reads rollout.mode to select a worker class — this should be fixed regardless of the deprecation timeline. Happy to include it in this PR.

Q4: Phase 7 documentation — now or deferred?

docs/advance/fully_async.md and related docs still reference actor_rollout_ref.rollout.mode in example scripts. Should it be removed now or later?

The `mode` field in `RolloutConfig` only ever supported `"async"` (sync mode was already removed). This PR removes all internal branching on `mode`/`async_rollout_mode` and simplifies the codebase to always use the async code path. Changes: - `verl/workers/config/rollout.py`: change `mode: str = "async"` to `mode: Optional[str] = None`; emit `DeprecationWarning` in `__post_init__` when field is set, so old configs continue to work - `verl/workers/rollout/base.py`: simplify `_ROLLOUT_REGISTRY` keys from `(name, mode)` tuples to plain `name` strings; remove `mode` parameter from `get_rollout_class()` - `verl/workers/fsdp_workers.py`, `megatron_workers.py`, `engine_workers.py`, `checkpoint_engine/base.py`: update all `get_rollout_class(name, mode)` call sites to `get_rollout_class(name)` - `verl/trainer/ppo/ray_trainer.py`: remove `self.async_rollout_mode` assignment and dead conditional branches - `verl/experimental/fully_async_policy/`, `one_step_off_policy/`, `transfer_queue/`: remove `assert rollout.mode == "async"` guards and dead `async_rollout_mode` assignments - `verl/experimental/vla/rob_ray_trainer.py`, `sac/sac_ray_trainer.py`: remove `mode == "async_envloop"` check; always use `EnvLoop` - `verl/trainer/config/rollout/rollout.yaml`: remove `mode: async` default; generated reference YAMLs regenerated accordingly - `tests/experimental/agent_loop/agent_utils.py`: always use `AsyncActorRolloutRefWorker`; remove dead sync-mode branch - `tests/trainer/config/test_legacy_config_on_cpu.py`: add `"mode"` to `ignored_keys` to account for intentional field removal from schema Old configs with `mode: async` continue to load without errors and display a clear deprecation message guiding users to remove the field. Co-authored-by: Cursor <cursoragent@cursor.com>

CLAassistant · 2026-02-19T06:42:30Z

All committers have signed the CLA.

gemini-code-assist

Code Review

This pull request effectively removes the deprecated rollout.mode configuration field, simplifying the codebase by eliminating dead conditional branches. The changes are well-executed, replacing conditional logic with a single async code path and updating the configuration schema to issue a DeprecationWarning for backward compatibility. The refactoring is consistent, clean, and improves maintainability. Both the code review and security review found no critical or high-severity issues or vulnerabilities in the submitted changes.

vermouth1992 · 2026-02-19T13:51:19Z

Could you resolve conflict?

dhruvnigam93 · 2026-02-20T05:37:10Z

resolved conflict, was a minor whitespace issue.

dhruvnigam93 · 2026-02-21T02:19:17Z

@vermouth1992 please have a look

dhruvnigam93 requested review from ISEEKYAN, PeterSH6, eric-haibin-lin, tongyx361 and vermouth1992 as code owners February 19, 2026 06:42

gemini-code-assist bot reviewed Feb 19, 2026

View reviewed changes

Merge branch 'main' into feat/ENG-4604-remove-rollout-mode-config

09e2228

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Comments

[config] remove deprecated actor_rollout_ref.rollout.mode config field#5352

[config] remove deprecated actor_rollout_ref.rollout.mode config field#5352
dhruvnigam93 wants to merge 2 commits intoverl-project:mainfrom
dhruvnigam93:feat/ENG-4604-remove-rollout-mode-config

dhruvnigam93 commented Feb 19, 2026

Uh oh!

CLAassistant commented Feb 19, 2026 •

edited

Loading

Uh oh!

gemini-code-assist bot left a comment

Uh oh!

vermouth1992 commented Feb 19, 2026

Uh oh!

dhruvnigam93 commented Feb 20, 2026

Uh oh!

dhruvnigam93 commented Feb 21, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Comments

Conversation

dhruvnigam93 commented Feb 19, 2026

What does this PR do?

Checklist Before Starting

Test

API and Usage Example

Design & Code Changes

High-level design

Specific changes

Checklist Before Submitting

Open Questions for Maintainers

Q1: Deprecation timeline — when to remove mode entirely?

Q2: Should example scripts and tutorial configs remove mode: async now?

Q3: Should the 10 GPU test files that set mode = "async" be cleaned up?

Q4: Phase 7 documentation — now or deferred?

Uh oh!

CLAassistant commented Feb 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

vermouth1992 commented Feb 19, 2026

Uh oh!

dhruvnigam93 commented Feb 20, 2026

Uh oh!

dhruvnigam93 commented Feb 21, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Q1: Deprecation timeline — when to remove `mode` entirely?

Q2: Should example scripts and tutorial configs remove `mode: async` now?

Q3: Should the 10 GPU test files that set `mode = "async"` be cleaned up?

CLAassistant commented Feb 19, 2026 •

edited

Loading