[rollout, data] fix: honor train_max_samples/val_max_samples in fully async rollouter #5359

Open
denismegerle wants to merge 1 commit into verl-project:main from denismegerle:public/fully-async-max-samples

Conversation

denismegerle (Contributor) commented on Feb 20, 2026

What does this PR do?

Make FullyAsyncRollouter honor existing data.train_max_samples / data.val_max_samples config by passing them through to create_rl_dataset(max_samples=...) (matching the behavior of the non-fully-async trainer paths).
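
In code, this is a thin pass-through of the existing config values. A minimal sketch (it mirrors the diff quoted in the review comments below; the config, tokenizer, and processor objects are assumed to already exist at the call site):

    train_dataset = create_rl_dataset(
        config.data.train_files,
        config.data,
        tokenizer,
        processor,
        # -1 keeps the previous behavior of loading the full dataset
        max_samples=config.data.get("train_max_samples", -1),
    )

The validation dataset gets the same treatment via data.val_max_samples.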

Checklist Before Starting

  • Search for similar PRs. Paste at least one query link here: https://github.com/verl-project/verl/pulls?q=is%3Apr+train_max_samples
  • Format the PR title as [{modules}] {type}: {description} (This will be checked by the CI)
    • {modules} include fsdp, megatron, veomni, sglang, vllm, rollout, trainer, ci, training_utils, recipe, hardware, deployment, ray, worker, single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data, cfg, reward
    • If this PR involves multiple modules, separate them with , like [megatron, fsdp, doc]
    • {type} is in feat, fix, refactor, chore, test
    • If this PR breaks any API (CLI arguments, config, function signature, etc.), add [BREAKING] to the beginning of the title.
    • Example: [BREAKING][fsdp, megatron] feat: dynamic batching

Test

Ran formatting/type/lint suite via pre-commit:
pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always

API and Usage Example

No API changes. This just makes the fully async rollouter respect the existing data config keys.

# Example YAML config snippet:
# data:
#   train_max_samples: 100000  # default: -1 (no limit)
#   val_max_samples: 10000     # default: -1 (no limit)
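
The rollouter reads these keys with a -1 default, so configs that do not set them keep the old unbounded behavior. A small illustration of that lookup pattern, assuming OmegaConf as the config backend (file name and values are placeholders):

    from omegaconf import OmegaConf

    # Illustrative only: how the -1 default behaves when the key is absent vs. set.
    data_cfg = OmegaConf.create({"train_files": "train.parquet"})
    print(data_cfg.get("train_max_samples", -1))  # -1 -> no limit applied
    data_cfg.train_max_samples = 100000
    print(data_cfg.get("train_max_samples", -1))  # 100000 -> cap the train split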

Design & Code Changes

Demonstrate the high-level design if this PR is complex, and list the specific changes.

  • verl/experimental/fully_async_policy/fully_async_rollouter.py: pass max_samples= when calling create_rl_dataset() for train/val.

Checklist Before Submitting

Important

Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.

  • Read the Contribute Guide.
  • Apply pre-commit checks: pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always
  • (not required) Add / Update the documentation.
  • (not required) Add unit or end-to-end test(s) to the CI workflow to cover all the code. If not feasible, explain why: small config plumbing in an experimental component; change is a thin pass-through and mirrors existing trainer wiring.
  • Once your PR is ready for CI, send a message in the ci-request channel in the verl Slack workspace. (If not accessible, please try the Feishu group (飞书群).)
  • (not required) If your PR is related to the recipe submodule, please also update the reference to the submodule commit via git submodule update --remote or cd recipe && git pull origin main.

gemini-code-assist (bot) left a comment

Code Review

The pull request successfully implements the plumbing for train_max_samples and val_max_samples in the FullyAsyncRollouter, bringing it closer to parity with the standard PPO trainer. However, there are a few issues regarding API consistency and potential runtime crashes when these new configuration options are used in conjunction with the use_trainer_do_validate flag.

Comment on lines +101 to +107
        train_dataset = create_rl_dataset(
            config.data.train_files,
            config.data,
            tokenizer,
            processor,
            max_samples=config.data.get("train_max_samples", -1),
        )

Severity: high

For consistency with the main PPO trainer path (verl/trainer/main_ppo.py) and to ensure the call matches the intended API usage, it is recommended to explicitly pass is_train=True when creating the training dataset.

        train_dataset = create_rl_dataset(
            config.data.train_files,
            config.data,
            tokenizer,
            processor,
            is_train=True,
            max_samples=config.data.get("train_max_samples", -1),
        )

Comment on lines +108 to +114
        val_dataset = create_rl_dataset(
            config.data.val_files,
            config.data,
            tokenizer,
            processor,
            max_samples=config.data.get("val_max_samples", -1),
        )

Severity: high

The call to create_rl_dataset for the validation dataset is missing the is_train=False argument. While this parameter is currently unused in the default implementation of create_rl_dataset, it is part of the function signature and is explicitly passed in the main PPO trainer path. Including it ensures logical correctness and prevents potential issues if the dataset initialization logic is updated to rely on this flag in the future.

        val_dataset = create_rl_dataset(
            config.data.val_files,
            config.data,
            tokenizer,
            processor,
            is_train=False,
            max_samples=config.data.get("val_max_samples", -1),
        )

config.data,
tokenizer,
processor,
max_samples=config.data.get("val_max_samples", -1),

Severity: high

Setting val_max_samples introduces a risk of runtime crashes when async_training.use_trainer_do_validate is enabled. At line 123, val_dataset.split(total_gpus) is called, and the RLHFDataset.split implementation raises a ValueError if the dataset size is not exactly divisible by the number of splits. If a user sets a val_max_samples value that isn't a multiple of the total GPU count, the training will fail during initialization. It is recommended to either document this constraint or ensure the dataset size is adjusted to be divisible by total_gpus before splitting.
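
One way to implement the second option is to round the configured cap down to a multiple of the split count before the dataset is built. A minimal sketch, assuming total_gpus is known at the point where the cap is read (the variable names are illustrative, not taken from this diff):

    # Round the cap down to a multiple of total_gpus so that
    # val_dataset.split(total_gpus) does not raise ValueError on uneven sizes.
    val_max_samples = config.data.get("val_max_samples", -1)
    if val_max_samples > 0:
        val_max_samples = max((val_max_samples // total_gpus) * total_gpus, total_gpus)
    # Note: this only covers the case where the cap is the binding size; a
    # validation file shorter than the cap would still need the same rounding.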

denismegerle changed the title from [rollout] fix: honor train_max_samples/val_max_samples in fully async rollouter to [rollout, data] fix: honor train_max_samples/val_max_samples in fully async rollouter on Feb 20, 2026
denismegerle marked this pull request as ready for review on February 20, 2026, 13:09