[rollout, data] fix: honor train_max_samples/val_max_samples in fully async rollouter #5359

Open
denismegerle wants to merge 1 commit into verl-project:main from denismegerle:public/fully-async-max-samples

Conversation

denismegerle (Contributor) commented on Feb 20, 2026

What does this PR do?

Make FullyAsyncRollouter honor existing data.train_max_samples / data.val_max_samples config by passing them through to create_rl_dataset(max_samples=...) (matching the behavior of the non-fully-async trainer paths).
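
In code, this is a thin pass-through of the existing config values. A minimal sketch (it mirrors the diff quoted in the review comments below; the config, tokenizer, and processor objects are assumed to already exist at the call site):

    train_dataset = create_rl_dataset(
        config.data.train_files,
        config.data,
        tokenizer,
        processor,
        # -1 keeps the previous behavior of loading the full dataset
        max_samples=config.data.get("train_max_samples", -1),
    )

The validation dataset gets the same treatment via data.val_max_samples.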

Checklist Before Starting

  • Search for similar PRs. Paste at least one query link here: https://github.com/verl-project/verl/pulls?q=is%3Apr+train_max_samples
  • Format the PR title as [{modules}] {type}: {description} (This will be checked by the CI)
    • {modules} include fsdp, megatron, veomni, sglang, vllm, rollout, trainer, ci, training_utils, recipe, hardware, deployment, ray, worker, single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data, cfg, reward
    • If this PR involves multiple modules, separate them with , like [megatron, fsdp, doc]
    • {type} is in feat, fix, refactor, chore, test
    • If this PR breaks any API (CLI arguments, config, function signature, etc.), add [BREAKING] to the beginning of the title.
    • Example: [BREAKING][fsdp, megatron] feat: dynamic batching

Test

Ran formatting/type/lint suite via pre-commit:
pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always

API and Usage Example

No API changes. This just makes the fully async rollouter respect the existing data config keys.

# Example YAML config snippet:
# data:
#   train_max_samples: 100000  # default: -1 (no limit)
#   val_max_samples: 10000     # default: -1 (no limit)
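
The rollouter reads these keys with a -1 default, so configs that do not set them keep the old unbounded behavior. A small illustration of that lookup pattern, assuming OmegaConf as the config backend (file name and values are placeholders):

    from omegaconf import OmegaConf

    # Illustrative only: how the -1 default behaves when the key is absent vs. set.
    data_cfg = OmegaConf.create({"train_files": "train.parquet"})
    print(data_cfg.get("train_max_samples", -1))  # -1 -> no limit applied
    data_cfg.train_max_samples = 100000
    print(data_cfg.get("train_max_samples", -1))  # 100000 -> cap the train split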

Design & Code Changes

Demonstrate the high-level design if this PR is complex, and list the specific changes.

  • verl/experimental/fully_async_policy/fully_async_rollouter.py: pass max_samples= when calling create_rl_dataset() for train/val.

Checklist Before Submitting

Important

Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.

  • Read the Contribute Guide.
  • Apply pre-commit checks: pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always
  • (not required) Add / Update the documentation.
  • (not required) Add unit or end-to-end test(s) to the CI workflow to cover all the code. If not feasible, explain why: small config plumbing in an experimental component; change is a thin pass-through and mirrors existing trainer wiring.
  • Once your PR is ready for CI, send a message in the ci-request channel in the verl Slack workspace. (If not accessible, please try the Feishu group (飞书群).)
  • (not required) If your PR is related to the recipe submodule, please also update the reference to the submodule commit via git submodule update --remote or cd recipe && git pull origin main.

gemini-code-assist (bot) left a comment

Code Review

The pull request successfully implements the plumbing for train_max_samples and val_max_samples in the FullyAsyncRollouter, bringing it closer to parity with the standard PPO trainer. However, there are a few issues regarding API consistency and potential runtime crashes when these new configuration options are used in conjunction with the use_trainer_do_validate flag.

Comment on lines +101 to +107
        train_dataset = create_rl_dataset(
            config.data.train_files,
            config.data,
            tokenizer,
            processor,
            max_samples=config.data.get("train_max_samples", -1),
        )

Severity: high

For consistency with the main PPO trainer path (verl/trainer/main_ppo.py) and to ensure the call matches the intended API usage, it is recommended to explicitly pass is_train=True when creating the training dataset.

        train_dataset = create_rl_dataset(
            config.data.train_files,
            config.data,
            tokenizer,
            processor,
            is_train=True,
            max_samples=config.data.get("train_max_samples", -1),
        )

Comment on lines +108 to +114
        val_dataset = create_rl_dataset(
            config.data.val_files,
            config.data,
            tokenizer,
            processor,
            max_samples=config.data.get("val_max_samples", -1),
        )

Severity: high

The call to create_rl_dataset for the validation dataset is missing the is_train=False argument. While this parameter is currently unused in the default implementation of create_rl_dataset, it is part of the function signature and is explicitly passed in the main PPO trainer path. Including it ensures logical correctness and prevents potential issues if the dataset initialization logic is updated to rely on this flag in the future.

        val_dataset = create_rl_dataset(
            config.data.val_files,
            config.data,
            tokenizer,
            processor,
            is_train=False,
            max_samples=config.data.get("val_max_samples", -1),
        )

config.data,
tokenizer,
processor,
max_samples=config.data.get("val_max_samples", -1),

Severity: high

Setting val_max_samples introduces a risk of runtime crashes when async_training.use_trainer_do_validate is enabled. At line 123, val_dataset.split(total_gpus) is called, and the RLHFDataset.split implementation raises a ValueError if the dataset size is not exactly divisible by the number of splits. If a user sets a val_max_samples value that isn't a multiple of the total GPU count, the training will fail during initialization. It is recommended to either document this constraint or ensure the dataset size is adjusted to be divisible by total_gpus before splitting.
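
One way to implement the second option is to round the configured cap down to a multiple of the split count before the dataset is built. A minimal sketch, assuming total_gpus is known at the point where the cap is read (the variable names are illustrative, not taken from this diff):

    # Round the cap down to a multiple of total_gpus so that
    # val_dataset.split(total_gpus) does not raise ValueError on uneven sizes.
    val_max_samples = config.data.get("val_max_samples", -1)
    if val_max_samples > 0:
        val_max_samples = max((val_max_samples // total_gpus) * total_gpus, total_gpus)
    # Note: this only covers the case where the cap is the binding size; a
    # validation file shorter than the cap would still need the same rounding.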

denismegerle changed the title from [rollout] fix: honor train_max_samples/val_max_samples in fully async rollouter to [rollout, data] fix: honor train_max_samples/val_max_samples in fully async rollouter on Feb 20, 2026
denismegerle marked this pull request as ready for review on February 20, 2026, 13:09