16 changes: 14 additions & 2 deletions verl/experimental/fully_async_policy/fully_async_rollouter.py
@@ -98,8 +98,20 @@ def __init__(
         from verl.trainer.main_ppo import create_rl_dataset, create_rl_sampler
         from verl.utils.dataset.rl_dataset import collate_fn

-        train_dataset = create_rl_dataset(config.data.train_files, config.data, tokenizer, processor)
-        val_dataset = create_rl_dataset(config.data.val_files, config.data, tokenizer, processor)
+        train_dataset = create_rl_dataset(
+            config.data.train_files,
+            config.data,
+            tokenizer,
+            processor,
+            max_samples=config.data.get("train_max_samples", -1),
+        )
Comment on lines +101 to +107 (Contributor, severity: high):

For consistency with the main PPO trainer path (verl/trainer/main_ppo.py), and so the call matches the intended API usage, explicitly pass is_train=True when creating the training dataset:

        train_dataset = create_rl_dataset(
            config.data.train_files,
            config.data,
            tokenizer,
            processor,
            is_train=True,
            max_samples=config.data.get("train_max_samples", -1),
        )
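Aside: the max_samples plumbing added in this diff relies on a sentinel convention that the change only implies. Below is a minimal sketch of the assumed behavior; apply_max_samples is a hypothetical helper, not verl's actual implementation, and "-1 means no cap" is inferred from the config default:

```python
def apply_max_samples(samples: list, max_samples: int = -1) -> list:
    """Truncate a dataset to at most max_samples entries.

    Hypothetical helper illustrating the assumed convention: a negative
    value (the -1 config default) means "use the full dataset".
    """
    if max_samples < 0:
        return samples  # sentinel: no cap applied
    return samples[:max_samples]
```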

+        val_dataset = create_rl_dataset(
+            config.data.val_files,
+            config.data,
+            tokenizer,
+            processor,
+            max_samples=config.data.get("val_max_samples", -1),
+        )

Comment on line +113 (Contributor, severity: high):

Setting val_max_samples introduces a risk of runtime crashes when async_training.use_trainer_do_validate is enabled. At line 123, val_dataset.split(total_gpus) is called, and the RLHFDataset.split implementation raises a ValueError if the dataset size is not exactly divisible by the number of splits. If a user sets a val_max_samples value that is not a multiple of the total GPU count, training will fail during initialization. Either document this constraint or adjust the dataset size to be divisible by total_gpus before splitting.

Comment on lines +108 to +114 (Contributor, severity: high):

The call to create_rl_dataset for the validation dataset is missing the is_train=False argument. While this parameter is currently unused in the default implementation of create_rl_dataset, it is part of the function signature and is explicitly passed in the main PPO trainer path. Including it keeps the call logically correct and avoids breakage if the dataset initialization logic is later updated to rely on this flag.

        val_dataset = create_rl_dataset(
            config.data.val_files,
            config.data,
            tokenizer,
            processor,
            is_train=False,
            max_samples=config.data.get("val_max_samples", -1),
        )
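The reviewer's divisibility concern can be made concrete with a small sketch: before val_dataset.split(total_gpus) is called, the dataset length must be an exact multiple of the split count. The guard below is hypothetical (it does not reproduce RLHFDataset.split; the helper name is illustrative):

```python
def truncate_to_multiple(n_samples: int, n_splits: int) -> int:
    """Largest count <= n_samples that divides evenly into n_splits shards.

    Hypothetical guard: RLHFDataset.split raises ValueError when the
    dataset size is not exactly divisible by the number of splits, so a
    caller could clamp val_max_samples to this value before splitting.
    """
    if n_splits <= 0:
        raise ValueError("n_splits must be positive")
    usable = (n_samples // n_splits) * n_splits
    if usable == 0:
        raise ValueError(f"{n_samples} samples cannot fill {n_splits} non-empty shards")
    return usable
```

For example, with val_max_samples=103 on 8 GPUs, clamping to 96 samples would avoid the ValueError at initialization.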

         train_sampler = create_rl_sampler(config.data, train_dataset)

         self._validate_config()