Bug Context
When running the code with exactly the hyperparameters specified in the paper (rollout batch_size=32 and ppo_minibatch_size=128), a ValueError is raised:
training_batch_size must be greater than ppo_mini_batch
The error originates from verl/workers/config/actor.py at line 198, caused by an incorrect validation check on line 197.
Root Cause
The current validation logic does not account for rollout_n, so it raises a spurious error for the paper's official hyperparameters, which should be accepted.
Fix Suggestion
Modify line 197 in verl/workers/config/actor.py from the original check to:
if train_batch_size * self.rollout_n < self.ppo_mini_batch_size:
This corrects the validation to use the total effective batch size (accounting for rollout_n) when comparing against ppo_mini_batch_size.
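The effect of the proposed check can be sketched in isolation. This is a minimal illustration, not the actual verl code: the function name `check_mini_batch`, its error message wording, and the value `rollout_n=4` are assumptions made for the example; only the comparison itself comes from the suggested fix.

```python
def check_mini_batch(train_batch_size: int, rollout_n: int, ppo_mini_batch_size: int) -> None:
    """Hypothetical stand-in for the validation on line 197 of actor.py."""
    # Total number of samples per training step: prompts times rollouts per prompt.
    effective_batch_size = train_batch_size * rollout_n
    if effective_batch_size < ppo_mini_batch_size:
        raise ValueError(
            f"effective batch size ({effective_batch_size}) must be at least "
            f"ppo_mini_batch_size ({ppo_mini_batch_size})"
        )


# Paper settings from this report, with an assumed rollout_n of 4:
# 32 * 4 = 128, which is not smaller than 128, so no error is raised.
check_mini_batch(train_batch_size=32, rollout_n=4, ppo_mini_batch_size=128)
```

With `rollout_n=1` the same call would still raise, which is the expected behavior since 32 samples cannot fill a mini-batch of 128.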