Feature Request: Adding GenRM to Fully Async #5949

### Feature request

Add GenRM (generative reward model) support to the fully async pipeline.

`FullyAsyncRollouter` hardcodes `self.use_rm = False` ([code](https://github.com/verl-project/verl/blob/main/verl/experimental/fully_async_policy/fully_async_rollouter.py#L78)), which prevents GenRM from being used with fully async GRPO. The sync pipeline already supports GenRM through the reward loop infrastructure.
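A minimal sketch of the proposed wiring, assuming the flag should be driven by config instead of hardcoded. This is not verl code: `RewardModelConfig`, `RolloutConfig`, the `reward_model.enable` flag, `RolloutScorerSketch`, `exact_match`, and `toy_judge` are all hypothetical stand-ins used only to illustrate the routing between a rule-based reward and a self-hosted judge.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Awaitable, Callable, Optional

# Hypothetical config objects; verl's real config layout may differ.
@dataclass
class RewardModelConfig:
    enable: bool = False  # hypothetical flag replacing the hardcoded use_rm = False

@dataclass
class RolloutConfig:
    reward_model: RewardModelConfig = field(default_factory=RewardModelConfig)

RuleReward = Callable[[str, str], float]
Judge = Callable[[str, str], Awaitable[float]]

class RolloutScorerSketch:
    """Hypothetical stand-in for the rollouter's scoring path (not verl's API)."""

    def __init__(self, config: RolloutConfig, rule_reward: RuleReward,
                 judge: Optional[Judge] = None):
        # Derive use_rm from config instead of hardcoding it to False.
        self.use_rm = config.reward_model.enable
        if self.use_rm and judge is None:
            raise ValueError("reward_model.enable=True requires a judge")
        self.rule_reward = rule_reward
        self.judge = judge

    async def score(self, prompt: str, response: str) -> float:
        # Route to the GenRM judge when enabled, otherwise the rule-based reward.
        if self.use_rm:
            return await self.judge(prompt, response)
        return self.rule_reward(prompt, response)

def exact_match(prompt: str, response: str) -> float:
    # Toy rule-based reward for illustration.
    return 1.0 if response.strip() == "4" else 0.0

async def toy_judge(prompt: str, response: str) -> float:
    # Placeholder for a call to a self-hosted judge model.
    return 0.5

async def main() -> None:
    rule_only = RolloutScorerSketch(RolloutConfig(), exact_match)
    with_rm = RolloutScorerSketch(
        RolloutConfig(RewardModelConfig(enable=True)), exact_match, toy_judge)
    print(await rule_only.score("2+2?", "4"))  # 1.0
    print(await with_rm.score("2+2?", "4"))    # 0.5

if __name__ == "__main__":
    asyncio.run(main())
```

Keeping the rule-based path as the default means existing fully async GRPO runs behave exactly as before unless the reward model is explicitly enabled.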

Happy to submit a PR.


### Motivation

Fully async GRPO currently supports only rule-based rewards. If you want to use a reward model that evaluates reasoning quality (GenRM / LLM-as-a-judge), there is no way to do so without modifying the source. This is despite the sync pipeline already supporting it and the underlying infrastructure (reward loop, reward router, agent loop worker scoring) already being in place; it just isn't wired up.

I would prefer to host and manage my own judge model rather than rely on external APIs.

### Your contribution

Happy to submit the PR.
