
Feature Request: Adding GenRM to Fully Async #5949

@shayansadeghieh

Description

Feature request

Add GenRM capabilities to the fully async pipeline.

FullyAsyncRollouter hardcodes self.use_rm = False, which prevents GenRM from being used with fully async GRPO. The sync pipeline already supports GenRM through the reward loop infrastructure.
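A minimal sketch of the kind of change being proposed: derive use_rm from configuration instead of hardcoding it to False. Everything below is hypothetical and self-contained. RewardModelConfig, TrainerConfig and FullyAsyncRollouterSketch are illustrative stand-ins, and the `reward_model.enable` key is an assumption modeled on a sync-style config switch, not the actual FullyAsyncRollouter code.

```python
from dataclasses import dataclass, field


@dataclass
class RewardModelConfig:
    # Hypothetical stand-in for the reward-model section of the trainer config.
    enable: bool = False


@dataclass
class TrainerConfig:
    # Hypothetical top-level config holding the reward-model section.
    reward_model: RewardModelConfig = field(default_factory=RewardModelConfig)


class FullyAsyncRollouterSketch:
    """Illustrative stand-in for FullyAsyncRollouter (not the real class)."""

    def __init__(self, config: TrainerConfig):
        # Current behavior described in the issue: self.use_rm = False (hardcoded).
        # Proposed behavior (assumption): follow the config switch so a GenRM
        # judge can be enabled without modifying the source.
        self.use_rm = bool(config.reward_model.enable)


# Usage: rule-based rewards by default, GenRM when explicitly enabled.
default_rollouter = FullyAsyncRollouterSketch(TrainerConfig())
genrm_rollouter = FullyAsyncRollouterSketch(
    TrainerConfig(reward_model=RewardModelConfig(enable=True))
)
```

Keeping the default at False preserves today's rule-based behavior for existing fully async runs, so only users who opt in would route scoring through the reward model.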

Happy to submit a PR.

Motivation

Fully async GRPO currently supports only rule-based rewards. If you want to use a reward model that evaluates reasoning quality (GenRM / LLM-as-a-judge), there is no way to do it without modifying the source, even though the sync pipeline already supports it and the underlying infrastructure (reward loop, reward router, agent loop worker scoring) is already in place. It just isn't wired up.

I would prefer to manage my own judge model rather than rely on external APIs.

Your contribution

Happy to submit the PR.
