[trainer] fix: address serialization issues when using async reward function and ray ppo trainer#3769
Conversation
Code Review
This pull request fixes a serialization bug that occurs when using an asynchronous custom reward function with the Ray PPO trainer. The issue stems from ray.remote being unable to serialize a partial function that references a dynamically loaded module (custom_module). The fix correctly addresses this by no longer passing the reward_fn to the remote worker. Instead, it passes the config and tokenizer, allowing the reward function to be reconstructed on the worker, thus avoiding the serialization problem. The change is well-reasoned and directly solves the described issue. My only concern is that this fix relies on a code path that is marked as deprecated, which introduces a maintenance risk.
```python
future_reward = compute_reward_async.remote(
    data=batch, config=self.config, tokenizer=self.tokenizer
)
```
This change correctly resolves the serialization error. However, by passing config and tokenizer, this code now relies on a path in compute_reward_async that is marked as deprecated with a warnings.warn in verl/trainer/ppo/reward.py. This introduces a maintenance risk, as the deprecated path could be removed in the future. Since this path is necessary to fix the bug, the deprecation warning seems incorrect and should probably be removed to avoid future confusion and potential breakage. This design inconsistency should be addressed to ensure long-term maintainability.
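For context, the path in question has roughly this shape (a paraphrased sketch, not the exact source of verl/trainer/ppo/reward.py; compute_reward and load_reward_manager are the helpers defined in that same file):

```python
import warnings

import ray


@ray.remote
def compute_reward_async(data, config=None, tokenizer=None, reward_fn=None):
    # Paraphrased sketch; the real signature and warning text may differ.
    if reward_fn is None:
        # The deprecated branch this comment refers to: instead of receiving a pickled
        # reward_fn, the worker rebuilds it locally from config and tokenizer.
        warnings.warn("Passing config and tokenizer to compute_reward_async is deprecated.")
        reward_fn = load_reward_manager(config, tokenizer, num_examine=0)
    return compute_reward(data, reward_fn)
```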
@yyDing1 Please help review
I checked that the reuse of this path is fine. I also recommend that future implementations deprecate this feature and directly support loading asynchronous reward functions for better efficiency (in progress).
Correct, @yyDing1. It was actually acknowledged originally in this PR conversation, but I guess it fell through the cracks as these things do. Please let me know if there's anything I can do to expedite approval; it looks like the CI is failing due to infra errors?
@benprofessionaledition I've repeatedly tried running these CI workflows, but they still fail. I suspect that the CI on your base branch might be broken, so you may need to merge in the latest main branch first.
Got it! I've merged current main into this; hopefully that solves it, @yyDing1.
@vermouth1992 This PR looks good to me and can be merged.
What does this PR do?
This PR fixes a serialization bug that occurs in async reward functions, arising from relatively recent changes to implement caching of the reward function.
Checklist Before Starting
Format the PR title as [{modules}] {type}: {description} (This will be checked by the CI)
- {modules} include fsdp, megatron, sglang, vllm, rollout, trainer, ci, training_utils, recipe, hardware, deployment, ray, worker, single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data; multiple modules are written like [megatron, fsdp, doc]
- {type} is in feat, fix, refactor, chore, test
- Add [BREAKING] to the beginning of the title if the PR breaks an API, e.g. [BREAKING][fsdp, megatron] feat: dynamic batching

Test
Tested on our internal cluster
API and Usage Example
Design & Code Changes
This addresses a bug in the serialization behavior when using Ray. It seems that you cannot pass reward_fn, a partial function, to compute_reward_async, because @ray.remote will attempt to serialize a partial function whose underlying function is imported under the name custom_module, which is not present on the worker process. A minimal example to reproduce this issue, together with a facsimile of the trainer code, is sketched below:
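(The two snippets from the original description are not reproduced above, so the following is an illustrative sketch of the failure mode. It assumes a file my_reward.py next to the script defining def my_reward_fn(data, extra_info=None): return 0.0; apart from custom_module, all names are stand-ins, not verl's actual code.)

```python
# Illustrative reproduction sketch (my_reward.py / my_reward_fn are hypothetical names).
import importlib.util
import sys
from functools import partial

import ray

# --- Loading a user reward file, mimicking how the trainer imports custom reward functions ---
# The file is loaded under the fixed module name "custom_module" and registered in sys.modules,
# so the resulting function object gets pickled by reference to that module name.
spec = importlib.util.spec_from_file_location("custom_module", "my_reward.py")
custom_module = importlib.util.module_from_spec(spec)
sys.modules["custom_module"] = custom_module
spec.loader.exec_module(custom_module)

# The trainer wraps the loaded function in a partial, e.g. to bind extra kwargs.
reward_fn = partial(custom_module.my_reward_fn, extra_info=None)


# --- Facsimile of the trainer-side async reward call ---
@ray.remote
def compute_reward_async(data, reward_fn):
    return reward_fn(data)


if __name__ == "__main__":
    ray.init()
    batch = {"responses": ["hello world"]}
    # Ray pickles reward_fn to ship it to the worker; during deserialization the worker
    # tries to import "custom_module", which exists only in the driver process, and fails.
    future_reward = compute_reward_async.remote(batch, reward_fn)
    print(ray.get(future_reward))
```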
Running the above example is roughly the equivalent of running a command like this:
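(The original command was not captured above; a hypothetical equivalent using verl's standard PPO entry point is shown below. The custom_reward_function.* and reward_model.launch_reward_fn_async keys are assumed from the stock ppo_trainer config and may differ across verl versions.)

```bash
# Hypothetical launch command; config keys are assumptions, not taken from the PR.
python3 -m verl.trainer.main_ppo \
    custom_reward_function.path=/path/to/my_reward.py \
    custom_reward_function.name=my_reward_fn \
    reward_model.launch_reward_fn_async=True \
    data.train_files=... \
    data.val_files=...
```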
You will get the expected error message (identical to VERL's in real-world training):
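(The original traceback is not reproduced above; it boils down to the standard Python error for the module that is missing on the worker, roughly:)

```
ModuleNotFoundError: No module named 'custom_module'
```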
By passing the config and tokenizer, as is done in the SPPO Ray Trainer recipe, instead of the reward_fn reference, we force re-initialization of this module so that it's available on the worker.
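For reference, a paraphrased sketch of the change in the Ray PPO trainer (not the literal diff; the removed line is reconstructed from the description above):

```diff
-        future_reward = compute_reward_async.remote(data=batch, reward_fn=self.reward_fn)
+        future_reward = compute_reward_async.remote(
+            data=batch, config=self.config, tokenizer=self.tokenizer
+        )
```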
Checklist Before Submitting
Important
Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.
- Apply pre-commit checks: pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always
- Once the PR is ready for CI, request a run in the ci-request channel in the verl Slack workspace. (If not accessible, please try the Feishu group (飞书群).)