[trainer] fix: address serialization issues when using async reward function and ray ppo trainer#3769

Merged
vermouth1992 merged 3 commits into verl-project:main from benprofessionaledition:fix-async-reward
Oct 18, 2025

Conversation

@benprofessionaledition (Contributor) commented Oct 15, 2025

What does this PR do?

This PR fixes a serialization bug that occurs in async reward functions, arising from relatively recent changes to implement caching of the reward function.

Checklist Before Starting

Test

For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc.

Tested on our internal cluster

API and Usage Example

Demonstrate how the API changes if any, and provide usage example(s) if possible.

n/a

Design & Code Changes

Demonstrate the high-level design if this PR is complex, and list the specific changes.

This addresses a bug in serialization behavior when using Ray. You cannot pass reward_fn, a partial function, to compute_reward_async: @ray.remote serializes the partial, which holds a reference to a function imported under the module name custom_module, and that module is not importable on the worker process. A minimal example to reproduce this issue is as follows:

# example_module.py

def reward_function(name: str) -> None:
    print(f"Hello, {name}")

and then a facsimile of the trainer code:

# main.py
from functools import partial
import importlib.util
import os
import sys
import ray

FILE_PATH = "example_module.py"
FUNCTION_NAME = "reward_function"


def _call_with_kwargs(raw_fn, extra_kwargs, *args, **kwargs):
    """
    cf. https://github.com/volcengine/verl/blob/main/verl/trainer/ppo/reward.py#L33
    """
    merged_kwargs = {**kwargs, **extra_kwargs}
    return raw_fn(*args, **merged_kwargs)


def load_custom_reward_fn(file_path: str, function_name: str):
    """
    cf. https://github.com/volcengine/verl/blob/main/verl/trainer/ppo/reward.py#L42
    """
    module = sys.modules.get("custom_module", None)
    if module is None:
        if not os.path.exists(file_path):
            raise FileNotFoundError(f"Reward function file '{file_path}' not found.")

        spec = importlib.util.spec_from_file_location("custom_module", file_path)
        assert spec is not None
        module = importlib.util.module_from_spec(spec)
        try:
            sys.modules["custom_module"] = module
            assert spec.loader is not None
            spec.loader.exec_module(module)
        except Exception as e:
            raise RuntimeError(f"Error loading module from '{file_path}': {e}") from e

    if not hasattr(module, function_name):
        raise AttributeError(f"Reward function '{function_name}' not found in '{module.__file__}'.")

    print(f"using customized reward function '{function_name}' from '{module.__file__}'")
    raw_fn = getattr(module, function_name)

    reward_kwargs = {"name": "ben"}

    return partial(_call_with_kwargs, raw_fn, reward_kwargs)

@ray.remote(num_cpus=1)
def async_call_reward_fn(reward_fn=None):
    """
    cf. https://github.com/volcengine/verl/blob/main/verl/trainer/ppo/reward.py#L182
    """
    if reward_fn is None:
        # This load runs on the worker process, so 'custom_module' is
        # registered in the worker's own sys.modules before the call.
        reward_fn = load_custom_reward_fn(FILE_PATH, FUNCTION_NAME)
    reward_fn()


def test_get_reward_fn():
    myfn = load_custom_reward_fn(FILE_PATH, FUNCTION_NAME)
    print(f"{os.getpid()} calling reward function on main process")
    myfn()
    print(f"{os.getpid()} calling reward function on ray remote")
    # Block on the result so the worker-side deserialization error surfaces
    # before the script exits.
    ray.get(async_call_reward_fn.remote(reward_fn=myfn))

if __name__ == "__main__":
    ray.init()
    test_get_reward_fn()

Running the above example is roughly the equivalent of running a command like this:

ray job submit --address=http://127.0.0.1:8265 -- python -m verl.trainer.main_ppo \
        --config-path=... \
        --config-name=... \
        ...
        custom_reward_function.path=/some/path/example_module.py \
        custom_reward_function.name=reward_function \
        reward_model.reward_manager=batch \
        reward_model.launch_reward_fn_async=True \
        ...

You will get the expected error message (identical to VERL's in real-world training):

(async_call_reward_fn pid=1442270) No module named 'custom_module'
(async_call_reward_fn pid=1442270) Traceback (most recent call last):
(async_call_reward_fn pid=1442270)   File "/home/ben/miniconda3/lib/python3.12/site-packages/ray/_private/serialization.py", line 460, in deserialize_objects
(async_call_reward_fn pid=1442270)     obj = self._deserialize_object(data, metadata, object_ref)
(async_call_reward_fn pid=1442270)           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(async_call_reward_fn pid=1442270)   File "/home/ben/miniconda3/lib/python3.12/site-packages/ray/_private/serialization.py", line 317, in _deserialize_object
(async_call_reward_fn pid=1442270)     return self._deserialize_msgpack_data(data, metadata_fields)
(async_call_reward_fn pid=1442270)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(async_call_reward_fn pid=1442270)   File "/home/ben/miniconda3/lib/python3.12/site-packages/ray/_private/serialization.py", line 272, in _deserialize_msgpack_data
(async_call_reward_fn pid=1442270)     python_objects = self._deserialize_pickle5_data(pickle5_data)
(async_call_reward_fn pid=1442270)                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(async_call_reward_fn pid=1442270)   File "/home/ben/miniconda3/lib/python3.12/site-packages/ray/_private/serialization.py", line 262, in _deserialize_pickle5_data
(async_call_reward_fn pid=1442270)     obj = pickle.loads(in_band)
(async_call_reward_fn pid=1442270)           ^^^^^^^^^^^^^^^^^^^^^
(async_call_reward_fn pid=1442270) ModuleNotFoundError: No module named 'custom_module'
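
The root cause can also be shown without Ray at all: pickle stores the wrapped function by reference as custom_module.reward_function, so deserialization fails in any process where that module name cannot be imported. A minimal sketch, assuming it is appended to the main.py script above (not part of the original repro):

import pickle

fn = load_custom_reward_fn(FILE_PATH, FUNCTION_NAME)
payload = pickle.dumps(fn)        # succeeds: raw_fn is pickled by reference
del sys.modules["custom_module"]  # simulate a fresh worker process
pickle.loads(payload)             # ModuleNotFoundError: No module named 'custom_module'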

By passing the config and tokenizer (as is done in the SPPO Ray Trainer recipe) instead of the reward_fn reference, we force the module to be re-imported on the worker process, so custom_module is available there.
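
Applied to the minimal example above, the fix amounts to never shipping the partial across the process boundary; a sketch, assuming the corrected async_call_reward_fn fallback shown earlier:

# Fixed driver-side call (a sketch): pass no reward_fn, letting the worker
# rebuild it locally via load_custom_reward_fn.
def test_get_reward_fn_fixed():
    print(f"{os.getpid()} calling reward function on ray remote, without shipping the partial")
    ray.get(async_call_reward_fn.remote())  # reward_fn defaults to None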

Checklist Before Submitting

Important

Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.

@gemini-code-assist bot left a comment

Code Review

This pull request fixes a serialization bug that occurs when using an asynchronous custom reward function with the Ray PPO trainer. The issue stems from ray.remote being unable to serialize a partial function that references a dynamically loaded module (custom_module). The fix correctly addresses this by no longer passing the reward_fn to the remote worker. Instead, it passes the config and tokenizer, allowing the reward function to be reconstructed on the worker, thus avoiding the serialization problem. The change is well-reasoned and directly solves the described issue. My only concern is that this fix relies on a code path that is marked as deprecated, which introduces a maintenance risk.

Comment on lines +1098 to +1100
future_reward = compute_reward_async.remote(
    data=batch, config=self.config, tokenizer=self.tokenizer
)
Severity: high

This change correctly resolves the serialization error. However, by passing config and tokenizer, this code now relies on a path in compute_reward_async that is marked as deprecated with a warnings.warn in verl/trainer/ppo/reward.py. This introduces a maintenance risk, as the deprecated path could be removed in the future. Since this path is necessary to fix the bug, the deprecation warning seems incorrect and should probably be removed to avoid future confusion and potential breakage. This design inconsistency should be addressed to ensure long-term maintainability.

@vermouth1992 (Collaborator)

@yyDing1 Please help review

@yyDing1 (Collaborator) commented Oct 15, 2025

I checked that the reuse of reward_fn comes from #2557.
This PR looks good to me, as I think the reward manager (including the reward function) should be initialized within the remote class to avoid serialization errors.

I'd also recommend that future implementations deprecate this feature and directly support loading asynchronous reward functions for better efficiency (in progress).
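
For illustration, a minimal sketch of the pattern described above, initializing the reward function inside the remote actor so nothing module-bound ever crosses the process boundary (RewardWorker is a hypothetical name, not verl API; it reuses load_custom_reward_fn from the repro script):

# Hypothetical actor (a sketch): the custom module is loaded inside the
# actor process, so no partial is ever serialized across processes.
@ray.remote(num_cpus=1)
class RewardWorker:
    def __init__(self, file_path: str, function_name: str):
        # Runs in the actor process; 'custom_module' registers here.
        self.reward_fn = load_custom_reward_fn(file_path, function_name)

    def compute(self):
        return self.reward_fn()

# usage: worker = RewardWorker.remote(FILE_PATH, FUNCTION_NAME)
#        ray.get(worker.compute.remote())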

@benprofessionaledition (Contributor, Author)

Correct, @yyDing1; it was actually acknowledged originally in this PR conversation, but I guess it fell through the cracks, as these things do.

Please let me know if there's anything I can do to expedite approval; it looks like the CI is failing due to infra errors?

@yyDing1 (Collaborator) commented Oct 16, 2025

@benprofessionaledition I've repeatedly tried running these CI workflows, but they still fail. I suspect that the CI on your base branch might be broken, so you may need to merge in the latest main branch first.

@benprofessionaledition (Contributor, Author) commented Oct 16, 2025

Got it! I've merged current main into this branch; hopefully that solves it, @yyDing1.

@yyDing1 (Collaborator) commented Oct 17, 2025

@vermouth1992 This PR looks good to me and can be merged.

vermouth1992 merged commit f0539a5 into verl-project:main on Oct 18, 2025
69 of 70 checks passed