[Bug] Fix the problem of long inference timeouts when using Async rollout by U-rara · Pull Request #1483 · verl-project/verl

U-rara · 2025-05-12T02:55:09Z

Checklist Before Starting

Search for similar PR(s).

What does this PR do?

In Async rollout, AsyncOpenAI has a default 600-second timeout, which can lead to timeouts during longer inference. See details at #1138 (comment).
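The description does not spell out the change itself, so the following is only a minimal sketch of one way to lift the limit: pass an explicit `timeout` when constructing `AsyncOpenAI`. The helper names `build_client_kwargs` and `make_client`, the address format, and the placeholder API key are assumptions for illustration and are not taken from `async_server.py`.

```python
from typing import Any, Optional


def build_client_kwargs(address: str, timeout_s: Optional[float] = None) -> dict[str, Any]:
    """Build keyword arguments for openai.AsyncOpenAI (hypothetical helper).

    timeout_s=None asks the client not to apply a client-side timeout;
    a float sets the timeout in seconds instead of the 600-second default.
    """
    return {
        # Assumed: an OpenAI-compatible server reachable at host:port.
        "base_url": f"http://{address}/v1",
        # Placeholder key; assumed to be ignored by a local server.
        "api_key": "token-abc123",
        "timeout": timeout_s,
    }


def make_client(address: str, timeout_s: Optional[float] = None):
    """Create the async client with the overridden timeout."""
    # Imported lazily so build_client_kwargs works without the openai package.
    from openai import AsyncOpenAI

    return AsyncOpenAI(**build_client_kwargs(address, timeout_s))
```

For example, `make_client("127.0.0.1:8000", timeout_s=3600.0)` would allow an hour per request, while leaving `timeout_s` as `None` removes the client-side limit entirely. Which of the two the merged commit uses is not stated here.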

High-Level Design

See details at #1138 (comment).

Specific Changes

See details at #1138 (comment).

API

Demonstrate how the API changes if any.

Usage Example

Provide usage example(s) for easier usage.

# Add code snippet or script demonstrating how to use this

Test

For changes that cannot be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results such as training curve plots, evaluation results, etc.

Additional Info.

Issue Number: Fixes issue # or discussion # if any.
Training: [Note which backend this PR will affect: FSDP, Megatron, both, or none]
Inference: [Note which backend this PR will affect: vLLM, SGLang, both, or none]

Checklist Before Submitting

Read the Contribute Guide.
Apply pre-commit checks.
Add [BREAKING] to the PR title if it breaks any API.
Update the documentation about your changes in the docs.
Add CI test(s) if necessary.

casper-hansen · 2025-05-12T05:04:17Z

Great bugfix! I think a lot of users are likely to run into this issue when using, e.g., a 32k completion length.

Update async_server.py

2dce1b7

wuxibin89 approved these changes May 12, 2025

wuxibin89 merged commit cb1adda into verl-project:main May 12, 2025
27 of 28 checks passed

casper-hansen mentioned this pull request May 26, 2025

[rollout] perf: replace AsyncOpenAI to aiohttp client in ChatCompletionScheduler #1588

Merged


wuxibin89 merged 1 commit into verl-project:main from U-rara:bugfix-async-long-inference-timeout
