
fix: retry server_is_overloaded for OpenAI Responses stream failures #3040

Merged
tusharmath merged 7 commits into main from fix/openai-responses-overload-retryable
Apr 17, 2026

Conversation

@amitksingh1490
Contributor

Summary

Make OpenAI Responses stream overload failures retryable by preserving structured upstream error codes and classifying server_is_overloaded as retryable.

Context

OpenAI Responses streaming failures were being surfaced as formatted strings (Upstream response failed: ...), which dropped structured provider error metadata. Because of that, retry classification could not reliably identify the server_is_overloaded code from stream failures and the request was not retried.

Changes

  • Preserved response.failed stream errors as structured OpenAI DTO errors instead of debug-only strings.
  • Added OpenAI overloaded error classification in retry mapping for server_is_overloaded.
  • Kept transport-code retry matching centralized with TRANSPORT_ERROR_CODES.
  • Added regression tests for both stream error-code preservation and retryability behavior.
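A minimal sketch of the first change, preserving the structured error instead of flattening it to a string. The type and field names here are illustrative, not the actual forge_app DTOs:

```rust
// Illustrative types; the real DTOs live in forge_app::dto::openai.
#[derive(Debug, Clone, PartialEq)]
struct ResponseFailedEvent {
    code: Option<String>,
    message: Option<String>,
}

#[derive(Debug, Clone, PartialEq)]
enum ApiError {
    // Structured upstream error: retry logic can inspect `code`.
    Response { code: Option<String>, message: Option<String> },
    // Opaque string: retry logic cannot classify it reliably.
    Text(String),
}

// Before: the event was flattened into a debug-only string,
// dropping the provider error code.
fn into_text_error(event: &ResponseFailedEvent) -> ApiError {
    ApiError::Text(format!("Upstream response failed: {:?}", event))
}

// After: the event is preserved as a structured error,
// retaining `code` and `message` from the upstream payload.
fn into_response_failed_error(event: ResponseFailedEvent) -> ApiError {
    ApiError::Response { code: event.code, message: event.message }
}

fn main() {
    let event = ResponseFailedEvent {
        code: Some("server_is_overloaded".to_string()),
        message: Some("Our servers are currently overloaded.".to_string()),
    };
    let _opaque = into_text_error(&event);
    let structured = into_response_failed_error(event);
    println!("{:?}", structured);
}
```

With the structured variant, the retry layer can match on the `code` field rather than parsing a formatted string.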

Key Implementation Details

  • Added into_response_failed_error(...) to convert ResponseFailedEvent into forge_app::dto::openai::Error::Response, retaining code and message fields from upstream payloads.
  • Updated stream event handling to use structured conversion for ResponseFailed events.
  • Extended retry logic with a typed RetryableApiErrorCode classifier and recursive code lookup through nested OpenAI error payloads.
  • Added explicit overloaded-code constant (server_is_overloaded) and transport constant usage (TRANSPORT_ERROR_CODES).
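The classifier and recursive lookup could look roughly like the following. The names `RetryableApiErrorCode`, `TRANSPORT_ERROR_CODES`, and the overload constant come from the PR; the error shape and everything else is an assumed sketch:

```rust
// Transport-level codes that were already retryable (from the PR).
const TRANSPORT_ERROR_CODES: &[&str] =
    &["ERR_STREAM_PREMATURE_CLOSE", "ECONNRESET", "ETIMEDOUT"];

// Newly added explicit overload code.
const SERVER_OVERLOADED_CODE: &str = "server_is_overloaded";

// Illustrative stand-in for a nested OpenAI error payload.
#[derive(Debug)]
struct OpenAiError {
    code: Option<String>,
    inner: Option<Box<OpenAiError>>,
}

#[derive(Debug, PartialEq)]
enum RetryableApiErrorCode {
    Transport,
    ServerOverloaded,
}

impl RetryableApiErrorCode {
    // Typed classification of a single error code string.
    fn classify(code: &str) -> Option<Self> {
        if TRANSPORT_ERROR_CODES.contains(&code) {
            Some(Self::Transport)
        } else if code == SERVER_OVERLOADED_CODE {
            Some(Self::ServerOverloaded)
        } else {
            None
        }
    }
}

// Recursive lookup: walk nested error payloads until a known code is found.
fn find_retryable_code(err: &OpenAiError) -> Option<RetryableApiErrorCode> {
    if let Some(kind) = err.code.as_deref().and_then(RetryableApiErrorCode::classify) {
        return Some(kind);
    }
    err.inner.as_deref().and_then(find_retryable_code)
}

fn main() {
    // The overload code may sit one level down in the payload.
    let nested = OpenAiError {
        code: None,
        inner: Some(Box::new(OpenAiError {
            code: Some("server_is_overloaded".to_string()),
            inner: None,
        })),
    };
    println!("{:?}", find_retryable_code(&nested));
}
```

Centralizing the transport codes in one constant keeps the new overload branch from duplicating the existing matching logic.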

Use Cases

  • If the OpenAI Responses stream emits:
    • code: "server_is_overloaded"
    • message: "Our servers are currently overloaded. Please try again later."
    then Forge now marks the failure as retryable.
  • Existing transport retry behavior for ERR_STREAM_PREMATURE_CLOSE, ECONNRESET, and ETIMEDOUT remains intact.
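Both use cases above reduce to a membership check on the error code. A self-contained sketch (the helper name is hypothetical; in the PR the matching is centralized via TRANSPORT_ERROR_CODES):

```rust
// Existing transport codes plus the newly retryable overload code.
const TRANSPORT_ERROR_CODES: &[&str] =
    &["ERR_STREAM_PREMATURE_CLOSE", "ECONNRESET", "ETIMEDOUT"];

// Hypothetical helper: a stream failure is retryable if its code is a
// known transport error or the OpenAI overload code.
fn is_retryable(code: &str) -> bool {
    TRANSPORT_ERROR_CODES.contains(&code) || code == "server_is_overloaded"
}

fn main() {
    // New behavior: overload failures retry.
    println!("{}", is_retryable("server_is_overloaded"));
    // Existing behavior: transport failures still retry.
    println!("{}", is_retryable("ECONNRESET"));
    // Non-retryable codes are left alone.
    println!("{}", is_retryable("invalid_api_key"));
}
```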

Testing

```
cargo test -p forge_repo test_stream_with_response_failed_preserves_error_code
cargo test -p forge_repo test_openai_server_overloaded_error_is_retryable
cargo test -p forge_repo test_into_retry_with_transport_errors
```

Links

  • Related issues: N/A

Preserve structured OpenAI Responses response.failed errors so retry logic can classify server_is_overloaded as retryable. Also keep transport code matching centralized via TRANSPORT_ERROR_CODES and add regression tests.

Co-Authored-By: ForgeCode <noreply@forgecode.dev>
@github-actions github-actions Bot added the type: fix Iterations on existing features or infrastructure. label Apr 16, 2026
@tusharmath tusharmath enabled auto-merge (squash) April 17, 2026 05:20
@tusharmath tusharmath merged commit 782a6a1 into main Apr 17, 2026
8 checks passed
@tusharmath tusharmath deleted the fix/openai-responses-overload-retryable branch April 17, 2026 05:22