Skip to content

[grpc] Refactor openai module#16511

Merged
slin1237 merged 4 commits intomainfrom
chang/resp-refactor-4
Jan 5, 2026
Merged

[grpc] Refactor openai module#16511
slin1237 merged 4 commits intomainfrom
chang/resp-refactor-4

Conversation

@CatherineSue
Copy link
Collaborator

@CatherineSue CatherineSue commented Jan 5, 2026

Motivation

openai router module has the following problems:

  1. route_responses() is 270 lines of inline logic (830-1097)
route_responses() in router.rs:
├── Metrics recording
├── Worker selection
├── Previous response chain loading (~30 lines)
├── Conversation history loading (~90 lines)
├── Payload transformation
├── Context building
└── Finally: dispatch to streaming/non-streaming

Compare to gRPC's route_responses_impl() (~50 lines):

// gRPC just validates and delegates
validate_worker_availability(&self.worker_registry, model);
if is_harmony {
    serve_harmony_responses(&harmony_ctx, body.clone()).await
} else {
    responses::route_responses(&self.responses_context, ...).await
}
  1. handle_non_streaming_response() is inline in router.rs (371-491)

This 120-line function should be in a separate file like Harmony's non_streaming.rs.

  1. Circular dependency between files:

accumulator.rs ──imports──> streaming.rs (extract_output_index, get_event_type)

These helpers should be in a common utilities file, not streaming.rs.

  1. Mixed responsibilities:
File What it has What it SHOULD have
router.rs Route + non-streaming handler + conversation loading Just routing (like gRPC)
responses.rs Tiny utils only (225 lines) Could be the entry point
streaming.rs Handlers + SSE helpers Just handlers
mcp.rs Non-streaming loop + streaming execution Tool execution only

Modifications

  1. Moved files from openai/ into openai/responses/ subdirectory
  2. Extracted handle_non_streaming_response() from router.rs to responses/non_streaming.rs
  3. Created common.rs with shared SSE helpers used by both streaming and accumulator
  4. Removed unused re-exports of Provider, ProviderError, ProviderRegistry

Accuracy Tests

Benchmarking and Profiling

Checklist

Review Process

  1. Ping Merge Oncalls to start the PR flow. See the PR Merge Process.
  2. Get approvals from CODEOWNERS and other reviewers.
  3. Trigger CI tests with comments (/tag-run-ci-label, /rerun-failed-ci, /tag-and-rerun-ci) or contact authorized users to do so.
  4. After green CI and required approvals, ask Merge Oncalls to merge.

  Key Changes

  1. Moved files from openai/ into openai/responses/ subdirectory
  2. Extracted handle_non_streaming_response() from router.rs to responses/non_streaming.rs
  3. Created common.rs with shared SSE helpers used by both streaming and accumulator
  4. Removed unused re-exports of Provider, ProviderError, ProviderRegistry
  5. Updated documentation in openai_mcp_flow.md
@gemini-code-assist
Copy link
Contributor

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

@slin1237 slin1237 merged commit 5154140 into main Jan 5, 2026
68 of 69 checks passed
@slin1237 slin1237 deleted the chang/resp-refactor-4 branch January 5, 2026 19:39
jamesjxliu pushed a commit to jamesjxliu/sglang that referenced this pull request Jan 6, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants

Comments