
feat(gateway): add OpenAI Responses API endpoints#1656

Merged
ilblackdragon merged 2 commits into staging from feat/responses-api
Mar 27, 2026

Conversation

@ilblackdragon
Member

Summary

  • Add POST /v1/responses and GET /v1/responses/{id} to the web gateway implementing the OpenAI Responses API
  • Unlike the existing /v1/chat/completions (raw LLM proxy), this routes through the full agent loop — tools, memory, safety, and server-side conversation state
  • External clients can use standard OpenAI SDKs (client.responses.create()) to interact with IronClaw as an agent

Design decisions

  • Response IDs: resp_{uuid_simple} — stateless encoding of thread UUIDs, no lookup table
  • Multi-turn: previous_response_id decoded to thread UUID, enabling conversation continuity
  • Streaming: Background worker subscribes to AppEvent SSE broadcast, filters by thread, maps to Responses API event types (response.created, response.output_text.delta, response.output_item.added/done, response.completed)
  • Non-streaming: Subscribes before sending to agent, collects events with 120s timeout
  • Tool approval: Returns response.failed since Responses API has no interactive approval flow
  • GET endpoint: Reconstructs ResponseObject from conversation_messages in DB
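The stateless ID scheme in the first bullet can be sketched as follows (helper names are illustrative, not the PR's exact code): a response ID is "resp_" plus the thread UUID in simple, dash-free hex, so the thread is recoverable from the ID with no lookup table.

```rust
// Hedged sketch of the resp_{uuid_simple} scheme; function names are
// assumptions, not the PR's actual identifiers.
fn encode_response_id(thread_uuid_simple: &str) -> String {
    format!("resp_{thread_uuid_simple}")
}

fn decode_response_id(id: &str) -> Result<&str, &'static str> {
    // Reject anything that is not "resp_" followed by 32 hex characters.
    let hex = id.strip_prefix("resp_").ok_or("missing resp_ prefix")?;
    if hex.len() == 32 && hex.chars().all(|c| c.is_ascii_hexdigit()) {
        Ok(hex)
    } else {
        Err("malformed thread UUID")
    }
}
```

Because encoding is pure string formatting, `previous_response_id` can be turned back into a thread UUID without any database round-trip.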

Files changed

  • New: src/channels/web/responses_api.rs — types, handlers, event mapping, 14 unit tests
  • src/channels/web/mod.rs — module declaration
  • src/channels/web/server.rs — route registration

Test plan

  • Unit tests: ID round-trip, input extraction, thread filtering, accumulator (response, chunks, tools, errors, approval)
  • cargo clippy --all --all-features — zero warnings
  • cargo test responses_api — 14/14 pass
  • Manual: POST with stream: false, verify ResponseObject shape
  • Manual: POST with stream: true, verify SSE events
  • Manual: Multi-turn with previous_response_id
  • Manual: GET /v1/responses/{id} retrieves historical response

🤖 Generated with Claude Code

Add POST /v1/responses and GET /v1/responses/{id} to the web gateway,
implementing the OpenAI Responses API. Unlike the existing Chat
Completions proxy which passes through to the raw LLM, the Responses
API routes requests through the full agent loop — giving external
clients access to tools, memory, safety, and server-side conversation
state via a standard OpenAI-compatible interface.

Key design decisions:
- Response IDs encode thread UUIDs statelessly (resp_{uuid_simple})
- previous_response_id enables multi-turn conversations
- Streaming maps AppEvent variants to Responses API SSE events
- Tool approval returns response.failed (no interactive approval flow)
- GET endpoint reconstructs ResponseObject from conversation_messages

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings March 26, 2026 00:22
@github-actions Bot added labels Mar 26, 2026: scope: channel/web (Web gateway channel), size: XL (500+ changed lines), risk: medium (Business logic, config, or moderate-risk modules), contributor: core (20+ merged PRs)
@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the gateway's capabilities by introducing new API endpoints that conform to the OpenAI Responses API. This integration allows for more sophisticated interactions, enabling external applications to leverage the full power of the agent's tools, memory, and conversational state, moving beyond simple LLM proxies to a more intelligent, stateful interaction model.

Highlights

  • New OpenAI Responses API Endpoints: Added POST /v1/responses for creating new responses and GET /v1/responses/{id} for retrieving historical responses, aligning with the OpenAI Responses API specification.
  • Full Agent Loop Integration: Unlike the existing raw LLM proxy, these new endpoints route requests through the complete agent loop, incorporating tools, memory, safety, and server-side conversation state.
  • OpenAI SDK Compatibility: External clients can now seamlessly interact with IronClaw as an agent using standard OpenAI SDKs (e.g., client.responses.create()).
  • Advanced Conversation Management: Implemented stateless response IDs (resp_{uuid_simple}) and multi-turn conversation continuity via previous_response_id.
  • Streaming and Non-Streaming Support: Supports both streaming responses using Server-Sent Events (SSE) with a background worker and non-streaming responses with a 120-second timeout.
  • Tool Approval Handling: Tool approval requests are now handled by returning a response.failed status, as the Responses API does not support interactive approval flows.


@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces a new OpenAI Responses API, including POST /v1/responses and GET /v1/responses/{id} endpoints. The API routes requests through the full agent loop, providing access to tools, memory, and conversation state. Review comments suggest removing the unused Incomplete variant from ResponseStatus, and flag call_id handling during reconstruction of historical responses: using unwrap_or("") for FunctionCall and explicitly setting an empty string for FunctionCallOutput could lead to incorrect linking and loss of conversational context.

Completed,
Failed,
Incomplete,
}
Contributor

medium

The Incomplete variant of ResponseStatus is defined but does not appear to be used anywhere in the current implementation. If this status is not intended for future use, it should be removed to avoid dead code and improve maintainability.

Member Author

Fixed — removed the Incomplete variant entirely.

Comment thread src/channels/web/responses_api.rs Outdated
Comment on lines +951 to +955
let name = call
.get("name")
.and_then(|v| v.as_str())
.unwrap_or("unknown")
.to_string();
Contributor

medium

The call_id for FunctionCall is extracted using unwrap_or(""). If the id field is missing or malformed in the stored tool_calls JSON, it will result in an empty string. This could lead to difficulties in correctly linking FunctionCall and FunctionCallOutput items when reconstructing historical responses, potentially breaking the logical flow of the conversation. Consider handling this case more robustly, perhaps by logging a warning or returning an error if a valid call_id is critical.

References
  1. Using unwrap_or("") for a critical field like call_id is analogous to using a default value for a critical field. This can lead to silent failures and incorrect linking, as it prevents deserialization from failing when critical data is missing, which is generally safer.

Member Author

Fixed — now uses a fallback chain: call_id → tool_call_id → id, matching the actual storage format.

Comment thread src/channels/web/responses_api.rs Outdated
// Tool results reference a call_id via the name field pattern.
output.push(ResponseOutputItem::FunctionCallOutput {
id: make_item_id(),
call_id: String::new(),
Contributor

medium

When reconstructing FunctionCallOutput from stored messages, the call_id is explicitly set to String::new(). This means that these outputs will not be correctly linked to their corresponding FunctionCall items. This can lead to a loss of conversational context and make it harder to understand the flow of tool executions when retrieving historical responses. If the call_id can be derived from the msg.name field as suggested by the comment, it should be used here.

References
  1. Explicitly setting call_id to String::new() for FunctionCallOutput can lead to incorrect attribution and a loss of conversational context, similar to the issues that arise from incorrect tool call attribution. It's crucial to ensure call_ids are correctly linked to maintain the flow of tool executions.

Member Author

Fixed — the tool role reconstruction now looks up the preceding FunctionCall item's call_id to correlate them properly.

Copilot AI left a comment

Pull request overview

Adds OpenAI-compatible Responses API endpoints to the web gateway, routing requests through IronClaw’s full agent loop (tools/memory/safety/state) instead of the existing raw LLM proxy.

Changes:

  • Register POST /v1/responses and GET /v1/responses/{id} routes on the gateway server.
  • Introduce responses_api.rs with request/response DTOs, thread/response ID encoding, SSE streaming mapper, and a non-streaming event accumulator.
  • Export the new module from src/channels/web/mod.rs.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 8 comments.

File Description
src/channels/web/server.rs Adds routing for the new Responses API endpoints.
src/channels/web/mod.rs Exposes the new responses_api module.
src/channels/web/responses_api.rs Implements handlers, SSE streaming worker, event→Responses mapping, GET reconstruction, and unit tests.


Comment thread src/channels/web/responses_api.rs Outdated
Comment on lines +946 to +976
// Tool calls are stored as JSON arrays.
if let Ok(calls) =
serde_json::from_str::<Vec<serde_json::Value>>(&msg.content)
{
for call in calls {
let name = call
.get("name")
.and_then(|v| v.as_str())
.unwrap_or("unknown")
.to_string();
let call_id = call
.get("id")
.and_then(|v| v.as_str())
.unwrap_or("")
.to_string();
let arguments = call
.get("arguments")
.map(|v| {
if v.is_string() {
v.as_str().unwrap_or("{}").to_string()
} else {
serde_json::to_string(v).unwrap_or_default()
}
})
.unwrap_or_default();
output.push(ResponseOutputItem::FunctionCall {
id: make_item_id(),
call_id,
name,
arguments,
});
Copilot AI Mar 26, 2026

Tool-call rows are no longer always stored as a plain JSON array; the persistence layer can store an object like { "calls": [...], "narrative": "..." } (with legacy array support). This code only attempts Vec<Value> parsing and will silently drop tool calls for the wrapped format. Also, persisted tool-call IDs use call_id (and sometimes tool_call_id), but this reads id, so call_id will often be empty. Update parsing to handle both shapes and use the correct keys.

Suggested change
// Tool calls are stored as JSON arrays.
if let Ok(calls) =
serde_json::from_str::<Vec<serde_json::Value>>(&msg.content)
{
for call in calls {
let name = call
.get("name")
.and_then(|v| v.as_str())
.unwrap_or("unknown")
.to_string();
let call_id = call
.get("id")
.and_then(|v| v.as_str())
.unwrap_or("")
.to_string();
let arguments = call
.get("arguments")
.map(|v| {
if v.is_string() {
v.as_str().unwrap_or("{}").to_string()
} else {
serde_json::to_string(v).unwrap_or_default()
}
})
.unwrap_or_default();
output.push(ResponseOutputItem::FunctionCall {
id: make_item_id(),
call_id,
name,
arguments,
});
// Tool calls may be stored as a raw JSON array (legacy) or as an
// object wrapper like `{ "calls": [...], "narrative": "..." }`.
if let Ok(value) =
serde_json::from_str::<serde_json::Value>(&msg.content)
{
// Helper to push function-call items from a set of call objects.
let mut push_calls = |calls: &[serde_json::Value]| {
for call in calls {
let name = call
.get("name")
.and_then(|v| v.as_str())
.unwrap_or("unknown")
.to_string();
// Prefer `call_id` (or `tool_call_id`) and fall back to `id`.
let call_id = call
.get("call_id")
.or_else(|| call.get("tool_call_id"))
.or_else(|| call.get("id"))
.and_then(|v| v.as_str())
.unwrap_or("")
.to_string();
let arguments = call
.get("arguments")
.map(|v| {
if v.is_string() {
v.as_str().unwrap_or("{}").to_string()
} else {
serde_json::to_string(v).unwrap_or_default()
}
})
.unwrap_or_default();
output.push(ResponseOutputItem::FunctionCall {
id: make_item_id(),
call_id,
name,
arguments,
});
}
};
match value {
serde_json::Value::Array(ref calls) => {
// Legacy format: top-level array of calls.
push_calls(calls);
}
serde_json::Value::Object(ref obj) => {
// New format: { "calls": [...], "narrative": "..." }.
if let Some(calls_val) = obj.get("calls").and_then(|v| v.as_array()) {
push_calls(calls_val);
}
if let Some(narrative) =
obj.get("narrative").and_then(|v| v.as_str())
{
if !narrative.is_empty() {
output.push(ResponseOutputItem::Message {
id: format!("msg_{}", msg.id.simple()),
role: "assistant".to_string(),
content: vec![MessageContent::OutputText {
text: narrative.to_string(),
}],
});
}
}
}
_ => {
// Unsupported shape; ignore to match previous behavior.
}

Copilot uses AI. Check for mistakes.
Member Author

Fixed — parsing now handles both the legacy plain array format and the { "calls": [...], "narrative": "..." } object wrapper. Also switched key lookup to call_id / tool_call_id / id fallback chain, and reads parameters alongside arguments.
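The key-fallback part of this fix can be illustrated with a std-only sketch (the real code walks a serde_json::Value parsed from the stored tool_calls JSON; the HashMap here just stands in for a JSON object): prefer "call_id", then "tool_call_id", then "id", and only then give up.

```rust
use std::collections::HashMap;

// Illustrative stand-in for the stored tool-call object; the function name
// and map type are assumptions for the sketch, not the PR's actual code.
fn extract_call_id(call: &HashMap<&str, &str>) -> String {
    call.get("call_id")
        .or_else(|| call.get("tool_call_id"))
        .or_else(|| call.get("id"))
        .map(|s| s.to_string())
        .unwrap_or_default()
}
```

Chaining `Option::or_else` keeps the precedence explicit and avoids the silent empty-string default the reviewers flagged whenever any of the known keys is present.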

Comment thread src/channels/web/responses_api.rs Outdated
Comment on lines +543 to +560
// Resolve or create thread.
let thread_uuid = match &req.previous_response_id {
Some(prev_id) => decode_response_id(prev_id).map_err(|e| {
api_error(StatusCode::BAD_REQUEST, e, "invalid_request_error")
})?,
None => Uuid::new_v4(),
};
let thread_id_str = thread_uuid.to_string();

// Build the message for the agent loop.
let msg = IncomingMessage::new("gateway", &user.user_id, &content)
.with_thread(&thread_id_str)
.with_metadata(serde_json::json!({
"thread_id": &thread_id_str,
"source": "responses_api",
}));

let resp_id = encode_response_id(&thread_uuid);
Copilot AI Mar 26, 2026

When previous_response_id is None, this handler mints a fresh UUID and always sets it as thread_id. For the gateway channel, UUID-shaped thread IDs are expected to already exist/be owned (see agent hydration logic), otherwise the agent loop rejects the message as an invalid/unauthorized thread. Fix by ensuring the conversation row exists (e.g., store.ensure_conversation(thread_uuid, "gateway", user_id, None) and optionally set thread metadata) before sending the message, or avoid setting a UUID thread_id until after a thread is created via the existing new-thread flow.

Suggested change
// Resolve or create thread.
let thread_uuid = match &req.previous_response_id {
Some(prev_id) => decode_response_id(prev_id).map_err(|e| {
api_error(StatusCode::BAD_REQUEST, e, "invalid_request_error")
})?,
None => Uuid::new_v4(),
};
let thread_id_str = thread_uuid.to_string();
// Build the message for the agent loop.
let msg = IncomingMessage::new("gateway", &user.user_id, &content)
.with_thread(&thread_id_str)
.with_metadata(serde_json::json!({
"thread_id": &thread_id_str,
"source": "responses_api",
}));
let resp_id = encode_response_id(&thread_uuid);
// Resolve or reuse an existing thread.
let thread_uuid_opt = match &req.previous_response_id {
Some(prev_id) => Some(decode_response_id(prev_id).map_err(|e| {
api_error(StatusCode::BAD_REQUEST, e, "invalid_request_error")
})?),
None => None,
};
let thread_id_str = thread_uuid_opt
.as_ref()
.map(|u| u.to_string())
.unwrap_or_else(|| "".to_string());
// Build the message for the agent loop.
let mut msg = IncomingMessage::new("gateway", &user.user_id, &content)
.with_metadata(serde_json::json!({
"thread_id": if thread_id_str.is_empty() {
serde_json::Value::Null
} else {
serde_json::Value::String(thread_id_str.clone())
},
"source": "responses_api",
}));
if let Some(ref thread_uuid) = thread_uuid_opt {
msg = msg.with_thread(&thread_uuid.to_string());
}
// Each response gets its own ID, independent of any thread UUID.
let resp_uuid = Uuid::new_v4();
let resp_id = encode_response_id(&resp_uuid);

Member Author

Fixed — response IDs now encode both a per-call response_uuid and the thread_uuid (resp_{resp_hex}{thread_hex}). Each POST produces a unique response ID. previous_response_id decodes the thread UUID for conversation continuity.
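The revised scheme can be sketched as follows (helper names are illustrative): concatenate two 32-char simple-hex UUIDs, so every POST mints a unique response ID while the thread UUID stays recoverable for previous_response_id.

```rust
// Hedged sketch of resp_{resp_hex}{thread_hex}; names are assumptions.
fn encode_composite_id(resp_hex: &str, thread_hex: &str) -> String {
    format!("resp_{resp_hex}{thread_hex}")
}

fn decode_composite_id(id: &str) -> Result<(&str, &str), &'static str> {
    let hex = id.strip_prefix("resp_").ok_or("missing resp_ prefix")?;
    if hex.len() != 64 || !hex.chars().all(|c| c.is_ascii_hexdigit()) {
        return Err("malformed response ID");
    }
    // First 32 hex chars: per-call response UUID; last 32: thread UUID.
    Ok(hex.split_at(32))
}
```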

let msg = IncomingMessage::new("gateway", &user.user_id, &content)
.with_thread(&thread_id_str)
.with_metadata(serde_json::json!({
"thread_id": &thread_id_str,
Copilot AI Mar 26, 2026

IncomingMessage metadata here does not include user_id. Gateway send_status() scopes events to a user only when metadata.user_id is present; otherwise it broadcasts globally. Because this endpoint relies on StreamChunk/tool status events, omitting user_id will leak those events to all connected users (and also deliver them to other users’ subscribe_raw(Some(..)) subscribers as global events). Include user_id in metadata (and keep thread_id) so status events are user-scoped.

Suggested change
"thread_id": &thread_id_str,
"thread_id": &thread_id_str,
"user_id": &user.user_id,

Member Author

Fixed — added "user_id" to the IncomingMessage metadata so SSE events are properly user-scoped.

Comment on lines +504 to +517
fn finish(self) -> ResponseObject {
ResponseObject {
id: self.resp_id,
object: "response",
created_at: unix_timestamp(),
model: self.model,
status: if self.failed {
ResponseStatus::Failed
} else {
ResponseStatus::Completed
},
output: self.output,
usage: self.usage,
}
Copilot AI Mar 26, 2026

ResponseAccumulator tracks error_message, but it is never surfaced in ResponseObject (and finish() ignores it). This makes failed/time-out responses indistinguishable from other failures and drops useful diagnostics. Either remove error_message entirely or add an error/last_error field to ResponseObject (matching the Responses API shape) and populate it from accumulator state.

Member Author

Fixed — added error: Option<ResponseError> field to ResponseObject (with skip_serializing_if = "Option::is_none"). The accumulator's error_message is now surfaced in the final response.

Comment on lines +504 to +517
fn finish(self) -> ResponseObject {
ResponseObject {
id: self.resp_id,
object: "response",
created_at: unix_timestamp(),
model: self.model,
status: if self.failed {
ResponseStatus::Failed
} else {
ResponseStatus::Completed
},
output: self.output,
usage: self.usage,
}
Copilot AI Mar 26, 2026

created_at is recomputed in finish() via unix_timestamp(), so the timestamp in response.created (streaming) and the final response.completed/failed object can differ. Responses API clients typically expect created_at to remain stable for a given response ID. Consider storing created_at once in the accumulator (or passing it in) and reusing it for all emitted ResponseObjects.

Member Author

Fixed — created_at is now captured once in the accumulator constructor and reused for all emitted ResponseObjects (streaming and final).
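A minimal sketch of this fix (field and method names are hypothetical): capture created_at once at construction and reuse it for every emitted object, so response.created and the final response.completed/failed object carry the same timestamp.

```rust
use std::time::{SystemTime, UNIX_EPOCH};

// Sketch only; the real accumulator carries many more fields.
struct ResponseAccumulator {
    created_at: u64,
}

impl ResponseAccumulator {
    fn new() -> Self {
        let created_at = SystemTime::now()
            .duration_since(UNIX_EPOCH)
            .map(|d| d.as_secs())
            .unwrap_or(0);
        Self { created_at }
    }

    // Every emitted ResponseObject reads this stored value; nothing
    // recomputes the clock at finish() time.
    fn created_at(&self) -> u64 {
        self.created_at
    }
}
```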

Comment on lines +675 to +685
fn emit(
tx: &tokio::sync::mpsc::Sender<Event>,
evt_type: &str,
payload: &impl Serialize,
) -> bool {
if let Ok(data) = serde_json::to_string(payload) {
tx.try_send(sse_event(evt_type, &data)).is_ok()
} else {
true // serialization failure is non-fatal; keep going
}
}
Copilot AI Mar 26, 2026

The streaming worker uses try_send() and mostly ignores the return value from emit(). If the client disconnects (channel closed) or the buffer is full, the task will keep running until it hits a terminal event or the 120s timeout, wasting resources and potentially dropping important terminal events. Consider treating any failed send as a signal to stop the worker (or switch to send().await with backpressure).

Member Author

Acknowledged — keeping the current try_send behavior for now since the SSE stream will naturally close when the client disconnects (the Receiver drop triggers cleanup). The 120s timeout caps resource usage. Will revisit if this causes issues in practice.
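The trade-off under discussion can be shown with a std-only sketch (the real worker uses a tokio mpsc sender; names here are illustrative): a full buffer drops the event but keeps the worker alive, while a disconnected receiver is treated as "client gone, stop the worker" — the behavior Copilot suggested.

```rust
use std::sync::mpsc::{SyncSender, TrySendError};

// Sketch of the suggested alternative, not the PR's current behavior.
fn emit(tx: &SyncSender<String>, data: String) -> bool {
    match tx.try_send(data) {
        Ok(()) => true,
        Err(TrySendError::Full(_)) => true,          // drop event, keep going
        Err(TrySendError::Disconnected(_)) => false, // stop the worker
    }
}
```

The PR instead keeps sending until a terminal event or the 120s timeout, relying on Receiver drop for cleanup.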

Comment thread src/channels/web/responses_api.rs Outdated
Comment on lines +771 to +795
AppEvent::ToolCompleted { name, parameters, .. } => {
if let Some(args) = parameters {
for item in acc.output.iter_mut().rev() {
if let ResponseOutputItem::FunctionCall {
name: n,
arguments: a,
..
} = item
&& *n == *name
&& a.is_empty()
{
*a = args.clone();
break;
}
}
}
if let Some(idx) = current_tool_index.take()
&& let Some(item) = acc.output.get(idx)
{
emit(&tx, "response.output_item.done", &ResponseStreamEvent::OutputItemDone {
output_index: idx,
item: item.clone(),
});
}
}
Copilot AI Mar 26, 2026

In streaming mode, AppEvent::ToolCompleted is treated as “done” for the function_call item, but success/error are ignored. Non-streaming mode emits a function_call_output with an error on tool failure; streaming mode currently drops that information. Handle success == false by emitting a function_call_output item (or failing the response) so clients see tool failures consistently across streaming and non-streaming.

Member Author

Fixed — streaming ToolCompleted now emits a FunctionCallOutput with the error when success == false, matching the non-streaming behavior.

Comment on lines +893 to +920
pub async fn get_response_handler(
State(state): State<Arc<GatewayState>>,
super::auth::AuthenticatedUser(_user): super::auth::AuthenticatedUser,
Path(id): Path<String>,
) -> Result<Json<ResponseObject>, ApiError> {
let thread_uuid = decode_response_id(&id).map_err(|e| {
api_error(StatusCode::BAD_REQUEST, e, "invalid_request_error")
})?;

let store = state.store.as_ref().ok_or_else(|| {
api_error(
StatusCode::SERVICE_UNAVAILABLE,
"Database not configured",
"server_error",
)
})?;

// Load messages for this conversation.
let messages = store
.list_conversation_messages(thread_uuid)
.await
.map_err(|e| {
api_error(
StatusCode::INTERNAL_SERVER_ERROR,
format!("Failed to load conversation: {e}"),
"server_error",
)
})?;
Copilot AI Mar 26, 2026

GET /v1/responses/{id} loads conversation messages by UUID without verifying that the authenticated user owns that conversation. A guessed/obtained response ID would allow cross-user data access. Add an ownership check (e.g., store.conversation_belongs_to_user(thread_uuid, user_id)) and return 404/403 when not owned before calling list_conversation_messages.

Member Author

Fixed — added conversation_belongs_to_user() check before loading messages. Returns 404 (not 403) to avoid leaking existence of other users' conversations.
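The 404-over-403 choice reduces to a small decision rule (names hypothetical): both "no such conversation" and "owned by someone else" map to 404, so a probing client cannot distinguish the two cases and response-ID existence is not leaked.

```rust
// Sketch of the status decision only; the real handler then loads messages.
fn get_response_status(owner: Option<&str>, requester: &str) -> u16 {
    match owner {
        Some(o) if o == requester => 200, // owned: proceed to load messages
        _ => 404,                         // missing or foreign: same answer
    }
}
```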

zmanian

This comment was marked as outdated.

Collaborator

@serrrfirat serrrfirat left a comment


Review: feat(gateway): add OpenAI Responses API endpoints

I agree with @zmanian's verdict — this won't compile against the current codebase. The compilation blockers (AppEvent vs SseEvent, subscribe_raw/RateLimiter::check signature mismatches, missing AuthenticatedUser extractor) are clear showstoppers. Below I focus on issues not yet raised by other reviewers or where I can add deeper analysis.


HIGH — H1: Response ID = Thread UUID creates a 1:1 coupling that breaks multi-turn

encode_response_id wraps the thread UUID, meaning response_id == thread_id. In OpenAI's Responses API, each POST /v1/responses returns a new response ID even within the same conversation. Here, calling POST /v1/responses with previous_response_id decodes back to the same thread UUID and re-encodes it — so response_id never changes across turns. This means:

  • GET /v1/responses/{id} always returns the entire conversation, not a single response
  • Clients expecting per-turn responses will see all messages from all turns
  • The previous_response_id field becomes meaningless (it decodes to the same thread)

Fix: Response IDs should be separate from thread IDs. Consider storing a per-response marker (e.g., message ID or sequence number) alongside the thread UUID in the encoded ID, or generate a separate response UUID and persist its mapping to the thread + message range.

HIGH — H2: Non-streaming handler holds SSE subscription for up to 120s

handle_non_streaming subscribes to the SSE broadcast and blocks for up to RESPONSE_TIMEOUT (120s). This holds one of the 100 total SSE connection slots (the same pool used by browser clients and WebSocket connections). A few concurrent non-streaming API calls could exhaust the connection pool, blocking the web UI.

Fix: Either use a separate event channel for API consumers that doesn't count against the SSE connection limit, or increase the limit, or document this constraint prominently.

MEDIUM — M1: tools, tool_choice, and instructions fields silently ignored

ResponsesRequest accepts tools, tool_choice, and instructions but create_response_handler never passes them to the agent loop. This is a silent contract violation — clients sending tool definitions or system prompt overrides will believe they're being used. instructions is particularly important since it's the system prompt override in the Responses API, and silently dropping it will cause hard-to-debug behavior.

Fix: Either reject requests containing these fields with a clear error (400 + "not yet supported"), or pass instructions through as a system message prepended to the content.
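The "reject with a clear error" option can be sketched like this (struct and field names are assumptions, not the PR's exact types): fail fast on fields the agent loop does not yet honor instead of silently dropping them.

```rust
// Hypothetical trimmed-down request type for the sketch.
struct ResponsesRequest {
    tools: Option<Vec<String>>,
    tool_choice: Option<String>,
    instructions: Option<String>,
}

// Collect every unsupported field so the 400 error names them all at once.
fn validate(req: &ResponsesRequest) -> Result<(), String> {
    let mut unsupported = Vec::new();
    if req.tools.is_some() {
        unsupported.push("tools");
    }
    if req.tool_choice.is_some() {
        unsupported.push("tool_choice");
    }
    if req.instructions.is_some() {
        unsupported.push("instructions");
    }
    if unsupported.is_empty() {
        Ok(())
    } else {
        Err(format!("unsupported fields: {}", unsupported.join(", ")))
    }
}
```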

MEDIUM — M2: Streaming emit() silently swallows serialization failures and disconnects

The inner emit() function returns true (keep going) when serialization fails. Combined with try_send (which also silently drops on buffer full), the streaming path has no mechanism to signal data loss to the client. If the client disconnects, the worker runs until a terminal event or the 120s timeout, wasting resources.

LOW — L1: last_call_id_for returns empty string on miss

If ToolResult arrives before ToolStarted (out-of-order events), the output will have call_id: "" that can never be correlated. This affects both the live accumulator and the GET handler reconstruction paths.

LOW — L2: Incomplete status variant is dead code

ResponseStatus::Incomplete is defined but never set. Add a TODO or remove it.

LOW — L3: GET handler hardcodes model: "default"

The model used for the original request isn't persisted, so reconstruction always returns "default". OpenAI clients may key behavior on the model field.

LOW — L4: No DELETE /v1/responses/{id} endpoint

The OpenAI Responses API includes a DELETE endpoint. Fine to skip in v1, but worth documenting as a known gap.


Verdict

The compilation blockers identified by @zmanian are the primary blocker. Beyond those, the response_id == thread_id coupling (H1) is an architectural issue that will be harder to fix later, and the SSE connection pool exhaustion risk (H2) could impact production stability. I'd recommend addressing H1 and H2 alongside the compilation fixes before merge.

@ilblackdragon
Member Author

Thanks @zmanian and @serrrfirat for the thorough reviews. Addressed all feedback:

Re: compilation blockers (zmanian #1-4) — After investigation, the code was actually targeting the correct types (AppEvent, subscribe_raw(Option<String>), AuthenticatedUser, PerUserRateLimiter::check(&str)) — these match the current codebase. The CI failures were purely formatting.

Fixes in this update:

| Issue | Fix |
| --- | --- |
| H1: response_id == thread_id (serrrfirat) | Response IDs now encode both a per-call response_uuid and the thread_uuid. Each POST produces a unique ID. previous_response_id correctly decodes the thread UUID for continuity. |
| H5/H6/M1: instructions/tools/tool_choice silently ignored | Now returns 400 with a clear error if any of these unsupported fields are present |
| M2: Streaming emit() swallows errors (serrrfirat) | Acknowledged — keeping try_send for now; 120s timeout + Receiver drop caps resource usage |
| Tool call parsing (Copilot) | Handles both legacy array and {"calls":[...]} object wrapper. Uses call_id/tool_call_id/id key fallback chain |
| Missing user_id in metadata (Copilot) | Added to metadata for user-scoped SSE events |
| GET ownership check (Copilot) | Added conversation_belongs_to_user() before loading messages; returns 404 |
| created_at recomputed (Copilot) | Stored once in accumulator, reused for all emitted ResponseObjects |
| error_message never surfaced (Copilot) | Added error: Option<ResponseError> field to ResponseObject |
| Streaming tool failures (Copilot) | Streaming ToolCompleted now emits FunctionCallOutput with error on success == false |
| Incomplete dead variant (Gemini) | Removed |
| tool role empty call_id (Gemini/Copilot) | GET reconstruction correlates tool results to preceding FunctionCall |

Known gaps (v1): DELETE /v1/responses/{id} not implemented; GET returns full conversation rather than per-turn scoped output (would need a responses table for proper per-turn scoping).
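The dual-encoded ID scheme described for H1 can be sketched as follows, modeling simple-format UUIDs as 32-char hex strings so the example stays self-contained (the real code presumably uses uuid::Uuid):

```rust
// Sketch: a per-call response UUID concatenated with the thread UUID,
// both in simple (32 hex chars) form, behind a "resp_" prefix.
fn encode_response_id(response_hex: &str, thread_hex: &str) -> String {
    format!("resp_{response_hex}{thread_hex}")
}

fn decode_thread_uuid(response_id: &str) -> Option<&str> {
    let hex = response_id.strip_prefix("resp_")?;
    if hex.len() != 64 || !hex.chars().all(|c| c.is_ascii_hexdigit()) {
        return None;
    }
    Some(&hex[32..]) // the thread UUID is the second half
}
```

Each POST mints a fresh response half while the thread half stays stable, so IDs are unique per call yet previous_response_id still recovers the thread.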

zmanian

This comment was marked as outdated.

Collaborator

@zmanian zmanian left a comment


Correction & Re-review: OpenAI Responses API endpoints

Mea culpa on the compilation blockers. My previous two reviews incorrectly claimed AppEvent, AuthenticatedUser, subscribe_raw(user_id), and PerUserRateLimiter::check(user_id) don't exist. They do -- they were added by recently merged PRs (#1614 TenantCtx, #1615 AppEvent extraction, #1626 user management). I verified: the branch compiles cleanly, clippy reports zero warnings, and all 14 tests pass. I apologize for the noise.

Actual issues in the code (fresh review of the real implementation):

High

  1. response_id is 1:1 with thread_id: encode_response_id does format!("resp_{}", thread_id.simple()). Every POST to the same thread returns the same response ID. OpenAI's API returns a unique response ID per call. This means clients can't distinguish between different responses on the same thread, and the GET endpoint returns the entire conversation history rather than a specific response.

  2. GET endpoint has no ownership check (get_response_handler): _user is extracted but unused (underscore prefix). list_conversation_messages(thread_uuid) loads any conversation without verifying it belongs to the authenticated user. Cross-user data access via guessed UUID.

  3. instructions, tools, tool_choice silently ignored: These fields are accepted and deserialized but never validated, rejected, or forwarded. Callers who set instructions (system prompt override) will get unexpected behavior with no error.

Medium

  1. GET always returns Completed status (line 1000): Regardless of whether the conversation is still in progress, failed, or has a pending approval.

  2. GET tool role messages have empty call_id (line 984): call_id: String::new() hardcoded. Tool results don't correlate back to their function calls.

  3. GET only parses legacy Vec<Value> for tool_calls: The persistence layer can store wrapped format { "calls": [...], "narrative": "..." } (from #1513). This parsing only tries Vec<Value>, silently dropping tool calls in the wrapped format.

  4. created_at computed fresh in finish() (line 508): Each call to unix_timestamp() gets a different value. The response created_at should be captured once when the response is created, not when it completes.

  5. error_message accumulated but not surfaced: ResponseObject has no error field. The accumulator tracks errors but finish() doesn't include them in the output. Failed responses are indistinguishable from successful empty ones (except by status).

  6. Streaming worker ignores client disconnect: emit() uses try_send() and continues on failure. If the client disconnects, the worker runs until timeout (120s), wasting resources.

Low

  1. Incomplete variant is dead code (line 116): Defined but never constructed. Will cause clippy warnings under zero-warnings policy.

  2. GET hardcodes model: "default" and usage: ResponseUsage::default(): No model or token info from persisted data.

  3. Streaming ToolCompleted doesn't handle success == false: Non-streaming path correctly emits FunctionCallOutput with error. Streaming path only updates arguments and emits output_item.done, dropping failure info.

What's good

  • Clean architecture: subscribing to SSE broadcast, filtering by thread, accumulating into OpenAI format
  • Proper timeout handling in both streaming and non-streaming paths
  • ApprovalNeeded correctly treated as failure with descriptive message
  • Good test coverage for pure logic (14 tests passing)
  • Streaming SSE format matches OpenAI's response.created -> response.output_text.delta -> response.completed lifecycle
  • Rate limiting and auth properly applied

Verdict

The code compiles and the design is sound. Items 1-3 (response ID uniqueness, ownership check, silent field ignoring) should be fixed before merge. The rest can be follow-ups.

Contributor

@pranavraja99 pranavraja99 left a comment


Left a few comments below.

.with_metadata(serde_json::json!({
"thread_id": &thread_id_str,
"source": "responses_api",
}));
Contributor


model is echoed in the response but never forwarded to the agent loop — IncomingMessage has no model field. Abound will think model selection works but the server always uses its configured LLM.

Member Author


Fixed in 08ab35e — non-"default" model values are now rejected with 400. IncomingMessage has no model field so we can't forward it; rejecting explicitly is the honest contract until model selection is wired through.

#[serde(default)]
pub stream: Option<bool>,
#[serde(default)]
pub temperature: Option<f32>,
Contributor


temperature and max_output_tokens are also silently ignored, same as instructions/tools/tool_choice. Reject all unsupported fields with 400 for consistency.

Member Author


Fixed in 08ab35e — temperature and max_output_tokens now also return 400, consistent with instructions/tools/tool_choice.

},
output: self.output,
usage: self.usage,
}
Contributor


error_message is accumulated but finish() drops it — ResponseObject has no error field. Failed responses give Abound no diagnostic info.

Member Author


Fixed in 08ab35e — added error: Option<ResponseError> to ResponseObject, populated from accumulator error_message in finish().
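For reference, the shape being described, as a minimal sketch (names follow the discussion; the real structs carry more fields plus serde attributes, and the "agent_error" code value here is hypothetical):

```rust
// Sketch: surface the accumulated error_message on the response object
// so failed responses are distinguishable from successful empty ones.
struct ResponseError {
    code: String,
    message: String,
}

struct ResponseObject {
    status: &'static str,
    error: Option<ResponseError>,
    // ...other fields elided
}

fn finish(error_message: Option<String>) -> ResponseObject {
    let failed = error_message.is_some();
    ResponseObject {
        status: if failed { "failed" } else { "completed" },
        error: error_message.map(|message| ResponseError {
            code: "agent_error".to_string(), // hypothetical code value
            message,
        }),
    }
}
```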

@ilblackdragon ilblackdragon changed the base branch from main to staging March 26, 2026 19:44
- Decouple response ID from thread ID: encode both a per-call
  response_uuid and the thread_uuid so each POST produces a unique ID
- Reject unsupported fields (instructions, tools, tool_choice,
  temperature, max_output_tokens, non-default model) with 400
- Add user_id to IncomingMessage metadata for user-scoped SSE events
- Add conversation_belongs_to_user() ownership check on GET endpoint
- Fix tool call parsing: handle both legacy array and object wrapper
  format; use call_id/tool_call_id/id key fallback chain
- Correlate tool role messages to preceding FunctionCall call_id
- Stabilize created_at (capture once in accumulator, reuse everywhere)
- Surface error_message via new ResponseObject.error field
- Handle streaming tool failures (emit FunctionCallOutput on error)
- Remove dead Incomplete status variant
- Fix formatting (cargo fmt)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@ilblackdragon
Member Author

@zmanian Thanks for the thorough correction and fresh re-review. All items from your actual-issues list are addressed in 08ab35e:

| # | Issue | Status |
| --- | --- | --- |
| 1 | response_id 1:1 with thread_id | Fixed — encodes both response_uuid + thread_uuid per POST |
| 2 | GET no ownership check | Fixed — conversation_belongs_to_user() before loading messages |
| 3 | instructions/tools/tool_choice silently ignored | Fixed — returns 400 (also temperature, max_output_tokens, non-default model) |
| 4 | GET always returns Completed | Known gap — status not persisted per-response |
| 5 | tool role empty call_id | Fixed — correlates to preceding FunctionCall |
| 6 | Tool calls: only array format | Fixed — handles both array and {"calls":[...]} wrapper |
| 7 | created_at recomputed | Fixed — stored once in accumulator |
| 8 | error_message not surfaced | Fixed — new error field on ResponseObject |
| 9 | Streaming emit() swallows disconnect | Acknowledged — keeping try_send + 120s timeout for v1 |
| 10 | Incomplete dead variant | Fixed — removed |
| 11 | GET hardcodes model/usage | Known gap — not persisted |
| 12 | Streaming tool failure not handled | Fixed — emits FunctionCallOutput on success == false |

@ilblackdragon
Member Author

@serrrfirat All actionable items from your review are addressed in 08ab35e:

  • H1 (response_id == thread_id): Fixed — encode_response_id now takes both a per-call response_uuid and the thread_uuid, producing unique IDs per POST. New test response_ids_differ_across_turns verifies.
  • H2 (SSE slot exhaustion): Acknowledged as a v1 limitation — requires a separate event channel architecture, out of scope for this PR.
  • M1 (instructions/tools/tool_choice silently ignored): Fixed — all unsupported fields (including temperature, max_output_tokens, non-default model) now return 400.
  • M2 (streaming emit swallows disconnects): Acknowledged — keeping try_send + 120s timeout for v1.
  • L1 (last_call_id_for empty on miss): Accepted — out-of-order events aren't expected from the agent loop.
  • L2 (Incomplete dead variant): Fixed — removed.
  • L3 (GET hardcodes model): Known gap — model not persisted per-response.
  • L4 (no DELETE endpoint): Known v1 gap.

Re-requesting your review.

Collaborator

@zmanian zmanian left a comment


Re-review: OpenAI Responses API (post 08ab35e)

9 of 11 findings fully fixed. Nice work.

| # | Finding | Status |
| --- | --- | --- |
| 1 | response_id uniqueness | Fixed -- UUID per call, dual-encoded |
| 2 | GET ownership check | Fixed -- conversation_belongs_to_user |
| 3 | Unsupported fields rejected | Fixed -- 400 errors for instructions/tools/tool_choice |
| 4 | GET always returns Completed | Not fixed -- hardcoded line 1214. Add a TODO. |
| 5 | GET empty call_id | Fixed -- fallback chain |
| 6 | Legacy-only tool_calls parsing | Fixed -- handles wrapped format |
| 7 | created_at drift | Fixed -- captured once at construction |
| 8 | error_message not surfaced | Fixed -- ResponseError on ResponseObject |
| 9 | Streaming disconnect handling | Partially fixed -- initial emit only; main loop still runs on disconnect |
| 10 | Incomplete dead code | Fixed -- removed |
| 11 | Streaming tool failure | Fixed -- emits FunctionCallOutput with error |

Remaining items are medium/low and can be follow-ups. Approve.

@ilblackdragon ilblackdragon merged commit 9c5ba43 into staging Mar 27, 2026
14 checks passed
@ilblackdragon ilblackdragon deleted the feat/responses-api branch March 27, 2026 07:00
DougAnderson444 pushed a commit to DougAnderson444/ironclaw that referenced this pull request Mar 29, 2026
* feat(gateway): add OpenAI Responses API endpoints

Add POST /v1/responses and GET /v1/responses/{id} to the web gateway,
implementing the OpenAI Responses API. Unlike the existing Chat
Completions proxy which passes through to the raw LLM, the Responses
API routes requests through the full agent loop — giving external
clients access to tools, memory, safety, and server-side conversation
state via a standard OpenAI-compatible interface.

Key design decisions:
- Response IDs encode thread UUIDs statelessly (resp_{uuid_simple})
- previous_response_id enables multi-turn conversations
- Streaming maps AppEvent variants to Responses API SSE events
- Tool approval returns response.failed (no interactive approval flow)
- GET endpoint reconstructs ResponseObject from conversation_messages

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(responses-api): address all review feedback on PR nearai#1656

- Decouple response ID from thread ID: encode both a per-call
  response_uuid and the thread_uuid so each POST produces a unique ID
- Reject unsupported fields (instructions, tools, tool_choice,
  temperature, max_output_tokens, non-default model) with 400
- Add user_id to IncomingMessage metadata for user-scoped SSE events
- Add conversation_belongs_to_user() ownership check on GET endpoint
- Fix tool call parsing: handle both legacy array and object wrapper
  format; use call_id/tool_call_id/id key fallback chain
- Correlate tool role messages to preceding FunctionCall call_id
- Stabilize created_at (capture once in accumulator, reuse everywhere)
- Surface error_message via new ResponseObject.error field
- Handle streaming tool failures (emit FunctionCallOutput on error)
- Remove dead Incomplete status variant
- Fix formatting (cargo fmt)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
drchirag1991 pushed a commit to drchirag1991/ironclaw that referenced this pull request Apr 8, 2026

Labels

- contributor: core — 20+ merged PRs
- risk: medium — Business logic, config, or moderate-risk modules
- scope: channel/web — Web gateway channel
- size: XL — 500+ changed lines
