
feat(gateway): add OpenAI Responses API endpoints#1656

Merged
ilblackdragon merged 2 commits into staging from feat/responses-api
Mar 27, 2026

Conversation

@ilblackdragon
Member

Summary

  • Add POST /v1/responses and GET /v1/responses/{id} to the web gateway implementing the OpenAI Responses API
  • Unlike the existing /v1/chat/completions (raw LLM proxy), this routes through the full agent loop — tools, memory, safety, and server-side conversation state
  • External clients can use standard OpenAI SDKs (client.responses.create()) to interact with IronClaw as an agent

Design decisions

  • Response IDs: resp_{uuid_simple} — stateless encoding of thread UUIDs, no lookup table
  • Multi-turn: previous_response_id decoded to thread UUID, enabling conversation continuity
  • Streaming: Background worker subscribes to AppEvent SSE broadcast, filters by thread, maps to Responses API event types (response.created, response.output_text.delta, response.output_item.added/done, response.completed)
  • Non-streaming: Subscribes before sending to agent, collects events with 120s timeout
  • Tool approval: Returns response.failed since Responses API has no interactive approval flow
  • GET endpoint: Reconstructs ResponseObject from conversation_messages in DB
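The stateless ID scheme in the first bullet can be sketched as follows (helper names are illustrative, not the PR's exact code): a response ID is "resp_" plus the thread UUID in simple, dash-free hex, so the thread is recoverable from the ID with no lookup table.

```rust
// Hedged sketch of the resp_{uuid_simple} scheme; function names are
// assumptions, not the PR's actual identifiers.
fn encode_response_id(thread_uuid_simple: &str) -> String {
    format!("resp_{thread_uuid_simple}")
}

fn decode_response_id(id: &str) -> Result<&str, &'static str> {
    // Reject anything that is not "resp_" followed by 32 hex characters.
    let hex = id.strip_prefix("resp_").ok_or("missing resp_ prefix")?;
    if hex.len() == 32 && hex.chars().all(|c| c.is_ascii_hexdigit()) {
        Ok(hex)
    } else {
        Err("malformed thread UUID")
    }
}
```

Because encoding is pure string formatting, `previous_response_id` can be turned back into a thread UUID without any database round-trip.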

Files changed

  • New: src/channels/web/responses_api.rs — types, handlers, event mapping, 14 unit tests
  • src/channels/web/mod.rs — module declaration
  • src/channels/web/server.rs — route registration

Test plan

  • Unit tests: ID round-trip, input extraction, thread filtering, accumulator (response, chunks, tools, errors, approval)
  • cargo clippy --all --all-features — zero warnings
  • cargo test responses_api — 14/14 pass
  • Manual: POST with stream: false, verify ResponseObject shape
  • Manual: POST with stream: true, verify SSE events
  • Manual: Multi-turn with previous_response_id
  • Manual: GET /v1/responses/{id} retrieves historical response

🤖 Generated with Claude Code

Add POST /v1/responses and GET /v1/responses/{id} to the web gateway,
implementing the OpenAI Responses API. Unlike the existing Chat
Completions proxy which passes through to the raw LLM, the Responses
API routes requests through the full agent loop — giving external
clients access to tools, memory, safety, and server-side conversation
state via a standard OpenAI-compatible interface.

Key design decisions:
- Response IDs encode thread UUIDs statelessly (resp_{uuid_simple})
- previous_response_id enables multi-turn conversations
- Streaming maps AppEvent variants to Responses API SSE events
- Tool approval returns response.failed (no interactive approval flow)
- GET endpoint reconstructs ResponseObject from conversation_messages

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings March 26, 2026 00:22
@github-actions Bot added labels Mar 26, 2026: scope: channel/web (Web gateway channel), size: XL (500+ changed lines), risk: medium (Business logic, config, or moderate-risk modules), contributor: core (20+ merged PRs)
@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the gateway's capabilities by introducing new API endpoints that conform to the OpenAI Responses API. This integration allows for more sophisticated interactions, enabling external applications to leverage the full power of the agent's tools, memory, and conversational state, moving beyond simple LLM proxies to a more intelligent, stateful interaction model.

Highlights

  • New OpenAI Responses API Endpoints: Added POST /v1/responses for creating new responses and GET /v1/responses/{id} for retrieving historical responses, aligning with the OpenAI Responses API specification.
  • Full Agent Loop Integration: Unlike the existing raw LLM proxy, these new endpoints route requests through the complete agent loop, incorporating tools, memory, safety, and server-side conversation state.
  • OpenAI SDK Compatibility: External clients can now seamlessly interact with IronClaw as an agent using standard OpenAI SDKs (e.g., client.responses.create()).
  • Advanced Conversation Management: Implemented stateless response IDs (resp_{uuid_simple}) and multi-turn conversation continuity via previous_response_id.
  • Streaming and Non-Streaming Support: Supports both streaming responses using Server-Sent Events (SSE) with a background worker and non-streaming responses with a 120-second timeout.
  • Tool Approval Handling: Tool approval requests are now handled by returning a response.failed status, as the Responses API does not support interactive approval flows.


@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces a new OpenAI Responses API, including POST /v1/responses and GET /v1/responses/{id} endpoints. The API routes requests through the full agent loop, providing access to tools, memory, and conversation state. Review comments suggest removing the unused Incomplete variant from ResponseStatus, and flag call_id handling during reconstruction of historical responses: using unwrap_or("") for FunctionCall and explicitly setting an empty string for FunctionCallOutput could lead to incorrect linking and loss of conversational context.

Completed,
Failed,
Incomplete,
}
Contributor

medium

The Incomplete variant of ResponseStatus is defined but does not appear to be used anywhere in the current implementation. If this status is not intended for future use, it should be removed to avoid dead code and improve maintainability.

Member Author

Fixed — removed the Incomplete variant entirely.

Comment thread src/channels/web/responses_api.rs Outdated
Comment on lines +951 to +955
let name = call
.get("name")
.and_then(|v| v.as_str())
.unwrap_or("unknown")
.to_string();
Contributor

medium

The call_id for FunctionCall is extracted using unwrap_or(""). If the id field is missing or malformed in the stored tool_calls JSON, it will result in an empty string. This could lead to difficulties in correctly linking FunctionCall and FunctionCallOutput items when reconstructing historical responses, potentially breaking the logical flow of the conversation. Consider handling this case more robustly, perhaps by logging a warning or returning an error if a valid call_id is critical.

References
  1. Using unwrap_or("") for a critical field like call_id is analogous to using a default value for a critical field. This can lead to silent failures and incorrect linking, as it prevents deserialization from failing when critical data is missing, which is generally safer.

Member Author

Fixed — now uses a fallback chain: call_id → tool_call_id → id, matching the actual storage format.

Comment thread src/channels/web/responses_api.rs Outdated
// Tool results reference a call_id via the name field pattern.
output.push(ResponseOutputItem::FunctionCallOutput {
id: make_item_id(),
call_id: String::new(),
Contributor

medium

When reconstructing FunctionCallOutput from stored messages, the call_id is explicitly set to String::new(). This means that these outputs will not be correctly linked to their corresponding FunctionCall items. This can lead to a loss of conversational context and make it harder to understand the flow of tool executions when retrieving historical responses. If the call_id can be derived from the msg.name field as suggested by the comment, it should be used here.

References
  1. Explicitly setting call_id to String::new() for FunctionCallOutput can lead to incorrect attribution and a loss of conversational context, similar to the issues that arise from incorrect tool call attribution. It's crucial to ensure call_ids are correctly linked to maintain the flow of tool executions.

Member Author

Fixed — the tool role reconstruction now looks up the preceding FunctionCall item's call_id to correlate them properly.

Copilot AI left a comment

Pull request overview

Adds OpenAI-compatible Responses API endpoints to the web gateway, routing requests through IronClaw’s full agent loop (tools/memory/safety/state) instead of the existing raw LLM proxy.

Changes:

  • Register POST /v1/responses and GET /v1/responses/{id} routes on the gateway server.
  • Introduce responses_api.rs with request/response DTOs, thread/response ID encoding, SSE streaming mapper, and a non-streaming event accumulator.
  • Export the new module from src/channels/web/mod.rs.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 8 comments.

File Description
src/channels/web/server.rs Adds routing for the new Responses API endpoints.
src/channels/web/mod.rs Exposes the new responses_api module.
src/channels/web/responses_api.rs Implements handlers, SSE streaming worker, event→Responses mapping, GET reconstruction, and unit tests.


Comment thread src/channels/web/responses_api.rs Outdated
Comment on lines +946 to +976
// Tool calls are stored as JSON arrays.
if let Ok(calls) =
serde_json::from_str::<Vec<serde_json::Value>>(&msg.content)
{
for call in calls {
let name = call
.get("name")
.and_then(|v| v.as_str())
.unwrap_or("unknown")
.to_string();
let call_id = call
.get("id")
.and_then(|v| v.as_str())
.unwrap_or("")
.to_string();
let arguments = call
.get("arguments")
.map(|v| {
if v.is_string() {
v.as_str().unwrap_or("{}").to_string()
} else {
serde_json::to_string(v).unwrap_or_default()
}
})
.unwrap_or_default();
output.push(ResponseOutputItem::FunctionCall {
id: make_item_id(),
call_id,
name,
arguments,
});
Copilot AI Mar 26, 2026

Tool-call rows are no longer always stored as a plain JSON array; the persistence layer can store an object like { "calls": [...], "narrative": "..." } (with legacy array support). This code only attempts Vec<Value> parsing and will silently drop tool calls for the wrapped format. Also, persisted tool-call IDs use call_id (and sometimes tool_call_id), but this reads id, so call_id will often be empty. Update parsing to handle both shapes and use the correct keys.

Suggested change
// Tool calls are stored as JSON arrays.
if let Ok(calls) =
serde_json::from_str::<Vec<serde_json::Value>>(&msg.content)
{
for call in calls {
let name = call
.get("name")
.and_then(|v| v.as_str())
.unwrap_or("unknown")
.to_string();
let call_id = call
.get("id")
.and_then(|v| v.as_str())
.unwrap_or("")
.to_string();
let arguments = call
.get("arguments")
.map(|v| {
if v.is_string() {
v.as_str().unwrap_or("{}").to_string()
} else {
serde_json::to_string(v).unwrap_or_default()
}
})
.unwrap_or_default();
output.push(ResponseOutputItem::FunctionCall {
id: make_item_id(),
call_id,
name,
arguments,
});
// Tool calls may be stored as a raw JSON array (legacy) or as an
// object wrapper like `{ "calls": [...], "narrative": "..." }`.
if let Ok(value) =
serde_json::from_str::<serde_json::Value>(&msg.content)
{
// Helper to push function-call items from a set of call objects.
let mut push_calls = |calls: &[serde_json::Value]| {
for call in calls {
let name = call
.get("name")
.and_then(|v| v.as_str())
.unwrap_or("unknown")
.to_string();
// Prefer `call_id` (or `tool_call_id`) and fall back to `id`.
let call_id = call
.get("call_id")
.or_else(|| call.get("tool_call_id"))
.or_else(|| call.get("id"))
.and_then(|v| v.as_str())
.unwrap_or("")
.to_string();
let arguments = call
.get("arguments")
.map(|v| {
if v.is_string() {
v.as_str().unwrap_or("{}").to_string()
} else {
serde_json::to_string(v).unwrap_or_default()
}
})
.unwrap_or_default();
output.push(ResponseOutputItem::FunctionCall {
id: make_item_id(),
call_id,
name,
arguments,
});
}
};
match value {
serde_json::Value::Array(ref calls) => {
// Legacy format: top-level array of calls.
push_calls(calls);
}
serde_json::Value::Object(ref obj) => {
// New format: { "calls": [...], "narrative": "..." }.
if let Some(calls_val) = obj.get("calls").and_then(|v| v.as_array()) {
push_calls(calls_val);
}
if let Some(narrative) =
obj.get("narrative").and_then(|v| v.as_str())
{
if !narrative.is_empty() {
output.push(ResponseOutputItem::Message {
id: format!("msg_{}", msg.id.simple()),
role: "assistant".to_string(),
content: vec![MessageContent::OutputText {
text: narrative.to_string(),
}],
});
}
}
}
_ => {
// Unsupported shape; ignore to match previous behavior.
}

Copilot uses AI. Check for mistakes.
Member Author

Fixed — parsing now handles both the legacy plain array format and the { "calls": [...], "narrative": "..." } object wrapper. Also switched key lookup to call_id / tool_call_id / id fallback chain, and reads parameters alongside arguments.
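The key-fallback part of this fix can be illustrated with a std-only sketch (the real code walks a serde_json::Value parsed from the stored tool_calls JSON; the HashMap here just stands in for a JSON object): prefer "call_id", then "tool_call_id", then "id", and only then give up.

```rust
use std::collections::HashMap;

// Illustrative stand-in for the stored tool-call object; the function name
// and map type are assumptions for the sketch, not the PR's actual code.
fn extract_call_id(call: &HashMap<&str, &str>) -> String {
    call.get("call_id")
        .or_else(|| call.get("tool_call_id"))
        .or_else(|| call.get("id"))
        .map(|s| s.to_string())
        .unwrap_or_default()
}
```

Chaining `Option::or_else` keeps the precedence explicit and avoids the silent empty-string default the reviewers flagged whenever any of the known keys is present.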

Comment thread src/channels/web/responses_api.rs Outdated
Comment on lines +543 to +560
// Resolve or create thread.
let thread_uuid = match &req.previous_response_id {
Some(prev_id) => decode_response_id(prev_id).map_err(|e| {
api_error(StatusCode::BAD_REQUEST, e, "invalid_request_error")
})?,
None => Uuid::new_v4(),
};
let thread_id_str = thread_uuid.to_string();

// Build the message for the agent loop.
let msg = IncomingMessage::new("gateway", &user.user_id, &content)
.with_thread(&thread_id_str)
.with_metadata(serde_json::json!({
"thread_id": &thread_id_str,
"source": "responses_api",
}));

let resp_id = encode_response_id(&thread_uuid);
Copilot AI Mar 26, 2026

When previous_response_id is None, this handler mints a fresh UUID and always sets it as thread_id. For the gateway channel, UUID-shaped thread IDs are expected to already exist/be owned (see agent hydration logic), otherwise the agent loop rejects the message as an invalid/unauthorized thread. Fix by ensuring the conversation row exists (e.g., store.ensure_conversation(thread_uuid, "gateway", user_id, None) and optionally set thread metadata) before sending the message, or avoid setting a UUID thread_id until after a thread is created via the existing new-thread flow.

Suggested change
// Resolve or create thread.
let thread_uuid = match &req.previous_response_id {
Some(prev_id) => decode_response_id(prev_id).map_err(|e| {
api_error(StatusCode::BAD_REQUEST, e, "invalid_request_error")
})?,
None => Uuid::new_v4(),
};
let thread_id_str = thread_uuid.to_string();
// Build the message for the agent loop.
let msg = IncomingMessage::new("gateway", &user.user_id, &content)
.with_thread(&thread_id_str)
.with_metadata(serde_json::json!({
"thread_id": &thread_id_str,
"source": "responses_api",
}));
let resp_id = encode_response_id(&thread_uuid);
// Resolve or reuse an existing thread.
let thread_uuid_opt = match &req.previous_response_id {
Some(prev_id) => Some(decode_response_id(prev_id).map_err(|e| {
api_error(StatusCode::BAD_REQUEST, e, "invalid_request_error")
})?),
None => None,
};
let thread_id_str = thread_uuid_opt
.as_ref()
.map(|u| u.to_string())
.unwrap_or_else(|| "".to_string());
// Build the message for the agent loop.
let mut msg = IncomingMessage::new("gateway", &user.user_id, &content)
.with_metadata(serde_json::json!({
"thread_id": if thread_id_str.is_empty() {
serde_json::Value::Null
} else {
serde_json::Value::String(thread_id_str.clone())
},
"source": "responses_api",
}));
if let Some(ref thread_uuid) = thread_uuid_opt {
msg = msg.with_thread(&thread_uuid.to_string());
}
// Each response gets its own ID, independent of any thread UUID.
let resp_uuid = Uuid::new_v4();
let resp_id = encode_response_id(&resp_uuid);

Member Author

Fixed — response IDs now encode both a per-call response_uuid and the thread_uuid (resp_{resp_hex}{thread_hex}). Each POST produces a unique response ID. previous_response_id decodes the thread UUID for conversation continuity.
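The revised scheme can be sketched as follows (helper names are illustrative): concatenate two 32-char simple-hex UUIDs, so every POST mints a unique response ID while the thread UUID stays recoverable for previous_response_id.

```rust
// Hedged sketch of resp_{resp_hex}{thread_hex}; names are assumptions.
fn encode_composite_id(resp_hex: &str, thread_hex: &str) -> String {
    format!("resp_{resp_hex}{thread_hex}")
}

fn decode_composite_id(id: &str) -> Result<(&str, &str), &'static str> {
    let hex = id.strip_prefix("resp_").ok_or("missing resp_ prefix")?;
    if hex.len() != 64 || !hex.chars().all(|c| c.is_ascii_hexdigit()) {
        return Err("malformed response ID");
    }
    // First 32 hex chars: per-call response UUID; last 32: thread UUID.
    Ok(hex.split_at(32))
}
```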

let msg = IncomingMessage::new("gateway", &user.user_id, &content)
.with_thread(&thread_id_str)
.with_metadata(serde_json::json!({
"thread_id": &thread_id_str,
Copilot AI Mar 26, 2026

IncomingMessage metadata here does not include user_id. Gateway send_status() scopes events to a user only when metadata.user_id is present; otherwise it broadcasts globally. Because this endpoint relies on StreamChunk/tool status events, omitting user_id will leak those events to all connected users (and also deliver them to other users’ subscribe_raw(Some(..)) subscribers as global events). Include user_id in metadata (and keep thread_id) so status events are user-scoped.

Suggested change
"thread_id": &thread_id_str,
"thread_id": &thread_id_str,
"user_id": &user.user_id,

Member Author

Fixed — added "user_id" to the IncomingMessage metadata so SSE events are properly user-scoped.

Comment on lines +504 to +517
fn finish(self) -> ResponseObject {
ResponseObject {
id: self.resp_id,
object: "response",
created_at: unix_timestamp(),
model: self.model,
status: if self.failed {
ResponseStatus::Failed
} else {
ResponseStatus::Completed
},
output: self.output,
usage: self.usage,
}
Copilot AI Mar 26, 2026

ResponseAccumulator tracks error_message, but it is never surfaced in ResponseObject (and finish() ignores it). This makes failed/time-out responses indistinguishable from other failures and drops useful diagnostics. Either remove error_message entirely or add an error/last_error field to ResponseObject (matching the Responses API shape) and populate it from accumulator state.

Member Author

Fixed — added error: Option<ResponseError> field to ResponseObject (with skip_serializing_if = "Option::is_none"). The accumulator's error_message is now surfaced in the final response.

Comment on lines +504 to +517
fn finish(self) -> ResponseObject {
ResponseObject {
id: self.resp_id,
object: "response",
created_at: unix_timestamp(),
model: self.model,
status: if self.failed {
ResponseStatus::Failed
} else {
ResponseStatus::Completed
},
output: self.output,
usage: self.usage,
}
Copilot AI Mar 26, 2026

created_at is recomputed in finish() via unix_timestamp(), so the timestamp in response.created (streaming) and the final response.completed/failed object can differ. Responses API clients typically expect created_at to remain stable for a given response ID. Consider storing created_at once in the accumulator (or passing it in) and reusing it for all emitted ResponseObjects.

Member Author

Fixed — created_at is now captured once in the accumulator constructor and reused for all emitted ResponseObjects (streaming and final).
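A minimal sketch of this fix (field and method names are hypothetical): capture created_at once at construction and reuse it for every emitted object, so response.created and the final response.completed/failed object carry the same timestamp.

```rust
use std::time::{SystemTime, UNIX_EPOCH};

// Sketch only; the real accumulator carries many more fields.
struct ResponseAccumulator {
    created_at: u64,
}

impl ResponseAccumulator {
    fn new() -> Self {
        let created_at = SystemTime::now()
            .duration_since(UNIX_EPOCH)
            .map(|d| d.as_secs())
            .unwrap_or(0);
        Self { created_at }
    }

    // Every emitted ResponseObject reads this stored value; nothing
    // recomputes the clock at finish() time.
    fn created_at(&self) -> u64 {
        self.created_at
    }
}
```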

Comment on lines +675 to +685
fn emit(
tx: &tokio::sync::mpsc::Sender<Event>,
evt_type: &str,
payload: &impl Serialize,
) -> bool {
if let Ok(data) = serde_json::to_string(payload) {
tx.try_send(sse_event(evt_type, &data)).is_ok()
} else {
true // serialization failure is non-fatal; keep going
}
}
Copilot AI Mar 26, 2026

The streaming worker uses try_send() and mostly ignores the return value from emit(). If the client disconnects (channel closed) or the buffer is full, the task will keep running until it hits a terminal event or the 120s timeout, wasting resources and potentially dropping important terminal events. Consider treating any failed send as a signal to stop the worker (or switch to send().await with backpressure).

Member Author

Acknowledged — keeping the current try_send behavior for now since the SSE stream will naturally close when the client disconnects (the Receiver drop triggers cleanup). The 120s timeout caps resource usage. Will revisit if this causes issues in practice.
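The trade-off under discussion can be shown with a std-only sketch (the real worker uses a tokio mpsc sender; names here are illustrative): a full buffer drops the event but keeps the worker alive, while a disconnected receiver is treated as "client gone, stop the worker" — the behavior Copilot suggested.

```rust
use std::sync::mpsc::{SyncSender, TrySendError};

// Sketch of the suggested alternative, not the PR's current behavior.
fn emit(tx: &SyncSender<String>, data: String) -> bool {
    match tx.try_send(data) {
        Ok(()) => true,
        Err(TrySendError::Full(_)) => true,          // drop event, keep going
        Err(TrySendError::Disconnected(_)) => false, // stop the worker
    }
}
```

The PR instead keeps sending until a terminal event or the 120s timeout, relying on Receiver drop for cleanup.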

Comment thread src/channels/web/responses_api.rs Outdated
Comment on lines +771 to +795
AppEvent::ToolCompleted { name, parameters, .. } => {
if let Some(args) = parameters {
for item in acc.output.iter_mut().rev() {
if let ResponseOutputItem::FunctionCall {
name: n,
arguments: a,
..
} = item
&& *n == *name
&& a.is_empty()
{
*a = args.clone();
break;
}
}
}
if let Some(idx) = current_tool_index.take()
&& let Some(item) = acc.output.get(idx)
{
emit(&tx, "response.output_item.done", &ResponseStreamEvent::OutputItemDone {
output_index: idx,
item: item.clone(),
});
}
}
Copilot AI Mar 26, 2026

In streaming mode, AppEvent::ToolCompleted is treated as “done” for the function_call item, but success/error are ignored. Non-streaming mode emits a function_call_output with an error on tool failure; streaming mode currently drops that information. Handle success == false by emitting a function_call_output item (or failing the response) so clients see tool failures consistently across streaming and non-streaming.

Member Author

Fixed — streaming ToolCompleted now emits a FunctionCallOutput with the error when success == false, matching the non-streaming behavior.

Comment on lines +893 to +920
pub async fn get_response_handler(
State(state): State<Arc<GatewayState>>,
super::auth::AuthenticatedUser(_user): super::auth::AuthenticatedUser,
Path(id): Path<String>,
) -> Result<Json<ResponseObject>, ApiError> {
let thread_uuid = decode_response_id(&id).map_err(|e| {
api_error(StatusCode::BAD_REQUEST, e, "invalid_request_error")
})?;

let store = state.store.as_ref().ok_or_else(|| {
api_error(
StatusCode::SERVICE_UNAVAILABLE,
"Database not configured",
"server_error",
)
})?;

// Load messages for this conversation.
let messages = store
.list_conversation_messages(thread_uuid)
.await
.map_err(|e| {
api_error(
StatusCode::INTERNAL_SERVER_ERROR,
format!("Failed to load conversation: {e}"),
"server_error",
)
})?;
Copilot AI Mar 26, 2026

GET /v1/responses/{id} loads conversation messages by UUID without verifying that the authenticated user owns that conversation. A guessed/obtained response ID would allow cross-user data access. Add an ownership check (e.g., store.conversation_belongs_to_user(thread_uuid, user_id)) and return 404/403 when not owned before calling list_conversation_messages.

Member Author

Fixed — added conversation_belongs_to_user() check before loading messages. Returns 404 (not 403) to avoid leaking existence of other users' conversations.
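The 404-over-403 choice reduces to a small decision rule (names hypothetical): both "no such conversation" and "owned by someone else" map to 404, so a probing client cannot distinguish the two cases and response-ID existence is not leaked.

```rust
// Sketch of the status decision only; the real handler then loads messages.
fn get_response_status(owner: Option<&str>, requester: &str) -> u16 {
    match owner {
        Some(o) if o == requester => 200, // owned: proceed to load messages
        _ => 404,                         // missing or foreign: same answer
    }
}
```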

zmanian

This comment was marked as outdated.

Collaborator

@serrrfirat serrrfirat left a comment


Review: feat(gateway): add OpenAI Responses API endpoints

I agree with @zmanian's verdict — this won't compile against the current codebase. The compilation blockers (AppEvent vs SseEvent, subscribe_raw/RateLimiter::check signature mismatches, missing AuthenticatedUser extractor) are clear showstoppers. Below I focus on issues not yet raised by other reviewers or where I can add deeper analysis.


HIGH — H1: Response ID = Thread UUID creates a 1:1 coupling that breaks multi-turn

encode_response_id wraps the thread UUID, meaning response_id == thread_id. In OpenAI's Responses API, each POST /v1/responses returns a new response ID even within the same conversation. Here, calling POST /v1/responses with previous_response_id decodes back to the same thread UUID and re-encodes it — so response_id never changes across turns. This means:

  • GET /v1/responses/{id} always returns the entire conversation, not a single response
  • Clients expecting per-turn responses will see all messages from all turns
  • The previous_response_id field becomes meaningless (it decodes to the same thread)

Fix: Response IDs should be separate from thread IDs. Consider storing a per-response marker (e.g., message ID or sequence number) alongside the thread UUID in the encoded ID, or generate a separate response UUID and persist its mapping to the thread + message range.

HIGH — H2: Non-streaming handler holds SSE subscription for up to 120s

handle_non_streaming subscribes to the SSE broadcast and blocks for up to RESPONSE_TIMEOUT (120s). This holds one of the 100 total SSE connection slots (the same pool used by browser clients and WebSocket connections). A few concurrent non-streaming API calls could exhaust the connection pool, blocking the web UI.

Fix: Either use a separate event channel for API consumers that doesn't count against the SSE connection limit, or increase the limit, or document this constraint prominently.

MEDIUM — M1: tools, tool_choice, and instructions fields silently ignored

ResponsesRequest accepts tools, tool_choice, and instructions but create_response_handler never passes them to the agent loop. This is a silent contract violation — clients sending tool definitions or system prompt overrides will believe they're being used. instructions is particularly important since it's the system prompt override in the Responses API, and silently dropping it will cause hard-to-debug behavior.

Fix: Either reject requests containing these fields with a clear error (400 + "not yet supported"), or pass instructions through as a system message prepended to the content.
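The "reject with a clear error" option can be sketched like this (struct and field names are assumptions, not the PR's exact types): fail fast on fields the agent loop does not yet honor instead of silently dropping them.

```rust
// Hypothetical trimmed-down request type for the sketch.
struct ResponsesRequest {
    tools: Option<Vec<String>>,
    tool_choice: Option<String>,
    instructions: Option<String>,
}

// Collect every unsupported field so the 400 error names them all at once.
fn validate(req: &ResponsesRequest) -> Result<(), String> {
    let mut unsupported = Vec::new();
    if req.tools.is_some() {
        unsupported.push("tools");
    }
    if req.tool_choice.is_some() {
        unsupported.push("tool_choice");
    }
    if req.instructions.is_some() {
        unsupported.push("instructions");
    }
    if unsupported.is_empty() {
        Ok(())
    } else {
        Err(format!("unsupported fields: {}", unsupported.join(", ")))
    }
}
```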

MEDIUM — M2: Streaming emit() silently swallows serialization failures and disconnects

The inner emit() function returns true (keep going) when serialization fails. Combined with try_send (which also silently drops on buffer full), the streaming path has no mechanism to signal data loss to the client. If the client disconnects, the worker runs until a terminal event or the 120s timeout, wasting resources.

LOW — L1: last_call_id_for returns empty string on miss

If ToolResult arrives before ToolStarted (out-of-order events), the output will have call_id: "" that can never be correlated. This affects both the live accumulator and the GET handler reconstruction paths.

LOW — L2: Incomplete status variant is dead code

ResponseStatus::Incomplete is defined but never set. Add a TODO or remove it.

LOW — L3: GET handler hardcodes model: "default"

The model used for the original request isn't persisted, so reconstruction always returns "default". OpenAI clients may key behavior on the model field.

LOW — L4: No DELETE /v1/responses/{id} endpoint

The OpenAI Responses API includes a DELETE endpoint. Fine to skip in v1, but worth documenting as a known gap.


Verdict

The compilation blockers identified by @zmanian are the primary blocker. Beyond those, the response_id == thread_id coupling (H1) is an architectural issue that will be harder to fix later, and the SSE connection pool exhaustion risk (H2) could impact production stability. I'd recommend addressing H1 and H2 alongside the compilation fixes before merge.

@ilblackdragon
Member Author

Thanks @zmanian and @serrrfirat for the thorough reviews. Addressed all feedback:

Re: compilation blockers (zmanian #1-4) — After investigation, the code was actually targeting the correct types (AppEvent, subscribe_raw(Option<String>), AuthenticatedUser, PerUserRateLimiter::check(&str)) — these match the current codebase. The CI failures were purely formatting.

Fixes in this update:

| Issue | Fix |
| --- | --- |
| H1: response_id == thread_id (serrrfirat) | Response IDs now encode both a per-call response_uuid and the thread_uuid. Each POST produces a unique ID. previous_response_id correctly decodes the thread UUID for continuity. |
| H5/H6/M1: instructions/tools/tool_choice silently ignored | Now returns 400 with a clear error if any of these unsupported fields are present |
| M2: Streaming emit() swallows errors (serrrfirat) | Acknowledged — keeping try_send for now; 120s timeout + Receiver drop caps resource usage |
| Tool call parsing (Copilot) | Handles both legacy array and {"calls":[...]} object wrapper. Uses call_id/tool_call_id/id key fallback chain |
| Missing user_id in metadata (Copilot) | Added to metadata for user-scoped SSE events |
| GET ownership check (Copilot) | Added conversation_belongs_to_user() before loading messages; returns 404 |
| created_at recomputed (Copilot) | Stored once in accumulator, reused for all emitted ResponseObjects |
| error_message never surfaced (Copilot) | Added error: Option<ResponseError> field to ResponseObject |
| Streaming tool failures (Copilot) | Streaming ToolCompleted now emits FunctionCallOutput with error on success == false |
| Incomplete dead variant (Gemini) | Removed |
| tool role empty call_id (Gemini/Copilot) | GET reconstruction correlates tool results to preceding FunctionCall |

Known gaps (v1): DELETE /v1/responses/{id} not implemented; GET returns full conversation rather than per-turn scoped output (would need a responses table for proper per-turn scoping).
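The dual-encoded ID scheme described for H1 can be sketched as follows, modeling simple-format UUIDs as 32-char hex strings so the example stays self-contained (the real code presumably uses uuid::Uuid):

```rust
// Sketch: a per-call response UUID concatenated with the thread UUID,
// both in simple (32 hex chars) form, behind a "resp_" prefix.
fn encode_response_id(response_hex: &str, thread_hex: &str) -> String {
    format!("resp_{response_hex}{thread_hex}")
}

fn decode_thread_uuid(response_id: &str) -> Option<&str> {
    let hex = response_id.strip_prefix("resp_")?;
    if hex.len() != 64 || !hex.chars().all(|c| c.is_ascii_hexdigit()) {
        return None;
    }
    Some(&hex[32..]) // the thread UUID is the second half
}
```

Each POST mints a fresh response half while the thread half stays stable, so IDs are unique per call yet previous_response_id still recovers the thread.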

zmanian

This comment was marked as outdated.

Collaborator

@zmanian zmanian left a comment


Correction & Re-review: OpenAI Responses API endpoints

Mea culpa on the compilation blockers. My previous two reviews incorrectly claimed AppEvent, AuthenticatedUser, subscribe_raw(user_id), and PerUserRateLimiter::check(user_id) don't exist. They do -- they were added by recently merged PRs (#1614 TenantCtx, #1615 AppEvent extraction, #1626 user management). I verified: the branch compiles cleanly, clippy reports zero warnings, and all 14 tests pass. I apologize for the noise.

Actual issues in the code (fresh review of the real implementation):

High

  1. response_id is 1:1 with thread_id: encode_response_id does format!("resp_{}", thread_id.simple()). Every POST to the same thread returns the same response ID. OpenAI's API returns a unique response ID per call. This means clients can't distinguish between different responses on the same thread, and the GET endpoint returns the entire conversation history rather than a specific response.

  2. GET endpoint has no ownership check (get_response_handler): _user is extracted but unused (underscore prefix). list_conversation_messages(thread_uuid) loads any conversation without verifying it belongs to the authenticated user. Cross-user data access via guessed UUID.

  3. instructions, tools, tool_choice silently ignored: These fields are accepted and deserialized but never validated, rejected, or forwarded. Callers who set instructions (system prompt override) will get unexpected behavior with no error.

Medium

  1. GET always returns Completed status (line 1000): Regardless of whether the conversation is still in progress, failed, or has a pending approval.

  2. GET tool role messages have empty call_id (line 984): call_id: String::new() hardcoded. Tool results don't correlate back to their function calls.

  3. GET only parses legacy Vec<Value> for tool_calls: The persistence layer can store wrapped format { "calls": [...], "narrative": "..." } (from #1513). This parsing only tries Vec<Value>, silently dropping tool calls in the wrapped format.

  4. created_at computed fresh in finish() (line 508): Each call to unix_timestamp() gets a different value. The response created_at should be captured once when the response is created, not when it completes.

  5. error_message accumulated but not surfaced: ResponseObject has no error field. The accumulator tracks errors but finish() doesn't include them in the output. Failed responses are indistinguishable from successful empty ones (except by status).

  6. Streaming worker ignores client disconnect: emit() uses try_send() and continues on failure. If the client disconnects, the worker runs until timeout (120s), wasting resources.

Low

  1. Incomplete variant is dead code (line 116): Defined but never constructed. Will cause clippy warnings under zero-warnings policy.

  2. GET hardcodes model: "default" and usage: ResponseUsage::default(): No model or token info from persisted data.

  3. Streaming ToolCompleted doesn't handle success == false: Non-streaming path correctly emits FunctionCallOutput with error. Streaming path only updates arguments and emits output_item.done, dropping failure info.

What's good

  • Clean architecture: subscribing to SSE broadcast, filtering by thread, accumulating into OpenAI format
  • Proper timeout handling in both streaming and non-streaming paths
  • ApprovalNeeded correctly treated as failure with descriptive message
  • Good test coverage for pure logic (14 tests passing)
  • Streaming SSE format matches OpenAI's response.created -> response.output_text.delta -> response.completed lifecycle
  • Rate limiting and auth properly applied

Verdict

The code compiles and the design is sound. Items 1-3 (response ID uniqueness, ownership check, silent field ignoring) should be fixed before merge. The rest can be follow-ups.

Contributor

@pranavraja99 pranavraja99 left a comment


Left a few comments below.

.with_metadata(serde_json::json!({
"thread_id": &thread_id_str,
"source": "responses_api",
}));
Contributor


model is echoed in the response but never forwarded to the agent loop — IncomingMessage has no model field. Abound will think model selection works but the server always uses its configured LLM.

Member Author


Fixed in 08ab35e — non-"default" model values are now rejected with 400. IncomingMessage has no model field so we can't forward it; rejecting explicitly is the honest contract until model selection is wired through.

#[serde(default)]
pub stream: Option<bool>,
#[serde(default)]
pub temperature: Option<f32>,
Contributor


temperature and max_output_tokens are also silently ignored, same as instructions/tools/tool_choice. Reject all unsupported fields with 400 for consistency.

Member Author


Fixed in 08ab35e — temperature and max_output_tokens now also return 400, consistent with instructions/tools/tool_choice.

},
output: self.output,
usage: self.usage,
}
Contributor


error_message is accumulated but finish() drops it — ResponseObject has no error field. Failed responses give Abound no diagnostic info.

Member Author


Fixed in 08ab35e — added error: Option<ResponseError> to ResponseObject, populated from accumulator error_message in finish().
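For reference, the shape being described, as a minimal sketch (names follow the discussion; the real structs carry more fields plus serde attributes, and the "agent_error" code value here is hypothetical):

```rust
// Sketch: surface the accumulated error_message on the response object
// so failed responses are distinguishable from successful empty ones.
struct ResponseError {
    code: String,
    message: String,
}

struct ResponseObject {
    status: &'static str,
    error: Option<ResponseError>,
    // ...other fields elided
}

fn finish(error_message: Option<String>) -> ResponseObject {
    let failed = error_message.is_some();
    ResponseObject {
        status: if failed { "failed" } else { "completed" },
        error: error_message.map(|message| ResponseError {
            code: "agent_error".to_string(), // hypothetical code value
            message,
        }),
    }
}
```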

@ilblackdragon ilblackdragon changed the base branch from main to staging March 26, 2026 19:44
- Decouple response ID from thread ID: encode both a per-call
  response_uuid and the thread_uuid so each POST produces a unique ID
- Reject unsupported fields (instructions, tools, tool_choice,
  temperature, max_output_tokens, non-default model) with 400
- Add user_id to IncomingMessage metadata for user-scoped SSE events
- Add conversation_belongs_to_user() ownership check on GET endpoint
- Fix tool call parsing: handle both legacy array and object wrapper
  format; use call_id/tool_call_id/id key fallback chain
- Correlate tool role messages to preceding FunctionCall call_id
- Stabilize created_at (capture once in accumulator, reuse everywhere)
- Surface error_message via new ResponseObject.error field
- Handle streaming tool failures (emit FunctionCallOutput on error)
- Remove dead Incomplete status variant
- Fix formatting (cargo fmt)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@ilblackdragon
Member Author

@zmanian Thanks for the thorough correction and fresh re-review. All items from your actual-issues list are addressed in 08ab35e:

| # | Issue | Status |
| --- | --- | --- |
| 1 | response_id 1:1 with thread_id | Fixed — encodes both response_uuid + thread_uuid per POST |
| 2 | GET no ownership check | Fixed — conversation_belongs_to_user() before loading messages |
| 3 | instructions/tools/tool_choice silently ignored | Fixed — returns 400 (also temperature, max_output_tokens, non-default model) |
| 4 | GET always returns Completed | Known gap — status not persisted per-response |
| 5 | tool role empty call_id | Fixed — correlates to preceding FunctionCall |
| 6 | Tool calls: only array format | Fixed — handles both array and {"calls":[...]} wrapper |
| 7 | created_at recomputed | Fixed — stored once in accumulator |
| 8 | error_message not surfaced | Fixed — new error field on ResponseObject |
| 9 | Streaming emit() swallows disconnect | Acknowledged — keeping try_send + 120s timeout for v1 |
| 10 | Incomplete dead variant | Fixed — removed |
| 11 | GET hardcodes model/usage | Known gap — not persisted |
| 12 | Streaming tool failure not handled | Fixed — emits FunctionCallOutput on success == false |

@ilblackdragon
Member Author

@serrrfirat All actionable items from your review are addressed in 08ab35e:

  • H1 (response_id == thread_id): Fixed — encode_response_id now takes both a per-call response_uuid and the thread_uuid, producing unique IDs per POST. New test response_ids_differ_across_turns verifies.
  • H2 (SSE slot exhaustion): Acknowledged as a v1 limitation — requires a separate event channel architecture, out of scope for this PR.
  • M1 (instructions/tools/tool_choice silently ignored): Fixed — all unsupported fields (including temperature, max_output_tokens, non-default model) now return 400.
  • M2 (streaming emit swallows disconnects): Acknowledged — keeping try_send + 120s timeout for v1.
  • L1 (last_call_id_for empty on miss): Accepted — out-of-order events aren't expected from the agent loop.
  • L2 (Incomplete dead variant): Fixed — removed.
  • L3 (GET hardcodes model): Known gap — model not persisted per-response.
  • L4 (no DELETE endpoint): Known v1 gap.

Re-requesting your review.

Collaborator

@zmanian zmanian left a comment


Re-review: OpenAI Responses API (post 08ab35e)

9 of 11 findings fully fixed. Nice work.

| # | Finding | Status |
| --- | --- | --- |
| 1 | response_id uniqueness | Fixed -- UUID per call, dual-encoded |
| 2 | GET ownership check | Fixed -- conversation_belongs_to_user |
| 3 | Unsupported fields rejected | Fixed -- 400 errors for instructions/tools/tool_choice |
| 4 | GET always returns Completed | Not fixed -- hardcoded line 1214. Add a TODO. |
| 5 | GET empty call_id | Fixed -- fallback chain |
| 6 | Legacy-only tool_calls parsing | Fixed -- handles wrapped format |
| 7 | created_at drift | Fixed -- captured once at construction |
| 8 | error_message not surfaced | Fixed -- ResponseError on ResponseObject |
| 9 | Streaming disconnect handling | Partially fixed -- initial emit only; main loop still runs on disconnect |
| 10 | Incomplete dead code | Fixed -- removed |
| 11 | Streaming tool failure | Fixed -- emits FunctionCallOutput with error |

Remaining items are medium/low and can be follow-ups. Approve.

@ilblackdragon ilblackdragon merged commit 9c5ba43 into staging Mar 27, 2026
14 checks passed
@ilblackdragon ilblackdragon deleted the feat/responses-api branch March 27, 2026 07:00
DougAnderson444 pushed a commit to DougAnderson444/ironclaw that referenced this pull request Mar 29, 2026
* feat(gateway): add OpenAI Responses API endpoints

Add POST /v1/responses and GET /v1/responses/{id} to the web gateway,
implementing the OpenAI Responses API. Unlike the existing Chat
Completions proxy which passes through to the raw LLM, the Responses
API routes requests through the full agent loop — giving external
clients access to tools, memory, safety, and server-side conversation
state via a standard OpenAI-compatible interface.

Key design decisions:
- Response IDs encode thread UUIDs statelessly (resp_{uuid_simple})
- previous_response_id enables multi-turn conversations
- Streaming maps AppEvent variants to Responses API SSE events
- Tool approval returns response.failed (no interactive approval flow)
- GET endpoint reconstructs ResponseObject from conversation_messages

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(responses-api): address all review feedback on PR nearai#1656

- Decouple response ID from thread ID: encode both a per-call
  response_uuid and the thread_uuid so each POST produces a unique ID
- Reject unsupported fields (instructions, tools, tool_choice,
  temperature, max_output_tokens, non-default model) with 400
- Add user_id to IncomingMessage metadata for user-scoped SSE events
- Add conversation_belongs_to_user() ownership check on GET endpoint
- Fix tool call parsing: handle both legacy array and object wrapper
  format; use call_id/tool_call_id/id key fallback chain
- Correlate tool role messages to preceding FunctionCall call_id
- Stabilize created_at (capture once in accumulator, reuse everywhere)
- Surface error_message via new ResponseObject.error field
- Handle streaming tool failures (emit FunctionCallOutput on error)
- Remove dead Incomplete status variant
- Fix formatting (cargo fmt)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
drchirag1991 pushed a commit to drchirag1991/ironclaw that referenced this pull request Apr 8, 2026

Labels

- contributor: core — 20+ merged PRs
- risk: medium — Business logic, config, or moderate-risk modules
- scope: channel/web — Web gateway channel
- size: XL — 500+ changed lines
