
Missing step-finish/step-start parts after retryable stream errors cause tool_use/tool_result mismatch #16749

@altendky


Summary

When the finish-step handler in processor.ts:244-288 throws (any of its async operations — Session.updatePart, Session.updateMessage, Snapshot.patch — can fail), the error is caught, and if it is deemed retryable, the continue at line 377 creates a new LLM stream. But the step-finish part for step 1 and the step-start part for step 2 were never saved, so both steps' content gets merged into one DB message with no boundary between them.

On replay, convertToModelMessages() in the AI SDK produces a single assistant block with interleaved tool_use/text/reasoning content, which the Anthropic API rejects with:

messages.N: `tool_use` ids were found without `tool_result` blocks immediately after: toolu_XXX.
Each `tool_use` block must have a corresponding `tool_result` block in the next message.

or:

messages.N.content.0.type: Expected `thinking` or `redacted_thinking`, but found `tool_use`.

Root Cause

The finish-step handler at processor.ts:244-288 performs multiple async operations:

case "finish-step":
  const usage = Session.getUsage({ ... })
  await Session.updatePart({ type: "step-finish", ... })   // can throw
  await Session.updateMessage(input.assistantMessage)       // can throw
  if (snapshot) {
    const patch = await Snapshot.patch(snapshot)             // can throw
    // ...
  }
  // ...

If any of these throw and the error is deemed retryable, the catch block at line 353 hits continue at line 377, which loops back to while(true) and creates a new LLM stream. The new stream's events are appended to the same DB message, but step 1's step-finish and step 2's step-start parts were never saved.
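The failure mode can be sketched as a minimal simulation (hypothetical shapes and names — this is not the real processor.ts code): content parts are persisted as they stream in, but the step boundary is only written if the handler finishes cleanly, and a retryable failure just re-enters the loop with the same accumulator.

```typescript
type Part = { type: string };

// Simulates stream attempts appended to the same DB message. On an
// attempt where persisting the boundary throws (as Session.updatePart
// can), the error is treated as retryable and the loop opens a new
// stream — the "step-finish" boundary is never saved.
function runWithRetry(attempts: Part[][], boundaryPersistFails: boolean[]): Part[] {
  const saved: Part[] = [];
  let attempt = 0;
  while (true) {
    try {
      for (const part of attempts[attempt]) saved.push(part); // content parts saved as they arrive
      if (boundaryPersistFails[attempt]) throw new Error("Session.updatePart failed");
      saved.push({ type: "step-finish" }); // boundary written only on clean completion
      return saved;
    } catch (err) {
      if (++attempt >= attempts.length) throw err; // out of retries
      continue; // back to while(true): new LLM stream, same DB message
    }
  }
}

const parts = runWithRetry(
  [
    [{ type: "text" }, { type: "tool" }], // attempt 1: boundary persist fails
    [{ type: "text" }, { type: "tool" }], // attempt 2 (retry): succeeds
  ],
  [true, false],
);
// parts: text, tool, text, tool, step-finish — no boundary between the two steps
```

The result matches the corruption observed in the DB: two steps' worth of parts with a single trailing step-finish and nothing separating them.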

Without step boundaries, the AI SDK's convertToModelMessages() merges all parts into a single block, producing:

assistant: [text, tool-call, text, tool-call]   ← INVALID: text after tool-call
tool:      [tool-result, tool-result]

Instead of the correct:

assistant: [text, tool-call]
tool:      [tool-result]
assistant: [text, tool-call]
tool:      [tool-result]

Secondary Root Cause: tool-error Race Condition

processor.ts:206 — the tool-error handler only processes errors when match.state.status === "running":

case "tool-error": {
  const match = toolcalls[value.toolCallId]
  if (match && match.state.status === "running") { // ← only "running"

Due to the AI SDK's merged-stream event ordering, tool-error can arrive before tool-call, when the status is still "pending". The error is silently ignored, leaving the tool in "pending" state. It's later cleaned up as "Tool execution aborted" with empty input {} by the post-stream cleanup at lines 401-417.
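The race and the proposed relaxation can be sketched as follows (hypothetical state shapes mirroring the guard at processor.ts:206 — a sketch, not the actual handler):

```typescript
type ToolState = { status: "pending" | "running" | "completed" | "error"; error?: string };

// With acceptPending=false (current behavior), a tool-error arriving
// while the call is still "pending" is silently dropped; with
// acceptPending=true it is recorded.
function onToolError(
  toolcalls: Record<string, { state: ToolState }>,
  toolCallId: string,
  error: string,
  acceptPending: boolean,
): void {
  const match = toolcalls[toolCallId];
  if (!match) return;
  const accepted = acceptPending
    ? match.state.status === "running" || match.state.status === "pending"
    : match.state.status === "running"; // current strict guard
  if (!accepted) return; // error silently dropped
  match.state = { status: "error", error };
}

const calls: Record<string, { state: ToolState }> = {
  toolu_1: { state: { status: "pending" } },
};
onToolError(calls, "toolu_1", "boom", false); // strict: still "pending", later "aborted"
onToolError(calls, "toolu_1", "boom", true);  // relaxed: recorded as "error"
```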

This was independently discovered by a user in #10616 (comment) who wrote:

"the tool-error handler only processes errors for tools in 'running' status. If the SDK emits a tool-error for a tool that's still in 'pending' status (because tool-call was never processed), the error is silently ignored."

Not Tool-Specific

This is a pipeline bug, not a tool bug. I've observed it with:

  • MCP tools (custom api_search tool)
  • Built-in tools (the write tool)

Real-World Evidence

Session ses_32fb35486ffeeJAHmplKU1gB2t, message msg_cd05ba534001gICo48Lsy1NHWp (from pre-repair DB backup):

SELECT p.id, p.time_created, json_extract(p.data, '$.type') as type,
       json_extract(p.data, '$.tool') as tool,
       json_extract(p.data, '$.state.status') as status,
       json_extract(p.data, '$.state.error') as error
FROM part p WHERE p.message_id = 'msg_cd05ba534001gICo48Lsy1NHWp'
ORDER BY p.time_created;
part_id                          | time_created  | type        | tool  | status    | error
---------------------------------+---------------+-------------+-------+-----------+------------------------
prt_cd05bb9ac001brzJbfx6NPVO2y   | 1773022198188 | step-start  |       |           |
prt_cd05bb9ad001pzM736ephha8OT   | 1773022198189 | text        |       |           |
prt_cd05bb9f0001N3qbpvXSA0NBGs   | 1773022198257 | tool        | write | error     | Tool execution aborted
                                                                                      ← 96 SECOND GAP
prt_cd05d3273001z4y25K6X1Q3Piz   | 1773022294644 | text        |       |           |
prt_cd05d35a8001jOK62EPx3KVVEd   | 1773022295465 | tool        | write | completed |
prt_cd05f3c5d001QVGr7VZTzuN4Gf   | 1773022428254 | step-finish |       |           |

Key observations:

  • The errored write tool has input: {} — the tool-error event was dropped because the tool was still "pending" when it arrived
  • There's a 96-second gap between the errored tool and the next text — this is when the retry created a new stream
  • No step-finish / step-start boundary between the two groups
  • The errored tool's time_updated (1773022428264) is 10ms after step-finish (1773022428254) — confirming the post-stream cleanup ran after the stream ended

Reproduction Test

A failing test is provided in the companion PR. It constructs a WithParts[] with parts from two merged steps:

step-start → text → tool(error) → [no boundary] → text → tool(completed)

The test runs these parts through MessageV2.toModelMessages() and asserts the structural invariant: no text or reasoning part may appear after a tool-call part within the same assistant ModelMessage.

Currently fails:

error: Invalid interleaving: found "text" part after "tool-call" in the same assistant message.
Content types in this message: [text, tool-call, text, tool-call]
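The invariant the test asserts can be sketched like this (hypothetical content-part shape, not the AI SDK's actual ModelMessage type):

```typescript
type ContentPart = { type: "text" | "reasoning" | "tool-call" };

// Returns null if the assistant message content is well-ordered, or an
// error string if a text/reasoning part follows a tool-call part.
function checkInterleaving(content: ContentPart[]): string | null {
  let sawToolCall = false;
  for (const part of content) {
    if (part.type === "tool-call") sawToolCall = true;
    else if (sawToolCall && (part.type === "text" || part.type === "reasoning"))
      return `Invalid interleaving: found "${part.type}" part after "tool-call"`;
  }
  return null;
}

checkInterleaving([{ type: "text" }, { type: "tool-call" }]);
// → null: valid (the Anthropic API accepts trailing tool_use)
checkInterleaving([{ type: "text" }, { type: "tool-call" }, { type: "text" }]);
// → error string: the shape the merged steps currently produce
```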

Suggested Fixes

  1. Reconstruction-time fix (most important — handles already-corrupted data): In toModelMessages() or normalizeMessages(), detect when a text/reasoning part appears after a tool-call part in the same assistant block, and inject a synthetic step-start boundary to force the AI SDK to split the content into separate blocks.

  2. tool-error race fix: Accept tool-error when status === "pending" in addition to "running" at processor.ts:206.

  3. finish-step hardening: Wrap individual operations in the finish-step handler so partial failures don't lose the step boundary.
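Fix 1 could look roughly like the following (a sketch with hypothetical part shapes — the real part types in MessageV2 are richer): scan the assistant message's parts and inject a synthetic step-start wherever a text/reasoning part follows a tool part with no boundary in between, so the AI SDK splits the content into separate blocks.

```typescript
type Part = { type: "step-start" | "step-finish" | "text" | "reasoning" | "tool" };

// Repairs a corrupted part sequence by inserting synthetic step-start
// boundaries before any text/reasoning part that follows a tool part.
function injectBoundaries(parts: Part[]): Part[] {
  const out: Part[] = [];
  let sawTool = false;
  for (const part of parts) {
    if (part.type === "step-start" || part.type === "step-finish") sawTool = false;
    else if (part.type === "tool") sawTool = true;
    else if (sawTool && (part.type === "text" || part.type === "reasoning")) {
      out.push({ type: "step-start" }); // synthetic boundary
      sawTool = false;
    }
    out.push(part);
  }
  return out;
}

const repaired = injectBoundaries([
  { type: "step-start" }, { type: "text" }, { type: "tool" },
  { type: "text" }, { type: "tool" }, // corrupted: no boundary before this group
]);
// → step-start, text, tool, step-start, text, tool
```

Because this runs at reconstruction time, it also repairs messages that were already corrupted before the stream-side fixes land.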

Related Issues

Environment

  • Provider: Anthropic (direct API)
  • Model: claude-opus-4-6 with adaptive thinking
  • OS: Linux (Ubuntu 22.04)
  • OpenCode version: dev build (latest dev branch)

Labels

core — Anything pertaining to core functionality of the application (opencode server stuff)
