Session becomes unresumable after JSONL writer drops assistant entry during parallel tool calls #31328
Closed as not planned
Labels: area:core, bug, has repro, platform:macos, stale
Description
Summary
A session becomes permanently stuck with API error 400 when the JSONL writer drops an assistant message during parallel tool call execution. The dropped entry creates an orphaned tool_result that breaks the parent chain, making the session unresumable from both CLI (claude --resume) and Claude for Mac.
Reproduction
- Run a session with multiple concurrent subagents (3+ parallel tool calls)
- One assistant message containing a tool_use block is never written to the JSONL
- The corresponding tool_result (user entry) IS written, referencing a tool_use_id with no matching tool_use
- Close and resume the session
- Every subsequent API call fails with:
400 {"type":"error","error":{"type":"invalid_request_error",
"message":"messages.0.content.0: unexpected tool_use_id found in tool_result blocks: toolu_XXXX.
Each tool_result block must have a corresponding tool_use block in the previous message."}}
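The orphan condition behind this error can be checked offline. The following is a minimal sketch, not the CLI's own validation: it assumes each JSONL line is an object whose message.content holds the API-style blocks (that wrapper layout is an assumption), and find_orphaned_tool_results is a hypothetical helper name. It only checks that a matching tool_use appeared somewhere earlier in the file, which is looser than the API's "previous message" rule.

```python
import json
from typing import Iterable


def find_orphaned_tool_results(lines: Iterable[str]) -> list[str]:
    """Return tool_use_ids referenced by a tool_result with no earlier tool_use.

    Assumed layout (not confirmed by the report): one JSON object per line,
    with the API message stored under the "message" key.
    """
    seen_tool_use_ids: set[str] = set()
    orphans: list[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        entry = json.loads(line)
        message = entry.get("message")
        content = message.get("content") if isinstance(message, dict) else None
        if not isinstance(content, list):
            continue  # plain-string content carries no tool blocks
        for block in content:
            if not isinstance(block, dict):
                continue
            if block.get("type") == "tool_use":
                seen_tool_use_ids.add(block.get("id"))
            elif block.get("type") == "tool_result":
                tool_use_id = block.get("tool_use_id")
                if tool_use_id not in seen_tool_use_ids:
                    orphans.append(tool_use_id)
    return orphans
```

Passing an open file handle works, since the function only iterates over lines: `find_orphaned_tool_results(open(path, encoding="utf-8"))`.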
Root Cause
The JSONL writer is not atomic with respect to multi-tool-use responses when concurrent agent instances write to the same session file. The assistant message (containing tool_use) is lost, but the subsequent user message (containing tool_result) is persisted.
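The writer's code is not shown in this report, so the following is only a sketch of one way several processes could append to a shared JSONL file without interleaving or losing entries: take an exclusive advisory lock, then write the whole group of entries in one locked section. append_entries is a hypothetical helper, and fcntl.flock is Unix-only (which matches the macOS platform reported here).

```python
import fcntl
import json
import os


def append_entries(path: str, entries: list[dict]) -> None:
    """Append a group of entries to a JSONL file inside one exclusive lock.

    Hypothetical sketch: entries that must stay together (for example an
    assistant entry and its tool_result entry) are passed in one call.
    """
    payload = "".join(json.dumps(e, separators=(",", ":")) + "\n" for e in entries)
    data = payload.encode("utf-8")
    fd = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)  # blocks until other writers release
        view = memoryview(data)
        while view:
            written = os.write(fd, view)  # may be a partial write
            view = view[written:]
        os.fsync(fd)  # flush to disk before releasing the lock
    finally:
        fcntl.flock(fd, fcntl.LOCK_UN)
        os.close(fd)
```

The lock is advisory, so it only helps if every writer to the session file goes through the same path.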
Evidence from affected session
- Session: 45MB, 5064 entries, 15 compactions
- 11 orphaned tool_result entries found (parent assistant messages missing)
- 10 were in dead sidechains (no user impact)
- 1 landed in the active parent chain → session permanently broken
- Debug log shows 3 concurrent agent instances (cc_version hashes: .7e0, .3ae, .1a9) writing simultaneously at the time of corruption
- The dropped message was a git commit tool_use: the Bash hook fired and completed, but the assistant entry was never written
Additional issue: resume hangs on large sessions
After manually repairing the parent chain (removing the orphaned entry, re-parenting descendants), both claude --resume <id> and Claude for Mac fail to load the session:
- CLI hangs indefinitely (no debug log output, no API call made)
- Claude for Mac removes the session from the sidebar entirely
- The JSONL parses correctly in <0.2s with Python, no cycles, no duplicate UUIDs
- Session size: 45MB / 5060 entries — large but not unreasonable for a multi-day session
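The structural checks mentioned above (parses cleanly, no cycles, no duplicate UUIDs) can be sketched roughly as follows. This is not the script used for the report; it assumes each entry carries uuid and parentUuid fields, and check_session is a hypothetical name.

```python
import json
from collections import Counter
from typing import Iterable


def check_session(lines: Iterable[str]) -> dict:
    """Parse a JSONL session and report duplicate UUIDs and parent-chain cycles.

    Assumed fields (not confirmed by the report): "uuid" and "parentUuid".
    """
    entries = [json.loads(line) for line in lines if line.strip()]
    uuids = [e["uuid"] for e in entries if "uuid" in e]
    duplicates = sorted(u for u, n in Counter(uuids).items() if n > 1)
    parent_of = {e["uuid"]: e.get("parentUuid") for e in entries if "uuid" in e}

    has_cycle = False
    cleared: set = set()  # nodes already known to reach a root
    for start in parent_of:
        path: set = set()
        node = start
        while node is not None and node in parent_of and node not in cleared:
            if node in path:  # revisited a node on the current walk
                has_cycle = True
                break
            path.add(node)
            node = parent_of[node]
        if has_cycle:
            break
        cleared |= path
    return {
        "entries": len(entries),
        "duplicate_uuids": duplicates,
        "has_cycle": has_cycle,
    }
```

Each node is walked at most once thanks to the cleared set, so the check stays linear in the number of entries.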
Expected behavior
- Writer atomicity: Assistant messages and their tool_result responses should be written atomically, or tool_results should validate that their parent tool_use exists before persisting
- Graceful resume: Large or corrupted sessions should show an error message, not hang silently
- Self-healing: On resume, detect orphaned tool_results in the active chain and skip them (they contain no user-authored content)
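The self-healing step could look something like the sketch below, which mirrors the manual repair described earlier (remove the orphaned entry, re-parent its descendants). It is an illustration under assumptions, not the project's resume logic: the uuid, parentUuid and message.content field names are assumed, entries are assumed to appear in file order with parents before children, and drop_orphaned_entries is a hypothetical name. Entries that mix orphaned and valid tool_result blocks are left untouched as a conservative choice.

```python
def drop_orphaned_entries(entries: list[dict]) -> list[dict]:
    """Drop entries made up only of orphaned tool_result blocks; re-parent children.

    Assumes parents precede children in the list (append-only file order).
    Returns new entry dicts; the input list and its dicts are not modified.
    """
    seen_tool_use_ids: set = set()
    replacement_parent: dict = {}  # removed uuid -> uuid of its parent
    kept: list[dict] = []
    for original in entries:
        entry = dict(original)  # shallow copy so the input stays intact
        parent = entry.get("parentUuid")
        while parent in replacement_parent:  # skip past removed ancestors
            parent = replacement_parent[parent]
        if "parentUuid" in entry:
            entry["parentUuid"] = parent

        message = entry.get("message")
        content = message.get("content") if isinstance(message, dict) else None
        blocks = [b for b in content if isinstance(b, dict)] if isinstance(content, list) else []
        for block in blocks:
            if block.get("type") == "tool_use":
                seen_tool_use_ids.add(block.get("id"))
        results = [b for b in blocks if b.get("type") == "tool_result"]
        only_results = bool(results) and len(results) == len(blocks)
        all_orphaned = all(b.get("tool_use_id") not in seen_tool_use_ids for b in results)
        uuid = entry.get("uuid")
        if only_results and all_orphaned and uuid is not None:
            replacement_parent[uuid] = parent  # children will point here instead
            continue
        kept.append(entry)
    return kept
```

Because dropped entries contain only tool output, removing them loses no user-authored content, which is the property the self-healing proposal relies on.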
Environment
- Claude Code CLI: 2.1.69
- Claude for Mac (claude-desktop): 2.1.51
- macOS 15.3.1 (Darwin 25.2.0)