Note that this was fully written by OpenAI Codex
Summary
When using claude-agent-sdk with SDK MCP servers (mcp_servers={...,"type":"sdk"}), tool calls can become unavailable or fail in scenarios where subagents keep running “in background” after the parent response completes. In the transcript this often appears as the model emitting a plain-text <function_calls><invoke ...></invoke></function_calls> block instead of a real tool_use / tool_result pair, i.e. it behaves as if the MCP tool is missing.
This seems correlated with the SDK’s internal message buffering/backpressure: if the application stops consuming messages after the parent ResultMessage (e.g. uses receive_response() and returns), later streaming output from background subagents can fill the SDK’s internal queue and block the transport reader, which then blocks the control protocol needed for SDK MCP bridging (mcp_message control requests).
Environment
- claude-agent-sdk: 0.1.17
- Claude Code CLI: 2.0.70 (from stream-json transcript)
- Python: 3.12.3
- mcp: 1.21.1
- anyio: 4.11.0
- include_partial_messages: True (in our usage; increases message volume)
What we see in practice
Working (foreground): real tool call:
- The assistant emits a tool_use block: mcp__action_manager__persist_character_design
- Then a tool_result is delivered back.
Failing (background): tool call “hallucinated” as plain text:
- The assistant message is just a text block that contains <function_calls><invoke name="mcp__...">...
- No tool_use / tool_result blocks appear, but the assistant text claims success.
This matches the behavior when the model does not actually have the tool schema available or cannot complete the tool call.
Reproduction sketch (minimal)
I don’t have a single deterministic prompt-only repro yet, but the pattern is:
- Configure ClaudeSDKClient with an SDK MCP server: mcp_servers={"action_manager": create_sdk_mcp_server(...)}
- Ensure the model uses a subagent/background mechanism that can produce output after the parent ResultMessage (e.g., Task/subagent jobs that continue running while the parent returns).
- In application code, send client.query(...) and then consume messages only until the first ResultMessage (e.g., async for m in client.receive_response(): ..., which terminates at ResultMessage).
- Don’t keep draining client.receive_messages() afterwards.
- If the CLI keeps producing additional events/messages (especially with include_partial_messages=True), the SDK’s internal queue can fill, causing backpressure and breaking the ability to service later control requests (including SDK MCP).
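The pattern above, expressed as a code sketch (assuming the current ClaudeSDKClient API; the tool definition and prompt are illustrative, and the tool name is taken from the failing transcript):

```python
import asyncio
from claude_agent_sdk import (
    ClaudeAgentOptions, ClaudeSDKClient, create_sdk_mcp_server, tool,
)

# Illustrative in-process tool; any SDK MCP tool should reproduce the shape.
@tool("persist_character_design", "Persist a character design", {"name": str})
async def persist_character_design(args):
    return {"content": [{"type": "text", "text": f"saved {args['name']}"}]}

options = ClaudeAgentOptions(
    mcp_servers={
        "action_manager": create_sdk_mcp_server(
            name="action_manager", tools=[persist_character_design]
        )
    },
    allowed_tools=["mcp__action_manager__persist_character_design"],
    include_partial_messages=True,  # increases message volume
)

async def main():
    async with ClaudeSDKClient(options=options) as client:
        # A prompt that spawns Task/subagent work continuing in background.
        await client.query("...prompt that spawns background subagent work...")
        async for message in client.receive_response():
            pass  # stops at the first ResultMessage
        # BUG TRIGGER: we return here without draining receive_messages(),
        # so background subagent output can fill the SDK's internal queue.

asyncio.run(main())
```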
Expected behavior
Even if the application uses receive_response() (and therefore stops consuming after the parent ResultMessage), SDK MCP tool availability and tool execution should remain reliable for any background/subagent work that is still ongoing within the same session.
At minimum, the SDK should not deadlock/control-protocol-starve when the app temporarily isn’t consuming transcript messages.
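In the meantime, the pattern that avoids the starvation is to keep a background task draining the stream after the parent response completes. A minimal sketch of that pump, with a plain async generator standing in for client.receive_messages():

```python
import asyncio

async def fake_receive_messages(drained):
    # Stand-in for client.receive_messages(): background subagent output
    # that keeps arriving after the parent ResultMessage.
    for msg in ["subagent-1", "subagent-2", "subagent-3"]:
        await asyncio.sleep(0)
        drained.append(msg)
        yield msg

async def main():
    drained = []

    # After consuming the parent response, hand the rest of the stream to a
    # background pump so the SDK's internal queue keeps emptying.
    async def pump():
        async for _ in fake_receive_messages(drained):
            pass  # discard (or log) transcript messages we no longer need

    pump_task = asyncio.create_task(pump())
    # ... application continues with other work here ...
    await pump_task  # in real code, cancel this when the session closes
    return drained

drained = asyncio.run(main())
print(drained)  # → ['subagent-1', 'subagent-2', 'subagent-3']
```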
Actual behavior
After the parent response completes, background/subagent work that tries to call SDK MCP tools may:
- not see tool schemas / not be able to call tools
- emit a plain-text “function_calls/invoke” block (hallucinated tool call)
- or otherwise fail to get tool results
Suspected root cause (SDK-side)
In claude_agent_sdk/_internal/query.py:
- The SDK uses an internal memory stream with a small buffer: anyio.create_memory_object_stream(max_buffer_size=100)
- _read_messages() forwards all non-control messages into this buffer via await self._message_send.send(message).
- If the application stops consuming messages (e.g., stops after ResultMessage) while the CLI continues emitting them (common with partial streaming and/or subagent background output), then:
  - _message_send.send(...) blocks once the buffer reaches capacity.
  - _read_messages() stops draining stdout from the Claude Code CLI process.
  - Control protocol messages that arrive later on stdout (including the control_request subtype mcp_message used for SDK MCP bridging) are not read/handled promptly.
  - SDK MCP becomes unreliable, which manifests as missing tool schemas or missing tool results.
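This chain can be simulated with stdlib asyncio (a sketch only: asyncio.Queue stands in for the anyio memory stream, and read_messages mimics _read_messages with a buffer of 2 instead of 100):

```python
import asyncio

async def main():
    # Stand-in for the SDK's internal buffer
    # (anyio.create_memory_object_stream(max_buffer_size=100), shrunk to 2).
    queue = asyncio.Queue(maxsize=2)
    control_handled = []

    async def read_messages(stdout_lines):
        # Mirrors _read_messages(): control messages are handled inline,
        # everything else is forwarded to the user-facing queue.
        for msg in stdout_lines:
            if msg.startswith("control:"):
                control_handled.append(msg)
            else:
                await queue.put(msg)  # blocks once the buffer is full

    # The CLI keeps emitting transcript messages, then a control request.
    stdout = ["msg1", "msg2", "msg3", "msg4", "control:mcp_message"]

    # The application has stopped consuming, so nothing drains `queue`.
    reader = asyncio.create_task(read_messages(stdout))
    done, pending = await asyncio.wait({reader}, timeout=0.2)

    # The reader never reaches the control request: it is stuck on put().
    reader.cancel()
    return control_handled, reader in pending

control_handled, reader_stuck = asyncio.run(main())
print(control_handled, reader_stuck)  # → [] True
```

The mcp_message control request is never handled, even though it is already sitting in stdout: the reader is parked on a full user-facing queue.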
This is particularly surprising because receive_response() is presented as a convenience API; users may reasonably expect it to be safe in sessions that use SDK MCP servers.
Proposed fixes / improvements
One or more of:
- Never block _read_messages() on delivery to the user queue:
  - Use send_nowait() / move_on_after(0) for non-control messages.
  - If the queue is full, drop messages (or drop only low-value messages like partial StreamEvent).
  - The priority should be “keep draining CLI stdout + keep servicing the control protocol”.
- Make the internal message buffer size configurable:
  - e.g., ClaudeAgentOptions.max_message_queue (separate from max_buffer_size, which currently guards JSON line buffering).
- Add an SDK-managed background drain/pump:
  - If SDK MCP servers or hooks are configured, keep draining messages after receive_response() returns so the control channel remains healthy.
  - Or provide a documented helper/pattern for this.
- Documentation:
  - Explicitly warn that if you use SDK MCP servers (or expect background/subagent output), you must continue consuming receive_messages(), or you may starve the control channel.
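The first option (never block) can be sketched with a drop-on-full policy. asyncio.Queue stands in for anyio's memory stream here (put_nowait maps to MemoryObjectSendStream.send_nowait), and forward_nonblocking is an illustrative name, not SDK code:

```python
import asyncio

def forward_nonblocking(queue: asyncio.Queue, message, dropped: list):
    """Forward a non-control message without ever blocking the reader."""
    try:
        queue.put_nowait(message)
    except asyncio.QueueFull:
        # Keep draining CLI stdout: drop rather than block. A refinement
        # would drop only low-value messages (partial StreamEvents) first.
        dropped.append(message)

queue = asyncio.Queue(maxsize=2)
dropped = []
for i in range(4):
    forward_nonblocking(queue, f"msg{i}", dropped)

print(queue.qsize(), dropped)  # → 2 ['msg2', 'msg3']
```

The trade-off is message loss under pressure, which is why restricting drops to partial StreamEvents (or making the policy configurable) seems preferable to silent wholesale dropping.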
Why this matters
SDK MCP servers are a key feature for “in-process tools”. Background subagents (e.g., Task tool patterns) are also a core workflow. If receive_response() usage can cause hidden backpressure that breaks tool execution, it’s very easy for users to end up with brittle systems and hard-to-debug “hallucinated tool results”.