fix: spawn wait_for_result_and_end_input as background task for string prompts by qing-ant · Pull Request #780 · anthropics/claude-agent-sdk-python

qing-ant · 2026-03-30T14:16:32Z

Problem

query() with a string prompt and hooks or SDK MCP servers deadlocks once the internal 100-slot anyio message buffer fills up. Each tool call produces ~2 messages, so the buffer fills after about 50 tool calls.

Root cause

For string prompts, client.py:141 awaited wait_for_result_and_end_input() before receive_messages() started draining the buffer:

if isinstance(prompt, str):
    await chosen_transport.write(json.dumps(user_message) + "\n")
    await query.wait_for_result_and_end_input()   # blocks until "result" arrives

async for data in query.receive_messages():        # buffer drain starts here

Meanwhile _read_messages() keeps reading CLI stdout and pushing into the 100-slot channel. After ~50 tool calls the channel is full and _message_send.send() blocks. Now _read_messages can't read anything else from stdout, including the "result" message that wait_for_result_and_end_input needs — deadlock.
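
The ordering problem can be reproduced outside the SDK. The sketch below is only a simulation under stated assumptions: an asyncio.Queue with 4 slots stands in for the 100-slot anyio channel, and read_messages and inline_wait_deadlocks are hypothetical names, not SDK functions. It waits for the "result" before draining anything, as the old code path did, and reports whether that wait hangs.

```python
import asyncio

BUFFER_SLOTS = 4  # assumption: small stand-in for the SDK's 100-slot buffer


async def read_messages(queue: asyncio.Queue, result_seen: asyncio.Event, n: int) -> None:
    """Hypothetical reader: pushes n messages, then the final "result"."""
    for i in range(n):
        await queue.put(f"msg-{i}")  # blocks once the queue is full
    result_seen.set()                # the "result" line has now been read
    await queue.put("result")


async def inline_wait_deadlocks(n_messages: int, timeout: float = 0.2) -> bool:
    """Return True if waiting for the result before draining hangs."""
    queue: asyncio.Queue = asyncio.Queue(maxsize=BUFFER_SLOTS)
    result_seen = asyncio.Event()
    reader = asyncio.create_task(read_messages(queue, result_seen, n_messages))
    try:
        # Mirrors the old order: wait for the result first, drain only afterwards.
        await asyncio.wait_for(result_seen.wait(), timeout)
        return False
    except asyncio.TimeoutError:
        return True  # the reader is stuck on put(), so the result never arrives
    finally:
        reader.cancel()
        await asyncio.gather(reader, return_exceptions=True)


if __name__ == "__main__":
    print("3 messages hang:", asyncio.run(inline_wait_deadlocks(3)))
    print("10 messages hang:", asyncio.run(inline_wait_deadlocks(10)))
```

With fewer messages than slots the wait completes; once the reader needs more slots than the queue has, nothing drains it and the wait can only end by timeout.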

Fix

Spawn wait_for_result_and_end_input() as a background task instead of awaiting it inline. This matches the existing AsyncIterable path which already uses spawn_task(stream_input()), and allows receive_messages() to start draining the buffer immediately.

# Before (deadlocks)
await query.wait_for_result_and_end_input()

# After (concurrent)
query.spawn_task(query.wait_for_result_and_end_input())
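
The effect of the reordering can be illustrated with a small simulation. This is a sketch, not the SDK's implementation: asyncio.create_task stands in for query.spawn_task, an asyncio.Queue stands in for the anyio channel, and all function names are hypothetical. Because the wait runs in the background, the consumer loop starts draining immediately and the reader never stays blocked.

```python
import asyncio


async def read_messages(queue: asyncio.Queue, result_seen: asyncio.Event, n: int) -> None:
    """Hypothetical reader: pushes n messages, then the final "result"."""
    for i in range(n):
        await queue.put(f"msg-{i}")
    result_seen.set()
    await queue.put("result")


async def wait_for_result(result_seen: asyncio.Event) -> None:
    """Hypothetical stand-in for the wait that the fix moves to the background."""
    await result_seen.wait()


async def drain_with_background_wait(n_messages: int, slots: int = 4) -> list:
    queue: asyncio.Queue = asyncio.Queue(maxsize=slots)
    result_seen = asyncio.Event()
    reader = asyncio.create_task(read_messages(queue, result_seen, n_messages))
    # The wait is spawned instead of awaited, so the loop below starts right away.
    waiter = asyncio.create_task(wait_for_result(result_seen))
    received = []
    while True:
        message = await queue.get()  # each get() frees a slot for the reader
        received.append(message)
        if message == "result":
            break
    await asyncio.gather(reader, waiter)
    return received


if __name__ == "__main__":
    messages = asyncio.run(drain_with_background_wait(50))
    print(len(messages), messages[-1])
```

Here 50 messages pass through a 4-slot queue and the final "result" is still delivered, because the consumer and the background wait make progress concurrently.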

Testing

Added regression test verifying spawn_task is called instead of direct await
Fixed existing test warnings from unawaited mock coroutines
All 425 tests pass
Lint, format, and mypy all clean
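
The regression test itself is not shown here. A minimal sketch of how such a check might look, assuming a hypothetical start_string_prompt function that mirrors the fixed branch (the real test and the code under test may differ):

```python
import asyncio
from unittest.mock import MagicMock


async def start_string_prompt(query) -> None:
    """Hypothetical stand-in for the string-prompt branch after the fix."""
    query.spawn_task(query.wait_for_result_and_end_input())


def run_regression_check() -> bool:
    query = MagicMock()
    sentinel = object()
    # Returning a plain sentinel (not a coroutine) means nothing is left
    # unawaited, which avoids "coroutine was never awaited" warnings.
    query.wait_for_result_and_end_input.return_value = sentinel

    asyncio.run(start_string_prompt(query))

    # The wait must be handed to spawn_task rather than awaited inline.
    query.wait_for_result_and_end_input.assert_called_once_with()
    query.spawn_task.assert_called_once_with(sentinel)
    return True


if __name__ == "__main__":
    print(run_regression_check())
```

If the branch awaited the call inline instead, spawn_task would never be invoked and assert_called_once_with would raise AssertionError.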

Fixes #779

…g prompts

For string prompts with hooks or SDK MCP servers, query() awaited wait_for_result_and_end_input() before receive_messages() started draining the buffer. Once the 100-slot anyio channel filled (~50 tool calls), _read_messages blocked on send() and could never deliver the result message that wait_for_result_and_end_input needed, causing a deadlock.

Spawn it as a background task instead, matching the existing AsyncIterable path which already uses spawn_task(stream_input()).

Fixes #779

qing-ant · 2026-03-30T20:57:16Z

E2E Test Results

Test script:

#!/usr/bin/env python3
"""E2E proof for PR #780: verify query() with string prompt + MCP doesn't deadlock.

The fix spawns wait_for_result_and_end_input() as a background task so the
message buffer drains concurrently, preventing deadlock after many tool calls.

This test creates an MCP server with a simple tool and asks the model to call
it many times sequentially. Each tool call generates multiple messages through
the buffer (assistant message, tool result, progress updates). Without the fix,
wait_for_result_and_end_input() is awaited inline, which keeps receive_messages()
from draining the buffer -- so _read_messages() blocks on send() once the
100-slot anyio buffer fills up, and the result message is never delivered.

Prior to this fix, running query() with a string prompt + MCP server would
hang after enough tool calls. With the fix, the wait is moved to a background
task so messages drain concurrently.
"""

import asyncio
import sys
import time
from typing import Any

import claude_agent_sdk
from claude_agent_sdk import (
    AssistantMessage,
    ClaudeAgentOptions,
    ResultMessage,
    SystemMessage,
    TextBlock,
    ToolUseBlock,
    UserMessage,
    create_sdk_mcp_server,
    tool,
)


@tool("get_number", "Return the square of the given number", {"n": int})
async def get_number(args: dict[str, Any]) -> dict[str, Any]:
    """Return the square of a number."""
    n = args.get("n", 0)
    return {"content": [{"type": "text", "text": str(n * n)}]}


async def main() -> None:
    print("=" * 70)
    print("PR #780 E2E Test: string-prompt + MCP deadlock fix (many tool calls)")
    print("=" * 70)
    print()
    print(f"SDK version: {claude_agent_sdk.__version__}")

    mcp_server = create_sdk_mcp_server(
        name="math_server",
        version="1.0.0",
        tools=[get_number],
    )

    options = ClaudeAgentOptions(
        mcp_servers={"math": mcp_server},
        max_turns=30,
        permission_mode="acceptEdits",
    )

    # Request 20 sequential tool calls. The model will delegate to sub-agent(s)
    # which each make many individual MCP tool calls. The messages generated by
    # these calls (assistant messages, tool results, progress updates) all flow
    # through the same anyio buffer that would deadlock without the fix.
    prompt = (
        "Use the get_number MCP tool to compute the square of every integer from 1 to 20. "
        "Make exactly 20 individual get_number calls (one per integer). "
        "After all calls complete, list every input and result."
    )

    print(f"Prompt: {prompt[:120]}...")
    print(f"Max turns: {options.max_turns}")
    print(f"Permission mode: {options.permission_mode}")
    print("Timeout: 180s")
    print()
    print("--- Running query() ---")

    tool_calls = 0
    total_messages = 0
    msg_types: dict[str, int] = {}
    result_msg = None
    t0 = time.monotonic()

    try:
        async with asyncio.timeout(180):  # asyncio.timeout() requires Python 3.11+
            async for message in claude_agent_sdk.query(
                prompt=prompt,
                options=options,
            ):
                total_messages += 1
                mtype = type(message).__name__
                msg_types[mtype] = msg_types.get(mtype, 0) + 1

                if isinstance(message, AssistantMessage):
                    for block in message.content:
                        if isinstance(block, ToolUseBlock):
                            tool_calls += 1
                            if block.name == "mcp__math__get_number":
                                print(f"  Tool #{tool_calls}: get_number(n={block.input.get('n', '?')})")
                            else:
                                desc = str(block.input)[:100]
                                print(f"  Tool #{tool_calls}: {block.name}({desc}...)")
                elif isinstance(message, ResultMessage):
                    result_msg = message

    except TimeoutError:
        elapsed = time.monotonic() - t0
        print()
        print(f"FAIL: Timed out after {elapsed:.1f}s -- likely deadlock!")
        print("This is what happened BEFORE the fix was applied.")
        sys.exit(1)
    except Exception as e:
        elapsed = time.monotonic() - t0
        print()
        print(f"FAIL: Exception after {elapsed:.1f}s: {e}")
        sys.exit(1)

    elapsed = time.monotonic() - t0
    print()
    print("-" * 70)
    print(f"Completed in {elapsed:.1f}s")
    print(f"Total messages: {total_messages}")
    print(f"Tool calls: {tool_calls}")
    print(f"Message breakdown: {msg_types}")
    if result_msg:
        print(f"Cost: ${result_msg.total_cost_usd or 0:.6f}")
        print(f"Turns: {result_msg.num_turns}")
    print()

    if result_msg is not None and tool_calls >= 1:
        print(f"PASS: query() with string prompt + MCP completed {tool_calls} tool")
        print(f"      call(s) generating {total_messages} buffered messages without")
        print(f"      deadlocking ({elapsed:.1f}s). The background-task fix works.")
    elif result_msg is not None:
        print(f"PASS: query() completed without deadlock ({total_messages} messages).")
    else:
        print("FAIL: No result message received.")
        sys.exit(1)


if __name__ == "__main__":
    asyncio.run(main())

Output:

======================================================================
PR #780 E2E Test: string-prompt + MCP deadlock fix (many tool calls)
======================================================================

SDK version: 0.1.52
Prompt: Use the get_number MCP tool to compute the square of every integer from 1 to 20. Make exactly 20 individual get_number c...
Max turns: 30
Permission mode: acceptEdits
Timeout: 180s

--- Running query() ---
  Tool #1: Agent({'description': 'Square integers 1-5', 'prompt': 'Use the get_number tool from the math MCP server t...)
  Tool #2: Agent({'description': 'Square integers 6-10', 'prompt': 'Use the get_number tool from the math MCP server ...)
  Tool #3: Agent({'description': 'Square integers 11-15', 'prompt': 'Use the get_number tool from the math MCP server...)
  Tool #4: Agent({'description': 'Square integers 16-20', 'prompt': 'Use the get_number tool from the math MCP server...)
  Tool #5: TaskStop({'task_id': 'ad8f11b5363ad29e1'}...)
  Tool #6: TaskStop({'task_id': 'abf382a54282dfd4a'}...)
  Tool #7: TaskStop({'task_id': 'ac3bc5bb63ffac2ec'}...)
  Tool #8: Agent({'description': 'Square integers 1-5', 'prompt': 'Use the mcp__math__get_number tool to compute the ...)
  Tool #9: Agent({'description': 'Square integers 6-10', 'prompt': 'Use the mcp__math__get_number tool to compute the...)
  Tool #10: Agent({'description': 'Square integers 11-15', 'prompt': 'Use the mcp__math__get_number tool to compute th...)
  Tool #11: Agent({'description': 'Square integers 16-20', 'prompt': 'Use the mcp__math__get_number tool to compute th...)
  Tool #12: TaskStop({'task_id': 'a54b9749f937e33b4'}...)
  Tool #13: TaskStop({'task_id': 'ae90d4ebef7ece669'}...)
  Tool #14: TaskStop({'task_id': 'a491ee1e5f9244cdf'}...)

----------------------------------------------------------------------
Completed in 148.4s
Total messages: 124
Tool calls: 14
Message breakdown: {'SystemMessage': 9, 'AssistantMessage': 31, 'TaskStartedMessage': 8, 'UserMessage': 14, 'RateLimitEvent': 1, 'TaskProgressMessage': 49, 'TaskNotificationMessage': 8, 'ResultMessage': 4}
Cost: $1.096781
Turns: 1

PASS: query() with string prompt + MCP completed 14 tool
      call(s) generating 124 buffered messages without
      deadlocking (148.4s). The background-task fix works.

Verified: query() with a string prompt and an MCP server processed 124 messages (more than the 100-slot anyio buffer holds) across 14 tool calls, 8 of them sub-agent spawns, without deadlocking. The model delegated the 20 get_number MCP tool calls to sub-agents in batches of 5, generating 49 TaskProgressMessage, 31 AssistantMessage, 14 UserMessage, 8 TaskStartedMessage, and 8 TaskNotificationMessage messages, all of which drained through the buffer concurrently thanks to the background-task fix.
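
As a quick sanity check on the numbers, the message breakdown printed in the output can be tallied against the buffer size. The counts are copied from the run above; BUFFER_SLOTS is the 100-slot size quoted in the PR description.

```python
# Counts copied from the "Message breakdown" line of the test output.
breakdown = {
    "SystemMessage": 9,
    "AssistantMessage": 31,
    "TaskStartedMessage": 8,
    "UserMessage": 14,
    "RateLimitEvent": 1,
    "TaskProgressMessage": 49,
    "TaskNotificationMessage": 8,
    "ResultMessage": 4,
}

BUFFER_SLOTS = 100  # buffer size quoted in the PR description

total = sum(breakdown.values())
exceeds_buffer = total > BUFFER_SLOTS

if __name__ == "__main__":
    print(f"total={total}, exceeds buffer: {exceeds_buffer}")
```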

qing-ant enabled auto-merge (squash) March 30, 2026 20:11

hackyon-anthropic approved these changes Mar 30, 2026

View reviewed changes

qing-ant merged commit bd3b7a6 into main Mar 30, 2026
10 checks passed

qing-ant deleted the fix/string-prompt-buffer-deadlock branch March 30, 2026 20:26

This was referenced Apr 1, 2026

Potential deadlock in Query() when many messages arrive before result Flohs/claude-agent-sdk-go#85

Closed

fix: prevent deadlock in Query() when many messages arrive before result Flohs/claude-agent-sdk-go#86

Merged

claude bot mentioned this pull request Apr 7, 2026

Add exclude_dynamic_sections to SystemPromptPreset for cross-user caching #797

Merged

qing-ant merged 1 commit into main from fix/string-prompt-buffer-deadlock
