fix: defer CLI discovery to connect() to avoid blocking async event loops by qing-ant · Pull Request #722 · anthropics/claude-agent-sdk-python

qing-ant · 2026-03-24T15:09:40Z

Summary

SubprocessCLITransport.__init__() called shutil.which("claude") synchronously, which blocks the async event loop. This causes BlockingError in async frameworks that monitor for blocking calls (e.g., LangGraph with the blockbuster library).

Fix

Defer CLI resolution from __init__ to the async connect() method, where it runs via anyio.to_thread.run_sync() on a worker thread. This wraps the entire _find_cli() function (including shutil.which() and filesystem existence checks) in a single thread dispatch.

When cli_path is provided via ClaudeAgentOptions, it is still set immediately in __init__ — no behavior change for that path.

Before:

def __init__(self, ...):
    self._cli_path = (
        str(options.cli_path) if options.cli_path is not None else self._find_cli()
    )  # ← blocks the event loop

After:

def __init__(self, ...):
    self._cli_path = (
        str(options.cli_path) if options.cli_path is not None else None
    )  # ← no blocking

async def connect(self):
    if self._cli_path is None:
        self._cli_path = await anyio.to_thread.run_sync(self._find_cli)
    # ...

Tests

Updated test_find_cli_not_found to verify the error is raised during connect() instead of __init__
Added test_init_does_not_call_find_cli to verify __init__ defers discovery
Added test_init_uses_provided_cli_path to verify explicit cli_path is used immediately
All 330 tests pass, mypy clean, ruff clean

Fixes #324

claude

LGTM — clean fix deferring blocking I/O to the async connect() path.

Extended reasoning...

Overview

This PR modifies 2 files: subprocess_cli.py and test_transport.py. It moves the shutil.which("claude") call (and related filesystem checks) from the synchronous __init__ constructor to the async connect() method, where it runs via anyio.to_thread.run_sync(). Defensive None checks are added in _build_command() and _check_claude_version() for the case where _cli_path hasn't been resolved yet. When cli_path is explicitly provided via options, behavior is unchanged.

Security risks

None. The change doesn't alter what commands are executed or how paths are resolved — it only changes when the resolution happens. No new inputs are accepted, no auth or permissions logic is modified.

Level of scrutiny

This is a straightforward async-correctness fix with a small blast radius. The pattern (deferring blocking I/O to a thread in an async context) is well-established and the implementation is minimal. The only bug found is a style nit about test convention (@pytest.mark.anyio vs anyio.run()) which doesn't affect correctness.

Other factors

Three new/updated tests cover the key behaviors: deferred discovery, explicit path passthrough, and error on missing CLI. The existing test suite (330 tests) reportedly passes. No CODEOWNERS file exists for these paths.

tests/test_transport.py

claude

LGTM — small, focused fix with good test coverage; previous feedback addressed.

Extended reasoning...

Overview

This PR modifies two files: subprocess_cli.py (production) and test_transport.py (tests). The production change defers _find_cli() from __init__ to connect() using anyio.to_thread.run_sync(), preventing synchronous shutil.which() from blocking async event loops. Two defensive None guards were added in _build_command() and _check_claude_version(). Three tests were added/updated to cover the new behavior.

Security risks

No security concerns. The change only affects when CLI discovery happens (construction vs connection), not how. No auth, crypto, or permissions code is touched.

Level of scrutiny

Low scrutiny warranted. This is a straightforward timing change — moving a blocking call out of a synchronous constructor into an async method where it can be properly offloaded to a thread. The pattern (anyio.to_thread.run_sync) is idiomatic and well-established in this codebase.

Other factors

My previous review flagged a test convention issue (@pytest.mark.anyio vs anyio.run), which has been addressed in commit 547da9e.
The bug report about using CLINotFoundError instead of RuntimeError for the internal guards is a valid semantic nit, but both guards are effectively unreachable dead code — they protect private methods that are only ever called after connect() resolves the path.
All existing tests continue to pass, and the new tests verify the deferred behavior correctly.

src/claude_agent_sdk/_internal/transport/subprocess_cli.py

qing-ant · 2026-03-25T18:05:28Z

E2E Test Results for PR #722

Tested the deferred CLI discovery change on branch fix/defer-cli-discovery-to-connect.

Unit Tests

4/4 passed (test_find_cli_not_found, test_init_does_not_call_find_cli, test_init_uses_provided_cli_path, test_build_command_agents_always_via_initialize).

E2E Tests (3/3 passed)

Test 1: Transport __init__ does NOT resolve CLI path

SubprocessCLITransport.__init__() completes in ~0.00s
transport._cli_path is None after construction
Confirms no shutil.which call during init

Test 2: CLI path resolved during connect()

After calling await transport.connect(), _cli_path is set (e.g., /root/.local/bin/claude)
Confirms CLI discovery happens via anyio.to_thread.run_sync() inside connect()

Test 3: Full query end-to-end

claude_agent_sdk.query(prompt="Reply with exactly: hello", options=ClaudeAgentOptions(max_turns=1)) completed successfully
Received expected response: TextBlock(text="hello")
CLI discovery, version check, subprocess launch, and message parsing all work correctly with the deferred approach

======================================================================
SUMMARY
======================================================================
  test_init_no_cli_path: PASS
  test_cli_path_after_connect: PASS
  test_full_query_e2e: PASS

ALL TESTS PASSED

claude

LGTM — clean fix for deferring blocking CLI discovery to the async connect() method.

Extended reasoning...

Overview

This PR touches two files: subprocess_cli.py (production) and test_transport.py (tests). The change defers shutil.which() and filesystem existence checks from __init__ to connect(), where they run on a worker thread via anyio.to_thread.run_sync(). When cli_path is explicitly provided, it is still set immediately in __init__ with no behavior change.

Security risks

None. The change does not alter any security-sensitive code paths — it only changes when the CLI binary is located, not how. No auth, crypto, or permissions code is affected.

Level of scrutiny

This is a low-risk, well-scoped bug fix. The core logic change is minimal (3 lines in __init__, 2 lines in connect()), with defensive guards added to two private methods. The existing _find_cli() method is completely unchanged. E2E tests confirmed the full query flow works correctly with the deferred approach.

Other factors

My two prior inline comments were both resolved: the test convention issue was fixed (now uses anyio.run(_test) matching the project pattern), and the CLINotFoundError vs RuntimeError semantic nit was acknowledged as minor since the guards are in private methods only called after connect() resolves the path. Three new tests were added covering deferred discovery, explicit cli_path, and the error-during-connect path. All 330 tests pass, mypy and ruff clean, and independent E2E verification confirmed correct behavior.

…oops shutil.which() in __init__ blocks the event loop in async frameworks (e.g. LangGraph with blockbuster monitoring). Defer CLI resolution to the async connect() method using anyio.to_thread.run_sync(). When cli_path is provided via ClaudeAgentOptions, it is still set immediately in __init__ with no behavior change. Fixes #324

claude

LGTM — straightforward async fix with good test coverage.

Extended reasoning...

Overview

This PR touches two files: subprocess_cli.py (the transport implementation) and test_transport.py (its tests). The change moves _find_cli() from __init__ (synchronous, blocks the event loop) to connect() (async, runs via anyio.to_thread.run_sync()). When cli_path is explicitly provided, the behavior is unchanged.

Security risks

None. The change only affects when CLI discovery happens, not how. No new inputs, no auth changes, no data exposure.

Level of scrutiny

This is a low-risk bugfix. The diff is small (~40 lines of meaningful change), the logic is straightforward (move a blocking call to an async context), and it follows the established anyio.to_thread.run_sync() pattern already used elsewhere in the codebase. Both my prior nits (test convention and exception type) have been resolved — the test convention was addressed in a follow-up commit, and the exception type nit was acknowledged and closed.

Other factors

Three new/updated unit tests directly validate the deferred behavior.
E2E tests (posted by a reviewer) confirm the full flow works correctly.
No CODEOWNERS file exists for these paths.
The defensive guards in _build_command() and _check_claude_version() are effectively unreachable (both are private methods only called after _cli_path is resolved in connect()), making the exception type choice a non-issue in practice.

bogini

stamped ⚡

claude bot reviewed Mar 24, 2026

View reviewed changes

tests/test_transport.py Show resolved Hide resolved

claude bot reviewed Mar 24, 2026

View reviewed changes

src/claude_agent_sdk/_internal/transport/subprocess_cli.py Show resolved Hide resolved

claude bot reviewed Mar 26, 2026

View reviewed changes

claude and others added 3 commits March 26, 2026 03:13

fix: use sync def + anyio.run convention in test_find_cli_not_found

ce3be11

Fix ruff format violation in test_transport.py

d9d6e3c

qing-ant force-pushed the fix/defer-cli-discovery-to-connect branch from 5e0e807 to d9d6e3c Compare March 26, 2026 03:13

claude bot reviewed Mar 26, 2026

View reviewed changes

bogini approved these changes Mar 26, 2026

View reviewed changes

qing-ant merged commit 19e1f53 into main Mar 26, 2026
10 checks passed

qing-ant deleted the fix/defer-cli-discovery-to-connect branch March 26, 2026 06:12

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: defer CLI discovery to connect() to avoid blocking async event loops#722

fix: defer CLI discovery to connect() to avoid blocking async event loops#722
qing-ant merged 3 commits intomainfrom
fix/defer-cli-discovery-to-connect

qing-ant commented Mar 24, 2026

Uh oh!

claude bot left a comment

Uh oh!

Uh oh!

claude bot left a comment

Uh oh!

Uh oh!

qing-ant commented Mar 25, 2026

Uh oh!

claude bot left a comment

Uh oh!

claude bot left a comment

Uh oh!

bogini left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

qing-ant commented Mar 24, 2026

Summary

Fix

Tests

Uh oh!

claude bot left a comment

Choose a reason for hiding this comment

Overview

Security risks

Level of scrutiny

Other factors

Uh oh!

Uh oh!

claude bot left a comment

Choose a reason for hiding this comment

Overview

Security risks

Level of scrutiny

Other factors

Uh oh!

Uh oh!

qing-ant commented Mar 25, 2026

E2E Test Results for PR #722

Unit Tests

E2E Tests (3/3 passed)

Uh oh!

claude bot left a comment

Choose a reason for hiding this comment

Overview

Security risks

Level of scrutiny

Other factors

Uh oh!

claude bot left a comment

Choose a reason for hiding this comment

Overview

Security risks

Level of scrutiny

Other factors

Uh oh!

bogini left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants