fix: defer CLI discovery to connect() to avoid blocking async event loops#722
fix: defer CLI discovery to connect() to avoid blocking async event loops#722
Conversation
There was a problem hiding this comment.
LGTM — clean fix deferring blocking I/O to the async connect() path.
Extended reasoning...
Overview
This PR modifies 2 files: subprocess_cli.py and test_transport.py. It moves the shutil.which("claude") call (and related filesystem checks) from the synchronous __init__ constructor to the async connect() method, where it runs via anyio.to_thread.run_sync(). Defensive None checks are added in _build_command() and _check_claude_version() for the case where _cli_path hasn't been resolved yet. When cli_path is explicitly provided via options, behavior is unchanged.
Security risks
None. The change doesn't alter what commands are executed or how paths are resolved — it only changes when the resolution happens. No new inputs are accepted, no auth or permissions logic is modified.
Level of scrutiny
This is a straightforward async-correctness fix with a small blast radius. The pattern (deferring blocking I/O to a thread in an async context) is well-established and the implementation is minimal. The only bug found is a style nit about test convention (@pytest.mark.anyio vs anyio.run()) which doesn't affect correctness.
Other factors
Three new/updated tests cover the key behaviors: deferred discovery, explicit path passthrough, and error on missing CLI. The existing test suite (330 tests) reportedly passes. No CODEOWNERS file exists for these paths.
There was a problem hiding this comment.
LGTM — small, focused fix with good test coverage; previous feedback addressed.
Extended reasoning...
Overview
This PR modifies two files: subprocess_cli.py (production) and test_transport.py (tests). The production change defers _find_cli() from __init__ to connect() using anyio.to_thread.run_sync(), preventing synchronous shutil.which() from blocking async event loops. Two defensive None guards were added in _build_command() and _check_claude_version(). Three tests were added/updated to cover the new behavior.
Security risks
No security concerns. The change only affects when CLI discovery happens (construction vs connection), not how. No auth, crypto, or permissions code is touched.
Level of scrutiny
Low scrutiny warranted. This is a straightforward timing change — moving a blocking call out of a synchronous constructor into an async method where it can be properly offloaded to a thread. The pattern (anyio.to_thread.run_sync) is idiomatic and well-established in this codebase.
Other factors
- My previous review flagged a test convention issue (
@pytest.mark.anyiovsanyio.run), which has been addressed in commit547da9e. - The bug report about using
CLINotFoundErrorinstead ofRuntimeErrorfor the internal guards is a valid semantic nit, but both guards are effectively unreachable dead code — they protect private methods that are only ever called afterconnect()resolves the path. - All existing tests continue to pass, and the new tests verify the deferred behavior correctly.
E2E Test Results for PR #722Tested the deferred CLI discovery change on branch Unit Tests4/4 passed ( E2E Tests (3/3 passed)Test 1: Transport
Test 2: CLI path resolved during
Test 3: Full query end-to-end
|
There was a problem hiding this comment.
LGTM — clean fix for deferring blocking CLI discovery to the async connect() method.
Extended reasoning...
Overview
This PR touches two files: subprocess_cli.py (production) and test_transport.py (tests). The change defers shutil.which() and filesystem existence checks from __init__ to connect(), where they run on a worker thread via anyio.to_thread.run_sync(). When cli_path is explicitly provided, it is still set immediately in __init__ with no behavior change.
Security risks
None. The change does not alter any security-sensitive code paths — it only changes when the CLI binary is located, not how. No auth, crypto, or permissions code is affected.
Level of scrutiny
This is a low-risk, well-scoped bug fix. The core logic change is minimal (3 lines in __init__, 2 lines in connect()), with defensive guards added to two private methods. The existing _find_cli() method is completely unchanged. E2E tests confirmed the full query flow works correctly with the deferred approach.
Other factors
My two prior inline comments were both resolved: the test convention issue was fixed (now uses anyio.run(_test) matching the project pattern), and the CLINotFoundError vs RuntimeError semantic nit was acknowledged as minor since the guards are in private methods only called after connect() resolves the path. Three new tests were added covering deferred discovery, explicit cli_path, and the error-during-connect path. All 330 tests pass, mypy and ruff clean, and independent E2E verification confirmed correct behavior.
…oops shutil.which() in __init__ blocks the event loop in async frameworks (e.g. LangGraph with blockbuster monitoring). Defer CLI resolution to the async connect() method using anyio.to_thread.run_sync(). When cli_path is provided via ClaudeAgentOptions, it is still set immediately in __init__ with no behavior change. Fixes #324
5e0e807 to
d9d6e3c
Compare
There was a problem hiding this comment.
LGTM — straightforward async fix with good test coverage.
Extended reasoning...
Overview
This PR touches two files: subprocess_cli.py (the transport implementation) and test_transport.py (its tests). The change moves _find_cli() from __init__ (synchronous, blocks the event loop) to connect() (async, runs via anyio.to_thread.run_sync()). When cli_path is explicitly provided, the behavior is unchanged.
Security risks
None. The change only affects when CLI discovery happens, not how. No new inputs, no auth changes, no data exposure.
Level of scrutiny
This is a low-risk bugfix. The diff is small (~40 lines of meaningful change), the logic is straightforward (move a blocking call to an async context), and it follows the established anyio.to_thread.run_sync() pattern already used elsewhere in the codebase. Both my prior nits (test convention and exception type) have been resolved — the test convention was addressed in a follow-up commit, and the exception type nit was acknowledged and closed.
Other factors
- Three new/updated unit tests directly validate the deferred behavior.
- E2E tests (posted by a reviewer) confirm the full flow works correctly.
- No CODEOWNERS file exists for these paths.
- The defensive guards in
_build_command()and_check_claude_version()are effectively unreachable (both are private methods only called after_cli_pathis resolved inconnect()), making the exception type choice a non-issue in practice.
Summary
SubprocessCLITransport.__init__()calledshutil.which("claude")synchronously, which blocks the async event loop. This causesBlockingErrorin async frameworks that monitor for blocking calls (e.g., LangGraph with theblockbusterlibrary).Fix
Defer CLI resolution from
__init__to the asyncconnect()method, where it runs viaanyio.to_thread.run_sync()on a worker thread. This wraps the entire_find_cli()function (includingshutil.which()and filesystem existence checks) in a single thread dispatch.When
cli_pathis provided viaClaudeAgentOptions, it is still set immediately in__init__— no behavior change for that path.Before:
After:
Tests
test_find_cli_not_foundto verify the error is raised duringconnect()instead of__init__test_init_does_not_call_find_clito verify__init__defers discoverytest_init_uses_provided_cli_pathto verify explicitcli_pathis used immediatelyFixes #324