fix: resolve cross-task cancel scope RuntimeError on async generator cleanup (#454) #746

Merged
qing-ant merged 1 commit into main from qing/fix-454-cross-task-cancel-scope
Mar 26, 2026

@qing-ant
Contributor

Problem

When a user breaks out of the async for loop over query(), Python may finalize the async generator in a different task from the one that created the task group. close() then calls TaskGroup.__aexit__() from a task other than the one in which start() called __aenter__(), triggering:

RuntimeError: Attempted to exit cancel scope in a different task than it was entered in

Fixes #454.

Root cause

The Query class was using anyio's TaskGroup with manual __aenter__/__aexit__ calls. anyio's cancel scopes have task affinity — they must be exited by the same async task that entered them. During async generator finalization, Python can schedule the generator's cleanup in a different task, violating this invariant.
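The finalization behaviour described above can be observed with the standard library alone. The sketch below is an illustration, not code from this PR: the names numbers, break_early and record are hypothetical, and it does not use anyio. It only records which task runs the generator body and which task runs its finally block after an early break.

```python
import asyncio
import gc


async def numbers(record):
    # Remember which task first drives the generator body.
    record["entered_in"] = asyncio.current_task()
    try:
        yield 1
        yield 2
    finally:
        # Runs during finalization, in whichever task ends up closing
        # the generator.
        record["exited_in"] = asyncio.current_task()


async def break_early():
    record = {}
    async for _ in numbers(record):
        break  # the generator is left suspended at its first yield
    # The abandoned generator is handed to the event loop's finalizer hook,
    # which schedules aclose() as a separate task.
    gc.collect()               # usually redundant on CPython, kept for robustness
    await asyncio.sleep(0.05)  # give the loop time to run the scheduled aclose()
    return record


if __name__ == "__main__":
    result = asyncio.run(break_early())
    same_task = result["entered_in"] is result["exited_in"]
    print(f"finally ran in the entering task: {same_task}")
```

Any resource with task affinity that is entered in the generator body and released in its finally block would hit the same mismatch, which is the situation the anyio cancel scope ends up in here.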

Why PR #364 doesn't fix it

PR #364 introduces an "owner task pattern" that wraps the inner task group in a dedicated owner task. However, it still creates an outer task group (_outer_tg) using the same manual __aenter__/__aexit__ pattern, so the cross-task error just moves one level up. The tests in that PR call start() and close() from the same task, so they don't reproduce the actual failure scenario.

Solution

Replace the anyio TaskGroup with asyncio.create_task() for background task management. A task created with asyncio.create_task() is not tied to an anyio cancel scope, so close() can cancel it from any task without triggering the RuntimeError.

Changes:

  • query.py: Replace _tg (anyio TaskGroup) with _read_task (asyncio Task) and _child_tasks (set of asyncio Tasks). Add spawn_task() method as the replacement for _tg.start_soon().
  • client.py / _internal/client.py: Update callers to use spawn_task() instead of _tg.start_soon().
  • test_query.py: Add tests that reproduce the cross-task cleanup scenario.
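A minimal sketch of the bookkeeping listed above, assuming nothing about the real Query class beyond the names _read_task, _child_tasks and spawn_task() taken from this description. The class name BackgroundTasks, the start() and close() bodies, and the demo() helper are hypothetical and only illustrate the approach.

```python
import asyncio
from typing import Any, Coroutine, Optional, Set


class BackgroundTasks:
    """Hypothetical stand-in for the task bookkeeping inside Query."""

    def __init__(self) -> None:
        self._read_task: Optional[asyncio.Task] = None
        self._child_tasks: Set[asyncio.Task] = set()

    def start(self, read_loop: Coroutine[Any, Any, None]) -> None:
        # A plain asyncio task: nothing is entered here that a later
        # caller would have to exit from the same task.
        self._read_task = asyncio.create_task(read_loop)

    def spawn_task(self, coro: Coroutine[Any, Any, Any]) -> asyncio.Task:
        task = asyncio.create_task(coro)
        # Hold a strong reference until the task finishes, then drop it.
        self._child_tasks.add(task)
        task.add_done_callback(self._child_tasks.discard)
        return task

    async def close(self) -> None:
        tasks = list(self._child_tasks)
        if self._read_task is not None:
            tasks.append(self._read_task)
        for task in tasks:
            task.cancel()
        # Wait for the cancellations to land; results are collected
        # (including CancelledError instances) rather than raised.
        await asyncio.gather(*tasks, return_exceptions=True)
        self._read_task = None
        self._child_tasks.clear()


async def demo() -> bool:
    """Start in one task, close from another, report whether cancel landed."""
    bg = BackgroundTasks()

    async def run_forever() -> None:
        await asyncio.sleep(3600)

    async def opener() -> None:
        bg.start(run_forever())
        bg.spawn_task(run_forever())

    await asyncio.create_task(opener())    # tasks are created in this task...
    read_task = bg._read_task
    await asyncio.create_task(bg.close())  # ...and cancelled from a different one
    return read_task is not None and read_task.cancelled()
```

The trade-off of this design is that it depends on asyncio directly, so it does not carry over to other anyio backends.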

Test plan

  • All 356 existing tests pass
  • New test test_close_from_different_task_does_not_raise verifies cross-task cleanup works
  • New test test_close_from_same_task_still_works verifies normal cleanup still works
  • Linting (ruff) and type checking (mypy) pass

…ncel scope error (#454)

Replace manual anyio TaskGroup.__aenter__/__aexit__ calls with
asyncio.create_task() for background task management in Query.

The anyio TaskGroup pattern required cancel scopes to be entered and
exited in the same async task. When users break from the async generator
returned by query(), Python may finalize the generator in a different
task, causing close() to call __aexit__ from a different task than
start() called __aenter__. This produced a RuntimeError:
'Attempted to exit cancel scope in a different task than it was entered in'

The fix uses asyncio.create_task() which has no cancel scope affinity,
allowing close() to cancel the read task from any task context. A new
spawn_task() method replaces _tg.start_soon() for child tasks.

:house: Remote-Dev: homespace
@qing-ant
Contributor Author

E2E Test Results

Test Script

"""E2E test for PR #746 / Issue #454: cross-task cancel scope RuntimeError on async generator cleanup.

When breaking out of `async for` over query(), Python may finalize the async
generator in a different task than the one that entered the anyio cancel scope,
causing:
  RuntimeError: Attempted to exit cancel scope in a different task than it was entered in

This test breaks early from the async generator and checks stderr for the error.
"""

import asyncio
import sys
import io
import logging
import warnings


async def run_test():
    from claude_agent_sdk import query, ClaudeAgentOptions, ResultMessage

    # Capture log output and stderr to detect the RuntimeError
    stderr_capture = io.StringIO()
    handler = logging.StreamHandler(stderr_capture)
    handler.setLevel(logging.DEBUG)
    logging.getLogger().addHandler(handler)

    old_stderr = sys.stderr
    sys.stderr = io.TextIOWrapper(io.BytesIO(), write_through=True)

    error_detected = False

    try:
        messages = []
        async for msg in query(
            prompt="Say hello in exactly 3 words",
            options=ClaudeAgentOptions(model="claude-sonnet-4-20250514"),
        ):
            messages.append(msg)
            if isinstance(msg, ResultMessage):
                result_preview = (msg.result or "")[:80]
                print(f"Got ResultMessage, breaking early. Result: {result_preview}...")
                break

        # Give the event loop a chance to process any pending callbacks/finalizers
        # that would trigger the cross-task cancel scope error
        await asyncio.sleep(0.5)

        # Force garbage collection to trigger async generator finalization
        import gc
        gc.collect()
        await asyncio.sleep(0.5)

    except Exception as e:
        print(f"Exception during query: {type(e).__name__}: {e}")
        error_detected = True

    finally:
        # Restore stderr and check for errors
        captured_stderr_bytes = sys.stderr.buffer.getvalue() if hasattr(sys.stderr, 'buffer') else b""
        sys.stderr = old_stderr
        captured_stderr = captured_stderr_bytes.decode("utf-8", errors="replace")
        captured_logs = stderr_capture.getvalue()

    # Check for the specific RuntimeError in captured output
    all_output = captured_stderr + captured_logs
    if "cancel scope" in all_output.lower() or "RuntimeError" in all_output:
        error_detected = True
        print("\n--- CAPTURED STDERR/LOGS ---")
        print(all_output.strip())
        print("--- END CAPTURED ---")

    return error_detected, messages


def main():
    # Also install a custom exception handler to catch "Task exception was never retrieved"
    exceptions_found = []

    def custom_exception_handler(loop, context):
        msg = context.get("message", "")
        exc = context.get("exception", None)
        detail = f"{msg}: {exc}" if exc else msg
        exceptions_found.append(detail)
        print(f"[Exception handler] {detail}", file=sys.__stderr__)

    loop = asyncio.new_event_loop()
    asyncio.set_event_loop(loop)
    loop.set_exception_handler(custom_exception_handler)

    try:
        error_detected, messages = loop.run_until_complete(run_test())
        # Give time for any deferred task exceptions
        loop.run_until_complete(asyncio.sleep(1.0))
    finally:
        # Run pending callbacks
        loop.run_until_complete(asyncio.sleep(0.2))
        loop.close()

    # Check for cross-task cancel scope errors in the exception handler output
    cancel_scope_errors = [e for e in exceptions_found if "cancel scope" in e.lower() or "RuntimeError" in e]
    any_task_exceptions = len(exceptions_found) > 0

    print(f"\nMessages received before break: {len(messages)}")
    print(f"Exception handler caught {len(exceptions_found)} exception(s)")
    for e in exceptions_found:
        print(f"  - {e}")

    if error_detected or cancel_scope_errors:
        print("\n>>> FAIL: RuntimeError about cancel scope detected <<<")
        sys.exit(1)
    elif any_task_exceptions:
        print("\n>>> FAIL: Task exceptions detected (not cancel scope, but still errors) <<<")
        sys.exit(1)
    else:
        print("\n>>> PASS: No cross-task cancel scope errors <<<")
        sys.exit(0)


if __name__ == "__main__":
    main()

Regression Test (main branch) - FAIL as expected

SDK installed from origin/main (76cb292). Breaking early from the async generator triggers the cross-task cancel scope RuntimeError:

[Exception handler] Task exception was never retrieved: Attempted to exit cancel scope in a different task than it was entered in
Got ResultMessage, breaking early. Result: Hello there friend!...
Traceback (most recent call last):
  File "/tmp/e2e-746-test.py", line 120, in <module>
    main()
  File "/tmp/e2e-746-test.py", line 91, in main
    error_detected, messages = loop.run_until_complete(run_test())
  File "asyncio/base_events.py", line 725, in run_until_complete
    return future.result()
  File "/tmp/e2e-746-test.py", line 46, in run_test
    await asyncio.sleep(0.5)
  File "asyncio/tasks.py", line 718, in sleep
    return await future
asyncio.exceptions.CancelledError: Cancelled by cancel scope 7f9c466835c0

Exit code: 1 (FAIL)

Fixed Branch Test (qing/fix-454-cross-task-cancel-scope) - PASS

SDK installed from the fix branch (b7dddce). Breaking early from the async generator works cleanly with no errors:

Got ResultMessage, breaking early. Result: Hello there friend!...

Messages received before break: 5
Exception handler caught 0 exception(s)

>>> PASS: No cross-task cancel scope errors <<<

Exit code: 0 (PASS)

Verdict

PASS - The fix correctly resolves the cross-task cancel scope RuntimeError. Replacing the anyio TaskGroup with direct asyncio.Task management eliminates the cancel scope that caused the cross-task finalization error when Python's async generator cleanup runs in a different task.

@qing-ant qing-ant enabled auto-merge (squash) March 26, 2026 23:12
Collaborator

@bogini bogini left a comment

stamped 🐎

@qing-ant qing-ant merged commit f39ebeb into main Mar 26, 2026
10 checks passed
@qing-ant qing-ant deleted the qing/fix-454-cross-task-cancel-scope branch March 26, 2026 23:26
qing-ant added a commit that referenced this pull request Mar 28, 2026
## Summary

Implements `control_cancel_request` handling in the Python SDK.
Previously, these messages from the CLI were silently ignored via a TODO
placeholder at `_internal/query.py:210-213`.

Fixes #739

## Problem

When the CLI sends `control_cancel_request` to cancel an in-flight hook
callback (e.g., when a subagent completes while a parent-level hook is
still pending, or during query shutdown), the SDK takes no action. This
causes:

1. **CLI-side AbortError noise** — The CLI fires its abort signal,
rejects the pending hook request, and logs `Error in hook callback
hook_N: ... AbortError` to stderr on every cancelled hook.
2. **Python runs cancelled callbacks** — Hook callbacks continue
executing after the CLI has abandoned them. The eventual response write
either gets dropped silently or hits a closed transport.
3. **Shutdown desync** — During `close()`, in-flight hooks that should
have been cancelled are still running.

## Fix

- **`__init__`**: Add `self._inflight_requests: dict[str, asyncio.Task]`
to track control request handlers by `request_id`
- **`_read_messages`**: When spawning `_handle_control_request` tasks,
register them in `_inflight_requests` with a done-callback that removes
them on completion. When `control_cancel_request` arrives, look up the
task by `request_id` and cancel it.
- **`_handle_control_request`**: Catch and re-raise
`asyncio.CancelledError` before the generic `Exception` handler, so
cancelled tasks don't attempt to write error responses for requests the
CLI has already abandoned.

The issue's suggested fix used `anyio.CancelScope`, but PR #746 replaced
the anyio TaskGroup with plain `asyncio.Task` tracking, so this fix uses
the simpler `asyncio.Task.cancel()` approach that matches the current
architecture.
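The tracking and cancellation flow described in the Fix list can be sketched roughly as follows. Only the attribute name `_inflight_requests` comes from the description; the `InflightRequests` class, its `track()` and `cancel()` methods, and the `handle_control_request()` signature are hypothetical and do not reflect the SDK's actual code.

```python
import asyncio
from typing import Any, Awaitable, Callable, Coroutine, Dict


class InflightRequests:
    """Hypothetical tracker for control request handler tasks."""

    def __init__(self) -> None:
        self._inflight_requests: Dict[str, asyncio.Task] = {}

    def track(self, request_id: str, coro: Coroutine[Any, Any, Any]) -> asyncio.Task:
        task = asyncio.create_task(coro)
        self._inflight_requests[request_id] = task

        def _forget(done: asyncio.Task) -> None:
            # Remove the entry only if it still points at this task.
            if self._inflight_requests.get(request_id) is done:
                del self._inflight_requests[request_id]

        task.add_done_callback(_forget)
        return task

    def cancel(self, request_id: str) -> bool:
        # Unknown ids are a no-op, mirroring the unit test named below.
        task = self._inflight_requests.get(request_id)
        if task is None:
            return False
        task.cancel()
        return True


async def handle_control_request(
    callback: Callable[[], Awaitable[Any]],
    write_response: Callable[[dict], Awaitable[None]],
) -> None:
    try:
        result = await callback()
    except asyncio.CancelledError:
        # The peer has abandoned this request: propagate, write nothing.
        raise
    except Exception as exc:
        await write_response({"error": str(exc)})
    else:
        await write_response({"result": result})


async def demo() -> tuple:
    tracker = InflightRequests()
    written: list = []

    async def slow_hook() -> None:
        await asyncio.sleep(3600)

    async def write_response(payload: dict) -> None:
        written.append(payload)

    task = tracker.track("req-1", handle_control_request(slow_hook, write_response))
    await asyncio.sleep(0)  # let the handler start and block in the hook
    found = tracker.cancel("req-1")
    unknown = tracker.cancel("no-such-id")
    try:
        await task
    except asyncio.CancelledError:
        pass
    await asyncio.sleep(0)  # let any remaining done-callbacks run
    return found, unknown, task.cancelled(), written, dict(tracker._inflight_requests)
```

On Python 3.8 and later, `asyncio.CancelledError` derives from `BaseException`, so the generic `except Exception` clause would not catch it in any case; the explicit clause mainly documents the intent.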

## Verification

**Unit tests (3 new):**
- `test_cancel_request_cancels_inflight_hook` — slow hook gets
cancelled, `CancelledError` raised, no response written
- `test_cancel_request_for_unknown_id_is_noop` — unknown `request_id`
doesn't raise
- `test_completed_request_is_removed_from_inflight` — completed handlers
are cleaned up from tracking dict

**End-to-end with live SDK instance:**
```
=== Structural check ===
PASS: control_cancel_request handler implemented, TODO removed
PASS: _handle_control_request re-raises CancelledError without writing
PASS: _inflight_requests dict initialized

=== Live E2E: hooks still work after fix ===
  ResultMessage: is_error=False, turns=1
  Hook called 2 times: ['Agent', 'Write']
PASS: Hooks work correctly after fix
```

**Test suite:**
- 407 tests pass (2 pre-existing trio backend failures on main)
- `ruff check` + `ruff format` clean
- `mypy src/` clean
@tjni

tjni commented Apr 2, 2026

Hi @qing-ant, is it possible to resolve this in a way that continues to use anyio in order to support trio?

Development

Successfully merging this pull request may close these issues.

RuntimeError on async generator cleanup due to task group context mismatch
