fix: handle CancelledError in MCP tool calls to prevent process crash #1728

Merged
Re-bin merged 1 commit into HKUDS:main from pixan-ai:main
Mar 8, 2026

pixan-ai commented Mar 8, 2026

What

Add proper exception handling for CancelledError and general Exception
in MCPToolWrapper.execute().

Why

MCP SDK uses anyio cancel scopes internally. When a tool call times out or
fails, anyio can produce a CancelledError that leaks through asyncio.wait_for.

Since Python 3.8, asyncio.CancelledError inherits from BaseException rather than
Exception, so it escapes both MCPToolWrapper.execute() (which only caught
TimeoutError) and ToolRegistry.execute() (which only catches Exception). It then
propagates up to the agent loop dispatch and crashes the process.
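
To illustrate the escape path described above, here is a minimal sketch. The names `leaky_tool` and `registry_execute` are hypothetical stand-ins, not the project's actual functions; the raised CancelledError simulates what an anyio cancel scope might leak.

```python
import asyncio


async def leaky_tool() -> str:
    # Hypothetical stand-in for an MCP call whose internals leak CancelledError.
    raise asyncio.CancelledError()


async def registry_execute() -> str:
    # Mirrors a handler that only catches Exception.
    try:
        return await leaky_tool()
    except Exception as exc:
        return f"Error: {exc}"


async def main() -> str:
    try:
        return await registry_execute()
    except asyncio.CancelledError:
        # Reached because CancelledError is not an Exception subclass.
        return "escaped"


result = asyncio.run(main())
print(result)
```

Without the outer handler in `main`, the CancelledError would keep propagating to whatever awaits the call.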

The existing close_mcp() method already acknowledges this MCP SDK behavior
by catching BaseExceptionGroup.

How

  • Catch asyncio.CancelledError: use task.cancelling() (Python 3.11+) to
    distinguish genuine cancellation (/stop) from MCP SDK internal cancellation.
    Re-raise for /stop, return graceful error otherwise.
  • Catch Exception: return error message to LLM instead of crashing.

Testing

  1. Configure an MCP server with a slow tool
  2. Set a short toolTimeout
  3. Trigger the tool → should return graceful error instead of crashing
  4. Use /stop during MCP tool execution → should still cancel properly

Related: #1055

Acknowledgment

Fix identified and developed with assistance from Claude Opus 4.6

MCP SDK's anyio cancel scopes can leak CancelledError on timeout or
failure paths. Since CancelledError is a BaseException (not Exception),
it escapes both MCPToolWrapper.execute() and ToolRegistry.execute(),
crashing the agent loop.

Now catches CancelledError and returns a graceful error to the LLM,
while still re-raising genuine task cancellations from /stop.
Also catches general Exception for other MCP failures (connection
drops, invalid responses, etc.).

Related: HKUDS#1055
Re-bin merged commit 4e197dc into HKUDS:main Mar 8, 2026
linziyanleo added a commit to linziyanleo/nanobot__ava that referenced this pull request Mar 12, 2026
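Merge 165 upstream commits (52 files, +3288/-377 lines), resolving conflicts in 6 files:
- agent/loop.py: keep the fork's multi-model/tool extensions, adopt upstream variable naming conventions
- channels/telegram.py: merge the fork's proxy optimizations with upstream streaming/table/stop/topic
- cli/commands.py: merge the fork's console server with upstream Windows/Azure/multi-instance
- config/schema.py: merge both sides' Config extensions (fork compression/heartbeat + upstream Discord/Azure)
- providers/registry.py: keep the fork's zenmux/yunwu and bring in the upstream Azure provider
- skills/memory/SKILL.md: keep the fork's three-dimensional memory architecture

Key upstream changes merged:
- Security fix HKUDS#1677 allowlist bypass
- Stability fix HKUDS#1728 MCP CancelledError
- MCP SSE transport HKUDS#1488
- Azure OpenAI provider HKUDS#1618
- Automatic type coercion for tool parameters HKUDS#1610
- read_file size limit HKUDS#1511

Adapt tests to match the fork's custom behavior (206 tests passed).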
robottwo pushed a commit to robottwo/nanobot that referenced this pull request Mar 13, 2026
sorker pushed a commit to sorker/nanobot that referenced this pull request Mar 24, 2026
Wattysaid pushed a commit to kieran-assistant/nanobot that referenced this pull request Apr 3, 2026