
feat: spawn_status/spawn_cancel tools, domain loop detection, hardened cursor recovery #3303

Open

MuataSr wants to merge 4 commits into HKUDS:main from MuataSr:feat/spawn-tools-and-domain-loop-detection
Conversation

Contributor

@MuataSr MuataSr commented Apr 19, 2026

Summary

Four targeted improvements to agent orchestration and safety:

1. spawn_status and spawn_cancel tools

  • SpawnStatusTool: List running/done subagent tasks with task ID, label, elapsed time, and status
  • SpawnCancelTool: Cancel a running subagent by task ID
  • New methods on SubagentManager: cancel_by_id, get_task_status, get_all_status, _format_status
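The status-tracking side of this could look roughly like the following sketch. The method names mirror the PR description, but the internal fields (`_task_statuses`, the `TaskStatus` shape) are assumptions, not the actual `SubagentManager` code:

```python
# Hypothetical sketch of the status methods described above; field names
# and the TaskStatus shape are assumptions.
import time
from dataclasses import dataclass


@dataclass
class TaskStatus:
    task_id: str
    label: str
    started_at: float
    phase: str = "running"  # "running" or "done"


class SubagentManager:
    def __init__(self):
        self._task_statuses: dict[str, TaskStatus] = {}

    def get_task_status(self, task_id: str):
        return self._task_statuses.get(task_id)

    def get_all_status(self) -> list[TaskStatus]:
        return list(self._task_statuses.values())

    def _format_status(self, s: TaskStatus) -> str:
        # render one line per task: ID, label, phase, elapsed time
        elapsed = time.monotonic() - s.started_at
        return f"{s.task_id} [{s.label}] {s.phase} ({elapsed:.0f}s)"
```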

2. max_iterations and timeout_seconds on spawn

  • SpawnTool.execute now accepts max_iterations (default 15) and timeout_seconds (default 300)
  • Passed through to SubagentManager.spawn and enforced via asyncio.wait_for
  • Lets the main agent allocate iteration budget based on task complexity
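A minimal sketch of the timeout half of this, assuming enforcement via `asyncio.wait_for` as the bullet says; the wrapper function name is invented here and the `max_iterations` budgeting is omitted:

```python
# Sketch only: run_subagent is a stand-in, not SubagentManager.spawn.
import asyncio


async def run_subagent(coro_fn, timeout_seconds: float = 300):
    try:
        # wait_for cancels the subagent if it exceeds the budget
        return await asyncio.wait_for(coro_fn(), timeout=timeout_seconds)
    except asyncio.TimeoutError:
        return "error: subagent timed out"


async def slow_task():
    await asyncio.sleep(10)
    return "done"
```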

3. Same-domain loop detection

  • Upstream blocks repeated exact-URL lookups but not repeated fetches from the same domain
  • Adds _MAX_REPEAT_SAME_DOMAIN = 3 threshold with domain extraction via urlparse
  • Blocks the fourth and subsequent fetches from the same site, catching loops that vary the URL path but keep hitting one domain
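In shape, the check described above amounts to something like this sketch; the threshold name matches the PR, but the surrounding tracker class is an assumption:

```python
# Sketch of a per-domain fetch counter; DomainTracker is hypothetical.
from urllib.parse import urlparse

_MAX_REPEAT_SAME_DOMAIN = 3


class DomainTracker:
    def __init__(self):
        self._counts: dict[str, int] = {}

    def should_block(self, url: str) -> bool:
        domain = urlparse(url).netloc.lower()
        self._counts[domain] = self._counts.get(domain, 0) + 1
        # allow up to 3 fetches per domain; block the 4th and later
        return self._counts[domain] > _MAX_REPEAT_SAME_DOMAIN
```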

4. Hardened cursor recovery (memory.py)

  • Upstream uses e.get('cursor', 0), which passes poisoned string cursors through as truthy values
  • Explicit isinstance(c, int) guard in both _next_cursor fallback and read_unprocessed_history
  • Prevents string cursors from corrupting session state (proven fix from Apr 2026 outage)
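The guard described above reduces to something like this sketch; the history-entry shape (`{"cursor": ...}`) and the standalone function are assumptions about memory.py, not its actual code:

```python
# Sketch of the isinstance guard: skip non-int cursors instead of
# letting a poisoned string win a truthiness check.
def next_cursor(entries: list[dict]) -> int:
    best = 0
    for e in entries:
        c = e.get("cursor", 0)
        # bool is a subclass of int, so exclude it explicitly
        if isinstance(c, int) and not isinstance(c, bool) and c > best:
            best = c
    return best + 1
```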

Files changed (7 files, +218 -35)

  • nanobot/agent/tools/spawn_status.py (new)
  • nanobot/agent/tools/spawn_cancel.py (new)
  • nanobot/agent/subagent.py
  • nanobot/agent/loop.py
  • nanobot/agent/tools/spawn.py
  • nanobot/utils/runtime.py
  • nanobot/agent/memory.py

Testing

  • All imports verified passing
  • Local patches running in production since April 2026
  • Memory cursor fix proven by surviving the Apr 2026 history poisoning outage

Replaces

MuataSr added 4 commits April 19, 2026 03:54
- Add SpawnStatusTool: list running/done subagent tasks with ID, label, elapsed time
- Add SpawnCancelTool: cancel a running subagent by task ID
- Add cancel_by_id(), get_task_status(), get_all_status(), _format_status() to SubagentManager
- Add max_iterations and timeout_seconds params to SpawnTool.execute()
- Wire new tools into AgentLoop tool registration

Upstream blocks repeated exact-URL lookups but not repeated fetches from the same domain. Add _MAX_REPEAT_SAME_DOMAIN=3 threshold and domain extraction via urlparse. Blocks after 4 fetches from the same site, catching agent loops that vary the URL path but keep hitting one domain.

Upstream uses e.get('cursor', 0) which passes string cursors (e.g. 'abc') as truthy values. Our fix explicitly checks isinstance(c, int) in both _next_cursor fallback and read_unprocessed_history, preventing poisoned string cursors from corrupting session state.

tool_parameters decorator wraps all params as StringSchema, so values arrive as strings (e.g. '5' not 5). The runner compares these against int thresholds, causing TypeError: '<=' not supported between instances of 'str' and 'int'. Add int() coercion before passing to manager.
Collaborator

@Re-bin Re-bin left a comment

Thanks for bundling these — the individual ideas (spawn status/cancel, iteration budget, cursor poisoning guard) are all reasonable. But the PR as-is has enough execution issues that I'd ask for changes before we merge any of it.

Bugs and blockers

  1. nanobot/utils/runtime.py declares _MAX_REPEAT_SAME_DOMAIN = 3 twice on consecutive lines. Drop the duplicate.

  2. spawn.py declares max_iterations and timeout_seconds as StringSchema with a comment saying tool_parameters wraps everything as StringSchema — coerce to int. That's not correct — nanobot/agent/tools/schema.py:54 defines IntegerSchema. Using StringSchema tells strict-mode providers to emit strings and makes int(max_iterations) crash with ValueError if the model emits anything non-numeric. Please use IntegerSchema and drop the runtime coerce.

  3. spawn_status.py and spawn_cancel.py declare their parameters with the old hand-rolled parameters property dict, while spawn.py in the same PR uses @tool_parameters(tool_parameters_schema(...)). The rest of the codebase has standardized on the decorator DSL — please convert these two new tools over for consistency.

  4. memory.py _next_cursor now scans every history entry on every call (O(n)). The old fast path read the last entry (O(1)). The poisoning fix is correct in intent — the right shape is to keep the fast path and add isinstance(c, int) there, and only fall back to the full scan when the last entry's cursor isn't an int.
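One way to get the shape the reviewer asks for, under the same assumed entry format; this is a stand-in for `_next_cursor`, not the actual memory.py code:

```python
# Fast path reads only the last entry; the full scan runs only when
# that entry's cursor is poisoned.
def next_cursor(entries: list[dict]) -> int:
    if entries:
        c = entries[-1].get("cursor", 0)
        if isinstance(c, int) and not isinstance(c, bool):
            return c + 1  # O(1) fast path: last entry is well-formed
    # slow path: scan for the max valid cursor among all entries
    best = 0
    for e in entries:
        c = e.get("cursor", 0)
        if isinstance(c, int) and not isinstance(c, bool) and c > best:
            best = c
    return best + 1
```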

  5. asyncio.wait_for around self.runner.run(...) will raise TimeoutError without setting status.phase = "done" or cleaning up _running_tasks/_task_statuses for the timed-out task. Please add explicit cleanup so timed-out subagents don't show as "running" forever in spawn_status.
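The cleanup being requested could be sketched as follows; the field names mirror the review comment, while the class and method here are stand-ins for the real manager:

```python
# Hypothetical sketch: clean up _running_tasks/_task_statuses whether
# the subagent finishes, fails, or times out.
import asyncio


class Manager:
    def __init__(self):
        self._running_tasks = {}
        self._task_statuses = {}

    async def run_with_timeout(self, task_id: str, coro, timeout: float):
        task = asyncio.ensure_future(coro)
        self._running_tasks[task_id] = task
        self._task_statuses[task_id] = {"phase": "running"}
        try:
            return await asyncio.wait_for(task, timeout=timeout)
        except asyncio.TimeoutError:
            # mark it so spawn_status never shows a dead task as running
            self._task_statuses[task_id]["phase"] = "error"
            return f"error: {task_id} timed out"
        finally:
            self._running_tasks.pop(task_id, None)
            if self._task_statuses[task_id]["phase"] == "running":
                self._task_statuses[task_id]["phase"] = "done"
```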

Testing

223 lines of new behavior across subagent lifecycle, history cursor recovery, and same-domain blocking, with zero tests. "Running in production" isn't a test. Please add at least:

  • one test for cursor recovery: write a history entry with a non-int cursor, verify _next_cursor doesn't crash and returns a valid int
  • one test for cancel_by_id: spawn → cancel → verify _running_tasks and _task_statuses are cleaned up
  • one test for timeout: spawn with 1s timeout on a task that sleeps 5s, verify cleanup happens and status ends as error
  • one test for domain detection: 4 different URLs from the same domain → 4th is blocked
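For instance, the cancel test could take roughly this shape; `MiniManager` is a stand-in for `SubagentManager`, whose actual API is assumed:

```python
# Sketch of the spawn -> cancel -> verify-cleanup test requested above.
import asyncio


class MiniManager:
    def __init__(self):
        self._running_tasks = {}
        self._task_statuses = {}

    def spawn(self, task_id, coro):
        task = asyncio.ensure_future(coro)
        self._running_tasks[task_id] = task
        self._task_statuses[task_id] = "running"
        return task

    def cancel_by_id(self, task_id) -> bool:
        task = self._running_tasks.pop(task_id, None)
        if task is None:
            return False
        task.cancel()
        self._task_statuses.pop(task_id, None)
        return True


async def run_cancel_test():
    m = MiniManager()
    m.spawn("t1", asyncio.sleep(60))
    assert m.cancel_by_id("t1") is True
    await asyncio.sleep(0)  # let cancellation propagate
    assert "t1" not in m._running_tasks
    assert "t1" not in m._task_statuses


asyncio.run(run_cancel_test())
```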

Scope

These are actually four unrelated improvements (spawn tools, iteration/timeout budget, domain loop detection, cursor recovery). Merging them as one makes any partial revert painful. If you can split into separate PRs I can review each one faster; the cursor recovery one especially deserves to land on its own since it's a reliability fix with a clear story.

Ergonomics

_MAX_REPEAT_SAME_DOMAIN = 3 is hardcoded and not configurable. Legitimate research flows often fetch 4+ pages from one site (blog archives, docs). Please expose this as a config knob with a more permissive default (I'd suggest 10) so the protection doesn't fire on legitimate usage.
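The knob could be as small as a config field replacing the module constant; the field name and config class here are hypothetical:

```python
# Hypothetical config knob; 10 is the reviewer-suggested default.
from dataclasses import dataclass


@dataclass
class RuntimeConfig:
    max_repeat_same_domain: int = 10


def should_block(domain_count: int, cfg: RuntimeConfig) -> bool:
    # block only once the count exceeds the configured threshold
    return domain_count > cfg.max_repeat_same_domain
```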

Also worth closing #3172 with a note pointing at this PR once the scope is settled.

Contributor Author

MuataSr commented Apr 20, 2026

Thanks for the thorough review, @Re-bin — the feedback was spot-on.

I've split this PR into separate focused PRs as you suggested. First one is up:

PR #3340 — Cursor recovery (the highest-priority reliability fix with a clear production incident behind it)

It addresses your specific feedback:

  • Keeps the O(1) fast path plus isinstance guard, only falls back to full scan on corruption
  • read_unprocessed_history skips non-int cursors instead of crashing
  • 7 regression tests included

The remaining three PRs (spawn status/cancel with decorator DSL, iteration/timeout budget with IntegerSchema, domain loop detection with configurable threshold) will follow.

Also closing #3172 — cursor poisoning was the blocking issue reported there and PR #3340 fixes it.

Collaborator

Re-bin commented Apr 21, 2026

#3340 is merged. Thanks!
