
feat: configurable context budget for tool-loop iterations #2317

Merged
chengyongru merged 5 commits into HKUDS:nightly from 95256155o:feat/context-budget-tokens on Mar 23, 2026

Conversation

@95256155o
Contributor

Problem

In agent/loop.py, _run_agent_loop sends the full message list to the LLM on every tool-call iteration. For sessions with accumulated history, the redundant tokens scale linearly with both history size and iteration count.

Real-world example (20 tool iterations in 2 minutes):

  • Input tokens per call: 21,187 → 34,067 (climbing ~600/iteration)
  • ~13,000 tokens of old session history resent identically every call
  • Total input: ~609,000 tokens ≈ $1.50 on Claude Sonnet via OpenRouter
  • 95%+ of spend is redundant history resends

The LLM needs full history on the first call to understand context. Subsequent tool iterations reference their own tool chain, not old history.

Solution

Add a configurable contextBudgetTokens option in agents.defaults. When set, tool-loop iterations (after the first) trim old session history to fit within the token budget. Current-turn messages are never trimmed.

```json
{
  "agents": {
    "defaults": {
      "contextBudgetTokens": 4000
    }
  }
}
```
  • 0 (default): no trimming — current behavior, nothing breaks
  • > 0: max tokens of old session history during tool iterations 2+
  • Minimum floor: 500 tokens (values 1-499 clamped up)
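
A minimal sketch of these semantics (the helper name is hypothetical; the PR stores the raw value as context_budget_tokens):

```python
def effective_budget(context_budget_tokens: int) -> int:
    """Hypothetical illustration of the semantics above, not the PR's code."""
    if context_budget_tokens <= 0:
        return 0                             # 0 (default): trimming disabled
    return max(context_budget_tokens, 500)   # values 1-499 clamp up to the floor
```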

How it works

  1. Iteration 1: Full context, no trimming — LLM sees everything
  2. Iteration 2+: Split messages into [system] + [old_history] + [current_turn]; trim the oldest messages from old_history until it fits within the budget; fix orphaned tool results via _find_legal_start; send the trimmed view to the LLM (see the sketch after this list)
  3. The canonical message list is never mutated — full history is preserved for session persistence
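
To make the flow concrete, here is a self-contained sketch of the trimming step. The message shape, the 4-characters-per-token estimate, and the exact splitting logic are assumptions for illustration; the real implementation is _trim_history_for_budget in nanobot/agent/loop.py.

```python
from typing import Dict, List

Message = Dict[str, str]  # illustrative shape, e.g. {"role": "user", "content": "..."}

def estimate_tokens(msg: Message) -> int:
    # Crude stand-in for a real tokenizer: roughly 4 characters per token.
    return max(1, len(msg.get("content", "")) // 4)

def find_legal_start(history: List[Message]) -> int:
    # A trimmed view must not begin with tool results whose originating
    # tool call was cut away; skip past any leading orphans.
    i = 0
    while i < len(history) and history[i]["role"] == "tool":
        i += 1
    return i

def trim_history_for_budget(messages: List[Message],
                            current_turn_start: int,
                            budget: int) -> List[Message]:
    """Build the trimmed view for iterations 2+; `messages` is never mutated."""
    system = messages[:1]                          # system prompt: never trimmed
    old_history = messages[1:current_turn_start]   # only this part may shrink
    current_turn = messages[current_turn_start:]   # never trimmed

    # Drop the oldest history messages until the remainder fits the budget.
    while old_history and sum(map(estimate_tokens, old_history)) > budget:
        old_history = old_history[1:]

    old_history = old_history[find_legal_start(old_history):]
    return system + old_history + current_turn
```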

Safety

  • System prompt: never trimmed
  • Current turn (user message + all tool calls/results): never trimmed
  • Tool-call boundaries: _find_legal_start prevents orphaned tool results after trimming
  • Session persistence: unaffected — full history always saved
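
To illustrate the tool-call boundary bullet, here is the find_legal_start sketch from above applied to a history whose leading tool results were orphaned by trimming (message contents are invented):

```python
# Trimming cut away the assistant message that issued these tool calls, so the
# view would otherwise start with orphaned tool results; find_legal_start skips them.
history = [
    {"role": "tool", "content": "result of a tool call that was trimmed away"},
    {"role": "tool", "content": "another orphaned result"},
    {"role": "assistant", "content": "Given those results, the next step is..."},
]
print(find_legal_start(history))  # -> 2: the view starts at the assistant message
```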

Real-world results (deployed 2 days on personal instance)

With contextBudgetTokens: 1000 (aggressive):

| Metric | Before | After |
| --- | --- | --- |
| Input tokens (multi-tool call) | 34K | 6K |
| Reduction | | ~82% |

Typical multi-tool session with budget=4000:

| Metric | Before | After (budget=4000) |
| --- | --- | --- |
| Call 1 input | 21,187 | 21,187 (unchanged) |
| Call 20 input | 34,067 | ~16,000 |
| Total input (20 iterations) | ~609,000 | ~280,000 |
| Savings | | ~53% |

Log output confirms trimming is active:

```
Context budget: trimmed 62 history messages (17482 tokens) for iteration 7
Context budget: trimmed 99 history messages (39593 tokens) for iteration 15
Context budget: trimmed 135 history messages (57280 tokens) for iteration 18
```

Files changed

| File | Change |
| --- | --- |
| nanobot/config/schema.py | Add context_budget_tokens: int = 0 to AgentDefaults |
| nanobot/cli/commands.py | Thread config into AgentLoop constructor |
| nanobot/agent/loop.py | Add _trim_history_for_budget(), wire into _run_agent_loop, add context_budget_tokens to __init__ |
| tests/test_context_budget.py | 11 unit tests covering no-op, trim, boundaries, floor, mutation safety |
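
For flavor, two hypothetical tests in the spirit of tests/test_context_budget.py, exercising the sketch above (the PR's actual tests may differ):

```python
def _msg(role: str, content: str) -> dict:
    return {"role": role, "content": content}

def test_large_budget_is_noop():
    msgs = [_msg("system", "sys"), _msg("user", "hi"), _msg("user", "current turn")]
    # A budget larger than the history leaves the view identical to the input.
    assert trim_history_for_budget(msgs, current_turn_start=2, budget=10**9) == msgs

def test_canonical_list_is_not_mutated():
    msgs = ([_msg("system", "sys")]
            + [_msg("user", "x" * 400) for _ in range(10)]
            + [_msg("user", "current turn")])
    before = [dict(m) for m in msgs]
    trim_history_for_budget(msgs, current_turn_start=11, budget=100)
    assert msgs == before  # trimming returns a new view; the session list is intact
```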

Test plan

  • 11/11 unit tests pass
  • 2-day soak on personal instance with contextBudgetTokens: 1000
  • No loss of context quality in multi-turn conversations
  • Bot correctly handles long tool chains (18+ iterations)
  • Session persistence unaffected

What this does NOT do

  • Does not change behavior when contextBudgetTokens = 0 (default)
  • Does not change the first LLM call (always full context)
  • Does not change session persistence or memory consolidation
  • Does not affect subagents (fresh sessions)

@chengyongru
Collaborator

Hi @95256155o

I'll need to verify this PR in detail

@chengyongru
Collaborator

Hi @95256155o, thanks for your contribution!

I have a few concerns:

On the cost savings: The analysis confirms that trimming does save money — since nanobot only caches system and tools, not history messages, trimming doesn't hurt caching at all. So the cost benefit is real.

Btw, for providers with implicit prefix-based caching (like DeepSeek), cost savings might actually be negative — aggressive trimming can break prefix matching and reduce cache hit rate.

But there's a bigger problem: The core assumption — that "old history is redundant after iteration 1" — is flawed. For simple linear tool chains it might work, but for complex multi-step tasks, trimming history can cause:

  • LLM to lose track of original intent
  • Inconsistent responses across turns
  • Repetition of failed approaches

The risk depends on task complexity. A simple "read file → summarize" doesn't need history. But a debugging session with 20 tool calls? That's a different story.

Recommendation: The default (0 = no trimming) is safe. If users enable this, they should understand the trade-off. Maybe add a warning in the docs about when this is appropriate.

What are your thoughts? @Re-bin

@95256155o
Contributor Author

Thanks for the detailed review @chengyongru!

Good catch on DeepSeek's prefix caching — I'll add a note in the docs about that.

On the context loss concern: I've been running budget=1000 (aggressive) on my own instance for 2 days and haven't noticed quality degradation. Most tool-loop iterations are fairly linear (read → process → act), so the current turn's tool results carry enough context without needing old history. That said, I agree it's worth documenting the trade-off.

I can add a docs section with recommended tiers:

  • 0 — no trimming (default, safest)
  • 4000 — conservative, barely trims in practice
  • 1000 — aggressive, significant savings, works well for typical tasks
  • Lower values are technically possible but offer diminishing returns, since 1000 already captures most of the savings

Happy to add a warning that users with complex multi-step debugging workflows should stick with 0 or 4000. Want me to push that as a follow-up commit?

@95256155o
Contributor Author

One more thought on the "complex multi-step tasks" concern:

In practice, when a task is complex enough that the LLM genuinely needs 20+ iterations of history to stay coherent, that's already outside nanobot's sweet spot — you'd want a tool with proper context management (like Claude Code) for that. Nanobot shines at short-to-medium tasks where recent context is what matters.

And for providers without implicit caching, keeping full history actually makes things worse — you're paying full price for every resent token with no speed benefit either. Trimming is arguably the more responsible default in that scenario.

So the trade-off is really: trim and save cost on the tasks nanobot is good at, vs. keep full history for tasks that probably shouldn't be running in nanobot anyway.

@chengyongru
Collaborator

What I mean is: when I tell the agent a fact, that fact may get trimmed before consolidation runs, even though it would originally have been recorded in history.md or memory.md.

@95256155o
Contributor Author

Good point — but consolidation doesn't run inside the tool loop, so it always sees full history. Here's the flow:

```mermaid
sequenceDiagram
    participant S as Session (full history)
    participant C as Consolidation
    participant L as Agent Loop
    participant LLM as LLM

    C->>S: Read full messages (preflight)
    C->>C: Archive old messages → MEMORY.md / HISTORY.md

    L->>S: Read full messages
    L->>LLM: Iteration 1 (full context)
    LLM-->>L: tool call
    L->>LLM: Iteration 2+ (trimmed view)
    Note right of LLM: Only this view is trimmed.<br/>Session is never mutated.
    LLM-->>L: done

    C->>S: Read full messages (background)
    C->>C: Archive if needed → MEMORY.md / HISTORY.md
```

Trimming only affects the LLM view during iterations 2+. Consolidation runs before/after the loop and always reads the full canonical message list — no facts are lost.

@chengyongru
Collaborator

One more edge case I'm concerned about:

Long file read in Round 1 + edit in Round 2:

Round 1:

  • User: "reade a.py" (10,00 lines)
  • Agent: reads file, returns content (very long)
    Round 2:
  • User: "edit line 100 "
  • If Round 1's tool_result was in old_history and got trimmed:
    • Agent may not remember the file content from Round 1
    • Edit operation failure rate could increase

The core question: Is the previous turn's tool_result considered "current turn" or "old history"?

If contextBudgetTokens is set too aggressively and Round 1's tool_result (the long file content) gets trimmed, the agent in Round 2 might:

  • Not have the file content in context
  • Fail the edit operation
  • Ask the user to re-provide the file

This is more severe than "forgetting preferences" — it directly impacts task success rate.

I think this might be a scenario I will encounter.

@Re-bin
Collaborator

Re-bin commented Mar 21, 2026

> (quoting @chengyongru's review comment above in full)

I will check this soon, many thanks ;) @chengyongru

@95256155o
Contributor Author

Good question — to answer directly: yes, Round 1's tool_result is "old history" and can be trimmed in Round 2. That's by design.

Here's why this is OK in practice:

  1. Nanobot agents have tool access. If the agent needs file content to perform an edit, it will re-read the file. It's not a human trying to remember what they saw — it's an LLM with tools. A missing context → re-read is a 1-tool-call cost, not a failure.

  2. The scenario itself is unusual for nanobot. Reading a 10,000-line file and then editing line 100 across separate rounds is a heavy IDE workflow — that's Claude Code / Cursor territory. Nanobot's strength is lightweight, fast, agent-dispatched tasks.

  3. Every context management system has this trade-off. Claude Code's own auto-compact drops old content too. The question isn't "can context be lost" — it always can — it's whether the system can recover gracefully. With tool access, it can.

That said, I'll add a docs note making it clear:

  • contextBudgetTokens trades old-history visibility for cost savings
  • For workflows that rely on large tool results from previous rounds, use 0 or a high budget like 4000
  • Agent tool access provides a natural recovery path when context is trimmed

@chengyongru
Collaborator

hi @95256155o

Because the nightly branch has recently been updated, you may need to rebase onto the latest nightly branch.

I have decided to merge this PR and let nanobot users on the nightly branch try it out and see what they say.

Thanks for your contribution!

@95256155o force-pushed the feat/context-budget-tokens branch from 3869740 to b738cbf (March 22, 2026 17:23)
@chengyongru
Collaborator

Also, you may need to fix your commit author email to protect your contribution! It looks like the email isn't linked to your GitHub account.

@95256155o force-pushed the feat/context-budget-tokens branch from b738cbf to b5db046 (March 22, 2026 17:40)
@95256155o
Contributor Author

Thanks for pointing that out @chengyongru! The commit email was misconfigured — should be properly linked to my GitHub account now. Will rebase onto the latest nightly shortly.

@95256155o force-pushed the feat/context-budget-tokens branch 2 times, most recently from 56d000e to 016613d (March 22, 2026 17:46)
@95256155o
Contributor Author

Hi @chengyongru — rebased and cleaned up, here's a summary of what changed:

Housekeeping

  • Fixed commit author email (was 95256155o@github, now correctly linked to GitHub account)
  • Rebased onto latest upstream nightly (resolved conflicts in schema.py and loop.py — upstream had removed memory_window and refactored AgentLoop.__init__, both cleanly integrated)

Commit breakdown (4 commits on top of nightly)

  1. feat: add contextBudgetTokens config field — adds context_budget_tokens: int = 0 to AgentDefaults in schema.py
  2. feat: implement _trim_history_for_budget — core trimming logic in loop.py + tests in tests/test_context_budget.py
  3. feat: thread contextBudgetTokens into AgentLoop constructor — wires the config value through config/loader.py into AgentLoop.__init__
  4. feat: wire context budget trimming into agent loop — calls _trim_history_for_budget inside _run_agent_loop on iterations ≥ 2

What the feature does

contextBudgetTokens (default 0 = disabled) caps how many tokens of old session history are sent to the LLM during tool-loop iterations 2+. The current turn is never trimmed. Orphaned tool results at the trim boundary are automatically removed to avoid provider errors.

Ready for review — happy to make any further adjustments.

@chengyongru
Collaborator

Ok, it's a bit late here, I will merge tomorrow. Then I will add some documentation based on our previous discussion.

Anyway, thank you very much for providing such valuable insights. I think this not only saves tokens but also reduces first-token latency!

@95256155o
Contributor Author

Thanks! And that's exactly the core insight — shorter context means lower TTFT, which is what users actually feel. Token savings is just a bonus.

Looking forward to the merge and the docs. 🤝

95256155o and others added 2 commits March 23, 2026 17:23
- Extract trim_history_for_budget() as a pure function in helpers.py
- AgentLoop._trim_history_for_budget becomes a thin wrapper
- Add docs/CONTEXT_BUDGET.md with usage guide and trade-off notes
- Replace wrapper tests with direct helper unit tests
@chengyongru force-pushed the feat/context-budget-tokens branch from 016613d to ed79916 (March 23, 2026 10:07)

@chengyongru left a comment


LGTM

@chengyongru merged commit 528b3cf into HKUDS:nightly Mar 23, 2026
3 checks passed
xzq-xu pushed a commit to xzq-xu/nanobot that referenced this pull request Mar 26, 2026
* feat: add contextBudgetTokens config field for tool-loop trimming

* feat: implement _trim_history_for_budget for tool-loop cost reduction

* feat: thread contextBudgetTokens into AgentLoop constructor

* feat: wire context budget trimming into agent loop

* refactor: move trim_history_for_budget to helpers and add docs

- Extract trim_history_for_budget() as a pure function in helpers.py
- AgentLoop._trim_history_for_budget becomes a thin wrapper
- Add docs/CONTEXT_BUDGET.md with usage guide and trade-off notes
- Replace wrapper tests with direct helper unit tests

---------

Co-authored-by: chengyongru <chengyongru.ai@gmail.com>

Labels

enhancement (New feature or request), to-nightly