
feat: configurable context budget for tool-loop iterations #2317

Merged
chengyongru merged 5 commits into HKUDS:nightly from 95256155o:feat/context-budget-tokens on Mar 23, 2026

Conversation

@95256155o
Contributor

Problem

In agent/loop.py, _run_agent_loop sends the full message list to the LLM on every tool-call iteration. For sessions with accumulated history, the redundant tokens scale linearly with both history size and iteration count.

Real-world example (20 tool iterations in 2 minutes):

  • Input tokens per call: 21,187 → 34,067 (climbing ~600/iteration)
  • ~13,000 tokens of old session history resent identically every call
  • Total input: ~609,000 tokens ≈ $1.50 on Claude Sonnet via OpenRouter
  • 95%+ of spend is redundant history resends

The LLM needs full history on the first call to understand context. Subsequent tool iterations reference their own tool chain, not old history.

Solution

Add a configurable contextBudgetTokens option in agents.defaults. When set, tool-loop iterations (after the first) trim old session history to fit within the token budget. Current-turn messages are never trimmed.

```json
{
  "agents": {
    "defaults": {
      "contextBudgetTokens": 4000
    }
  }
}
```
  • 0 (default): no trimming — current behavior, nothing breaks
  • > 0: max tokens of old session history during tool iterations 2+
  • Minimum floor: 500 tokens (values 1-499 clamped up)
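
A minimal sketch of these semantics (the helper name is hypothetical; the PR stores the raw value as context_budget_tokens):

```python
def effective_budget(context_budget_tokens: int) -> int:
    """Hypothetical illustration of the semantics above, not the PR's code."""
    if context_budget_tokens <= 0:
        return 0                             # 0 (default): trimming disabled
    return max(context_budget_tokens, 500)   # values 1-499 clamp up to the floor
```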

How it works

  1. Iteration 1: Full context, no trimming — LLM sees everything
  2. Iteration 2+: Split messages into [system] + [old_history] + [current_turn]; trim the oldest messages from old_history until it fits within the budget; fix orphaned tool results via _find_legal_start; send the trimmed view to the LLM (see the sketch after this list)
  3. The canonical message list is never mutated — full history is preserved for session persistence
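
To make the flow concrete, here is a self-contained sketch of the trimming step. The message shape, the 4-characters-per-token estimate, and the exact splitting logic are assumptions for illustration; the real implementation is _trim_history_for_budget in nanobot/agent/loop.py.

```python
from typing import Dict, List

Message = Dict[str, str]  # illustrative shape, e.g. {"role": "user", "content": "..."}

def estimate_tokens(msg: Message) -> int:
    # Crude stand-in for a real tokenizer: roughly 4 characters per token.
    return max(1, len(msg.get("content", "")) // 4)

def find_legal_start(history: List[Message]) -> int:
    # A trimmed view must not begin with tool results whose originating
    # tool call was cut away; skip past any leading orphans.
    i = 0
    while i < len(history) and history[i]["role"] == "tool":
        i += 1
    return i

def trim_history_for_budget(messages: List[Message],
                            current_turn_start: int,
                            budget: int) -> List[Message]:
    """Build the trimmed view for iterations 2+; `messages` is never mutated."""
    system = messages[:1]                          # system prompt: never trimmed
    old_history = messages[1:current_turn_start]   # only this part may shrink
    current_turn = messages[current_turn_start:]   # never trimmed

    # Drop the oldest history messages until the remainder fits the budget.
    while old_history and sum(map(estimate_tokens, old_history)) > budget:
        old_history = old_history[1:]

    old_history = old_history[find_legal_start(old_history):]
    return system + old_history + current_turn
```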

Safety

  • System prompt: never trimmed
  • Current turn (user message + all tool calls/results): never trimmed
  • Tool-call boundaries: _find_legal_start prevents orphaned tool results after trimming
  • Session persistence: unaffected — full history always saved
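
To illustrate the tool-call boundary bullet, here is the find_legal_start sketch from above applied to a history whose leading tool results were orphaned by trimming (message contents are invented):

```python
# Trimming cut away the assistant message that issued these tool calls, so the
# view would otherwise start with orphaned tool results; find_legal_start skips them.
history = [
    {"role": "tool", "content": "result of a tool call that was trimmed away"},
    {"role": "tool", "content": "another orphaned result"},
    {"role": "assistant", "content": "Given those results, the next step is..."},
]
print(find_legal_start(history))  # -> 2: the view starts at the assistant message
```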

Real-world results (deployed 2 days on personal instance)

With contextBudgetTokens: 1000 (aggressive):

| Metric | Before | After |
| --- | --- | --- |
| Input tokens (multi-tool call) | 34K | 6K |
| Reduction | | ~82% |

Typical multi-tool session with budget=4000:

| Metric | Before | After (budget=4000) |
| --- | --- | --- |
| Call 1 input | 21,187 | 21,187 (unchanged) |
| Call 20 input | 34,067 | ~16,000 |
| Total input (20 iterations) | ~609,000 | ~280,000 |
| Savings | | ~53% |

Log output confirms trimming is active:

```
Context budget: trimmed 62 history messages (17482 tokens) for iteration 7
Context budget: trimmed 99 history messages (39593 tokens) for iteration 15
Context budget: trimmed 135 history messages (57280 tokens) for iteration 18
```

Files changed

| File | Change |
| --- | --- |
| nanobot/config/schema.py | Add context_budget_tokens: int = 0 to AgentDefaults |
| nanobot/cli/commands.py | Thread config into AgentLoop constructor |
| nanobot/agent/loop.py | Add _trim_history_for_budget(), wire into _run_agent_loop, add context_budget_tokens to __init__ |
| tests/test_context_budget.py | 11 unit tests covering no-op, trim, boundaries, floor, mutation safety |
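
For flavor, two hypothetical tests in the spirit of tests/test_context_budget.py, exercising the sketch above (the PR's actual tests may differ):

```python
def _msg(role: str, content: str) -> dict:
    return {"role": role, "content": content}

def test_large_budget_is_noop():
    msgs = [_msg("system", "sys"), _msg("user", "hi"), _msg("user", "current turn")]
    # A budget larger than the history leaves the view identical to the input.
    assert trim_history_for_budget(msgs, current_turn_start=2, budget=10**9) == msgs

def test_canonical_list_is_not_mutated():
    msgs = ([_msg("system", "sys")]
            + [_msg("user", "x" * 400) for _ in range(10)]
            + [_msg("user", "current turn")])
    before = [dict(m) for m in msgs]
    trim_history_for_budget(msgs, current_turn_start=11, budget=100)
    assert msgs == before  # trimming returns a new view; the session list is intact
```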

Test plan

  • 11/11 unit tests pass
  • 2-day soak on personal instance with contextBudgetTokens: 1000
  • No loss of context quality in multi-turn conversations
  • Bot correctly handles long tool chains (18+ iterations)
  • Session persistence unaffected

What this does NOT do

  • Does not change behavior when contextBudgetTokens = 0 (default)
  • Does not change the first LLM call (always full context)
  • Does not change session persistence or memory consolidation
  • Does not affect subagents (fresh sessions)

@chengyongru
Collaborator

Hi @95256155o

I'll need to verify this PR in detail

@chengyongru
Collaborator

Hi @95256155o, thanks for your contribution!

I have a few concerns:

On the cost savings: The analysis confirms that trimming does save money — since nanobot only caches system and tools, not history messages, trimming doesn't hurt caching at all. So the cost benefit is real.

Btw, for providers with implicit prefix-based caching (like DeepSeek), cost savings might actually be negative — aggressive trimming can break prefix matching and reduce cache hit rate.

But there's a bigger problem: The core assumption — that "old history is redundant after iteration 1" — is flawed. For simple linear tool chains it might work, but for complex multi-step tasks, trimming history can cause:

  • LLM to lose track of original intent
  • Inconsistent responses across turns
  • Repetition of failed approaches

The risk depends on task complexity. A simple "read file → summarize" doesn't need history. But a debugging session with 20 tool calls? That's a different story.

Recommendation: The default (0 = no trimming) is safe. If users enable this, they should understand the trade-off. Maybe add a warning in the docs about when this is appropriate.

What are your thoughts? @Re-bin

@95256155o
Contributor Author

Thanks for the detailed review @chengyongru!

Good catch on DeepSeek's prefix caching — I'll add a note in the docs about that.

On the context loss concern: I've been running budget=1000 (aggressive) on my own instance for 2 days and haven't noticed quality degradation. Most tool-loop iterations are fairly linear (read → process → act), so the current turn's tool results carry enough context without needing old history. That said, I agree it's worth documenting the trade-off.

I can add a docs section with recommended tiers:

  • 0 — no trimming (default, safest)
  • 4000 — conservative, barely trims in practice
  • 1000 — aggressive, significant savings, works well for typical tasks
  • Lower values are technically possible but offer diminishing returns, since 1000 already captures most of the savings

Happy to add a warning that users with complex multi-step debugging workflows should stick with 0 or 4000. Want me to push that as a follow-up commit?

@95256155o
Contributor Author

One more thought on the "complex multi-step tasks" concern:

In practice, when a task is complex enough that the LLM genuinely needs 20+ iterations of history to stay coherent, that's already outside nanobot's sweet spot — you'd want a tool with proper context management (like Claude Code) for that. Nanobot shines at short-to-medium tasks where recent context is what matters.

And for providers without implicit caching, keeping full history actually makes things worse — you're paying full price for every resent token with no speed benefit either. Trimming is arguably the more responsible default in that scenario.

So the trade-off is really: trim and save cost on the tasks nanobot is good at, vs. keep full history for tasks that probably shouldn't be running in nanobot anyway.

@chengyongru
Collaborator

What I mean is: when I tell the agent a fact, that fact may get trimmed before consolidation runs, even though it would originally have been recorded in history.md or memory.md.

@95256155o
Contributor Author

Good point — but consolidation doesn't run inside the tool loop, so it always sees full history. Here's the flow:

```mermaid
sequenceDiagram
    participant S as Session (full history)
    participant C as Consolidation
    participant L as Agent Loop
    participant LLM as LLM

    C->>S: Read full messages (preflight)
    C->>C: Archive old messages → MEMORY.md / HISTORY.md

    L->>S: Read full messages
    L->>LLM: Iteration 1 (full context)
    LLM-->>L: tool call
    L->>LLM: Iteration 2+ (trimmed view)
    Note right of LLM: Only this view is trimmed.<br/>Session is never mutated.
    LLM-->>L: done

    C->>S: Read full messages (background)
    C->>C: Archive if needed → MEMORY.md / HISTORY.md
```

Trimming only affects the LLM view during iterations 2+. Consolidation runs before/after the loop and always reads the full canonical message list — no facts are lost.

@chengyongru
Collaborator

One more edge case I'm concerned about:

Long file read in Round 1 + edit in Round 2:

Round 1:

  • User: "reade a.py" (10,00 lines)
  • Agent: reads file, returns content (very long)
    Round 2:
  • User: "edit line 100 "
  • If Round 1's tool_result was in old_history and got trimmed:
    • Agent may not remember the file content from Round 1
    • Edit operation failure rate could increase

The core question: Is the previous turn's tool_result considered "current turn" or "old history"?

If contextBudgetTokens is set too aggressively and Round 1's tool_result (the long file content) gets trimmed, the agent in Round 2 might:

  • Not have the file content in context
  • Fail the edit operation
  • Ask the user to re-provide the file

This is more severe than "forgetting preferences" — it directly impacts task success rate.

I think this might be a scenario I will encounter.

@Re-bin
Collaborator

Re-bin commented Mar 21, 2026

> (quoting @chengyongru's review comment above in full)

I will check this soon, many thanks ;) @chengyongru

@95256155o
Contributor Author

Good question — to answer directly: yes, Round 1's tool_result is "old history" and can be trimmed in Round 2. That's by design.

Here's why this is OK in practice:

  1. Nanobot agents have tool access. If the agent needs file content to perform an edit, it will re-read the file. It's not a human trying to remember what they saw — it's an LLM with tools. A missing context → re-read is a 1-tool-call cost, not a failure.

  2. The scenario itself is unusual for nanobot. Reading a 10,000-line file and then editing line 100 across separate rounds is a heavy IDE workflow — that's Claude Code / Cursor territory. Nanobot's strength is lightweight, fast, agent-dispatched tasks.

  3. Every context management system has this trade-off. Claude Code's own auto-compact drops old content too. The question isn't "can context be lost" — it always can — it's whether the system can recover gracefully. With tool access, it can.

That said, I'll add a docs note making it clear:

  • contextBudgetTokens trades old-history visibility for cost savings
  • For workflows that rely on large tool results from previous rounds, use 0 or a high budget like 4000
  • Agent tool access provides a natural recovery path when context is trimmed

@chengyongru
Collaborator

hi @95256155o

Because the nightly branch has recently been updated, you may need to rebase onto the latest nightly branch.

I have decided to merge this PR and let nanobot users on the nightly branch try it out and see what they say.

Thanks for your contribution!

@95256155o force-pushed the feat/context-budget-tokens branch from 3869740 to b738cbf (March 22, 2026 17:23)
@chengyongru
Collaborator

Also, you may need to fix your commit author email to protect your contribution! It looks like the email isn't linked to your GitHub account.

@95256155o force-pushed the feat/context-budget-tokens branch from b738cbf to b5db046 (March 22, 2026 17:40)
@95256155o
Contributor Author

Thanks for pointing that out @chengyongru! The commit email was misconfigured — should be properly linked to my GitHub account now. Will rebase onto the latest nightly shortly.

@95256155o force-pushed the feat/context-budget-tokens branch 2 times, most recently from 56d000e to 016613d (March 22, 2026 17:46)
@95256155o
Contributor Author

Hi @chengyongru — rebased and cleaned up, here's a summary of what changed:

Housekeeping

  • Fixed commit author email (was 95256155o@github, now correctly linked to GitHub account)
  • Rebased onto latest upstream nightly (resolved conflicts in schema.py and loop.py — upstream had removed memory_window and refactored AgentLoop.__init__, both cleanly integrated)

Commit breakdown (4 commits on top of nightly)

  1. feat: add contextBudgetTokens config field — adds context_budget_tokens: int = 0 to AgentDefaults in schema.py
  2. feat: implement _trim_history_for_budget — core trimming logic in loop.py + tests in tests/test_context_budget.py
  3. feat: thread contextBudgetTokens into AgentLoop constructor — wires the config value through config/loader.py into AgentLoop.__init__
  4. feat: wire context budget trimming into agent loop — calls _trim_history_for_budget inside _run_agent_loop on iterations ≥ 2

What the feature does

contextBudgetTokens (default 0 = disabled) caps how many tokens of old session history are sent to the LLM during tool-loop iterations 2+. The current turn is never trimmed. Orphaned tool results at the trim boundary are automatically removed to avoid provider errors.

Ready for review — happy to make any further adjustments.

@chengyongru
Collaborator

Ok, it's a bit late here, I will merge tomorrow. Then I will add some documentation based on our previous discussion.

Anyway, thank you very much for providing such valuable insights. I think this not only saves tokens but also reduces first-token latency!

@95256155o
Contributor Author

Thanks! And that's exactly the core insight — shorter context means lower TTFT, which is what users actually feel. Token savings is just a bonus.

Looking forward to the merge and the docs. 🤝

95256155o and others added 2 commits March 23, 2026 17:23
- Extract trim_history_for_budget() as a pure function in helpers.py
- AgentLoop._trim_history_for_budget becomes a thin wrapper
- Add docs/CONTEXT_BUDGET.md with usage guide and trade-off notes
- Replace wrapper tests with direct helper unit tests
@chengyongru force-pushed the feat/context-budget-tokens branch from 016613d to ed79916 (March 23, 2026 10:07)

@chengyongru left a comment


LGTM

@chengyongru merged commit 528b3cf into HKUDS:nightly Mar 23, 2026
3 checks passed
xzq-xu pushed a commit to xzq-xu/nanobot that referenced this pull request Mar 26, 2026
* feat: add contextBudgetTokens config field for tool-loop trimming

* feat: implement _trim_history_for_budget for tool-loop cost reduction

* feat: thread contextBudgetTokens into AgentLoop constructor

* feat: wire context budget trimming into agent loop

* refactor: move trim_history_for_budget to helpers and add docs

- Extract trim_history_for_budget() as a pure function in helpers.py
- AgentLoop._trim_history_for_budget becomes a thin wrapper
- Add docs/CONTEXT_BUDGET.md with usage guide and trade-off notes
- Replace wrapper tests with direct helper unit tests

---------

Co-authored-by: chengyongru <chengyongru.ai@gmail.com>

Labels

enhancement (New feature or request), to-nightly