feat: configurable context budget for tool-loop iterations #2317
Conversation
Hi @95256155o, I'll need to verify this PR in detail.
Hi @95256155o, thanks for your contribution! I have a few concerns.

On the cost savings: the analysis confirms that trimming does save money — since nanobot only caches […]. Btw, for providers with implicit prefix-based caching (like DeepSeek), the savings might actually be negative — aggressive trimming can break prefix matching and reduce the cache hit rate.

But there's a bigger problem: the core assumption — that "old history is redundant after iteration 1" — is flawed. It may hold for simple linear tool chains, but for complex multi-step tasks, trimming history can cause context loss.

The risk depends on task complexity. A simple "read file → summarize" doesn't need history, but a debugging session with 20 tool calls? That's a different story.

Recommendation: the default (0 = no trimming) is safe. If users enable this, they should understand the trade-off. Maybe add a warning in the docs about when this is appropriate. What are your thoughts? @Re-bin
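A purely illustrative snippet of that prefix-caching point (not nanobot code; cache reuse is approximated here as the shared character prefix between consecutive prompts, whereas real providers match on token prefixes):

```python
def shared_prefix_len(a: str, b: str) -> int:
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def render(messages: list[str]) -> str:
    return "\n".join(messages)

prev = ["system: ...", "user: task", "tool_1 result", "tool_2 result", "tool_3 result"]
nxt_full = prev + ["tool_4 result"]      # full history: prev is a strict prefix of the next prompt
nxt_trimmed = [prev[0]] + nxt_full[-2:]  # aggressive trim: prefix diverges right after the system message

print(shared_prefix_len(render(prev), render(nxt_full)))     # large shared prefix -> good cache reuse
print(shared_prefix_len(render(prev), render(nxt_trimmed)))  # only the system message is shared
```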
Thanks for the detailed review @chengyongru! Good catch on DeepSeek's prefix caching — I'll add a note in the docs about that.

On the context loss concern: I've been running budget=1000 (aggressive) on my own instance for 2 days and haven't run into any issues so far. I can add a docs section with recommended tiers.

Happy to add a warning that users with complex multi-step debugging workflows should stick with 0 or 4000. Want me to draft that?
One more thought on the "complex multi-step tasks" concern: In practice, when a task is complex enough that the LLM genuinely needs 20+ iterations of history to stay coherent, that's already outside nanobot's sweet spot — you'd want a tool with proper context management (like Claude Code) for that. Nanobot shines at short-to-medium tasks where recent context is what matters. And for providers without implicit caching, keeping full history actually makes things worse — you're paying full price for every resent token with no speed benefit either. Trimming is arguably the more responsible default in that scenario. So the trade-off is really: trim and save cost on the tasks nanobot is good at, vs. keep full history for tasks that probably shouldn't be running in nanobot anyway.
What I mean is, for example, when I tell the agent a fact, this fact may have been trimmed before consolidation, and originally it might have been recorded in history.md or memory.md.
Good point — but consolidation doesn't run inside the tool loop, so it always sees the full history. Here's the flow:

```mermaid
sequenceDiagram
participant S as Session (full history)
participant C as Consolidation
participant L as Agent Loop
participant LLM as LLM
C->>S: Read full messages (preflight)
C->>C: Archive old messages → MEMORY.md / HISTORY.md
L->>S: Read full messages
L->>LLM: Iteration 1 (full context)
LLM-->>L: tool call
L->>LLM: Iteration 2+ (trimmed view)
Note right of LLM: Only this view is trimmed.<br/>Session is never mutated.
LLM-->>L: done
C->>S: Read full messages (background)
C->>C: Archive if needed → MEMORY.md / HISTORY.md
```

Trimming only affects the LLM view during iterations 2+. Consolidation runs before/after the loop and always reads the full canonical message list — no facts are lost.
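A rough sketch of that separation, with hypothetical names (not nanobot's actual classes): the trimmed list is a throwaway view built per iteration, so the canonical session history is never modified.

```python
from typing import Callable, List

Message = dict  # e.g. {"role": "user", "content": "..."}

def build_llm_view(
    session_messages: List[Message],      # full canonical history (never mutated)
    current_turn: List[Message],          # messages belonging to the current turn
    iteration: int,
    budget_tokens: int,
    trim: Callable[[List[Message], int], List[Message]],
) -> List[Message]:
    """Build the message list sent to the LLM for one tool-loop iteration."""
    if iteration == 1 or budget_tokens <= 0:
        old_history = list(session_messages)                  # iteration 1: full context
    else:
        old_history = trim(session_messages, budget_tokens)   # iterations 2+: trimmed copy
    # A new list is returned either way; session_messages itself is untouched,
    # so consolidation (which reads the session directly) still sees everything.
    return old_history + current_turn
```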
One more edge case I'm concerned about: a long file read in Round 1 followed by an edit in Round 2. Round 1: read a large file (big tool_result). Round 2: edit that file based on its content.
The core question: is the previous turn's tool_result considered "current turn" or "old history"? If it gets trimmed, the model may:
- not have the file content in context
- fail the edit operation
- ask the user to re-provide the file
This is more severe than "forgetting preferences" — it directly impacts task success rate. I think this might be a scenario I will encounter.
I will check this soon, many thanks ;) @chengyongru
Good question — to answer directly: yes, Round 1's tool_result is "old history" and can be trimmed in Round 2. That's by design. Here's why this is OK in practice:
That said, I'll add a docs note making it clear:
Hi @95256155o, because the nightly branch has recently been updated, you may need to rebase onto the latest nightly branch. I've decided to merge this PR and let nanobot users on the nightly branch experience it and see what they say. Thanks for your contribution!
Force-pushed from 3869740 to b738cbf
Force-pushed from b738cbf to b5db046
Thanks for pointing that out @chengyongru! The commit email was misconfigured — should be properly linked to my GitHub account now. Will rebase onto the latest nightly shortly.
Force-pushed from 56d000e to 016613d
Hi @chengyongru — rebased and cleaned up, here's a summary of what changed:

Housekeeping
Commit breakdown (4 commits on top of nightly)
What the feature does
Ready for review — happy to make any further adjustments.
OK, it's a bit late here, so I'll merge tomorrow. Then I'll add some documentation based on our previous discussion. Anyway, thank you very much for providing such valuable insights. I think this not only saves tokens but also reduces the first-token latency!
Thanks! And that's exactly the core insight — shorter context means lower TTFT, which is what users actually feel. Token savings is just a bonus. Looking forward to the merge and the docs. 🤝
- Extract trim_history_for_budget() as a pure function in helpers.py
- AgentLoop._trim_history_for_budget becomes a thin wrapper
- Add docs/CONTEXT_BUDGET.md with usage guide and trade-off notes
- Replace wrapper tests with direct helper unit tests
Force-pushed from 016613d to ed79916
* feat: add contextBudgetTokens config field for tool-loop trimming
* feat: implement _trim_history_for_budget for tool-loop cost reduction
* feat: thread contextBudgetTokens into AgentLoop constructor
* feat: wire context budget trimming into agent loop
* refactor: move trim_history_for_budget to helpers and add docs
  - Extract trim_history_for_budget() as a pure function in helpers.py
  - AgentLoop._trim_history_for_budget becomes a thin wrapper
  - Add docs/CONTEXT_BUDGET.md with usage guide and trade-off notes
  - Replace wrapper tests with direct helper unit tests

---------

Co-authored-by: chengyongru <chengyongru.ai@gmail.com>

Problem
In agent/loop.py, _run_agent_loop sends the full message list to the LLM on every tool-call iteration. For sessions with accumulated history, the redundant tokens scale linearly with both history size and iteration count.

Real-world example (20 tool iterations in 2 minutes):
The LLM needs full history on the first call to understand context. Subsequent tool iterations reference their own tool chain, not old history.
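To make the scaling concrete, a tiny back-of-the-envelope calculation with made-up numbers (these are assumptions for illustration, not the PR's measurements):

```python
# Illustrative only: redundant input tokens grow roughly as (iterations - 1) * history size,
# since iteration 1 legitimately needs the full history.
history_tokens = 8_000   # assumed size of accumulated session history
iterations = 20          # tool-call iterations in one task

redundant_tokens = (iterations - 1) * history_tokens
print(redundant_tokens)  # 152000 tokens resent without adding new information
```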
Solution
Add a configurable contextBudgetTokens option in agents.defaults. When set, tool-loop iterations (after the first) trim old session history to fit within the token budget. Current-turn messages are never trimmed.

```json
{
  "agents": {
    "defaults": {
      "contextBudgetTokens": 4000
    }
  }
}
```

- 0 (default): no trimming — current behavior, nothing breaks
- > 0: max tokens of old session history during tool iterations 2+

How it works
Build [system] + [old_history] + [current_turn], trim the oldest messages from old_history until the list fits within the budget, fix orphaned tool results via _find_legal_start, and send the trimmed view to the LLM.
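A minimal sketch of that trimming step, assuming a dict-based message format and a crude token estimate; the actual trim_history_for_budget() / _find_legal_start in the PR may differ in signatures and details:

```python
from typing import List

Message = dict  # e.g. {"role": "tool", "content": "...", "tool_call_id": "..."}

def estimate_tokens(msg: Message) -> int:
    # Rough heuristic (~4 chars per token); a real implementation would use the model's tokenizer.
    return max(1, len(str(msg.get("content", ""))) // 4)

def find_legal_start(old_history: List[Message], start: int) -> int:
    """Advance past tool results whose originating tool call was trimmed away."""
    while start < len(old_history) and old_history[start].get("role") == "tool":
        start += 1
    return start

def trim_history_for_budget(old_history: List[Message], budget_tokens: int) -> List[Message]:
    if budget_tokens <= 0:
        return list(old_history)  # 0 = feature disabled, keep everything
    total = sum(estimate_tokens(m) for m in old_history)
    start = 0
    # Drop the oldest messages until the remaining history fits the budget.
    while start < len(old_history) and total > budget_tokens:
        total -= estimate_tokens(old_history[start])
        start += 1
    # Never let the trimmed view begin with an orphaned tool result.
    start = find_legal_start(old_history, start)
    return old_history[start:]
```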
Safety

_find_legal_start prevents orphaned tool results after trimming.

Real-world results (deployed 2 days on personal instance)
With contextBudgetTokens: 1000 (aggressive):

Typical multi-tool session with budget=4000:
Log output confirms trimming is active:
Files changed
- nanobot/config/schema.py: add context_budget_tokens: int = 0 to AgentDefaults
- nanobot/cli/commands.py: thread the config value into the AgentLoop constructor
- nanobot/agent/loop.py: add _trim_history_for_budget(), wire it into _run_agent_loop, add context_budget_tokens to __init__
- tests/test_context_budget.py: unit tests for the trimming helper
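For orientation, a rough sketch of how these pieces connect; the names follow the file list above, but the exact nanobot signatures and attribute paths are assumptions:

```python
from dataclasses import dataclass

@dataclass
class AgentDefaults:                      # nanobot/config/schema.py
    context_budget_tokens: int = 0        # 0 = no trimming (default)

class AgentLoop:                          # nanobot/agent/loop.py
    def __init__(self, context_budget_tokens: int = 0):
        self.context_budget_tokens = context_budget_tokens

# nanobot/cli/commands.py threads the config value into the loop:
defaults = AgentDefaults(context_budget_tokens=4000)
loop = AgentLoop(context_budget_tokens=defaults.context_budget_tokens)
```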
Test plan

- contextBudgetTokens: 1000

What this does NOT do
- contextBudgetTokens = 0 (default): behavior is unchanged