Skip to content

feat: per-template capability sandbox (#222)#279

Merged
qhkm merged 13 commits intomainfrom
feat/template-capability-sandbox
Mar 7, 2026
Merged

feat: per-template capability sandbox (#222)#279
qhkm merged 13 commits intomainfrom
feat/template-capability-sandbox

Conversation

@qhkm
Copy link
Copy Markdown
Owner

@qhkm qhkm commented Mar 7, 2026

Summary

Adds three new optional fields to AgentTemplate for declarative per-template sandboxing:

  • shell_allowlist — restrict which shell binaries the agent can execute (Strict mode). Applied to ShellTool, CustomTool, AND PluginTool via a shared ShellSecurityConfig built once at kernel boot.
  • max_token_budget — per-agent-run token cap. Resolved via min(template, global) so templates can restrict but never expand beyond global.
  • max_tool_calls — hard cap on total tool calls across the entire agent run via a dedicated ToolCallLimitTracker (AtomicU32). Intentionally NOT wired into LoopGuard.

Also adds TOML template loading.toml files in the template directory are now parsed alongside .json.

All fields are optional with None defaults — zero behavior change for existing templates.

Key design decisions

Decision Choice
Shell enforcement scope Shared config injected into ShellTool, CustomTool, AND PluginTool
max_tool_calls enforcement Dedicated atomic counter, NOT LoopGuard (which only counts repetitions)
Token budget naming max_token_budget (not max_tokens — avoids collision with response generation field)
Template can restrict, not expand min() for numeric limits

Files changed

  • src/config/templates.rs — 3 new fields + TOML loading + 7 tests
  • src/kernel/registrar.rsbuild_shell_config() + ToolDeps.template + 4 tests
  • src/kernel/mod.rs — pass template to ToolDeps
  • src/tools/custom.rswith_security() constructor + 1 test
  • src/tools/plugin.rs — security field + validate_command() before sh -c + 1 test
  • src/agent/budget.rsresolve_token_budget() + 4 tests
  • src/agent/tool_call_limit.rs — NEW: ToolCallLimitTracker + 5 tests
  • src/agent/mod.rs — module + re-export
  • src/agent/loop.rs — tracker field + enforcement in both streaming/non-streaming loops
  • src/config/types.rsmax_tool_calls on AgentDefaults
  • src/config/mod.rs — env var override
  • src/config/validate.rs — known field
  • src/cli/common.rs — template overrides for budget + tool calls

Test plan

  • 3055 lib tests pass (22 new tests added)
  • 127 doc tests pass
  • cargo clippy -- -D warnings clean
  • cargo fmt -- --check clean
  • Shell allowlist: template with ["git"] blocks curl; empty list blocks all; None uses default
  • Token budget: min(template, global) logic verified for all 4 branches
  • Tool call limit: tracker increments per-call, stops at limit, unlimited when None

Closes #222

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Per-run tool-call limits with enforcement, truncation, logging, and final-synthesis fallback.
    • Templates can specify max token budgets, max tool calls, and shell command allowlists.
    • Token-budget resolution combines global and template overrides.
    • Shell security configs applied to tools; commands validated against allowlists.
    • Templates loadable from JSON or TOML; env var to override default max tool calls.
  • Tests

    • Comprehensive unit tests for budgeting, tool-call limits, template parsing, and security behavior.

qhkm and others added 5 commits March 7, 2026 14:20
…s fields

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…mTool, PluginTool

Build ShellSecurityConfig from template shell_allowlist and inject it
into all 3 shell execution paths. When a template defines an allowlist,
Strict mode ensures only listed binaries can execute.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add resolve_token_budget() helper that picks the effective budget from
global config and optional template override (lower-wins when both set).
Wire it into the template override block in create_agent so templates
can cap per-session token usage.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Adds ToolCallLimitTracker (AtomicU32) that enforces per-template
max_tool_calls limits. Wired into AgentLoop with enforcement in both
streaming and non-streaming tool execution loops. The tracker is
initialized from config.agents.defaults.max_tool_calls, which is set
from the template's max_tool_calls field.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Mar 7, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Threads per-template sandbox data through boot/registration and runtime: adds shell allowlists, per-template token-budget resolution, and a per-run tool-call limiter; registers tools with security configs and enforces tool-call caps and shell command validation at execution.

Changes

Cohort / File(s) Summary
Token Budget
src/agent/budget.rs
Add pub fn resolve_token_budget(global: u64, template: Option<u64>) -> u64 and unit tests to compute effective token budget between global and template overrides.
Tool Call Limiter
src/agent/tool_call_limit.rs, src/agent/mod.rs, src/agent/loop.rs
Add ToolCallLimitTracker (atomic counter + optional limit); re-export in agent mod; wire into AgentLoop constructors and tool-execution paths with pre-checks, truncation, increments, logging, and metrics.
Template Extensions & Loading
src/config/templates.rs
Extend AgentTemplate with shell_allowlist, max_token_budget, max_tool_calls; add TOML support for templates; update built-in templates and test coverage.
Config Defaults & CLI Overrides
src/config/types.rs, src/config/mod.rs, src/config/validate.rs, src/cli/common.rs
Add max_tool_calls: Option<u32> to agent defaults (serde/default/Default); env var override ZEPTOCLAW_AGENTS_DEFAULTS_MAX_TOOL_CALLS; validation acceptance; CLI wiring to apply template token and tool-call overrides.
Kernel & Registrar Wiring
src/kernel/mod.rs, src/kernel/registrar.rs
Propagate AgentTemplate via ToolDeps.template; add build_shell_config(template: Option<&AgentTemplate>) -> ShellSecurityConfig and reuse shell security config for tool registration.
Tool Security Integration
src/tools/plugin.rs, src/tools/custom.rs
Add with_security constructors and ShellSecurityConfig field; validate commands via security.validate_command() before execution; update tests.
Tests & Docs
src/config/..., src/kernel/..., src/tools/..., src/agent/...
New/updated tests for TOML loading, shell allowlists, token budget resolution, tool-call limits, and security enforcement; comments/docs updated to reflect TOML and new fields.

Sequence Diagram

sequenceDiagram
    participant CLI as CLI / Template loader
    participant Config as Config Loader
    participant Kernel as ZeptoKernel::boot
    participant Registrar as registrar::register_all_tools
    participant Tools as Tool Factory (Shell/Plugin/Custom)
    participant AgentLoop as AgentLoop runtime
    participant ToolLimit as ToolCallLimitTracker

    CLI->>Config: load template (JSON/TOML) with shell_allowlist, max_token_budget, max_tool_calls
    Config->>Kernel: provide AgentTemplate + defaults
    Kernel->>Registrar: boot(pass ToolDeps { template: AgentTemplate })
    Registrar->>Registrar: build_shell_config(template)
    Registrar->>Tools: register tools with ShellSecurityConfig
    Tools->>Tools: store security config per tool
    AgentLoop->>ToolLimit: remaining()
    alt remaining >= batch
        AgentLoop->>Tools: execute tool batch
        Tools->>Tools: security.validate_command(command)
        Tools-->>AgentLoop: results
        AgentLoop->>ToolLimit: increment(batch)
    else remaining < batch
        AgentLoop->>AgentLoop: truncate/skip tool calls, log, run final synthesis
    end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Poem

🐰 I hopped through templates, tidy and neat,
Shell rules snug, token caps light on my feet.
I counted tool calls, careful and bright,
Truncated the batch, then logged through the night.
Safe hops, small bounds — a rabbit’s delight.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'feat: per-template capability sandbox' accurately and concisely describes the main change—adding optional template fields for capability sandboxing, including shell allowlists, token budgets, and tool call limits.
Linked Issues check ✅ Passed The PR successfully implements all primary coding objectives from issue #222: shell command allowlist enforcement via ShellSecurityConfig, per-template token budget resolution, tool call limiting via ToolCallLimitTracker, TOML template loading, and integration into kernel registration and enforcement points.
Out of Scope Changes check ✅ Passed All changes are directly aligned with issue #222 requirements: template field additions (shell_allowlist, max_token_budget, max_tool_calls), enforcement mechanism implementation, TOML parsing support, and kernel/tool integration. No unrelated modifications detected.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch feat/template-capability-sandbox

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/agent/budget.rs`:
- Around line 172-180: The resolve_token_budget function treats 0 as a valid
template cap and thus allows Some(0) to disable a finite global cap; change the
branching in resolve_token_budget so that 0 always means “unlimited” (i.e., if
global == 0 return tpl, else if tpl == 0 return global, otherwise return
global.min(tpl)), and add a regression test (e.g.,
test_resolve_token_budget_template_unlimited_does_not_bypass_global) asserting
resolve_token_budget(100_000, Some(0)) == 100_000 to prevent regressions.

In `@src/agent/loop.rs`:
- Around line 397-398: The ToolCallLimitTracker is stored on AgentLoop and never
reset, causing one request to exhaust tool-call budget for subsequent runs;
remove the struct field usage and instead create a fresh local tracker at the
start of each run by instantiating
ToolCallLimitTracker::new(self.config.agents.defaults.max_tool_calls) inside
process_message() and process_message_streaming(), replace references to the old
self.tool_call_limit with the new local variable, and remove the tool_call_limit
field from AgentLoop so each message run has its own per-agent-run tracker.
- Around line 1011-1021: The current logic increments self.tool_call_limit via
self.tool_call_limit.increment(...) before executing tool calls and then breaks
if self.tool_call_limit.is_exceeded(), which makes max_tool_calls behave like
N-1 and can leave response.content empty; fix by checking the limit against the
upcoming call count before incrementing (or use a would_exceed-like check) and
only call self.tool_call_limit.increment(...) after allowing/executing the tool
calls; additionally, when you do break because the limit would be exceeded,
ensure you set or preserve response.content (e.g., set a clear “tool call limit
reached” message or retain prior assistant content) so the final assistant reply
is not empty.

In `@src/config/mod.rs`:
- Around line 200-204: The current env var parsing for
ZEPTOCLAW_AGENTS_DEFAULTS_MAX_TOOL_CALLS only accepts numeric values and ignores
empty strings, so you cannot clear an existing Option<u32>; update the logic
around reading std::env::var so that if the variable is present and
val.trim().is_empty() you set self.agents.defaults.max_tool_calls = None,
otherwise attempt val.parse::<u32>() and set Some(v) on success (and keep
current behavior for parse failures or surface/log an error as appropriate).
Target the block that reads the env var and assigns
self.agents.defaults.max_tool_calls to implement this empty-string -> None
behavior.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: b1a44f85-a4b7-42a0-a7e3-c2ba9033c509

📥 Commits

Reviewing files that changed from the base of the PR and between d8d9cb2 and 7a3bdbc.

📒 Files selected for processing (13)
  • src/agent/budget.rs
  • src/agent/loop.rs
  • src/agent/mod.rs
  • src/agent/tool_call_limit.rs
  • src/cli/common.rs
  • src/config/mod.rs
  • src/config/templates.rs
  • src/config/types.rs
  • src/config/validate.rs
  • src/kernel/mod.rs
  • src/kernel/registrar.rs
  • src/tools/custom.rs
  • src/tools/plugin.rs

Comment thread src/agent/budget.rs
Comment thread src/agent/loop.rs
Comment on lines +397 to +398
/// Per-agent-run tool call limit tracker.
tool_call_limit: ToolCallLimitTracker,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Create a fresh tool-call tracker for each message run.

The field comment says “per-agent-run”, but this tracker is initialized once on the long-lived AgentLoop and never reset. One request can exhaust the budget for every later request handled by the same agent instance.

💡 Fix direction
 pub struct AgentLoop {
     /// Per-session token budget tracker.
     token_budget: Arc<TokenBudget>,
-    /// Per-agent-run tool call limit tracker.
-    tool_call_limit: ToolCallLimitTracker,
     /// Tool approval gate for policy-based tool gating.
     approval_gate: Arc<ApprovalGate>,
let tool_call_limit = ToolCallLimitTracker::new(self.config.agents.defaults.max_tool_calls);

Instantiate that local tracker at the start of process_message() and process_message_streaming() instead of storing it on the struct.

Also applies to: 477-479, 514-516, 546-547, 582-583

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/agent/loop.rs` around lines 397 - 398, The ToolCallLimitTracker is stored
on AgentLoop and never reset, causing one request to exhaust tool-call budget
for subsequent runs; remove the struct field usage and instead create a fresh
local tracker at the start of each run by instantiating
ToolCallLimitTracker::new(self.config.agents.defaults.max_tool_calls) inside
process_message() and process_message_streaming(), replace references to the old
self.tool_call_limit with the new local variable, and remove the tool_call_limit
field from AgentLoop so each message run has its own per-agent-run tracker.

Comment thread src/agent/loop.rs Outdated
Comment thread src/config/mod.rs
Comment on lines +200 to +204
if let Ok(val) = std::env::var("ZEPTOCLAW_AGENTS_DEFAULTS_MAX_TOOL_CALLS") {
if let Ok(v) = val.parse::<u32>() {
self.agents.defaults.max_tool_calls = Some(v);
}
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Allow the env override to clear max_tool_calls as well.

This only handles numeric values, so a deployment cannot use ZEPTOCLAW_AGENTS_DEFAULTS_MAX_TOOL_CALLS to remove a cap that is already set in config.json. For an Option<u32>, an empty value should clear the override instead of being ignored.

Suggested fix
         if let Ok(val) = std::env::var("ZEPTOCLAW_AGENTS_DEFAULTS_MAX_TOOL_CALLS") {
-            if let Ok(v) = val.parse::<u32>() {
+            let val = val.trim();
+            if val.is_empty() {
+                self.agents.defaults.max_tool_calls = None;
+            } else if let Ok(v) = val.parse::<u32>() {
                 self.agents.defaults.max_tool_calls = Some(v);
             }
         }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if let Ok(val) = std::env::var("ZEPTOCLAW_AGENTS_DEFAULTS_MAX_TOOL_CALLS") {
if let Ok(v) = val.parse::<u32>() {
self.agents.defaults.max_tool_calls = Some(v);
}
}
if let Ok(val) = std::env::var("ZEPTOCLAW_AGENTS_DEFAULTS_MAX_TOOL_CALLS") {
let val = val.trim();
if val.is_empty() {
self.agents.defaults.max_tool_calls = None;
} else if let Ok(v) = val.parse::<u32>() {
self.agents.defaults.max_tool_calls = Some(v);
}
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/config/mod.rs` around lines 200 - 204, The current env var parsing for
ZEPTOCLAW_AGENTS_DEFAULTS_MAX_TOOL_CALLS only accepts numeric values and ignores
empty strings, so you cannot clear an existing Option<u32>; update the logic
around reading std::env::var so that if the variable is present and
val.trim().is_empty() you set self.agents.defaults.max_tool_calls = None,
otherwise attempt val.parse::<u32>() and set Some(v) on success (and keep
current behavior for parse failures or surface/log an error as appropriate).
Target the block that reads the env var and assigns
self.agents.defaults.max_tool_calls to implement this empty-string -> None
behavior.

…nd loop

- resolve_token_budget: treat Some(0) as None so template cannot
  disable a finite global budget (0 = unlimited in TokenBudget)
- agent loop: move tool_call_limit increment+check after execution
  in both streaming and non-streaming paths (fixes off-by-one that
  blocked the first allowed call)
- common.rs: intersect template max_tool_calls with global via min()
  instead of direct override

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (2)
src/agent/loop.rs (2)

397-398: ⚠️ Potential issue | 🔴 Critical

Instantiate the tool-call tracker per request, not on AgentLoop.

This tracker is stateful, but these lines make it live for the entire agent instance. One run can exhaust the budget for later runs, including other sessions processed by the same loop.

Possible fix direction
 pub struct AgentLoop {
     /// Per-session token budget tracker.
     token_budget: Arc<TokenBudget>,
-    /// Per-agent-run tool call limit tracker.
-    tool_call_limit: ToolCallLimitTracker,
     /// Tool approval gate for policy-based tool gating.
     approval_gate: Arc<ApprovalGate>,
-        let tool_call_limit = ToolCallLimitTracker::new(config.agents.defaults.max_tool_calls);
         let approval_gate = Arc::new(ApprovalGate::new(config.approval.clone()));
...
-            tool_call_limit,
             approval_gate,
let tool_call_limit = ToolCallLimitTracker::new(self.config.agents.defaults.max_tool_calls);

Create that local tracker at the start of process_message() and process_message_streaming(), then use the local instance throughout each run.

Also applies to: 478-479, 515-515, 546-547, 583-583

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/agent/loop.rs` around lines 397 - 398, The ToolCallLimitTracker is
currently stored on AgentLoop as the field tool_call_limit, causing its state to
persist across runs; instead, remove that field and instantiate a fresh local
tracker at the start of each run (e.g., inside process_message() and
process_message_streaming()) using
ToolCallLimitTracker::new(self.config.agents.defaults.max_tool_calls), then pass
that local tracker through the call chain for the duration of the single request
so each run gets an independent budget.

1298-1308: ⚠️ Potential issue | 🟠 Major

Don’t end the run with the tool-call turn as the final assistant reply.

When this branch trips, the loop exits before a post-tool LLM turn. The returned response.content is still from the tool-call response, which is often empty when tool_calls are present, so exact-limit runs can finish with a blank or stale answer.

Also applies to: 1785-1795

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/agent/loop.rs` around lines 1298 - 1308, The current branch increments
tool_call_limit and breaks immediately when exceeded, causing the loop to return
the last tool-call response (often empty) instead of performing the required
post-tool LLM turn; change the behavior in the tool_call_limit handling (the
block that touches self.tool_call_limit, response.tool_calls, and does the info!
log) to not break immediately but instead set a flag (e.g., stop_after_llm_turn
or pending_post_tool_llm) or adjust the loop control so the loop completes one
more iteration to run the post-tool LLM turn and produce an assistant response
derived from the LLM rather than the tool-call response; apply the same fix to
the analogous branch around the other occurrence noted (the block at 1785-1795).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@src/agent/loop.rs`:
- Around line 397-398: The ToolCallLimitTracker is currently stored on AgentLoop
as the field tool_call_limit, causing its state to persist across runs; instead,
remove that field and instantiate a fresh local tracker at the start of each run
(e.g., inside process_message() and process_message_streaming()) using
ToolCallLimitTracker::new(self.config.agents.defaults.max_tool_calls), then pass
that local tracker through the call chain for the duration of the single request
so each run gets an independent budget.
- Around line 1298-1308: The current branch increments tool_call_limit and
breaks immediately when exceeded, causing the loop to return the last tool-call
response (often empty) instead of performing the required post-tool LLM turn;
change the behavior in the tool_call_limit handling (the block that touches
self.tool_call_limit, response.tool_calls, and does the info! log) to not break
immediately but instead set a flag (e.g., stop_after_llm_turn or
pending_post_tool_llm) or adjust the loop control so the loop completes one more
iteration to run the post-tool LLM turn and produce an assistant response
derived from the LLM rather than the tool-call response; apply the same fix to
the analogous branch around the other occurrence noted (the block at 1785-1795).

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 1a59dc65-836d-4832-9a27-af9be6b1286a

📥 Commits

Reviewing files that changed from the base of the PR and between 7a3bdbc and b25c70e.

📒 Files selected for processing (3)
  • src/agent/budget.rs
  • src/agent/loop.rs
  • src/cli/common.rs

qhkm and others added 2 commits March 7, 2026 16:35
The limit check was only performed after the entire batch of tool calls
had already executed, allowing max_tool_calls=0 to still run one batch
and any budget smaller than the batch size to be overshot.

Now:
- Check is_exceeded() BEFORE building futures — breaks immediately
  if limit already reached (handles max_tool_calls=0 correctly)
- Truncate response.tool_calls to remaining() budget before building
  futures, so a batch of 10 with 3 remaining only executes 3
- Truncation happens before compute_tool_result_budget so the per-tool
  budget is computed from the actual execution count
- Add ToolCallLimitTracker::remaining() method with tests

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The limit check and batch truncation happened after the assistant
tool-call message was already written to the session transcript.
This caused:
- max_tool_calls=0 to leave an orphaned tool-call message with no
  matching results
- Partial truncation to record tool IDs that were never executed,
  producing an inconsistent transcript for the next LLM call
- metrics.record_tool_calls() to over-report the original batch size

Now in both paths (non-streaming and streaming):
1. is_exceeded() check + break — before anything is written
2. truncate() — before anything is written
3. record_tool_calls() — after truncation, reflects actual count
4. assistant message — built from the truncated tool_calls list

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (1)
src/agent/loop.rs (1)

397-398: ⚠️ Potential issue | 🔴 Critical

Per-run tracker must be instantiated per run, not stored on the struct.

The comment says "per-agent-run" but tool_call_limit is initialized once when AgentLoop is constructed and never reset. This causes one request to exhaust the tool-call budget for all subsequent requests handled by the same agent instance.

💡 Recommended fix

Remove the field from AgentLoop and instantiate a fresh tracker at the start of each process_message() and process_message_streaming():

 pub struct AgentLoop {
     // ...
-    /// Per-agent-run tool call limit tracker.
-    tool_call_limit: ToolCallLimitTracker,
     // ...
 }

Then in process_message() and process_message_streaming():

// At the start of the method, after acquiring session lock:
let tool_call_limit = ToolCallLimitTracker::new(self.config.agents.defaults.max_tool_calls);

Replace all self.tool_call_limit references with the local tool_call_limit variable.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/agent/loop.rs` around lines 397 - 398, The tool-call tracker is currently
stored on the AgentLoop struct as tool_call_limit but should be recreated per
request; remove the tool_call_limit field from the AgentLoop struct and any
initialization in its constructor, then at the start of each request-handling
method (process_message and process_message_streaming) instantiate a local
ToolCallLimitTracker (e.g. via
ToolCallLimitTracker::new(self.config.agents.defaults.max_tool_calls) after
acquiring the session lock) and replace all uses of self.tool_call_limit with
the new local tool_call_limit variable so each run gets a fresh per-run tracker.
🧹 Nitpick comments (1)
src/agent/loop.rs (1)

1321-1331: Post-execution increment is correct; consider setting response content on limit break.

The increment-after-execute pattern correctly avoids the off-by-one issue. However, when breaking due to limit, response.content may be empty (LLM responses with tool calls often have minimal content), potentially leaving the user without a meaningful final message.

Compare to the loop guard breaks at lines 1335-1345 which explicitly set response.content before breaking.

💡 Optional: Set informative content on limit break
 if self.tool_call_limit.is_exceeded() {
     info!(
         count = self.tool_call_limit.count(),
         limit = ?self.tool_call_limit.limit(),
         "Tool call limit reached, stopping tool execution"
     );
+    if response.content.is_empty() {
+        response.content = "Tool call limit reached.".to_string();
+    }
     break;
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/agent/loop.rs` around lines 1321 - 1331, The tool call limit is
incremented after executing tools (self.tool_call_limit.increment(...)) which is
correct, but when is_exceeded() triggers the loop break the response may lack
user-facing content; update the break path to set a meaningful response.content
(similar to the other loop guard breaks) before breaking—e.g., assign a concise
informative message to response.content indicating the tool call limit was
reached and include debugging info (self.tool_call_limit.count() /
self.tool_call_limit.limit()) so callers of the function receive a final message
when the loop exits due to tool_call_limit.is_exceeded().
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@src/agent/loop.rs`:
- Around line 397-398: The tool-call tracker is currently stored on the
AgentLoop struct as tool_call_limit but should be recreated per request; remove
the tool_call_limit field from the AgentLoop struct and any initialization in
its constructor, then at the start of each request-handling method
(process_message and process_message_streaming) instantiate a local
ToolCallLimitTracker (e.g. via
ToolCallLimitTracker::new(self.config.agents.defaults.max_tool_calls) after
acquiring the session lock) and replace all uses of self.tool_call_limit with
the new local tool_call_limit variable so each run gets a fresh per-run tracker.

---

Nitpick comments:
In `@src/agent/loop.rs`:
- Around line 1321-1331: The tool call limit is incremented after executing
tools (self.tool_call_limit.increment(...)) which is correct, but when
is_exceeded() triggers the loop break the response may lack user-facing content;
update the break path to set a meaningful response.content (similar to the other
loop guard breaks) before breaking—e.g., assign a concise informative message to
response.content indicating the tool call limit was reached and include
debugging info (self.tool_call_limit.count() / self.tool_call_limit.limit()) so
callers of the function receive a final message when the loop exits due to
tool_call_limit.is_exceeded().

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 48c26f3d-b43f-4aa7-9341-dd86f28766e5

📥 Commits

Reviewing files that changed from the base of the PR and between b25c70e and c4aa4dd.

📒 Files selected for processing (2)
  • src/agent/loop.rs
  • src/agent/tool_call_limit.rs
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/agent/tool_call_limit.rs

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (2)
src/agent/loop.rs (2)

1012-1018: ⚠️ Potential issue | 🟠 Major

Return a fallback assistant message when the cap aborts the tool loop.

If the limit is already exhausted here, this breaks without updating response.content. On tool-call turns that content is often empty, so the caller can get a blank final assistant reply. Line 1565 in the streaming path has the same gap.

Suggested fix
-            if self.tool_call_limit.is_exceeded() {
+            if self.tool_call_limit.is_exceeded() {
+                if let Some(limit) = self.tool_call_limit.limit() {
+                    response.content = format!(
+                        "Stopped tool execution after reaching the max_tool_calls limit ({}).",
+                        limit
+                    );
+                }
                 info!(
                     count = self.tool_call_limit.count(),
                     limit = ?self.tool_call_limit.limit(),
                     "Tool call limit already reached, skipping tool execution"
                 );
                 break;
             }

Apply the same change to the streaming branch before its break.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/agent/loop.rs` around lines 1012 - 1018, When the tool-call cap causes
the loop to abort (the branch that checks self.tool_call_limit.is_exceeded()),
ensure you set a fallback assistant message into response.content before
breaking so callers do not receive an empty final assistant reply; locate the
non-streaming tool loop where the code currently logs and breaks and assign a
suitable assistant content string (e.g., a short "Tool call limit reached,
aborting" assistant message) to response.content, and mirror the exact same
assignment in the streaming branch (the analogous branch around the streaming
path) before its break so both paths return the fallback content.

397-398: ⚠️ Potential issue | 🔴 Critical

Reset the tool-call tracker per request, not per AgentLoop.

This counter now lives for the entire agent instance, so one request can exhaust max_tool_calls for every later request handled by the same AgentLoop. That contradicts the "per-agent-run" contract.

Suggested fix
 pub struct AgentLoop {
     /// Per-session token budget tracker.
     token_budget: Arc<TokenBudget>,
-    /// Per-agent-run tool call limit tracker.
-    tool_call_limit: ToolCallLimitTracker,
     /// Tool approval gate for policy-based tool gating.
     approval_gate: Arc<ApprovalGate>,
 pub async fn process_message(&self, msg: &InboundMessage) -> Result<String> {
+    let tool_call_limit = ToolCallLimitTracker::new(self.config.agents.defaults.max_tool_calls);
     // Acquire a per-session lock to serialize concurrent messages for the
     // same session key. Different sessions can still proceed concurrently.
 pub async fn process_message_streaming(
     &self,
     msg: &InboundMessage,
 ) -> Result<tokio::sync::mpsc::Receiver<crate::providers::StreamEvent>> {
+    let tool_call_limit = ToolCallLimitTracker::new(self.config.agents.defaults.max_tool_calls);
     use crate::providers::StreamEvent;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/agent/loop.rs` around lines 397 - 398, The ToolCallLimitTracker currently
stored as the AgentLoop struct field (tool_call_limit: ToolCallLimitTracker)
causes the counter to persist across multiple requests; change the lifecycle so
the tracker is created or reset at the start of each agent run/request instead
of living on AgentLoop. Specifically, remove or stop using the persistent
tool_call_limit field for per-request counting and instead instantiate a fresh
ToolCallLimitTracker (or call its reset method) inside the entry point that
handles a single run/request (e.g., AgentLoop::run or the request-handling
method), and wire that per-run tracker into any functions that reference
tool_call_limit so each request starts with a fresh counter.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@src/agent/loop.rs`:
- Around line 1012-1018: When the tool-call cap causes the loop to abort (the
branch that checks self.tool_call_limit.is_exceeded()), ensure you set a
fallback assistant message into response.content before breaking so callers do
not receive an empty final assistant reply; locate the non-streaming tool loop
where the code currently logs and breaks and assign a suitable assistant content
string (e.g., a short "Tool call limit reached, aborting" assistant message) to
response.content, and mirror the exact same assignment in the streaming branch
(the analogous branch around the streaming path) before its break so both paths
return the fallback content.
- Around line 397-398: The ToolCallLimitTracker currently stored as the
AgentLoop struct field (tool_call_limit: ToolCallLimitTracker) causes the
counter to persist across multiple requests; change the lifecycle so the tracker
is created or reset at the start of each agent run/request instead of living on
AgentLoop. Specifically, remove or stop using the persistent tool_call_limit
field for per-request counting and instead instantiate a fresh
ToolCallLimitTracker (or call its reset method) inside the entry point that
handles a single run/request (e.g., AgentLoop::run or the request-handling
method), and wire that per-run tracker into any functions that reference
tool_call_limit so each request starts with a fresh counter.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 1e8de97e-8224-411d-bc1f-e6a8bc594812

📥 Commits

Reviewing files that changed from the base of the PR and between c4aa4dd and f45ef38.

📒 Files selected for processing (1)
  • src/agent/loop.rs

qhkm and others added 5 commits March 7, 2026 16:44
…atch

When the tool call limit was reached after executing the last allowed
batch, the loop broke immediately and returned the stale tool-call stub
content from the previous LLM response. The executed tool results were
in the session but the user got the assistant's earlier text instead of
a final answer derived from those results.

Non-streaming path: now makes one final LLM call with empty tool
definitions (vec![]) so the model is forced to produce a text synthesis
of the tool results before returning.

Streaming path: clears response.tool_calls before breaking so the
post-loop code enters the streaming final call branch, which re-issues
the full conversation (with tool results in session) as a proper
streamed response.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…oken budget

Two issues in the tool-call-limit post-batch path:

1. Streaming: the post-loop final streaming branch rebuilt full tool
   definitions and passed them to chat_stream, allowing the model to
   emit ToolCalls after the limit was supposedly enforced. Now tracks
   tool_limit_hit flag and passes vec![] when the cap was reached.

2. Non-streaming: the synthesis LLM call bypassed the token budget
   gate that guards regular loop iterations. Now checks
   token_budget.is_exceeded() before issuing the synthesis call;
   if over budget, returns a descriptive message instead.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add MockBatchToolProvider and 4 e2e tests for per-template
max_tool_calls enforcement:
- zero budget blocks all tool execution
- exact budget allows full batch then synthesizes
- over-budget batch gets truncated
- unlimited (None) allows normal flow

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Two review fixes:

1. (High) ToolCallLimitTracker persisted across process_message() calls,
   so once any conversation exhausted max_tool_calls the agent was
   permanently blocked. Add reset() and call it at the top of both
   process_message and process_message_streaming.

2. (Medium) GitTool shelled out via Command::new("git") bypassing
   ShellSecurityConfig, so templates with shell_allowlist=[] could still
   run git commands. Thread ShellSecurityConfig into GitTool and validate
   before execution.

Tests: ToolCallLimitTracker::reset unit test, 3 git allowlist unit tests,
       e2e regression test for counter reset between runs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
TokenBudget persisted across process_message() calls on the same
AgentLoop, so once any conversation exhausted max_token_budget, later
runs on the same agent were permanently blocked.

Add token_budget.reset() alongside tool_call_limit.reset() at the top
of both process_message and process_message_streaming.

Regression test: test_token_budget_resets_between_runs verifies the
second run on the same agent succeeds after the first exhausts the budget.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@qhkm qhkm merged commit 5244281 into main Mar 7, 2026
9 checks passed
@qhkm qhkm deleted the feat/template-capability-sandbox branch March 7, 2026 17:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

feat: per-template capability sandbox (tools, shell patterns, resource limits)

1 participant