Skip to content

refactor(kernel): extract thin kernel from create_agent_with_template()#224

Merged
qhkm merged 2 commits into
mainfrom
refactor/thin-kernel
Mar 3, 2026
Merged

refactor(kernel): extract thin kernel from create_agent_with_template()#224
qhkm merged 2 commits into
mainfrom
refactor/thin-kernel

Conversation

@qhkm
Copy link
Copy Markdown
Owner

@qhkm qhkm commented Mar 3, 2026

Summary

  • Extract ~700 lines of subsystem assembly from create_agent_with_template() into ZeptoKernel::boot() — a single entry point that assembles provider chain, tools, safety, metrics, hooks, memory, runtime, and cron
  • Add src/kernel/ module with provider.rs (provider chain assembly), registrar.rs (30+ tool registrations with ToolFilter), gate.rs (tool execution entry point), and mod.rs (ZeptoKernel struct + boot)
  • Add ToolRegistry::merge() for bulk kernel→agent tool transfer and AgentLoop::merge_kernel_tools()/set_provider_arc() for kernel integration
  • Net effect: cli/common.rs shrinks from ~1300 lines to ~65 lines of diff; new features (feat: MCP server mode — expose tools to Claude Desktop, VS Code, Cursor #217-feat: per-template capability sandbox (tools, shell patterns, resource limits) #222) now wire into kernel/registrar.rs instead of the monolithic function

Closes #223

Test plan

  • All 2817 lib tests pass
  • All 92 bin tests pass
  • cargo clippy -- -D warnings clean
  • cargo fmt -- --check clean
  • CI passes

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Added a centralized kernel that boots shared subsystems (providers, tools, safety, metrics, memory) and exposes unified tool execution with safety checks and metrics.
    • Bulk merge of pre-assembled tools and MCP clients into agents for simpler initialization.
  • Improvements

    • Provider chain assembly moved to kernel for clearer runtime/provider behavior and retry/quota handling.
    • Tool registration and filtering centralized for consistent enablement and lifecycle.
    • Made home-path expansion function publicly accessible.

Replace ~700 lines of subsystem assembly in create_agent_with_template()
with ZeptoKernel::boot() — a single entry point that assembles provider
chain, tools, safety, metrics, hooks, memory, runtime, and cron.

- Add src/kernel/ module (mod.rs, provider.rs, registrar.rs, gate.rs)
- Extract provider chain assembly to kernel/provider.rs
- Extract 30+ tool registrations to kernel/registrar.rs with ToolFilter
- Add ToolRegistry::merge() for bulk kernel→agent tool transfer
- Add AgentLoop::merge_kernel_tools() and set_provider_arc()
- Make config::expand_home() public for registrar use
- Reduce cli/common.rs from ~1300 to ~65 lines of changes

Closes #223

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Mar 3, 2026

📝 Walkthrough

Walkthrough

Extracts a new ZeptoKernel module that centralizes provider chain assembly, tool registration, gated tool execution, and subsystem orchestration; refactors CLI agent boot to use the kernel, adds AgentLoop helpers for provider/tool merges, exposes a ToolRegistry merge, and makes a small config function public.

Changes

Cohort / File(s) Summary
Kernel core
src/lib.rs, src/kernel/mod.rs
Adds kernel module and ZeptoKernel with boot, accessors (tool_definitions, provider) and shutdown, assembling provider, tools, safety, metrics, hooks, cron, MCP clients, and optional LTM.
Provider chain
src/kernel/provider.rs
New provider chain builder: build_provider_chain, runtime selection mapping, OAuth refresh, fallback chaining, retry/quota wrappers, and related helpers + tests.
Tool registrar
src/kernel/registrar.rs
Adds ToolFilter, ToolDeps, and register_all_tools to register many tool groups, discover/init MCP servers, return MCP clients; includes filter tests.
Tool execution gate
src/kernel/gate.rs
Adds execute_tool that performs optional input/output safety checks, invokes tools from a registry, records duration and metrics, and returns guarded outputs; includes unit tests.
Agent loop
src/agent/loop.rs
Adds pub async fn set_provider_arc(&self, provider: Arc<dyn LLMProvider>) and pub async fn merge_kernel_tools(&self, registry: ToolRegistry, mcp_clients: Vec<Arc<...::McpClient>>) for kernel-to-agent integration.
CLI refactor
src/cli/common.rs
Replaces in-file provider & tool construction with kernel boot usage; delegates memory, provider chain, tool registration, and tool enabling logic to kernel.
Registry & config
src/tools/registry.rs, src/config/mod.rs
Adds ToolRegistry::merge to consume and extend another registry; makes expand_home public.

Sequence Diagram(s)

sequenceDiagram
    participant Config
    participant Bus
    participant ZeptoKernel
    participant ProviderBuilder as Provider Chain
    participant Registrar as Tool Registrar
    participant ToolRegistry
    participant MCPClients
    participant Safety
    participant Metrics

    Config->>ZeptoKernel: boot(config, bus, template, hand)
    ZeptoKernel->>ProviderBuilder: build_provider_chain(config)
    ProviderBuilder-->>ZeptoKernel: Arc<LLMProvider>, provider_names
    ZeptoKernel->>Registrar: register_all_tools(registry, config, filter, deps)
    Registrar->>ToolRegistry: insert tools
    Registrar-->>MCPClients: Vec<Arc<McpClient>>
    ZeptoKernel->>Safety: configure from config
    ZeptoKernel->>Metrics: create collector
    ZeptoKernel-->>Bus: return ZeptoKernel{provider, tools, safety, metrics, mcp_clients}
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Poem

🐰 I stitched a kernel, snug and bright,
Tools marched in rows, providers alight,
Safety guarded each hop and twirl,
Metrics chimed like a springtime whirl,
Together we boot—the rabbit and the kernel.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The PR title clearly describes the main refactoring objective—extracting kernel subsystems from create_agent_with_template() into a new kernel module.
Linked Issues check ✅ Passed The PR fully implements the coding objectives from issue #223: Phase 1 provider chain extraction (kernel/provider.rs), Phase 2 tool registrar with ToolFilter and register_all_tools(), plus kernel entry point assembly and agent integration.
Out of Scope Changes check ✅ Passed All changes are directly aligned with issue #223 objectives: kernel module extraction, provider chain assembly, tool registration infrastructure, and necessary supporting changes (ToolRegistry.merge, AgentLoop methods, expand_home visibility).
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
  • 📝 Generate docstrings (stacked PR)
  • 📝 Generate docstrings (commit on current branch)
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch refactor/thin-kernel

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (3)
src/kernel/mod.rs (1)

190-193: Log MCP shutdown failures instead of discarding them.

Suppressing shutdown errors makes teardown issues hard to diagnose in production.

Proposed fix
     pub async fn shutdown(&self) {
         for client in &self.mcp_clients {
-            let _ = client.shutdown().await;
+            if let Err(e) = client.shutdown().await {
+                warn!(
+                    server = %client.server_name(),
+                    error = %e,
+                    "Failed to shut down MCP client"
+                );
+            }
         }
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/kernel/mod.rs` around lines 190 - 193, The shutdown method currently
discards results from client.shutdown().await; change it to inspect the Result
and log any Err so failures during MCP teardown are visible. In the impl for
shutdown (the async fn shutdown on the type with self.mcp_clients), await each
client.shutdown(), match or if let Err(err) = ... and call your project's
logging facility (e.g., log::error! or tracing::error!) with context like "MCP
client shutdown failed" and include the err and an identifier for the client if
available. Ensure the loop continues to attempt shutting down remaining clients
after logging.
src/kernel/registrar.rs (2)

372-376: Empty feature-gated block is a placeholder — consider removing or implementing.

The #[cfg(feature = "google")] block at lines 372-376 contains only a comment about deferred implementation. This could be confusing for future maintainers.

Either remove the block entirely or add a TODO issue to track the implementation.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/kernel/registrar.rs` around lines 372 - 376, The feature-gated empty
block guarded by #[cfg(feature = "google")] around the
filter.is_enabled("google") check is a placeholder and should be removed or
turned into an explicit tracked TODO; either delete the entire block to avoid
confusion, or replace the comment with a concise TODO that references a
task/issue ID and a short note that Google Workspace OAuth token resolution is
deferred to the caller (so maintainers can track implementation) — locate the
block around filter.is_enabled("google") in registrar.rs and apply one of those
two options.

259-297: Consider graceful degradation for misconfigured web search providers.

Lines 266 and 280 return hard errors (Err(anyhow::anyhow!(...))) when required config is missing for SearXNG/Brave. This could prevent entire agent startup if a user partially configures web search.

Consider warning and skipping the tool instead, similar to other optional tools:

♻️ Suggested change for graceful degradation
         "searxng" => {
-            let url = search_cfg
+            let url = match search_cfg
                 .api_url
                 .as_deref()
                 .map(str::trim)
                 .filter(|s| !s.is_empty())
-                .ok_or_else(|| {
-                    anyhow::anyhow!("SearXNG provider requires tools.web.search.api_url")
-                })?;
-            registry.register(Box::new(crate::tools::SearxngSearchTool::with_max_results(
-                url, max,
-            )?));
-            info!("Registered web_search tool (SearXNG)");
+            {
+                Some(url) => url,
+                None => {
+                    warn!("SearXNG provider requires tools.web.search.api_url, skipping");
+                    return Ok(mcp_clients);
+                }
+            };
+            match crate::tools::SearxngSearchTool::with_max_results(url, max) {
+                Ok(tool) => {
+                    registry.register(Box::new(tool));
+                    info!("Registered web_search tool (SearXNG)");
+                }
+                Err(e) => warn!("Failed to initialize SearXNG search tool: {}", e),
+            }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/kernel/registrar.rs` around lines 259 - 297, The match arms for "searxng"
and "brave" currently return hard Err(...) when required config is missing;
change them to gracefully skip registration instead: for Searxng and Brave,
check api_url/api_key as an Option (instead of ok_or_else) and if missing or
empty call warn! with a clear message (mentioning tools.web.search.api_url or
tools.web.search.api_key) and simply continue without registering the tool,
leaving the ddg fallback behavior unchanged; update the registry.register calls
in the "searxng" and "brave" arms to only run when the config value is present
and valid (use the same construction as the current with_max_results
invocations) and ensure info! is only logged when registration actually occurs.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/kernel/gate.rs`:
- Around line 50-51: The call to registry.execute_with_context(name, input,
ctx).await currently uses the ? operator, which short-circuits and skips
metrics.record_tool_call(...) on errors; change it to capture the Result (e.g.,
let result = registry.execute_with_context(name, input, ctx).await;), call
metrics.record_tool_call(...) based on result.is_ok() or .is_err() to record the
failure, then propagate the result (e.g., result?), doing the same pattern for
the other occurrence around lines 65-67 so both successful and failed tool
executions are counted; reference the registry.execute_with_context(...) call
and metrics.record_tool_call(...) when making the change.

In `@src/kernel/mod.rs`:
- Around line 81-84: The code currently falls back to
Arc::new(crate::providers::ClaudeProvider::new("")) when no runtime provider is
present, which hides configuration errors; change the fallback to either return
an explicit "no provider configured" error from the boot path (propagate a
Result error) or instantiate a dedicated NoProvider/UnconfiguredProvider stub
that implements the same trait and returns deterministic "not configured"
errors; update the call sites such as create_agent_with_template() and the
variable that currently holds the Arc to accept the Result or the NoProvider to
ensure boot fails loudly or returns clear errors instead of silently using
ClaudeProvider with an empty string.

---

Nitpick comments:
In `@src/kernel/mod.rs`:
- Around line 190-193: The shutdown method currently discards results from
client.shutdown().await; change it to inspect the Result and log any Err so
failures during MCP teardown are visible. In the impl for shutdown (the async fn
shutdown on the type with self.mcp_clients), await each client.shutdown(), match
or if let Err(err) = ... and call your project's logging facility (e.g.,
log::error! or tracing::error!) with context like "MCP client shutdown failed"
and include the err and an identifier for the client if available. Ensure the
loop continues to attempt shutting down remaining clients after logging.

In `@src/kernel/registrar.rs`:
- Around line 372-376: The feature-gated empty block guarded by #[cfg(feature =
"google")] around the filter.is_enabled("google") check is a placeholder and
should be removed or turned into an explicit tracked TODO; either delete the
entire block to avoid confusion, or replace the comment with a concise TODO that
references a task/issue ID and a short note that Google Workspace OAuth token
resolution is deferred to the caller (so maintainers can track implementation) —
locate the block around filter.is_enabled("google") in registrar.rs and apply
one of those two options.
- Around line 259-297: The match arms for "searxng" and "brave" currently return
hard Err(...) when required config is missing; change them to gracefully skip
registration instead: for Searxng and Brave, check api_url/api_key as an Option
(instead of ok_or_else) and if missing or empty call warn! with a clear message
(mentioning tools.web.search.api_url or tools.web.search.api_key) and simply
continue without registering the tool, leaving the ddg fallback behavior
unchanged; update the registry.register calls in the "searxng" and "brave" arms
to only run when the config value is present and valid (use the same
construction as the current with_max_results invocations) and ensure info! is
only logged when registration actually occurs.

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 23790f3 and b40e03f.

📒 Files selected for processing (9)
  • src/agent/loop.rs
  • src/cli/common.rs
  • src/config/mod.rs
  • src/kernel/gate.rs
  • src/kernel/mod.rs
  • src/kernel/provider.rs
  • src/kernel/registrar.rs
  • src/lib.rs
  • src/tools/registry.rs

Comment thread src/kernel/gate.rs Outdated
Comment on lines +50 to +51
let output = registry.execute_with_context(name, input, ctx).await?;

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Record metrics on tool execution Err before returning.

The ? short-circuit skips metrics.record_tool_call(...) for real execution failures, so failed calls are not fully counted.

Proposed fix
-    let output = registry.execute_with_context(name, input, ctx).await?;
+    let output = match registry.execute_with_context(name, input, ctx).await {
+        Ok(output) => output,
+        Err(e) => {
+            metrics.record_tool_call(name, start.elapsed(), false);
+            return Err(e);
+        }
+    };

Also applies to: 65-67

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/kernel/gate.rs` around lines 50 - 51, The call to
registry.execute_with_context(name, input, ctx).await currently uses the ?
operator, which short-circuits and skips metrics.record_tool_call(...) on
errors; change it to capture the Result (e.g., let result =
registry.execute_with_context(name, input, ctx).await;), call
metrics.record_tool_call(...) based on result.is_ok() or .is_err() to record the
failure, then propagate the result (e.g., result?), doing the same pattern for
the other occurrence around lines 65-67 so both successful and failed tool
executions are counted; reference the registry.execute_with_context(...) call
and metrics.record_tool_call(...) when making the change.

Comment thread src/kernel/mod.rs Outdated
…er registration

Two fixes from CodeRabbit review:

1. gate.rs: The `?` operator on execute_with_context skipped metrics
   recording when tools returned Err. Now records failed metrics before
   propagating the error.

2. mod.rs + common.rs: ZeptoKernel.provider was always set (even as an
   empty ClaudeProvider placeholder), which caused agent.provider().is_none()
   to always return false — preventing plugin-only provider setups from
   registering. Changed provider to Option<Arc<dyn LLMProvider>> and
   conditionally set it only when a real provider was assembled.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (4)
src/cli/common.rs (1)

313-317: Minor duplication: ToolFilter created twice.

ToolFilter::from_config() is called here and also inside ZeptoKernel::boot() (line 70 in mod.rs). Consider exposing the filter from the kernel struct to avoid redundant construction, though this is a minor optimization.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/cli/common.rs` around lines 313 - 317, ToolFilter is being constructed
twice; remove the redundant call to ToolFilter::from_config in src/cli/common.rs
and use the filter already produced by ZeptoKernel::boot() instead—add or expose
a getter on ZeptoKernel (e.g., a pub field or a method like
ZeptoKernel::tool_filter()/get_filter()) and replace the local `let filter =
...` usage with a reference to that kernel-provided filter (identify occurrences
of ToolFilter::from_config, the local `filter` binding, and
ZeptoKernel::boot/ZeptoKernel to implement the change).
src/kernel/gate.rs (2)

145-164: Consider adding tests for safety blocking scenarios.

The test test_execute_tool_with_safety_passes_clean_input only covers the case where safety checks pass. Consider adding tests that verify the behavior when safety actually blocks input or output to ensure the blocking paths (lines 39-46 and 61-68) are exercised.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/kernel/gate.rs` around lines 145 - 164, Add tests that exercise
safety-blocking branches of execute_tool by creating new async tokio tests
similar to test_execute_tool_with_safety_passes_clean_input but configuring or
mocking SafetyLayer/SafetyConfig so the safety check returns a blocking
decision; for example add tests named test_execute_tool_with_safety_blocks_input
and test_execute_tool_with_safety_blocks_output that call execute_tool with a
SafetyLayer instance (or a test-double) that yields a blocked SafetyResult for
the input path (to hit the branch around lines handling input block) and for the
output path (to hit the branch handling output block), then assert that
execute_tool returns the expected Err or blocked marker and that metrics/safety
decisions are recorded as appropriate; locate and reuse execute_tool,
ToolContext, SafetyLayer, SafetyConfig, and MetricsCollector to implement these
tests.

35-47: Consider method naming: check_tool_output used for input validation.

The safety layer's check_tool_output method is being called on the serialized input at line 38. This may be intentional (reusing the same content check logic for both directions), but consider adding a comment clarifying the design choice or using a more generic method name if both input and output share the same check.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/kernel/gate.rs` around lines 35 - 47, The code calls
safety_layer.check_tool_output on the serialized input_str (in gate.rs) which is
confusing; either add a clarifying comment above that call explaining that
check_tool_output is intentionally reused for both inputs and outputs, or rename
the method across the safety layer trait and implementations to a neutral name
like check_tool_io (and update all call sites, including this use in gate.rs,
plus any implementations of check_tool_output) so the intent matches usage;
ensure metrics.record_tool_call, ToolOutput::error and the variable name
input_str remain unchanged while you update the method name or add the
explanatory comment.
src/kernel/mod.rs (1)

190-195: Consider logging MCP client shutdown errors.

The let _ = pattern silently ignores shutdown errors. While acceptable for graceful shutdown, logging at debug/trace level could aid troubleshooting.

Optional: Log shutdown errors at trace level
     pub async fn shutdown(&self) {
         for client in &self.mcp_clients {
-            let _ = client.shutdown().await;
+            if let Err(e) = client.shutdown().await {
+                tracing::trace!(error = %e, "MCP client shutdown error (ignored)");
+            }
         }
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/kernel/mod.rs` around lines 190 - 195, The shutdown method currently
swallows errors with "let _ = client.shutdown().await;", losing useful info;
change this to capture the result and log failures at trace/debug level (e.g.,
let res = client.shutdown().await; if let Err(e) = res { trace!("MCP client
shutdown failed: {:?}", e); }) so that the loop over self.mcp_clients reports
shutdown errors without changing control flow; update imports if needed to use
your project's logging macro (trace!/debug!); keep behavior of continuing
shutdown on error.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@src/cli/common.rs`:
- Around line 313-317: ToolFilter is being constructed twice; remove the
redundant call to ToolFilter::from_config in src/cli/common.rs and use the
filter already produced by ZeptoKernel::boot() instead—add or expose a getter on
ZeptoKernel (e.g., a pub field or a method like
ZeptoKernel::tool_filter()/get_filter()) and replace the local `let filter =
...` usage with a reference to that kernel-provided filter (identify occurrences
of ToolFilter::from_config, the local `filter` binding, and
ZeptoKernel::boot/ZeptoKernel to implement the change).

In `@src/kernel/gate.rs`:
- Around line 145-164: Add tests that exercise safety-blocking branches of
execute_tool by creating new async tokio tests similar to
test_execute_tool_with_safety_passes_clean_input but configuring or mocking
SafetyLayer/SafetyConfig so the safety check returns a blocking decision; for
example add tests named test_execute_tool_with_safety_blocks_input and
test_execute_tool_with_safety_blocks_output that call execute_tool with a
SafetyLayer instance (or a test-double) that yields a blocked SafetyResult for
the input path (to hit the branch around lines handling input block) and for the
output path (to hit the branch handling output block), then assert that
execute_tool returns the expected Err or blocked marker and that metrics/safety
decisions are recorded as appropriate; locate and reuse execute_tool,
ToolContext, SafetyLayer, SafetyConfig, and MetricsCollector to implement these
tests.
- Around line 35-47: The code calls safety_layer.check_tool_output on the
serialized input_str (in gate.rs) which is confusing; either add a clarifying
comment above that call explaining that check_tool_output is intentionally
reused for both inputs and outputs, or rename the method across the safety layer
trait and implementations to a neutral name like check_tool_io (and update all
call sites, including this use in gate.rs, plus any implementations of
check_tool_output) so the intent matches usage; ensure metrics.record_tool_call,
ToolOutput::error and the variable name input_str remain unchanged while you
update the method name or add the explanatory comment.

In `@src/kernel/mod.rs`:
- Around line 190-195: The shutdown method currently swallows errors with "let _
= client.shutdown().await;", losing useful info; change this to capture the
result and log failures at trace/debug level (e.g., let res =
client.shutdown().await; if let Err(e) = res { trace!("MCP client shutdown
failed: {:?}", e); }) so that the loop over self.mcp_clients reports shutdown
errors without changing control flow; update imports if needed to use your
project's logging macro (trace!/debug!); keep behavior of continuing shutdown on
error.

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b40e03f and f268b21.

📒 Files selected for processing (3)
  • src/cli/common.rs
  • src/kernel/gate.rs
  • src/kernel/mod.rs

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

refactor: thin kernel — extract provider chain + tool registrar from create_agent()

1 participant