refactor(kernel): extract thin kernel from create_agent_with_template()#224
Conversation
Replace ~700 lines of subsystem assembly in create_agent_with_template() with ZeptoKernel::boot() — a single entry point that assembles provider chain, tools, safety, metrics, hooks, memory, runtime, and cron. - Add src/kernel/ module (mod.rs, provider.rs, registrar.rs, gate.rs) - Extract provider chain assembly to kernel/provider.rs - Extract 30+ tool registrations to kernel/registrar.rs with ToolFilter - Add ToolRegistry::merge() for bulk kernel→agent tool transfer - Add AgentLoop::merge_kernel_tools() and set_provider_arc() - Make config::expand_home() public for registrar use - Reduce cli/common.rs from ~1300 to ~65 lines of changes Closes #223 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
📝 WalkthroughWalkthroughExtracts a new ZeptoKernel module that centralizes provider chain assembly, tool registration, gated tool execution, and subsystem orchestration; refactors CLI agent boot to use the kernel, adds AgentLoop helpers for provider/tool merges, exposes a ToolRegistry merge, and makes a small config function public. Changes
Sequence Diagram(s)sequenceDiagram
participant Config
participant Bus
participant ZeptoKernel
participant ProviderBuilder as Provider Chain
participant Registrar as Tool Registrar
participant ToolRegistry
participant MCPClients
participant Safety
participant Metrics
Config->>ZeptoKernel: boot(config, bus, template, hand)
ZeptoKernel->>ProviderBuilder: build_provider_chain(config)
ProviderBuilder-->>ZeptoKernel: Arc<LLMProvider>, provider_names
ZeptoKernel->>Registrar: register_all_tools(registry, config, filter, deps)
Registrar->>ToolRegistry: insert tools
Registrar-->>MCPClients: Vec<Arc<McpClient>>
ZeptoKernel->>Safety: configure from config
ZeptoKernel->>Metrics: create collector
ZeptoKernel-->>Bus: return ZeptoKernel{provider, tools, safety, metrics, mcp_clients}
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (3)
src/kernel/mod.rs (1)
190-193: Log MCP shutdown failures instead of discarding them.Suppressing shutdown errors makes teardown issues hard to diagnose in production.
Proposed fix
pub async fn shutdown(&self) { for client in &self.mcp_clients { - let _ = client.shutdown().await; + if let Err(e) = client.shutdown().await { + warn!( + server = %client.server_name(), + error = %e, + "Failed to shut down MCP client" + ); + } } }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/kernel/mod.rs` around lines 190 - 193, The shutdown method currently discards results from client.shutdown().await; change it to inspect the Result and log any Err so failures during MCP teardown are visible. In the impl for shutdown (the async fn shutdown on the type with self.mcp_clients), await each client.shutdown(), match or if let Err(err) = ... and call your project's logging facility (e.g., log::error! or tracing::error!) with context like "MCP client shutdown failed" and include the err and an identifier for the client if available. Ensure the loop continues to attempt shutting down remaining clients after logging.src/kernel/registrar.rs (2)
372-376: Empty feature-gated block is a placeholder — consider removing or implementing.The
#[cfg(feature = "google")]block at lines 372-376 contains only a comment about deferred implementation. This could be confusing for future maintainers.Either remove the block entirely or add a TODO issue to track the implementation.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/kernel/registrar.rs` around lines 372 - 376, The feature-gated empty block guarded by #[cfg(feature = "google")] around the filter.is_enabled("google") check is a placeholder and should be removed or turned into an explicit tracked TODO; either delete the entire block to avoid confusion, or replace the comment with a concise TODO that references a task/issue ID and a short note that Google Workspace OAuth token resolution is deferred to the caller (so maintainers can track implementation) — locate the block around filter.is_enabled("google") in registrar.rs and apply one of those two options.
259-297: Consider graceful degradation for misconfigured web search providers.Lines 266 and 280 return hard errors (
Err(anyhow::anyhow!(...))) when required config is missing for SearXNG/Brave. This could prevent entire agent startup if a user partially configures web search.Consider warning and skipping the tool instead, similar to other optional tools:
♻️ Suggested change for graceful degradation
"searxng" => { - let url = search_cfg + let url = match search_cfg .api_url .as_deref() .map(str::trim) .filter(|s| !s.is_empty()) - .ok_or_else(|| { - anyhow::anyhow!("SearXNG provider requires tools.web.search.api_url") - })?; - registry.register(Box::new(crate::tools::SearxngSearchTool::with_max_results( - url, max, - )?)); - info!("Registered web_search tool (SearXNG)"); + { + Some(url) => url, + None => { + warn!("SearXNG provider requires tools.web.search.api_url, skipping"); + return Ok(mcp_clients); + } + }; + match crate::tools::SearxngSearchTool::with_max_results(url, max) { + Ok(tool) => { + registry.register(Box::new(tool)); + info!("Registered web_search tool (SearXNG)"); + } + Err(e) => warn!("Failed to initialize SearXNG search tool: {}", e), + }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/kernel/registrar.rs` around lines 259 - 297, The match arms for "searxng" and "brave" currently return hard Err(...) when required config is missing; change them to gracefully skip registration instead: for Searxng and Brave, check api_url/api_key as an Option (instead of ok_or_else) and if missing or empty call warn! with a clear message (mentioning tools.web.search.api_url or tools.web.search.api_key) and simply continue without registering the tool, leaving the ddg fallback behavior unchanged; update the registry.register calls in the "searxng" and "brave" arms to only run when the config value is present and valid (use the same construction as the current with_max_results invocations) and ensure info! is only logged when registration actually occurs.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/kernel/gate.rs`:
- Around line 50-51: The call to registry.execute_with_context(name, input,
ctx).await currently uses the ? operator, which short-circuits and skips
metrics.record_tool_call(...) on errors; change it to capture the Result (e.g.,
let result = registry.execute_with_context(name, input, ctx).await;), call
metrics.record_tool_call(...) based on result.is_ok() or .is_err() to record the
failure, then propagate the result (e.g., result?), doing the same pattern for
the other occurrence around lines 65-67 so both successful and failed tool
executions are counted; reference the registry.execute_with_context(...) call
and metrics.record_tool_call(...) when making the change.
In `@src/kernel/mod.rs`:
- Around line 81-84: The code currently falls back to
Arc::new(crate::providers::ClaudeProvider::new("")) when no runtime provider is
present, which hides configuration errors; change the fallback to either return
an explicit "no provider configured" error from the boot path (propagate a
Result error) or instantiate a dedicated NoProvider/UnconfiguredProvider stub
that implements the same trait and returns deterministic "not configured"
errors; update the call sites such as create_agent_with_template() and the
variable that currently holds the Arc to accept the Result or the NoProvider to
ensure boot fails loudly or returns clear errors instead of silently using
ClaudeProvider with an empty string.
---
Nitpick comments:
In `@src/kernel/mod.rs`:
- Around line 190-193: The shutdown method currently discards results from
client.shutdown().await; change it to inspect the Result and log any Err so
failures during MCP teardown are visible. In the impl for shutdown (the async fn
shutdown on the type with self.mcp_clients), await each client.shutdown(), match
or if let Err(err) = ... and call your project's logging facility (e.g.,
log::error! or tracing::error!) with context like "MCP client shutdown failed"
and include the err and an identifier for the client if available. Ensure the
loop continues to attempt shutting down remaining clients after logging.
In `@src/kernel/registrar.rs`:
- Around line 372-376: The feature-gated empty block guarded by #[cfg(feature =
"google")] around the filter.is_enabled("google") check is a placeholder and
should be removed or turned into an explicit tracked TODO; either delete the
entire block to avoid confusion, or replace the comment with a concise TODO that
references a task/issue ID and a short note that Google Workspace OAuth token
resolution is deferred to the caller (so maintainers can track implementation) —
locate the block around filter.is_enabled("google") in registrar.rs and apply
one of those two options.
- Around line 259-297: The match arms for "searxng" and "brave" currently return
hard Err(...) when required config is missing; change them to gracefully skip
registration instead: for Searxng and Brave, check api_url/api_key as an Option
(instead of ok_or_else) and if missing or empty call warn! with a clear message
(mentioning tools.web.search.api_url or tools.web.search.api_key) and simply
continue without registering the tool, leaving the ddg fallback behavior
unchanged; update the registry.register calls in the "searxng" and "brave" arms
to only run when the config value is present and valid (use the same
construction as the current with_max_results invocations) and ensure info! is
only logged when registration actually occurs.
ℹ️ Review info
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (9)
src/agent/loop.rssrc/cli/common.rssrc/config/mod.rssrc/kernel/gate.rssrc/kernel/mod.rssrc/kernel/provider.rssrc/kernel/registrar.rssrc/lib.rssrc/tools/registry.rs
| let output = registry.execute_with_context(name, input, ctx).await?; | ||
|
|
There was a problem hiding this comment.
Record metrics on tool execution Err before returning.
The ? short-circuit skips metrics.record_tool_call(...) for real execution failures, so failed calls are not fully counted.
Proposed fix
- let output = registry.execute_with_context(name, input, ctx).await?;
+ let output = match registry.execute_with_context(name, input, ctx).await {
+ Ok(output) => output,
+ Err(e) => {
+ metrics.record_tool_call(name, start.elapsed(), false);
+ return Err(e);
+ }
+ };Also applies to: 65-67
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/kernel/gate.rs` around lines 50 - 51, The call to
registry.execute_with_context(name, input, ctx).await currently uses the ?
operator, which short-circuits and skips metrics.record_tool_call(...) on
errors; change it to capture the Result (e.g., let result =
registry.execute_with_context(name, input, ctx).await;), call
metrics.record_tool_call(...) based on result.is_ok() or .is_err() to record the
failure, then propagate the result (e.g., result?), doing the same pattern for
the other occurrence around lines 65-67 so both successful and failed tool
executions are counted; reference the registry.execute_with_context(...) call
and metrics.record_tool_call(...) when making the change.
…er registration Two fixes from CodeRabbit review: 1. gate.rs: The `?` operator on execute_with_context skipped metrics recording when tools returned Err. Now records failed metrics before propagating the error. 2. mod.rs + common.rs: ZeptoKernel.provider was always set (even as an empty ClaudeProvider placeholder), which caused agent.provider().is_none() to always return false — preventing plugin-only provider setups from registering. Changed provider to Option<Arc<dyn LLMProvider>> and conditionally set it only when a real provider was assembled. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
🧹 Nitpick comments (4)
src/cli/common.rs (1)
313-317: Minor duplication: ToolFilter created twice.
ToolFilter::from_config()is called here and also insideZeptoKernel::boot()(line 70 in mod.rs). Consider exposing the filter from the kernel struct to avoid redundant construction, though this is a minor optimization.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/cli/common.rs` around lines 313 - 317, ToolFilter is being constructed twice; remove the redundant call to ToolFilter::from_config in src/cli/common.rs and use the filter already produced by ZeptoKernel::boot() instead—add or expose a getter on ZeptoKernel (e.g., a pub field or a method like ZeptoKernel::tool_filter()/get_filter()) and replace the local `let filter = ...` usage with a reference to that kernel-provided filter (identify occurrences of ToolFilter::from_config, the local `filter` binding, and ZeptoKernel::boot/ZeptoKernel to implement the change).src/kernel/gate.rs (2)
145-164: Consider adding tests for safety blocking scenarios.The test
test_execute_tool_with_safety_passes_clean_inputonly covers the case where safety checks pass. Consider adding tests that verify the behavior when safety actually blocks input or output to ensure the blocking paths (lines 39-46 and 61-68) are exercised.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/kernel/gate.rs` around lines 145 - 164, Add tests that exercise safety-blocking branches of execute_tool by creating new async tokio tests similar to test_execute_tool_with_safety_passes_clean_input but configuring or mocking SafetyLayer/SafetyConfig so the safety check returns a blocking decision; for example add tests named test_execute_tool_with_safety_blocks_input and test_execute_tool_with_safety_blocks_output that call execute_tool with a SafetyLayer instance (or a test-double) that yields a blocked SafetyResult for the input path (to hit the branch around lines handling input block) and for the output path (to hit the branch handling output block), then assert that execute_tool returns the expected Err or blocked marker and that metrics/safety decisions are recorded as appropriate; locate and reuse execute_tool, ToolContext, SafetyLayer, SafetyConfig, and MetricsCollector to implement these tests.
35-47: Consider method naming:check_tool_outputused for input validation.The safety layer's
check_tool_outputmethod is being called on the serialized input at line 38. This may be intentional (reusing the same content check logic for both directions), but consider adding a comment clarifying the design choice or using a more generic method name if both input and output share the same check.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/kernel/gate.rs` around lines 35 - 47, The code calls safety_layer.check_tool_output on the serialized input_str (in gate.rs) which is confusing; either add a clarifying comment above that call explaining that check_tool_output is intentionally reused for both inputs and outputs, or rename the method across the safety layer trait and implementations to a neutral name like check_tool_io (and update all call sites, including this use in gate.rs, plus any implementations of check_tool_output) so the intent matches usage; ensure metrics.record_tool_call, ToolOutput::error and the variable name input_str remain unchanged while you update the method name or add the explanatory comment.src/kernel/mod.rs (1)
190-195: Consider logging MCP client shutdown errors.The
let _ =pattern silently ignores shutdown errors. While acceptable for graceful shutdown, logging at debug/trace level could aid troubleshooting.Optional: Log shutdown errors at trace level
pub async fn shutdown(&self) { for client in &self.mcp_clients { - let _ = client.shutdown().await; + if let Err(e) = client.shutdown().await { + tracing::trace!(error = %e, "MCP client shutdown error (ignored)"); + } } }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/kernel/mod.rs` around lines 190 - 195, The shutdown method currently swallows errors with "let _ = client.shutdown().await;", losing useful info; change this to capture the result and log failures at trace/debug level (e.g., let res = client.shutdown().await; if let Err(e) = res { trace!("MCP client shutdown failed: {:?}", e); }) so that the loop over self.mcp_clients reports shutdown errors without changing control flow; update imports if needed to use your project's logging macro (trace!/debug!); keep behavior of continuing shutdown on error.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@src/cli/common.rs`:
- Around line 313-317: ToolFilter is being constructed twice; remove the
redundant call to ToolFilter::from_config in src/cli/common.rs and use the
filter already produced by ZeptoKernel::boot() instead—add or expose a getter on
ZeptoKernel (e.g., a pub field or a method like
ZeptoKernel::tool_filter()/get_filter()) and replace the local `let filter =
...` usage with a reference to that kernel-provided filter (identify occurrences
of ToolFilter::from_config, the local `filter` binding, and
ZeptoKernel::boot/ZeptoKernel to implement the change).
In `@src/kernel/gate.rs`:
- Around line 145-164: Add tests that exercise safety-blocking branches of
execute_tool by creating new async tokio tests similar to
test_execute_tool_with_safety_passes_clean_input but configuring or mocking
SafetyLayer/SafetyConfig so the safety check returns a blocking decision; for
example add tests named test_execute_tool_with_safety_blocks_input and
test_execute_tool_with_safety_blocks_output that call execute_tool with a
SafetyLayer instance (or a test-double) that yields a blocked SafetyResult for
the input path (to hit the branch around lines handling input block) and for the
output path (to hit the branch handling output block), then assert that
execute_tool returns the expected Err or blocked marker and that metrics/safety
decisions are recorded as appropriate; locate and reuse execute_tool,
ToolContext, SafetyLayer, SafetyConfig, and MetricsCollector to implement these
tests.
- Around line 35-47: The code calls safety_layer.check_tool_output on the
serialized input_str (in gate.rs) which is confusing; either add a clarifying
comment above that call explaining that check_tool_output is intentionally
reused for both inputs and outputs, or rename the method across the safety layer
trait and implementations to a neutral name like check_tool_io (and update all
call sites, including this use in gate.rs, plus any implementations of
check_tool_output) so the intent matches usage; ensure metrics.record_tool_call,
ToolOutput::error and the variable name input_str remain unchanged while you
update the method name or add the explanatory comment.
In `@src/kernel/mod.rs`:
- Around line 190-195: The shutdown method currently swallows errors with "let _
= client.shutdown().await;", losing useful info; change this to capture the
result and log failures at trace/debug level (e.g., let res =
client.shutdown().await; if let Err(e) = res { trace!("MCP client shutdown
failed: {:?}", e); }) so that the loop over self.mcp_clients reports shutdown
errors without changing control flow; update imports if needed to use your
project's logging macro (trace!/debug!); keep behavior of continuing shutdown on
error.
Summary
create_agent_with_template()intoZeptoKernel::boot()— a single entry point that assembles provider chain, tools, safety, metrics, hooks, memory, runtime, and cronsrc/kernel/module withprovider.rs(provider chain assembly),registrar.rs(30+ tool registrations withToolFilter),gate.rs(tool execution entry point), andmod.rs(ZeptoKernel struct + boot)ToolRegistry::merge()for bulk kernel→agent tool transfer andAgentLoop::merge_kernel_tools()/set_provider_arc()for kernel integrationcli/common.rsshrinks from ~1300 lines to ~65 lines of diff; new features (feat: MCP server mode — expose tools to Claude Desktop, VS Code, Cursor #217-feat: per-template capability sandbox (tools, shell patterns, resource limits) #222) now wire intokernel/registrar.rsinstead of the monolithic functionCloses #223
Test plan
cargo clippy -- -D warningscleancargo fmt -- --checkclean🤖 Generated with Claude Code
Summary by CodeRabbit
New Features
Improvements