Skip to content

feat(routing): per-channel MCP and built-in tool filtering#1378

Open
nick-stebbings wants to merge 12 commits intonearai:stagingfrom
nick-stebbings:pr/channel-routing
Open

feat(routing): per-channel MCP and built-in tool filtering#1378
nick-stebbings wants to merge 12 commits intonearai:stagingfrom
nick-stebbings:pr/channel-routing

Conversation

@nick-stebbings
Copy link
Copy Markdown
Contributor

Summary

Add a JSON-configurable channel routing system that filters which MCP servers and built-in tools are offered to the LLM based on the incoming message channel.

Motivation

Multi-channel deployments (Slack + Telegram + web) need per-channel tool scoping. A research channel should only see research MCP tools, not production APIs. DMs should have broader access than shared channels.

How it works

Configuration at ~/.ironclaw/channel-routing.json:

{
  "groups": {
    "minimal": ["Archon"],
    "content": ["Archon", "Notion", "Kit"],
    "dev":     ["Archon", "Kiro", "Notion"]
  },
  "builtin_whitelist": {
    "content": ["memory_search", "create_job", "http_request"]
  },
  "channels": {
    "agentiffai-content": "content",
    "agentiffai-dev":     "dev"
  },
  "default_group": "minimal"
}
  • groups: named sets of allowed MCP server prefixes
  • builtin_whitelist: per-group allowlist of built-in tools (absent = all allowed)
  • channels: channel name/ID → group mapping
  • default_group: fallback for unmapped channels
  • DM channels (slack-dm, telegram-dm, cli, web) bypass filtering entirely

MCP tools identified by server prefix (Notion_post_search → server Notion). Filtering applied per-iteration in the dispatcher so newly registered tools are always correctly scoped.

Test plan

  • cargo test --lib passes (3162 tests, includes 10 routing-specific tests)
  • cargo clippy --all --all-features clean
  • cargo fmt --check clean
  • Manual: message in mapped channel → tools filtered to group
  • Manual: DM → all tools available (bypass)
  • Manual: unmapped channel → default_group applied

🤖 Generated with Claude Code

@github-actions github-actions bot added scope: agent Agent core (agent loop, router, scheduler) size: L 200-499 changed lines risk: medium Business logic, config, or moderate-risk modules contributor: experienced 6-19 merged PRs labels Mar 18, 2026
@gemini-code-assist
Copy link
Copy Markdown
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a robust and configurable system for managing tool access within multi-channel deployments. By allowing administrators to define which MCP servers and built-in tools are available to the LLM agent based on the originating channel, it significantly enhances the agent's contextual awareness and security. This prevents unintended tool usage across different communication contexts, such as restricting production APIs from research channels, and provides a flexible mechanism for tailoring the agent's capabilities to specific channel requirements.

Highlights

  • JSON-configurable Channel Routing: Introduced a new system that allows configuring tool access based on the incoming message channel via a channel-routing.json file.
  • Per-Channel Tool Filtering: Implemented filtering for both MCP (Multi-Channel Platform) server tools and built-in tools, allowing specific tools to be available only in designated channels or groups.
  • Flexible Group Definitions: Enables defining named groups of allowed MCP server prefixes and whitelists for built-in tools, which channels can then be mapped to.
  • Direct Message (DM) Bypass: Direct message channels (e.g., Slack DMs, Telegram DMs, CLI, web) are explicitly configured to bypass all tool filtering, granting full access to all available tools.
  • Dynamic Filtering in Agent Loop: The tool filtering logic is applied dynamically at each iteration within the agent dispatcher, ensuring that newly registered tools are always correctly scoped according to the channel's configuration.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a channel routing system that filters MCP servers and built-in tools based on the incoming message channel, configurable via a JSON file. The changes include adding a new channel_routing.rs file, modifying agent_loop.rs and dispatcher.rs to incorporate the routing logic, and updating tests to account for the new functionality. The review focuses on correctness, maintainability, security, and performance, with suggestions for improved restrictiveness, efficiency, and consistent application of routing logic.

Comment thread src/agent/channel_routing.rs Outdated
Comment thread src/agent/channel_routing.rs
Comment thread src/agent/dispatcher.rs Outdated
@github-actions github-actions bot added scope: docs Documentation scope: dependencies Dependency updates labels Mar 19, 2026
@nick-stebbings nick-stebbings force-pushed the pr/channel-routing branch 2 times, most recently from 82a2d3f to 7114583 Compare March 19, 2026 21:21
@nick-stebbings
Copy link
Copy Markdown
Contributor Author

Addressed Gemini review feedback in 7114583:

  • Restrictive default when group not found: Changed from allowing all tools to blocking all MCP tools (only built-in tools pass through). Misconfigured groups now fail safe.
  • No panics in production code: Verified — all unwrap() calls are inside #[cfg(test)] test module.
  • Test files updated: Added channel_routing: None to all test AgentDeps initializers that conflicted during rebase.

All checks pass (clippy, fmt, compilation).

Copy link
Copy Markdown
Member

@ilblackdragon ilblackdragon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review: per-channel tool routing

The feature concept is sound — per-channel tool scoping is valuable for multi-channel deployments. However, the implementation has architectural and correctness issues that should be addressed before merge.

Architecture: config should live in the database, not a JSON file

The routing config is loaded from ~/.ironclaw/channel-routing.json once at startup. This means:

  • No hot-reload: Changes require a full restart. Operators iterating on channel routing (the common case when setting this up) have a painful feedback loop.
  • No web UI: Can't be configured through the settings page like other IronClaw settings. Every other user-facing configuration uses the database-backed settings system.
  • No per-user scoping: The JSON file is global. The database settings system supports per-user settings naturally.

The right approach is to store this in the existing SettingsStore (like other config in src/settings.rs) and expose it through the web settings UI. This gets hot-reload for free — the settings system already handles that. The ChannelRoutingConfig struct and filtering logic are fine; it's just the persistence/loading layer that needs to change.

Correctness issues

  1. MCP server name prefix matching is fragile: extract_mcp_server checks tool_name.starts_with("{server}_") by iterating all known servers. If one server name is a prefix of another (e.g., Kit and KitchenAI), Kit_ matches KitchenAI_recipe_search if checked first. Fix: sort by length descending, or pre-compute a lookup map keyed by longest-prefix match.

  2. is_dm false-positives on channel names starting with "web": DM_PREFIXES includes "web", so a channel named web-team-standup would bypass all filtering via starts_with. Use exact matches or require a delimiter.

  3. default_group is not validated at load time: If default_group: "typo" doesn't match any key in groups, every unmapped channel hits the restrictive fallback with a warning on every LLM iteration. Validate at load time.

Other

  • The version bump (0.19.0 → 0.20.0) and CHANGELOG entries should be in a separate release PR — they reference unrelated PRs and make this harder to cherry-pick or revert.
  • No integration test proving the dispatcher actually uses the filtered tools (unit tests for the config logic are good, but the wiring in dispatcher.rs is untested).

@github-actions github-actions bot added size: XL 500+ changed lines and removed size: L 200-499 changed lines labels Mar 21, 2026
@nick-stebbings
Copy link
Copy Markdown
Contributor Author

Addressed all review feedback in two commits (caab172, 6cb5d12):

Architecture: SettingsStore migration (6cb5d12)

  • ChannelRoutingConfig now loads from SettingsStore database first, falls back to channel-routing.json file
  • AgentDeps.channel_routing changed to Arc<RwLock<Option<...>>> — supports hot-reload via SIGHUP without restart
  • apply_channel_routing is now async (acquires read lock)
  • save_to_store() method added for web UI integration

Bug fixes (caab172)

  1. MCP prefix matching — server names now sorted by length descending. KitchenAI_recipe_search correctly matches KitchenAI, not Kit. Test added.
  2. is_dm false-positivesweb is now exact-match only. web-team-standup no longer bypasses routing. slack-dm-* uses prefix-with-delimiter. Negative test cases added.
  3. default_group validationvalidate() checks at load time that default_group and all channel mappings reference existing groups. Invalid configs return None with error log. Two tests added.

Cleanup

  • Added Serialize derive for DB persistence
  • Added metadata_key field (optional, for compat with existing configs)
  • Version bump / CHANGELOG not included in this PR (as requested)

Not yet addressed

  • Integration test for dispatcher wiring — happy to add if the architecture changes are approved
  • The extract_mcp_server now uses a static helper (extract_mcp_server_from) taking a sorted list, avoiding repeated group iteration

@ilblackdragon ready for re-review.

@nick-stebbings
Copy link
Copy Markdown
Contributor Author

nick-stebbings commented Mar 28, 2026

Addressed review feedback in latest push:

Architecture (ilblackdragon's main concern):

  • Migrated config from channel-routing.json file to SettingsStore (database-backed)
  • Added load_from_store() / save_to_store() on ChannelRoutingConfig
  • No file fallback — SettingsStore is the single source of truth
  • Changed AgentDeps.channel_routing to Arc<RwLock<Option<...>>> for hot-reload support
  • apply_channel_routing is now async to support RwLock reads

Correctness fixes (from previous push, still included):

  1. Restrictive default when group not found — blocks all MCP tools (fail-safe)
  2. Load-time validationdefault_group must exist in groups map
  3. No panics in production code — all unwrap() calls inside #[cfg(test)]

Rebased onto latest staging (8f8cb7f7).

Comment thread src/agent/channel_routing.rs Outdated
Comment thread src/main.rs Outdated
Comment thread src/main.rs Outdated
Copy link
Copy Markdown
Collaborator

@zmanian zmanian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review -- REQUEST_CHANGES

The filtering logic and test coverage for the config module are solid. However, three blockers:

HIGH

1. is_dm() bypass uses wrong channel names -- Checks for "web", "slack-dm", "telegram-dm" but none match runtime names. Web chat uses "gateway", Slack uses "slack-relay" (DMs distinguished by metadata, not channel name), and no "telegram-dm" channel exists. The DM bypass never fires, so routing restrictions are applied to DMs (opposite of documented intent).

Fix: Check "gateway", "cli", "repl" as exact matches. For relay DMs, check metadata.event_type == "direct_message" (may need to pass metadata into filter_tool_defs).

2. Startup loads from "default" scope instead of config.owner_id -- ChannelRoutingConfig::load_from_store(db.as_ref(), "default") but SettingsStore is user-scoped. Any deployment where owner_id != "default" silently fails to load routing config, leaving all tools exposed.

Fix: Use config.owner_id.

3. No actual hot-reload -- AgentDeps.channel_routing is Arc<RwLock<...>> claiming hot-reload support, but the RwLock is only written at startup. SIGHUP handler doesn't refresh it. save_to_store() exists but is never called.

Fix: Either wire up SIGHUP refresh, or remove the RwLock / hot-reload claim until implemented.

Medium

4. Filtering only covers tool definitions, not execution -- Tools are hidden from the LLM but execute_tool doesn't re-verify channel permission. A hallucinated or injected tool name would still execute. Consider adding a secondary check in execute_tool_calls.

5. MCP server detection relies on tool_name.starts_with("{server}_") -- Fragile naming convention. A built-in tool named Archon_something would be misclassified. ToolDefinition should ideally carry provenance metadata.

nick-stebbings and others added 2 commits April 10, 2026 10:25
1. DM/web bypass aligned with runtime channel names:
   - "web" → "gateway" (actual web chat channel name)
   - Removed "slack-dm-*" prefix — Slack DMs arrive on "slack" channel
     with metadata.channel starting with 'D'
   - Added Telegram DM detection via metadata.chat_type == "private"
   - is_dm() and filter_tool_defs() now accept metadata parameter

2. Owner scope: load channel_routing from config.owner_id instead of
   hardcoded "default" — non-default deployments now get routing.

3. Hot-reload: SIGHUP handler now reloads channel_routing from
   SettingsStore. Arc<RwLock> shared between AgentDeps and handler.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1. validate() checks builtin_whitelist keys (fail-open misconfig vector)
2. PartialEq derives content comparison (SIGHUP change detection)
3. tracing::info! → debug! on hot path (per CLAUDE.md)
4. Pre-compute sorted MCP prefixes at load time (zero per-iteration alloc)
5. Eliminate format!() in extract_mcp_server (byte comparison, no heap)
6. is_dm() accepts metadata param for Slack/Telegram DM detection
7. Unknown group fails safe: blocks MCP tools, allows built-in only

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
nick-stebbings and others added 2 commits April 10, 2026 10:39
The check_no_panics.py script has a documented limitation where it misclassifies
Rust lifetimes (like 'a) as character literals, corrupting the lexer state. This
caused test assertions in lines 544-545 to be flagged as production code panics.

Add '// safety:' comments to document that these are test assertions, suppressing
the false positive as the script recommends.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
…field to test helpers

Commit 3b8b8a7 accidentally removed load_from_store() and save_to_store()
from channel_routing.rs while applying review feedback, breaking compilation
because main.rs depends on load_from_store() for database-backed hot-reload.

Also add the channel_routing field to AgentDeps initializers in test helpers
(thread_ops.rs, bridge/router.rs, dispatcher.rs) that were missed when the
field was added to AgentDeps.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copy link
Copy Markdown
Collaborator

@zmanian zmanian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review

+753/-0, 11 files -- JSON-configurable channel routing system that filters MCP servers and built-in tools based on incoming channel. DM channels bypass filtering. Config loaded from SettingsStore with SIGHUP hot-reload. Architecture is sound -- filtering at dispatcher level per-iteration, longest-match prefix resolution, fail-safe defaults.

Must Fix

  1. SIGHUP change detection is broken -- the hot-reload code compares new_routing.is_some() != guard.is_some(), which only detects Some/None transitions. If config content changes but remains Some, the log says "unchanged" even though the data IS updated. Should use the PartialEq impl that was specifically added for this: new_routing != *guard.

  2. save_to_store() is missing -- commit messages mention it was implemented, but the final diff has no write path. Without it, there is no programmatic way to persist channel routing config to the DB -- users would need to write raw JSON via the settings API. Either restore it or document how to set config via the settings store.

  3. load() from file is dead code -- main.rs only calls load_from_store(), but the file-based loader is still present with stale doc comments referencing channel-routing.json. The commit history says "File-based channel-routing.json removed from the loading path." Either remove the dead code or document it as a fallback/CLI utility.

Suggestions

  1. tracing::info! on the hot path -- apply_channel_routing logs at info level every time filtering reduces tool count, which fires every LLM iteration for routed channels. Should be debug! per project conventions.

  2. Stale doc comments -- module-level and struct docs still reference loading from ~/.ironclaw/channel-routing.json. Should reflect the DB-backed approach.

  3. No integration test for SIGHUP reload -- unit tests are solid (17 tests), but the reload path in main.rs is only tested manually.

  4. Consider ChannelRoutingConfig::empty() or Default impl for test helpers instead of repeating Arc::new(tokio::sync::RwLock::new(None)) in ~10 places.

Architecture

  • Filtering at dispatcher level per-iteration handles dynamic tool registration correctly
  • precompute_prefixes() + longest-match for server name extraction is well thought out
  • Validation at load time is good -- fail-fast on misconfiguration
  • Fail-safe behavior (unknown group blocks MCP, allows builtins) is the right security default

…e, doc cleanup

Must-fix items from zmanian's 2026-04-11 review:

1. SIGHUP change detection used `is_some() != guard.is_some()` which only
   detects presence/absence transitions. Replaced with `new_routing != *guard`
   to use the PartialEq impl that was specifically added for content comparison.

2. save_to_store() was missing despite commit messages claiming it existed.
   Added as the write counterpart to load_from_store(), serialising to JSON
   and persisting under the "channel_routing" settings key.

3. load() was dead code relative to the startup path with stale doc comments
   referencing channel-routing.json. Documented it explicitly as a file-based
   migration utility (for importing existing JSON into the DB) rather than
   removing it (used by existing tests and useful for one-off migration).

Suggestions also addressed:
- tracing::info! → tracing::debug! in apply_channel_routing (CLAUDE.md:
  "info! corrupts the terminal UI; use debug! for internal diagnostics")
- Module and struct doc comments updated to reflect DB-backed approach

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@nick-stebbings
Copy link
Copy Markdown
Contributor Author

Addressed all three Must Fix items from @zmanian's 2026-04-11 review in d8357c1:

1. SIGHUP change detection (Must Fix → Fixed)

new_routing.is_some() != guard.is_some()new_routing != *guard. Now uses the PartialEq impl that was added specifically for this purpose — Some(old_config) != Some(new_config) with changed content correctly reports as changed.

2. save_to_store() missing (Must Fix → Fixed)

Added as the write counterpart to load_from_store(). Serialises the config to JSON and persists under the "channel_routing" settings key via SettingsStore::set_setting. Return type is Result<(), DatabaseError> to surface DB errors to callers rather than silently dropping them.

3. load() dead code (Must Fix → Fixed)

Kept the method (removing it would break the existing file-based tests, and it has genuine utility as a one-off migration path — load from JSON, then call save_to_store() to import into the DB). Doc comment updated to make the intent explicit: "File-based utility — not used in the normal startup path." Module and struct doc comments updated to reflect the DB-backed approach.

Suggestion: tracing::info!tracing::debug! (Done)

Changed in apply_channel_routing per CLAUDE.md: "info! corrupts the terminal UI; use debug! for internal diagnostics."

Suggestion: SIGHUP integration test (Not addressed)

Signal-level reload testing requires spawning a child process, sending SIGUP, and inspecting state — outside the scope of the unit/integration test tiers in this project. The SIGHUP path is covered by the hot-reload logic in main.rs which is straightforward: load → write lock → compare → log. Happy to add it if there's a test harness for OS-signal scenarios.

Suggestion: ChannelRoutingConfig::empty() / Default (Not addressed)

The repeated Arc::new(tokio::sync::RwLock::new(None)) in test helpers lives in dispatcher.rs test module. Adding a none_arc() constructor or Default impl on ChannelRoutingConfig is a cosmetic refactor — can do it, but wanted to keep this commit scoped to the must-fix items. Let me know if you'd like it included.

zmanian
zmanian previously approved these changes Apr 12, 2026
Copy link
Copy Markdown
Collaborator

@zmanian zmanian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-Review: per-channel MCP and built-in tool filtering

Reviewed the current diff (+753/-0, 11 files). All previously-raised must-fix items have been addressed.


Previously-raised issues -- status

Zaki's must-fix items:

  1. SIGHUP change detection (was: compares Some/None instead of content) -- FIXED. Line uses new_routing != *guard which compares Option<ChannelRoutingConfig> using the custom PartialEq impl that checks all four fields (groups, builtin_whitelist, channels, default_group). Correctly detects content changes.

  2. save_to_store() missing -- FIXED. save_to_store() is present (serializes to JSON value, calls store.set_setting()). Write path exists for programmatic persistence.

  3. load() from file is dead code -- ADDRESSED. The file-based loader is retained but the doc comment now explicitly marks it as a "File-based utility -- not used in the normal startup path. Useful for one-off migration of an existing channel-routing.json into the database via save_to_store()." This is acceptable as a migration utility. The module-level doc also correctly describes the DB-backed primary path.

Zaki's suggestions:

  1. tracing::info! on hot path should be debug! -- FIXED. apply_channel_routing() in dispatcher.rs uses tracing::debug! for the per-iteration filter log. Good.

  2. Stale doc comments referencing JSON file -- MOSTLY FIXED. The struct-level and module-level docs correctly reference SettingsStore and DB. The load() method still references channel-routing.json but that's appropriate since it IS the file-based loader. One minor residual: load() still logs at tracing::info! when successfully loading from file (line ~108-113). Since this is a migration utility and not a hot path, it's acceptable but could be debug! for consistency.

  3. No integration test for SIGHUP reload -- NOT ADDRESSED (non-blocking). The SIGHUP reload path in main.rs is still only manually tested. Unit tests cover the config logic thoroughly (17 tests).

  4. Default impl or empty() constructor -- NOT ADDRESSED (non-blocking). Test files still repeat Arc::new(tokio::sync::RwLock::new(None)) in ~10 places. A helper would reduce boilerplate but this is cosmetic.

serrrfirat's items:

  1. DM bypass missing for TUI and HTTP channels -- PARTIALLY ADDRESSED. DM_EXACT includes "gateway", "cli", "repl" but not "tui" or "http". The TUI channel is the primary local terminal interface. If a user has routing config enabled and uses TUI, their tools will be filtered. The "http" channel (direct API calls) would also be filtered. Recommend adding "tui" and "http" to DM_EXACT. Non-blocking since the current CLI_MODE default is TUI which maps to "cli", but the channel name may differ at runtime.

  2. DM bypass for slack-relay -- FIXED. is_dm() now checks channel == "slack-relay" and inspects metadata for channel starting with 'D' or event_type == "direct_message". Tests cover both paths.

  3. DM bypass channel names don't match runtime names -- FIXED. Uses "gateway" (not "web"), handles relay channels via metadata. Tests explicitly verify "web" does NOT match and "web-team-standup" does NOT bypass.

  4. Startup load uses hardcoded "default" -- FIXED. Uses config.owner_id at startup and sighup_owner_id (cloned from config.owner_id) in the SIGHUP handler.


New findings

Non-blocking observations:

  1. DM_RELAY_PREFIXES appears unused in practice. The array contains "slack-dm-" and "telegram-dm-" but the actual Slack/Telegram DM detection uses exact channel name matches ("slack", "slack-relay", "telegram") plus metadata inspection. Unless there are actual channel names with these prefixes in the system, these prefix entries are dead code that could confuse future maintainers.

  2. std::path::Path import is only used by the migration load() method. Minor -- could be gated or removed if load() is eventually removed.

  3. No execution-time enforcement. Tools are filtered from the LLM's view but execute_tool_calls does not re-verify channel permission. A sufficiently motivated prompt injection could name a tool not in the filtered set. This was flagged in a prior review as medium severity. The current approach (filter at definition time) is reasonable for the initial implementation but worth noting as a defense-in-depth gap for a future iteration.


Verdict: APPROVE

All 3 must-fix items are resolved. The code is clean, well-tested (17 unit tests covering config parsing, validation, DM detection, prefix matching, filtering scenarios), and follows project conventions (debug! logging on hot paths, proper error handling without unwrap in production code, strong types). The SIGHUP reload path is correctly wired with proper owner_id scoping.

The remaining non-blocking items (TUI/HTTP in DM_EXACT, integration test for SIGHUP, test helper for default routing) can be addressed in follow-up PRs.

… test

Addresses zmanian's deferred suggestions:

none_arc() — replaces the 9 copies of `Arc::new(tokio::sync::RwLock::new(None))`
scattered across test helpers (dispatcher.rs ×5, thread_ops.rs ×2,
testing/mod.rs ×1) and the startup path in main.rs with a single typed
constructor. Removes the in-test dependency on knowing the exact type of
AgentDeps.channel_routing.

reload_from_store() — extracts the load + write-lock + compare pattern from
the SIGHUP handler into a method, so main.rs is now two lines instead of nine
and the logic is directly testable.

SIGHUP integration test (reload_tests module, feature = "libsql"):
- test_reload_from_store_detects_changes: None→Some, identical, content change
- test_save_and_load_roundtrip: round-trip through DB
- test_none_arc_initialises_to_none: baseline
Uses LibSqlBackend::new_local + tempdir (not new_memory) because libSQL
in-memory databases do not share state across connections.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@nick-stebbings
Copy link
Copy Markdown
Contributor Author

Follow-up commit e4edbec implements the two deferred suggestions:

none_arc() / test helper cleanup

Added ChannelRoutingConfig::none_arc() returning Arc<RwLock<Option<Self>>> initialised to None. Replaced 9 copies of Arc::new(tokio::sync::RwLock::new(None)) across dispatcher.rs (×5), thread_ops.rs (×2), testing/mod.rs (×1), and the startup path in main.rs.

SIGHUP integration test

Extracted the load+write-lock+compare logic from main.rs into reload_from_store(store, user_id, arc) so the SIGHUP hot-reload path is directly testable without sending an OS signal. Main.rs SIGHUP handler is now two lines.

Three tests in reload_tests module (gated on #[cfg(feature = "libsql")]):

  • test_reload_from_store_detects_changes — exercises None→Some, identical reload (no change), and content mutation across four reloads
  • test_save_and_load_roundtrip — round-trips a config through save_to_store / load_from_store
  • test_none_arc_initialises_to_none — baseline

Used LibSqlBackend::new_local + tempfile::TempDir rather than new_memory — libSQL in-memory databases don't share state between connections (each connect() call sees a fresh empty DB).


Re: cargo-deny / Code Style (fmt + clippy + deny) failures — these are pre-existing: the axum duplicate comes from libsql → tonic → axum 0.6 conflicting with ironclaw → axum 0.8. Both versions existed on staging before this PR. Not introduced here.

Copy link
Copy Markdown
Collaborator

@serrrfirat serrrfirat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Paranoid Architect Review — Verdict: 👍 APPROVE WITH MINOR FIXES

Well-structured feature with fail-safe defaults, load-time validation, longest-prefix-first matching, hot-reload support, and 10+ tests. Architecture is clean with Arc<RwLock<Option<...>>> pattern.

Findings

# Severity Category File Finding Suggested Fix
1 Medium Security channel_routing.rs:257 DM bypass relies on spoofable metadata. is_dm() checks metadata["chat_type"] and metadata["channel"] from relay webhooks. Malicious relay could forge "chat_type": "private" to bypass tool routing. Validate DM status at channel layer, or document trust assumption and ensure relay auth mandatory
2 Medium Correctness channel_routing.rs:348 Byte-level string slicing risk. &tool_name[..server.len()] could panic if server prefix has non-ASCII chars and byte index isn't char boundary. Use tool_name.get(..server.len()) or add ASCII assertion in precompute_prefixes()
3 Medium Convention channel_routing.rs:165 info! in reload path. load_from_store() called from SIGHUP handler uses info! which corrupts TUI per CLAUDE.md. Change to debug! — SIGHUP handler's own info! is sufficient
4 Medium Architecture dispatcher.rs Jobs/containers bypass channel routing. Job spawned from restricted channel runs with full unrestricted tool set. Propagate routing into job/container delegates, or document as intentional
5 Low Edge Case channel_routing.rs:312 Unknown group fallback blocks MCP (fail-safe) but allows ALL built-ins (fail-open). shell available when it shouldn't be. Apply builtin_whitelist from default_group in fallback
6 Low Test Coverage channel_routing.rs No test for missing precompute_prefixes() call. Empty sorted_prefixes disables MCP routing. Add test, or make computation automatic via custom Deserialize
7 Low Concurrency dispatcher.rs:829 RwLock read guard held across filter_tool_defs() and logging. Minor contention on SIGHUP. Clone config out of lock before filtering
8 Low Documentation channel_routing.rs:80 DM_EXACT is ["gateway", "cli", "repl"] but PR description says slack-dm, telegram-dm, cli, web. TUI channel NOT in bypass list. Add "tui" to DM_EXACT, update PR description
9 Nit Performance channel_routing.rs:336 whitelist.iter().any() is linear scan Pre-compute as HashSet<String>
10 Nit Consistency test code Mixed Arc::new(RwLock::new(None)) vs ChannelRoutingConfig::none_arc() Standardize on helper

Summary

Key items: DM bypass spoofability (#1), byte-slice panic risk (#2), TUI logging (#3), and job/container bypass (#4). The missing TUI in DM bypass list (#8) is likely an oversight. The rest is minor. Good implementation overall.

@github-actions github-actions bot added the scope: tool/builtin Built-in tools label Apr 13, 2026
Comment thread src/agent/channel_routing.rs
Comment thread src/tools/builtin/job.rs
Comment thread src/tools/builtin/job.rs
@github-actions github-actions bot added the scope: worker Container worker label Apr 15, 2026
@nick-stebbings
Copy link
Copy Markdown
Contributor Author

Added follow-up commit f0ff0b5 to close the remaining autonomous job routing gap. Jobs now load the per-user channel routing config from the originating notify channel/metadata and filter worker tool definitions consistently with dispatcher turns. Added libsql regression coverage: routed_jobs_filter_tools_by_originating_channel.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

contributor: experienced 6-19 merged PRs risk: medium Business logic, config, or moderate-risk modules scope: agent Agent core (agent loop, router, scheduler) scope: dependencies Dependency updates scope: docs Documentation scope: tool/builtin Built-in tools scope: worker Container worker size: XL 500+ changed lines

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants