fix(wasm): run leak scan before credential injection in tools wrapper #791

Merged
zmanian merged 3 commits into nearai:staging from nick-stebbings:pr/wasm-tools-leak-scan-order
Mar 10, 2026

Conversation

@nick-stebbings
Contributor

Summary

The tools WASM wrapper runs LeakDetector::scan_http_request() on HTTP request
headers after inject_host_credentials() has inserted real secrets (e.g.
xoxb- Slack bot tokens). This causes the leak detector to flag the tool's own
legitimate outbound API calls as secret exfiltration.

  • Move leak scan to run on raw_headers before any credential injection
  • Add regression test test_leak_scan_runs_before_credential_injection

This is the same class of bug fixed in #421 (dc7d9cc), which corrected the
ordering in channels/wasm/wrapper.rs — the tools wrapper was missed.

BEFORE: parse → inject_credentials → inject_host_credentials → scan (WRONG)
AFTER:  parse → scan → inject_credentials → inject_host_credentials (CORRECT)

Fixes false-positive leak blocks on any WASM tool using host-based credential injection.
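
The BEFORE/AFTER ordering can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the code in src/tools/wasm/wrapper.rs: `contains_secret`, `prepare_request`, and the simplified `inject_host_credentials` signature are hypothetical stand-ins, and the detector is reduced to a check for the `xoxb-` prefix.

```rust
/// Toy stand-in for the leak detector: flags any value containing a
/// Slack-style bot token prefix. The real LeakDetector is more elaborate.
fn contains_secret(value: &str) -> bool {
    value.contains("xoxb-")
}

/// Hypothetical host-side injection: adds an Authorization header that the
/// WASM tool never sees.
fn inject_host_credentials(headers: &mut Vec<(String, String)>, token: &str) {
    headers.push(("Authorization".to_string(), format!("Bearer {token}")));
}

/// Scan the headers the tool supplied, then inject host credentials.
/// Returns Err if the tool's own headers carry something secret-shaped.
fn prepare_request(
    raw_headers: Vec<(String, String)>,
    host_token: &str,
) -> Result<Vec<(String, String)>, String> {
    // 1. Scan first, on the raw headers only.
    for (name, value) in &raw_headers {
        if contains_secret(value) {
            return Err(format!("potential secret leak in header {name}"));
        }
    }
    // 2. Inject afterwards; the injected value is never scanned.
    let mut headers = raw_headers;
    inject_host_credentials(&mut headers, host_token);
    Ok(headers)
}
```

With this order, a tool that sends only an `Accept` header gets its `Authorization` header added without tripping the check, while a tool that places an `xoxb-` value in its own header is still rejected.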

Test plan

  • cargo test --lib — 2757 tests pass
  • cargo clippy --all --tests — zero warnings
  • cargo fmt -- --check — clean
  • New test: test_leak_scan_runs_before_credential_injection

🤖 Generated with Claude Code

@github-actions github-actions bot added labels "scope: tool/wasm" (WASM tool sandbox) and "size: M" (50-199 changed lines) Mar 10, 2026
@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request resolves a critical issue in the WASM tools wrapper where the leak detector was incorrectly flagging legitimate outbound API calls as secret exfiltration. The problem stemmed from the leak scan occurring after sensitive host credentials, such as Slack bot tokens, had been injected into the HTTP request headers. By reordering the process to perform the leak scan on the raw headers before any credential injection, the change eliminates these false positives, ensuring that WASM tools can reliably make authenticated API calls without being erroneously blocked.

Highlights

  • Reordered Leak Scan: The leak detection process in the WASM tools wrapper was reordered to run on raw HTTP request headers before any host credentials are injected.
  • Prevented False Positives: This change resolves false-positive secret leak blocks that occurred when legitimate outbound API calls were flagged due to host-injected secrets (e.g., Slack bot tokens).
  • Added Regression Test: A new test, test_leak_scan_runs_before_credential_injection, was introduced to ensure the correct ordering of leak scanning and credential injection.

Changelog

  • src/tools/wasm/wrapper.rs
    • Moved the LeakDetector::scan_http_request call to occur before inject_host_credentials.
    • Removed the redundant LeakDetector::scan_http_request call that was previously executed after credential injection.
    • Added a new test test_leak_scan_runs_before_credential_injection to validate the fix.

Activity

  • No human activity has been recorded on this pull request yet.

@github-actions github-actions bot added labels "risk: medium" (Business logic, config, or moderate-risk modules) and "contributor: regular" (2-5 merged PRs) Mar 10, 2026
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request correctly fixes a bug where the leak detector was running after credential injection, which caused false positives. The change moves the leak scan to run on the raw headers before any host credentials are injected, which is the correct approach. A good regression test has also been added to verify this behavior. I've suggested a performance improvement to avoid unnecessary allocations during the leak scan.

Comment thread src/tools/wasm/wrapper.rs Outdated
Contributor Author

@nick-stebbings nick-stebbings left a comment

Fixed cargo fmt CI failure in 68b3b5b (closure formatting in leak scan loop).

@henrypark133 henrypark133 changed the base branch from main to staging March 10, 2026 02:18
zmanian previously approved these changes Mar 10, 2026
Collaborator

@zmanian zmanian left a comment

Review: fix(wasm): run leak scan before credential injection

Clean, well-scoped security fix. The ordering change is correct -- scanning pre-injection headers prevents false positives from host-injected tokens (like xoxb- Slack tokens).

Strengths:

  • Good commit progression: fix -> perf optimization (inline scan) -> fmt cleanup
  • Regression test clearly demonstrates both the pre-injection pass and post-injection block
  • Using scan_and_clean() inline avoids unnecessary Vec allocation per HTTP request
  • Consistent with the channels wrapper fix from #421
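
The allocation point in the strengths above can be illustrated with a small comparison. This is a sketch, assuming headers are held in a `HashMap<String, String>`; `scan_value` is a hypothetical per-value check standing in for a `scan_and_clean()`-style call, whose real signature is not shown in this PR page.

```rust
use std::collections::HashMap;

/// Toy per-value check. Assumption: returns Err when the value looks like
/// a secret; the real detector has its own return type.
fn scan_value(value: &str) -> Result<(), String> {
    if value.contains("xoxb-") {
        Err("secret-shaped value".to_string())
    } else {
        Ok(())
    }
}

/// Allocating variant: clones every header into a Vec before scanning.
fn scan_with_vec(raw_headers: &HashMap<String, String>) -> Result<(), String> {
    let cloned: Vec<(String, String)> = raw_headers
        .iter()
        .map(|(k, v)| (k.clone(), v.clone()))
        .collect();
    for (name, value) in &cloned {
        scan_value(value).map_err(|e| format!("header {name}: {e}"))?;
    }
    Ok(())
}

/// Inline variant: borrows each header in place, with no intermediate Vec.
fn scan_inline(raw_headers: &HashMap<String, String>) -> Result<(), String> {
    for (name, value) in raw_headers {
        scan_value(value).map_err(|e| format!("header {name}: {e}"))?;
    }
    Ok(())
}
```

Both variants reach the same verdict; the inline one skips the per-request clone of every key and value, and naming the header in the error message shows how a per-field loop can report where the match occurred.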

Minor notes:

  • The injected_url variable name is slightly confusing in the pre-injection context (it's the URL that will be injected into, not one that has been injected). Consider renaming to raw_url or just url in a follow-up, but not blocking.

LGTM -- approve.

@nick-stebbings nick-stebbings force-pushed the pr/wasm-tools-leak-scan-order branch from 68b3b5b to cbf8dbd Compare March 10, 2026 08:28
@github-actions github-actions bot added labels "scope: channel/web" (Web gateway channel), "scope: llm" (LLM integration), "scope: setup" (Onboarding / setup), "scope: ci" (CI/CD workflows), "size: L" (200-499 changed lines), and "risk: high" (Safety, secrets, auth, or critical infrastructure), and removed labels "size: M" (50-199 changed lines) and "risk: medium" (Business logic, config, or moderate-risk modules) Mar 10, 2026
@nick-stebbings nick-stebbings force-pushed the pr/wasm-tools-leak-scan-order branch from cbf8dbd to 6d42bd0 Compare March 10, 2026 09:20
nick-stebbings and others added 3 commits March 10, 2026 11:30

fix(wasm): run leak scan before credential injection in tools wrapper

The tools WASM wrapper runs the LeakDetector on HTTP request headers
AFTER inject_host_credentials() has already substituted real secrets
(e.g., xoxb- Slack bot tokens). This causes the leak detector to
flag the tool's own legitimate outbound API calls as secret exfiltration.

Move the scan to run on raw_headers before any credential injection,
matching the fix already applied to the channels wrapper in nearai#421.

Fixes the same class of bug as nearai#421 (which only fixed channels/wasm/wrapper.rs).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

perf: inline leak scan to avoid Vec allocation on every HTTP request

Address review feedback: instead of cloning all header keys/values into
a Vec to pass to scan_http_request(), iterate over raw_headers directly
using scan_and_clean(). This also provides more specific error messages
(URL vs header vs body).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

style: fix cargo fmt formatting in leak scan loop

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@nick-stebbings nick-stebbings force-pushed the pr/wasm-tools-leak-scan-order branch from 6d42bd0 to c76919e Compare March 10, 2026 10:31
@github-actions github-actions bot added labels "size: M" (50-199 changed lines) and "risk: medium" (Business logic, config, or moderate-risk modules), and removed labels "size: L" (200-499 changed lines) and "risk: high" (Safety, secrets, auth, or critical infrastructure) Mar 10, 2026
Collaborator

@zmanian zmanian left a comment

Re-review: fix(wasm): run leak scan before credential injection

The ordering fix is correct. Approving.

Ordering analysis

The tools wrapper has a two-phase credential injection pipeline:

  1. inject_credentials() -- resolves WASM-declared placeholders (e.g., {TELEGRAM_BOT_TOKEN}) that the WASM module explicitly templated into its request. The WASM module knows about these by design.
  2. inject_host_credentials() -- injects host-side secrets (Bearer tokens, API keys) that the WASM module never sees. These are resolved by the host based on the target URL's hostname.

The leak scan now sits between phases 1 and 2, which is the correct position:

  • Phase 1 credentials are WASM-declared, so scanning after phase 1 catches any exfiltration attempt where the WASM module smuggles a known secret into an unexpected field (e.g., encoding a token into a query param to a different domain).
  • Phase 2 credentials are host-injected and invisible to WASM -- scanning these would produce the false positives described in the PR.

This matches the architecture documented in leak_detector.rs (line 22: Allowlist -> Leak Scan -> Credential Injector) and is consistent with the channels wrapper fix in #421.
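
The two phases described in this review can be sketched as follows. This is an illustration under assumptions, not the wrapper's code: the function signatures, the `resolve` helper, and the hostname parsing are all hypothetical, and the scan itself is left as a comment marking where the review places it.

```rust
/// Phase 1 (hypothetical sketch): replace `{NAME}` placeholders that the
/// WASM module wrote into its request with the corresponding secret.
fn inject_credentials(template: &str, secrets: &[(&str, &str)]) -> String {
    let mut out = template.to_string();
    for (name, value) in secrets {
        out = out.replace(&format!("{{{name}}}"), value);
    }
    out
}

/// Phase 2 (hypothetical sketch): pick a host-side token based on the URL's
/// hostname. The WASM module never sees this value.
fn inject_host_credentials<'a>(
    url: &str,
    host_tokens: &[(&str, &'a str)],
) -> Option<&'a str> {
    let rest = url.split("://").nth(1)?;
    let host = rest.split('/').next()?;
    host_tokens
        .iter()
        .find(|(h, _)| *h == host)
        .map(|(_, token)| *token)
}

/// Runs both phases and returns the resolved URL plus an optional
/// Authorization header value.
fn resolve(
    url_template: &str,
    wasm_secrets: &[(&str, &str)],
    host_tokens: &[(&str, &str)],
) -> (String, Option<String>) {
    let url = inject_credentials(url_template, wasm_secrets);
    // Per the review, the leak scan sits here: after phase 1, before phase 2.
    let auth = inject_host_credentials(&url, host_tokens).map(|t| format!("Bearer {t}"));
    (url, auth)
}
```

The point of the split is that phase 2 values are chosen by hostname alone, so anything the scan would find in them was put there by the host, not by the module.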

Code quality

  • The inline scan_and_clean() approach avoids the Vec allocation from the original code and provides more specific error messages (URL vs header vs body). Good improvement over the Gemini-suggested version that was adopted.
  • Comment explains the "why" clearly.

Test coverage

The regression test is well-structured: it demonstrates that pre-injection headers pass the scan while post-injection headers (with a real xoxb- token) are blocked. This directly validates the ordering invariant.

Note: the test uses scan_http_request() with Vec allocation (the old API), not the inline scan_and_clean() pattern used in production. This is fine for a regression test -- it's testing the detector's behavior, not the allocation strategy.

Minor note (non-blocking, carried forward from previous review)

The variable injected_url at the scan site is the URL after inject_credentials() (phase 1) but before inject_host_credentials() (phase 2). The name "injected" refers to WASM-placeholder injection, which is accurate but could be confusing in the context of the leak scan comment that says "before credential injection." Consider renaming to url_with_wasm_creds or similar in a follow-up for clarity.

Verdict

The security fix is correct, the test coverage is adequate, and the code quality is good. No unwrap/expect in the changed production paths (the existing unwrap_or_default() on line 280 is pre-existing and is a safe default for header parsing failure).

LGTM.

@zmanian zmanian merged commit 66e834d into nearai:staging Mar 10, 2026
9 checks passed
PierreLeGuen pushed a commit that referenced this pull request Mar 10, 2026
…#791)

* fix(wasm): run leak scan before credential injection in tools wrapper

The tools WASM wrapper runs the LeakDetector on HTTP request headers
AFTER inject_host_credentials() has already substituted real secrets
(e.g., xoxb- Slack bot tokens). This causes the leak detector to
flag the tool's own legitimate outbound API calls as secret exfiltration.

Move the scan to run on raw_headers before any credential injection,
matching the fix already applied to the channels wrapper in #421.

Fixes the same class of bug as #421 (which only fixed channels/wasm/wrapper.rs).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* perf: inline leak scan to avoid Vec allocation on every HTTP request

Address review feedback: instead of cloning all header keys/values into
a Vec to pass to scan_http_request(), iterate over raw_headers directly
using scan_and_clean(). This also provides more specific error messages
(URL vs header vs body).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* style: fix cargo fmt formatting in leak scan loop

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
henrypark133 added a commit that referenced this pull request Mar 11, 2026
* fix: Channel HTTP: server doesn't start after config change (no hot-r… (#779)

* fix: Channel HTTP: server doesn't start after config change (no hot-reload)

* review fixes

* review fixes

* fix linter

* fix code style

* fix: prevent session lock contention blocking message processing (#783)

* fix: prevent session lock contention blocking message processing

## Problem
After container restart, POST /api/chat/send returns 202 ACCEPTED but messages
don't appear in conversation_messages and agent never responds. Messages get
stuck in "stale state" after restart.

Root cause: Session lock was held for entire duration of chat_threads_handler
and chat_history_handler, including during slow database queries. This blocked
the agent loop from acquiring the session lock to process incoming messages,
causing them to hang indefinitely.

## Solution
1. **Release session lock early in chat_threads_handler**: Only acquire lock
   when reading active_thread at response time, not during DB queries for
   thread list. DB operations no longer block message processing.

2. **Release session lock early in chat_history_handler**: Only acquire lock
   when accessing in-memory thread state, not during paginated DB queries or
   thread ownership checks. DB operations no longer block message processing.

3. **Add comprehensive logging**: Track message flow from receipt through
   session resolution, thread hydration, and state transitions. Helps diagnose
   future issues:
   - Message queued to agent loop (chat_send_handler)
   - Processing message from channel (handle_message)
   - Hydrating thread from DB (maybe_hydrate_thread)
   - Resolving session and thread (resolve_thread)
   - Checking thread state (process_user_input)
   - Persisting user message (persist_user_message)
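
The early-release pattern in items 1 and 2 can be sketched with the standard library. This is a minimal illustration, assuming a `std::sync::Mutex` and a hypothetical `Session` struct; the handlers in the codebase may use an async mutex and different types.

```rust
use std::sync::{Arc, Mutex};

/// Hypothetical in-memory session state; only `active_thread` matters here.
struct Session {
    active_thread: Option<String>,
}

/// Stand-in for a slow database query. It takes no lock.
fn load_thread_ids_from_db() -> Vec<String> {
    vec!["t1".to_string(), "t2".to_string()]
}

/// Run the slow query without the lock, then hold the lock only long
/// enough to copy out the one field the response needs.
fn list_threads(session: &Arc<Mutex<Session>>) -> (Vec<String>, Option<String>) {
    let threads = load_thread_ids_from_db(); // lock is not held here

    let active = {
        let guard = session.lock().unwrap();
        guard.active_thread.clone()
    }; // guard is dropped at the end of this block

    (threads, active)
}
```

Scoping the guard to a small block makes the release point explicit: any other task waiting on the session lock is blocked only for the duration of the clone, not for the database round trip.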

## Impact
- Message processing no longer blocks on session lock contention
- API response times for thread list/history queries unaffected (DB queries
  still happen, but lock is not held)
- Better diagnostics for future debugging

## Testing
- All 2756 tests pass
- Code compiles with zero clippy warnings
- No changes to user-facing API or behavior, only lock timing

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

* security: redact PII from info-level logs

Downgrade user_id and channel logging to debug level to prevent exposing
Personally Identifiable Information (PII) in production logs.

The user_id field can contain sensitive information such as phone numbers
(e.g., for Signal messages). Logging PII in cleartext at the info level
creates a security and privacy risk, as these logs may be stored in
persistent storage, indexed by log management systems, or accessible to
unauthorized personnel.

Changes:
- Info level: logs only message_id (UUID) for tracking
- Debug level: logs user_id, channel, thread_id for troubleshooting

This maintains debugging capability for developers while protecting user
privacy in production logs.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Haiku 4.5 <noreply@anthropic.com>

* chore: sync main into staging (#855)

* fix(ci): secrets can't be used in step if conditions [skip-regression-check] (#787)

GitHub Actions step-level `if:` doesn't have access to `secrets` context.
Replace `if: secrets.X != ''` with `continue-on-error: true` and let
the Set token step handle the fallback.

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(ci): clean up staging pipeline — remove hacks, skip redundant checks [skip-regression-check] (#794)

- Remove continue-on-error from staging-ci.yml app token steps (secrets are configured)
- Skip test.yml and code_style.yml on PRs targeting staging (staging-ci.yml
  already runs tests before promoting, promotion PR gets full CI on main)
- Allow ironclaw-ci[bot] in Claude Code review for bot-created promotion PRs

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* fix(ci): run fmt + clippy on staging PRs, skip Windows clippy [skip-regression-check] (#802)

- Remove branches:[main] filter from code_style.yml so it runs on all PRs
- Gate clippy-windows with `if: github.base_ref == 'main'` (skip on staging PRs)
- Update rollup job to allow skipped clippy-windows
- Simplify claude-review.yml to only trigger on labeled event (avoids duplicate runs)

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* feat: persist user_id in save_job and expose job_id on routine runs (#709)

* feat: persist worker events to DB and fix activity tab rendering

In-process Worker (used by Scheduler::dispatch_job) now persists events
via save_job_event at key execution points: plan creation, LLM
responses, tool_use, tool_result, and job completion/failure/stuck.
Event data shapes match the container worker format so the gateway
activity tab renders them correctly.

Frontend: tool_result errors now show a red X icon with danger styling
instead of a silent empty output. The result event falls back to the
error field when message is absent.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: wire RoutineEngine into gateway for direct manual trigger firing

Replace the message-channel hack in routines_trigger_handler with a
direct call to RoutineEngine::fire_manual(), ensuring FullJob routines
dispatch correctly when triggered from the web UI. Inject the engine
into GatewayState from Agent::run after construction.

Also persists user_id in save_job for both PG and libSQL backends,
removes the source='sandbox' filter so all jobs are visible, and
exposes job_id on RoutineRunInfo for the frontend job link.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: remove stale gateway_state argument from Agent::new test call sites

The gateway_state parameter was removed from Agent::new during rebase
(replaced by post-construction set_routine_engine_slot), but three test
call sites still passed the extra None argument.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address PR review — restore sandbox source filter, remove blank lines

- Revert removal of `source = 'sandbox'` filter in all SandboxStore
  queries (8 sites across PG and libSQL). Sandbox-specific APIs should
  stay scoped to sandbox jobs; unified job listing for the Jobs tab
  should use a separate query path.
- Remove extra blank lines in agent_loop.rs and worker.rs that caused
  formatting CI failure.

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address review — regenerate Cargo.lock, add user_id regression test

- Regenerate Cargo.lock from main's lockfile to eliminate dependency
  version downgrades (anyhow, syn, etc.) that were churn from rebase.
- Add regression test verifying user_id round-trips through save_job
  and get_job in the libSQL backend.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* style: remove trailing blank line in libsql jobs.rs

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* test: add Postgres-side regression test for user_id persistence in save_job

Mirrors the existing libSQL test (test_save_job_persists_user_id) for the
Postgres backend. Gated behind #[cfg(feature = "postgres")] + #[ignore]
since it requires a running PostgreSQL instance (integration tier).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* feat(llm): per-provider unsupported parameter filtering (#749, #728) (#809)

Add declarative `unsupported_params` field to provider definitions in
providers.json. Parameters listed are stripped from requests before
sending, preventing 400 errors from providers that reject them (e.g.
gpt-5 family and kimi-k2.5 rejecting custom temperature values).

- Add `unsupported_params` to ProviderDefinition and RegistryProviderConfig
- Propagate from registry through config resolution
- Generic strip helpers handle temperature, max_tokens, stop_sequences
- Apply filtering in RigAdapter and AnthropicOAuthProvider
- Mark openai and tinfoil providers as not supporting temperature
- Update openai default model to gpt-5-mini

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
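
The filtering described in this commit might look roughly like the sketch below. The `CompletionParams` struct and `strip_unsupported` function are hypothetical names for illustration; only the three parameter names come from the commit message, and ignoring unknown names is an assumption.

```rust
/// Hypothetical request shape; the real types in the codebase may differ.
#[derive(Debug, Default, PartialEq)]
struct CompletionParams {
    temperature: Option<f32>,
    max_tokens: Option<u32>,
    stop_sequences: Option<Vec<String>>,
}

/// Clear every parameter the provider declares as unsupported.
/// Unknown names are ignored rather than treated as errors (an assumption).
fn strip_unsupported(params: &mut CompletionParams, unsupported: &[&str]) {
    for name in unsupported {
        match *name {
            "temperature" => params.temperature = None,
            "max_tokens" => params.max_tokens = None,
            "stop_sequences" => params.stop_sequences = None,
            _ => {}
        }
    }
}
```

Calling `strip_unsupported(&mut params, &["temperature"])` before sending leaves `max_tokens` and `stop_sequences` untouched, so a provider that rejects custom temperature values never receives one.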

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Illia Polosukhin <ilblackdragon@gmail.com>

* fix: Chat input is hidden in mobile browser mode (#877)

* fix: stop XML-escaping tool output content (#598) (#874)

* fix: stop XML-escaping tool output content in wrap_for_llm (#598)

Remove content escaping that corrupted JSON in tool output. The
<tool_output> structural boundary is preserved but content now passes
through raw, fixing downstream parse failures.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
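
To see why escaping corrupts JSON, compare the two wrappers below. This is a sketch only: `wrap_escaped`, `wrap_raw`, and `xml_escape` are hypothetical names, and the exact set of characters the original code escaped is an assumption.

```rust
/// Assumed escaping rules, for illustration of the removed behavior.
fn xml_escape(s: &str) -> String {
    s.replace('&', "&amp;")
        .replace('<', "&lt;")
        .replace('>', "&gt;")
        .replace('"', "&quot;")
}

/// Escaping variant: JSON quotes become entities, so the payload no
/// longer parses as JSON downstream.
fn wrap_escaped(content: &str) -> String {
    format!("<tool_output>{}</tool_output>", xml_escape(content))
}

/// Raw variant: the structural boundary is kept, the content is untouched.
fn wrap_raw(content: &str) -> String {
    format!("<tool_output>{content}</tool_output>")
}
```

For the input `{"ok":true}`, the escaping variant emits `{&quot;ok&quot;:true}` inside the tags, while the raw variant keeps the JSON byte-for-byte.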

---------

Co-authored-by: Henry Park <henrypark133@gmail.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(safety): allow empty string tool params (#848)

* fix(safety): allow empty string tool params

* fix(safety): preserve heuristic checks and add path context to tool validation

This follow-up refactor addresses PR review feedback by restoring
heuristic checks (whitespace ratio, character repetition) for tool
parameter validation and improving error reporting.

Changes:
- Restored heuristic warnings in validate_non_empty_input so they apply
  to both user input and tool parameters (when non-empty).
- Refactored check_strings to recursively build and pass JSON paths
  (e.g., "metadata.tags[1]").
- Updated validation errors to use the specific JSON path as the field
  name instead of the generic "input".
- Added regression tests for whitespace/repetition warnings and JSON
  path reporting in tool parameters.

This ensures the safety layer remains semantically neutral about empty
strings (fixing the memory_tree path: "" issue) while maintaining
rigorous protection and providing better developer ergonomics.
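
The recursive path building can be sketched without any dependencies. This is an illustration under assumptions: the `Value` enum is a minimal stand-in for a JSON value type, the signature of `check_strings` is hypothetical, and the whitespace-only rule stands in for the real heuristics.

```rust
/// Minimal JSON-like value, defined here so the sketch has no dependencies.
enum Value {
    Str(String),
    Array(Vec<Value>),
    Object(Vec<(String, Value)>),
}

/// Walk the value and record the path of every string leaf that is
/// non-empty but whitespace-only. Empty strings are deliberately allowed.
fn check_strings(value: &Value, path: &str, findings: &mut Vec<String>) {
    match value {
        Value::Str(s) => {
            if !s.is_empty() && s.trim().is_empty() {
                findings.push(path.to_string());
            }
        }
        Value::Array(items) => {
            for (i, item) in items.iter().enumerate() {
                check_strings(item, &format!("{path}[{i}]"), findings);
            }
        }
        Value::Object(fields) => {
            for (key, child) in fields {
                // Top-level keys have no leading dot.
                let child_path = if path.is_empty() {
                    key.clone()
                } else {
                    format!("{path}.{key}")
                };
                check_strings(child, &child_path, findings);
            }
        }
    }
}
```

Given an object with `path: ""` and `metadata.tags` set to `["a", "   "]`, the empty `path` value is accepted and the only finding is reported under `metadata.tags[1]`.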

* style: run cargo fmt

* perf: optimize release and dist build profiles (#843)

* perf: optimize release and dist build profiles

Add [profile.release] with strip=true and panic="abort" for smaller,
faster release binaries. Upgrade [profile.dist] from lto="thin" to
lto="fat" with codegen-units=1 for maximum optimization in CI releases.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: remove panic=abort from release profile

Reviewers (zmanian, Copilot, Gemini) correctly flagged that panic=abort
in the release profile would kill the entire process on any tokio task
panic, breaking fault isolation for the long-running server. Removed
from release profile entirely.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
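
The fault-isolation concern can be illustrated with standard-library threads as an analogue for tokio tasks. This is a sketch, not the server's code: `run_isolated` is a hypothetical helper, and it relies on the default unwinding panic strategy.

```rust
use std::thread;

/// Run a task on its own thread and report whether it panicked.
/// With the default unwinding strategy, join() returns Err for a panicked
/// thread and the caller keeps running. With panic = "abort", the whole
/// process would exit at the panic instead, and this function would
/// never return.
fn run_isolated<F>(task: F) -> Result<(), String>
where
    F: FnOnce() + Send + 'static,
{
    thread::spawn(task)
        .join()
        .map_err(|_| "task panicked".to_string())
}
```

A caller can match on the result and continue serving other work after a failed task, which is the property the reviewers wanted to keep for the long-running server.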

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* feat: add PR template with risk assessment (#837)

* feat: add PR template with risk assessment and review tracks

Add a pull request template that includes summary, change type,
validation checklist, security/database impact sections, blast radius,
and rollback plan. Update CONTRIBUTING.md with review track definitions
(A/B/C) based on change risk level.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: expand CONTRIBUTING.md with setup, workflow, and guidelines

Add getting started, development workflow, code style summary,
database change guidance, and dependency management sections.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* feat: add fuzzing targets for untrusted input parsers (#835)

* feat: add fuzzing targets for untrusted input parsers

Add cargo-fuzz infrastructure with 5 fuzz targets exercising
security-critical code paths:

- fuzz_safety_sanitizer: Aho-Corasick + regex injection detection
- fuzz_safety_validator: Input validation (length, encoding, patterns)
- fuzz_leak_detector: Secret leak scanning (API keys, tokens)
- fuzz_tool_params: Tool parameter JSON validation
- fuzz_config_env: TOML/JSON config parsing

Each target exercises real IronClaw business logic with invariant
assertions. Includes corpus directories and setup documentation.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: improve fuzz targets to exercise real IronClaw code paths

- fuzz_config_env: exercise SafetyLayer end-to-end (sanitize, validate,
  policy check) instead of generic TOML/JSON parsing
- fuzz_tool_params: add validate_tool_schema coverage alongside
  validate_tool_params
- Add "fuzz" to workspace exclude in root Cargo.toml
- Update README descriptions to match actual target behavior

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: replace redundant detect() call with meaningful invariant assertion

Replace the double sanitize()+detect() call with an assertion that
critical severity warnings always trigger content modification.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: rewrite fuzz_config_env to exercise IronClaw safety code directly

Replace SafetyLayer wrapper usage with direct Sanitizer, Validator, and
LeakDetector instantiation and invocation. Adds meaningful consistency
assertions (non-empty output, valid-means-no-errors, scan/clean agreement).
Removes the config construction that was only exercising struct instantiation.

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* fix(wasm): run leak scan before credential injection in tools wrapper (#791)

* fix(wasm): run leak scan before credential injection in tools wrapper

The tools WASM wrapper runs the LeakDetector on HTTP request headers
AFTER inject_host_credentials() has already substituted real secrets
(e.g., xoxb- Slack bot tokens). This causes the leak detector to
flag the tool's own legitimate outbound API calls as secret exfiltration.

Move the scan to run on raw_headers before any credential injection,
matching the fix already applied to the channels wrapper in #421.

Fixes the same class of bug as #421 (which only fixed channels/wasm/wrapper.rs).
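
The reordering described above can be sketched as follows. This is a minimal, std-only illustration and not the wrapper's actual code: `looks_like_secret`, `scan_raw_headers`, `prepare_request`, and the signature of `inject_host_credentials` are hypothetical stand-ins for the real `LeakDetector` and injection helpers.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the real LeakDetector: flags any value that
/// looks like a Slack bot token.
fn looks_like_secret(value: &str) -> bool {
    value.contains("xoxb-")
}

/// Scan the headers exactly as the WASM tool supplied them.
fn scan_raw_headers(raw_headers: &HashMap<String, String>) -> Result<(), String> {
    for (name, value) in raw_headers {
        if looks_like_secret(value) {
            return Err(format!("potential secret leak in header `{name}`"));
        }
    }
    Ok(())
}

/// Hypothetical host-side injection: adds the real credential.
fn inject_host_credentials(headers: &mut HashMap<String, String>, token: &str) {
    headers.insert("Authorization".to_string(), format!("Bearer {token}"));
}

fn prepare_request(
    raw_headers: HashMap<String, String>,
    host_token: &str,
) -> Result<HashMap<String, String>, String> {
    // 1. Scan what the tool produced, before the host touches it.
    scan_raw_headers(&raw_headers)?;
    // 2. Only then inject host credentials; injected values are never scanned.
    let mut headers = raw_headers;
    inject_host_credentials(&mut headers, host_token);
    Ok(headers)
}

fn main() {
    let mut clean = HashMap::new();
    clean.insert("Accept".to_string(), "application/json".to_string());
    let sent = prepare_request(clean, "xoxb-host-owned-token").unwrap();
    println!("{}", sent["Authorization"]);

    // A token smuggled in by the tool itself is still rejected.
    let mut leaky = HashMap::new();
    leaky.insert("X-Debug".to_string(), "xoxb-smuggled".to_string());
    assert!(prepare_request(leaky, "xoxb-host-owned-token").is_err());
}
```

With the old order (inject, then scan), the first request in `main` would have been blocked by its own injected `Authorization` header.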

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* perf: inline leak scan to avoid Vec allocation on every HTTP request

Address review feedback: instead of cloning all header keys/values into
a Vec to pass to scan_http_request(), iterate over raw_headers directly
using scan_and_clean(). This also provides more specific error messages
(URL vs header vs body).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* style: fix cargo fmt formatting in leak scan loop

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* fix(setup): drain residual terminal events before secret input (#747) (#849)

* fix(ci): secrets can't be used in step if conditions [skip-regression-check] (#787)

GitHub Actions step-level `if:` doesn't have access to `secrets` context.
Replace `if: secrets.X != ''` with `continue-on-error: true` and let
the Set token step handle the fallback.

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(ci): clean up staging pipeline — remove hacks, skip redundant checks [skip-regression-check] (#794)

- Remove continue-on-error from staging-ci.yml app token steps (secrets are configured)
- Skip test.yml and code_style.yml on PRs targeting staging (staging-ci.yml
  already runs tests before promoting, promotion PR gets full CI on main)
- Allow ironclaw-ci[bot] in Claude Code review for bot-created promotion PRs

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* fix(ci): run fmt + clippy on staging PRs, skip Windows clippy [skip-regression-check] (#802)

- Remove branches:[main] filter from code_style.yml so it runs on all PRs
- Gate clippy-windows with `if: github.base_ref == 'main'` (skip on staging PRs)
- Update rollup job to allow skipped clippy-windows
- Simplify claude-review.yml to only trigger on labeled event (avoids duplicate runs)

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* feat: persist user_id in save_job and expose job_id on routine runs (#709)

* feat: persist worker events to DB and fix activity tab rendering

In-process Worker (used by Scheduler::dispatch_job) now persists events
via save_job_event at key execution points: plan creation, LLM
responses, tool_use, tool_result, and job completion/failure/stuck.
Event data shapes match the container worker format so the gateway
activity tab renders them correctly.

Frontend: tool_result errors now show a red X icon with danger styling
instead of a silent empty output. The result event falls back to the
error field when message is absent.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: wire RoutineEngine into gateway for direct manual trigger firing

Replace the message-channel hack in routines_trigger_handler with a
direct call to RoutineEngine::fire_manual(), ensuring FullJob routines
dispatch correctly when triggered from the web UI. Inject the engine
into GatewayState from Agent::run after construction.

Also persists user_id in save_job for both PG and libSQL backends,
removes the source='sandbox' filter so all jobs are visible, and
exposes job_id on RoutineRunInfo for the frontend job link.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: remove stale gateway_state argument from Agent::new test call sites

The gateway_state parameter was removed from Agent::new during rebase
(replaced by post-construction set_routine_engine_slot), but three test
call sites still passed the extra None argument.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address PR review — restore sandbox source filter, remove blank lines

- Revert removal of `source = 'sandbox'` filter in all SandboxStore
  queries (8 sites across PG and libSQL). Sandbox-specific APIs should
  stay scoped to sandbox jobs; unified job listing for the Jobs tab
  should use a separate query path.
- Remove extra blank lines in agent_loop.rs and worker.rs that caused
  formatting CI failure.

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address review — regenerate Cargo.lock, add user_id regression test

- Regenerate Cargo.lock from main's lockfile to eliminate dependency
  version downgrades (anyhow, syn, etc.) that were churn from rebase.
- Add regression test verifying user_id round-trips through save_job
  and get_job in the libSQL backend.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* style: remove trailing blank line in libsql jobs.rs

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* test: add Postgres-side regression test for user_id persistence in save_job

Mirrors the existing libSQL test (test_save_job_persists_user_id) for the
Postgres backend. Gated behind #[cfg(feature = "postgres")] + #[ignore]
since it requires a running PostgreSQL instance (integration tier).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* feat(llm): per-provider unsupported parameter filtering (#749, #728) (#809)

Add declarative `unsupported_params` field to provider definitions in
providers.json. Parameters listed are stripped from requests before
sending, preventing 400 errors from providers that reject them (e.g.
gpt-5 family and kimi-k2.5 rejecting custom temperature values).

- Add `unsupported_params` to ProviderDefinition and RegistryProviderConfig
- Propagate from registry through config resolution
- Generic strip helpers handle temperature, max_tokens, stop_sequences
- Apply filtering in RigAdapter and AnthropicOAuthProvider
- Mark openai and tinfoil providers as not supporting temperature
- Update openai default model to gpt-5-mini

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* fix: skip the regression check
[skip-regression-check]

---------

Co-authored-by: Henry Park <henrypark133@gmail.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Illia Polosukhin <ilblackdragon@gmail.com>

* feat(agent): add context size logging before LLM prompt (#810)

* fix(ci): secrets can't be used in step if conditions [skip-regression-check] (#787)

GitHub Actions step-level `if:` doesn't have access to `secrets` context.
Replace `if: secrets.X != ''` with `continue-on-error: true` and let
the Set token step handle the fallback.

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(ci): clean up staging pipeline — remove hacks, skip redundant checks [skip-regression-check] (#794)

- Remove continue-on-error from staging-ci.yml app token steps (secrets are configured)
- Skip test.yml and code_style.yml on PRs targeting staging (staging-ci.yml
  already runs tests before promoting, promotion PR gets full CI on main)
- Allow ironclaw-ci[bot] in Claude Code review for bot-created promotion PRs

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* fix(ci): run fmt + clippy on staging PRs, skip Windows clippy [skip-regression-check] (#802)

- Remove branches:[main] filter from code_style.yml so it runs on all PRs
- Gate clippy-windows with `if: github.base_ref == 'main'` (skip on staging PRs)
- Update rollup job to allow skipped clippy-windows
- Simplify claude-review.yml to only trigger on labeled event (avoids duplicate runs)

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* feat: persist user_id in save_job and expose job_id on routine runs (#709)

* feat: persist worker events to DB and fix activity tab rendering

In-process Worker (used by Scheduler::dispatch_job) now persists events
via save_job_event at key execution points: plan creation, LLM
responses, tool_use, tool_result, and job completion/failure/stuck.
Event data shapes match the container worker format so the gateway
activity tab renders them correctly.

Frontend: tool_result errors now show a red X icon with danger styling
instead of a silent empty output. The result event falls back to the
error field when message is absent.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: wire RoutineEngine into gateway for direct manual trigger firing

Replace the message-channel hack in routines_trigger_handler with a
direct call to RoutineEngine::fire_manual(), ensuring FullJob routines
dispatch correctly when triggered from the web UI. Inject the engine
into GatewayState from Agent::run after construction.

Also persists user_id in save_job for both PG and libSQL backends,
removes the source='sandbox' filter so all jobs are visible, and
exposes job_id on RoutineRunInfo for the frontend job link.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: remove stale gateway_state argument from Agent::new test call sites

The gateway_state parameter was removed from Agent::new during rebase
(replaced by post-construction set_routine_engine_slot), but three test
call sites still passed the extra None argument.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address PR review — restore sandbox source filter, remove blank lines

- Revert removal of `source = 'sandbox'` filter in all SandboxStore
  queries (8 sites across PG and libSQL). Sandbox-specific APIs should
  stay scoped to sandbox jobs; unified job listing for the Jobs tab
  should use a separate query path.
- Remove extra blank lines in agent_loop.rs and worker.rs that caused
  formatting CI failure.

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address review — regenerate Cargo.lock, add user_id regression test

- Regenerate Cargo.lock from main's lockfile to eliminate dependency
  version downgrades (anyhow, syn, etc.) that were churn from rebase.
- Add regression test verifying user_id round-trips through save_job
  and get_job in the libSQL backend.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* style: remove trailing blank line in libsql jobs.rs

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* test: add Postgres-side regression test for user_id persistence in save_job

Mirrors the existing libSQL test (test_save_job_persists_user_id) for the
Postgres backend. Gated behind #[cfg(feature = "postgres")] + #[ignore]
since it requires a running PostgreSQL instance (integration tier).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* feat(agent): add context size logging before LLM prompt

---------

Co-authored-by: Henry Park <henrypark133@gmail.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Illia Polosukhin <ilblackdragon@gmail.com>

* fix: preserve text before tool-call XML in forced-text responses (#852)

* fix: preserve text before tool-call XML in forced-text responses (#789)

Local models (Qwen3, DeepSeek, GLM) emit <tool_call> XML even when no
tools are available (force_text mode). The existing strip_xml_tag()
discards everything from an unclosed opening tag onward, producing an
empty string that triggers the "I'm not sure how to respond" fallback.

Add truncate_at_tool_tags() — a code-region-aware pre-processing step
that truncates at the first tool-call XML tag BEFORE clean_response()
runs, preserving all useful text before the tag. Protect all 7
clean_response() call sites. Case-insensitive matching handles models
that emit <TOOL_CALL> or <Tool_Call> variants.

Secondary fix: add has_native_thinking() model detection to skip
<think>/<final> system prompt injection for models with built-in
reasoning (Qwen3, QwQ, DeepSeek-R1, GLM-Z1, etc.), preventing
thinking-only responses that clean to empty.
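
A rough sketch of what name-based detection could look like. The substring list below is an assumption taken from the model families named above; the real `has_native_thinking()` may use different patterns or matching rules.

```rust
/// Returns true when the model name suggests built-in reasoning.
/// The pattern list is illustrative only (assumed, not from the codebase).
fn has_native_thinking(model_name: &str) -> bool {
    let name = model_name.to_ascii_lowercase();
    ["qwen3", "qwq", "deepseek-r1", "glm-z1"]
        .iter()
        .any(|pattern| name.contains(*pattern))
}

fn main() {
    for model in ["Qwen3-32B", "deepseek-r1:14b", "gpt-5-mini"] {
        println!("{model}: native thinking = {}", has_native_thinking(model));
    }
}
```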

Wire with_model_name(active_model_name()) at all 9 production sites
that construct Reasoning, so the runtime model name (not static config)
drives system prompt generation.

126 new/updated tests covering truncation edge cases, code-block
awareness, Unicode, case-insensitivity, StubLlm integration for
complete/plan/evaluate_success/respond_with_tools paths, model
detection, and conditional system prompt generation.

Closes #789

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address Copilot review — unclosed-only truncation, ASCII case folding

- truncate_at_tool_tags() now only truncates at UNCLOSED tool tags;
  properly closed tags (e.g. <tool_call>...</tool_call>) are left intact
  for clean_response() to strip normally, preserving any text after them
- Switch from to_lowercase() to to_ascii_lowercase() to prevent byte
  offset misalignment with non-ASCII characters whose lowercase form
  has different byte length (e.g. Kelvin sign U+212A)
- Add closing_tag_for() helper to derive closing tags from open patterns
- Fix doc comment: "fenced markdown code blocks or inline code spans"
  (not "indented", which find_code_regions() doesn't detect)
- Add regression tests: closed vs unclosed for each tag variant,
  Unicode + case-insensitive offset safety, and mixed closed/unclosed
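
To illustrate the two points above (unclosed-only truncation and why ASCII folding keeps byte offsets aligned), here is a simplified sketch. It handles a single tag pattern and ignores code regions, so it is not the real `truncate_at_tool_tags()`; the function name `truncate_at_unclosed_tool_call` is hypothetical.

```rust
/// Truncate `text` at the first unclosed `<tool_call>` tag, matching
/// case-insensitively. Properly closed tags are skipped and left intact.
fn truncate_at_unclosed_tool_call(text: &str) -> &str {
    // ASCII-only folding never changes byte length, so an offset found in
    // `lower` is also a valid offset into `text`.
    let lower = text.to_ascii_lowercase();
    let mut search_from = 0;
    while let Some(rel) = lower[search_from..].find("<tool_call>") {
        let open = search_from + rel;
        let after_open = open + "<tool_call>".len();
        match lower[after_open..].find("</tool_call>") {
            // Closed tag: skip past it and keep looking.
            Some(close_rel) => {
                search_from = after_open + close_rel + "</tool_call>".len();
            }
            // Unclosed tag: keep only the text before it.
            None => return text[..open].trim_end(),
        }
    }
    text
}

fn main() {
    // Full Unicode lowercasing maps the 3-byte Kelvin sign to 1-byte 'k',
    // which would shift every later byte offset.
    let kelvin = "\u{212A}";
    assert_eq!(kelvin.len(), 3);
    assert_eq!(kelvin.to_lowercase().len(), 1);
    assert_eq!(kelvin.to_ascii_lowercase().len(), 3);

    let reply = "Temp is 300 \u{212A}. <TOOL_CALL>{\"name\":";
    println!("{}", truncate_at_unclosed_tool_call(reply));
}
```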

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: minor review items — consistent ascii_lowercase, closing_tag_for tests

- Switch has_native_thinking() from to_lowercase() to to_ascii_lowercase()
  for consistency with truncate_at_tool_tags() approach
- Add unit tests for closing_tag_for(): standard tags, space-suffixed
  patterns, pipe-delimited tags, and exhaustive coverage of all
  TOOL_TAG_PATTERNS entries
- Add test for mixed closed+unclosed tags of different types

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* Feat/docker shell edition (#804)

* fix(ci): secrets can't be used in step if conditions [skip-regression-check] (#787)

GitHub Actions step-level `if:` doesn't have access to `secrets` context.
Replace `if: secrets.X != ''` with `continue-on-error: true` and let
the Set token step handle the fallback.

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(ci): clean up staging pipeline — remove hacks, skip redundant checks [skip-regression-check] (#794)

- Remove continue-on-error from staging-ci.yml app token steps (secrets are configured)
- Skip test.yml and code_style.yml on PRs targeting staging (staging-ci.yml
  already runs tests before promoting, promotion PR gets full CI on main)
- Allow ironclaw-ci[bot] in Claude Code review for bot-created promotion PRs

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Henry Park <henrypark133@gmail.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(mcp): strip top-level null params before forwarding to MCP servers (#795)

* feat(llm): per-provider unsupported parameter filtering (#749, #728) (#809)

Add declarative `unsupported_params` field to provider definitions in
providers.json. Parameters listed are stripped from requests before
sending, preventing 400 errors from providers that reject them (e.g.
gpt-5 family and kimi-k2.5 rejecting custom temperature values).

- Add `unsupported_params` to ProviderDefinition and RegistryProviderConfig
- Propagate from registry through config resolution
- Generic strip helpers handle temperature, max_tokens, stop_sequences
- Apply filtering in RigAdapter and AnthropicOAuthProvider
- Mark openai and tinfoil providers as not supporting temperature
- Update openai default model to gpt-5-mini

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* fix(mcp): strip top-level null params before forwarding to MCP servers

LLMs frequently emit `"field": null` for optional parameters in tool
calls. Many MCP servers reject explicit nulls for fields that should
simply be absent — e.g. Notion returns 400 for `"sort": null` in a
search call, expecting the field to be omitted entirely.

Strip top-level null keys from the params object before calling
`call_tool()`. Only top-level keys are stripped; nested nulls are
preserved since they may be semantically meaningful.
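
A minimal sketch of the stripping rule. The `Json` enum is a hypothetical, std-only stand-in for whatever JSON value type the real code uses, and `strip_top_level_nulls` is an illustrative name.

```rust
use std::collections::BTreeMap;

/// Tiny stand-in for a JSON value (assumption: the real code operates on
/// a JSON library's value type instead).
#[derive(Debug, Clone, PartialEq)]
enum Json {
    Null,
    Str(String),
    Object(BTreeMap<String, Json>),
}

/// Remove keys whose value is `Null` from the top-level object only.
/// Nested nulls and non-object inputs are returned unchanged.
fn strip_top_level_nulls(params: Json) -> Json {
    match params {
        Json::Object(map) => Json::Object(
            map.into_iter()
                .filter(|(_, value)| *value != Json::Null)
                .collect(),
        ),
        other => other,
    }
}

fn main() {
    let mut filter = BTreeMap::new();
    filter.insert("value".to_string(), Json::Null); // nested null: kept

    let mut params = BTreeMap::new();
    params.insert("query".to_string(), Json::Str("roadmap".to_string()));
    params.insert("sort".to_string(), Json::Null); // top-level null: dropped
    params.insert("filter".to_string(), Json::Object(filter));

    println!("{:?}", strip_top_level_nulls(Json::Object(params)));
}
```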

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Illia Polosukhin <ilblackdragon@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* Add event-triggered routines and workflow skill templates (#756)

* Add event-triggered routines and workflow skill templates

* fix(ci): secrets can't be used in step if conditions [skip-regression-check] (#787)

GitHub Actions step-level `if:` doesn't have access to `secrets` context.
Replace `if: secrets.X != ''` with `continue-on-error: true` and let
the Set token step handle the fallback.

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(ci): clean up staging pipeline — remove hacks, skip redundant checks [skip-regression-check] (#794)

- Remove continue-on-error from staging-ci.yml app token steps (secrets are configured)
- Skip test.yml and code_style.yml on PRs targeting staging (staging-ci.yml
  already runs tests before promoting, promotion PR gets full CI on main)
- Allow ironclaw-ci[bot] in Claude Code review for bot-created promotion PRs

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address PR review feedback for event_emit security and quality

Security fixes:
- Require approval (UnlessAutoApproved) for event_emit, matching routine_fire
- Enable sanitization on event_emit payload (external JSON reaches LLM)
- Remove user_id parameter from event_emit to prevent IDOR — always use ctx.user_id

Correctness fixes:
- Rename source → event_source in event_emit for consistency with routine_create
- Use json_value_as_filter_string for filter parsing (handles numbers/booleans)
- Case-insensitive matching for event source and event_type
- Add debug logging for missing filter keys in payload
- Fix skill_install_routine_webhook_sim test missing .with_skills()
- Fix schema_validator test for event_emit payload properties

Code quality:
- Move EventEmitTool struct/impl after RoutineHistoryTool (fix split layout)
- Deduplicate routine_to_info into RoutineInfo::from_routine in types.rs
- Add test section headers in e2e_routine_heartbeat.rs
- Clarify event_emit description to specify system_event routines only

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(ci): run fmt + clippy on staging PRs, skip Windows clippy [skip-regression-check] (#802)

- Remove branches:[main] filter from code_style.yml so it runs on all PRs
- Gate clippy-windows with `if: github.base_ref == 'main'` (skip on staging PRs)
- Update rollup job to allow skipped clippy-windows
- Simplify claude-review.yml to only trigger on labeled event (avoids duplicate runs)

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* feat: persist user_id in save_job and expose job_id on routine runs (#709)

* feat: persist worker events to DB and fix activity tab rendering

In-process Worker (used by Scheduler::dispatch_job) now persists events
via save_job_event at key execution points: plan creation, LLM
responses, tool_use, tool_result, and job completion/failure/stuck.
Event data shapes match the container worker format so the gateway
activity tab renders them correctly.

Frontend: tool_result errors now show a red X icon with danger styling
instead of a silent empty output. The result event falls back to the
error field when message is absent.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: wire RoutineEngine into gateway for direct manual trigger firing

Replace the message-channel hack in routines_trigger_handler with a
direct call to RoutineEngine::fire_manual(), ensuring FullJob routines
dispatch correctly when triggered from the web UI. Inject the engine
into GatewayState from Agent::run after construction.

Also persists user_id in save_job for both PG and libSQL backends,
removes the source='sandbox' filter so all jobs are visible, and
exposes job_id on RoutineRunInfo for the frontend job link.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: remove stale gateway_state argument from Agent::new test call sites

The gateway_state parameter was removed from Agent::new during rebase
(replaced by post-construction set_routine_engine_slot), but three test
call sites still passed the extra None argument.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address PR review — restore sandbox source filter, remove blank lines

- Revert removal of `source = 'sandbox'` filter in all SandboxStore
  queries (8 sites across PG and libSQL). Sandbox-specific APIs should
  stay scoped to sandbox jobs; unified job listing for the Jobs tab
  should use a separate query path.
- Remove extra blank lines in agent_loop.rs and worker.rs that caused
  formatting CI failure.

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address review — regenerate Cargo.lock, add user_id regression test

- Regenerate Cargo.lock from main's lockfile to eliminate dependency
  version downgrades (anyhow, syn, etc.) that were churn from rebase.
- Add regression test verifying user_id round-trips through save_job
  and get_job in the libSQL backend.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* style: remove trailing blank line in libsql jobs.rs

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* test: add Postgres-side regression test for user_id persistence in save_job

Mirrors the existing libSQL test (test_save_job_persists_user_id) for the
Postgres backend. Gated behind #[cfg(feature = "postgres")] + #[ignore]
since it requires a running PostgreSQL instance (integration tier).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* fix: make routine_system_event_emit test create routine before emitting

- Add routine_create step to trace fixture so event_emit has a matching
  routine to fire
- Assert fired_routines > 0, not just key presence (Copilot review)
- Add .with_auto_approve_tools(true) since event_emit now requires approval

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: renumber test headers after system_event test insertion

Test 4 was duplicated (routine_cooldown and heartbeat_findings).
Renumber heartbeat_findings to Test 5 and heartbeat_empty_skip to Test 6.

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: merge staging and add missing RoutineEngine args in test

RoutineEngine::new on staging requires `tools` and `safety` params.
Update system_event_trigger_matches_and_filters test to pass them.

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address new Copilot review comments

- Add .with_auto_approve_tools(true) to skill_install_routine_webhook_sim
  test so event_emit doesn't block on approval
- Fix module-level doc comment for event_emit to specify system_event trigger

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: deduplicate json_value_as_string helper

Remove private `json_value_as_string` from routine_engine.rs and use
the identical public `json_value_as_filter_string` from routine.rs,
eliminating divergence risk. (Copilot review)

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Henry Park <henrypark133@gmail.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: enable WASM credential injection in No-DB environments (#845)

* fix(wasm): enable credential injection in no-DB environments via env var fallback

When a secrets store is unavailable (e.g. no-DB mode), WASM channel
credentials were silently not injected, causing channels to start without
credentials. Fix by:

- Changing `inject_channel_credentials_from_secrets` to accept
  `Option<&dyn SecretsStore>` — secrets store is tried first when present
- Adding env var fallback (`inject_env_credentials`) for credentials not
  covered by the secrets store
- Enforcing a channel-name prefix security check on env var names to
  prevent WASM channels from reading unrelated host credentials
  (e.g. `AWS_SECRET_ACCESS_KEY`)
- Extracting pure `resolve_env_credentials` helper for testability
- Adding case-insensitive prefix matching for secrets store lookup

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(wasm): inject credentials at startup when no secrets store (setup.rs path)

The startup path (setup_wasm_channels -> register_channel) was guarded by
`if let Some(secrets) = secrets_store`, so in No-DB mode credentials were
never injected and the channel started without them.

Fix by:
- Changing inject_channel_credentials to accept Option<&dyn SecretsStore>
- Always calling it (removing the if-let guard) — env var fallback runs
  even when secrets_store is None
- Adding channel-name prefix security check to the env var fallback path
  (e.g. TELEGRAM_ for channel "telegram"), consistent with manager.rs

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(test): correct misleading comment on ICTEST1_UNRELATED_OTHER placeholder

* fix(wasm): guard against empty channel name in credential injection

An empty channel_name would produce prefix "_", allowing any env var
starting with "_" to pass the security check and be injected. Add an
early-return guard in resolve_env_credentials, inject_env_credentials,
and inject_channel_credentials. Add a test to cover this path.
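
The prefix check and the empty-name guard might look roughly like this. The helper name comes from the commit message, but its signature here is an assumption: the environment is passed in as a map so the function stays pure and testable.

```rust
use std::collections::HashMap;

/// Keep only variables named `<CHANNEL>_...` for the given channel.
/// Signature is hypothetical; only the helper's name appears in the source.
fn resolve_env_credentials(
    channel_name: &str,
    env: &HashMap<String, String>,
) -> HashMap<String, String> {
    // Guard: an empty name would produce the prefix "_" and let any
    // variable starting with an underscore through.
    if channel_name.is_empty() {
        return HashMap::new();
    }
    let prefix = format!("{}_", channel_name.to_ascii_uppercase());
    env.iter()
        .filter(|(name, _)| name.starts_with(&prefix))
        .map(|(name, value)| (name.clone(), value.clone()))
        .collect()
}

fn main() {
    let mut env = HashMap::new();
    env.insert("TELEGRAM_BOT_TOKEN".to_string(), "tg-token".to_string());
    env.insert("AWS_SECRET_ACCESS_KEY".to_string(), "aws-key".to_string());
    env.insert("_HIDDEN".to_string(), "hidden".to_string());

    println!("{:?}", resolve_env_credentials("telegram", &env));
    println!("{:?}", resolve_env_credentials("", &env));
}
```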

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: lizican123 <lizican123@gmail.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: promote to main (#878)

* fix: replace unsafe env::set_var with thread-safe inject_single_var in SIGHUP handler

Fixes race condition where SIGHUP handler modifies global environment variables
while other threads may be reading them via Config::from_env().

Changes:
- Replace unsafe { std::env::set_var() } with ironclaw::config::inject_single_var()
- Uses INJECTED_VARS mutex instead of unsafe global state modification
- All reads via optional_env() check the thread-safe overlay first
- Prevents data races between SIGHUP reload and concurrent config reads
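
A std-only sketch of the overlay idea. The names `INJECTED_VARS`, `inject_single_var`, and `optional_env` come from the commit message; their signatures and the `overlay` helper are assumptions for illustration.

```rust
use std::collections::HashMap;
use std::sync::{Mutex, OnceLock};

/// Process-wide overlay of injected variables, guarded by a mutex.
static INJECTED_VARS: OnceLock<Mutex<HashMap<String, String>>> = OnceLock::new();

/// Hypothetical accessor that lazily creates the overlay.
fn overlay() -> &'static Mutex<HashMap<String, String>> {
    INJECTED_VARS.get_or_init(|| Mutex::new(HashMap::new()))
}

/// Record a value in the overlay instead of mutating the process
/// environment with `std::env::set_var`.
fn inject_single_var(key: &str, value: &str) {
    overlay()
        .lock()
        .unwrap()
        .insert(key.to_string(), value.to_string());
}

/// Read a variable: the overlay wins, the real environment is the fallback.
fn optional_env(key: &str) -> Option<String> {
    let injected = overlay().lock().unwrap().get(key).cloned();
    injected.or_else(|| std::env::var(key).ok())
}

fn main() {
    inject_single_var("ICTEST_SKETCH_SECRET", "rotated-value");
    println!("{:?}", optional_env("ICTEST_SKETCH_SECRET"));
}
```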

Verification:
- All 2,787 tests pass
- Zero clippy warnings
- Code compiles successfully

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

* fix: spawn webhook restart as background task to avoid blocking I/O across lock

Prevents holding Mutex lock during async I/O operations (TcpListener::bind,
task shutdown). The SIGHUP handler no longer blocks webhook processing during
listener restart.

Changes:
- Read old_addr and drop lock immediately
- Spawn restart_with_addr() as background task via tokio::spawn
- Lock is only held during the actual restart operation, not the signal handler

Benefits:
- SIGHUP handler returns immediately without blocking
- Webhook requests not delayed by listener restart I/O
- Lock contention significantly reduced

Verification:
- All 2,787 tests pass
- Zero clippy warnings
- Code compiles successfully

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

* fix: add graceful shutdown mechanism for SIGHUP handler background task

Previously the SIGHUP handler looped indefinitely with no cancellation path. It now
listens for a shutdown signal and exits cleanly during graceful termination.

Changes:
- Create broadcast channel for shutdown signaling
- SIGHUP handler uses tokio::select! to wait for shutdown or SIGHUP
- Send shutdown signal to all background tasks after agent.run() completes
- Ensures clean task lifecycle and no orphaned background tasks

Benefits:
- Proper task cancellation during graceful shutdown
- Follows Tokio best practices for background task management
- No background tasks orphaned when runtime shuts down

Verification:
- All 2,787 tests pass
- Zero clippy warnings
- Code compiles successfully

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

* refactor: replace stringly-typed parameter filtering with typed enum and single helper

Fixes DRY violation where unsupported parameter filtering was duplicated across
rig_adapter.rs and anthropic_oauth.rs using string contains checks.

Changes:
- Add UnsupportedParam typed enum in provider.rs (Temperature, MaxTokens, StopSequences)
- Create strip_unsupported_completion_params() helper function
- Create strip_unsupported_tool_params() helper function
- Update rig_adapter.rs to use shared helpers
- Update anthropic_oauth.rs to use shared helpers
- Replace 60+ lines of duplicate stringly-typed logic
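
A sketch of the typed approach, assuming a trimmed-down request struct. The enum variants and the helper name are taken from the list above; the `CompletionRequest` fields and the helper's signature are hypothetical.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum UnsupportedParam {
    Temperature,
    MaxTokens,
    StopSequences,
}

/// Hypothetical, reduced request type for illustration.
#[derive(Debug, Default, PartialEq)]
struct CompletionRequest {
    temperature: Option<f32>,
    max_tokens: Option<u32>,
    stop_sequences: Option<Vec<String>>,
}

/// Clear every field the provider declared as unsupported.
fn strip_unsupported_completion_params(
    req: &mut CompletionRequest,
    unsupported: &[UnsupportedParam],
) {
    for param in unsupported {
        // The match is exhaustive, so adding a variant forces an update here.
        match param {
            UnsupportedParam::Temperature => req.temperature = None,
            UnsupportedParam::MaxTokens => req.max_tokens = None,
            UnsupportedParam::StopSequences => req.stop_sequences = None,
        }
    }
}

fn main() {
    let mut req = CompletionRequest {
        temperature: Some(0.7),
        max_tokens: Some(512),
        stop_sequences: None,
    };
    strip_unsupported_completion_params(&mut req, &[UnsupportedParam::Temperature]);
    println!("{req:?}");
}
```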

Benefits:
- Type safety: parameter names checked at compile time
- Single source of truth: adding a new param updates one place
- Reduced maintenance burden: no duplicate logic to keep in sync
- Better code clarity: named enum variant is self-documenting

Verification:
- All 2,787 tests pass
- Zero clippy warnings
- Code compiles successfully

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

* docs: clarify intentional parameter asymmetry between completion and tool requests

Add documentation explaining why strip_unsupported_tool_params does not handle
StopSequences: the field doesn't exist in ToolCompletionRequest.

Changes:
- Add clarifying comments to strip_unsupported_tool_params()
- Explain why StopSequences is only in CompletionRequest
- Note that ToolCompletionRequest only supports Temperature and MaxTokens
- Inline comment confirms no action needed for StopSequences

This addresses the appearance of incomplete implementation without changing logic,
as the asymmetry is intentional and correct (ToolCompletionRequest lacks the field).

Verification:
- All 2,787 tests pass
- Zero clippy warnings
- Code compiles successfully

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

* perf: isolate webhook_secret to reduce lock contention on hot path

Move webhook_secret from shared HttpChannelState RwLock into its own Arc<RwLock<>>.
This eliminates contention between secret validation and other state operations.

Changes:
- Change webhook_secret field type from RwLock<Option<SecretString>> to Arc<RwLock<Option<SecretString>>>
- Update initialization in HttpChannel::new()
- Update comments to explain isolation rationale

Benefits:
- Reduce lock contention on webhook request hot path (secret validation)
- Rarely-changing field (SIGHUP only) isolated from frequent state accesses
- Other state operations (tx, pending_responses) no longer wait behind secret reads
- Minimal code change: only field declaration and initialization

The Arc wrapper allows cloning the RwLock handle to separate concerns. With this
change, each webhook request takes the dedicated secret lock for validation
instead of the shared HttpChannelState lock. This scales better under high request volume.

Verification:
- All 2,787 tests pass
- Zero clippy warnings
- Code compiles successfully

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

* fix: prevent partial state corruption on SIGHUP restart failure

Ensure atomicity of configuration reload: if webhook listener restart fails,
secret update is skipped to prevent inconsistent state.

Changes:
- Wait for restart_with_addr() to complete (don't spawn background task)
- Track restart result with restart_failed flag
- Only update secret if restart succeeded or wasn't needed
- Ensure listener and secret stay synchronized

Problem addressed:
- Before: restart spawned as background task, secret updated immediately
- If restart failed, secret was changed but listener still on old address
- This left system in inconsistent state (partial corruption)

Solution:
- Make restart blocking (SIGHUP handler can wait, it's not on request hot path)
- Atomically update secret only after successful restart
- Flag prevents race between restart and secret update

Benefits:
- Configuration changes are atomic (both succeed or both fail together)
- No partial state corruption on restart failure
- Failed restarts don't silently leave inconsistent state
- Secret and listener address stay in sync
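
A minimal synchronous sketch of the ordering described above, under stated assumptions: `ReloadState`, `reload`, and the body of `restart_with_addr` are hypothetical stand-ins (the real restart is awaited inside the SIGHUP handler and works on the actual listener), and `String` replaces `SecretString`.

```rust
// Hypothetical reload state, for illustration only.
struct ReloadState {
    listener_addr: String,
    secret: Option<String>,
}

// Stand-in for restart_with_addr(): fails on an empty address.
fn restart_with_addr(state: &mut ReloadState, addr: &str) -> Result<(), String> {
    if addr.is_empty() {
        return Err("cannot bind empty address".to_string());
    }
    state.listener_addr = addr.to_string();
    Ok(())
}

fn reload(
    state: &mut ReloadState,
    new_addr: &str,
    new_secret: Option<String>,
) -> Result<(), String> {
    // Restart only when the address changed, and wait for the result
    // instead of spawning a background task.
    let restart_result = if state.listener_addr != new_addr {
        restart_with_addr(state, new_addr)
    } else {
        Ok(())
    };
    // A failed restart aborts the reload before the secret is touched.
    if let Err(reason) = restart_result {
        return Err(format!("reload aborted, secret unchanged: {reason}"));
    }
    // Reached only if the restart succeeded or was not needed.
    state.secret = new_secret;
    Ok(())
}
```

Calling `reload` with an address that fails to bind leaves both `listener_addr` and `secret` at their previous values.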

Verification:
- All 2,787 tests pass
- Zero clippy warnings
- Code compiles successfully

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

* refactor: generalize hot-secret-swapping with ChannelSecretUpdater trait

Decouple SIGHUP handler from HTTP channel internals by introducing a trait
for channels that support zero-downtime secret updates.

Changes:
- Add ChannelSecretUpdater trait in channels/channel.rs
- Implement ChannelSecretUpdater for HttpChannelState
- Export trait from channels module
- Update SIGHUP handler to use trait-based secret updater collection
- Replace explicit HTTP channel knowledge with generic updater loop

Benefits:
- SIGHUP handler no longer depends on HttpChannelState details
- Tight coupling removed: main.rs doesn't need HTTP channel imports
- Extensible: new channels can opt-in by implementing the trait
- Scalable: multiple channels supported without main.rs changes
- Maintainable: adding channels requires only trait implementation, not SIGHUP handler edits

Pattern:
- ChannelSecretUpdater trait defines the interface for all updaters
- Channels that support hot-secret-swapping implement the trait
- SIGHUP handler loops through all registered updaters generically
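
The pattern above might look roughly like this. The trait name comes from the commit message; the method signature, the `HttpSecretHolder` type (a stand-in for HttpChannelState), and the `on_sighup` function are assumptions, and the real trait is likely async and built on `SecretString`.

```rust
use std::sync::{Arc, RwLock};

// Assumed shape of the trait; the actual signature may differ.
trait ChannelSecretUpdater: Send + Sync {
    fn update_secret(&self, new_secret: Option<String>);
}

// Hypothetical stand-in for HttpChannelState.
struct HttpSecretHolder {
    webhook_secret: RwLock<Option<String>>,
}

impl ChannelSecretUpdater for HttpSecretHolder {
    fn update_secret(&self, new_secret: Option<String>) {
        *self.webhook_secret.write().unwrap() = new_secret;
    }
}

// The handler sees only trait objects, so it needs no knowledge of
// which channel types are registered.
fn on_sighup(updaters: &[Arc<dyn ChannelSecretUpdater>], new_secret: Option<String>) {
    for updater in updaters {
        updater.update_secret(new_secret.clone());
    }
}
```

A new channel would opt in by implementing the trait and registering itself in the updater list; `on_sighup` stays unchanged.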

Verification:
- All 2,787 tests pass
- Zero clippy warnings
- Code compiles successfully

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

* feat: validate parameter names at deserialization time, not just tests

Add custom serde deserializer for unsupported_params that validates parameter
names at runtime when loading providers.json (or user overrides).

Changes:
- Add unsupported_params_de module with custom deserializer
- Only allows: "temperature", "max_tokens", "stop_sequences"
- Invalid parameter names cause immediate deserialization error
- Update ProviderDefinition to use custom deserializer
- Enhanced test with explicit parameter name validation
- Add new test that verifies invalid parameters are rejected

Problem solved:
- Before: Invalid param names (e.g., "temperrature") silently ignored
- Now: Rejected at deserialization time with clear error message
- Prevents runtime failures caused by typos in configuration

Example error:
  unsupported parameter name 'temperrature': must be one of: temperature, max_tokens, stop_sequences

Benefits:
- Fail-fast: errors caught when loading config, not later at request time
- Clear feedback: error message lists valid parameter names
- Early validation: parameter names are checked during deserialization
- Configuration errors detected immediately, not silently ignored
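
The validation rule can be illustrated without serde. The sketch below is not the `unsupported_params_de` module itself: `validate_unsupported_params` and `ALLOWED_PARAMS` are hypothetical names, and the real check runs inside a custom serde deserializer. Only the allowed names and the error text follow the commit message.

```rust
// Allowed names as listed in the commit message.
const ALLOWED_PARAMS: [&str; 3] = ["temperature", "max_tokens", "stop_sequences"];

// Hypothetical helper: accepts the list only if every name is known,
// otherwise returns an error naming the first offending entry.
fn validate_unsupported_params(names: &[&str]) -> Result<Vec<String>, String> {
    for name in names {
        if !ALLOWED_PARAMS.contains(name) {
            return Err(format!(
                "unsupported parameter name '{}': must be one of: {}",
                name,
                ALLOWED_PARAMS.join(", ")
            ));
        }
    }
    Ok(names.iter().map(|n| n.to_string()).collect())
}
```

Passing `&["temperrature"]` yields the error shown in the example above, while `&["temperature", "max_tokens"]` is accepted unchanged.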

Verification:
- All 2,788 tests pass (including new validation test)
- Zero clippy warnings
- Code compiles successfully

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Haiku 4.5 <noreply@anthropic.com>

* merge: resolve conflicts for PR #800 and #822 into staging (#881)

* fix(ci): secrets can't be used in step if conditions [skip-regression-check] (#787)

GitHub Actions step-level `if:` doesn't have access to `secrets` context.
Replace `if: secrets.X != ''` with `continue-on-error: true` and let
the Set token step handle the fallback.

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(ci): clean up staging pipeline — remove hacks, skip redundant checks [skip-regression-check] (#794)

- Remove continue-on-error from staging-ci.yml app token steps (secrets are configured)
- Skip test.yml and code_style.yml on PRs targeting staging (staging-ci.yml
  already runs tests before promoting, promotion PR gets full CI on main)
- Allow ironclaw-ci[bot] in Claude Code review for bot-created promotion PRs

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* fix(ci): run fmt + clippy on staging PRs, skip Windows clippy [skip-regression-check] (#802)

- Remove branches:[main] filter from code_style.yml so it runs on all PRs
- Gate clippy-windows with `if: github.base_ref == 'main'` (skip on staging PRs)
- Update rollup job to allow skipped clippy-windows
- Simplify claude-review.yml to only trigger on labeled event (avoids duplicate runs)

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* feat: persist user_id in save_job and expose job_id on routine runs (#709)

* feat: persist worker events to DB and fix activity tab rendering

In-process Worker (used by Scheduler::dispatch_job) now persists events
via save_job_event at key execution points: plan creation, LLM
responses, tool_use, tool_result, and job completion/failure/stuck.
Event data shapes match the container worker format so the gateway
activity tab renders them correctly.

Frontend: tool_result errors now show a red X icon with danger styling
instead of a silent empty output. The result event falls back to the
error field when message is absent.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: wire RoutineEngine into gateway for direct manual trigger firing

Replace the message-channel hack in routines_trigger_handler with a
direct call to RoutineEngine::fire_manual(), ensuring FullJob routines
dispatch correctly when triggered from the web UI. Inject the engine
into GatewayState from Agent::run after construction.

Also persists user_id in save_job for both PG and libSQL backends,
removes the source='sandbox' filter so all jobs are visible, and
exposes job_id on RoutineRunInfo for the frontend job link.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: remove stale gateway_state argument from Agent::new test call sites

The gateway_state parameter was removed from Agent::new during rebase
(replaced by post-construction set_routine_engine_slot), but three test
call sites still passed the extra None argument.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address PR review — restore sandbox source filter, remove blank lines

- Revert removal of `source = 'sandbox'` filter in all SandboxStore
  queries (8 sites across PG and libSQL). Sandbox-specific APIs should
  stay scoped to sandbox jobs; unified job listing for the Jobs tab
  should use a separate query path.
- Remove extra blank lines in agent_loop.rs and worker.rs that caused
  formatting CI failure.

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address review — regenerate Cargo.lock, add user_id regression test

- Regenerate Cargo.lock from main's lockfile to eliminate dependency
  version downgrades (anyhow, syn, etc.) that were churn from rebase.
- Add regression test verifying user_id round-trips through save_job
  and get_job in the libSQL backend.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* style: remove trailing blank line in libsql jobs.rs

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* test: add Postgres-side regression test for user_id persistence in save_job

Mirrors the existing libSQL test (test_save_job_persists_user_id) for the
Postgres backend. Gated behind #[cfg(feature = "postgres")] + #[ignore]
since it requires a running PostgreSQL instance (integration tier).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* refactor: unify three agentic loops into single AgenticLoop engine (#654)

Replace three independent copy-pasted agentic loops (dispatcher, worker,
container runtime) with a single shared engine in `agentic_loop.rs` that
all consumers customize via the `LoopDelegate` trait.

Phase 1 — Shared engine (`src/agent/agentic_loop.rs`, 205 lines):
  - `run_agentic_loop()` owns the core LLM → tool exec → repeat cycle
  - `LoopDelegate` trait (Send + Sync, &dyn dispatch) with 6 hook points
  - Tool intent nudge logic consolidated (was duplicated in 3 files)
  - Iteration limit + force-text behavior preserved

Phase 2 — Three delegate implementations:
  - `ChatDelegate` (dispatcher.rs): 3-phase approval flow, hooks, cost
    guard, context compaction, skill attenuation, interruption
  - `JobDelegate` (worker/job.rs): planning pre-loop phase, parallel
    JoinSet exec, mark_completed/stuck/failed, SSE streaming, self-repair
  - `ContainerDelegate` (worker/container.rs): sequential tool exec,
    HTTP-proxied LLM, container-safe tools, credential injection

Phase 3 — File moves and cleanup:
  - Delete `src/agent/worker.rs` — job logic moved to `src/worker/job.rs`
  - Rename `src/worker/runtime.rs` → `src/worker/container.rs`
  - Re-export `Worker`/`WorkerDeps` from `crate::worker` in `agent/mod.rs`
  - Update `scheduler.rs` imports to new worker location

Shared helpers (`src/tools/execute.rs`):
  - `execute_tool_with_safety()` replaces 4 copies of validate → timeout
    → execute → serialize
  - `process_tool_result()` replaces 3 copies of sanitize → wrap →
    ChatMessage (also used by thread_ops.rs approval resume paths)

Net result: -2,408 lines, zero duplicated loop logic, single code path
for tool intent nudge and completion detection.
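
A heavily simplified, synchronous sketch of the shared-engine idea follows. The names `run_agentic_loop`, `LoopDelegate`, and `LoopOutcome::MaxIterations` come from the commit messages; the three hooks shown, the `LlmResponse` type, the other `LoopOutcome` variants, and `EchoDelegate` are assumptions. The real trait has six hook points and is designed for async use.

```rust
// Assumed response type for illustration.
enum LlmResponse {
    Text(String),
    ToolCall(String),
}

#[derive(Debug, PartialEq)]
enum LoopOutcome {
    Response(String),
    Stopped,
    MaxIterations,
}

// Consumers customize the loop through hooks; only three are sketched.
trait LoopDelegate {
    fn should_stop(&mut self) -> bool {
        false
    }
    fn call_llm(&mut self, context: &[String]) -> LlmResponse;
    fn execute_tool(&mut self, name: &str) -> String;
}

// The engine owns the LLM -> tool exec -> repeat cycle.
fn run_agentic_loop(delegate: &mut dyn LoopDelegate, max_iterations: usize) -> LoopOutcome {
    let mut context: Vec<String> = Vec::new();
    for _ in 0..max_iterations {
        if delegate.should_stop() {
            return LoopOutcome::Stopped;
        }
        match delegate.call_llm(&context) {
            // A plain text response ends the loop.
            LlmResponse::Text(text) => return LoopOutcome::Response(text),
            // A tool call is executed and its result fed back as context.
            LlmResponse::ToolCall(name) => {
                let result = delegate.execute_tool(&name);
                context.push(result);
            }
        }
    }
    LoopOutcome::MaxIterations
}

// Example delegate: requests one tool call, then answers with its result.
struct EchoDelegate;

impl LoopDelegate for EchoDelegate {
    fn call_llm(&mut self, context: &[String]) -> LlmResponse {
        match context.last() {
            None => LlmResponse::ToolCall("echo".to_string()),
            Some(result) => LlmResponse::Text(format!("done: {result}")),
        }
    }
    fn execute_tool(&mut self, name: &str) -> String {
        format!("{name} ok")
    }
}
```

Each consumer (chat, job, container) would supply its own delegate while the loop body stays in one place.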

Closes #654

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address review feedback from Copilot

1. scheduler.rs: Replace `unwrap_or` fallback with proper error
   propagation when parsing tool output JSON — surfaces bugs instead
   of silently changing the output type.

2. worker/job.rs: Drop MutexGuard before the cancellation `.await` in
   `check_signals()` to avoid holding a lock across an async I/O call
   (prevents `await_holding_lock` lint).

3. worker/job.rs: Restore consecutive rate-limit counter
   (MAX_CONSECUTIVE_RATE_LIMITS = 10) so sustained rate limiting marks
   the job stuck with "Persistent rate limiting" instead of silently
   burning through max_iterations.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: incorporate staging changes — token budget tracking + mark_failed

Merge staging's changes into the refactored JobDelegate:
- Add token budget tracking in call_llm (update_context/add_tokens)
- mark_stuck → mark_failed for iteration cap and rate-limit exhaustion
  (aligns with staging's #788 fix)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address zmanian's PR review — eliminate type erasure, clean up

Address all 6 review points from zmanian on PR #800:

1. Replace LoopOutcome::Custom(Box<dyn Any>) with typed
   LoopOutcome::NeedApproval(Box<PendingApproval>) — eliminates
   type erasure and downcast, resolves clippy large_enum_variant.

2. Remove dead max_tool_iterations field from ChatDelegate struct.

3. Add on_tool_intent_nudge() hook to LoopDelegate trait with
   implementations in Job and Container delegates for observability.

4. Fix SSE events in job worker to emit raw sanitized content
   instead of XML-wrapped <tool_output> tags.

5. Remove 4 duplicate completion tests from job.rs that were
   already covered by the shared util module.

6. Avoid logging full tool results — use result_size_bytes in
   debug logs (execute.rs, job.rs).

Also updates path references in CLAUDE.md, COVERAGE_PLAN.md,
and add-sse-event.md command.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat(doctor): expand diagnostics from 7 to 16 health checks

* test: add unit tests for agentic_loop and execute shared modules

Add 16 tests covering the two new critical shared modules:

agentic_loop.rs (10 tests):
- Text response exits loop immediately
- Tool call → text response continuation
- LoopSignal::Stop exits before LLM call
- LoopSignal::InjectMessage adds user message to context
- Max iterations terminates with LoopOutcome::MaxIterations
- Tool intent nudge fires twice then caps
- before_llm_call early exit bypasses LLM
- truncate_for_preview: short string, long string, multibyte safety

execute.rs (6 tests):
- execute_tool_with_safety success path
- Missing tool returns ToolError::NotFound
- Tool execution failure propagates
- Per-tool timeout enforcement (50ms)
- process_tool_result XML wrapping on success
- process_tool_result error formatting

All 2,777 unit tests pass, 0 clippy warnings.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* style: cargo fmt

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address code review — 9 issues across agentic loop, job worker, container

CRITICAL fixes:
- Rate-limit exhaustion now returns Err(LlmError::RateLimited) instead of
  Ok(Text("")), stopping the loop immediately with no ghost iteration.
  Below-threshold retries still use Text("") with an explicit empty-string
  guard in handle_text_response to skip injection.
- check_signals drains the entire message channel before returning,
  prioritizing Stop over UserMessage. Previously returned early on first
  UserMessage, silently dropping any queued Stop or additional messages.
- check_signals now detects all non-progressing job states (Cancelled,
  Failed, Stuck, Completed, Submitted, Accepted) instead of only
  Cancelled and Failed.

HIGH fixes:
- Error path in process_tool_result_job applies truncate_for_preview to
  bound error strings in SSE/DB events (was unbounded).
- Document Send+Sync lifetime constraint on LoopDelegate trait.
- Test mock before_llm_call refactored from double-lock to single lock
  acquisition, eliminating deadlock risk on refactor.

MEDIUM fixes:
- CompletionReport includes actual iteration count via shared
  Arc<Mutex<u32>> tracker (was hardcoded 0).
- process_tool_result_job return type changed from Result<bool> to
  Result<()> — the bool was always false (dead API).
- Deduplicate truncate in container.rs; now uses truncate_for_preview
  from agentic_loop.
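
The check_signals fix listed under CRITICAL can be sketched with a standard-library channel. This is an illustration under assumptions: `WorkerMessage`, `LoopSignal::Continue`, and the choice to join queued user messages with newlines are hypothetical, and the real worker presumably uses an async channel and also inspects job state.

```rust
use std::sync::mpsc::Receiver;

// Assumed message type sent to the worker.
enum WorkerMessage {
    Stop,
    UserMessage(String),
}

#[derive(Debug, PartialEq)]
enum LoopSignal {
    Continue,
    Stop,
    InjectMessage(String),
}

fn check_signals(rx: &Receiver<WorkerMessage>) -> LoopSignal {
    let mut stop_seen = false;
    let mut user_messages = Vec::new();
    // Drain everything that is queued instead of returning on the
    // first message, so a Stop behind a UserMessage is not lost.
    while let Ok(msg) = rx.try_recv() {
        match msg {
            WorkerMessage::Stop => stop_seen = true,
            WorkerMessage::UserMessage(text) => user_messages.push(text),
        }
    }
    // Stop takes priority over any queued user messages.
    if stop_seen {
        LoopSignal::Stop
    } else if user_messages.is_empty() {
        LoopSignal::Continue
    } else {
        LoopSignal::InjectMessage(user_messages.join("\n"))
    }
}
```

With a UserMessage followed by a Stop in the queue, this version returns `LoopSignal::Stop`, where an early return would have reported only the user message.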

Verified: 0 clippy warnings, 2781 tests pass, cargo fmt clean.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Henry Park <henrypark133@gmail.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Illia Polosukhin <ilblackdragon@gmail.com>
Co-authored-by: Umesh Kumar Singh <brijbiharisingh1971@outlook.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>

* Revert "Feat/docker shell edition" + fix fmt/clippy (#886)

* Revert "Feat/docker shell edition (#804)"

This reverts commit c566faf28fb77c2fa4df92c2947fb48f1a25df9b.

* style: fix formatting issues from revert

Run cargo fmt to fix formatting across 7 files after the revert of
the docker shell edition feature.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* refactor: centralize test credential constants into testing::credentials (#829)

* refactor: central…
zmanian added a commit that referenced this pull request Mar 12, 2026
…1063)

* chore: promote staging to main (2026-03-10 15:19 UTC) (#865)

* fix: Channel HTTP: server doesn't start after config change (no hot-r… (#779)

* fix: Channel HTTP: server doesn't start after config change (no hot-reload)

* review fixes

* review fixes

* fix linter

* fix code style

* fix: prevent session lock contention blocking message processing (#783)

* fix: prevent session lock contention blocking message processing

## Problem
After container restart, POST /api/chat/send returns 202 ACCEPTED but messages
don't appear in conversation_messages and agent never responds. Messages get
stuck in "stale state" after restart.

Root cause: Session lock was held for entire duration of chat_threads_handler
and chat_history_handler, including during slow database queries. This blocked
the agent loop from acquiring the session lock to process incoming messages,
causing them to hang indefinitely.

## Solution
1. **Release session lock early in chat_threads_handler**: Only acquire lock
   when reading active_thread at response time, not during DB queries for
   thread list. DB operations no longer block message processing.

2. **Release session lock early in chat_history_handler**: Only acquire lock
   when accessing in-memory thread state, not during paginated DB queries or
   thread ownership checks. DB operations no longer block message processing.

3. **Add comprehensive logging**: Track message flow from receipt through
   session resolution, thread hydration, and state transitions. Helps diagnose
   future issues:
   - Message queued to agent loop (chat_send_handler)
   - Processing message from channel (handle_message)
   - Hydrating thread from DB (maybe_hydrate_thread)
   - Resolving session and thread (resolve_thread)
   - Checking thread state (process_user_input)
   - Persisting user message (persist_user_message)
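
The early-release idea from points 1 and 2 can be sketched as below. All names here (`Session`, `load_thread_ids_from_db`, `chat_threads`) are hypothetical, the database call is faked, and `std::sync::Mutex` stands in for whatever async lock the handlers actually use.

```rust
use std::sync::Mutex;

// Hypothetical in-memory session state.
struct Session {
    active_thread: Option<String>,
}

// Stand-in for a slow database query.
fn load_thread_ids_from_db() -> Vec<String> {
    vec!["t1".to_string(), "t2".to_string()]
}

fn chat_threads(session: &Mutex<Session>) -> (Vec<String>, Option<String>) {
    // Slow work runs first, with no session lock held.
    let threads = load_thread_ids_from_db();
    // The lock is taken only to copy one field; the guard is a
    // temporary and is released at the end of this statement.
    let active = session.lock().unwrap().active_thread.clone();
    (threads, active)
}
```

Because the guard never outlives the single `clone()` statement, another task that needs the session lock waits at most for that copy, not for the query.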

## Impact
- Message processing no longer blocks on session lock contention
- API response times for thread list/history queries unaffected (DB queries
  still happen, but lock is not held)
- Better diagnostics for future debugging

## Testing
- All 2756 tests pass
- Code compiles with zero clippy warnings
- No changes to user-facing API or behavior, only lock timing

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

* security: redact PII from info-level logs

Downgrade user_id and channel logging to debug level to prevent exposing
Personally Identifiable Information (PII) in production logs.

The user_id field can contain sensitive information such as phone numbers
(e.g., for Signal messages). Logging PII in cleartext at the info level
creates a security and privacy risk, as these logs may be stored in
persistent storage, indexed by log management systems, or accessible to
unauthorized personnel.

Changes:
- Info level: logs only message_id (UUID) for tracking
- Debug level: logs user_id, channel, thread_id for troubleshooting

This maintains debugging capability for developers while protecting user
privacy in production logs.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Haiku 4.5 <noreply@anthropic.com>

* chore: sync main into staging (#855)

* fix(ci): secrets can't be used in step if conditions [skip-regression-check] (#787)

GitHub Actions step-level `if:` doesn't have access to `secrets` context.
Replace `if: secrets.X != ''` with `continue-on-error: true` and let
the Set token step handle the fallback.

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(ci): clean up staging pipeline — remove hacks, skip redundant checks [skip-regression-check] (#794)

- Remove continue-on-error from staging-ci.yml app token steps (secrets are configured)
- Skip test.yml and code_style.yml on PRs targeting staging (staging-ci.yml
  already runs tests before promoting, promotion PR gets full CI on main)
- Allow ironclaw-ci[bot] in Claude Code review for bot-created promotion PRs

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* fix(ci): run fmt + clippy on staging PRs, skip Windows clippy [skip-regression-check] (#802)

- Remove branches:[main] filter from code_style.yml so it runs on all PRs
- Gate clippy-windows with `if: github.base_ref == 'main'` (skip on staging PRs)
- Update rollup job to allow skipped clippy-windows
- Simplify claude-review.yml to only trigger on labeled event (avoids duplicate runs)

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* feat: persist user_id in save_job and expose job_id on routine runs (#709)

* feat: persist worker events to DB and fix activity tab rendering

In-process Worker (used by Scheduler::dispatch_job) now persists events
via save_job_event at key execution points: plan creation, LLM
responses, tool_use, tool_result, and job completion/failure/stuck.
Event data shapes match the container worker format so the gateway
activity tab renders them correctly.

Frontend: tool_result errors now show a red X icon with danger styling
instead of a silent empty output. The result event falls back to the
error field when message is absent.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: wire RoutineEngine into gateway for direct manual trigger firing

Replace the message-channel hack in routines_trigger_handler with a
direct call to RoutineEngine::fire_manual(), ensuring FullJob routines
dispatch correctly when triggered from the web UI. Inject the engine
into GatewayState from Agent::run after construction.

Also persists user_id in save_job for both PG and libSQL backends,
removes the source='sandbox' filter so all jobs are visible, and
exposes job_id on RoutineRunInfo for the frontend job link.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: remove stale gateway_state argument from Agent::new test call sites

The gateway_state parameter was removed from Agent::new during rebase
(replaced by post-construction set_routine_engine_slot), but three test
call sites still passed the extra None argument.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address PR review — restore sandbox source filter, remove blank lines

- Revert removal of `source = 'sandbox'` filter in all SandboxStore
  queries (8 sites across PG and libSQL). Sandbox-specific APIs should
  stay scoped to sandbox jobs; unified job listing for the Jobs tab
  should use a separate query path.
- Remove extra blank lines in agent_loop.rs and worker.rs that caused
  formatting CI failure.

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address review — regenerate Cargo.lock, add user_id regression test

- Regenerate Cargo.lock from main's lockfile to eliminate dependency
  version downgrades (anyhow, syn, etc.) that were churn from rebase.
- Add regression test verifying user_id round-trips through save_job
  and get_job in the libSQL backend.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* style: remove trailing blank line in libsql jobs.rs

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* test: add Postgres-side regression test for user_id persistence in save_job

Mirrors the existing libSQL test (test_save_job_persists_user_id) for the
Postgres backend. Gated behind #[cfg(feature = "postgres")] + #[ignore]
since it requires a running PostgreSQL instance (integration tier).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* feat(llm): per-provider unsupported parameter filtering (#749, #728) (#809)

Add declarative `unsupported_params` field to provider definitions in
providers.json. Parameters listed are stripped from requests before
sending, preventing 400 errors from providers that reject them (e.g.
gpt-5 family and kimi-k2.5 rejecting custom temperature values).

- Add `unsupported_params` to ProviderDefinition and RegistryProviderConfig
- Propagate from registry through config resolution
- Generic strip helpers handle temperature, max_tokens, stop_sequences
- Apply filtering in RigAdapter and AnthropicOAuthProvider
- Mark openai and tinfoil providers as not supporting temperature
- Update openai default model to gpt-5-mini
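
A minimal sketch of what such a strip helper could look like. The struct layout and the `strip_unsupported_params` name are assumptions for illustration; only the three parameter names come from the commit message, and the real helpers operate on the project's own request types.

```rust
// Hypothetical request shape with the three strippable parameters.
#[derive(Debug, Default, PartialEq)]
struct CompletionRequest {
    temperature: Option<f32>,
    max_tokens: Option<u32>,
    stop_sequences: Option<Vec<String>>,
}

// Clears every parameter the provider declares as unsupported, so the
// outgoing request omits it instead of triggering a 400 response.
fn strip_unsupported_params(req: &mut CompletionRequest, unsupported: &[&str]) {
    for name in unsupported {
        match *name {
            "temperature" => req.temperature = None,
            "max_tokens" => req.max_tokens = None,
            "stop_sequences" => req.stop_sequences = None,
            // Unknown names are ignored in this sketch.
            _ => {}
        }
    }
}
```

For a provider configured with `["temperature"]`, a request carrying a custom temperature and a token limit would go out with only the token limit.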

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Illia Polosukhin <ilblackdragon@gmail.com>

* fix: Chat input is hidden in mobile browser mode (#877)

* fix: stop XML-escaping tool output content (#598) (#874)

* fix(ci): secrets can't be used in step if conditions [skip-regression-check] (#787)

GitHub Actions step-level `if:` doesn't have access to `secrets` context.
Replace `if: secrets.X != ''` with `continue-on-error: true` and let
the Set token step handle the fallback.

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(ci): clean up staging pipeline — remove hacks, skip redundant checks [skip-regression-check] (#794)

- Remove continue-on-error from staging-ci.yml app token steps (secrets are configured)
- Skip test.yml and code_style.yml on PRs targeting staging (staging-ci.yml
  already runs tests before promoting, promotion PR gets full CI on main)
- Allow ironclaw-ci[bot] in Claude Code review for bot-created promotion PRs

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* fix(ci): run fmt + clippy on staging PRs, skip Windows clippy [skip-regression-check] (#802)

- Remove branches:[main] filter from code_style.yml so it runs on all PRs
- Gate clippy-windows with `if: github.base_ref == 'main'` (skip on staging PRs)
- Update rollup job to allow skipped clippy-windows
- Simplify claude-review.yml to only trigger on labeled event (avoids duplicate runs)

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* feat: persist user_id in save_job and expose job_id on routine runs (#709)

* feat: persist worker events to DB and fix activity tab rendering

In-process Worker (used by Scheduler::dispatch_job) now persists events
via save_job_event at key execution points: plan creation, LLM
responses, tool_use, tool_result, and job completion/failure/stuck.
Event data shapes match the container worker format so the gateway
activity tab renders them correctly.

Frontend: tool_result errors now show a red X icon with danger styling
instead of a silent empty output. The result event falls back to the
error field when message is absent.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: wire RoutineEngine into gateway for direct manual trigger firing

Replace the message-channel hack in routines_trigger_handler with a
direct call to RoutineEngine::fire_manual(), ensuring FullJob routines
dispatch correctly when triggered from the web UI. Inject the engine
into GatewayState from Agent::run after construction.

Also persists user_id in save_job for both PG and libSQL backends,
removes the source='sandbox' filter so all jobs are visible, and
exposes job_id on RoutineRunInfo for the frontend job link.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: remove stale gateway_state argument from Agent::new test call sites

The gateway_state parameter was removed from Agent::new during rebase
(replaced by post-construction set_routine_engine_slot), but three test
call sites still passed the extra None argument.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address PR review — restore sandbox source filter, remove blank lines

- Revert removal of `source = 'sandbox'` filter in all SandboxStore
  queries (8 sites across PG and libSQL). Sandbox-specific APIs should
  stay scoped to sandbox jobs; unified job listing for the Jobs tab
  should use a separate query path.
- Remove extra blank lines in agent_loop.rs and worker.rs that caused
  formatting CI failure.

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address review — regenerate Cargo.lock, add user_id regression test

- Regenerate Cargo.lock from main's lockfile to eliminate dependency
  version downgrades (anyhow, syn, etc.) that were churn from rebase.
- Add regression test verifying user_id round-trips through save_job
  and get_job in the libSQL backend.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* style: remove trailing blank line in libsql jobs.rs

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* test: add Postgres-side regression test for user_id persistence in save_job

Mirrors the existing libSQL test (test_save_job_persists_user_id) for the
Postgres backend. Gated behind #[cfg(feature = "postgres")] + #[ignore]
since it requires a running PostgreSQL instance (integration tier).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* feat(llm): per-provider unsupported parameter filtering (#749, #728) (#809)

Add declarative `unsupported_params` field to provider definitions in
providers.json. Parameters listed are stripped from requests before
sending, preventing 400 errors from providers that reject them (e.g.
gpt-5 family and kimi-k2.5 rejecting custom temperature values).

- Add `unsupported_params` to ProviderDefinition and RegistryProviderConfig
- Propagate from registry through config resolution
- Generic strip helpers handle temperature, max_tokens, stop_sequences
- Apply filtering in RigAdapter and AnthropicOAuthProvider
- Mark openai and tinfoil providers as not supporting temperature
- Update openai default model to gpt-5-mini

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* fix: stop XML-escaping tool output content in wrap_for_llm (#598)

Remove content escaping that corrupted JSON in tool output. The
<tool_output> structural boundary is preserved but content now passes
through raw, fixing downstream parse failures.
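
The difference can be illustrated with a small sketch. The signatures, the exact tag layout, and the `wrap_for_llm_escaped` helper (modelling the earlier behavior) are assumptions; the real `wrap_for_llm` may take more arguments or emit attributes on the tag.

```rust
// Assumed model of the earlier behavior: escape XML-significant characters.
fn xml_escape(s: &str) -> String {
    s.replace('&', "&amp;")
        .replace('<', "&lt;")
        .replace('>', "&gt;")
        .replace('"', "&quot;")
}

// Earlier behavior (hypothetical): content is escaped inside the wrapper,
// which turns JSON quotes into &quot; and breaks downstream parsing.
fn wrap_for_llm_escaped(content: &str) -> String {
    format!("<tool_output>\n{}\n</tool_output>", xml_escape(content))
}

// Behavior after the fix: the structural boundary stays, content is raw.
fn wrap_for_llm(content: &str) -> String {
    format!("<tool_output>\n{content}\n</tool_output>")
}
```

With the input `{"ok":true}`, the escaped variant produces `{&quot;ok&quot;:true}` inside the tags, while the raw variant keeps the JSON byte-for-byte.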

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Henry Park <henrypark133@gmail.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(safety): allow empty string tool params (#848)

* fix(safety): allow empty string tool params

* fix(safety): preserve heuristic checks and add path context to tool validation

This follow-up refactor addresses PR review feedback by restoring
heuristic checks (whitespace ratio, character repetition) for tool
parameter validation and improving error reporting.

Changes:
- Restored heuristic warnings in validate_non_empty_input so they apply
  to both user input and tool parameters (when non-empty).
- Refactored check_strings to recursively build and pass JSON paths
  (e.g., "metadata.tags[1]").
- Updated validation errors to use the specific JSON path as the field
  name instead of the generic "input".
- Added regression tests for whitespace/repetition warnings and JSON
  path reporting in tool parameters.

This ensures the safety layer remains semantically neutral about empty
strings (fixing the memory_tree path: "" issue) while maintaining
rigorous protection and providing better developer ergonomics.
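
The recursive path building can be sketched with a small hand-rolled JSON type. This is an illustration only: the `Json` enum is a stand-in for the real JSON value type, and this `check_strings` merely collects `(path, string)` pairs, where the real function would run the validation checks on each string.

```rust
// Minimal stand-in for a JSON value; `Other` covers numbers, bools, null.
enum Json {
    Str(String),
    Array(Vec<Json>),
    Object(Vec<(String, Json)>),
    Other,
}

// Walks the value and records every string together with its JSON path,
// e.g. "metadata.tags[1]". Empty strings are visited like any other.
fn check_strings(value: &Json, path: &str, out: &mut Vec<(String, String)>) {
    match value {
        Json::Str(s) => out.push((path.to_string(), s.clone())),
        Json::Array(items) => {
            for (i, item) in items.iter().enumerate() {
                check_strings(item, &format!("{path}[{i}]"), out);
            }
        }
        Json::Object(fields) => {
            for (key, child) in fields {
                // Top-level keys have no leading dot.
                let child_path = if path.is_empty() {
                    key.clone()
                } else {
                    format!("{path}.{key}")
                };
                check_strings(child, &child_path, out);
            }
        }
        Json::Other => {}
    }
}
```

An error raised for the second tag in `{"metadata": {"tags": ["a", ""]}}` could then name `metadata.tags[1]` as the field instead of the generic "input".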

* style: run cargo fmt

* perf: optimize release and dist build profiles (#843)

* perf: optimize release and dist build profiles

Add [profile.release] with strip=true and panic="abort" for smaller,
faster release binaries. Upgrade [profile.dist] from lto="thin" to
lto="fat" with codegen-units=1 for maximum optimization in CI releases.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: remove panic=abort from release profile

Reviewers (zmanian, Copilot, Gemini) correctly flagged that panic=abort
in the release profile would kill the entire process on any tokio task
panic, breaking fault isolation for the long-running server. Removed
from release profile entirely.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* feat: add PR template with risk assessment (#837)

* feat: add PR template with risk assessment and review tracks

Add a pull request template that includes summary, change type,
validation checklist, security/database impact sections, blast radius,
and rollback plan. Update CONTRIBUTING.md with review track definitions
(A/B/C) based on change risk level.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: expand CONTRIBUTING.md with setup, workflow, and guidelines

Add getting started, development workflow, code style summary,
database change guidance, and dependency management sections.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* feat: add fuzzing targets for untrusted input parsers (#835)

* feat: add fuzzing targets for untrusted input parsers

Add cargo-fuzz infrastructure with 5 fuzz targets exercising
security-critical code paths:

- fuzz_safety_sanitizer: Aho-Corasick + regex injection detection
- fuzz_safety_validator: Input validation (length, encoding, patterns)
- fuzz_leak_detector: Secret leak scanning (API keys, tokens)
- fuzz_tool_params: Tool parameter JSON validation
- fuzz_config_env: TOML/JSON config parsing

Each target exercises real IronClaw business logic with invariant
assertions. Includes corpus directories and setup documentation.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: improve fuzz targets to exercise real IronClaw code paths

- fuzz_config_env: exercise SafetyLayer end-to-end (sanitize, validate,
  policy check) instead of generic TOML/JSON parsing
- fuzz_tool_params: add validate_tool_schema coverage alongside
  validate_tool_params
- Add "fuzz" to workspace exclude in root Cargo.toml
- Update README descriptions to match actual target behavior

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: replace redundant detect() call with meaningful invariant assertion

Replace the double sanitize()+detect() call with an assertion that
critical severity warnings always trigger content modification.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: rewrite fuzz_config_env to exercise IronClaw safety code directly

Replace SafetyLayer wrapper usage with direct Sanitizer, Validator, and
LeakDetector instantiation and invocation. Adds meaningful consistency
assertions (non-empty output, valid-means-no-errors, scan/clean agreement).
Removes the config construction that was only exercising struct instantiation.

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* fix(wasm): run leak scan before credential injection in tools wrapper (#791)

* fix(wasm): run leak scan before credential injection in tools wrapper

The tools WASM wrapper runs the LeakDetector on HTTP request headers
AFTER inject_host_credentials() has already substituted real secrets
(e.g., xoxb- Slack bot tokens). This causes the leak detector to
flag the tool's own legitimate outbound API calls as secret exfiltration.

Move the scan to run on raw_headers before any credential injection,
matching the fix already applied to the channels wrapper in #421.

Fixes the same class of bug as #421 (which only fixed channels/wasm/wrapper.rs).
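
The corrected ordering can be sketched as follows. This is a minimal illustration, not the wrapper's actual code: `looks_like_secret`, `scan_raw_headers`, `prepare_request`, and the simplified `inject_host_credentials` signature are hypothetical stand-ins, and the real detector matches many more patterns than `xoxb-`.

```rust
use std::collections::HashMap;

// Hypothetical stand-in for the leak detector: flags Slack-style bot tokens only.
fn looks_like_secret(value: &str) -> bool {
    value.contains("xoxb-")
}

// Scan the headers exactly as the WASM tool supplied them.
fn scan_raw_headers(raw_headers: &HashMap<String, String>) -> Result<(), String> {
    for (name, value) in raw_headers {
        if looks_like_secret(value) {
            return Err(format!("potential secret leak in header {name}"));
        }
    }
    Ok(())
}

// Hypothetical host-side injection: adds the real credential to the request.
fn inject_host_credentials(headers: &mut HashMap<String, String>, token: &str) {
    headers.insert("Authorization".to_string(), format!("Bearer {token}"));
}

fn prepare_request(
    raw_headers: HashMap<String, String>,
    token: &str,
) -> Result<HashMap<String, String>, String> {
    // 1. Scan what the tool sent, before any host secret is present.
    scan_raw_headers(&raw_headers)?;
    // 2. Only then add host-owned credentials, which are never scanned.
    let mut headers = raw_headers;
    inject_host_credentials(&mut headers, token);
    Ok(headers)
}
```

With this order, a token the host injects never reaches the scanner, while a token the tool itself places in a header is still rejected.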

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* perf: inline leak scan to avoid Vec allocation on every HTTP request

Address review feedback: instead of cloning all header keys/values into
a Vec to pass to scan_http_request(), iterate over raw_headers directly
using scan_and_clean(). This also provides more specific error messages
(URL vs header vs body).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* style: fix cargo fmt formatting in leak scan loop

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* fix(setup): drain residual terminal events before secret input (#747) (#849)

* fix(ci): secrets can't be used in step if conditions [skip-regression-check] (#787)

GitHub Actions step-level `if:` doesn't have access to `secrets` context.
Replace `if: secrets.X != ''` with `continue-on-error: true` and let
the Set token step handle the fallback.

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(ci): clean up staging pipeline — remove hacks, skip redundant checks [skip-regression-check] (#794)

- Remove continue-on-error from staging-ci.yml app token steps (secrets are configured)
- Skip test.yml and code_style.yml on PRs targeting staging (staging-ci.yml
  already runs tests before promoting, promotion PR gets full CI on main)
- Allow ironclaw-ci[bot] in Claude Code review for bot-created promotion PRs

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* fix(ci): run fmt + clippy on staging PRs, skip Windows clippy [skip-regression-check] (#802)

- Remove branches:[main] filter from code_style.yml so it runs on all PRs
- Gate clippy-windows with `if: github.base_ref == 'main'` (skip on staging PRs)
- Update rollup job to allow skipped clippy-windows
- Simplify claude-review.yml to only trigger on labeled event (avoids duplicate runs)

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* feat: persist user_id in save_job and expose job_id on routine runs (#709)

* feat: persist worker events to DB and fix activity tab rendering

In-process Worker (used by Scheduler::dispatch_job) now persists events
via save_job_event at key execution points: plan creation, LLM
responses, tool_use, tool_result, and job completion/failure/stuck.
Event data shapes match the container worker format so the gateway
activity tab renders them correctly.

Frontend: tool_result errors now show a red X icon with danger styling
instead of a silent empty output. The result event falls back to the
error field when message is absent.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: wire RoutineEngine into gateway for direct manual trigger firing

Replace the message-channel hack in routines_trigger_handler with a
direct call to RoutineEngine::fire_manual(), ensuring FullJob routines
dispatch correctly when triggered from the web UI. Inject the engine
into GatewayState from Agent::run after construction.

Also persists user_id in save_job for both PG and libSQL backends,
removes the source='sandbox' filter so all jobs are visible, and
exposes job_id on RoutineRunInfo for the frontend job link.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: remove stale gateway_state argument from Agent::new test call sites

The gateway_state parameter was removed from Agent::new during rebase
(replaced by post-construction set_routine_engine_slot), but three test
call sites still passed the extra None argument.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address PR review — restore sandbox source filter, remove blank lines

- Revert removal of `source = 'sandbox'` filter in all SandboxStore
  queries (8 sites across PG and libSQL). Sandbox-specific APIs should
  stay scoped to sandbox jobs; unified job listing for the Jobs tab
  should use a separate query path.
- Remove extra blank lines in agent_loop.rs and worker.rs that caused
  formatting CI failure.

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address review — regenerate Cargo.lock, add user_id regression test

- Regenerate Cargo.lock from main's lockfile to eliminate dependency
  version downgrades (anyhow, syn, etc.) that were churn from rebase.
- Add regression test verifying user_id round-trips through save_job
  and get_job in the libSQL backend.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* style: remove trailing blank line in libsql jobs.rs

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* test: add Postgres-side regression test for user_id persistence in save_job

Mirrors the existing libSQL test (test_save_job_persists_user_id) for the
Postgres backend. Gated behind #[cfg(feature = "postgres")] + #[ignore]
since it requires a running PostgreSQL instance (integration tier).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* feat(llm): per-provider unsupported parameter filtering (#749, #728) (#809)

Add declarative `unsupported_params` field to provider definitions in
providers.json. Parameters listed are stripped from requests before
sending, preventing 400 errors from providers that reject them (e.g.
gpt-5 family and kimi-k2.5 rejecting custom temperature values).

- Add `unsupported_params` to ProviderDefinition and RegistryProviderConfig
- Propagate from registry through config resolution
- Generic strip helpers handle temperature, max_tokens, stop_sequences
- Apply filtering in RigAdapter and AnthropicOAuthProvider
- Mark openai and tinfoil providers as not supporting temperature
- Update openai default model to gpt-5-mini

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* fix: skip the regression check
[skip-regression-check]

---------

Co-authored-by: Henry Park <henrypark133@gmail.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Illia Polosukhin <ilblackdragon@gmail.com>

* feat(agent): add context size logging before LLM prompt (#810)

* fix(ci): secrets can't be used in step if conditions [skip-regression-check] (#787)

GitHub Actions step-level `if:` doesn't have access to `secrets` context.
Replace `if: secrets.X != ''` with `continue-on-error: true` and let
the Set token step handle the fallback.

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(ci): clean up staging pipeline — remove hacks, skip redundant checks [skip-regression-check] (#794)

- Remove continue-on-error from staging-ci.yml app token steps (secrets are configured)
- Skip test.yml and code_style.yml on PRs targeting staging (staging-ci.yml
  already runs tests before promoting, promotion PR gets full CI on main)
- Allow ironclaw-ci[bot] in Claude Code review for bot-created promotion PRs

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* fix(ci): run fmt + clippy on staging PRs, skip Windows clippy [skip-regression-check] (#802)

- Remove branches:[main] filter from code_style.yml so it runs on all PRs
- Gate clippy-windows with `if: github.base_ref == 'main'` (skip on staging PRs)
- Update rollup job to allow skipped clippy-windows
- Simplify claude-review.yml to only trigger on labeled event (avoids duplicate runs)

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* feat: persist user_id in save_job and expose job_id on routine runs (#709)

* feat: persist worker events to DB and fix activity tab rendering

In-process Worker (used by Scheduler::dispatch_job) now persists events
via save_job_event at key execution points: plan creation, LLM
responses, tool_use, tool_result, and job completion/failure/stuck.
Event data shapes match the container worker format so the gateway
activity tab renders them correctly.

Frontend: tool_result errors now show a red X icon with danger styling
instead of a silent empty output. The result event falls back to the
error field when message is absent.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: wire RoutineEngine into gateway for direct manual trigger firing

Replace the message-channel hack in routines_trigger_handler with a
direct call to RoutineEngine::fire_manual(), ensuring FullJob routines
dispatch correctly when triggered from the web UI. Inject the engine
into GatewayState from Agent::run after construction.

Also persists user_id in save_job for both PG and libSQL backends,
removes the source='sandbox' filter so all jobs are visible, and
exposes job_id on RoutineRunInfo for the frontend job link.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: remove stale gateway_state argument from Agent::new test call sites

The gateway_state parameter was removed from Agent::new during rebase
(replaced by post-construction set_routine_engine_slot), but three test
call sites still passed the extra None argument.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address PR review — restore sandbox source filter, remove blank lines

- Revert removal of `source = 'sandbox'` filter in all SandboxStore
  queries (8 sites across PG and libSQL). Sandbox-specific APIs should
  stay scoped to sandbox jobs; unified job listing for the Jobs tab
  should use a separate query path.
- Remove extra blank lines in agent_loop.rs and worker.rs that caused
  formatting CI failure.

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address review — regenerate Cargo.lock, add user_id regression test

- Regenerate Cargo.lock from main's lockfile to eliminate dependency
  version downgrades (anyhow, syn, etc.) that were churn from rebase.
- Add regression test verifying user_id round-trips through save_job
  and get_job in the libSQL backend.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* style: remove trailing blank line in libsql jobs.rs

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* test: add Postgres-side regression test for user_id persistence in save_job

Mirrors the existing libSQL test (test_save_job_persists_user_id) for the
Postgres backend. Gated behind #[cfg(feature = "postgres")] + #[ignore]
since it requires a running PostgreSQL instance (integration tier).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* feat(agent): add context size logging before LLM prompt

---------

Co-authored-by: Henry Park <henrypark133@gmail.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Illia Polosukhin <ilblackdragon@gmail.com>

* fix: preserve text before tool-call XML in forced-text responses (#852)

* fix: preserve text before tool-call XML in forced-text responses (#789)

Local models (Qwen3, DeepSeek, GLM) emit <tool_call> XML even when no
tools are available (force_text mode). The existing strip_xml_tag()
discards everything from an unclosed opening tag onward, producing an
empty string that triggers the "I'm not sure how to respond" fallback.

Add truncate_at_tool_tags() — a code-region-aware pre-processing step
that truncates at the first tool-call XML tag BEFORE clean_response()
runs, preserving all useful text before the tag. Protect all 7
clean_response() call sites. Case-insensitive matching handles models
that emit <TOOL_CALL> or <Tool_Call> variants.

Secondary fix: add has_native_thinking() model detection to skip
<think>/<final> system prompt injection for models with built-in
reasoning (Qwen3, QwQ, DeepSeek-R1, GLM-Z1, etc.), preventing
thinking-only responses that clean to empty.

Wire with_model_name(active_model_name()) at all 9 production sites
that construct Reasoning, so the runtime model name (not static config)
drives system prompt generation.

126 new/updated tests covering truncation edge cases, code-block
awareness, Unicode, case-insensitivity, StubLlm integration for
complete/plan/evaluate_success/respond_with_tools paths, model
detection, and conditional system prompt generation.

Closes #789

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address Copilot review — unclosed-only truncation, ASCII case folding

- truncate_at_tool_tags() now only truncates at UNCLOSED tool tags;
  properly closed tags (e.g. <tool_call>...</tool_call>) are left intact
  for clean_response() to strip normally, preserving any text after them
- Switch from to_lowercase() to to_ascii_lowercase() to prevent byte
  offset misalignment with non-ASCII characters whose lowercase form
  has different byte length (e.g. Kelvin sign U+212A)
- Add closing_tag_for() helper to derive closing tags from open patterns
- Fix doc comment: "fenced markdown code blocks or inline code spans"
  (not "indented", which find_code_regions() doesn't detect)
- Add regression tests: closed vs unclosed for each tag variant,
  Unicode + case-insensitive offset safety, and mixed closed/unclosed
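
The byte-offset point can be illustrated with a small sketch. The helpers `find_tool_tag` and `text_before_tool_tag` are hypothetical and deliberately ignore the closed-tag and code-region handling described above; they only show why an offset found in an ASCII-lowercased copy is safe to use on the original string.

```rust
// Hypothetical helper: byte offset of the first "<tool_call" tag, matched
// case-insensitively, that is valid for slicing the ORIGINAL string.
fn find_tool_tag(text: &str) -> Option<usize> {
    // to_ascii_lowercase() rewrites only A-Z, so the copy has the same byte
    // length and layout as `text` and offsets carry over unchanged.
    text.to_ascii_lowercase().find("<tool_call")
}

// Keep everything before the tag; return the input unchanged if no tag exists.
fn text_before_tool_tag(text: &str) -> &str {
    match find_tool_tag(text) {
        Some(idx) => &text[..idx],
        None => text,
    }
}
```

For an input that begins with the Kelvin sign (three bytes in UTF-8), `to_lowercase()` would map it to a one-byte `k` and shift every later offset by two bytes, so an index taken from that copy would point at the wrong place in the original.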

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: minor review items — consistent ascii_lowercase, closing_tag_for tests

- Switch has_native_thinking() from to_lowercase() to to_ascii_lowercase()
  for consistency with truncate_at_tool_tags() approach
- Add unit tests for closing_tag_for(): standard tags, space-suffixed
  patterns, pipe-delimited tags, and exhaustive coverage of all
  TOOL_TAG_PATTERNS entries
- Add test for mixed closed+unclosed tags of different types

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* Feat/docker shell edition (#804)

* fix(ci): secrets can't be used in step if conditions [skip-regression-check] (#787)

GitHub Actions step-level `if:` doesn't have access to `secrets` context.
Replace `if: secrets.X != ''` with `continue-on-error: true` and let
the Set token step handle the fallback.

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(ci): clean up staging pipeline — remove hacks, skip redundant checks [skip-regression-check] (#794)

- Remove continue-on-error from staging-ci.yml app token steps (secrets are configured)
- Skip test.yml and code_style.yml on PRs targeting staging (staging-ci.yml
  already runs tests before promoting, promotion PR gets full CI on main)
- Allow ironclaw-ci[bot] in Claude Code review for bot-created promotion PRs

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Henry Park <henrypark133@gmail.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(mcp): strip top-level null params before forwarding to MCP servers (#795)

* feat(llm): per-provider unsupported parameter filtering (#749, #728) (#809)

Add declarative `unsupported_params` field to provider definitions in
providers.json. Parameters listed are stripped from requests before
sending, preventing 400 errors from providers that reject them (e.g.
gpt-5 family and kimi-k2.5 rejecting custom temperature values).

- Add `unsupported_params` to ProviderDefinition and RegistryProviderConfig
- Propagate from registry through config resolution
- Generic strip helpers handle temperature, max_tokens, stop_sequences
- Apply filtering in RigAdapter and AnthropicOAuthProvider
- Mark openai and tinfoil providers as not supporting temperature
- Update openai default model to gpt-5-mini

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* fix(mcp): strip top-level null params before forwarding to MCP servers

LLMs frequently emit `"field": null` for optional parameters in tool
calls. Many MCP servers reject explicit nulls for fields that should
simply be absent — e.g. Notion returns 400 for `"sort": null` in a
search call, expecting the field to be omitted entirely.

Strip top-level null keys from the params object before calling
`call_tool()`. Only top-level keys are stripped; nested nulls are
preserved since they may be semantically meaningful.
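
A minimal sketch of the top-level-only stripping, assuming a simplified JSON-like `Value` enum defined here for illustration (the real code operates on its own JSON type, and `strip_top_level_nulls` is a hypothetical name):

```rust
use std::collections::BTreeMap;

// Minimal JSON-like value, standing in for a real JSON type.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Null,
    Text(String),
    Object(BTreeMap<String, Value>),
}

// Remove keys whose value is null at the top level only.
// Nested objects are left untouched, so nulls inside them survive.
fn strip_top_level_nulls(params: &mut BTreeMap<String, Value>) {
    params.retain(|_, v| *v != Value::Null);
}
```

A params object with `"sort": null` at the top level loses that key, while a null inside a nested `filter` object stays in place.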

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Illia Polosukhin <ilblackdragon@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* Add event-triggered routines and workflow skill templates (#756)

* Add event-triggered routines and workflow skill templates

* fix(ci): secrets can't be used in step if conditions [skip-regression-check] (#787)

GitHub Actions step-level `if:` doesn't have access to `secrets` context.
Replace `if: secrets.X != ''` with `continue-on-error: true` and let
the Set token step handle the fallback.

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(ci): clean up staging pipeline — remove hacks, skip redundant checks [skip-regression-check] (#794)

- Remove continue-on-error from staging-ci.yml app token steps (secrets are configured)
- Skip test.yml and code_style.yml on PRs targeting staging (staging-ci.yml
  already runs tests before promoting, promotion PR gets full CI on main)
- Allow ironclaw-ci[bot] in Claude Code review for bot-created promotion PRs

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address PR review feedback for event_emit security and quality

Security fixes:
- Require approval (UnlessAutoApproved) for event_emit, matching routine_fire
- Enable sanitization on event_emit payload (external JSON reaches LLM)
- Remove user_id parameter from event_emit to prevent IDOR — always use ctx.user_id

Correctness fixes:
- Rename source → event_source in event_emit for consistency with routine_create
- Use json_value_as_filter_string for filter parsing (handles numbers/booleans)
- Case-insensitive matching for event source and event_type
- Add debug logging for missing filter keys in payload
- Fix skill_install_routine_webhook_sim test missing .with_skills()
- Fix schema_validator test for event_emit payload properties

Code quality:
- Move EventEmitTool struct/impl after RoutineHistoryTool (fix split layout)
- Deduplicate routine_to_info into RoutineInfo::from_routine in types.rs
- Add test section headers in e2e_routine_heartbeat.rs
- Clarify event_emit description to specify system_event routines only

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(ci): run fmt + clippy on staging PRs, skip Windows clippy [skip-regression-check] (#802)

- Remove branches:[main] filter from code_style.yml so it runs on all PRs
- Gate clippy-windows with `if: github.base_ref == 'main'` (skip on staging PRs)
- Update rollup job to allow skipped clippy-windows
- Simplify claude-review.yml to only trigger on labeled event (avoids duplicate runs)

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* feat: persist user_id in save_job and expose job_id on routine runs (#709)

* feat: persist worker events to DB and fix activity tab rendering

In-process Worker (used by Scheduler::dispatch_job) now persists events
via save_job_event at key execution points: plan creation, LLM
responses, tool_use, tool_result, and job completion/failure/stuck.
Event data shapes match the container worker format so the gateway
activity tab renders them correctly.

Frontend: tool_result errors now show a red X icon with danger styling
instead of a silent empty output. The result event falls back to the
error field when message is absent.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: wire RoutineEngine into gateway for direct manual trigger firing

Replace the message-channel hack in routines_trigger_handler with a
direct call to RoutineEngine::fire_manual(), ensuring FullJob routines
dispatch correctly when triggered from the web UI. Inject the engine
into GatewayState from Agent::run after construction.

Also persists user_id in save_job for both PG and libSQL backends,
removes the source='sandbox' filter so all jobs are visible, and
exposes job_id on RoutineRunInfo for the frontend job link.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: remove stale gateway_state argument from Agent::new test call sites

The gateway_state parameter was removed from Agent::new during rebase
(replaced by post-construction set_routine_engine_slot), but three test
call sites still passed the extra None argument.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address PR review — restore sandbox source filter, remove blank lines

- Revert removal of `source = 'sandbox'` filter in all SandboxStore
  queries (8 sites across PG and libSQL). Sandbox-specific APIs should
  stay scoped to sandbox jobs; unified job listing for the Jobs tab
  should use a separate query path.
- Remove extra blank lines in agent_loop.rs and worker.rs that caused
  formatting CI failure.

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address review — regenerate Cargo.lock, add user_id regression test

- Regenerate Cargo.lock from main's lockfile to eliminate dependency
  version downgrades (anyhow, syn, etc.) that were churn from rebase.
- Add regression test verifying user_id round-trips through save_job
  and get_job in the libSQL backend.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* style: remove trailing blank line in libsql jobs.rs

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* test: add Postgres-side regression test for user_id persistence in save_job

Mirrors the existing libSQL test (test_save_job_persists_user_id) for the
Postgres backend. Gated behind #[cfg(feature = "postgres")] + #[ignore]
since it requires a running PostgreSQL instance (integration tier).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* fix: make routine_system_event_emit test create routine before emitting

- Add routine_create step to trace fixture so event_emit has a matching
  routine to fire
- Assert fired_routines > 0, not just key presence (Copilot review)
- Add .with_auto_approve_tools(true) since event_emit now requires approval

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: renumber test headers after system_event test insertion

Test 4 was duplicated (routine_cooldown and heartbeat_findings).
Renumber heartbeat_findings to Test 5 and heartbeat_empty_skip to Test 6.

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: merge staging and add missing RoutineEngine args in test

RoutineEngine::new on staging requires `tools` and `safety` params.
Update system_event_trigger_matches_and_filters test to pass them.

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address new Copilot review comments

- Add .with_auto_approve_tools(true) to skill_install_routine_webhook_sim
  test so event_emit doesn't block on approval
- Fix module-level doc comment for event_emit to specify system_event trigger

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: deduplicate json_value_as_string helper

Remove private `json_value_as_string` from routine_engine.rs and use
the identical public `json_value_as_filter_string` from routine.rs,
eliminating divergence risk. (Copilot review)

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Henry Park <henrypark133@gmail.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: enable WASM credential injection in No-DB environments (#845)

* fix(wasm): enable credential injection in no-DB environments via env var fallback

When a secrets store is unavailable (e.g. no-DB mode), WASM channel
credentials were silently not injected, causing channels to start without
credentials. Fix by:

- Changing `inject_channel_credentials_from_secrets` to accept
  `Option<&dyn SecretsStore>` — secrets store is tried first when present
- Adding env var fallback (`inject_env_credentials`) for credentials not
  covered by the secrets store
- Enforcing a channel-name prefix security check on env var names to
  prevent WASM channels from reading unrelated host credentials
  (e.g. `AWS_SECRET_ACCESS_KEY`)
- Extracting pure `resolve_env_credentials` helper for testability
- Adding case-insensitive prefix matching for secrets store lookup

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(wasm): inject credentials at startup when no secrets store (setup.rs path)

The startup path (setup_wasm_channels -> register_channel) was guarded by
`if let Some(secrets) = secrets_store`, so in No-DB mode credentials were
never injected and the channel started without them.

Fix by:
- Changing inject_channel_credentials to accept Option<&dyn SecretsStore>
- Always calling it (removing the if-let guard) — env var fallback runs
  even when secrets_store is None
- Adding channel-name prefix security check to the env var fallback path
  (e.g. TELEGRAM_ for channel "telegram"), consistent with manager.rs

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(test): correct misleading comment on ICTEST1_UNRELATED_OTHER placeholder

* fix(wasm): guard against empty channel name in credential injection

An empty channel_name would produce prefix "_", allowing any env var
starting with "_" to pass the security check and be injected. Add an
early-return guard in resolve_env_credentials, inject_env_credentials,
and inject_channel_credentials. Add a test to cover this path.
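
The guard can be sketched as below. `env_var_allowed` is a hypothetical helper written for illustration; the actual functions named above may structure the check differently.

```rust
// Hypothetical check: may a WASM channel read this environment variable?
fn env_var_allowed(channel_name: &str, var_name: &str) -> bool {
    // Guard: an empty name would yield the prefix "_", which matches any
    // variable that starts with an underscore.
    if channel_name.is_empty() {
        return false;
    }
    // Channel "telegram" may only read variables that start with "TELEGRAM_".
    let prefix = format!("{}_", channel_name.to_ascii_uppercase());
    var_name.starts_with(&prefix)
}
```

Under this sketch, `telegram` can read `TELEGRAM_BOT_TOKEN` but not `AWS_SECRET_ACCESS_KEY`, and an empty channel name is refused outright.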

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: lizican123 <lizican123@gmail.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: promote to main (#878)

* fix: replace unsafe env::set_var with thread-safe inject_single_var in SIGHUP handler

Fixes race condition where SIGHUP handler modifies global environment variables
while other threads may be reading them via Config::from_env().

Changes:
- Replace unsafe { std::env::set_var() } with ironclaw::config::inject_single_var()
- Uses INJECTED_VARS mutex instead of unsafe global state modification
- All reads via optional_env() check the thread-safe overlay first
- Prevents data races between SIGHUP reload and concurrent config reads
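
The overlay idea can be sketched as follows. This assumes a process-wide map behind a mutex; `overlay`, `inject_var`, and `read_var` are hypothetical names chosen for illustration and are not the signatures of `inject_single_var` or `optional_env`.

```rust
use std::collections::HashMap;
use std::sync::{Mutex, OnceLock};

// Hypothetical process-wide overlay, consulted before the real environment.
fn overlay() -> &'static Mutex<HashMap<String, String>> {
    static INJECTED: OnceLock<Mutex<HashMap<String, String>>> = OnceLock::new();
    INJECTED.get_or_init(|| Mutex::new(HashMap::new()))
}

// Writers take the mutex instead of mutating the process environment.
fn inject_var(key: &str, value: &str) {
    overlay()
        .lock()
        .unwrap()
        .insert(key.to_string(), value.to_string());
}

// Readers check the overlay first, then fall back to the real environment.
fn read_var(key: &str) -> Option<String> {
    let injected = overlay().lock().unwrap().get(key).cloned();
    injected.or_else(|| std::env::var(key).ok())
}
```

Because no code path calls `std::env::set_var`, concurrent readers never observe the environment block while it is being modified.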

Verification:
- All 2,787 tests pass
- Zero clippy warnings
- Code compiles successfully

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

* fix: spawn webhook restart as background task to avoid blocking I/O across lock

Prevents holding Mutex lock during async I/O operations (TcpListener::bind,
task shutdown). The SIGHUP handler no longer blocks webhook processing during
listener restart.

Changes:
- Read old_addr and drop lock immediately
- Spawn restart_with_addr() as background task via tokio::spawn
- Lock is only held during the actual restart operation, not the signal handler

Benefits:
- SIGHUP handler returns immediately without blocking
- Webhook requests not delayed by listener restart I/O
- Lock contention significantly reduced

Verification:
- All 2,787 tests pass
- Zero clippy warnings
- Code compiles successfully

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

* fix: add graceful shutdown mechanism for SIGHUP handler background task

Fixes an unbounded loop that ran without a cancellation token. The SIGHUP handler now
listens for a shutdown signal and exits cleanly during graceful termination.

Changes:
- Create broadcast channel for shutdown signaling
- SIGHUP handler uses tokio::select! to wait for shutdown or SIGHUP
- Send shutdown signal to all background tasks after agent.run() completes
- Ensures clean task lifecycle and no orphaned background tasks

Benefits:
- Proper task cancellation during graceful shutdown
- Follows Tokio best practices for background task management
- No background tasks orphaned when runtime shuts down

Verification:
- All 2,787 tests pass
- Zero clippy warnings
- Code compiles successfully

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

* refactor: replace stringly-typed parameter filtering with typed enum and single helper

Fixes DRY violation where unsupported parameter filtering was duplicated across
rig_adapter.rs and anthropic_oauth.rs using string contains checks.

Changes:
- Add UnsupportedParam typed enum in provider.rs (Temperature, MaxTokens, StopSequences)
- Create strip_unsupported_completion_params() helper function
- Create strip_unsupported_tool_params() helper function
- Update rig_adapter.rs to use shared helpers
- Update anthropic_oauth.rs to use shared helpers
- Replace 60+ lines of duplicate stringly-typed logic
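
A rough sketch of the typed approach, for illustration only. The enum variants come from the list above; the `CompletionRequest` fields and the single `strip_unsupported` helper are assumptions and may not match the real types or helper signatures.

```rust
// Variants taken from the commit message; derive list is an assumption.
#[derive(Debug, Clone, Copy, PartialEq)]
enum UnsupportedParam {
    Temperature,
    MaxTokens,
    StopSequences,
}

// Hypothetical request shape with only the fields relevant here.
#[derive(Debug, Default)]
struct CompletionRequest {
    temperature: Option<f32>,
    max_tokens: Option<u32>,
    stop_sequences: Option<Vec<String>>,
}

// Clear every field the provider has declared unsupported.
fn strip_unsupported(req: &mut CompletionRequest, unsupported: &[UnsupportedParam]) {
    for param in unsupported {
        // The exhaustive match means a new variant fails to compile until
        // it is handled here.
        match param {
            UnsupportedParam::Temperature => req.temperature = None,
            UnsupportedParam::MaxTokens => req.max_tokens = None,
            UnsupportedParam::StopSequences => req.stop_sequences = None,
        }
    }
}
```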

Benefits:
- Type safety: parameter names checked at compile time
- Single source of truth: adding a new param updates one place
- Reduced maintenance burden: no duplicate logic to keep in sync
- Better code clarity: named enum variant is self-documenting

Verification:
- All 2,787 tests pass
- Zero clippy warnings
- Code compiles successfully

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

* docs: clarify intentional parameter asymmetry between completion and tool requests

Add documentation explaining why strip_unsupported_tool_params does not handle
StopSequences: the field doesn't exist in ToolCompletionRequest.

Changes:
- Add clarifying comments to strip_unsupported_tool_params()
- Explain why StopSequences is only in CompletionRequest
- Note that ToolCompletionRequest only supports Temperature and MaxTokens
- Inline comment confirms no action needed for StopSequences

This addresses the appearance of incomplete implementation without changing logic,
as the asymmetry is intentional and correct (ToolCompletionRequest lacks the field).

Verification:
- All 2,787 tests pass
- Zero clippy warnings
- Code compiles successfully

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

* perf: isolate webhook_secret to reduce lock contention on hot path

Move webhook_secret from shared HttpChannelState RwLock into its own Arc<RwLock<>>.
This eliminates contention between secret validation and other state operations.

Changes:
- Change webhook_secret field type from RwLock<Option<SecretString>> to Arc<RwLock<Option<SecretString>>>
- Update initialization in HttpChannel::new()
- Update comments to explain isolation rationale

Benefits:
- Reduce lock contention on webhook request hot path (secret validation)
- Rarely-changing field (SIGHUP only) isolated from frequent state accesses
- Other state operations (tx, pending_responses) no longer wait behind secret reads
- Minimal code change: only field declaration and initialization

The Arc wrapper lets the secret's lock handle be cloned and shared independently
of the rest of the state. With this change, each webhook request takes the
dedicated secret lock for validation instead of the shared HttpChannelState lock,
which should scale better under high request volume.
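
A rough sketch of the isolation idea, assuming a simplified state struct. It uses `std::sync::RwLock` and a plain `String` in place of the async lock and `SecretString` that the real HttpChannelState presumably uses, and `pending_responses` is reduced to a placeholder vector.

```rust
use std::sync::{Arc, RwLock};

// Hypothetical, simplified state; field types are assumptions.
pub struct HttpChannelState {
    /// Rarely written (on reload), read on every webhook request.
    pub webhook_secret: Arc<RwLock<Option<String>>>,
    /// Frequently mutated state behind its own, separate lock.
    pub pending_responses: RwLock<Vec<u64>>,
}

impl HttpChannelState {
    pub fn new(secret: Option<String>) -> Self {
        Self {
            webhook_secret: Arc::new(RwLock::new(secret)),
            pending_responses: RwLock::new(Vec::new()),
        }
    }

    /// Validation touches only the secret lock, never `pending_responses`.
    /// Note: a production check would use a constant-time comparison.
    pub fn secret_matches(&self, provided: &str) -> bool {
        let guard = self.webhook_secret.read().expect("secret lock poisoned");
        guard.as_deref() == Some(provided)
    }
}
```

In this sketch a writer holding `pending_responses` cannot delay `secret_matches`, which is the contention the commit aims to remove.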

Verification:
- All 2,787 tests pass
- Zero clippy warnings
- Code compiles successfully

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

* fix: prevent partial state corruption on SIGHUP restart failure

Ensure atomicity of configuration reload: if webhook listener restart fails,
secret update is skipped to prevent inconsistent state.

Changes:
- Wait for restart_with_addr() to complete (don't spawn background task)
- Track restart result with restart_failed flag
- Only update secret if restart succeeded or wasn't needed
- Ensure listener and secret stay synchronized

Problem addressed:
- Before: restart spawned as background task, secret updated immediately
- If restart failed, secret was changed but listener still on old address
- This left system in inconsistent state (partial corruption)

Solution:
- Make restart blocking (SIGHUP handler can wait, it's not on request hot path)
- Atomically update secret only after successful restart
- Flag prevents race between restart and secret update

Benefits:
- Configuration changes are atomic (both succeed or both fail together)
- No partial state corruption on restart failure
- Failed restarts don't silently leave inconsistent state
- Secret and listener address stay in sync
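
The ordering described above can be sketched as follows. `Listener`, `apply_reload`, and the `restart` closure are hypothetical names; the closure stands in for awaiting restart_with_addr(), and an early return via `?` plays the role of the restart_failed flag.

```rust
// Hypothetical reload target; the real handler works on channel state.
pub struct Listener {
    pub addr: String,
    pub secret: Option<String>,
}

/// Apply a reload so that the address and secret change together or not at all.
pub fn apply_reload<F>(
    listener: &mut Listener,
    new_addr: &str,
    new_secret: Option<String>,
    restart: F,
) -> Result<(), String>
where
    F: FnOnce(&str) -> Result<(), String>,
{
    if listener.addr != new_addr {
        // Blocking here is acceptable: reload is off the request hot path.
        // On failure we return before the secret is touched.
        restart(new_addr)?;
        listener.addr = new_addr.to_string();
    }
    // Reached only if the restart succeeded or was not needed.
    listener.secret = new_secret;
    Ok(())
}
```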

Verification:
- All 2,787 tests pass
- Zero clippy warnings
- Code compiles successfully

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

* refactor: generalize hot-secret-swapping with ChannelSecretUpdater trait

Decouple SIGHUP handler from HTTP channel internals by introducing a trait
for channels that support zero-downtime secret updates.

Changes:
- Add ChannelSecretUpdater trait in channels/channel.rs
- Implement ChannelSecretUpdater for HttpChannelState
- Export trait from channels module
- Update SIGHUP handler to use trait-based secret updater collection
- Replace explicit HTTP channel knowledge with generic updater loop

Benefits:
- SIGHUP handler no longer depends on HttpChannelState details
- Tight coupling removed: main.rs doesn't need HTTP channel imports
- Extensible: new channels can opt-in by implementing the trait
- Scalable: multiple channels supported without main.rs changes
- Maintainable: adding channels requires only trait implementation, not SIGHUP handler edits

Pattern:
- ChannelSecretUpdater trait defines the interface for all updaters
- Channels that support hot-secret-swapping implement the trait
- SIGHUP handler loops through all registered updaters generically
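
A simplified sketch of this pattern. The method name `update_secret`, its synchronous signature, and the `reload_secrets` helper are assumptions; the real trait in channels/channel.rs is likely async and takes a SecretString.

```rust
use std::sync::{Arc, RwLock};

// Hypothetical trait shape; only the trait name comes from the commit message.
pub trait ChannelSecretUpdater: Send + Sync {
    fn update_secret(&self, new_secret: Option<String>);
}

pub struct HttpChannelState {
    pub webhook_secret: RwLock<Option<String>>,
}

impl ChannelSecretUpdater for HttpChannelState {
    fn update_secret(&self, new_secret: Option<String>) {
        *self.webhook_secret.write().expect("secret lock poisoned") = new_secret;
    }
}

/// What the reload handler reduces to: no knowledge of concrete channel types.
pub fn reload_secrets(updaters: &[Arc<dyn ChannelSecretUpdater>], new_secret: Option<String>) {
    for updater in updaters {
        updater.update_secret(new_secret.clone());
    }
}
```

A new channel would opt in by implementing the trait and registering itself in the updater collection, with no change to the handler loop.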

Verification:
- All 2,787 tests pass
- Zero clippy warnings
- Code compiles successfully

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

* feat: validate parameter names at deserialization time, not just tests

Add custom serde deserializer for unsupported_params that validates parameter
names at runtime when loading providers.json (or user overrides).

Changes:
- Add unsupported_params_de module with custom deserializer
- Only allows: "temperature", "max_tokens", "stop_sequences"
- Invalid parameter names cause immediate deserialization error
- Update ProviderDefinition to use custom deserializer
- Enhanced test with explicit parameter name validation
- Add new test that verifies invalid parameters are rejected

Problem solved:
- Before: Invalid param names (e.g., "temperrature") silently ignored
- Now: Rejected at deserialization time with clear error message
- Prevents runtime failures caused by typos in configuration

Example error:
  unsupported parameter name 'temperrature': must be one of: temperature, max_tokens, stop_sequences

Benefits:
- Fail-fast: errors caught when loading config, not at runtime
- Clear feedback: error message lists valid parameter names
- Type safety: validators run during deserialization
- Configuration errors detected immediately, not silently ignored
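
To illustrate the fail-fast check without pulling in serde, here is a sketch of the validation core that such a deserializer could call for each entry. The function name `validate_unsupported_params` is hypothetical; the error text follows the example above.

```rust
const VALID_PARAMS: [&str; 3] = ["temperature", "max_tokens", "stop_sequences"];

/// Reject the first unknown name with an error that lists the valid ones.
pub fn validate_unsupported_params(names: &[&str]) -> Result<(), String> {
    for name in names {
        if !VALID_PARAMS.contains(name) {
            return Err(format!(
                "unsupported parameter name '{}': must be one of: {}",
                name,
                VALID_PARAMS.join(", ")
            ));
        }
    }
    Ok(())
}
```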

Verification:
- All 2,788 tests pass (including new validation test)
- Zero clippy warnings
- Code compiles successfully

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Haiku 4.5 <noreply@anthropic.com>

* merge: resolve conflicts for PR #800 and #822 into staging (#881)

* fix(ci): secrets can't be used in step if conditions [skip-regression-check] (#787)

GitHub Actions step-level `if:` doesn't have access to `secrets` context.
Replace `if: secrets.X != ''` with `continue-on-error: true` and let
the Set token step handle the fallback.

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(ci): clean up staging pipeline — remove hacks, skip redundant checks [skip-regression-check] (#794)

- Remove continue-on-error from staging-ci.yml app token steps (secrets are configured)
- Skip test.yml and code_style.yml on PRs targeting staging (staging-ci.yml
  already runs tests before promoting, promotion PR gets full CI on main)
- Allow ironclaw-ci[bot] in Claude Code review for bot-created promotion PRs

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* fix(ci): run fmt + clippy on staging PRs, skip Windows clippy [skip-regression-check] (#802)

- Remove branches:[main] filter from code_style.yml so it runs on all PRs
- Gate clippy-windows with `if: github.base_ref == 'main'` (skip on staging PRs)
- Update rollup job to allow skipped clippy-windows
- Simplify claude-review.yml to only trigger on labeled event (avoids duplicate runs)

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* feat: persist user_id in save_job and expose job_id on routine runs (#709)

* feat: persist worker events to DB and fix activity tab rendering

In-process Worker (used by Scheduler::dispatch_job) now persists events
via save_job_event at key execution points: plan creation, LLM
responses, tool_use, tool_result, and job completion/failure/stuck.
Event data shapes match the container worker format so the gateway
activity tab renders them correctly.

Frontend: tool_result errors now show a red X icon with danger styling
instead of a silent empty output. The result event falls back to the
error field when message is absent.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: wire RoutineEngine into gateway for direct manual trigger firing

Replace the message-channel hack in routines_trigger_handler with a
direct call to RoutineEngine::fire_manual(), ensuring FullJob routines
dispatch correctly when triggered from the web UI. Inject the engine
into GatewayState from Agent::run after construction.

Also persists user_id in save_job for both PG and libSQL backends,
removes the source='sandbox' filter so all jobs are visible, and
exposes job_id on RoutineRunInfo for the frontend job link.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: remove stale gateway_state argument from Agent::new test call sites

The gateway_state parameter was removed from Agent::new during rebase
(replaced by post-construction set_routine_engine_slot), but three test
call sites still passed the extra None argument.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address PR review — restore sandbox source filter, remove blank lines

- Revert removal of `source = 'sandbox'` filter in all SandboxStore
  queries (8 sites across PG and libSQL). Sandbox-specific APIs should
  stay scoped to sandbox jobs; unified job listing for the Jobs tab
  should use a separate query path.
- Remove extra blank lines in agent_loop.rs and worker.rs that caused
  formatting CI failure.

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address review — regenerate Cargo.lock, add user_id regression test

- Regenerate Cargo.lock from main's lockfile to eliminate dependency
  version downgrades (anyhow, syn, etc.) that were churn from rebase.
- Add regression test verifying user_id round-trips through save_job
  and get_job in the libSQL backend.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* style: remove trailing blank line in libsql jobs.rs

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* test: add Postgres-side regression test for user_id persistence in save_job

Mirrors the existing libSQL test (test_save_job_persists_user_id) for the
Postgres backend. Gated behind #[cfg(feature = "postgres")] + #[ignore]
since it requires a running PostgreSQL instance (integration tier).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* refactor: unify three agentic loops into single AgenticLoop engine (#654)

Replace three independent copy-pasted agentic loops (dispatcher, worker,
container runtime) with a single shared engine in `agentic_loop.rs` that
all consumers customize via the `LoopDelegate` trait.

Phase 1 — Shared engine (`src/agent/agentic_loop.rs`, 205 lines):
  - `run_agentic_loop()` owns the core LLM → tool exec → repeat cycle
  - `LoopDelegate` trait (Send + Sync, &dyn dispatch) with 6 hook points
  - Tool intent nudge logic consolidated (was duplicated in 3 files)
  - Iteration limit + force-text behavior preserved

Phase 2 — Three delegate implementations:
  - `ChatDelegate` (dispatcher.rs): 3-phase approval flow, hooks, cost
    guard, context compaction, skill attenuation, interruption
  - `JobDelegate` (worker/job.rs): planning pre-loop phase, parallel
    JoinSet exec, mark_completed/stuck/failed, SSE streaming, self-repair
  - `ContainerDelegate` (worker/container.rs): sequential tool exec,
    HTTP-proxied LLM, container-safe tools, credential injection

Phase 3 — File moves and cleanup:
  - Delete `src/agent/worker.rs` — job logic moved to `src/worker/job.rs`
  - Rename `src/worker/runtime.rs` → `src/worker/container.rs`
  - Re-export `Worker`/`WorkerDeps` from `crate::worker` in `agent/mod.rs`
  - Update `scheduler.rs` imports to new worker location

Shared helpers (`src/tools/execute.rs`):
  - `execute_tool_with_safety()` replaces 4 copies of validate → timeout
    → execute → serialize
  - `process_tool_result()` replaces 3 copies of sanitize → wrap →
    ChatMessage (also used by thread_ops.rs approval resume paths)

Net result: -2,408 lines, zero duplicated loop logic, single code path
for tool intent nudge and completion detection.
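
A heavily reduced, synchronous sketch of the shared-engine shape. The real `LoopDelegate` is described as having 6 hook points and async execution; the two methods, the `LlmResponse` type, and the `Response` variant below are assumptions chosen only to show how one loop can serve several delegates.

```rust
// Hypothetical response type for the sketch.
pub enum LlmResponse {
    Text(String),
    ToolCall(String),
}

#[derive(Debug, PartialEq)]
pub enum LoopOutcome {
    Response(String),
    MaxIterations,
}

/// Consumers customize behavior here; the loop itself is written once.
pub trait LoopDelegate {
    fn call_llm(&mut self, context: &[String]) -> LlmResponse;
    fn execute_tool(&mut self, name: &str) -> String;
}

/// Core cycle: call the LLM, run any requested tool, repeat until text or the cap.
pub fn run_agentic_loop(
    delegate: &mut dyn LoopDelegate,
    context: &mut Vec<String>,
    max_iterations: usize,
) -> LoopOutcome {
    for _ in 0..max_iterations {
        match delegate.call_llm(context) {
            LlmResponse::Text(text) => return LoopOutcome::Response(text),
            LlmResponse::ToolCall(name) => {
                let result = delegate.execute_tool(&name);
                context.push(result);
            }
        }
    }
    LoopOutcome::MaxIterations
}
```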

Closes #654

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address review feedback from Copilot

1. scheduler.rs: Replace `unwrap_or` fallback with proper error
   propagation when parsing tool output JSON — surfaces bugs instead
   of silently changing the output type.

2. worker/job.rs: Drop MutexGuard before the cancellation `.await` in
   `check_signals()` to avoid holding a lock across an async I/O call
   (prevents `await_holding_lock` lint).

3. worker/job.rs: Restore consecutive rate-limit counter
   (MAX_CONSECUTIVE_RATE_LIMITS = 10) so sustained rate limiting marks
   the job stuck with "Persistent rate limiting" instead of silently
   burning through max_iterations.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: incorporate staging changes — token budget tracking + mark_failed

Merge staging's changes into the refactored JobDelegate:
- Add token budget tracking in call_llm (update_context/add_tokens)
- mark_stuck → mark_failed for iteration cap and rate-limit exhaustion
  (aligns with staging's #788 fix)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address zmanian's PR review — eliminate type erasure, clean up

Address all 6 review points from zmanian on PR #800:

1. Replace LoopOutcome::Custom(Box<dyn Any>) with typed
   LoopOutcome::NeedApproval(Box<PendingApproval>) — eliminates
   type erasure and downcast, resolves clippy large_enum_variant.

2. Remove dead max_tool_iterations field from ChatDelegate struct.

3. Add on_tool_intent_nudge() hook to LoopDelegate trait with
   implementations in Job and Container delegates for observability.

4. Fix SSE events in job worker to emit raw sanitized content
   instead of XML-wrapped <tool_output> tags.

5. Remove 4 duplicate completion tests from job.rs that were
   already covered by the shared util module.

6. Avoid logging full tool results — use result_size_bytes in
   debug logs (execute.rs, job.rs).

Also updates path references in CLAUDE.md, COVERAGE_PLAN.md,
and add-sse-event.md command.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat(doctor): expand diagnostics from 7 to 16 health checks

* test: add unit tests for agentic_loop and execute shared modules

Add 16 tests covering the two new critical shared modules:

agentic_loop.rs (10 tests):
- Text response exits loop immediately
- Tool call → text response continuation
- LoopSignal::Stop exits before LLM call
- LoopSignal::InjectMessage adds user message to context
- Max iterations terminates with LoopOutcome::MaxIterations
- Tool intent nudge fires twice then caps
- before_llm_call early exit bypasses LLM
- truncate_for_preview: short string, long string, multibyte safety

execute.rs (6 tests):
- execute_tool_with_safety success path
- Missing tool returns ToolError::NotFound
- Tool execution failure propagates
- Per-tool timeout enforcement (50ms)
- process_tool_result XML wrapping on success
- process_tool_result error formatting
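
The multibyte-safety case for truncate_for_preview listed above can be illustrated with a sketch like the following. The signature and the "..." suffix are assumptions; only the function name comes from the commit message.

```rust
/// Truncate to at most `max_bytes` bytes of content without splitting a UTF-8 character.
pub fn truncate_for_preview(s: &str, max_bytes: usize) -> String {
    if s.len() <= max_bytes {
        return s.to_string();
    }
    // Back up to the nearest character boundary; index 0 is always a boundary.
    let mut end = max_bytes;
    while !s.is_char_boundary(end) {
        end -= 1;
    }
    format!("{}...", &s[..end])
}
```

Slicing a `&str` at a non-boundary byte index panics, so the backward scan is what keeps the preview safe for non-ASCII tool output.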

All 2,777 unit tests pass, 0 clippy warnings.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* style: cargo fmt

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address code review — 9 issues across agentic loop, job worker, container

CRITICAL fixes:
- Rate-limit exhaustion now returns Err(LlmError::RateLimited) instead of
  Ok(Text("")), stopping the loop immediately with no ghost iteration.
  Below-threshold retries still use Text("") with an explicit empty-string
  guard in handle_text_response to skip injection.
- check_signals drains the entire message channel before returning,
  prioritizing Stop over UserMessage. Previously returned early on first
  UserMessage, silently dropping any queued Stop or additional messages.
- check_signals now detects all non-progressing job states (Cancelled,
  Failed, Stuck, Completed, Submitted, Accepted) instead of only
  Cancelled and Failed.
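
The drain-then-prioritize behavior of check_signals can be sketched as below, using a standard-library channel. The `Signal` enum, the synchronous receiver, and joining queued user messages with newlines are assumptions; the job-state check is omitted.

```rust
use std::sync::mpsc::Receiver;

// Hypothetical incoming message type.
#[derive(Debug)]
pub enum Signal {
    Stop,
    UserMessage(String),
}

#[derive(Debug, PartialEq)]
pub enum LoopSignal {
    Continue,
    Stop,
    InjectMessage(String),
}

/// Drain everything queued, then decide: Stop wins over any user messages.
pub fn check_signals(rx: &Receiver<Signal>) -> LoopSignal {
    let mut stop = false;
    let mut messages = Vec::new();
    // try_recv never blocks; the loop ends once the queue is empty.
    while let Ok(signal) = rx.try_recv() {
        match signal {
            Signal::Stop => stop = true,
            Signal::UserMessage(text) => messages.push(text),
        }
    }
    if stop {
        LoopSignal::Stop
    } else if messages.is_empty() {
        LoopSignal::Continue
    } else {
        LoopSignal::InjectMessage(messages.join("\n"))
    }
}
```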

HIGH fixes:
- Error path in process_tool_result_job applies truncate_for_preview to
  bound error strings in SSE/DB events (was unbounded).
- Document Send+Sync lifetime constraint on LoopDelegate trait.
- Test mock before_llm_call refactored from double-lock to single lock
  acquisition, eliminating deadlock risk on refactor.

MEDIUM fixes:
- CompletionReport includes actual iteration count via shared
  Arc<Mutex<u32>> tracker (was hardcoded 0).
- process_tool_result_job return type changed from Result<bool> to
  Result<()> — the bool was always false (dead API).
- Deduplicate truncate in container.rs; now uses truncate_for_preview
  from agentic_loop.

Verified: 0 clippy warnings, 2781 tests pass, cargo fmt clean.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Henry Park <henrypark133@gmail.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Illia Polosukhin <ilblackdragon@gmail.com>
Co-authored-by: Umesh Kumar Singh <brijbiharisingh1971@outlook.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>

* Revert "Feat/docker shell edition" + fix fmt/clippy (#886)

* Revert "Feat/docker shell edition (#804)"

This reverts commit c566faf28fb77c2fa4df92c2947fb48f1a25df9b.

* style: fix formatting issues from revert

Run cargo fmt to fix formatting across 7 files after the revert of
the docker shell edition feature.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* refactor: centralize …
henrypark133 added a commit that referenced this pull request Mar 13, 2026
* chore: promote staging to main (2026-03-10 15:19 UTC) (#865)

* fix: Channel HTTP: server doesn't start after config change (no hot-r… (#779)

* fix: Channel HTTP: server doesn't start after config change (no hot-reload)

* review fixes

* review fixes

* fix linter

* fix code style

* fix: prevent session lock contention blocking message processing (#783)

* fix: prevent session lock contention blocking message processing

## Problem
After container restart, POST /api/chat/send returns 202 ACCEPTED but messages
don't appear in conversation_messages and agent never responds. Messages get
stuck in "stale state" after restart.

Root cause: Session lock was held for entire duration of chat_threads_handler
and chat_history_handler, including during slow database queries. This blocked
the agent loop from acquiring the session lock to process incoming messages,
causing them to hang indefinitely.

## Solution
1. **Release session lock early in chat_threads_handler**: Only acquire lock
   when reading active_thread at response time, not during DB queries for
   thread list. DB operations no longer block message processing.

2. **Release session lock early in chat_history_handler**: Only acquire lock
   when accessing in-memory thread state, not during paginated DB queries or
   thread ownership checks. DB operations no longer block message processing.

3. **Add comprehensive logging**: Track message flow from receipt through
   session resolution, thread hydration, and state transitions. Helps diagnose
   future issues:
   - Message queued to agent loop (chat_send_handler)
   - Processing message from channel (handle_message)
   - Hydrating thread from DB (maybe_hydrate_thread)
   - Resolving session and thread (resolve_thread)
   - Checking thread state (process_user_input)
   - Persisting user message (persist_user_message)
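
The lock-timing change in items 1 and 2 can be sketched as follows. `Session`, `list_threads`, and the `slow_db_query` closure are hypothetical, and a `std::sync::Mutex` stands in for whatever async lock the handlers actually use.

```rust
use std::sync::Mutex;

// Hypothetical in-memory session state.
pub struct Session {
    pub active_thread: Option<u64>,
}

/// Run the slow query with no lock held, then lock briefly to read one field.
pub fn list_threads<F>(session: &Mutex<Session>, slow_db_query: F) -> (Vec<u64>, Option<u64>)
where
    F: FnOnce() -> Vec<u64>,
{
    // The session lock is NOT held here, so message processing is not blocked.
    let threads = slow_db_query();
    // The guard is a temporary and is dropped at the end of this statement.
    let active = session.lock().expect("session lock poisoned").active_thread;
    (threads, active)
}
```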

## Impact
- Message processing no longer blocks on session lock contention
- API response times for thread list/history queries unaffected (DB queries
  still happen, but lock is not held)
- Better diagnostics for future debugging

## Testing
- All 2756 tests pass
- Code compiles with zero clippy warnings
- No changes to user-facing API or behavior, only lock timing

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

* security: redact PII from info-level logs

Downgrade user_id and channel logging to debug level to prevent exposing
Personally Identifiable Information (PII) in production logs.

The user_id field can contain sensitive information such as phone numbers
(e.g., for Signal messages). Logging PII in cleartext at the info level
creates a security and privacy risk, as these logs may be stored in
persistent storage, indexed by log management systems, or accessible to
unauthorized personnel.

Changes:
- Info level: logs only message_id (UUID) for tracking
- Debug level: logs user_id, channel, thread_id for troubleshooting

This maintains debugging capability for developers while protecting user
privacy in production logs.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Haiku 4.5 <noreply@anthropic.com>

* chore: sync main into staging (#855)

* feat(llm): per-provider unsupported parameter filtering (#749, #728) (#809)

Add declarative `unsupported_params` field to provider definitions in
providers.json. Parameters listed are stripped from requests before
sending, preventing 400 errors from providers that reject them (e.g.
gpt-5 family and kimi-k2.5 rejecting custom temperature values).

- Add `unsupported_params` to ProviderDefinition and RegistryProviderConfig
- Propagate from registry through config resolution
- Generic strip helpers handle temperature, max_tokens, stop_sequences
- Apply filtering in RigAdapter and AnthropicOAuthProvider
- Mark openai and tinfoil providers as not supporting temperature
- Update openai default model to gpt-5-mini

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Illia Polosukhin <ilblackdragon@gmail.com>

* fix: Chat input is hidden in mobile browser mode (#877)

* fix: stop XML-escaping tool output content (#598) (#874)

* fix: stop XML-escaping tool output content in wrap_for_llm (#598)

Remove content escaping that corrupted JSON in tool output. The
<tool_output> structural boundary is preserved but content now passes
through raw, fixing downstream parse failures.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Henry Park <henrypark133@gmail.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(safety): allow empty string tool params (#848)

* fix(safety): allow empty string tool params

* fix(safety): preserve heuristic checks and add path context to tool validation

This follow-up refactor addresses PR review feedback by restoring
heuristic checks (whitespace ratio, character repetition) for tool
parameter validation and improving error reporting.

Changes:
- Restored heuristic warnings in validate_non_empty_input so they apply
  to both user input and tool parameters (when non-empty).
- Refactored check_strings to recursively build and pass JSON paths
  (e.g., "metadata.tags[1]").
- Updated validation errors to use the specific JSON path as the field
  name instead of the generic "input".
- Added regression tests for whitespace/repetition warnings and JSON
  path reporting in tool parameters.

This ensures the safety layer remains semantically neutral about empty
strings (fixing the memory_tree path: "" issue) while maintaining
rigorous protection and providing better developer ergonomics.
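
A sketch of the recursive path-building idea, using a tiny hand-rolled JSON type so it needs no external crate. The `Json` enum, the predicate parameter, and the signature are assumptions; the real check_strings presumably walks serde_json values and applies the safety layer's own checks.

```rust
// Hypothetical minimal JSON value for illustration.
pub enum Json {
    Str(String),
    Array(Vec<Json>),
    Object(Vec<(String, Json)>),
}

/// Record the JSON path of every string leaf that fails `is_valid`.
pub fn check_strings(
    value: &Json,
    path: &str,
    is_valid: &dyn Fn(&str) -> bool,
    errors: &mut Vec<String>,
) {
    match value {
        Json::Str(s) => {
            if !is_valid(s.as_str()) {
                errors.push(path.to_string());
            }
        }
        Json::Array(items) => {
            for (i, item) in items.iter().enumerate() {
                check_strings(item, &format!("{path}[{i}]"), is_valid, errors);
            }
        }
        Json::Object(fields) => {
            for (key, child) in fields {
                // Top-level keys have no leading dot.
                let child_path = if path.is_empty() {
                    key.clone()
                } else {
                    format!("{path}.{key}")
                };
                check_strings(child, &child_path, is_valid, errors);
            }
        }
    }
}
```

With a predicate that accepts empty strings, a parameter such as `path: ""` passes while a bad value nested under `metadata.tags` is reported by its full path.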

* style: run cargo fmt

* perf: optimize release and dist build profiles (#843)

* perf: optimize release and dist build profiles

Add [profile.release] with strip=true and panic="abort" for smaller,
faster release binaries. Upgrade [profile.dist] from lto="thin" to
lto="fat" with codegen-units=1 for maximum optimization in CI releases.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: remove panic=abort from release profile

Reviewers (zmanian, Copilot, Gemini) correctly flagged that panic=abort
in the release profile would kill the entire process on any tokio task
panic, breaking fault isolation for the long-running server. Removed
from release profile entirely.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* feat: add PR template with risk assessment (#837)

* feat: add PR template with risk assessment and review tracks

Add a pull request template that includes summary, change type,
validation checklist, security/database impact sections, blast radius,
and rollback plan. Update CONTRIBUTING.md with review track definitions
(A/B/C) based on change risk level.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: expand CONTRIBUTING.md with setup, workflow, and guidelines

Add getting started, development workflow, code style summary,
database change guidance, and dependency management sections.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* feat: add fuzzing targets for untrusted input parsers (#835)

* feat: add fuzzing targets for untrusted input parsers

Add cargo-fuzz infrastructure with 5 fuzz targets exercising
security-critical code paths:

- fuzz_safety_sanitizer: Aho-Corasick + regex injection detection
- fuzz_safety_validator: Input validation (length, encoding, patterns)
- fuzz_leak_detector: Secret leak scanning (API keys, tokens)
- fuzz_tool_params: Tool parameter JSON validation
- fuzz_config_env: TOML/JSON config parsing

Each target exercises real IronClaw business logic with invariant
assertions. Includes corpus directories and setup documentation.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: improve fuzz targets to exercise real IronClaw code paths

- fuzz_config_env: exercise SafetyLayer end-to-end (sanitize, validate,
  policy check) instead of generic TOML/JSON parsing
- fuzz_tool_params: add validate_tool_schema coverage alongside
  validate_tool_params
- Add "fuzz" to workspace exclude in root Cargo.toml
- Update README descriptions to match actual target behavior

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: replace redundant detect() call with meaningful invariant assertion

Replace the double sanitize()+detect() call with an assertion that
critical severity warnings always trigger content modification.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: rewrite fuzz_config_env to exercise IronClaw safety code directly

Replace SafetyLayer wrapper usage with direct Sanitizer, Validator, and
LeakDetector instantiation and invocation. Adds meaningful consistency
assertions (non-empty output, valid-means-no-errors, scan/clean agreement).
Removes the config construction that was only exercising struct instantiation.

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* fix(wasm): run leak scan before credential injection in tools wrapper (#791)

* fix(wasm): run leak scan before credential injection in tools wrapper

The tools WASM wrapper runs the LeakDetector on HTTP request headers
AFTER inject_host_credentials() has already substituted real secrets
(e.g., xoxb- Slack bot tokens). This causes the leak detector to
flag the tool's own legitimate outbound API calls as secret exfiltration.

Move the scan to run on raw_headers before any credential injection,
matching the fix already applied to the channels wrapper in #421.

Fixes the same class of bug as #421 (which only fixed channels/wasm/wrapper.rs).
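
The reordering described above can be sketched as follows. This is a minimal illustration, not the wrapper's actual code: `looks_like_secret` and `prepare_headers` are hypothetical names, and the single `xoxb-` check stands in for the much broader pattern set of the real LeakDetector.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the leak detector: flags values that look like
/// Slack bot tokens. The real detector covers many more patterns.
fn looks_like_secret(value: &str) -> bool {
    value.contains("xoxb-")
}

/// Scan the headers supplied by the WASM tool first, then inject host
/// credentials. A secret in the raw headers is rejected; a secret added by
/// the host afterwards is never scanned.
fn prepare_headers(
    raw_headers: &HashMap<String, String>,
    host_credentials: &HashMap<String, String>,
) -> Result<HashMap<String, String>, String> {
    // Step 1: scan only what the tool itself supplied.
    for (name, value) in raw_headers {
        if looks_like_secret(value) {
            return Err(format!("potential secret leak in header '{name}'"));
        }
    }

    // Step 2: inject host-owned credentials after the scan has passed.
    let mut headers = raw_headers.clone();
    for (name, value) in host_credentials {
        headers.insert(name.clone(), value.clone());
    }
    Ok(headers)
}
```

With this order, a host-injected `Authorization: Bearer xoxb-...` header passes through, while the same token placed in the tool's own raw headers is still blocked.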

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* perf: inline leak scan to avoid Vec allocation on every HTTP request

Address review feedback: instead of cloning all header keys/values into
a Vec to pass to scan_http_request(), iterate over raw_headers directly
using scan_and_clean(). This also provides more specific error messages
(URL vs header vs body).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* style: fix cargo fmt formatting in leak scan loop

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* fix(setup): drain residual terminal events before secret input (#747) (#849)

* fix(ci): secrets can't be used in step if conditions [skip-regression-check] (#787)

GitHub Actions step-level `if:` doesn't have access to `secrets` context.
Replace `if: secrets.X != ''` with `continue-on-error: true` and let
the Set token step handle the fallback.

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(ci): clean up staging pipeline — remove hacks, skip redundant checks [skip-regression-check] (#794)

- Remove continue-on-error from staging-ci.yml app token steps (secrets are configured)
- Skip test.yml and code_style.yml on PRs targeting staging (staging-ci.yml
  already runs tests before promoting, promotion PR gets full CI on main)
- Allow ironclaw-ci[bot] in Claude Code review for bot-created promotion PRs

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* fix(ci): run fmt + clippy on staging PRs, skip Windows clippy [skip-regression-check] (#802)

- Remove branches:[main] filter from code_style.yml so it runs on all PRs
- Gate clippy-windows with `if: github.base_ref == 'main'` (skip on staging PRs)
- Update rollup job to allow skipped clippy-windows
- Simplify claude-review.yml to only trigger on labeled event (avoids duplicate runs)

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* feat: persist user_id in save_job and expose job_id on routine runs (#709)

* feat: persist worker events to DB and fix activity tab rendering

In-process Worker (used by Scheduler::dispatch_job) now persists events
via save_job_event at key execution points: plan creation, LLM
responses, tool_use, tool_result, and job completion/failure/stuck.
Event data shapes match the container worker format so the gateway
activity tab renders them correctly.

Frontend: tool_result errors now show a red X icon with danger styling
instead of a silent empty output. The result event falls back to the
error field when message is absent.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: wire RoutineEngine into gateway for direct manual trigger firing

Replace the message-channel hack in routines_trigger_handler with a
direct call to RoutineEngine::fire_manual(), ensuring FullJob routines
dispatch correctly when triggered from the web UI. Inject the engine
into GatewayState from Agent::run after construction.

Also persists user_id in save_job for both PG and libSQL backends,
removes the source='sandbox' filter so all jobs are visible, and
exposes job_id on RoutineRunInfo for the frontend job link.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: remove stale gateway_state argument from Agent::new test call sites

The gateway_state parameter was removed from Agent::new during rebase
(replaced by post-construction set_routine_engine_slot), but three test
call sites still passed the extra None argument.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address PR review — restore sandbox source filter, remove blank lines

- Revert removal of `source = 'sandbox'` filter in all SandboxStore
  queries (8 sites across PG and libSQL). Sandbox-specific APIs should
  stay scoped to sandbox jobs; unified job listing for the Jobs tab
  should use a separate query path.
- Remove extra blank lines in agent_loop.rs and worker.rs that caused
  formatting CI failure.

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address review — regenerate Cargo.lock, add user_id regression test

- Regenerate Cargo.lock from main's lockfile to eliminate dependency
  version downgrades (anyhow, syn, etc.) that were churn from rebase.
- Add regression test verifying user_id round-trips through save_job
  and get_job in the libSQL backend.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* style: remove trailing blank line in libsql jobs.rs

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* test: add Postgres-side regression test for user_id persistence in save_job

Mirrors the existing libSQL test (test_save_job_persists_user_id) for the
Postgres backend. Gated behind #[cfg(feature = "postgres")] + #[ignore]
since it requires a running PostgreSQL instance (integration tier).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* feat(llm): per-provider unsupported parameter filtering (#749, #728) (#809)

Add declarative `unsupported_params` field to provider definitions in
providers.json. Parameters listed are stripped from requests before
sending, preventing 400 errors from providers that reject them (e.g.
gpt-5 family and kimi-k2.5 rejecting custom temperature values).

- Add `unsupported_params` to ProviderDefinition and RegistryProviderConfig
- Propagate from registry through config resolution
- Generic strip helpers handle temperature, max_tokens, stop_sequences
- Apply filtering in RigAdapter and AnthropicOAuthProvider
- Mark openai and tinfoil providers as not supporting temperature
- Update openai default model to gpt-5-mini

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* fix: skip the regression check
[skip-regression-check]

---------

Co-authored-by: Henry Park <henrypark133@gmail.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Illia Polosukhin <ilblackdragon@gmail.com>

* feat(agent): add context size logging before LLM prompt (#810)

* feat(agent): add context size logging before LLM prompt

---------

Co-authored-by: Henry Park <henrypark133@gmail.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Illia Polosukhin <ilblackdragon@gmail.com>

* fix: preserve text before tool-call XML in forced-text responses (#852)

* fix: preserve text before tool-call XML in forced-text responses (#789)

Local models (Qwen3, DeepSeek, GLM) emit <tool_call> XML even when no
tools are available (force_text mode). The existing strip_xml_tag()
discards everything from an unclosed opening tag onward, producing an
empty string that triggers the "I'm not sure how to respond" fallback.

Add truncate_at_tool_tags() — a code-region-aware pre-processing step
that truncates at the first tool-call XML tag BEFORE clean_response()
runs, preserving all useful text before the tag. Protect all 7
clean_response() call sites. Case-insensitive matching handles models
that emit <TOOL_CALL> or <Tool_Call> variants.

Secondary fix: add has_native_thinking() model detection to skip
<think>/<final> system prompt injection for models with built-in
reasoning (Qwen3, QwQ, DeepSeek-R1, GLM-Z1, etc.), preventing
thinking-only responses that clean to empty.

Wire with_model_name(active_model_name()) at all 9 production sites
that construct Reasoning, so the runtime model name (not static config)
drives system prompt generation.

126 new/updated tests covering truncation edge cases, code-block
awareness, Unicode, case-insensitivity, StubLlm integration for
complete/plan/evaluate_success/respond_with_tools paths, model
detection, and conditional system prompt generation.

Closes #789

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address Copilot review — unclosed-only truncation, ASCII case folding

- truncate_at_tool_tags() now only truncates at UNCLOSED tool tags;
  properly closed tags (e.g. <tool_call>...</tool_call>) are left intact
  for clean_response() to strip normally, preserving any text after them
- Switch from to_lowercase() to to_ascii_lowercase() to prevent byte
  offset misalignment with non-ASCII characters whose lowercase form
  has different byte length (e.g. Kelvin sign U+212A)
- Add closing_tag_for() helper to derive closing tags from open patterns
- Fix doc comment: "fenced markdown code blocks or inline code spans"
  (not "indented", which find_code_regions() doesn't detect)
- Add regression tests: closed vs unclosed for each tag variant,
  Unicode + case-insensitive offset safety, and mixed closed/unclosed
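
The byte-offset point in the second bullet can be illustrated with a small sketch. The helper name `find_tag_ascii_case_insensitive` is hypothetical and is not the function in the codebase; it only shows why an offset found in an ASCII-lowercased copy can be applied to the original string.

```rust
/// Find the byte offset of an ASCII tag in `text`, ignoring ASCII case.
/// `to_ascii_lowercase` never changes byte lengths, so an offset found in
/// the lowered copy is also a valid offset in the original string.
fn find_tag_ascii_case_insensitive(text: &str, tag: &str) -> Option<usize> {
    let lowered = text.to_ascii_lowercase();
    lowered.find(tag.to_ascii_lowercase().as_str())
}
```

Full Unicode lowercasing would map the Kelvin sign (three bytes in UTF-8) to an ASCII `k` (one byte), so an offset computed on that copy would point two bytes too early in the original text.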

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: minor review items — consistent ascii_lowercase, closing_tag_for tests

- Switch has_native_thinking() from to_lowercase() to to_ascii_lowercase()
  for consistency with truncate_at_tool_tags() approach
- Add unit tests for closing_tag_for(): standard tags, space-suffixed
  patterns, pipe-delimited tags, and exhaustive coverage of all
  TOOL_TAG_PATTERNS entries
- Add test for mixed closed+unclosed tags of different types

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* Feat/docker shell edition (#804)

* fix(ci): secrets can't be used in step if conditions [skip-regression-check] (#787)

GitHub Actions step-level `if:` doesn't have access to `secrets` context.
Replace `if: secrets.X != ''` with `continue-on-error: true` and let
the Set token step handle the fallback.

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(ci): clean up staging pipeline — remove hacks, skip redundant checks [skip-regression-check] (#794)

- Remove continue-on-error from staging-ci.yml app token steps (secrets are configured)
- Skip test.yml and code_style.yml on PRs targeting staging (staging-ci.yml
  already runs tests before promoting, promotion PR gets full CI on main)
- Allow ironclaw-ci[bot] in Claude Code review for bot-created promotion PRs

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Henry Park <henrypark133@gmail.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(mcp): strip top-level null params before forwarding to MCP servers (#795)

* fix(mcp): strip top-level null params before forwarding to MCP servers

LLMs frequently emit `"field": null` for optional parameters in tool
calls. Many MCP servers reject explicit nulls for fields that should
simply be absent — e.g. Notion returns 400 for `"sort": null` in a
search call, expecting the field to be omitted entirely.

Strip top-level null keys from the params object before calling
`call_tool()`. Only top-level keys are stripped; nested nulls are
preserved since they may be semantically meaningful.
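
A rough sketch of the stripping rule, under stated assumptions: the `Value` enum below is a minimal stand-in for a real JSON type, and `strip_top_level_nulls` is a hypothetical name rather than the helper used in the MCP client.

```rust
use std::collections::BTreeMap;

/// Minimal JSON-like value, a stand-in for a real JSON type.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Null,
    Text(String),
    Object(BTreeMap<String, Value>),
}

/// Drop top-level keys whose value is null; nested values are left untouched.
fn strip_top_level_nulls(params: Value) -> Value {
    match params {
        Value::Object(map) => Value::Object(
            map.into_iter()
                .filter(|(_, v)| !matches!(v, Value::Null))
                .collect(),
        ),
        // Non-object params are forwarded unchanged.
        other => other,
    }
}
```

For example, `{"query": "notes", "sort": null, "filter": {"property": null}}` would become `{"query": "notes", "filter": {"property": null}}`: the top-level `sort` is removed, the nested null is kept.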

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Illia Polosukhin <ilblackdragon@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* Add event-triggered routines and workflow skill templates (#756)

* Add event-triggered routines and workflow skill templates

* fix: address PR review feedback for event_emit security and quality

Security fixes:
- Require approval (UnlessAutoApproved) for event_emit, matching routine_fire
- Enable sanitization on event_emit payload (external JSON reaches LLM)
- Remove user_id parameter from event_emit to prevent IDOR — always use ctx.user_id

Correctness fixes:
- Rename source → event_source in event_emit for consistency with routine_create
- Use json_value_as_filter_string for filter parsing (handles numbers/booleans)
- Case-insensitive matching for event source and event_type
- Add debug logging for missing filter keys in payload
- Fix skill_install_routine_webhook_sim test missing .with_skills()
- Fix schema_validator test for event_emit payload properties

Code quality:
- Move EventEmitTool struct/impl after RoutineHistoryTool (fix split layout)
- Deduplicate routine_to_info into RoutineInfo::from_routine in types.rs
- Add test section headers in e2e_routine_heartbeat.rs
- Clarify event_emit description to specify system_event routines only

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: make routine_system_event_emit test create routine before emitting

- Add routine_create step to trace fixture so event_emit has a matching
  routine to fire
- Assert fired_routines > 0, not just key presence (Copilot review)
- Add .with_auto_approve_tools(true) since event_emit now requires approval

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: renumber test headers after system_event test insertion

Test 4 was duplicated (routine_cooldown and heartbeat_findings).
Renumber heartbeat_findings to Test 5 and heartbeat_empty_skip to Test 6.

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: merge staging and add missing RoutineEngine args in test

RoutineEngine::new on staging requires `tools` and `safety` params.
Update system_event_trigger_matches_and_filters test to pass them.

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address new Copilot review comments

- Add .with_auto_approve_tools(true) to skill_install_routine_webhook_sim
  test so event_emit doesn't block on approval
- Fix module-level doc comment for event_emit to specify system_event trigger

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: deduplicate json_value_as_string helper

Remove private `json_value_as_string` from routine_engine.rs and use
the identical public `json_value_as_filter_string` from routine.rs,
eliminating divergence risk. (Copilot review)

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Henry Park <henrypark133@gmail.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: enable WASM credential injection in No-DB environments (#845)

* fix(wasm): enable credential injection in no-DB environments via env var fallback

When a secrets store is unavailable (e.g. no-DB mode), WASM channel
credentials were silently not injected, causing channels to start without
credentials. Fix by:

- Changing `inject_channel_credentials_from_secrets` to accept
  `Option<&dyn SecretsStore>` — secrets store is tried first when present
- Adding env var fallback (`inject_env_credentials`) for credentials not
  covered by the secrets store
- Enforcing a channel-name prefix security check on env var names to
  prevent WASM channels from reading unrelated host credentials
  (e.g. `AWS_SECRET_ACCESS_KEY`)
- Extracting pure `resolve_env_credentials` helper for testability
- Adding case-insensitive prefix matching for secrets store lookup

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(wasm): inject credentials at startup when no secrets store (setup.rs path)

The startup path (setup_wasm_channels -> register_channel) was guarded by
`if let Some(secrets) = secrets_store`, so in No-DB mode credentials were
never injected and the channel started without them.

Fix by:
- Changing inject_channel_credentials to accept Option<&dyn SecretsStore>
- Always calling it (removing the if-let guard) — env var fallback runs
  even when secrets_store is None
- Adding channel-name prefix security check to the env var fallback path
  (e.g. TELEGRAM_ for channel "telegram"), consistent with manager.rs

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(test): correct misleading comment on ICTEST1_UNRELATED_OTHER placeholder

* fix(wasm): guard against empty channel name in credential injection

An empty channel_name would produce prefix "_", allowing any env var
starting with "_" to pass the security check and be injected. Add an
early-return guard in resolve_env_credentials, inject_env_credentials,
and inject_channel_credentials. Add a test to cover this path.
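
The guard can be sketched like this. The function name comes from the commit message, but the signature and body are assumptions made for illustration (the environment is passed in as a slice so the sketch stays pure).

```rust
/// Keep only env vars whose name starts with "<CHANNEL>_", e.g. "TELEGRAM_"
/// for channel "telegram". Signature and body are assumed for illustration.
fn resolve_env_credentials(
    channel_name: &str,
    env: &[(String, String)],
) -> Vec<(String, String)> {
    // Guard: an empty name would yield the prefix "_", which would match
    // any variable starting with an underscore.
    if channel_name.is_empty() {
        return Vec::new();
    }
    let prefix = format!("{}_", channel_name.to_ascii_uppercase());
    env.iter()
        .filter(|(name, _)| name.starts_with(prefix.as_str()))
        .cloned()
        .collect()
}
```

Without the early return, a call with an empty channel name would accept a variable such as `_HIDDEN`; with it, the result is simply empty.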

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: lizican123 <lizican123@gmail.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: promote to main (#878)

* fix: replace unsafe env::set_var with thread-safe inject_single_var in SIGHUP handler

Fixes race condition where SIGHUP handler modifies global environment variables
while other threads may be reading them via Config::from_env().

Changes:
- Replace unsafe { std::env::set_var() } with ironclaw::config::inject_single_var()
- Uses INJECTED_VARS mutex instead of unsafe global state modification
- All reads via optional_env() check the thread-safe overlay first
- Prevents data races between SIGHUP reload and concurrent config reads
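
A minimal sketch of the overlay idea, assuming a process-wide map behind a mutex. The names `inject_single_var`, `optional_env`, and `INJECTED_VARS` appear in the commit message; the bodies below are guesses at the shape, not the actual ironclaw::config code.

```rust
use std::collections::HashMap;
use std::sync::{Mutex, OnceLock};

/// Process-local overlay of injected variables, guarded by a mutex.
fn injected_vars() -> &'static Mutex<HashMap<String, String>> {
    static INJECTED_VARS: OnceLock<Mutex<HashMap<String, String>>> = OnceLock::new();
    INJECTED_VARS.get_or_init(|| Mutex::new(HashMap::new()))
}

/// Record a value in the overlay instead of mutating the process environment.
fn inject_single_var(key: &str, value: &str) {
    let mut vars = injected_vars().lock().unwrap_or_else(|e| e.into_inner());
    vars.insert(key.to_string(), value.to_string());
}

/// Read a variable: overlay first, then the real environment.
fn optional_env(key: &str) -> Option<String> {
    let overlay = injected_vars()
        .lock()
        .unwrap_or_else(|e| e.into_inner())
        .get(key)
        .cloned();
    overlay.or_else(|| std::env::var(key).ok())
}
```

Because writes go to the mutex-guarded map rather than to the process environment, a reload on SIGHUP never calls `std::env::set_var` while other threads are reading.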

Verification:
- All 2,787 tests pass
- Zero clippy warnings
- Code compiles successfully

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

* fix: spawn webhook restart as background task to avoid blocking I/O across lock

Prevents holding Mutex lock during async I/O operations (TcpListener::bind,
task shutdown). The SIGHUP handler no longer blocks webhook processing during
listener restart.

Changes:
- Read old_addr and drop lock immediately
- Spawn restart_with_addr() as background task via tokio::spawn
- Lock is only held during the actual restart operation, not the signal handler

Benefits:
- SIGHUP handler returns immediately without blocking
- Webhook requests not delayed by listener restart I/O
- Lock contention significantly reduced

Verification:
- All 2,787 tests pass
- Zero clippy warnings
- Code compiles successfully

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

* fix: add graceful shutdown mechanism for SIGHUP handler background task

Fixes the handler's unbounded loop, which had no cancellation token. The SIGHUP handler now
listens for a shutdown signal and exits cleanly during graceful termination.

Changes:
- Create broadcast channel for shutdown signaling
- SIGHUP handler uses tokio::select! to wait for shutdown or SIGHUP
- Send shutdown signal to all background tasks after agent.run() completes
- Ensures clean task lifecycle and no orphaned background tasks

Benefits:
- Proper task cancellation during graceful shutdown
- Follows Tokio best practices for background task management
- No background tasks orphaned when runtime shuts down

Verification:
- All 2,787 tests pass
- Zero clippy warnings
- Code compiles successfully

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

* refactor: replace stringly-typed parameter filtering with typed enum and single helper

Fixes DRY violation where unsupported parameter filtering was duplicated across
rig_adapter.rs and anthropic_oauth.rs using string contains checks.

Changes:
- Add UnsupportedParam typed enum in provider.rs (Temperature, MaxTokens, StopSequences)
- Create strip_unsupported_completion_params() helper function
- Create strip_unsupported_tool_params() helper function
- Update rig_adapter.rs to use shared helpers
- Update anthropic_oauth.rs to use shared helpers
- Replace 60+ lines of duplicate stringly-typed logic

Benefits:
- Type safety: parameter names checked at compile time
- Single source of truth: adding a new param updates one place
- Reduced maintenance burden: no duplicate logic to keep in sync
- Better code clarity: named enum variant is self-documenting

Verification:
- All 2,787 tests pass
- Zero clippy warnings
- Code compiles successfully

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
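
The typed-enum approach described above might look roughly like the following. The enum variants and the helper name come from the commit message; the `CompletionRequest` fields shown here are an assumed, reduced shape, not the real struct.

```rust
/// Parameters a provider may reject (variant names from the commit message).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum UnsupportedParam {
    Temperature,
    MaxTokens,
    StopSequences,
}

/// Hypothetical, reduced request shape used only for this sketch.
#[derive(Debug, Default, Clone, PartialEq)]
pub struct CompletionRequest {
    pub temperature: Option<f32>,
    pub max_tokens: Option<u32>,
    pub stop_sequences: Option<Vec<String>>,
}

/// Clear every field the provider has declared unsupported.
pub fn strip_unsupported_completion_params(
    req: &mut CompletionRequest,
    unsupported: &[UnsupportedParam],
) {
    for param in unsupported {
        // Exhaustive match: adding a variant forces this helper to be updated.
        match param {
            UnsupportedParam::Temperature => req.temperature = None,
            UnsupportedParam::MaxTokens => req.max_tokens = None,
            UnsupportedParam::StopSequences => req.stop_sequences = None,
        }
    }
}
```

The exhaustive `match` is what gives the "single source of truth" property: a new variant will not compile until the helper handles it.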

* docs: clarify intentional parameter asymmetry between completion and tool requests

Add documentation explaining why strip_unsupported_tool_params does not handle
StopSequences: the field doesn't exist in ToolCompletionRequest.

Changes:
- Add clarifying comments to strip_unsupported_tool_params()
- Explain why StopSequences is only in CompletionRequest
- Note that ToolCompletionRequest only supports Temperature and MaxTokens
- Inline comment confirms no action needed for StopSequences

This addresses the appearance of incomplete implementation without changing logic,
as the asymmetry is intentional and correct (ToolCompletionRequest lacks the field).

Verification:
- All 2,787 tests pass
- Zero clippy warnings
- Code compiles successfully

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

* perf: isolate webhook_secret to reduce lock contention on hot path

Move webhook_secret from shared HttpChannelState RwLock into its own Arc<RwLock<>>.
This eliminates contention between secret validation and other state operations.

Changes:
- Change webhook_secret field type from RwLock<Option<SecretString>> to Arc<RwLock<Option<SecretString>>>
- Update initialization in HttpChannel::new()
- Update comments to explain isolation rationale

Benefits:
- Reduce lock contention on webhook request hot path (secret validation)
- Rarely-changing field (SIGHUP only) isolated from frequent state accesses
- Other state operations (tx, pending_responses) no longer wait behind secret reads
- Minimal code change: only field declaration and initialization

The Arc wrapper allows cloning the RwLock handle to separate concerns. With this
change, webhook requests take only the dedicated secret lock for validation rather
than the shared HttpChannelState lock, which scales better under high request volume.

Verification:
- All 2,787 tests pass
- Zero clippy warnings
- Code compiles successfully

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

* fix: prevent partial state corruption on SIGHUP restart failure

Ensure atomicity of configuration reload: if webhook listener restart fails,
secret update is skipped to prevent inconsistent state.

Changes:
- Wait for restart_with_addr() to complete (don't spawn background task)
- Track restart result with restart_failed flag
- Only update secret if restart succeeded or wasn't needed
- Ensure listener and secret stay synchronized

Problem addressed:
- Before: restart spawned as background task, secret updated immediately
- If restart failed, secret was changed but listener still on old address
- This left system in inconsistent state (partial corruption)

Solution:
- Make restart blocking (SIGHUP handler can wait, it's not on request hot path)
- Atomically update secret only after successful restart
- Flag prevents race between restart and secret update

Benefits:
- Configuration changes are atomic (both succeed or both fail together)
- No partial state corruption on restart failure
- Failed restarts don't silently leave inconsistent state
- Secret and listener address stay in sync

Verification:
- All 2,787 tests pass
- Zero clippy warnings
- Code compiles successfully

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
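
One way to picture the "both succeed or both fail" rule is the sketch below. `ChannelConfig`, `reload`, and the `restart_listener` closure are hypothetical names for illustration; the early return via `?` plays the role of the `restart_failed` flag mentioned above.

```rust
/// Hypothetical, simplified view of the state touched by a SIGHUP reload.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ChannelConfig {
    pub addr: String,
    pub secret: Option<String>,
}

/// Apply `new` to `current` only if the listener restart (when needed) succeeds.
pub fn reload<F>(
    current: &mut ChannelConfig,
    new: ChannelConfig,
    restart_listener: F,
) -> Result<(), String>
where
    F: FnOnce(&str) -> Result<(), String>,
{
    if new.addr != current.addr {
        // Blocking on purpose: the signal handler is not on the request hot path.
        // On failure we return early, leaving both addr and secret untouched.
        restart_listener(&new.addr)?;
        current.addr = new.addr;
    }
    // Reached only if the restart succeeded or was not needed.
    current.secret = new.secret;
    Ok(())
}
```

With this ordering a failed bind leaves the old address and the old secret in place together.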

* refactor: generalize hot-secret-swapping with ChannelSecretUpdater trait

Decouple SIGHUP handler from HTTP channel internals by introducing a trait
for channels that support zero-downtime secret updates.

Changes:
- Add ChannelSecretUpdater trait in channels/channel.rs
- Implement ChannelSecretUpdater for HttpChannelState
- Export trait from channels module
- Update SIGHUP handler to use trait-based secret updater collection
- Replace explicit HTTP channel knowledge with generic updater loop

Benefits:
- SIGHUP handler no longer depends on HttpChannelState details
- Tight coupling removed: main.rs doesn't need HTTP channel imports
- Extensible: new channels can opt-in by implementing the trait
- Scalable: multiple channels supported without main.rs changes
- Maintainable: adding channels requires only trait implementation, not SIGHUP handler edits

Pattern:
- ChannelSecretUpdater trait defines the interface for all updaters
- Channels that support hot-secret-swapping implement the trait
- SIGHUP handler loops through all registered updaters generically

Verification:
- All 2,787 tests pass
- Zero clippy warnings
- Code compiles successfully

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
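
The pattern above could be sketched as follows. Only the trait name `ChannelSecretUpdater` and the type name `HttpChannelState` come from the commit message; the method `update_secret`, the helper `apply_new_secret`, the use of `String` instead of `SecretString`, and the synchronous signatures are all assumptions made to keep the example standard-library only.

```rust
use std::sync::{Arc, RwLock};

/// Trait name from the commit message; the method signature is an assumption.
pub trait ChannelSecretUpdater: Send + Sync {
    fn update_secret(&self, new_secret: Option<String>);
}

/// Stand-in for the real HttpChannelState, holding only the secret.
pub struct HttpChannelState {
    webhook_secret: Arc<RwLock<Option<String>>>,
}

impl HttpChannelState {
    pub fn new(secret: Option<String>) -> Self {
        Self { webhook_secret: Arc::new(RwLock::new(secret)) }
    }

    pub fn current_secret(&self) -> Option<String> {
        self.webhook_secret.read().unwrap().clone()
    }
}

impl ChannelSecretUpdater for HttpChannelState {
    fn update_secret(&self, new_secret: Option<String>) {
        *self.webhook_secret.write().unwrap() = new_secret;
    }
}

/// What a SIGHUP handler might do: loop over updaters without knowing
/// which concrete channel each one belongs to.
pub fn apply_new_secret(updaters: &[Arc<dyn ChannelSecretUpdater>], secret: Option<String>) {
    for updater in updaters {
        updater.update_secret(secret.clone());
    }
}
```

A new channel would opt in by implementing the trait and registering itself in the updater list; the handler loop stays unchanged.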

* feat: validate parameter names at deserialization time, not just tests

Add custom serde deserializer for unsupported_params that validates parameter
names at runtime when loading providers.json (or user overrides).

Changes:
- Add unsupported_params_de module with custom deserializer
- Only allows: "temperature", "max_tokens", "stop_sequences"
- Invalid parameter names cause immediate deserialization error
- Update ProviderDefinition to use custom deserializer
- Enhanced test with explicit parameter name validation
- Add new test that verifies invalid parameters are rejected

Problem solved:
- Before: Invalid param names (e.g., "temperrature") silently ignored
- Now: Rejected at deserialization time with clear error message
- Prevents runtime failures caused by typos in configuration

Example error:
  unsupported parameter name 'temperrature': must be one of: temperature, max_tokens, stop_sequences

Benefits:
- Fail-fast: errors caught when loading config, not at runtime
- Clear feedback: error message lists valid parameter names
- Type safety: validators run during deserialization
- Configuration errors detected immediately, not silently ignored

Verification:
- All 2,788 tests pass (including new validation test)
- Zero clippy warnings
- Code compiles successfully

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
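
The validation rule can be illustrated without serde. This sketch swaps the custom deserializer for a plain parsing function (`parse_unsupported_param` and `parse_unsupported_params` are hypothetical names), and assumes the names map onto the `UnsupportedParam` enum; the error text follows the example error quoted above.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum UnsupportedParam {
    Temperature,
    MaxTokens,
    StopSequences,
}

/// Map one configured name to a variant, rejecting anything else.
pub fn parse_unsupported_param(name: &str) -> Result<UnsupportedParam, String> {
    match name {
        "temperature" => Ok(UnsupportedParam::Temperature),
        "max_tokens" => Ok(UnsupportedParam::MaxTokens),
        "stop_sequences" => Ok(UnsupportedParam::StopSequences),
        other => Err(format!(
            "unsupported parameter name '{}': must be one of: temperature, max_tokens, stop_sequences",
            other
        )),
    }
}

/// Validate a whole list; the first invalid name aborts with its error.
pub fn parse_unsupported_params(names: &[&str]) -> Result<Vec<UnsupportedParam>, String> {
    names.iter().map(|name| parse_unsupported_param(name)).collect()
}
```

Collecting into `Result<Vec<_>, _>` gives the fail-fast behaviour: a single typo rejects the whole list instead of being silently ignored.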

---------

Co-authored-by: Claude Haiku 4.5 <noreply@anthropic.com>

* merge: resolve conflicts for PR #800 and #822 into staging (#881)

* fix(ci): secrets can't be used in step if conditions [skip-regression-check] (#787)

GitHub Actions step-level `if:` doesn't have access to `secrets` context.
Replace `if: secrets.X != ''` with `continue-on-error: true` and let
the Set token step handle the fallback.

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(ci): clean up staging pipeline — remove hacks, skip redundant checks [skip-regression-check] (#794)

- Remove continue-on-error from staging-ci.yml app token steps (secrets are configured)
- Skip test.yml and code_style.yml on PRs targeting staging (staging-ci.yml
  already runs tests before promoting, promotion PR gets full CI on main)
- Allow ironclaw-ci[bot] in Claude Code review for bot-created promotion PRs

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* fix(ci): run fmt + clippy on staging PRs, skip Windows clippy [skip-regression-check] (#802)

- Remove branches:[main] filter from code_style.yml so it runs on all PRs
- Gate clippy-windows with `if: github.base_ref == 'main'` (skip on staging PRs)
- Update rollup job to allow skipped clippy-windows
- Simplify claude-review.yml to only trigger on labeled event (avoids duplicate runs)

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* feat: persist user_id in save_job and expose job_id on routine runs (#709)

* feat: persist worker events to DB and fix activity tab rendering

In-process Worker (used by Scheduler::dispatch_job) now persists events
via save_job_event at key execution points: plan creation, LLM
responses, tool_use, tool_result, and job completion/failure/stuck.
Event data shapes match the container worker format so the gateway
activity tab renders them correctly.

Frontend: tool_result errors now show a red X icon with danger styling
instead of a silent empty output. The result event falls back to the
error field when message is absent.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: wire RoutineEngine into gateway for direct manual trigger firing

Replace the message-channel hack in routines_trigger_handler with a
direct call to RoutineEngine::fire_manual(), ensuring FullJob routines
dispatch correctly when triggered from the web UI. Inject the engine
into GatewayState from Agent::run after construction.

Also persists user_id in save_job for both PG and libSQL backends,
removes the source='sandbox' filter so all jobs are visible, and
exposes job_id on RoutineRunInfo for the frontend job link.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: remove stale gateway_state argument from Agent::new test call sites

The gateway_state parameter was removed from Agent::new during rebase
(replaced by post-construction set_routine_engine_slot), but three test
call sites still passed the extra None argument.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address PR review — restore sandbox source filter, remove blank lines

- Revert removal of `source = 'sandbox'` filter in all SandboxStore
  queries (8 sites across PG and libSQL). Sandbox-specific APIs should
  stay scoped to sandbox jobs; unified job listing for the Jobs tab
  should use a separate query path.
- Remove extra blank lines in agent_loop.rs and worker.rs that caused
  formatting CI failure.

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address review — regenerate Cargo.lock, add user_id regression test

- Regenerate Cargo.lock from main's lockfile to eliminate dependency
  version downgrades (anyhow, syn, etc.) that were churn from rebase.
- Add regression test verifying user_id round-trips through save_job
  and get_job in the libSQL backend.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* style: remove trailing blank line in libsql jobs.rs

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* test: add Postgres-side regression test for user_id persistence in save_job

Mirrors the existing libSQL test (test_save_job_persists_user_id) for the
Postgres backend. Gated behind #[cfg(feature = "postgres")] + #[ignore]
since it requires a running PostgreSQL instance (integration tier).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* refactor: unify three agentic loops into single AgenticLoop engine (#654)

Replace three independent copy-pasted agentic loops (dispatcher, worker,
container runtime) with a single shared engine in `agentic_loop.rs` that
all consumers customize via the `LoopDelegate` trait.

Phase 1 — Shared engine (`src/agent/agentic_loop.rs`, 205 lines):
  - `run_agentic_loop()` owns the core LLM → tool exec → repeat cycle
  - `LoopDelegate` trait (Send + Sync, &dyn dispatch) with 6 hook points
  - Tool intent nudge logic consolidated (was duplicated in 3 files)
  - Iteration limit + force-text behavior preserved

Phase 2 — Three delegate implementations:
  - `ChatDelegate` (dispatcher.rs): 3-phase approval flow, hooks, cost
    guard, context compaction, skill attenuation, interruption
  - `JobDelegate` (worker/job.rs): planning pre-loop phase, parallel
    JoinSet exec, mark_completed/stuck/failed, SSE streaming, self-repair
  - `ContainerDelegate` (worker/container.rs): sequential tool exec,
    HTTP-proxied LLM, container-safe tools, credential injection

Phase 3 — File moves and cleanup:
  - Delete `src/agent/worker.rs` — job logic moved to `src/worker/job.rs`
  - Rename `src/worker/runtime.rs` → `src/worker/container.rs`
  - Re-export `Worker`/`WorkerDeps` from `crate::worker` in `agent/mod.rs`
  - Update `scheduler.rs` imports to new worker location

Shared helpers (`src/tools/execute.rs`):
  - `execute_tool_with_safety()` replaces 4 copies of validate → timeout
    → execute → serialize
  - `process_tool_result()` replaces 3 copies of sanitize → wrap →
    ChatMessage (also used by thread_ops.rs approval resume paths)

Net result: -2,408 lines, zero duplicated loop logic, single code path
for tool intent nudge and completion detection.

Closes #654

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
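
A heavily reduced, synchronous sketch of the shared-engine idea follows. The names `run_agentic_loop`, `LoopDelegate`, `call_llm`, and `LoopOutcome::MaxIterations` appear in this PR; everything else (two hooks instead of six, `LlmOutput`, `ScriptedDelegate`, string-based context) is assumed for illustration and is not the real API.

```rust
/// Assumed LLM reply shape for this sketch.
pub enum LlmOutput {
    Text(String),
    ToolCall(String),
}

#[derive(Debug, PartialEq)]
pub enum LoopOutcome {
    Response(String),
    MaxIterations,
}

/// Consumers customize the loop through a delegate (only two hooks shown).
pub trait LoopDelegate {
    fn call_llm(&mut self, context: &[String]) -> LlmOutput;
    fn execute_tool(&mut self, call: &str) -> String;
}

/// Core cycle: LLM -> tool exec -> repeat, bounded by `max_iterations`.
pub fn run_agentic_loop(
    delegate: &mut dyn LoopDelegate,
    context: &mut Vec<String>,
    max_iterations: usize,
) -> LoopOutcome {
    for _ in 0..max_iterations {
        match delegate.call_llm(context) {
            // A text reply ends the loop.
            LlmOutput::Text(text) => return LoopOutcome::Response(text),
            // A tool call is executed and its result fed back as context.
            LlmOutput::ToolCall(call) => {
                let result = delegate.execute_tool(&call);
                context.push(result);
            }
        }
    }
    LoopOutcome::MaxIterations
}

/// Test delegate that replays a fixed script of LLM outputs.
pub struct ScriptedDelegate {
    pub replies: Vec<LlmOutput>,
}

impl LoopDelegate for ScriptedDelegate {
    fn call_llm(&mut self, _context: &[String]) -> LlmOutput {
        if self.replies.is_empty() {
            LlmOutput::Text(String::new())
        } else {
            self.replies.remove(0)
        }
    }

    fn execute_tool(&mut self, call: &str) -> String {
        format!("result of {}", call)
    }
}
```

Each consumer would supply its own delegate, while the iteration limit and the exit-on-text rule live in one place.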

* fix: address review feedback from Copilot

1. scheduler.rs: Replace `unwrap_or` fallback with proper error
   propagation when parsing tool output JSON — surfaces bugs instead
   of silently changing the output type.

2. worker/job.rs: Drop MutexGuard before the cancellation `.await` in
   `check_signals()` to avoid holding a lock across an async I/O call
   (prevents `await_holding_lock` lint).

3. worker/job.rs: Restore consecutive rate-limit counter
   (MAX_CONSECUTIVE_RATE_LIMITS = 10) so sustained rate limiting marks
   the job stuck with "Persistent rate limiting" instead of silently
   burning through max_iterations.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: incorporate staging changes — token budget tracking + mark_failed

Merge staging's changes into the refactored JobDelegate:
- Add token budget tracking in call_llm (update_context/add_tokens)
- mark_stuck → mark_failed for iteration cap and rate-limit exhaustion
  (aligns with staging's #788 fix)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address zmanian's PR review — eliminate type erasure, clean up

Address all 6 review points from zmanian on PR #800:

1. Replace LoopOutcome::Custom(Box<dyn Any>) with typed
   LoopOutcome::NeedApproval(Box<PendingApproval>) — eliminates
   type erasure and downcast, resolves clippy large_enum_variant.

2. Remove dead max_tool_iterations field from ChatDelegate struct.

3. Add on_tool_intent_nudge() hook to LoopDelegate trait with
   implementations in Job and Container delegates for observability.

4. Fix SSE events in job worker to emit raw sanitized content
   instead of XML-wrapped <tool_output> tags.

5. Remove 4 duplicate completion tests from job.rs that were
   already covered by the shared util module.

6. Avoid logging full tool results — use result_size_bytes in
   debug logs (execute.rs, job.rs).

Also updates path references in CLAUDE.md, COVERAGE_PLAN.md,
and add-sse-event.md command.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
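
Review point 1 can be illustrated with a small sketch. `LoopOutcome::NeedApproval(Box<PendingApproval>)` is named in the commit message; the `tool_name` field, the `Response` variant, and the helper `pending_tool` are assumptions added for the example.

```rust
/// Assumed minimal payload; the real PendingApproval carries more state.
#[derive(Debug, PartialEq)]
pub struct PendingApproval {
    pub tool_name: String,
}

#[derive(Debug, PartialEq)]
pub enum LoopOutcome {
    Response(String),
    MaxIterations,
    // Boxed so the large payload does not inflate every LoopOutcome value.
    NeedApproval(Box<PendingApproval>),
}

/// With a typed variant the caller pattern-matches directly; no
/// `Box<dyn Any>` downcast that could fail at runtime is needed.
pub fn pending_tool(outcome: &LoopOutcome) -> Option<&str> {
    match outcome {
        LoopOutcome::NeedApproval(pending) => Some(pending.tool_name.as_str()),
        LoopOutcome::Response(_) | LoopOutcome::MaxIterations => None,
    }
}
```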

* feat(doctor): expand diagnostics from 7 to 16 health checks

* test: add unit tests for agentic_loop and execute shared modules

Add 16 tests covering the two new critical shared modules:

agentic_loop.rs (10 tests):
- Text response exits loop immediately
- Tool call → text response continuation
- LoopSignal::Stop exits before LLM call
- LoopSignal::InjectMessage adds user message to context
- Max iterations terminates with LoopOutcome::MaxIterations
- Tool intent nudge fires twice then caps
- before_llm_call early exit bypasses LLM
- truncate_for_preview: short string, long string, multibyte safety

execute.rs (6 tests):
- execute_tool_with_safety success path
- Missing tool returns ToolError::NotFound
- Tool execution failure propagates
- Per-tool timeout enforcement (50ms)
- process_tool_result XML wrapping on success
- process_tool_result error formatting

All 2,777 unit tests pass, 0 clippy warnings.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
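
The "multibyte safety" case for `truncate_for_preview` refers to cutting a UTF-8 string without splitting a character. The real signature is not shown in this PR; the version below assumes a byte limit and a borrowed return value, and omits any ellipsis the real helper may append.

```rust
/// Return at most `max_bytes` bytes of `s`, backing up to the nearest
/// character boundary so a multibyte character is never split.
pub fn truncate_for_preview(s: &str, max_bytes: usize) -> &str {
    if s.len() <= max_bytes {
        return s;
    }
    let mut end = max_bytes;
    // Index 0 is always a boundary, so this loop terminates.
    while !s.is_char_boundary(end) {
        end -= 1;
    }
    &s[..end]
}
```

For instance, `truncate_for_preview("héllo", 2)` yields `"h"` rather than panicking, because byte index 2 falls inside the two-byte `é`.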

* style: cargo fmt

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address code review — 9 issues across agentic loop, job worker, container

CRITICAL fixes:
- Rate-limit exhaustion now returns Err(LlmError::RateLimited) instead of
  Ok(Text("")), stopping the loop immediately with no ghost iteration.
  Below-threshold retries still use Text("") with an explicit empty-string
  guard in handle_text_response to skip injection.
- check_signals drains the entire message channel before returning,
  prioritizing Stop over UserMessage. Previously returned early on first
  UserMessage, silently dropping any queued Stop or additional messages.
- check_signals now detects all non-progressing job states (Cancelled,
  Failed, Stuck, Completed, Submitted, Accepted) instead of only
  Cancelled and Failed.

HIGH fixes:
- Error path in process_tool_result_job applies truncate_for_preview to
  bound error strings in SSE/DB events (was unbounded).
- Document Send+Sync lifetime constraint on LoopDelegate trait.
- Test mock before_llm_call refactored from double-lock to single lock
  acquisition, eliminating deadlock risk on refactor.

MEDIUM fixes:
- CompletionReport includes actual iteration count via shared
  Arc<Mutex<u32>> tracker (was hardcoded 0).
- process_tool_result_job return type changed from Result<bool> to
  Result<()> — the bool was always false (dead API).
- Deduplicate truncate in container.rs; now uses truncate_for_preview
  from agentic_loop.

Verified: 0 clippy warnings, 2781 tests pass, cargo fmt clean.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
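
The drain-then-prioritize behaviour of `check_signals` might be sketched like this, using a standard-library channel in place of the worker's real one. `LoopSignal::Stop` and `LoopSignal::InjectMessage` are named in this PR; `WorkerMessage`, `LoopSignal::Continue`, and joining queued messages with newlines are assumptions.

```rust
use std::sync::mpsc::Receiver;

/// Assumed message type sent to a running worker.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum WorkerMessage {
    Stop,
    UserMessage(String),
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum LoopSignal {
    Continue,
    Stop,
    InjectMessage(String),
}

/// Drain everything currently queued, then decide. Stop wins over any
/// number of user messages, regardless of arrival order.
pub fn check_signals(rx: &Receiver<WorkerMessage>) -> LoopSignal {
    let mut stop = false;
    let mut messages = Vec::new();
    // try_recv never blocks; it returns Err once the queue is empty.
    while let Ok(msg) = rx.try_recv() {
        match msg {
            WorkerMessage::Stop => stop = true,
            WorkerMessage::UserMessage(text) => messages.push(text),
        }
    }
    if stop {
        LoopSignal::Stop
    } else if messages.is_empty() {
        LoopSignal::Continue
    } else {
        LoopSignal::InjectMessage(messages.join("\n"))
    }
}
```

Returning on the first `UserMessage` would leave a later `Stop` sitting in the queue, which is the bug the fix describes.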

---------

Co-authored-by: Henry Park <henrypark133@gmail.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Illia Polosukhin <ilblackdragon@gmail.com>
Co-authored-by: Umesh Kumar Singh <brijbiharisingh1971@outlook.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>

* Revert "Feat/docker shell edition" + fix fmt/clippy (#886)

* Revert "Feat/docker shell edition (#804)"

This reverts commit c566faf28fb77c2fa4df92c2947fb48f1a25df9b.

* style: fix formatting issues from revert

Run cargo fmt to fix formatting across 7 files after the revert of
the docker shell edition feature.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* refactor: centralize test cre…
ilblackdragon added a commit that referenced this pull request Mar 14, 2026
* chore: promote staging to main (2026-03-10 15:19 UTC) (#865)

* fix: Channel HTTP: server doesn't start after config change (no hot-r… (#779)

* fix: Channel HTTP: server doesn't start after config change (no hot-reload)

* review fixes

* review fixes

* fix linter

* fix code style

* fix: prevent session lock contention blocking message processing (#783)

* fix: prevent session lock contention blocking message processing

## Problem
After container restart, POST /api/chat/send returns 202 ACCEPTED but messages
don't appear in conversation_messages and agent never responds. Messages get
stuck in "stale state" after restart.

Root cause: Session lock was held for entire duration of chat_threads_handler
and chat_history_handler, including during slow database queries. This blocked
the agent loop from acquiring the session lock to process incoming messages,
causing them to hang indefinitely.

## Solution
1. **Release session lock early in chat_threads_handler**: Only acquire lock
   when reading active_thread at response time, not during DB queries for
   thread list. DB operations no longer block message processing.

2. **Release session lock early in chat_history_handler**: Only acquire lock
   when accessing in-memory thread state, not during paginated DB queries or
   thread ownership checks. DB operations no longer block message processing.

3. **Add comprehensive logging**: Track message flow from receipt through
   session resolution, thread hydration, and state transitions. Helps diagnose
   future issues:
   - Message queued to agent loop (chat_send_handler)
   - Processing message from channel (handle_message)
   - Hydrating thread from DB (maybe_hydrate_thread)
   - Resolving session and thread (resolve_thread)
   - Checking thread state (process_user_input)
   - Persisting user message (persist_user_message)

## Impact
- Message processing no longer blocks on session lock contention
- API response times for thread list/history queries unaffected (DB queries
  still happen, but lock is not held)
- Better diagnostics for future debugging

## Testing
- All 2756 tests pass
- Code compiles with zero clippy warnings
- No changes to user-facing API or behavior, only lock timing

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

* security: redact PII from info-level logs

Downgrade user_id and channel logging to debug level to prevent exposing
Personally Identifiable Information (PII) in production logs.

The user_id field can contain sensitive information such as phone numbers
(e.g., for Signal messages). Logging PII in cleartext at the info level
creates a security and privacy risk, as these logs may be stored in
persistent storage, indexed by log management systems, or accessible to
unauthorized personnel.

Changes:
- Info level: logs only message_id (UUID) for tracking
- Debug level: logs user_id, channel, thread_id for troubleshooting

This maintains debugging capability for developers while protecting user
privacy in production logs.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Haiku 4.5 <noreply@anthropic.com>

* chore: sync main into staging (#855)

* fix(ci): secrets can't be used in step if conditions [skip-regression-check] (#787)

GitHub Actions step-level `if:` doesn't have access to `secrets` context.
Replace `if: secrets.X != ''` with `continue-on-error: true` and let
the Set token step handle the fallback.

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(ci): clean up staging pipeline — remove hacks, skip redundant checks [skip-regression-check] (#794)

- Remove continue-on-error from staging-ci.yml app token steps (secrets are configured)
- Skip test.yml and code_style.yml on PRs targeting staging (staging-ci.yml
  already runs tests before promoting, promotion PR gets full CI on main)
- Allow ironclaw-ci[bot] in Claude Code review for bot-created promotion PRs

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* fix(ci): run fmt + clippy on staging PRs, skip Windows clippy [skip-regression-check] (#802)

- Remove branches:[main] filter from code_style.yml so it runs on all PRs
- Gate clippy-windows with `if: github.base_ref == 'main'` (skip on staging PRs)
- Update rollup job to allow skipped clippy-windows
- Simplify claude-review.yml to only trigger on labeled event (avoids duplicate runs)

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* feat: persist user_id in save_job and expose job_id on routine runs (#709)

* feat: persist worker events to DB and fix activity tab rendering

In-process Worker (used by Scheduler::dispatch_job) now persists events
via save_job_event at key execution points: plan creation, LLM
responses, tool_use, tool_result, and job completion/failure/stuck.
Event data shapes match the container worker format so the gateway
activity tab renders them correctly.

Frontend: tool_result errors now show a red X icon with danger styling
instead of a silent empty output. The result event falls back to the
error field when message is absent.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: wire RoutineEngine into gateway for direct manual trigger firing

Replace the message-channel hack in routines_trigger_handler with a
direct call to RoutineEngine::fire_manual(), ensuring FullJob routines
dispatch correctly when triggered from the web UI. Inject the engine
into GatewayState from Agent::run after construction.

Also persists user_id in save_job for both PG and libSQL backends,
removes the source='sandbox' filter so all jobs are visible, and
exposes job_id on RoutineRunInfo for the frontend job link.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: remove stale gateway_state argument from Agent::new test call sites

The gateway_state parameter was removed from Agent::new during rebase
(replaced by post-construction set_routine_engine_slot), but three test
call sites still passed the extra None argument.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address PR review — restore sandbox source filter, remove blank lines

- Revert removal of `source = 'sandbox'` filter in all SandboxStore
  queries (8 sites across PG and libSQL). Sandbox-specific APIs should
  stay scoped to sandbox jobs; unified job listing for the Jobs tab
  should use a separate query path.
- Remove extra blank lines in agent_loop.rs and worker.rs that caused
  formatting CI failure.

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address review — regenerate Cargo.lock, add user_id regression test

- Regenerate Cargo.lock from main's lockfile to eliminate dependency
  version downgrades (anyhow, syn, etc.) that were churn from rebase.
- Add regression test verifying user_id round-trips through save_job
  and get_job in the libSQL backend.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* style: remove trailing blank line in libsql jobs.rs

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* test: add Postgres-side regression test for user_id persistence in save_job

Mirrors the existing libSQL test (test_save_job_persists_user_id) for the
Postgres backend. Gated behind #[cfg(feature = "postgres")] + #[ignore]
since it requires a running PostgreSQL instance (integration tier).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* feat(llm): per-provider unsupported parameter filtering (#749, #728) (#809)

Add declarative `unsupported_params` field to provider definitions in
providers.json. Parameters listed are stripped from requests before
sending, preventing 400 errors from providers that reject them (e.g.
gpt-5 family and kimi-k2.5 rejecting custom temperature values).

- Add `unsupported_params` to ProviderDefinition and RegistryProviderConfig
- Propagate from registry through config resolution
- Generic strip helpers handle temperature, max_tokens, stop_sequences
- Apply filtering in RigAdapter and AnthropicOAuthProvider
- Mark openai and tinfoil providers as not supporting temperature
- Update openai default model to gpt-5-mini

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Illia Polosukhin <ilblackdragon@gmail.com>

* fix: Chat input is hidden in mobile browser mode (#877)

* fix: stop XML-escaping tool output content (#598) (#874)

* fix(ci): secrets can't be used in step if conditions [skip-regression-check] (#787)

GitHub Actions step-level `if:` doesn't have access to `secrets` context.
Replace `if: secrets.X != ''` with `continue-on-error: true` and let
the Set token step handle the fallback.

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(ci): clean up staging pipeline — remove hacks, skip redundant checks [skip-regression-check] (#794)

- Remove continue-on-error from staging-ci.yml app token steps (secrets are configured)
- Skip test.yml and code_style.yml on PRs targeting staging (staging-ci.yml
  already runs tests before promoting, promotion PR gets full CI on main)
- Allow ironclaw-ci[bot] in Claude Code review for bot-created promotion PRs

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* fix(ci): run fmt + clippy on staging PRs, skip Windows clippy [skip-regression-check] (#802)

- Remove branches:[main] filter from code_style.yml so it runs on all PRs
- Gate clippy-windows with `if: github.base_ref == 'main'` (skip on staging PRs)
- Update rollup job to allow skipped clippy-windows
- Simplify claude-review.yml to only trigger on labeled event (avoids duplicate runs)

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* feat: persist user_id in save_job and expose job_id on routine runs (#709)

* feat: persist worker events to DB and fix activity tab rendering

In-process Worker (used by Scheduler::dispatch_job) now persists events
via save_job_event at key execution points: plan creation, LLM
responses, tool_use, tool_result, and job completion/failure/stuck.
Event data shapes match the container worker format so the gateway
activity tab renders them correctly.

Frontend: tool_result errors now show a red X icon with danger styling
instead of a silent empty output. The result event falls back to the
error field when message is absent.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: wire RoutineEngine into gateway for direct manual trigger firing

Replace the message-channel hack in routines_trigger_handler with a
direct call to RoutineEngine::fire_manual(), ensuring FullJob routines
dispatch correctly when triggered from the web UI. Inject the engine
into GatewayState from Agent::run after construction.

Also persists user_id in save_job for both PG and libSQL backends,
removes the source='sandbox' filter so all jobs are visible, and
exposes job_id on RoutineRunInfo for the frontend job link.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: remove stale gateway_state argument from Agent::new test call sites

The gateway_state parameter was removed from Agent::new during rebase
(replaced by post-construction set_routine_engine_slot), but three test
call sites still passed the extra None argument.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address PR review — restore sandbox source filter, remove blank lines

- Revert removal of `source = 'sandbox'` filter in all SandboxStore
  queries (8 sites across PG and libSQL). Sandbox-specific APIs should
  stay scoped to sandbox jobs; unified job listing for the Jobs tab
  should use a separate query path.
- Remove extra blank lines in agent_loop.rs and worker.rs that caused
  formatting CI failure.

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address review — regenerate Cargo.lock, add user_id regression test

- Regenerate Cargo.lock from main's lockfile to eliminate dependency
  version downgrades (anyhow, syn, etc.) that were churn from rebase.
- Add regression test verifying user_id round-trips through save_job
  and get_job in the libSQL backend.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* style: remove trailing blank line in libsql jobs.rs

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* test: add Postgres-side regression test for user_id persistence in save_job

Mirrors the existing libSQL test (test_save_job_persists_user_id) for the
Postgres backend. Gated behind #[cfg(feature = "postgres")] + #[ignore]
since it requires a running PostgreSQL instance (integration tier).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* feat(llm): per-provider unsupported parameter filtering (#749, #728) (#809)

Add declarative `unsupported_params` field to provider definitions in
providers.json. Parameters listed are stripped from requests before
sending, preventing 400 errors from providers that reject them (e.g.
gpt-5 family and kimi-k2.5 rejecting custom temperature values).

- Add `unsupported_params` to ProviderDefinition and RegistryProviderConfig
- Propagate from registry through config resolution
- Generic strip helpers handle temperature, max_tokens, stop_sequences
- Apply filtering in RigAdapter and AnthropicOAuthProvider
- Mark openai and tinfoil providers as not supporting temperature
- Update openai default model to gpt-5-mini

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* fix: stop XML-escaping tool output content in wrap_for_llm (#598)

Remove content escaping that corrupted JSON in tool output. The
<tool_output> structural boundary is preserved but content now passes
through raw, fixing downstream parse failures.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Henry Park <henrypark133@gmail.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(safety): allow empty string tool params (#848)

* fix(safety): allow empty string tool params

* fix(safety): preserve heuristic checks and add path context to tool validation

This follow-up refactor addresses PR review feedback by restoring
heuristic checks (whitespace ratio, character repetition) for tool
parameter validation and improving error reporting.

Changes:
- Restored heuristic warnings in validate_non_empty_input so they apply
  to both user input and tool parameters (when non-empty).
- Refactored check_strings to recursively build and pass JSON paths
  (e.g., "metadata.tags[1]").
- Updated validation errors to use the specific JSON path as the field
  name instead of the generic "input".
- Added regression tests for whitespace/repetition warnings and JSON
  path reporting in tool parameters.

This ensures the safety layer remains semantically neutral about empty
strings (fixing the memory_tree path: "" issue) while maintaining
rigorous protection and providing better developer ergonomics.
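
A minimal sketch of the recursive path building described above, not the actual check_strings code. `JsonValue` is a hypothetical stand-in for the real JSON type and `collect_string_paths` is an illustrative name; the sketch only shows how a path such as "metadata.tags[1]" can be assembled while walking the value.

```rust
/// Hypothetical stand-in for a JSON value (assumption: the real code uses a JSON library type).
enum JsonValue {
    Str(String),
    Array(Vec<JsonValue>),
    Object(Vec<(String, JsonValue)>),
}

/// Records (path, string) for every string leaf, extending the path on the way down.
fn collect_string_paths(value: &JsonValue, path: &str, out: &mut Vec<(String, String)>) {
    match value {
        JsonValue::Str(s) => out.push((path.to_string(), s.clone())),
        JsonValue::Array(items) => {
            for (i, item) in items.iter().enumerate() {
                // Array elements append an index: "tags" -> "tags[1]".
                collect_string_paths(item, &format!("{path}[{i}]"), out);
            }
        }
        JsonValue::Object(fields) => {
            for (key, child) in fields {
                // Object members append ".key", except at the root.
                let child_path = if path.is_empty() {
                    key.clone()
                } else {
                    format!("{path}.{key}")
                };
                collect_string_paths(child, &child_path, out);
            }
        }
    }
}
```

A validator built on this shape can report the collected path as the field name and treat an empty string leaf as valid, while still running its heuristics on non-empty leaves.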

* style: run cargo fmt

* perf: optimize release and dist build profiles (#843)

* perf: optimize release and dist build profiles

Add [profile.release] with strip=true and panic="abort" for smaller,
faster release binaries. Upgrade [profile.dist] from lto="thin" to
lto="fat" with codegen-units=1 for maximum optimization in CI releases.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: remove panic=abort from release profile

Reviewers (zmanian, Copilot, Gemini) correctly flagged that panic=abort
in the release profile would kill the entire process on any tokio task
panic, breaking fault isolation for the long-running server. Removed
from release profile entirely.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* feat: add PR template with risk assessment (#837)

* feat: add PR template with risk assessment and review tracks

Add a pull request template that includes summary, change type,
validation checklist, security/database impact sections, blast radius,
and rollback plan. Update CONTRIBUTING.md with review track definitions
(A/B/C) based on change risk level.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: expand CONTRIBUTING.md with setup, workflow, and guidelines

Add getting started, development workflow, code style summary,
database change guidance, and dependency management sections.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* feat: add fuzzing targets for untrusted input parsers (#835)

* feat: add fuzzing targets for untrusted input parsers

Add cargo-fuzz infrastructure with 5 fuzz targets exercising
security-critical code paths:

- fuzz_safety_sanitizer: Aho-Corasick + regex injection detection
- fuzz_safety_validator: Input validation (length, encoding, patterns)
- fuzz_leak_detector: Secret leak scanning (API keys, tokens)
- fuzz_tool_params: Tool parameter JSON validation
- fuzz_config_env: TOML/JSON config parsing

Each target exercises real IronClaw business logic with invariant
assertions. Includes corpus directories and setup documentation.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: improve fuzz targets to exercise real IronClaw code paths

- fuzz_config_env: exercise SafetyLayer end-to-end (sanitize, validate,
  policy check) instead of generic TOML/JSON parsing
- fuzz_tool_params: add validate_tool_schema coverage alongside
  validate_tool_params
- Add "fuzz" to workspace exclude in root Cargo.toml
- Update README descriptions to match actual target behavior

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: replace redundant detect() call with meaningful invariant assertion

Replace the double sanitize()+detect() call with an assertion that
critical severity warnings always trigger content modification.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: rewrite fuzz_config_env to exercise IronClaw safety code directly

Replace SafetyLayer wrapper usage with direct Sanitizer, Validator, and
LeakDetector instantiation and invocation. Adds meaningful consistency
assertions (non-empty output, valid-means-no-errors, scan/clean agreement).
Removes the config construction that was only exercising struct instantiation.

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* fix(wasm): run leak scan before credential injection in tools wrapper (#791)

* fix(wasm): run leak scan before credential injection in tools wrapper

The tools WASM wrapper runs the LeakDetector on HTTP request headers
AFTER inject_host_credentials() has already substituted real secrets
(e.g., xoxb- Slack bot tokens). This causes the leak detector to
flag the tool's own legitimate outbound API calls as secret exfiltration.

Move the scan to run on raw_headers before any credential injection,
matching the fix already applied to the channels wrapper in #421.

Fixes the same class of bug as #421 (which only fixed channels/wasm/wrapper.rs).
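
A minimal sketch of the ordering this commit describes, not the wrapper's actual code. `looks_like_secret` and `prepare_headers` are hypothetical names, and the pattern check is a placeholder for the real LeakDetector; the point is only that the scan sees raw_headers before any host credential is added.

```rust
/// Hypothetical leak check (assumption): flags values that look like Slack bot tokens.
fn looks_like_secret(value: &str) -> bool {
    value.contains("xoxb-")
}

/// Scans the headers supplied by the tool, then injects the host credential.
/// Returns an error if the tool itself tried to send something secret-looking.
fn prepare_headers(
    raw_headers: &[(String, String)],
    host_token: &str,
) -> Result<Vec<(String, String)>, String> {
    // 1. Scan first, while the headers hold only what the WASM tool provided.
    for (name, value) in raw_headers {
        if looks_like_secret(value) {
            return Err(format!("potential secret in header {name}"));
        }
    }
    // 2. Inject afterwards, so the host's own token is never scanned.
    let mut headers = raw_headers.to_vec();
    headers.push(("Authorization".to_string(), format!("Bearer {host_token}")));
    Ok(headers)
}
```

With the steps swapped, the injected `Authorization` value would itself match the pattern and every authenticated call would be rejected, which is the false positive described above.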

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* perf: inline leak scan to avoid Vec allocation on every HTTP request

Address review feedback: instead of cloning all header keys/values into
a Vec to pass to scan_http_request(), iterate over raw_headers directly
using scan_and_clean(). This also provides more specific error messages
(URL vs header vs body).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* style: fix cargo fmt formatting in leak scan loop

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* fix(setup): drain residual terminal events before secret input (#747) (#849)

* fix(ci): secrets can't be used in step if conditions [skip-regression-check] (#787)

GitHub Actions step-level `if:` doesn't have access to `secrets` context.
Replace `if: secrets.X != ''` with `continue-on-error: true` and let
the Set token step handle the fallback.

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(ci): clean up staging pipeline — remove hacks, skip redundant checks [skip-regression-check] (#794)

- Remove continue-on-error from staging-ci.yml app token steps (secrets are configured)
- Skip test.yml and code_style.yml on PRs targeting staging (staging-ci.yml
  already runs tests before promoting, promotion PR gets full CI on main)
- Allow ironclaw-ci[bot] in Claude Code review for bot-created promotion PRs

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* fix(ci): run fmt + clippy on staging PRs, skip Windows clippy [skip-regression-check] (#802)

- Remove branches:[main] filter from code_style.yml so it runs on all PRs
- Gate clippy-windows with `if: github.base_ref == 'main'` (skip on staging PRs)
- Update rollup job to allow skipped clippy-windows
- Simplify claude-review.yml to only trigger on labeled event (avoids duplicate runs)

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* feat: persist user_id in save_job and expose job_id on routine runs (#709)

* feat: persist worker events to DB and fix activity tab rendering

In-process Worker (used by Scheduler::dispatch_job) now persists events
via save_job_event at key execution points: plan creation, LLM
responses, tool_use, tool_result, and job completion/failure/stuck.
Event data shapes match the container worker format so the gateway
activity tab renders them correctly.

Frontend: tool_result errors now show a red X icon with danger styling
instead of a silent empty output. The result event falls back to the
error field when message is absent.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: wire RoutineEngine into gateway for direct manual trigger firing

Replace the message-channel hack in routines_trigger_handler with a
direct call to RoutineEngine::fire_manual(), ensuring FullJob routines
dispatch correctly when triggered from the web UI. Inject the engine
into GatewayState from Agent::run after construction.

Also persists user_id in save_job for both PG and libSQL backends,
removes the source='sandbox' filter so all jobs are visible, and
exposes job_id on RoutineRunInfo for the frontend job link.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: remove stale gateway_state argument from Agent::new test call sites

The gateway_state parameter was removed from Agent::new during rebase
(replaced by post-construction set_routine_engine_slot), but three test
call sites still passed the extra None argument.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address PR review — restore sandbox source filter, remove blank lines

- Revert removal of `source = 'sandbox'` filter in all SandboxStore
  queries (8 sites across PG and libSQL). Sandbox-specific APIs should
  stay scoped to sandbox jobs; unified job listing for the Jobs tab
  should use a separate query path.
- Remove extra blank lines in agent_loop.rs and worker.rs that caused
  formatting CI failure.

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address review — regenerate Cargo.lock, add user_id regression test

- Regenerate Cargo.lock from main's lockfile to eliminate dependency
  version downgrades (anyhow, syn, etc.) that were churn from rebase.
- Add regression test verifying user_id round-trips through save_job
  and get_job in the libSQL backend.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* style: remove trailing blank line in libsql jobs.rs

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* test: add Postgres-side regression test for user_id persistence in save_job

Mirrors the existing libSQL test (test_save_job_persists_user_id) for the
Postgres backend. Gated behind #[cfg(feature = "postgres")] + #[ignore]
since it requires a running PostgreSQL instance (integration tier).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* feat(llm): per-provider unsupported parameter filtering (#749, #728) (#809)

Add declarative `unsupported_params` field to provider definitions in
providers.json. Parameters listed are stripped from requests before
sending, preventing 400 errors from providers that reject them (e.g.
gpt-5 family and kimi-k2.5 rejecting custom temperature values).

- Add `unsupported_params` to ProviderDefinition and RegistryProviderConfig
- Propagate from registry through config resolution
- Generic strip helpers handle temperature, max_tokens, stop_sequences
- Apply filtering in RigAdapter and AnthropicOAuthProvider
- Mark openai and tinfoil providers as not supporting temperature
- Update openai default model to gpt-5-mini

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* fix: skip the regression check
[skip-regression-check]

---------

Co-authored-by: Henry Park <henrypark133@gmail.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Illia Polosukhin <ilblackdragon@gmail.com>

* feat(agent): add context size logging before LLM prompt (#810)

* fix(ci): secrets can't be used in step if conditions [skip-regression-check] (#787)

GitHub Actions step-level `if:` doesn't have access to `secrets` context.
Replace `if: secrets.X != ''` with `continue-on-error: true` and let
the Set token step handle the fallback.

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(ci): clean up staging pipeline — remove hacks, skip redundant checks [skip-regression-check] (#794)

- Remove continue-on-error from staging-ci.yml app token steps (secrets are configured)
- Skip test.yml and code_style.yml on PRs targeting staging (staging-ci.yml
  already runs tests before promoting, promotion PR gets full CI on main)
- Allow ironclaw-ci[bot] in Claude Code review for bot-created promotion PRs

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* fix(ci): run fmt + clippy on staging PRs, skip Windows clippy [skip-regression-check] (#802)

- Remove branches:[main] filter from code_style.yml so it runs on all PRs
- Gate clippy-windows with `if: github.base_ref == 'main'` (skip on staging PRs)
- Update rollup job to allow skipped clippy-windows
- Simplify claude-review.yml to only trigger on labeled event (avoids duplicate runs)

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* feat: persist user_id in save_job and expose job_id on routine runs (#709)

* feat: persist worker events to DB and fix activity tab rendering

In-process Worker (used by Scheduler::dispatch_job) now persists events
via save_job_event at key execution points: plan creation, LLM
responses, tool_use, tool_result, and job completion/failure/stuck.
Event data shapes match the container worker format so the gateway
activity tab renders them correctly.

Frontend: tool_result errors now show a red X icon with danger styling
instead of a silent empty output. The result event falls back to the
error field when message is absent.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: wire RoutineEngine into gateway for direct manual trigger firing

Replace the message-channel hack in routines_trigger_handler with a
direct call to RoutineEngine::fire_manual(), ensuring FullJob routines
dispatch correctly when triggered from the web UI. Inject the engine
into GatewayState from Agent::run after construction.

Also persists user_id in save_job for both PG and libSQL backends,
removes the source='sandbox' filter so all jobs are visible, and
exposes job_id on RoutineRunInfo for the frontend job link.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: remove stale gateway_state argument from Agent::new test call sites

The gateway_state parameter was removed from Agent::new during rebase
(replaced by post-construction set_routine_engine_slot), but three test
call sites still passed the extra None argument.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address PR review — restore sandbox source filter, remove blank lines

- Revert removal of `source = 'sandbox'` filter in all SandboxStore
  queries (8 sites across PG and libSQL). Sandbox-specific APIs should
  stay scoped to sandbox jobs; unified job listing for the Jobs tab
  should use a separate query path.
- Remove extra blank lines in agent_loop.rs and worker.rs that caused
  formatting CI failure.

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address review — regenerate Cargo.lock, add user_id regression test

- Regenerate Cargo.lock from main's lockfile to eliminate dependency
  version downgrades (anyhow, syn, etc.) that were churn from rebase.
- Add regression test verifying user_id round-trips through save_job
  and get_job in the libSQL backend.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* style: remove trailing blank line in libsql jobs.rs

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* test: add Postgres-side regression test for user_id persistence in save_job

Mirrors the existing libSQL test (test_save_job_persists_user_id) for the
Postgres backend. Gated behind #[cfg(feature = "postgres")] + #[ignore]
since it requires a running PostgreSQL instance (integration tier).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* feat(agent): add context size logging before LLM prompt

---------

Co-authored-by: Henry Park <henrypark133@gmail.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Illia Polosukhin <ilblackdragon@gmail.com>

* fix: preserve text before tool-call XML in forced-text responses (#852)

* fix: preserve text before tool-call XML in forced-text responses (#789)

Local models (Qwen3, DeepSeek, GLM) emit <tool_call> XML even when no
tools are available (force_text mode). The existing strip_xml_tag()
discards everything from an unclosed opening tag onward, producing an
empty string that triggers the "I'm not sure how to respond" fallback.

Add truncate_at_tool_tags() — a code-region-aware pre-processing step
that truncates at the first tool-call XML tag BEFORE clean_response()
runs, preserving all useful text before the tag. Protect all 7
clean_response() call sites. Case-insensitive matching handles models
that emit <TOOL_CALL> or <Tool_Call> variants.

Secondary fix: add has_native_thinking() model detection to skip
<think>/<final> system prompt injection for models with built-in
reasoning (Qwen3, QwQ, DeepSeek-R1, GLM-Z1, etc.), preventing
thinking-only responses that clean to empty.

Wire with_model_name(active_model_name()) at all 9 production sites
that construct Reasoning, so the runtime model name (not static config)
drives system prompt generation.

126 new/updated tests covering truncation edge cases, code-block
awareness, Unicode, case-insensitivity, StubLlm integration for
complete/plan/evaluate_success/respond_with_tools paths, model
detection, and conditional system prompt generation.

Closes #789

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address Copilot review — unclosed-only truncation, ASCII case folding

- truncate_at_tool_tags() now only truncates at UNCLOSED tool tags;
  properly closed tags (e.g. <tool_call>...</tool_call>) are left intact
  for clean_response() to strip normally, preserving any text after them
- Switch from to_lowercase() to to_ascii_lowercase() to prevent byte
  offset misalignment with non-ASCII characters whose lowercase form
  has different byte length (e.g. Kelvin sign U+212A)
- Add closing_tag_for() helper to derive closing tags from open patterns
- Fix doc comment: "fenced markdown code blocks or inline code spans"
  (not "indented", which find_code_regions() doesn't detect)
- Add regression tests: closed vs unclosed for each tag variant,
  Unicode + case-insensitive offset safety, and mixed closed/unclosed
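
A small sketch of why ASCII folding keeps byte offsets valid; `find_ascii_case_insensitive` is a hypothetical helper, not the function in this change. to_ascii_lowercase() only rewrites ASCII bytes, so the folded copy has the same length as the original and an index found in it can be used on the original string. to_lowercase() turns the Kelvin sign U+212A (3 bytes in UTF-8) into "k" (1 byte), which shifts every later offset.

```rust
/// Finds `needle` (expected to be lowercase ASCII) case-insensitively and
/// returns its byte offset in the ORIGINAL string.
fn find_ascii_case_insensitive(haystack: &str, needle: &str) -> Option<usize> {
    // ASCII folding never changes byte lengths, so offsets stay aligned.
    haystack.to_ascii_lowercase().find(needle)
}
```

For the input "\u{212A} done <TOOL_CALL>", the offset returned for "<tool_call>" slices the original string exactly at the tag, whereas an offset taken from the to_lowercase() copy would land two bytes early.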

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: minor review items — consistent ascii_lowercase, closing_tag_for tests

- Switch has_native_thinking() from to_lowercase() to to_ascii_lowercase()
  for consistency with truncate_at_tool_tags() approach
- Add unit tests for closing_tag_for(): standard tags, space-suffixed
  patterns, pipe-delimited tags, and exhaustive coverage of all
  TOOL_TAG_PATTERNS entries
- Add test for mixed closed+unclosed tags of different types

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* Feat/docker shell edition (#804)

* fix(ci): secrets can't be used in step if conditions [skip-regression-check] (#787)

GitHub Actions step-level `if:` doesn't have access to `secrets` context.
Replace `if: secrets.X != ''` with `continue-on-error: true` and let
the Set token step handle the fallback.

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(ci): clean up staging pipeline — remove hacks, skip redundant checks [skip-regression-check] (#794)

- Remove continue-on-error from staging-ci.yml app token steps (secrets are configured)
- Skip test.yml and code_style.yml on PRs targeting staging (staging-ci.yml
  already runs tests before promoting, promotion PR gets full CI on main)
- Allow ironclaw-ci[bot] in Claude Code review for bot-created promotion PRs

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Henry Park <henrypark133@gmail.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(mcp): strip top-level null params before forwarding to MCP servers (#795)

* feat(llm): per-provider unsupported parameter filtering (#749, #728) (#809)

Add declarative `unsupported_params` field to provider definitions in
providers.json. Parameters listed are stripped from requests before
sending, preventing 400 errors from providers that reject them (e.g.
gpt-5 family and kimi-k2.5 rejecting custom temperature values).

- Add `unsupported_params` to ProviderDefinition and RegistryProviderConfig
- Propagate from registry through config resolution
- Generic strip helpers handle temperature, max_tokens, stop_sequences
- Apply filtering in RigAdapter and AnthropicOAuthProvider
- Mark openai and tinfoil providers as not supporting temperature
- Update openai default model to gpt-5-mini

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* fix(mcp): strip top-level null params before forwarding to MCP servers

LLMs frequently emit `"field": null` for optional parameters in tool
calls. Many MCP servers reject explicit nulls for fields that should
simply be absent — e.g. Notion returns 400 for `"sort": null` in a
search call, expecting the field to be omitted entirely.

Strip top-level null keys from the params object before calling
`call_tool()`. Only top-level keys are stripped; nested nulls are
preserved since they may be semantically meaningful.
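
A minimal sketch of the stripping rule, assuming a hypothetical `ParamValue` enum in place of the real JSON type; `strip_top_level_nulls` is an illustrative name. Only keys directly on the params object are dropped, and nested nulls pass through untouched.

```rust
/// Hypothetical stand-in for a JSON value (assumption: the real code uses a JSON library type).
#[derive(Debug, Clone, PartialEq)]
enum ParamValue {
    Null,
    Str(String),
    Object(Vec<(String, ParamValue)>),
}

/// Removes keys whose value is null at the top level only.
fn strip_top_level_nulls(params: ParamValue) -> ParamValue {
    match params {
        ParamValue::Object(fields) => ParamValue::Object(
            fields
                .into_iter()
                // Nested objects are kept as-is, including any nulls inside them.
                .filter(|(_, value)| !matches!(value, ParamValue::Null))
                .collect(),
        ),
        // Non-object params are forwarded unchanged.
        other => other,
    }
}
```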

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Illia Polosukhin <ilblackdragon@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* Add event-triggered routines and workflow skill templates (#756)

* Add event-triggered routines and workflow skill templates

* fix(ci): secrets can't be used in step if conditions [skip-regression-check] (#787)

GitHub Actions step-level `if:` doesn't have access to `secrets` context.
Replace `if: secrets.X != ''` with `continue-on-error: true` and let
the Set token step handle the fallback.

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(ci): clean up staging pipeline — remove hacks, skip redundant checks [skip-regression-check] (#794)

- Remove continue-on-error from staging-ci.yml app token steps (secrets are configured)
- Skip test.yml and code_style.yml on PRs targeting staging (staging-ci.yml
  already runs tests before promoting, promotion PR gets full CI on main)
- Allow ironclaw-ci[bot] in Claude Code review for bot-created promotion PRs

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address PR review feedback for event_emit security and quality

Security fixes:
- Require approval (UnlessAutoApproved) for event_emit, matching routine_fire
- Enable sanitization on event_emit payload (external JSON reaches LLM)
- Remove user_id parameter from event_emit to prevent IDOR — always use ctx.user_id

Correctness fixes:
- Rename source → event_source in event_emit for consistency with routine_create
- Use json_value_as_filter_string for filter parsing (handles numbers/booleans)
- Case-insensitive matching for event source and event_type
- Add debug logging for missing filter keys in payload
- Fix skill_install_routine_webhook_sim test missing .with_skills()
- Fix schema_validator test for event_emit payload properties

Code quality:
- Move EventEmitTool struct/impl after RoutineHistoryTool (fix split layout)
- Deduplicate routine_to_info into RoutineInfo::from_routine in types.rs
- Add test section headers in e2e_routine_heartbeat.rs
- Clarify event_emit description to specify system_event routines only

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(ci): run fmt + clippy on staging PRs, skip Windows clippy [skip-regression-check] (#802)

- Remove branches:[main] filter from code_style.yml so it runs on all PRs
- Gate clippy-windows with `if: github.base_ref == 'main'` (skip on staging PRs)
- Update rollup job to allow skipped clippy-windows
- Simplify claude-review.yml to only trigger on labeled event (avoids duplicate runs)

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* feat: persist user_id in save_job and expose job_id on routine runs (#709)

* feat: persist worker events to DB and fix activity tab rendering

In-process Worker (used by Scheduler::dispatch_job) now persists events
via save_job_event at key execution points: plan creation, LLM
responses, tool_use, tool_result, and job completion/failure/stuck.
Event data shapes match the container worker format so the gateway
activity tab renders them correctly.

Frontend: tool_result errors now show a red X icon with danger styling
instead of a silent empty output. The result event falls back to the
error field when message is absent.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: wire RoutineEngine into gateway for direct manual trigger firing

Replace the message-channel hack in routines_trigger_handler with a
direct call to RoutineEngine::fire_manual(), ensuring FullJob routines
dispatch correctly when triggered from the web UI. Inject the engine
into GatewayState from Agent::run after construction.

Also persists user_id in save_job for both PG and libSQL backends,
removes the source='sandbox' filter so all jobs are visible, and
exposes job_id on RoutineRunInfo for the frontend job link.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: remove stale gateway_state argument from Agent::new test call sites

The gateway_state parameter was removed from Agent::new during rebase
(replaced by post-construction set_routine_engine_slot), but three test
call sites still passed the extra None argument.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address PR review — restore sandbox source filter, remove blank lines

- Revert removal of `source = 'sandbox'` filter in all SandboxStore
  queries (8 sites across PG and libSQL). Sandbox-specific APIs should
  stay scoped to sandbox jobs; unified job listing for the Jobs tab
  should use a separate query path.
- Remove extra blank lines in agent_loop.rs and worker.rs that caused
  formatting CI failure.

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address review — regenerate Cargo.lock, add user_id regression test

- Regenerate Cargo.lock from main's lockfile to eliminate dependency
  version downgrades (anyhow, syn, etc.) that were churn from rebase.
- Add regression test verifying user_id round-trips through save_job
  and get_job in the libSQL backend.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* style: remove trailing blank line in libsql jobs.rs

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* test: add Postgres-side regression test for user_id persistence in save_job

Mirrors the existing libSQL test (test_save_job_persists_user_id) for the
Postgres backend. Gated behind #[cfg(feature = "postgres")] + #[ignore]
since it requires a running PostgreSQL instance (integration tier).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* fix: make routine_system_event_emit test create routine before emitting

- Add routine_create step to trace fixture so event_emit has a matching
  routine to fire
- Assert fired_routines > 0, not just key presence (Copilot review)
- Add .with_auto_approve_tools(true) since event_emit now requires approval

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: renumber test headers after system_event test insertion

Test 4 was duplicated (routine_cooldown and heartbeat_findings).
Renumber heartbeat_findings to Test 5 and heartbeat_empty_skip to Test 6.

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: merge staging and add missing RoutineEngine args in test

RoutineEngine::new on staging requires `tools` and `safety` params.
Update system_event_trigger_matches_and_filters test to pass them.

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address new Copilot review comments

- Add .with_auto_approve_tools(true) to skill_install_routine_webhook_sim
  test so event_emit doesn't block on approval
- Fix module-level doc comment for event_emit to specify system_event trigger

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: deduplicate json_value_as_string helper

Remove private `json_value_as_string` from routine_engine.rs and use
the identical public `json_value_as_filter_string` from routine.rs,
eliminating divergence risk. (Copilot review)

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Henry Park <henrypark133@gmail.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: enable WASM credential injection in No-DB environments (#845)

* fix(wasm): enable credential injection in no-DB environments via env var fallback

When a secrets store is unavailable (e.g. no-DB mode), WASM channel
credentials were silently not injected, causing channels to start without
credentials. Fix by:

- Changing `inject_channel_credentials_from_secrets` to accept
  `Option<&dyn SecretsStore>` — secrets store is tried first when present
- Adding env var fallback (`inject_env_credentials`) for credentials not
  covered by the secrets store
- Enforcing a channel-name prefix security check on env var names to
  prevent WASM channels from reading unrelated host credentials
  (e.g. `AWS_SECRET_ACCESS_KEY`)
- Extracting pure `resolve_env_credentials` helper for testability
- Adding case-insensitive prefix matching for secrets store lookup

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(wasm): inject credentials at startup when no secrets store (setup.rs path)

The startup path (setup_wasm_channels -> register_channel) was guarded by
`if let Some(secrets) = secrets_store`, so in No-DB mode credentials were
never injected and the channel started without them.

Fix by:
- Changing inject_channel_credentials to accept Option<&dyn SecretsStore>
- Always calling it (removing the if-let guard) — env var fallback runs
  even when secrets_store is None
- Adding channel-name prefix security check to the env var fallback path
  (e.g. TELEGRAM_ for channel "telegram"), consistent with manager.rs

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(test): correct misleading comment on ICTEST1_UNRELATED_OTHER placeholder

* fix(wasm): guard against empty channel name in credential injection

An empty channel_name would produce prefix "_", allowing any env var
starting with "_" to pass the security check and be injected. Add an
early-return guard in resolve_env_credentials, inject_env_credentials,
and inject_channel_credentials. Add a test to cover this path.
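
A minimal sketch of the prefix check and the empty-name guard, under the assumption that the prefix is the upper-cased channel name followed by "_". `filter_env_for_channel` is a hypothetical name and signature, not the actual resolve_env_credentials.

```rust
/// Keeps only env vars named "<CHANNEL>_...", e.g. TELEGRAM_ for channel "telegram".
fn filter_env_for_channel(
    channel_name: &str,
    env: &[(String, String)],
) -> Vec<(String, String)> {
    // An empty name would produce the prefix "_" and admit unrelated variables.
    if channel_name.is_empty() {
        return Vec::new();
    }
    let prefix = format!("{}_", channel_name.to_ascii_uppercase());
    env.iter()
        .filter(|(name, _)| name.starts_with(&prefix))
        .cloned()
        .collect()
}
```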

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: lizican123 <lizican123@gmail.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: promote to main (#878)

* fix: replace unsafe env::set_var with thread-safe inject_single_var in SIGHUP handler

Fixes race condition where SIGHUP handler modifies global environment variables
while other threads may be reading them via Config::from_env().

Changes:
- Replace unsafe { std::env::set_var() } with ironclaw::config::inject_single_var()
- Uses INJECTED_VARS mutex instead of unsafe global state modification
- All reads via optional_env() check the thread-safe overlay first
- Prevents data races between SIGHUP reload and concurrent config reads
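
A minimal sketch of the overlay idea, not the ironclaw::config implementation: `overlay`, `inject_var`, and `lookup_var` are hypothetical names. Writes go into a mutex-guarded map instead of the process environment, and reads consult that map before falling back to std::env::var.

```rust
use std::collections::HashMap;
use std::sync::{Mutex, OnceLock};

/// Process-wide overlay of injected variables, created on first use.
fn overlay() -> &'static Mutex<HashMap<String, String>> {
    static OVERLAY: OnceLock<Mutex<HashMap<String, String>>> = OnceLock::new();
    OVERLAY.get_or_init(|| Mutex::new(HashMap::new()))
}

/// Stores a value in the overlay; the real environment is never mutated.
fn inject_var(key: &str, value: &str) {
    overlay()
        .lock()
        .unwrap()
        .insert(key.to_string(), value.to_string());
}

/// Reads the overlay first, then falls back to the process environment.
fn lookup_var(key: &str) -> Option<String> {
    if let Some(value) = overlay().lock().unwrap().get(key) {
        return Some(value.clone());
    }
    std::env::var(key).ok()
}
```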

Verification:
- All 2,787 tests pass
- Zero clippy warnings
- Code compiles successfully

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

* fix: spawn webhook restart as background task to avoid blocking I/O across lock

Prevents holding Mutex lock during async I/O operations (TcpListener::bind,
task shutdown). The SIGHUP handler no longer blocks webhook processing during
listener restart.

Changes:
- Read old_addr and drop lock immediately
- Spawn restart_with_addr() as background task via tokio::spawn
- Lock is only held during the actual restart operation, not the signal handler

Benefits:
- SIGHUP handler returns immediately without blocking
- Webhook requests not delayed by listener restart I/O
- Lock contention significantly reduced

Verification:
- All 2,787 tests pass
- Zero clippy warnings
- Code compiles successfully

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

* fix: add graceful shutdown mechanism for SIGHUP handler background task

Prevents unbounded loop without cancellation token. The SIGHUP handler now
listens for a shutdown signal and exits cleanly during graceful termination.

Changes:
- Create broadcast channel for shutdown signaling
- SIGHUP handler uses tokio::select! to wait for shutdown or SIGHUP
- Send shutdown signal to all background tasks after agent.run() completes
- Ensures clean task lifecycle and no orphaned background tasks

Benefits:
- Proper task cancellation during graceful shutdown
- Follows Tokio best practices for background task management
- No background tasks orphaned when runtime shuts down

Verification:
- All 2,787 tests pass
- Zero clippy warnings
- Code compiles successfully

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

* refactor: replace stringly-typed parameter filtering with typed enum and single helper

Fixes DRY violation where unsupported parameter filtering was duplicated across
rig_adapter.rs and anthropic_oauth.rs using string contains checks.

Changes:
- Add UnsupportedParam typed enum in provider.rs (Temperature, MaxTokens, StopSequences)
- Create strip_unsupported_completion_params() helper function
- Create strip_unsupported_tool_params() helper function
- Update rig_adapter.rs to use shared helpers
- Update anthropic_oauth.rs to use shared helpers
- Replace 60+ lines of duplicate stringly-typed logic
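
As an illustration of the typed-enum approach, the sketch below strips listed parameters from a request. The enum variants and the helper name follow the commit message; the `CompletionRequest` fields and the helper's signature are assumptions made for the example.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum UnsupportedParam {
    Temperature,
    MaxTokens,
    StopSequences,
}

// Assumed, simplified request shape; the real struct has more fields.
#[derive(Debug, Default, PartialEq)]
struct CompletionRequest {
    temperature: Option<f32>,
    max_tokens: Option<u32>,
    stop_sequences: Option<Vec<String>>,
}

/// Clear every field the provider declares as unsupported.
/// The exhaustive `match` means a new variant fails to compile until handled.
fn strip_unsupported_completion_params(
    req: &mut CompletionRequest,
    unsupported: &[UnsupportedParam],
) {
    for param in unsupported {
        match param {
            UnsupportedParam::Temperature => req.temperature = None,
            UnsupportedParam::MaxTokens => req.max_tokens = None,
            UnsupportedParam::StopSequences => req.stop_sequences = None,
        }
    }
}
```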

Benefits:
- Type safety: parameter names checked at compile time
- Single source of truth: adding a new param updates one place
- Reduced maintenance burden: no duplicate logic to keep in sync
- Better code clarity: named enum variant is self-documenting

Verification:
- All 2,787 tests pass
- Zero clippy warnings
- Code compiles successfully

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

* docs: clarify intentional parameter asymmetry between completion and tool requests

Add documentation explaining why strip_unsupported_tool_params does not handle
StopSequences: the field doesn't exist in ToolCompletionRequest.

Changes:
- Add clarifying comments to strip_unsupported_tool_params()
- Explain why StopSequences is only in CompletionRequest
- Note that ToolCompletionRequest only supports Temperature and MaxTokens
- Inline comment confirms no action needed for StopSequences

This addresses the appearance of incomplete implementation without changing logic,
as the asymmetry is intentional and correct (ToolCompletionRequest lacks the field).

Verification:
- All 2,787 tests pass
- Zero clippy warnings
- Code compiles successfully

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

* perf: isolate webhook_secret to reduce lock contention on hot path

Move webhook_secret from shared HttpChannelState RwLock into its own Arc<RwLock<>>.
This eliminates contention between secret validation and other state operations.

Changes:
- Change webhook_secret field type from RwLock<Option<SecretString>> to Arc<RwLock<Option<SecretString>>>
- Update initialization in HttpChannel::new()
- Update comments to explain isolation rationale

Benefits:
- Reduce lock contention on webhook request hot path (secret validation)
- Rarely-changing field (SIGHUP only) isolated from frequent state accesses
- Other state operations (tx, pending_responses) no longer wait behind secret reads
- Minimal code change: only field declaration and initialization

The Arc wrapper allows the RwLock handle to be cloned and held separately. With this
change, secret validation on each webhook request takes the dedicated webhook_secret
lock rather than the shared HttpChannelState lock. This scales better under high request volume.

Verification:
- All 2,787 tests pass
- Zero clippy warnings
- Code compiles successfully

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

* fix: prevent partial state corruption on SIGHUP restart failure

Ensure atomicity of configuration reload: if webhook listener restart fails,
secret update is skipped to prevent inconsistent state.

Changes:
- Wait for restart_with_addr() to complete (don't spawn background task)
- Track restart result with restart_failed flag
- Only update secret if restart succeeded or wasn't needed
- Ensure listener and secret stay synchronized

Problem addressed:
- Before: restart spawned as background task, secret updated immediately
- If restart failed, secret was changed but listener still on old address
- This left system in inconsistent state (partial corruption)

Solution:
- Make restart blocking (SIGHUP handler can wait, it's not on request hot path)
- Atomically update secret only after successful restart
- Flag prevents race between restart and secret update
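
The ordering can be sketched as below. This is a synchronous stand-in with a hypothetical `ChannelConfig` and `reload` function; it uses an early return where the commit describes a restart_failed flag, and the real handler is async.

```rust
// Hypothetical, simplified view of the state a reload touches.
#[derive(Debug, PartialEq)]
struct ChannelConfig {
    addr: String,
    secret: String,
}

/// Apply a reload so that the listener address and the secret change together.
/// `restart` stands in for restart_with_addr(); it is only called when the
/// address actually changed.
fn reload<F>(
    current: &mut ChannelConfig,
    new_addr: &str,
    new_secret: &str,
    restart: F,
) -> Result<(), String>
where
    F: FnOnce(&str) -> Result<(), String>,
{
    if current.addr != new_addr {
        // On failure, return before the secret is touched.
        restart(new_addr)?;
        current.addr = new_addr.to_string();
    }
    // Reached only if the restart succeeded or was not needed.
    current.secret = new_secret.to_string();
    Ok(())
}
```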

Benefits:
- Configuration changes are atomic (both succeed or both fail together)
- No partial state corruption on restart failure
- Failed restarts don't silently leave inconsistent state
- Secret and listener address stay in sync

Verification:
- All 2,787 tests pass
- Zero clippy warnings
- Code compiles successfully

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

* refactor: generalize hot-secret-swapping with ChannelSecretUpdater trait

Decouple SIGHUP handler from HTTP channel internals by introducing a trait
for channels that support zero-downtime secret updates.

Changes:
- Add ChannelSecretUpdater trait in channels/channel.rs
- Implement ChannelSecretUpdater for HttpChannelState
- Export trait from channels module
- Update SIGHUP handler to use trait-based secret updater collection
- Replace explicit HTTP channel knowledge with generic updater loop

Benefits:
- SIGHUP handler no longer depends on HttpChannelState details
- Tight coupling removed: main.rs doesn't need HTTP channel imports
- Extensible: new channels can opt-in by implementing the trait
- Scalable: multiple channels supported without main.rs changes
- Maintainable: adding channels requires only trait implementation, not SIGHUP handler edits

Pattern:
- ChannelSecretUpdater trait defines the interface for all updaters
- Channels that support hot-secret-swapping implement the trait
- SIGHUP handler loops through all registered updaters generically

Verification:
- All 2,787 tests pass
- Zero clippy warnings
- Code compiles successfully

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

* feat: validate parameter names at deserialization time, not just tests

Add custom serde deserializer for unsupported_params that validates parameter
names at runtime when loading providers.json (or user overrides).

Changes:
- Add unsupported_params_de module with custom deserializer
- Only allows: "temperature", "max_tokens", "stop_sequences"
- Invalid parameter names cause immediate deserialization error
- Update ProviderDefinition to use custom deserializer
- Enhanced test with explicit parameter name validation
- Add new test that verifies invalid parameters are rejected

Problem solved:
- Before: Invalid param names (e.g., "temperrature") silently ignored
- Now: Rejected at deserialization time with clear error message
- Prevents runtime failures caused by typos in configuration

Example error:
  unsupported parameter name 'temperrature': must be one of: temperature, max_tokens, stop_sequences
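
A std-only sketch of the validation, using `FromStr` as a stand-in for the custom serde deserializer (serde is left out so the example stays self-contained). The error text mirrors the example above; the function `parse_unsupported_params` is hypothetical.

```rust
use std::str::FromStr;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum UnsupportedParam {
    Temperature,
    MaxTokens,
    StopSequences,
}

impl FromStr for UnsupportedParam {
    type Err = String;

    fn from_str(name: &str) -> Result<Self, Self::Err> {
        match name {
            "temperature" => Ok(Self::Temperature),
            "max_tokens" => Ok(Self::MaxTokens),
            "stop_sequences" => Ok(Self::StopSequences),
            other => Err(format!(
                "unsupported parameter name '{other}': must be one of: temperature, max_tokens, stop_sequences"
            )),
        }
    }
}

/// Parse a whole list; the first invalid name aborts with its error.
fn parse_unsupported_params(names: &[&str]) -> Result<Vec<UnsupportedParam>, String> {
    names.iter().map(|name| name.parse()).collect()
}
```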

Benefits:
- Fail-fast: errors are caught when the config is loaded, not later when the parameter is used
- Clear feedback: error message lists valid parameter names
- Type safety: validators run during deserialization
- Configuration errors detected immediately, not silently ignored

Verification:
- All 2,788 tests pass (including new validation test)
- Zero clippy warnings
- Code compiles successfully

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Haiku 4.5 <noreply@anthropic.com>

* merge: resolve conflicts for PR #800 and #822 into staging (#881)

* fix(ci): secrets can't be used in step if conditions [skip-regression-check] (#787)

GitHub Actions step-level `if:` doesn't have access to `secrets` context.
Replace `if: secrets.X != ''` with `continue-on-error: true` and let
the Set token step handle the fallback.

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(ci): clean up staging pipeline — remove hacks, skip redundant checks [skip-regression-check] (#794)

- Remove continue-on-error from staging-ci.yml app token steps (secrets are configured)
- Skip test.yml and code_style.yml on PRs targeting staging (staging-ci.yml
  already runs tests before promoting, promotion PR gets full CI on main)
- Allow ironclaw-ci[bot] in Claude Code review for bot-created promotion PRs

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* fix(ci): run fmt + clippy on staging PRs, skip Windows clippy [skip-regression-check] (#802)

- Remove branches:[main] filter from code_style.yml so it runs on all PRs
- Gate clippy-windows with `if: github.base_ref == 'main'` (skip on staging PRs)
- Update rollup job to allow skipped clippy-windows
- Simplify claude-review.yml to only trigger on labeled event (avoids duplicate runs)

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* feat: persist user_id in save_job and expose job_id on routine runs (#709)

* feat: persist worker events to DB and fix activity tab rendering

In-process Worker (used by Scheduler::dispatch_job) now persists events
via save_job_event at key execution points: plan creation, LLM
responses, tool_use, tool_result, and job completion/failure/stuck.
Event data shapes match the container worker format so the gateway
activity tab renders them correctly.

Frontend: tool_result errors now show a red X icon with danger styling
instead of a silent empty output. The result event falls back to the
error field when message is absent.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: wire RoutineEngine into gateway for direct manual trigger firing

Replace the message-channel hack in routines_trigger_handler with a
direct call to RoutineEngine::fire_manual(), ensuring FullJob routines
dispatch correctly when triggered from the web UI. Inject the engine
into GatewayState from Agent::run after construction.

Also persists user_id in save_job for both PG and libSQL backends,
removes the source='sandbox' filter so all jobs are visible, and
exposes job_id on RoutineRunInfo for the frontend job link.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: remove stale gateway_state argument from Agent::new test call sites

The gateway_state parameter was removed from Agent::new during rebase
(replaced by post-construction set_routine_engine_slot), but three test
call sites still passed the extra None argument.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address PR review — restore sandbox source filter, remove blank lines

- Revert removal of `source = 'sandbox'` filter in all SandboxStore
  queries (8 sites across PG and libSQL). Sandbox-specific APIs should
  stay scoped to sandbox jobs; unified job listing for the Jobs tab
  should use a separate query path.
- Remove extra blank lines in agent_loop.rs and worker.rs that caused
  formatting CI failure.

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address review — regenerate Cargo.lock, add user_id regression test

- Regenerate Cargo.lock from main's lockfile to eliminate dependency
  version downgrades (anyhow, syn, etc.) that were churn from rebase.
- Add regression test verifying user_id round-trips through save_job
  and get_job in the libSQL backend.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* style: remove trailing blank line in libsql jobs.rs

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* test: add Postgres-side regression test for user_id persistence in save_job

Mirrors the existing libSQL test (test_save_job_persists_user_id) for the
Postgres backend. Gated behind #[cfg(feature = "postgres")] + #[ignore]
since it requires a running PostgreSQL instance (integration tier).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* refactor: unify three agentic loops into single AgenticLoop engine (#654)

Replace three independent copy-pasted agentic loops (dispatcher, worker,
container runtime) with a single shared engine in `agentic_loop.rs` that
all consumers customize via the `LoopDelegate` trait.

Phase 1 — Shared engine (`src/agent/agentic_loop.rs`, 205 lines):
  - `run_agentic_loop()` owns the core LLM → tool exec → repeat cycle
  - `LoopDelegate` trait (Send + Sync, &dyn dispatch) with 6 hook points
  - Tool intent nudge logic consolidated (was duplicated in 3 files)
  - Iteration limit + force-text behavior preserved

Phase 2 — Three delegate implementations:
  - `ChatDelegate` (dispatcher.rs): 3-phase approval flow, hooks, cost
    guard, context compaction, skill attenuation, interruption
  - `JobDelegate` (worker/job.rs): planning pre-loop phase, parallel
    JoinSet exec, mark_completed/stuck/failed, SSE streaming, self-repair
  - `ContainerDelegate` (worker/container.rs): sequential tool exec,
    HTTP-proxied LLM, container-safe tools, credential injection

Phase 3 — File moves and cleanup:
  - Delete `src/agent/worker.rs` — job logic moved to `src/worker/job.rs`
  - Rename `src/worker/runtime.rs` → `src/worker/container.rs`
  - Re-export `Worker`/`WorkerDeps` from `crate::worker` in `agent/mod.rs`
  - Update `scheduler.rs` imports to new worker location

Shared helpers (`src/tools/execute.rs`):
  - `execute_tool_with_safety()` replaces 4 copies of validate → timeout
    → execute → serialize
  - `process_tool_result()` replaces 3 copies of sanitize → wrap →
    ChatMessage (also used by thread_ops.rs approval resume paths)

Net result: -2,408 lines, zero duplicated loop logic, single code path
for tool intent nudge and completion detection.

Closes #654

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address review feedback from Copilot

1. scheduler.rs: Replace `unwrap_or` fallback with proper error
   propagation when parsing tool output JSON — surfaces bugs instead
   of silently changing the output type.

2. worker/job.rs: Drop MutexGuard before the cancellation `.await` in
   `check_signals()` to avoid holding a lock across an async I/O call
   (prevents `await_holding_lock` lint).

3. worker/job.rs: Restore consecutive rate-limit counter
   (MAX_CONSECUTIVE_RATE_LIMITS = 10) so sustained rate limiting marks
   the job stuck with "Persistent rate limiting" instead of silently
   burning through max_iterations.
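
A rough sketch of the counter from point 3. The constant name and the "Persistent rate limiting" reason come from the text above; the `RateLimitTracker` type, the reset on success, and the `>=` comparison are assumptions.

```rust
const MAX_CONSECUTIVE_RATE_LIMITS: u32 = 10;

#[derive(Debug, PartialEq)]
enum RateLimitAction {
    Retry,
    GiveUp(&'static str),
}

// Hypothetical tracker; the real worker keeps this state on the delegate.
#[derive(Default)]
struct RateLimitTracker {
    consecutive: u32,
}

impl RateLimitTracker {
    /// Any successful LLM call breaks the streak.
    fn on_success(&mut self) {
        self.consecutive = 0;
    }

    /// Count a rate-limited call and decide whether to keep retrying.
    fn on_rate_limited(&mut self) -> RateLimitAction {
        self.consecutive += 1;
        if self.consecutive >= MAX_CONSECUTIVE_RATE_LIMITS {
            RateLimitAction::GiveUp("Persistent rate limiting")
        } else {
            RateLimitAction::Retry
        }
    }
}
```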

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: incorporate staging changes — token budget tracking + mark_failed

Merge staging's changes into the refactored JobDelegate:
- Add token budget tracking in call_llm (update_context/add_tokens)
- mark_stuck → mark_failed for iteration cap and rate-limit exhaustion
  (aligns with staging's #788 fix)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address zmanian's PR review — eliminate type erasure, clean up

Address all 6 review points from zmanian on PR #800:

1. Replace LoopOutcome::Custom(Box<dyn Any>) with typed
   LoopOutcome::NeedApproval(Box<PendingApproval>) — eliminates
   type erasure and downcast, resolves clippy large_enum_variant.

2. Remove dead max_tool_iterations field from ChatDelegate struct.

3. Add on_tool_intent_nudge() hook to LoopDelegate trait with
   implementations in Job and Container delegates for observability.

4. Fix SSE events in job worker to emit raw sanitized content
   instead of XML-wrapped <tool_output> tags.

5. Remove 4 duplicate completion tests from job.rs that were
   already covered by the shared util module.

6. Avoid logging full tool results — use result_size_bytes in
   debug logs (execute.rs, job.rs).

Also updates path references in CLAUDE.md, COVERAGE_PLAN.md,
and add-sse-event.md command.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat(doctor): expand diagnostics from 7 to 16 health checks

* test: add unit tests for agentic_loop and execute shared modules

Add 16 tests covering the two new critical shared modules:

agentic_loop.rs (10 tests):
- Text response exits loop immediately
- Tool call → text response continuation
- LoopSignal::Stop exits before LLM call
- LoopSignal::InjectMessage adds user message to context
- Max iterations terminates with LoopOutcome::MaxIterations
- Tool intent nudge fires twice then caps
- before_llm_call early exit bypasses LLM
- truncate_for_preview: short string, long string, multibyte safety
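
For the multibyte-safety case, one way such a helper could work is to cut on character boundaries rather than byte offsets. The signature and the "..." suffix below are assumptions; only the name truncate_for_preview comes from the commit.

```rust
/// Truncate to at most `max_chars` characters, appending "..." when cut.
/// Counting `char`s (not bytes) keeps the cut on a UTF-8 boundary.
fn truncate_for_preview(s: &str, max_chars: usize) -> String {
    if s.chars().count() <= max_chars {
        return s.to_string();
    }
    let mut out: String = s.chars().take(max_chars).collect();
    out.push_str("...");
    out
}
```

Slicing with `&s[..max]` instead would panic whenever `max` lands inside a multibyte character.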

execute.rs (6 tests):
- execute_tool_with_safety success path
- Missing tool returns ToolError::NotFound
- Tool execution failure propagates
- Per-tool timeout enforcement (50ms)
- process_tool_result XML wrapping on success
- process_tool_result error formatting

All 2,777 unit tests pass, 0 clippy warnings.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* style: cargo fmt

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address code review — 9 issues across agentic loop, job worker, container

CRITICAL fixes:
- Rate-limit exhaustion now returns Err(LlmError::RateLimited) instead of
  Ok(Text("")), stopping the loop immediately with no ghost iteration.
  Below-threshold retries still use Text("") with an explicit empty-string
  guard in handle_text_response to skip injection.
- check_signals drains the entire message channel before returning,
  prioritizing Stop over UserMessage. Previously returned early on first
  UserMessage, silently dropping any queued Stop or additional messages.
- check_signals now detects all non-progressing job states (Cancelled,
  Failed, Stuck, Completed, Submitted, Accepted) instead of only
  Cancelled and Failed.
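
The drain-then-decide behaviour of check_signals can be sketched with a std channel. The `Signal` and `Drained` types are hypothetical, and the real worker uses an async channel; the point is only that the queue is emptied before a decision is made and that Stop wins.

```rust
use std::sync::mpsc::Receiver;

#[derive(Debug, PartialEq)]
enum Signal {
    Stop,
    UserMessage(String),
}

// Hypothetical result type for the sketch.
#[derive(Debug, PartialEq)]
enum Drained {
    Nothing,
    Stop,
    Messages(Vec<String>),
}

/// Drain everything currently queued, then decide. A queued Stop takes
/// priority over any user messages, regardless of arrival order.
fn check_signals(rx: &Receiver<Signal>) -> Drained {
    let mut messages = Vec::new();
    let mut stop = false;
    // try_recv returns Err when the queue is empty or disconnected.
    while let Ok(signal) = rx.try_recv() {
        match signal {
            Signal::Stop => stop = true,
            Signal::UserMessage(text) => messages.push(text),
        }
    }
    if stop {
        Drained::Stop
    } else if messages.is_empty() {
        Drained::Nothing
    } else {
        Drained::Messages(messages)
    }
}
```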

HIGH fixes:
- Error path in process_tool_result_job applies truncate_for_preview to
  bound error strings in SSE/DB events (was unbounded).
- Document Send+Sync lifetime constraint on LoopDelegate trait.
- Test mock before_llm_call refactored from double-lock to single lock
  acquisition, eliminating deadlock risk on refactor.

MEDIUM fixes:
- CompletionReport includes actual iteration count via shared
  Arc<Mutex<u32>> tracker (was hardcoded 0).
- process_tool_result_job return type changed from Result<bool> to
  Result<()> — the bool was always false (dead API).
- Deduplicate truncate in container.rs; now uses truncate_for_preview
  from agentic_loop.

Verified: 0 clippy warnings, 2781 tests pass, cargo fmt clean.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Henry Park <henrypark133@gmail.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Illia Polosukhin <ilblackdragon@gmail.com>
Co-authored-by: Umesh Kumar Singh <brijbiharisingh1971@outlook.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>

* Revert "Feat/docker shell edition" + fix fmt/clippy (#886)

* Revert "Feat/docker shell edition (#804)"

This reverts commit c566faf28fb77c2fa4df92c2947fb48f1a25df9b.

* style: fix formatting issues from revert

Run cargo fmt to fix formatting across 7 files after the revert of
the docker shell edition feature.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* refactor: centralize test cre…
bkutasi pushed a commit to bkutasi/ironclaw that referenced this pull request Mar 28, 2026
…nearai#791)

* fix(wasm): run leak scan before credential injection in tools wrapper

The tools WASM wrapper runs the LeakDetector on HTTP request headers
AFTER inject_host_credentials() has already substituted real secrets
(e.g., xoxb- Slack bot tokens). This causes the leak detector to
flag the tool's own legitimate outbound API calls as secret exfiltration.

Move the scan to run on raw_headers before any credential injection,
matching the fix already applied to the channels wrapper in nearai#421.

Fixes the same class of bug as nearai#421 (which only fixed channels/wasm/wrapper.rs).
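
The corrected ordering, as a minimal sketch. The detector here is a toy that flags any value containing "xoxb-"; `looks_like_secret`, `scan_headers` and `prepare_request` are hypothetical names and do not reflect the LeakDetector API.

```rust
use std::collections::HashMap;

/// Toy stand-in for the leak detector: flags Slack-bot-token-like values.
fn looks_like_secret(value: &str) -> bool {
    value.contains("xoxb-")
}

/// Scan only what the WASM tool itself supplied.
fn scan_headers(headers: &HashMap<String, String>) -> Result<(), String> {
    for (name, value) in headers {
        if looks_like_secret(value) {
            return Err(format!("potential secret leak in header '{name}'"));
        }
    }
    Ok(())
}

/// parse -> scan -> inject: the host token is added after the scan, so the
/// tool's own legitimate credential is never seen by the detector.
fn prepare_request(
    raw_headers: HashMap<String, String>,
    host_token: &str,
) -> Result<HashMap<String, String>, String> {
    scan_headers(&raw_headers)?;
    let mut headers = raw_headers;
    headers.insert("Authorization".to_string(), format!("Bearer {host_token}"));
    Ok(headers)
}
```

Scanning after the insert would reject every authenticated request, which is the false positive this change removes; a token smuggled in by the tool itself is still caught.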

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* perf: inline leak scan to avoid Vec allocation on every HTTP request

Address review feedback: instead of cloning all header keys/values into
a Vec to pass to scan_http_request(), iterate over raw_headers directly
using scan_and_clean(). This also provides more specific error messages
(URL vs header vs body).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* style: fix cargo fmt formatting in leak scan loop

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
bkutasi pushed a commit to bkutasi/ironclaw that referenced this pull request Mar 28, 2026
…earai#1063)

* chore: promote staging to main (2026-03-10 15:19 UTC) (#865)

* fix: Channel HTTP: server doesn't start after config change (no hot-r… (#779)

* fix: Channel HTTP: server doesn't start after config change (no hot-reload)

* review fixes

* review fixes

* fix linter

* fix code style

* fix: prevent session lock contention blocking message processing (#783)

* fix: prevent session lock contention blocking message processing

## Problem
After container restart, POST /api/chat/send returns 202 ACCEPTED but messages
don't appear in conversation_messages and agent never responds. Messages get
stuck in "stale state" after restart.

Root cause: Session lock was held for entire duration of chat_threads_handler
and chat_history_handler, including during slow database queries. This blocked
the agent loop from acquiring the session lock to process incoming messages,
causing them to hang indefinitely.

## Solution
1. **Release session lock early in chat_threads_handler**: Only acquire lock
   when reading active_thread at response time, not during DB queries for
   thread list. DB operations no longer block message processing.

2. **Release session lock early in chat_history_handler**: Only acquire lock
   when accessing in-memory thread state, not during paginated DB queries or
   thread ownership checks. DB operations no longer block message processing.

3. **Add comprehensive logging**: Track message flow from receipt through
   session resolution, thread hydration, and state transitions. Helps diagnose
   future issues:
   - Message queued to agent loop (chat_send_handler)
   - Processing message from channel (handle_message)
   - Hydrating thread from DB (maybe_hydrate_thread)
   - Resolving session and thread (resolve_thread)
   - Checking thread state (process_user_input)
   - Persisting user message (persist_user_message)
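
The lock-scoping change in points 1 and 2 might look roughly like this. It is a synchronous sketch with a std Mutex and a hypothetical `Session` type; the closure `list_threads` stands in for the slow database query, and the real handlers are async.

```rust
use std::sync::Mutex;

// Hypothetical in-memory session state.
struct Session {
    active_thread: Option<String>,
}

/// Run the slow query with no lock held, then take the session lock only
/// for the short in-memory read.
fn chat_threads<F>(session: &Mutex<Session>, list_threads: F) -> (Vec<String>, Option<String>)
where
    F: FnOnce() -> Vec<String>,
{
    // Slow part: the session lock is not held here.
    let threads = list_threads();

    // Short critical section; the guard is dropped at the end of the block.
    let active = {
        let guard = session.lock().unwrap();
        guard.active_thread.clone()
    };

    (threads, active)
}
```

While `list_threads` runs, another task (the agent loop in the description above) can still acquire the session lock.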

## Impact
- Message processing no longer blocks on session lock contention
- API response times for thread list/history queries unaffected (DB queries
  still happen, but lock is not held)
- Better diagnostics for future debugging

## Testing
- All 2756 tests pass
- Code compiles with zero clippy warnings
- No changes to user-facing API or behavior, only lock timing

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

* security: redact PII from info-level logs

Downgrade user_id and channel logging to debug level to prevent exposing
Personally Identifiable Information (PII) in production logs.

The user_id field can contain sensitive information such as phone numbers
(e.g., for Signal messages). Logging PII in cleartext at the info level
creates a security and privacy risk, as these logs may be stored in
persistent storage, indexed by log management systems, or accessible to
unauthorized personnel.

Changes:
- Info level: logs only message_id (UUID) for tracking
- Debug level: logs user_id, channel, thread_id for troubleshooting

This maintains debugging capability for developers while protecting user
privacy in production logs.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Haiku 4.5 <noreply@anthropic.com>

* chore: sync main into staging (#855)

* feat(llm): per-provider unsupported parameter filtering (#749, #728) (#809)

Add declarative `unsupported_params` field to provider definitions in
providers.json. Parameters listed are stripped from requests before
sending, preventing 400 errors from providers that reject them (e.g.
gpt-5 family and kimi-k2.5 rejecting custom temperature values).

- Add `unsupported_params` to ProviderDefinition and RegistryProviderConfig
- Propagate from registry through config resolution
- Generic strip helpers handle temperature, max_tokens, stop_sequences
- Apply filtering in RigAdapter and AnthropicOAuthProvider
- Mark openai and tinfoil providers as not supporting temperature
- Update openai default model to gpt-5-mini

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Illia Polosukhin <ilblackdragon@gmail.com>

* fix: Chat input is hidden in mobile browser mode (#877)

* fix: stop XML-escaping tool output content (#598) (#874)

- Propagate from registry through config resolution
- Generic strip helpers handle temperature, max_tokens, stop_sequences
- Apply filtering in RigAdapter and AnthropicOAuthProvider
- Mark openai and tinfoil providers as not supporting temperature
- Update openai default model to gpt-5-mini
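
As a rough illustration of the stripping step described above, the sketch below clears any parameter named in a provider's `unsupported_params` list. The `CompletionParams` struct and the `strip_unsupported_params` helper are hypothetical names for this example only; the real request types and helpers in RigAdapter are not shown in this log.

```rust
/// Hypothetical request parameters (an assumption for illustration; the real
/// request type is not shown in this commit message).
#[derive(Debug, Clone, PartialEq)]
pub struct CompletionParams {
    pub temperature: Option<f32>,
    pub max_tokens: Option<u32>,
    pub stop_sequences: Option<Vec<String>>,
}

/// Clears every parameter named in `unsupported` so it is omitted from the
/// outgoing request. Names this sketch does not know about are ignored.
pub fn strip_unsupported_params(params: &mut CompletionParams, unsupported: &[&str]) {
    for name in unsupported {
        match *name {
            "temperature" => params.temperature = None,
            "max_tokens" => params.max_tokens = None,
            "stop_sequences" => params.stop_sequences = None,
            _ => {}
        }
    }
}
```

A provider entry listing only `temperature` would then leave `max_tokens` and `stop_sequences` untouched.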

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* fix: stop XML-escaping tool output content in wrap_for_llm (#598)

Remove content escaping that corrupted JSON in tool output. The
<tool_output> structural boundary is preserved but content now passes
through raw, fixing downstream parse failures.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Henry Park <henrypark133@gmail.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(safety): allow empty string tool params (#848)

* fix(safety): allow empty string tool params

* fix(safety): preserve heuristic checks and add path context to tool validation

This follow-up refactor addresses PR review feedback by restoring
heuristic checks (whitespace ratio, character repetition) for tool
parameter validation and improving error reporting.

Changes:
- Restored heuristic warnings in validate_non_empty_input so they apply
  to both user input and tool parameters (when non-empty).
- Refactored check_strings to recursively build and pass JSON paths
  (e.g., "metadata.tags[1]").
- Updated validation errors to use the specific JSON path as the field
  name instead of the generic "input".
- Added regression tests for whitespace/repetition warnings and JSON
  path reporting in tool parameters.
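
The recursive path building can be pictured with the following minimal sketch. The `Json` enum and `collect_string_paths` are stand-ins invented for this example (the actual check_strings signature and JSON type are not shown here); only the path format, e.g. "metadata.tags[1]", comes from the description above.

```rust
/// Minimal stand-in for a JSON value (an assumption; the real code presumably
/// works on a JSON library type).
pub enum Json {
    Str(String),
    Array(Vec<Json>),
    Object(Vec<(String, Json)>),
}

/// Recursively visits every string value and records the JSON path at which
/// it was found, so a validation error can name the exact field.
pub fn collect_string_paths(value: &Json, path: &str, out: &mut Vec<(String, String)>) {
    match value {
        Json::Str(s) => out.push((path.to_string(), s.clone())),
        Json::Array(items) => {
            for (i, item) in items.iter().enumerate() {
                // Array elements extend the path with an index: tags[1]
                collect_string_paths(item, &format!("{path}[{i}]"), out);
            }
        }
        Json::Object(fields) => {
            for (key, item) in fields {
                // Object keys extend the path with a dot, except at the root.
                let child = if path.is_empty() {
                    key.clone()
                } else {
                    format!("{path}.{key}")
                };
                collect_string_paths(item, &child, out);
            }
        }
    }
}
```

In this sketch an empty string is still visited and reported with its path; whether it is rejected is left to the caller, in keeping with the validator staying neutral about empty strings.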

This ensures the safety layer remains semantically neutral about empty
strings (fixing the memory_tree path: "" issue) while maintaining
rigorous protection and providing better developer ergonomics.

* style: run cargo fmt

* perf: optimize release and dist build profiles (#843)

* perf: optimize release and dist build profiles

Add [profile.release] with strip=true and panic="abort" for smaller,
faster release binaries. Upgrade [profile.dist] from lto="thin" to
lto="fat" with codegen-units=1 for maximum optimization in CI releases.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: remove panic=abort from release profile

Reviewers (zmanian, Copilot, Gemini) correctly flagged that panic=abort
in the release profile would kill the entire process on any tokio task
panic, breaking fault isolation for the long-running server. Removed
from release profile entirely.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* feat: add PR template with risk assessment (#837)

* feat: add PR template with risk assessment and review tracks

Add a pull request template that includes summary, change type,
validation checklist, security/database impact sections, blast radius,
and rollback plan. Update CONTRIBUTING.md with review track definitions
(A/B/C) based on change risk level.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: expand CONTRIBUTING.md with setup, workflow, and guidelines

Add getting started, development workflow, code style summary,
database change guidance, and dependency management sections.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* feat: add fuzzing targets for untrusted input parsers (#835)

* feat: add fuzzing targets for untrusted input parsers

Add cargo-fuzz infrastructure with 5 fuzz targets exercising
security-critical code paths:

- fuzz_safety_sanitizer: Aho-Corasick + regex injection detection
- fuzz_safety_validator: Input validation (length, encoding, patterns)
- fuzz_leak_detector: Secret leak scanning (API keys, tokens)
- fuzz_tool_params: Tool parameter JSON validation
- fuzz_config_env: TOML/JSON config parsing

Each target exercises real IronClaw business logic with invariant
assertions. Includes corpus directories and setup documentation.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: improve fuzz targets to exercise real IronClaw code paths

- fuzz_config_env: exercise SafetyLayer end-to-end (sanitize, validate,
  policy check) instead of generic TOML/JSON parsing
- fuzz_tool_params: add validate_tool_schema coverage alongside
  validate_tool_params
- Add "fuzz" to workspace exclude in root Cargo.toml
- Update README descriptions to match actual target behavior

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: replace redundant detect() call with meaningful invariant assertion

Replace the double sanitize()+detect() call with an assertion that
critical severity warnings always trigger content modification.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: rewrite fuzz_config_env to exercise IronClaw safety code directly

Replace SafetyLayer wrapper usage with direct Sanitizer, Validator, and
LeakDetector instantiation and invocation. Adds meaningful consistency
assertions (non-empty output, valid-means-no-errors, scan/clean agreement).
Removes the config construction that was only exercising struct instantiation.

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* fix(wasm): run leak scan before credential injection in tools wrapper (#791)

* fix(wasm): run leak scan before credential injection in tools wrapper

The tools WASM wrapper runs the LeakDetector on HTTP request headers
AFTER inject_host_credentials() has already substituted real secrets
(e.g., xoxb- Slack bot tokens). This causes the leak detector to
flag the tool's own legitimate outbound API calls as secret exfiltration.

Move the scan to run on raw_headers before any credential injection,
matching the fix already applied to the channels wrapper in #421.

Fixes the same class of bug as #421 (which only fixed channels/wasm/wrapper.rs).
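
A minimal sketch of the corrected ordering, under stated assumptions: `looks_like_secret` and `prepare_headers` are hypothetical helpers standing in for the LeakDetector and the wrapper's request path, and the `xoxb-` substring check is a toy pattern rather than the detector's real rules.

```rust
/// Hypothetical stand-in for the leak detector: flags values that look like
/// Slack bot tokens. The real detection patterns are not shown here.
fn looks_like_secret(value: &str) -> bool {
    value.contains("xoxb-")
}

/// Scans the headers supplied by the WASM tool first, then injects the host
/// credential. Returns an error if the tool itself tried to send a secret.
pub fn prepare_headers(
    raw_headers: &[(String, String)],
    host_token: &str,
) -> Result<Vec<(String, String)>, String> {
    // Step 1: scan only what the tool provided (the raw headers).
    for (name, value) in raw_headers {
        if looks_like_secret(value) {
            return Err(format!("potential secret leak in header {name}"));
        }
    }
    // Step 2: inject the real credential only after the scan has passed,
    // so the host's own token is never seen by the scanner.
    let mut headers = raw_headers.to_vec();
    headers.push(("Authorization".to_string(), format!("Bearer {host_token}")));
    Ok(headers)
}
```

With the steps swapped, the injected `Authorization` value itself would match the pattern and every legitimate call would be blocked, which is the false positive this commit removes.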

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* perf: inline leak scan to avoid Vec allocation on every HTTP request

Address review feedback: instead of cloning all header keys/values into
a Vec to pass to scan_http_request(), iterate over raw_headers directly
using scan_and_clean(). This also provides more specific error messages
(URL vs header vs body).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* style: fix cargo fmt formatting in leak scan loop

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* fix(setup): drain residual terminal events before secret input (#747) (#849)

* fix(ci): secrets can't be used in step if conditions [skip-regression-check] (#787)

GitHub Actions step-level `if:` doesn't have access to the `secrets` context.
Replace `if: secrets.X != ''` with `continue-on-error: true` and let
the Set token step handle the fallback.

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(ci): clean up staging pipeline — remove hacks, skip redundant checks [skip-regression-check] (#794)

- Remove continue-on-error from staging-ci.yml app token steps (secrets are configured)
- Skip test.yml and code_style.yml on PRs targeting staging (staging-ci.yml
  already runs tests before promoting, promotion PR gets full CI on main)
- Allow ironclaw-ci[bot] in Claude Code review for bot-created promotion PRs

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* fix(ci): run fmt + clippy on staging PRs, skip Windows clippy [skip-regression-check] (#802)

- Remove branches:[main] filter from code_style.yml so it runs on all PRs
- Gate clippy-windows with `if: github.base_ref == 'main'` (skip on staging PRs)
- Update rollup job to allow skipped clippy-windows
- Simplify claude-review.yml to only trigger on labeled event (avoids duplicate runs)

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* feat: persist user_id in save_job and expose job_id on routine runs (#709)

* feat: persist worker events to DB and fix activity tab rendering

In-process Worker (used by Scheduler::dispatch_job) now persists events
via save_job_event at key execution points: plan creation, LLM
responses, tool_use, tool_result, and job completion/failure/stuck.
Event data shapes match the container worker format so the gateway
activity tab renders them correctly.

Frontend: tool_result errors now show a red X icon with danger styling
instead of a silent empty output. The result event falls back to the
error field when message is absent.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: wire RoutineEngine into gateway for direct manual trigger firing

Replace the message-channel hack in routines_trigger_handler with a
direct call to RoutineEngine::fire_manual(), ensuring FullJob routines
dispatch correctly when triggered from the web UI. Inject the engine
into GatewayState from Agent::run after construction.

Also persists user_id in save_job for both PG and libSQL backends,
removes the source='sandbox' filter so all jobs are visible, and
exposes job_id on RoutineRunInfo for the frontend job link.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: remove stale gateway_state argument from Agent::new test call sites

The gateway_state parameter was removed from Agent::new during rebase
(replaced by post-construction set_routine_engine_slot), but three test
call sites still passed the extra None argument.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address PR review — restore sandbox source filter, remove blank lines

- Revert removal of `source = 'sandbox'` filter in all SandboxStore
  queries (8 sites across PG and libSQL). Sandbox-specific APIs should
  stay scoped to sandbox jobs; unified job listing for the Jobs tab
  should use a separate query path.
- Remove extra blank lines in agent_loop.rs and worker.rs that caused
  formatting CI failure.

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address review — regenerate Cargo.lock, add user_id regression test

- Regenerate Cargo.lock from main's lockfile to eliminate dependency
  version downgrades (anyhow, syn, etc.) that were churn from rebase.
- Add regression test verifying user_id round-trips through save_job
  and get_job in the libSQL backend.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* style: remove trailing blank line in libsql jobs.rs

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* test: add Postgres-side regression test for user_id persistence in save_job

Mirrors the existing libSQL test (test_save_job_persists_user_id) for the
Postgres backend. Gated behind #[cfg(feature = "postgres")] + #[ignore]
since it requires a running PostgreSQL instance (integration tier).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* feat(llm): per-provider unsupported parameter filtering (#749, #728) (#809)

Add declarative `unsupported_params` field to provider definitions in
providers.json. Parameters listed are stripped from requests before
sending, preventing 400 errors from providers that reject them (e.g.
gpt-5 family and kimi-k2.5 rejecting custom temperature values).

- Add `unsupported_params` to ProviderDefinition and RegistryProviderConfig
- Propagate from registry through config resolution
- Generic strip helpers handle temperature, max_tokens, stop_sequences
- Apply filtering in RigAdapter and AnthropicOAuthProvider
- Mark openai and tinfoil providers as not supporting temperature
- Update openai default model to gpt-5-mini

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* fix: skip the regression check
[skip-regression-check]

---------

Co-authored-by: Henry Park <henrypark133@gmail.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Illia Polosukhin <ilblackdragon@gmail.com>

* feat(agent): add context size logging before LLM prompt (#810)

* fix(ci): secrets can't be used in step if conditions [skip-regression-check] (#787)

GitHub Actions step-level `if:` doesn't have access to the `secrets` context.
Replace `if: secrets.X != ''` with `continue-on-error: true` and let
the Set token step handle the fallback.

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(ci): clean up staging pipeline — remove hacks, skip redundant checks [skip-regression-check] (#794)

- Remove continue-on-error from staging-ci.yml app token steps (secrets are configured)
- Skip test.yml and code_style.yml on PRs targeting staging (staging-ci.yml
  already runs tests before promoting, promotion PR gets full CI on main)
- Allow ironclaw-ci[bot] in Claude Code review for bot-created promotion PRs

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* fix(ci): run fmt + clippy on staging PRs, skip Windows clippy [skip-regression-check] (#802)

- Remove branches:[main] filter from code_style.yml so it runs on all PRs
- Gate clippy-windows with `if: github.base_ref == 'main'` (skip on staging PRs)
- Update rollup job to allow skipped clippy-windows
- Simplify claude-review.yml to only trigger on labeled event (avoids duplicate runs)

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* feat: persist user_id in save_job and expose job_id on routine runs (#709)

* feat: persist worker events to DB and fix activity tab rendering

In-process Worker (used by Scheduler::dispatch_job) now persists events
via save_job_event at key execution points: plan creation, LLM
responses, tool_use, tool_result, and job completion/failure/stuck.
Event data shapes match the container worker format so the gateway
activity tab renders them correctly.

Frontend: tool_result errors now show a red X icon with danger styling
instead of a silent empty output. The result event falls back to the
error field when message is absent.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: wire RoutineEngine into gateway for direct manual trigger firing

Replace the message-channel hack in routines_trigger_handler with a
direct call to RoutineEngine::fire_manual(), ensuring FullJob routines
dispatch correctly when triggered from the web UI. Inject the engine
into GatewayState from Agent::run after construction.

Also persists user_id in save_job for both PG and libSQL backends,
removes the source='sandbox' filter so all jobs are visible, and
exposes job_id on RoutineRunInfo for the frontend job link.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: remove stale gateway_state argument from Agent::new test call sites

The gateway_state parameter was removed from Agent::new during rebase
(replaced by post-construction set_routine_engine_slot), but three test
call sites still passed the extra None argument.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address PR review — restore sandbox source filter, remove blank lines

- Revert removal of `source = 'sandbox'` filter in all SandboxStore
  queries (8 sites across PG and libSQL). Sandbox-specific APIs should
  stay scoped to sandbox jobs; unified job listing for the Jobs tab
  should use a separate query path.
- Remove extra blank lines in agent_loop.rs and worker.rs that caused
  formatting CI failure.

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address review — regenerate Cargo.lock, add user_id regression test

- Regenerate Cargo.lock from main's lockfile to eliminate dependency
  version downgrades (anyhow, syn, etc.) that were churn from rebase.
- Add regression test verifying user_id round-trips through save_job
  and get_job in the libSQL backend.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* style: remove trailing blank line in libsql jobs.rs

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* test: add Postgres-side regression test for user_id persistence in save_job

Mirrors the existing libSQL test (test_save_job_persists_user_id) for the
Postgres backend. Gated behind #[cfg(feature = "postgres")] + #[ignore]
since it requires a running PostgreSQL instance (integration tier).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* feat(agent): add context size logging before LLM prompt

---------

Co-authored-by: Henry Park <henrypark133@gmail.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Illia Polosukhin <ilblackdragon@gmail.com>

* fix: preserve text before tool-call XML in forced-text responses (#852)

* fix: preserve text before tool-call XML in forced-text responses (#789)

Local models (Qwen3, DeepSeek, GLM) emit <tool_call> XML even when no
tools are available (force_text mode). The existing strip_xml_tag()
discards everything from an unclosed opening tag onward, producing an
empty string that triggers the "I'm not sure how to respond" fallback.

Add truncate_at_tool_tags() — a code-region-aware pre-processing step
that truncates at the first tool-call XML tag BEFORE clean_response()
runs, preserving all useful text before the tag. Protect all 7
clean_response() call sites. Case-insensitive matching handles models
that emit <TOOL_CALL> or <Tool_Call> variants.

Secondary fix: add has_native_thinking() model detection to skip
<think>/<final> system prompt injection for models with built-in
reasoning (Qwen3, QwQ, DeepSeek-R1, GLM-Z1, etc.), preventing
thinking-only responses that clean to empty.

Wire with_model_name(active_model_name()) at all 9 production sites
that construct Reasoning, so the runtime model name (not static config)
drives system prompt generation.

126 new/updated tests covering truncation edge cases, code-block
awareness, Unicode, case-insensitivity, StubLlm integration for
complete/plan/evaluate_success/respond_with_tools paths, model
detection, and conditional system prompt generation.

Closes #789

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address Copilot review — unclosed-only truncation, ASCII case folding

- truncate_at_tool_tags() now only truncates at UNCLOSED tool tags;
  properly closed tags (e.g. <tool_call>...</tool_call>) are left intact
  for clean_response() to strip normally, preserving any text after them
- Switch from to_lowercase() to to_ascii_lowercase() to prevent byte
  offset misalignment with non-ASCII characters whose lowercase form
  has different byte length (e.g. Kelvin sign U+212A)
- Add closing_tag_for() helper to derive closing tags from open patterns
- Fix doc comment: "fenced markdown code blocks or inline code spans"
  (not "indented", which find_code_regions() doesn't detect)
- Add regression tests: closed vs unclosed for each tag variant,
  Unicode + case-insensitive offset safety, and mixed closed/unclosed
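
The byte-offset issue can be reproduced with a small sketch. The helper names here are hypothetical, and `truncate_before_tag` deliberately ignores the closed/unclosed distinction and code regions that the real truncate_at_tool_tags() handles; it only shows why the offset must come from an ASCII-lowercased copy.

```rust
/// Finds the byte offset of `tag` in `text`, ignoring ASCII case.
/// `to_ascii_lowercase` never changes byte lengths, so the offset is also
/// valid in the original string.
pub fn find_tag_ascii(text: &str, tag: &str) -> Option<usize> {
    text.to_ascii_lowercase().find(&tag.to_ascii_lowercase())
}

/// Same search using full Unicode lowercasing. The returned offset refers to
/// the lowercased copy and may not line up with the original string.
pub fn find_tag_unicode(text: &str, tag: &str) -> Option<usize> {
    text.to_lowercase().find(&tag.to_lowercase())
}

/// Keeps only the text before the first occurrence of `tag`
/// (simplified: no closed-tag or code-region handling).
pub fn truncate_before_tag<'a>(text: &'a str, tag: &str) -> &'a str {
    match find_tag_ascii(text, tag) {
        Some(offset) => &text[..offset],
        None => text,
    }
}
```

For an input beginning with the Kelvin sign (3 bytes in UTF-8, lowercased to a 1-byte `k`), the Unicode variant reports an offset two bytes too small, so slicing the original at that offset would cut the wrong place or land inside a multi-byte character.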

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: minor review items — consistent ascii_lowercase, closing_tag_for tests

- Switch has_native_thinking() from to_lowercase() to to_ascii_lowercase()
  for consistency with truncate_at_tool_tags() approach
- Add unit tests for closing_tag_for(): standard tags, space-suffixed
  patterns, pipe-delimited tags, and exhaustive coverage of all
  TOOL_TAG_PATTERNS entries
- Add test for mixed closed+unclosed tags of different types

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* Feat/docker shell edition (#804)

* fix(ci): secrets can't be used in step if conditions [skip-regression-check] (#787)

GitHub Actions step-level `if:` doesn't have access to the `secrets` context.
Replace `if: secrets.X != ''` with `continue-on-error: true` and let
the Set token step handle the fallback.

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(ci): clean up staging pipeline — remove hacks, skip redundant checks [skip-regression-check] (#794)

- Remove continue-on-error from staging-ci.yml app token steps (secrets are configured)
- Skip test.yml and code_style.yml on PRs targeting staging (staging-ci.yml
  already runs tests before promoting, promotion PR gets full CI on main)
- Allow ironclaw-ci[bot] in Claude Code review for bot-created promotion PRs

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Henry Park <henrypark133@gmail.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(mcp): strip top-level null params before forwarding to MCP servers (#795)

* feat(llm): per-provider unsupported parameter filtering (#749, #728) (#809)

Add declarative `unsupported_params` field to provider definitions in
providers.json. Parameters listed are stripped from requests before
sending, preventing 400 errors from providers that reject them (e.g.
gpt-5 family and kimi-k2.5 rejecting custom temperature values).

- Add `unsupported_params` to ProviderDefinition and RegistryProviderConfig
- Propagate from registry through config resolution
- Generic strip helpers handle temperature, max_tokens, stop_sequences
- Apply filtering in RigAdapter and AnthropicOAuthProvider
- Mark openai and tinfoil providers as not supporting temperature
- Update openai default model to gpt-5-mini

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* fix(mcp): strip top-level null params before forwarding to MCP servers

LLMs frequently emit `"field": null` for optional parameters in tool
calls. Many MCP servers reject explicit nulls for fields that should
simply be absent — e.g. Notion returns 400 for `"sort": null` in a
search call, expecting the field to be omitted entirely.

Strip top-level null keys from the params object before calling
`call_tool()`. Only top-level keys are stripped; nested nulls are
preserved since they may be semantically meaningful.
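
A minimal sketch of the top-level stripping, assuming a toy `Json` enum in place of the real JSON type; `strip_top_level_nulls` is a hypothetical name and not necessarily the helper used in the MCP client.

```rust
use std::collections::BTreeMap;

/// Toy JSON value (an assumption for illustration only).
#[derive(Debug, Clone, PartialEq)]
pub enum Json {
    Null,
    Str(String),
    Object(BTreeMap<String, Json>),
}

/// Removes keys whose value is null from the top level of `params`.
/// Nested objects are left untouched, so nulls inside them survive.
pub fn strip_top_level_nulls(params: &mut BTreeMap<String, Json>) {
    params.retain(|_, value| *value != Json::Null);
}
```

A call carrying `"sort": null` next to a real `query` would be forwarded with only `query`, while a null nested inside a `filter` object would still be sent as written.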

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Illia Polosukhin <ilblackdragon@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* Add event-triggered routines and workflow skill templates (#756)

* Add event-triggered routines and workflow skill templates

* fix(ci): secrets can't be used in step if conditions [skip-regression-check] (#787)

GitHub Actions step-level `if:` doesn't have access to the `secrets` context.
Replace `if: secrets.X != ''` with `continue-on-error: true` and let
the Set token step handle the fallback.

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(ci): clean up staging pipeline — remove hacks, skip redundant checks [skip-regression-check] (#794)

- Remove continue-on-error from staging-ci.yml app token steps (secrets are configured)
- Skip test.yml and code_style.yml on PRs targeting staging (staging-ci.yml
  already runs tests before promoting, promotion PR gets full CI on main)
- Allow ironclaw-ci[bot] in Claude Code review for bot-created promotion PRs

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address PR review feedback for event_emit security and quality

Security fixes:
- Require approval (UnlessAutoApproved) for event_emit, matching routine_fire
- Enable sanitization on event_emit payload (external JSON reaches LLM)
- Remove user_id parameter from event_emit to prevent IDOR — always use ctx.user_id

Correctness fixes:
- Rename source → event_source in event_emit for consistency with routine_create
- Use json_value_as_filter_string for filter parsing (handles numbers/booleans)
- Case-insensitive matching for event source and event_type
- Add debug logging for missing filter keys in payload
- Fix skill_install_routine_webhook_sim test missing .with_skills()
- Fix schema_validator test for event_emit payload properties

Code quality:
- Move EventEmitTool struct/impl after RoutineHistoryTool (fix split layout)
- Deduplicate routine_to_info into RoutineInfo::from_routine in types.rs
- Add test section headers in e2e_routine_heartbeat.rs
- Clarify event_emit description to specify system_event routines only

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(ci): run fmt + clippy on staging PRs, skip Windows clippy [skip-regression-check] (#802)

- Remove branches:[main] filter from code_style.yml so it runs on all PRs
- Gate clippy-windows with `if: github.base_ref == 'main'` (skip on staging PRs)
- Update rollup job to allow skipped clippy-windows
- Simplify claude-review.yml to only trigger on labeled event (avoids duplicate runs)

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* feat: persist user_id in save_job and expose job_id on routine runs (#709)

* feat: persist worker events to DB and fix activity tab rendering

In-process Worker (used by Scheduler::dispatch_job) now persists events
via save_job_event at key execution points: plan creation, LLM
responses, tool_use, tool_result, and job completion/failure/stuck.
Event data shapes match the container worker format so the gateway
activity tab renders them correctly.

Frontend: tool_result errors now show a red X icon with danger styling
instead of a silent empty output. The result event falls back to the
error field when message is absent.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: wire RoutineEngine into gateway for direct manual trigger firing

Replace the message-channel hack in routines_trigger_handler with a
direct call to RoutineEngine::fire_manual(), ensuring FullJob routines
dispatch correctly when triggered from the web UI. Inject the engine
into GatewayState from Agent::run after construction.

Also persists user_id in save_job for both PG and libSQL backends,
removes the source='sandbox' filter so all jobs are visible, and
exposes job_id on RoutineRunInfo for the frontend job link.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: remove stale gateway_state argument from Agent::new test call sites

The gateway_state parameter was removed from Agent::new during rebase
(replaced by post-construction set_routine_engine_slot), but three test
call sites still passed the extra None argument.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address PR review — restore sandbox source filter, remove blank lines

- Revert removal of `source = 'sandbox'` filter in all SandboxStore
  queries (8 sites across PG and libSQL). Sandbox-specific APIs should
  stay scoped to sandbox jobs; unified job listing for the Jobs tab
  should use a separate query path.
- Remove extra blank lines in agent_loop.rs and worker.rs that caused
  formatting CI failure.

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address review — regenerate Cargo.lock, add user_id regression test

- Regenerate Cargo.lock from main's lockfile to eliminate dependency
  version downgrades (anyhow, syn, etc.) that were churn from rebase.
- Add regression test verifying user_id round-trips through save_job
  and get_job in the libSQL backend.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* style: remove trailing blank line in libsql jobs.rs

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* test: add Postgres-side regression test for user_id persistence in save_job

Mirrors the existing libSQL test (test_save_job_persists_user_id) for the
Postgres backend. Gated behind #[cfg(feature = "postgres")] + #[ignore]
since it requires a running PostgreSQL instance (integration tier).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* fix: make routine_system_event_emit test create routine before emitting

- Add routine_create step to trace fixture so event_emit has a matching
  routine to fire
- Assert fired_routines > 0, not just key presence (Copilot review)
- Add .with_auto_approve_tools(true) since event_emit now requires approval

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: renumber test headers after system_event test insertion

Test 4 was duplicated (routine_cooldown and heartbeat_findings).
Renumber heartbeat_findings to Test 5 and heartbeat_empty_skip to Test 6.

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: merge staging and add missing RoutineEngine args in test

RoutineEngine::new on staging requires `tools` and `safety` params.
Update system_event_trigger_matches_and_filters test to pass them.

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address new Copilot review comments

- Add .with_auto_approve_tools(true) to skill_install_routine_webhook_sim
  test so event_emit doesn't block on approval
- Fix module-level doc comment for event_emit to specify system_event trigger

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: deduplicate json_value_as_string helper

Remove private `json_value_as_string` from routine_engine.rs and use
the identical public `json_value_as_filter_string` from routine.rs,
eliminating divergence risk. (Copilot review)

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Henry Park <henrypark133@gmail.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: enable WASM credential injection in No-DB environments (#845)

* fix(wasm): enable credential injection in no-DB environments via env var fallback

When a secrets store is unavailable (e.g. no-DB mode), WASM channel
credentials were silently not injected, causing channels to start without
credentials. Fix by:

- Changing `inject_channel_credentials_from_secrets` to accept
  `Option<&dyn SecretsStore>` — secrets store is tried first when present
- Adding env var fallback (`inject_env_credentials`) for credentials not
  covered by the secrets store
- Enforcing a channel-name prefix security check on env var names to
  prevent WASM channels from reading unrelated host credentials
  (e.g. `AWS_SECRET_ACCESS_KEY`)
- Extracting pure `resolve_env_credentials` helper for testability
- Adding case-insensitive prefix matching for secrets store lookup

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(wasm): inject credentials at startup when no secrets store (setup.rs path)

The startup path (setup_wasm_channels -> register_channel) was guarded by
`if let Some(secrets) = secrets_store`, so in No-DB mode credentials were
never injected and the channel started without them.

Fix by:
- Changing inject_channel_credentials to accept Option<&dyn SecretsStore>
- Always calling it (removing the if-let guard) — env var fallback runs
  even when secrets_store is None
- Adding channel-name prefix security check to the env var fallback path
  (e.g. TELEGRAM_ for channel "telegram"), consistent with manager.rs

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(test): correct misleading comment on ICTEST1_UNRELATED_OTHER placeholder

* fix(wasm): guard against empty channel name in credential injection

An empty channel_name would produce prefix "_", allowing any env var
starting with "_" to pass the security check and be injected. Add an
early-return guard in resolve_env_credentials, inject_env_credentials,
and inject_channel_credentials. Add a test to cover this path.
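
The prefix check and the empty-name guard described above can be sketched roughly as follows. This is a minimal illustration, not the actual helper: the signature of `resolve_env_credentials`, the `wanted` parameter, and the use of a `HashMap` in place of the process environment are all assumptions made for the example.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified stand-in for `resolve_env_credentials`.
/// `env` replaces the real process environment so the function stays pure.
fn resolve_env_credentials(
    channel_name: &str,
    wanted: &[&str],
    env: &HashMap<String, String>,
) -> Vec<(String, String)> {
    // Guard: an empty channel name would otherwise produce the prefix "_",
    // which any env var starting with "_" would satisfy.
    if channel_name.trim().is_empty() {
        return Vec::new();
    }
    // e.g. channel "telegram" may only read vars starting with "TELEGRAM_".
    let prefix = format!("{}_", channel_name.to_ascii_uppercase());
    wanted
        .iter()
        .filter(|name| name.to_ascii_uppercase().starts_with(&prefix))
        .filter_map(|name| env.get(*name).map(|v| (name.to_string(), v.clone())))
        .collect()
}
```

With this shape, a request for `AWS_SECRET_ACCESS_KEY` from channel "telegram" is dropped by the prefix filter, and an empty channel name resolves nothing at all.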

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: lizican123 <lizican123@gmail.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: promote to main (#878)

* fix: replace unsafe env::set_var with thread-safe inject_single_var in SIGHUP handler

Fixes race condition where SIGHUP handler modifies global environment variables
while other threads may be reading them via Config::from_env().

Changes:
- Replace unsafe { std::env::set_var() } with ironclaw::config::inject_single_var()
- Uses INJECTED_VARS mutex instead of unsafe global state modification
- All reads via optional_env() check the thread-safe overlay first
- Prevents data races between SIGHUP reload and concurrent config reads
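
A rough sketch of the overlay idea, assuming a process-wide map guarded by a mutex. The names `INJECTED_VARS`, `inject_single_var`, and `optional_env` come from the description above, but the bodies and signatures here are illustrative guesses, not the code in `ironclaw::config`.

```rust
use std::collections::HashMap;
use std::sync::{Mutex, OnceLock};

// Assumed layout: a lazily initialised, mutex-guarded overlay map.
static INJECTED_VARS: OnceLock<Mutex<HashMap<String, String>>> = OnceLock::new();

fn overlay() -> &'static Mutex<HashMap<String, String>> {
    INJECTED_VARS.get_or_init(|| Mutex::new(HashMap::new()))
}

/// Record a value in the overlay instead of mutating the process environment.
fn inject_single_var(key: &str, value: &str) {
    overlay()
        .lock()
        .unwrap()
        .insert(key.to_string(), value.to_string());
}

/// Read a variable: the overlay wins, the real environment is the fallback.
fn optional_env(key: &str) -> Option<String> {
    if let Some(v) = overlay().lock().unwrap().get(key) {
        return Some(v.clone());
    }
    std::env::var(key).ok()
}
```

Because writers only touch the map under the mutex, a reload never calls `std::env::set_var` while another thread is reading configuration.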

Verification:
- All 2,787 tests pass
- Zero clippy warnings
- Code compiles successfully

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

* fix: spawn webhook restart as background task to avoid blocking I/O across lock

Prevents holding Mutex lock during async I/O operations (TcpListener::bind,
task shutdown). The SIGHUP handler no longer blocks webhook processing during
listener restart.

Changes:
- Read old_addr and drop lock immediately
- Spawn restart_with_addr() as background task via tokio::spawn
- Lock is only held during the actual restart operation, not the signal handler

Benefits:
- SIGHUP handler returns immediately without blocking
- Webhook requests not delayed by listener restart I/O
- Lock contention significantly reduced

Verification:
- All 2,787 tests pass
- Zero clippy warnings
- Code compiles successfully

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

* fix: add graceful shutdown mechanism for SIGHUP handler background task

Prevents unbounded loop without cancellation token. The SIGHUP handler now
listens for a shutdown signal and exits cleanly during graceful termination.

Changes:
- Create broadcast channel for shutdown signaling
- SIGHUP handler uses tokio::select! to wait for shutdown or SIGHUP
- Send shutdown signal to all background tasks after agent.run() completes
- Ensures clean task lifecycle and no orphaned background tasks

Benefits:
- Proper task cancellation during graceful shutdown
- Follows Tokio best practices for background task management
- No background tasks orphaned when runtime shuts down

Verification:
- All 2,787 tests pass
- Zero clippy warnings
- Code compiles successfully

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

* refactor: replace stringly-typed parameter filtering with typed enum and single helper

Fixes DRY violation where unsupported parameter filtering was duplicated across
rig_adapter.rs and anthropic_oauth.rs using string contains checks.

Changes:
- Add UnsupportedParam typed enum in provider.rs (Temperature, MaxTokens, StopSequences)
- Create strip_unsupported_completion_params() helper function
- Create strip_unsupported_tool_params() helper function
- Update rig_adapter.rs to use shared helpers
- Update anthropic_oauth.rs to use shared helpers
- Replace 60+ lines of duplicate stringly-typed logic

Benefits:
- Type safety: parameter names checked at compile time
- Single source of truth: adding a new param updates one place
- Reduced maintenance burden: no duplicate logic to keep in sync
- Better code clarity: named enum variant is self-documenting
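
For illustration, the typed-enum approach might look like the sketch below. The variant names and the helper name are taken from the list above; the `CompletionRequest` fields and the helper's signature are assumptions for the example only.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum UnsupportedParam {
    Temperature,
    MaxTokens,
    StopSequences,
}

// Hypothetical request shape; the real struct has more fields.
#[derive(Debug, Default, PartialEq)]
struct CompletionRequest {
    temperature: Option<f32>,
    max_tokens: Option<u32>,
    stop_sequences: Option<Vec<String>>,
}

/// Clear every field the provider declared as unsupported.
/// The exhaustive match means a new variant fails to compile until handled.
fn strip_unsupported_completion_params(
    req: &mut CompletionRequest,
    unsupported: &[UnsupportedParam],
) {
    for param in unsupported {
        match param {
            UnsupportedParam::Temperature => req.temperature = None,
            UnsupportedParam::MaxTokens => req.max_tokens = None,
            UnsupportedParam::StopSequences => req.stop_sequences = None,
        }
    }
}
```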

Verification:
- All 2,787 tests pass
- Zero clippy warnings
- Code compiles successfully

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

* docs: clarify intentional parameter asymmetry between completion and tool requests

Add documentation explaining why strip_unsupported_tool_params does not handle
StopSequences: the field doesn't exist in ToolCompletionRequest.

Changes:
- Add clarifying comments to strip_unsupported_tool_params()
- Explain why StopSequences is only in CompletionRequest
- Note that ToolCompletionRequest only supports Temperature and MaxTokens
- Inline comment confirms no action needed for StopSequences

This addresses the appearance of incomplete implementation without changing logic,
as the asymmetry is intentional and correct (ToolCompletionRequest lacks the field).

Verification:
- All 2,787 tests pass
- Zero clippy warnings
- Code compiles successfully

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

* perf: isolate webhook_secret to reduce lock contention on hot path

Move webhook_secret from shared HttpChannelState RwLock into its own Arc<RwLock<>>.
This eliminates contention between secret validation and other state operations.

Changes:
- Change webhook_secret field type from RwLock<Option<SecretString>> to Arc<RwLock<Option<SecretString>>>
- Update initialization in HttpChannel::new()
- Update comments to explain isolation rationale

Benefits:
- Reduce lock contention on webhook request hot path (secret validation)
- Rarely-changing field (SIGHUP only) isolated from frequent state accesses
- Other state operations (tx, pending_responses) no longer wait behind secret reads
- Minimal code change: only field declaration and initialization

The Arc wrapper allows cloning the RwLock handle to separate concerns. With this
change, every webhook request acquires its own isolated lock for secret validation,
not the shared HttpChannelState lock. This scales better under high request volume.

Verification:
- All 2,787 tests pass
- Zero clippy warnings
- Code compiles successfully

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

* fix: prevent partial state corruption on SIGHUP restart failure

Ensure atomicity of configuration reload: if webhook listener restart fails,
secret update is skipped to prevent inconsistent state.

Changes:
- Wait for restart_with_addr() to complete (don't spawn background task)
- Track restart result with restart_failed flag
- Only update secret if restart succeeded or wasn't needed
- Ensure listener and secret stay synchronized

Problem addressed:
- Before: restart spawned as background task, secret updated immediately
- If restart failed, secret was changed but listener still on old address
- This left system in inconsistent state (partial corruption)

Solution:
- Make restart blocking (SIGHUP handler can wait, it's not on request hot path)
- Atomically update secret only after successful restart
- Flag prevents race between restart and secret update

Benefits:
- Configuration changes are atomic (both succeed or both fail together)
- No partial state corruption on restart failure
- Failed restarts don't silently leave inconsistent state
- Secret and listener address stay in sync
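
The ordering constraint can be sketched as below. This is a simplified, synchronous model: `ChannelState`, `apply_reload`, and the `restart` callback are hypothetical names standing in for the async listener restart, and the real handler tracks failure with a `restart_failed` flag rather than an early return.

```rust
#[derive(Debug, PartialEq)]
struct ChannelState {
    addr: String,
    secret: String,
}

/// Apply a reload so that the secret changes only if the restart
/// succeeded or was not needed.
fn apply_reload<F>(
    state: &mut ChannelState,
    new_addr: &str,
    new_secret: &str,
    restart: F,
) -> Result<(), String>
where
    F: FnOnce(&str) -> Result<(), String>,
{
    if state.addr != new_addr {
        // Wait for the restart; on failure, return before touching the secret.
        restart(new_addr)?;
        state.addr = new_addr.to_string();
    }
    state.secret = new_secret.to_string();
    Ok(())
}
```

A failed restart leaves both the address and the secret at their old values, which is the "both succeed or both fail" property listed above.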

Verification:
- All 2,787 tests pass
- Zero clippy warnings
- Code compiles successfully

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

* refactor: generalize hot-secret-swapping with ChannelSecretUpdater trait

Decouple SIGHUP handler from HTTP channel internals by introducing a trait
for channels that support zero-downtime secret updates.

Changes:
- Add ChannelSecretUpdater trait in channels/channel.rs
- Implement ChannelSecretUpdater for HttpChannelState
- Export trait from channels module
- Update SIGHUP handler to use trait-based secret updater collection
- Replace explicit HTTP channel knowledge with generic updater loop

Benefits:
- SIGHUP handler no longer depends on HttpChannelState details
- Tight coupling removed: main.rs doesn't need HTTP channel imports
- Extensible: new channels can opt-in by implementing the trait
- Scalable: multiple channels supported without main.rs changes
- Maintainable: adding channels requires only trait implementation, not SIGHUP handler edits

Pattern:
- ChannelSecretUpdater trait defines the interface for all updaters
- Channels that support hot-secret-swapping implement the trait
- SIGHUP handler loops through all registered updaters generically

Verification:
- All 2,787 tests pass
- Zero clippy warnings
- Code compiles successfully

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

* feat: validate parameter names at deserialization time, not just tests

Add custom serde deserializer for unsupported_params that validates parameter
names at runtime when loading providers.json (or user overrides).

Changes:
- Add unsupported_params_de module with custom deserializer
- Only allows: "temperature", "max_tokens", "stop_sequences"
- Invalid parameter names cause immediate deserialization error
- Update ProviderDefinition to use custom deserializer
- Enhanced test with explicit parameter name validation
- Add new test that verifies invalid parameters are rejected

Problem solved:
- Before: Invalid param names (e.g., "temperrature") silently ignored
- Now: Rejected at deserialization time with clear error message
- Prevents runtime failures caused by typos in configuration

Example error:
  unsupported parameter name 'temperrature': must be one of: temperature, max_tokens, stop_sequences

Benefits:
- Fail-fast: errors caught when loading config, not at runtime
- Clear feedback: error message lists valid parameter names
- Type safety: validators run during deserialization
- Configuration errors detected immediately, not silently ignored
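
The validation rule can be illustrated without serde. The sketch below uses `FromStr` as a stand-in for the custom deserializer, which is an assumption made to keep the example dependency-free; `parse_unsupported_params` is a hypothetical helper. The error text mirrors the example error above.

```rust
use std::str::FromStr;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum UnsupportedParam {
    Temperature,
    MaxTokens,
    StopSequences,
}

impl FromStr for UnsupportedParam {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "temperature" => Ok(Self::Temperature),
            "max_tokens" => Ok(Self::MaxTokens),
            "stop_sequences" => Ok(Self::StopSequences),
            other => Err(format!(
                "unsupported parameter name '{other}': must be one of: \
                 temperature, max_tokens, stop_sequences"
            )),
        }
    }
}

/// Parse a whole list, failing on the first unknown name.
fn parse_unsupported_params(names: &[&str]) -> Result<Vec<UnsupportedParam>, String> {
    names
        .iter()
        .map(|n| n.parse::<UnsupportedParam>())
        .collect()
}
```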

Verification:
- All 2,788 tests pass (including new validation test)
- Zero clippy warnings
- Code compiles successfully

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Haiku 4.5 <noreply@anthropic.com>

* merge: resolve conflicts for PR #800 and #822 into staging (#881)

* fix(ci): secrets can't be used in step if conditions [skip-regression-check] (#787)

GitHub Actions step-level `if:` doesn't have access to `secrets` context.
Replace `if: secrets.X != ''` with `continue-on-error: true` and let
the Set token step handle the fallback.

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(ci): clean up staging pipeline — remove hacks, skip redundant checks [skip-regression-check] (#794)

- Remove continue-on-error from staging-ci.yml app token steps (secrets are configured)
- Skip test.yml and code_style.yml on PRs targeting staging (staging-ci.yml
  already runs tests before promoting, promotion PR gets full CI on main)
- Allow ironclaw-ci[bot] in Claude Code review for bot-created promotion PRs

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* fix(ci): run fmt + clippy on staging PRs, skip Windows clippy [skip-regression-check] (#802)

- Remove branches:[main] filter from code_style.yml so it runs on all PRs
- Gate clippy-windows with `if: github.base_ref == 'main'` (skip on staging PRs)
- Update rollup job to allow skipped clippy-windows
- Simplify claude-review.yml to only trigger on labeled event (avoids duplicate runs)

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* feat: persist user_id in save_job and expose job_id on routine runs (#709)

* feat: persist worker events to DB and fix activity tab rendering

In-process Worker (used by Scheduler::dispatch_job) now persists events
via save_job_event at key execution points: plan creation, LLM
responses, tool_use, tool_result, and job completion/failure/stuck.
Event data shapes match the container worker format so the gateway
activity tab renders them correctly.

Frontend: tool_result errors now show a red X icon with danger styling
instead of a silent empty output. The result event falls back to the
error field when message is absent.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: wire RoutineEngine into gateway for direct manual trigger firing

Replace the message-channel hack in routines_trigger_handler with a
direct call to RoutineEngine::fire_manual(), ensuring FullJob routines
dispatch correctly when triggered from the web UI. Inject the engine
into GatewayState from Agent::run after construction.

Also persists user_id in save_job for both PG and libSQL backends,
removes the source='sandbox' filter so all jobs are visible, and
exposes job_id on RoutineRunInfo for the frontend job link.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: remove stale gateway_state argument from Agent::new test call sites

The gateway_state parameter was removed from Agent::new during rebase
(replaced by post-construction set_routine_engine_slot), but three test
call sites still passed the extra None argument.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address PR review — restore sandbox source filter, remove blank lines

- Revert removal of `source = 'sandbox'` filter in all SandboxStore
  queries (8 sites across PG and libSQL). Sandbox-specific APIs should
  stay scoped to sandbox jobs; unified job listing for the Jobs tab
  should use a separate query path.
- Remove extra blank lines in agent_loop.rs and worker.rs that caused
  formatting CI failure.

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address review — regenerate Cargo.lock, add user_id regression test

- Regenerate Cargo.lock from main's lockfile to eliminate dependency
  version downgrades (anyhow, syn, etc.) that were churn from rebase.
- Add regression test verifying user_id round-trips through save_job
  and get_job in the libSQL backend.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* style: remove trailing blank line in libsql jobs.rs

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* test: add Postgres-side regression test for user_id persistence in save_job

Mirrors the existing libSQL test (test_save_job_persists_user_id) for the
Postgres backend. Gated behind #[cfg(feature = "postgres")] + #[ignore]
since it requires a running PostgreSQL instance (integration tier).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* refactor: unify three agentic loops into single AgenticLoop engine (#654)

Replace three independent copy-pasted agentic loops (dispatcher, worker,
container runtime) with a single shared engine in `agentic_loop.rs` that
all consumers customize via the `LoopDelegate` trait.

Phase 1 — Shared engine (`src/agent/agentic_loop.rs`, 205 lines):
  - `run_agentic_loop()` owns the core LLM → tool exec → repeat cycle
  - `LoopDelegate` trait (Send + Sync, &dyn dispatch) with 6 hook points
  - Tool intent nudge logic consolidated (was duplicated in 3 files)
  - Iteration limit + force-text behavior preserved

Phase 2 — Three delegate implementations:
  - `ChatDelegate` (dispatcher.rs): 3-phase approval flow, hooks, cost
    guard, context compaction, skill attenuation, interruption
  - `JobDelegate` (worker/job.rs): planning pre-loop phase, parallel
    JoinSet exec, mark_completed/stuck/failed, SSE streaming, self-repair
  - `ContainerDelegate` (worker/container.rs): sequential tool exec,
    HTTP-proxied LLM, container-safe tools, credential injection

Phase 3 — File moves and cleanup:
  - Delete `src/agent/worker.rs` — job logic moved to `src/worker/job.rs`
  - Rename `src/worker/runtime.rs` → `src/worker/container.rs`
  - Re-export `Worker`/`WorkerDeps` from `crate::worker` in `agent/mod.rs`
  - Update `scheduler.rs` imports to new worker location

Shared helpers (`src/tools/execute.rs`):
  - `execute_tool_with_safety()` replaces 4 copies of validate → timeout
    → execute → serialize
  - `process_tool_result()` replaces 3 copies of sanitize → wrap →
    ChatMessage (also used by thread_ops.rs approval resume paths)

Net result: -2,408 lines, zero duplicated loop logic, single code path
for tool intent nudge and completion detection.

Closes #654

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address review feedback from Copilot

1. scheduler.rs: Replace `unwrap_or` fallback with proper error
   propagation when parsing tool output JSON — surfaces bugs instead
   of silently changing the output type.

2. worker/job.rs: Drop MutexGuard before the cancellation `.await` in
   `check_signals()` to avoid holding a lock across an async I/O call
   (prevents `await_holding_lock` lint).

3. worker/job.rs: Restore consecutive rate-limit counter
   (MAX_CONSECUTIVE_RATE_LIMITS = 10) so sustained rate limiting marks
   the job stuck with "Persistent rate limiting" instead of silently
   burning through max_iterations.
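
A minimal sketch of the consecutive rate-limit counter from point 3, assuming the counter resets on any successful LLM call. `RateLimitTracker` and `RateLimitVerdict` are hypothetical names; only `MAX_CONSECUTIVE_RATE_LIMITS = 10` comes from the commit.

```rust
const MAX_CONSECUTIVE_RATE_LIMITS: u32 = 10;

#[derive(Debug, PartialEq)]
enum RateLimitVerdict {
    /// Below the threshold: the loop may retry.
    Retry,
    /// Threshold reached: the job should stop with "Persistent rate limiting".
    Persistent,
}

#[derive(Default)]
struct RateLimitTracker {
    consecutive: u32,
}

impl RateLimitTracker {
    /// Any successful call breaks the streak.
    fn on_success(&mut self) {
        self.consecutive = 0;
    }

    /// Count one more rate-limited call and report whether to give up.
    fn on_rate_limited(&mut self) -> RateLimitVerdict {
        self.consecutive += 1;
        if self.consecutive >= MAX_CONSECUTIVE_RATE_LIMITS {
            RateLimitVerdict::Persistent
        } else {
            RateLimitVerdict::Retry
        }
    }
}
```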

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: incorporate staging changes — token budget tracking + mark_failed

Merge staging's changes into the refactored JobDelegate:
- Add token budget tracking in call_llm (update_context/add_tokens)
- mark_stuck → mark_failed for iteration cap and rate-limit exhaustion
  (aligns with staging's #788 fix)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address zmanian's PR review — eliminate type erasure, clean up

Address all 6 review points from zmanian on PR #800:

1. Replace LoopOutcome::Custom(Box<dyn Any>) with typed
   LoopOutcome::NeedApproval(Box<PendingApproval>) — eliminates
   type erasure and downcast, resolves clippy large_enum_variant.

2. Remove dead max_tool_iterations field from ChatDelegate struct.

3. Add on_tool_intent_nudge() hook to LoopDelegate trait with
   implementations in Job and Container delegates for observability.

4. Fix SSE events in job worker to emit raw sanitized content
   instead of XML-wrapped <tool_output> tags.

5. Remove 4 duplicate completion tests from job.rs that were
   already covered by the shared util module.

6. Avoid logging full tool results — use result_size_bytes in
   debug logs (execute.rs, job.rs).

Also updates path references in CLAUDE.md, COVERAGE_PLAN.md,
and add-sse-event.md command.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat(doctor): expand diagnostics from 7 to 16 health checks

* test: add unit tests for agentic_loop and execute shared modules

Add 16 tests covering the two new critical shared modules:

agentic_loop.rs (10 tests):
- Text response exits loop immediately
- Tool call → text response continuation
- LoopSignal::Stop exits before LLM call
- LoopSignal::InjectMessage adds user message to context
- Max iterations terminates with LoopOutcome::MaxIterations
- Tool intent nudge fires twice then caps
- before_llm_call early exit bypasses LLM
- truncate_for_preview: short string, long string, multibyte safety

execute.rs (6 tests):
- execute_tool_with_safety success path
- Missing tool returns ToolError::NotFound
- Tool execution failure propagates
- Per-tool timeout enforcement (50ms)
- process_tool_result XML wrapping on success
- process_tool_result error formatting
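
The "multibyte safety" case for truncate_for_preview refers to not slicing a UTF-8 string in the middle of a character. One way to do that is sketched below; the signature is an assumption, and the real function may behave differently (for example, by appending an ellipsis).

```rust
/// Hypothetical sketch: return at most `max_bytes` bytes of `s`,
/// backing up to the nearest character boundary so the slice never panics.
fn truncate_for_preview(s: &str, max_bytes: usize) -> &str {
    if s.len() <= max_bytes {
        return s;
    }
    let mut end = max_bytes;
    // Index 0 is always a boundary, so this loop terminates.
    while !s.is_char_boundary(end) {
        end -= 1;
    }
    &s[..end]
}
```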

All 2,777 unit tests pass, 0 clippy warnings.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* style: cargo fmt

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address code review — 9 issues across agentic loop, job worker, container

CRITICAL fixes:
- Rate-limit exhaustion now returns Err(LlmError::RateLimited) instead of
  Ok(Text("")), stopping the loop immediately with no ghost iteration.
  Below-threshold retries still use Text("") with an explicit empty-string
  guard in handle_text_response to skip injection.
- check_signals drains the entire message channel before returning,
  prioritizing Stop over UserMessage. Previously returned early on first
  UserMessage, silently dropping any queued Stop or additional messages.
- check_signals now detects all non-progressing job states (Cancelled,
  Failed, Stuck, Completed, Submitted, Accepted) instead of only
  Cancelled and Failed.
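
The drain-then-prioritize behavior of check_signals can be sketched as follows. This uses `std::sync::mpsc` rather than the async channel the worker presumably uses, and the `Signal` and `Drained` types are hypothetical stand-ins for the real message and `LoopSignal` types.

```rust
use std::sync::mpsc::Receiver;

#[derive(Debug, PartialEq)]
enum Signal {
    Stop,
    UserMessage(String),
}

#[derive(Debug, PartialEq)]
enum Drained {
    Nothing,
    Stop,
    Messages(Vec<String>),
}

/// Drain everything currently queued, then decide. A Stop anywhere in the
/// queue wins over user messages, regardless of arrival order.
fn check_signals(rx: &Receiver<Signal>) -> Drained {
    let mut stop = false;
    let mut messages = Vec::new();
    while let Ok(sig) = rx.try_recv() {
        match sig {
            Signal::Stop => stop = true,
            Signal::UserMessage(m) => messages.push(m),
        }
    }
    if stop {
        Drained::Stop
    } else if messages.is_empty() {
        Drained::Nothing
    } else {
        Drained::Messages(messages)
    }
}
```

Returning on the first `UserMessage`, as the earlier code did, would leave a queued `Stop` unread until a later iteration.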

HIGH fixes:
- Error path in process_tool_result_job applies truncate_for_preview to
  bound error strings in SSE/DB events (was unbounded).
- Document Send+Sync lifetime constraint on LoopDelegate trait.
- Test mock before_llm_call refactored from double-lock to single lock
  acquisition, eliminating deadlock risk on refactor.

MEDIUM fixes:
- CompletionReport includes actual iteration count via shared
  Arc<Mutex<u32>> tracker (was hardcoded 0).
- process_tool_result_job return type changed from Result<bool> to
  Result<()> — the bool was always false (dead API).
- Deduplicate truncate in container.rs; now uses truncate_for_preview
  from agentic_loop.

Verified: 0 clippy warnings, 2781 tests pass, cargo fmt clean.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Henry Park <henrypark133@gmail.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Illia Polosukhin <ilblackdragon@gmail.com>
Co-authored-by: Umesh Kumar Singh <brijbiharisingh1971@outlook.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>

* Revert "Feat/docker shell edition" + fix fmt/clippy (#886)

* Revert "Feat/docker shell edition (#804)"

This reverts commit c566faf28fb77c2fa4df92c2947fb48f1a25df9b.

* style: fix formatting issues from revert

Run cargo fmt to fix formatting across 7 files after the revert of
the docker shell edition feature.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* refactor: centralize …
bkutasi pushed a commit to bkutasi/ironclaw that referenced this pull request Mar 28, 2026
)

* chore: promote staging to main (2026-03-10 15:19 UTC) (#865)

* fix: Channel HTTP: server doesn't start after config change (no hot-r… (#779)

* fix: Channel HTTP: server doesn't start after config change (no hot-reload)

* review fixes

* review fixes

* fix linter

* fix code style

* fix: prevent session lock contention blocking message processing (#783)

* fix: prevent session lock contention blocking message processing

## Problem
After a container restart, POST /api/chat/send returns 202 ACCEPTED, but messages
don't appear in conversation_messages and the agent never responds. Messages get
stuck in a "stale state" after the restart.

Root cause: The session lock was held for the entire duration of chat_threads_handler
and chat_history_handler, including during slow database queries. This blocked
the agent loop from acquiring the session lock to process incoming messages,
causing them to hang indefinitely.

## Solution
1. **Release session lock early in chat_threads_handler**: Only acquire lock
   when reading active_thread at response time, not during DB queries for
   thread list. DB operations no longer block message processing.

2. **Release session lock early in chat_history_handler**: Only acquire lock
   when accessing in-memory thread state, not during paginated DB queries or
   thread ownership checks. DB operations no longer block message processing.

3. **Add comprehensive logging**: Track message flow from receipt through
   session resolution, thread hydration, and state transitions. Helps diagnose
   future issues:
   - Message queued to agent loop (chat_send_handler)
   - Processing message from channel (handle_message)
   - Hydrating thread from DB (maybe_hydrate_thread)
   - Resolving session and thread (resolve_thread)
   - Checking thread state (process_user_input)
   - Persisting user message (persist_user_message)
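
The lock-scoping change in items 1 and 2 can be sketched with a plain `std::sync::Mutex`. Everything here is a simplified assumption: `Session`, `slow_db_query`, and `list_threads` are hypothetical, and the real handlers are async.

```rust
use std::sync::Mutex;

struct Session {
    active_thread: Option<String>,
}

// Stand-in for a slow database query; no lock is involved.
fn slow_db_query() -> Vec<String> {
    vec!["t1".to_string(), "t2".to_string()]
}

fn list_threads(session: &Mutex<Session>) -> (Vec<String>, Option<String>) {
    // Do the slow work first, with the session lock NOT held.
    let threads = slow_db_query();
    // Take the lock only for the brief in-memory read; the guard is a
    // temporary and is dropped at the end of this statement.
    let active = session.lock().unwrap().active_thread.clone();
    (threads, active)
}
```

Other tasks that need the session lock wait only for the short clone, not for the query.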

## Impact
- Message processing no longer blocks on session lock contention
- API response times for thread list/history queries unaffected (DB queries
  still happen, but lock is not held)
- Better diagnostics for future debugging

## Testing
- All 2756 tests pass
- Code compiles with zero clippy warnings
- No changes to user-facing API or behavior, only lock timing

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

* security: redact PII from info-level logs

Downgrade user_id and channel logging to debug level to prevent exposing
Personally Identifiable Information (PII) in production logs.

The user_id field can contain sensitive information such as phone numbers
(e.g., for Signal messages). Logging PII in cleartext at the info level
creates a security and privacy risk, as these logs may be stored in
persistent storage, indexed by log management systems, or accessible to
unauthorized personnel.

Changes:
- Info level: logs only message_id (UUID) for tracking
- Debug level: logs user_id, channel, thread_id for troubleshooting

This maintains debugging capability for developers while protecting user
privacy in production logs.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Haiku 4.5 <noreply@anthropic.com>

* chore: sync main into staging (#855)

* fix(ci): secrets can't be used in step if conditions [skip-regression-check] (#787)

GitHub Actions step-level `if:` doesn't have access to `secrets` context.
Replace `if: secrets.X != ''` with `continue-on-error: true` and let
the Set token step handle the fallback.

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(ci): clean up staging pipeline — remove hacks, skip redundant checks [skip-regression-check] (#794)

- Remove continue-on-error from staging-ci.yml app token steps (secrets are configured)
- Skip test.yml and code_style.yml on PRs targeting staging (staging-ci.yml
  already runs tests before promoting, promotion PR gets full CI on main)
- Allow ironclaw-ci[bot] in Claude Code review for bot-created promotion PRs

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* fix(ci): run fmt + clippy on staging PRs, skip Windows clippy [skip-regression-check] (#802)

- Remove branches:[main] filter from code_style.yml so it runs on all PRs
- Gate clippy-windows with `if: github.base_ref == 'main'` (skip on staging PRs)
- Update rollup job to allow skipped clippy-windows
- Simplify claude-review.yml to only trigger on labeled event (avoids duplicate runs)

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* feat: persist user_id in save_job and expose job_id on routine runs (#709)

* feat: persist worker events to DB and fix activity tab rendering

In-process Worker (used by Scheduler::dispatch_job) now persists events
via save_job_event at key execution points: plan creation, LLM
responses, tool_use, tool_result, and job completion/failure/stuck.
Event data shapes match the container worker format so the gateway
activity tab renders them correctly.

Frontend: tool_result errors now show a red X icon with danger styling
instead of a silent empty output. The result event falls back to the
error field when message is absent.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: wire RoutineEngine into gateway for direct manual trigger firing

Replace the message-channel hack in routines_trigger_handler with a
direct call to RoutineEngine::fire_manual(), ensuring FullJob routines
dispatch correctly when triggered from the web UI. Inject the engine
into GatewayState from Agent::run after construction.

Also persists user_id in save_job for both PG and libSQL backends,
removes the source='sandbox' filter so all jobs are visible, and
exposes job_id on RoutineRunInfo for the frontend job link.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: remove stale gateway_state argument from Agent::new test call sites

The gateway_state parameter was removed from Agent::new during rebase
(replaced by post-construction set_routine_engine_slot), but three test
call sites still passed the extra None argument.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address PR review — restore sandbox source filter, remove blank lines

- Revert removal of `source = 'sandbox'` filter in all SandboxStore
  queries (8 sites across PG and libSQL). Sandbox-specific APIs should
  stay scoped to sandbox jobs; unified job listing for the Jobs tab
  should use a separate query path.
- Remove extra blank lines in agent_loop.rs and worker.rs that caused
  formatting CI failure.

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address review — regenerate Cargo.lock, add user_id regression test

- Regenerate Cargo.lock from main's lockfile to eliminate dependency
  version downgrades (anyhow, syn, etc.) that were churn from rebase.
- Add regression test verifying user_id round-trips through save_job
  and get_job in the libSQL backend.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* style: remove trailing blank line in libsql jobs.rs

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* test: add Postgres-side regression test for user_id persistence in save_job

Mirrors the existing libSQL test (test_save_job_persists_user_id) for the
Postgres backend. Gated behind #[cfg(feature = "postgres")] + #[ignore]
since it requires a running PostgreSQL instance (integration tier).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* feat(llm): per-provider unsupported parameter filtering (#749, #728) (#809)

Add declarative `unsupported_params` field to provider definitions in
providers.json. Parameters listed are stripped from requests before
sending, preventing 400 errors from providers that reject them (e.g.
gpt-5 family and kimi-k2.5 rejecting custom temperature values).

- Add `unsupported_params` to ProviderDefinition and RegistryProviderConfig
- Propagate from registry through config resolution
- Generic strip helpers handle temperature, max_tokens, stop_sequences
- Apply filtering in RigAdapter and AnthropicOAuthProvider
- Mark openai and tinfoil providers as not supporting temperature
- Update openai default model to gpt-5-mini

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Illia Polosukhin <ilblackdragon@gmail.com>

* fix: Chat input is hidden in mobile browser mode (#877)

* fix: stop XML-escaping tool output content (#598) (#874)

* fix(ci): secrets can't be used in step if conditions [skip-regression-check] (#787)

GitHub Actions step-level `if:` doesn't have access to `secrets` context.
Replace `if: secrets.X != ''` with `continue-on-error: true` and let
the Set token step handle the fallback.

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(ci): clean up staging pipeline — remove hacks, skip redundant checks [skip-regression-check] (#794)

- Remove continue-on-error from staging-ci.yml app token steps (secrets are configured)
- Skip test.yml and code_style.yml on PRs targeting staging (staging-ci.yml
  already runs tests before promoting, promotion PR gets full CI on main)
- Allow ironclaw-ci[bot] in Claude Code review for bot-created promotion PRs

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* fix(ci): run fmt + clippy on staging PRs, skip Windows clippy [skip-regression-check] (#802)

- Remove branches:[main] filter from code_style.yml so it runs on all PRs
- Gate clippy-windows with `if: github.base_ref == 'main'` (skip on staging PRs)
- Update rollup job to allow skipped clippy-windows
- Simplify claude-review.yml to only trigger on labeled event (avoids duplicate runs)

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* feat: persist user_id in save_job and expose job_id on routine runs (#709)

* feat: persist worker events to DB and fix activity tab rendering

In-process Worker (used by Scheduler::dispatch_job) now persists events
via save_job_event at key execution points: plan creation, LLM
responses, tool_use, tool_result, and job completion/failure/stuck.
Event data shapes match the container worker format so the gateway
activity tab renders them correctly.

Frontend: tool_result errors now show a red X icon with danger styling
instead of a silent empty output. The result event falls back to the
error field when message is absent.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: wire RoutineEngine into gateway for direct manual trigger firing

Replace the message-channel hack in routines_trigger_handler with a
direct call to RoutineEngine::fire_manual(), ensuring FullJob routines
dispatch correctly when triggered from the web UI. Inject the engine
into GatewayState from Agent::run after construction.

Also persists user_id in save_job for both PG and libSQL backends,
removes the source='sandbox' filter so all jobs are visible, and
exposes job_id on RoutineRunInfo for the frontend job link.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: remove stale gateway_state argument from Agent::new test call sites

The gateway_state parameter was removed from Agent::new during rebase
(replaced by post-construction set_routine_engine_slot), but three test
call sites still passed the extra None argument.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address PR review — restore sandbox source filter, remove blank lines

- Revert removal of `source = 'sandbox'` filter in all SandboxStore
  queries (8 sites across PG and libSQL). Sandbox-specific APIs should
  stay scoped to sandbox jobs; unified job listing for the Jobs tab
  should use a separate query path.
- Remove extra blank lines in agent_loop.rs and worker.rs that caused
  formatting CI failure.

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address review — regenerate Cargo.lock, add user_id regression test

- Regenerate Cargo.lock from main's lockfile to eliminate dependency
  version downgrades (anyhow, syn, etc.) that were churn from rebase.
- Add regression test verifying user_id round-trips through save_job
  and get_job in the libSQL backend.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* style: remove trailing blank line in libsql jobs.rs

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* test: add Postgres-side regression test for user_id persistence in save_job

Mirrors the existing libSQL test (test_save_job_persists_user_id) for the
Postgres backend. Gated behind #[cfg(feature = "postgres")] + #[ignore]
since it requires a running PostgreSQL instance (integration tier).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* feat(llm): per-provider unsupported parameter filtering (#749, #728) (#809)

Add declarative `unsupported_params` field to provider definitions in
providers.json. Parameters listed are stripped from requests before
sending, preventing 400 errors from providers that reject them (e.g.
gpt-5 family and kimi-k2.5 rejecting custom temperature values).

- Add `unsupported_params` to ProviderDefinition and RegistryProviderConfig
- Propagate from registry through config resolution
- Generic strip helpers handle temperature, max_tokens, stop_sequences
- Apply filtering in RigAdapter and AnthropicOAuthProvider
- Mark openai and tinfoil providers as not supporting temperature
- Update openai default model to gpt-5-mini
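
A minimal sketch of the stripping step described above, assuming a simplified request struct. `CompletionParams` and `strip_unsupported` are hypothetical names for illustration, not the types used in RigAdapter or AnthropicOAuthProvider:

```rust
/// Hypothetical request shape; the field names are illustrative only.
#[derive(Debug, Default, Clone, PartialEq)]
pub struct CompletionParams {
    pub temperature: Option<f32>,
    pub max_tokens: Option<u32>,
    pub stop_sequences: Option<Vec<String>>,
}

/// Clear every parameter the provider definition lists as unsupported.
/// Unknown names are ignored, so an unrecognised registry entry is harmless.
pub fn strip_unsupported(params: &mut CompletionParams, unsupported: &[&str]) {
    for name in unsupported {
        match *name {
            "temperature" => params.temperature = None,
            "max_tokens" => params.max_tokens = None,
            "stop_sequences" => params.stop_sequences = None,
            _ => {}
        }
    }
}
```

With `unsupported_params: ["temperature"]`, a request carrying `temperature: Some(0.2)` would go out with the temperature cleared and every other field untouched.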

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* fix: stop XML-escaping tool output content in wrap_for_llm (#598)

Remove content escaping that corrupted JSON in tool output. The
<tool_output> structural boundary is preserved but content now passes
through raw, fixing downstream parse failures.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Henry Park <henrypark133@gmail.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(safety): allow empty string tool params (#848)

* fix(safety): allow empty string tool params

* fix(safety): preserve heuristic checks and add path context to tool validation

This follow-up refactor addresses PR review feedback by restoring
heuristic checks (whitespace ratio, character repetition) for tool
parameter validation and improving error reporting.

Changes:
- Restored heuristic warnings in validate_non_empty_input so they apply
  to both user input and tool parameters (when non-empty).
- Refactored check_strings to recursively build and pass JSON paths
  (e.g., "metadata.tags[1]").
- Updated validation errors to use the specific JSON path as the field
  name instead of the generic "input".
- Added regression tests for whitespace/repetition warnings and JSON
  path reporting in tool parameters.

This ensures the safety layer remains semantically neutral about empty
strings (fixing the memory_tree path: "" issue) while maintaining
rigorous protection and providing better developer ergonomics.
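
The path-building recursion can be sketched roughly as follows. This is an illustration under assumptions: the `Json` enum is a stand-in for whatever JSON type the validator actually walks, and the `visit` callback signature is hypothetical.

```rust
/// Minimal stand-in for a JSON value; only the shapes needed here.
pub enum Json {
    Str(String),
    Array(Vec<Json>),
    Object(Vec<(String, Json)>),
    Other,
}

/// Walk `value`, calling `visit(path, text)` for every string leaf.
/// Paths look like `metadata.tags[1]`; the root path is the empty string.
pub fn check_strings(value: &Json, path: &str, visit: &mut dyn FnMut(&str, &str)) {
    match value {
        // Empty strings are visited like any other; the caller decides what to flag.
        Json::Str(s) => visit(path, s.as_str()),
        Json::Array(items) => {
            for (i, item) in items.iter().enumerate() {
                check_strings(item, &format!("{path}[{i}]"), visit);
            }
        }
        Json::Object(fields) => {
            for (key, item) in fields {
                let child = if path.is_empty() {
                    key.clone()
                } else {
                    format!("{path}.{key}")
                };
                check_strings(item, &child, visit);
            }
        }
        Json::Other => {}
    }
}
```

Passing the path down, rather than rebuilding it on error, lets a validation error name the exact field instead of the generic "input".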

* style: run cargo fmt

* perf: optimize release and dist build profiles (#843)

* perf: optimize release and dist build profiles

Add [profile.release] with strip=true and panic="abort" for smaller,
faster release binaries. Upgrade [profile.dist] from lto="thin" to
lto="fat" with codegen-units=1 for maximum optimization in CI releases.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: remove panic=abort from release profile

Reviewers (zmanian, Copilot, Gemini) correctly flagged that panic=abort
in the release profile would kill the entire process on any tokio task
panic, breaking fault isolation for the long-running server. Removed
from release profile entirely.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* feat: add PR template with risk assessment (#837)

* feat: add PR template with risk assessment and review tracks

Add a pull request template that includes summary, change type,
validation checklist, security/database impact sections, blast radius,
and rollback plan. Update CONTRIBUTING.md with review track definitions
(A/B/C) based on change risk level.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: expand CONTRIBUTING.md with setup, workflow, and guidelines

Add getting started, development workflow, code style summary,
database change guidance, and dependency management sections.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* feat: add fuzzing targets for untrusted input parsers (#835)

* feat: add fuzzing targets for untrusted input parsers

Add cargo-fuzz infrastructure with 5 fuzz targets exercising
security-critical code paths:

- fuzz_safety_sanitizer: Aho-Corasick + regex injection detection
- fuzz_safety_validator: Input validation (length, encoding, patterns)
- fuzz_leak_detector: Secret leak scanning (API keys, tokens)
- fuzz_tool_params: Tool parameter JSON validation
- fuzz_config_env: TOML/JSON config parsing

Each target exercises real IronClaw business logic with invariant
assertions. Includes corpus directories and setup documentation.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: improve fuzz targets to exercise real IronClaw code paths

- fuzz_config_env: exercise SafetyLayer end-to-end (sanitize, validate,
  policy check) instead of generic TOML/JSON parsing
- fuzz_tool_params: add validate_tool_schema coverage alongside
  validate_tool_params
- Add "fuzz" to workspace exclude in root Cargo.toml
- Update README descriptions to match actual target behavior

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: replace redundant detect() call with meaningful invariant assertion

Replace the double sanitize()+detect() call with an assertion that
critical severity warnings always trigger content modification.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: rewrite fuzz_config_env to exercise IronClaw safety code directly

Replace SafetyLayer wrapper usage with direct Sanitizer, Validator, and
LeakDetector instantiation and invocation. Adds meaningful consistency
assertions (non-empty output, valid-means-no-errors, scan/clean agreement).
Removes the config construction that was only exercising struct instantiation.

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* fix(wasm): run leak scan before credential injection in tools wrapper (#791)

* fix(wasm): run leak scan before credential injection in tools wrapper

The tools WASM wrapper runs the LeakDetector on HTTP request headers
AFTER inject_host_credentials() has already substituted real secrets
(e.g., xoxb- Slack bot tokens). This causes the leak detector to
flag the tool's own legitimate outbound API calls as secret exfiltration.

Move the scan to run on raw_headers before any credential injection,
matching the fix already applied to the channels wrapper in #421.

Fixes the same class of bug as #421 (which only fixed channels/wasm/wrapper.rs).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* perf: inline leak scan to avoid Vec allocation on every HTTP request

Address review feedback: instead of cloning all header keys/values into
a Vec to pass to scan_http_request(), iterate over raw_headers directly
using scan_and_clean(). This also provides more specific error messages
(URL vs header vs body).
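
The ordering fix can be illustrated with a small sketch. It assumes a toy prefix-based detector in place of the real LeakDetector; `looks_like_secret` and `prepare_headers` are hypothetical names, and the prefix list is an assumption:

```rust
use std::collections::HashMap;

/// Toy detector: flags any value containing a known secret prefix.
/// The real detector uses a richer pattern set; this list is an assumption.
fn looks_like_secret(value: &str) -> bool {
    const PREFIXES: [&str; 2] = ["xoxb-", "sk-"];
    PREFIXES.iter().any(|p| value.contains(p))
}

/// Scan the headers the tool supplied, then add host credentials.
pub fn prepare_headers(
    raw_headers: &[(String, String)],
    host_credentials: &HashMap<String, String>,
) -> Result<Vec<(String, String)>, String> {
    // 1. Scan first: only tool-supplied values are inspected, borrowed in place.
    for (name, value) in raw_headers {
        if looks_like_secret(value) {
            return Err(format!("potential secret leak in header {name}"));
        }
    }
    // 2. Inject afterwards: host-owned secrets never pass through the scan.
    let mut headers = raw_headers.to_vec();
    for (name, secret) in host_credentials {
        headers.push((name.clone(), secret.clone()));
    }
    Ok(headers)
}
```

In this sketch a tool that tries to send an `xoxb-` value itself is still blocked, while the host's own injected token is not flagged.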

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* style: fix cargo fmt formatting in leak scan loop

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* fix(setup): drain residual terminal events before secret input (#747) (#849)

* fix: skip the regression check
[skip-regression-check]

---------

Co-authored-by: Henry Park <henrypark133@gmail.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Illia Polosukhin <ilblackdragon@gmail.com>

* feat(agent): add context size logging before LLM prompt (#810)

* feat(agent): add context size logging before LLM prompt

---------

Co-authored-by: Henry Park <henrypark133@gmail.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Illia Polosukhin <ilblackdragon@gmail.com>

* fix: preserve text before tool-call XML in forced-text responses (#852)

* fix: preserve text before tool-call XML in forced-text responses (#789)

Local models (Qwen3, DeepSeek, GLM) emit <tool_call> XML even when no
tools are available (force_text mode). The existing strip_xml_tag()
discards everything from an unclosed opening tag onward, producing an
empty string that triggers the "I'm not sure how to respond" fallback.

Add truncate_at_tool_tags() — a code-region-aware pre-processing step
that truncates at the first tool-call XML tag BEFORE clean_response()
runs, preserving all useful text before the tag. Protect all 7
clean_response() call sites. Case-insensitive matching handles models
that emit <TOOL_CALL> or <Tool_Call> variants.

Secondary fix: add has_native_thinking() model detection to skip
<think>/<final> system prompt injection for models with built-in
reasoning (Qwen3, QwQ, DeepSeek-R1, GLM-Z1, etc.), preventing
thinking-only responses that clean to empty.

Wire with_model_name(active_model_name()) at all 9 production sites
that construct Reasoning, so the runtime model name (not static config)
drives system prompt generation.

126 new/updated tests covering truncation edge cases, code-block
awareness, Unicode, case-insensitivity, StubLlm integration for
complete/plan/evaluate_success/respond_with_tools paths, model
detection, and conditional system prompt generation.

Closes #789

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address Copilot review — unclosed-only truncation, ASCII case folding

- truncate_at_tool_tags() now only truncates at UNCLOSED tool tags;
  properly closed tags (e.g. <tool_call>...</tool_call>) are left intact
  for clean_response() to strip normally, preserving any text after them
- Switch from to_lowercase() to to_ascii_lowercase() to prevent byte
  offset misalignment with non-ASCII characters whose lowercase form
  has different byte length (e.g. Kelvin sign U+212A)
- Add closing_tag_for() helper to derive closing tags from open patterns
- Fix doc comment: "fenced markdown code blocks or inline code spans"
  (not "indented", which find_code_regions() doesn't detect)
- Add regression tests: closed vs unclosed for each tag variant,
  Unicode + case-insensitive offset safety, and mixed closed/unclosed
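
A simplified sketch of the two points above, handling a single `<tool_call>` tag and omitting code-region awareness. The function name is hypothetical and this is not the actual truncate_at_tool_tags() implementation:

```rust
/// Truncate `text` at the first `<tool_call>` tag that has no closing tag after it.
/// Matching is ASCII case-insensitive; closed tags are left in place.
pub fn truncate_at_unclosed_tool_call(text: &str) -> &str {
    const OPEN: &str = "<tool_call>";
    const CLOSE: &str = "</tool_call>";
    // ASCII folding never changes byte lengths, so offsets found in `lower`
    // are valid offsets into `text`. Full Unicode `to_lowercase()` can shrink
    // characters (U+212A is 3 bytes, its lowercase `k` is 1), shifting offsets.
    let lower = text.to_ascii_lowercase();
    let mut from = 0;
    while let Some(rel) = lower[from..].find(OPEN) {
        let open = from + rel;
        let after_open = open + OPEN.len();
        match lower[after_open..].find(CLOSE) {
            // Closed tag: leave it for later cleanup and keep searching after it.
            Some(close_rel) => from = after_open + close_rel + CLOSE.len(),
            // Unclosed tag: keep only the text before it.
            None => return &text[..open],
        }
    }
    text
}
```

Because the search runs over the ASCII-folded copy, slicing the original string at the found offset cannot land in the middle of a multi-byte character.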

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: minor review items — consistent ascii_lowercase, closing_tag_for tests

- Switch has_native_thinking() from to_lowercase() to to_ascii_lowercase()
  for consistency with truncate_at_tool_tags() approach
- Add unit tests for closing_tag_for(): standard tags, space-suffixed
  patterns, pipe-delimited tags, and exhaustive coverage of all
  TOOL_TAG_PATTERNS entries
- Add test for mixed closed+unclosed tags of different types

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* Feat/docker shell edition (#804)

---------

Co-authored-by: Henry Park <henrypark133@gmail.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(mcp): strip top-level null params before forwarding to MCP servers (#795)

* fix(mcp): strip top-level null params before forwarding to MCP servers

LLMs frequently emit `"field": null` for optional parameters in tool
calls. Many MCP servers reject explicit nulls for fields that should
simply be absent — e.g. Notion returns 400 for `"sort": null` in a
search call, expecting the field to be omitted entirely.

Strip top-level null keys from the params object before calling
`call_tool()`. Only top-level keys are stripped; nested nulls are
preserved since they may be semantically meaningful.
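
As a rough illustration of the stripping rule, assuming a minimal stand-in `Value` enum rather than the JSON type the MCP client actually uses (the function name is also hypothetical):

```rust
/// Minimal stand-in for a JSON value; only the shapes needed here.
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Null,
    Str(String),
    Object(Vec<(String, Value)>),
}

/// Drop keys whose value is null at the top level only.
/// Nested nulls and non-object params pass through unchanged.
pub fn strip_top_level_nulls(params: Value) -> Value {
    match params {
        Value::Object(fields) => Value::Object(
            fields
                .into_iter()
                .filter(|(_, v)| !matches!(v, Value::Null))
                .collect(),
        ),
        other => other,
    }
}
```

For a search call with `"sort": null` and a nested `"filter": {"owner": null}`, this removes `sort` and keeps the nested null.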

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Illia Polosukhin <ilblackdragon@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* Add event-triggered routines and workflow skill templates (#756)

* Add event-triggered routines and workflow skill templates

* fix: address PR review feedback for event_emit security and quality

Security fixes:
- Require approval (UnlessAutoApproved) for event_emit, matching routine_fire
- Enable sanitization on event_emit payload (external JSON reaches LLM)
- Remove user_id parameter from event_emit to prevent IDOR — always use ctx.user_id

Correctness fixes:
- Rename source → event_source in event_emit for consistency with routine_create
- Use json_value_as_filter_string for filter parsing (handles numbers/booleans)
- Case-insensitive matching for event source and event_type
- Add debug logging for missing filter keys in payload
- Fix skill_install_routine_webhook_sim test missing .with_skills()
- Fix schema_validator test for event_emit payload properties

Code quality:
- Move EventEmitTool struct/impl after RoutineHistoryTool (fix split layout)
- Deduplicate routine_to_info into RoutineInfo::from_routine in types.rs
- Add test section headers in e2e_routine_heartbeat.rs
- Clarify event_emit description to specify system_event routines only

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: make routine_system_event_emit test create routine before emitting

- Add routine_create step to trace fixture so event_emit has a matching
  routine to fire
- Assert fired_routines > 0, not just key presence (Copilot review)
- Add .with_auto_approve_tools(true) since event_emit now requires approval

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: renumber test headers after system_event test insertion

Test 4 was duplicated (routine_cooldown and heartbeat_findings).
Renumber heartbeat_findings to Test 5 and heartbeat_empty_skip to Test 6.

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: merge staging and add missing RoutineEngine args in test

RoutineEngine::new on staging requires `tools` and `safety` params.
Update system_event_trigger_matches_and_filters test to pass them.

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address new Copilot review comments

- Add .with_auto_approve_tools(true) to skill_install_routine_webhook_sim
  test so event_emit doesn't block on approval
- Fix module-level doc comment for event_emit to specify system_event trigger

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: deduplicate json_value_as_string helper

Remove private `json_value_as_string` from routine_engine.rs and use
the identical public `json_value_as_filter_string` from routine.rs,
eliminating divergence risk. (Copilot review)

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Henry Park <henrypark133@gmail.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: enable WASM credential injection in No-DB environments (#845)

* fix(wasm): enable credential injection in no-DB environments via env var fallback

When a secrets store is unavailable (e.g. no-DB mode), WASM channel
credentials were silently not injected, causing channels to start without
credentials. Fix by:

- Changing `inject_channel_credentials_from_secrets` to accept
  `Option<&dyn SecretsStore>` — secrets store is tried first when present
- Adding env var fallback (`inject_env_credentials`) for credentials not
  covered by the secrets store
- Enforcing a channel-name prefix security check on env var names to
  prevent WASM channels from reading unrelated host credentials
  (e.g. `AWS_SECRET_ACCESS_KEY`)
- Extracting pure `resolve_env_credentials` helper for testability
- Adding case-insensitive prefix matching for secrets store lookup

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(wasm): inject credentials at startup when no secrets store (setup.rs path)

The startup path (setup_wasm_channels -> register_channel) was guarded by
`if let Some(secrets) = secrets_store`, so in No-DB mode credentials were
never injected and the channel started without them.

Fix by:
- Changing inject_channel_credentials to accept Option<&dyn SecretsStore>
- Always calling it (removing the if-let guard) — env var fallback runs
  even when secrets_store is None
- Adding channel-name prefix security check to the env var fallback path
  (e.g. TELEGRAM_ for channel "telegram"), consistent with manager.rs

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(test): correct misleading comment on ICTEST1_UNRELATED_OTHER placeholder

* fix(wasm): guard against empty channel name in credential injection

An empty channel_name would produce prefix "_", allowing any env var
starting with "_" to pass the security check and be injected. Add an
early-return guard in resolve_env_credentials, inject_env_credentials,
and inject_channel_credentials. Add a test to cover this path.
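
The prefix check and the empty-name guard described above can be sketched roughly as follows. This is an illustration, not the code from the commit: the signature of `resolve_env_credentials`, the `required` list, and the `lookup` closure (standing in for `std::env::var`) are all assumptions.

```rust
/// Hypothetical signature; the real helper's parameters are not shown in the commit.
/// `required` lists the env var names a channel asks for, and `lookup`
/// abstracts the environment so the function stays pure and testable.
pub fn resolve_env_credentials<F>(
    channel_name: &str,
    required: &[&str],
    lookup: F,
) -> Vec<(String, String)>
where
    F: Fn(&str) -> Option<String>,
{
    // Guard: an empty channel name would produce the prefix "_", which
    // would let any variable starting with an underscore through.
    if channel_name.is_empty() {
        return Vec::new();
    }

    // e.g. "telegram" -> "TELEGRAM_"
    let prefix = format!("{}_", channel_name.to_uppercase());

    let mut resolved = Vec::new();
    for &name in required {
        // Only names carrying the channel prefix may be read from the host.
        if name.starts_with(&prefix) {
            if let Some(value) = lookup(name) {
                resolved.push((name.to_string(), value));
            }
        }
    }
    resolved
}
```

With this shape, a channel named `telegram` that asks for `AWS_SECRET_ACCESS_KEY` gets nothing back for that name, and an empty channel name resolves no credentials at all.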

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: lizican123 <lizican123@gmail.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: promote to main (#878)

* fix: replace unsafe env::set_var with thread-safe inject_single_var in SIGHUP handler

Fixes race condition where SIGHUP handler modifies global environment variables
while other threads may be reading them via Config::from_env().

Changes:
- Replace unsafe { std::env::set_var() } with ironclaw::config::inject_single_var()
- Uses INJECTED_VARS mutex instead of unsafe global state modification
- All reads via optional_env() check the thread-safe overlay first
- Prevents data races between SIGHUP reload and concurrent config reads
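
A minimal sketch of the overlay idea, assuming a process-wide map behind a mutex. The names `INJECTED_VARS`, `inject_single_var`, and `optional_env` come from the commit message; their signatures and the `overlay` helper here are guesses for illustration.

```rust
use std::collections::HashMap;
use std::sync::{Mutex, OnceLock};

// Assumed shape: a lazily initialized map guarded by a mutex.
static INJECTED_VARS: OnceLock<Mutex<HashMap<String, String>>> = OnceLock::new();

// Hypothetical helper that returns the shared overlay.
fn overlay() -> &'static Mutex<HashMap<String, String>> {
    INJECTED_VARS.get_or_init(|| Mutex::new(HashMap::new()))
}

/// Record a value in the overlay instead of mutating the process environment.
pub fn inject_single_var(key: &str, value: &str) {
    let mut map = overlay().lock().unwrap_or_else(|e| e.into_inner());
    map.insert(key.to_string(), value.to_string());
}

/// Read a variable, checking the overlay first and the real environment second.
pub fn optional_env(key: &str) -> Option<String> {
    let injected = overlay()
        .lock()
        .unwrap_or_else(|e| e.into_inner())
        .get(key)
        .cloned();
    injected.or_else(|| std::env::var(key).ok())
}
```

Because writers never call `std::env::set_var`, concurrent readers only contend on the mutex rather than racing on the process environment.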

Verification:
- All 2,787 tests pass
- Zero clippy warnings
- Code compiles successfully

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

* fix: spawn webhook restart as background task to avoid blocking I/O across lock

Prevents holding Mutex lock during async I/O operations (TcpListener::bind,
task shutdown). The SIGHUP handler no longer blocks webhook processing during
listener restart.

Changes:
- Read old_addr and drop lock immediately
- Spawn restart_with_addr() as background task via tokio::spawn
- Lock is held only during the actual restart operation, not for the whole signal handler

Benefits:
- SIGHUP handler returns immediately without blocking
- Webhook requests not delayed by listener restart I/O
- Lock contention significantly reduced

Verification:
- All 2,787 tests pass
- Zero clippy warnings
- Code compiles successfully

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

* fix: add graceful shutdown mechanism for SIGHUP handler background task

The SIGHUP handler previously ran an unbounded loop with no cancellation mechanism.
It now listens for a shutdown signal and exits cleanly during graceful termination.

Changes:
- Create broadcast channel for shutdown signaling
- SIGHUP handler uses tokio::select! to wait for shutdown or SIGHUP
- Send shutdown signal to all background tasks after agent.run() completes
- Ensures clean task lifecycle and no orphaned background tasks

Benefits:
- Proper task cancellation during graceful shutdown
- Follows Tokio best practices for background task management
- No background tasks orphaned when runtime shuts down

Verification:
- All 2,787 tests pass
- Zero clippy warnings
- Code compiles successfully

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

* refactor: replace stringly-typed parameter filtering with typed enum and single helper

Fixes DRY violation where unsupported parameter filtering was duplicated across
rig_adapter.rs and anthropic_oauth.rs using string contains checks.

Changes:
- Add UnsupportedParam typed enum in provider.rs (Temperature, MaxTokens, StopSequences)
- Create strip_unsupported_completion_params() helper function
- Create strip_unsupported_tool_params() helper function
- Update rig_adapter.rs to use shared helpers
- Update anthropic_oauth.rs to use shared helpers
- Replace 60+ lines of duplicate stringly-typed logic

Benefits:
- Type safety: parameter names checked at compile time
- Single source of truth: adding a new param updates one place
- Reduced maintenance burden: no duplicate logic to keep in sync
- Better code clarity: named enum variant is self-documenting
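
As a rough illustration of the typed approach, the sketch below pairs the enum with one strip helper. The enum variants and the helper name are taken from the commit message; the `CompletionRequest` fields and the helper's signature are assumptions.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum UnsupportedParam {
    Temperature,
    MaxTokens,
    StopSequences,
}

// Hypothetical request shape; the real struct has more fields.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct CompletionRequest {
    pub temperature: Option<f32>,
    pub max_tokens: Option<u32>,
    pub stop_sequences: Option<Vec<String>>,
}

/// Clear every field the provider is declared not to support.
pub fn strip_unsupported_completion_params(
    req: &mut CompletionRequest,
    unsupported: &[UnsupportedParam],
) {
    for param in unsupported {
        // The match is exhaustive, so adding a variant forces an update here.
        match param {
            UnsupportedParam::Temperature => req.temperature = None,
            UnsupportedParam::MaxTokens => req.max_tokens = None,
            UnsupportedParam::StopSequences => req.stop_sequences = None,
        }
    }
}
```

The exhaustive `match` is what gives the compile-time check: a new variant will not build until every strip helper decides what to do with it.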

Verification:
- All 2,787 tests pass
- Zero clippy warnings
- Code compiles successfully

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

* docs: clarify intentional parameter asymmetry between completion and tool requests

Add documentation explaining why strip_unsupported_tool_params does not handle
StopSequences: the field doesn't exist in ToolCompletionRequest.

Changes:
- Add clarifying comments to strip_unsupported_tool_params()
- Explain why StopSequences is only in CompletionRequest
- Note that ToolCompletionRequest only supports Temperature and MaxTokens
- Inline comment confirms no action needed for StopSequences

This addresses the appearance of incomplete implementation without changing logic,
as the asymmetry is intentional and correct (ToolCompletionRequest lacks the field).

Verification:
- All 2,787 tests pass
- Zero clippy warnings
- Code compiles successfully

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

* perf: isolate webhook_secret to reduce lock contention on hot path

Move webhook_secret from shared HttpChannelState RwLock into its own Arc<RwLock<>>.
This eliminates contention between secret validation and other state operations.

Changes:
- Change webhook_secret field type from RwLock<Option<SecretString>> to Arc<RwLock<Option<SecretString>>>
- Update initialization in HttpChannel::new()
- Update comments to explain isolation rationale

Benefits:
- Reduce lock contention on webhook request hot path (secret validation)
- Rarely-changing field (SIGHUP only) isolated from frequent state accesses
- Other state operations (tx, pending_responses) no longer wait behind secret reads
- Minimal code change: only field declaration and initialization

The Arc wrapper allows cloning the RwLock handle to separate concerns. With this
change, every webhook request acquires the dedicated secret lock for validation
instead of the shared HttpChannelState lock. This scales better under high request volume.
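
The isolation can be pictured with the sketch below, which uses `std::sync::RwLock` and a plain `String` in place of the async lock and `SecretString` the project presumably uses. The method names `secret_handle` and `secret_matches` are hypothetical.

```rust
use std::sync::{Arc, RwLock};

// Hypothetical, reduced state: only the isolated secret is modeled.
pub struct HttpChannelState {
    webhook_secret: Arc<RwLock<Option<String>>>,
}

impl HttpChannelState {
    pub fn new(secret: Option<String>) -> Self {
        Self {
            webhook_secret: Arc::new(RwLock::new(secret)),
        }
    }

    /// Hand out a clone of the handle so a reload path can swap the secret
    /// without touching any other channel state.
    pub fn secret_handle(&self) -> Arc<RwLock<Option<String>>> {
        Arc::clone(&self.webhook_secret)
    }

    /// Hot path: takes only the secret's read lock.
    /// (Real code should compare in constant time.)
    pub fn secret_matches(&self, provided: &str) -> bool {
        self.webhook_secret.read().unwrap().as_deref() == Some(provided)
    }
}
```

A SIGHUP reload would write through the cloned handle, while request handlers keep taking cheap read locks on the same `RwLock`.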

Verification:
- All 2,787 tests pass
- Zero clippy warnings
- Code compiles successfully

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

* fix: prevent partial state corruption on SIGHUP restart failure

Ensure atomicity of configuration reload: if webhook listener restart fails,
secret update is skipped to prevent inconsistent state.

Changes:
- Wait for restart_with_addr() to complete (don't spawn background task)
- Track restart result with restart_failed flag
- Only update secret if restart succeeded or wasn't needed
- Ensure listener and secret stay synchronized

Problem addressed:
- Before: restart spawned as background task, secret updated immediately
- If restart failed, secret was changed but listener still on old address
- This left system in inconsistent state (partial corruption)

Solution:
- Make restart blocking (SIGHUP handler can wait, it's not on request hot path)
- Atomically update secret only after successful restart
- Flag prevents race between restart and secret update

Benefits:
- Configuration changes are atomic (both succeed or both fail together)
- No partial state corruption on restart failure
- Failed restarts don't silently leave inconsistent state
- Secret and listener address stay in sync
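
The ordering described above (restart first, update the secret only on success) might look like this in a simplified, synchronous form. `ReloadState`, `apply_reload`, and the `restart` closure are hypothetical stand-ins; the real handler is async and calls `restart_with_addr()`.

```rust
// Hypothetical snapshot of the two values that must stay in sync.
pub struct ReloadState {
    pub listen_addr: String,
    pub secret: Option<String>,
}

/// Apply a new address and secret as one unit.
/// `restart` stands in for the listener restart and may fail.
pub fn apply_reload<F>(
    state: &mut ReloadState,
    new_addr: &str,
    new_secret: Option<String>,
    restart: F,
) -> Result<(), String>
where
    F: FnOnce(&str) -> Result<(), String>,
{
    if state.listen_addr != new_addr {
        // On failure, return before the secret is touched so the old
        // listener and the old secret remain paired.
        restart(new_addr)?;
        state.listen_addr = new_addr.to_string();
    }
    // Reached only if the restart succeeded or was not needed.
    state.secret = new_secret;
    Ok(())
}
```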

Verification:
- All 2,787 tests pass
- Zero clippy warnings
- Code compiles successfully

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

* refactor: generalize hot-secret-swapping with ChannelSecretUpdater trait

Decouple SIGHUP handler from HTTP channel internals by introducing a trait
for channels that support zero-downtime secret updates.

Changes:
- Add ChannelSecretUpdater trait in channels/channel.rs
- Implement ChannelSecretUpdater for HttpChannelState
- Export trait from channels module
- Update SIGHUP handler to use trait-based secret updater collection
- Replace explicit HTTP channel knowledge with generic updater loop

Benefits:
- SIGHUP handler no longer depends on HttpChannelState details
- Tight coupling removed: main.rs doesn't need HTTP channel imports
- Extensible: new channels can opt-in by implementing the trait
- Scalable: multiple channels supported without main.rs changes
- Maintainable: adding channels requires only trait implementation, not SIGHUP handler edits

Pattern:
- ChannelSecretUpdater trait defines the interface for all updaters
- Channels that support hot-secret-swapping implement the trait
- SIGHUP handler loops through all registered updaters generically
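
A reduced, synchronous sketch of the pattern follows. The trait and type names come from the commit message; the method name `update_secret`, its signature, the `current_secret` accessor, and the `reload_secrets` loop are assumptions (the real trait is likely async and uses `SecretString`).

```rust
use std::sync::{Arc, RwLock};

/// Interface for channels that can swap their secret without a restart.
pub trait ChannelSecretUpdater: Send + Sync {
    fn update_secret(&self, new_secret: Option<String>);
}

// Hypothetical, reduced channel state.
pub struct HttpChannelState {
    webhook_secret: Arc<RwLock<Option<String>>>,
}

impl HttpChannelState {
    pub fn new(secret: Option<String>) -> Self {
        Self {
            webhook_secret: Arc::new(RwLock::new(secret)),
        }
    }

    pub fn current_secret(&self) -> Option<String> {
        self.webhook_secret.read().unwrap().clone()
    }
}

impl ChannelSecretUpdater for HttpChannelState {
    fn update_secret(&self, new_secret: Option<String>) {
        *self.webhook_secret.write().unwrap() = new_secret;
    }
}

/// What the SIGHUP handler does generically: no HTTP-specific knowledge.
pub fn reload_secrets(updaters: &[Arc<dyn ChannelSecretUpdater>], new_secret: Option<String>) {
    for updater in updaters {
        updater.update_secret(new_secret.clone());
    }
}
```

A new channel type opts in by implementing the trait and registering itself in the updater list; the reload loop does not change.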

Verification:
- All 2,787 tests pass
- Zero clippy warnings
- Code compiles successfully

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

* feat: validate parameter names at deserialization time, not just tests

Add custom serde deserializer for unsupported_params that validates parameter
names at runtime when loading providers.json (or user overrides).

Changes:
- Add unsupported_params_de module with custom deserializer
- Only allows: "temperature", "max_tokens", "stop_sequences"
- Invalid parameter names cause immediate deserialization error
- Update ProviderDefinition to use custom deserializer
- Enhanced test with explicit parameter name validation
- Add new test that verifies invalid parameters are rejected

Problem solved:
- Before: Invalid param names (e.g., "temperrature") silently ignored
- Now: Rejected at deserialization time with clear error message
- Prevents runtime failures caused by typos in configuration

Example error:
  unsupported parameter name 'temperrature': must be one of: temperature, max_tokens, stop_sequences

Benefits:
- Fail-fast: errors caught when loading config, not later while serving requests
- Clear feedback: error message lists valid parameter names
- Type safety: validators run during deserialization
- Configuration errors detected immediately, not silently ignored
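
The validation step itself can be sketched without serde, as a `FromStr` implementation that a custom deserializer could call for each string. The error text mirrors the example above; `parse_unsupported_params` is a hypothetical helper, and the actual serde wiring in `unsupported_params_de` is not shown in the commit message.

```rust
use std::str::FromStr;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum UnsupportedParam {
    Temperature,
    MaxTokens,
    StopSequences,
}

impl FromStr for UnsupportedParam {
    type Err = String;

    fn from_str(name: &str) -> Result<Self, Self::Err> {
        match name {
            "temperature" => Ok(Self::Temperature),
            "max_tokens" => Ok(Self::MaxTokens),
            "stop_sequences" => Ok(Self::StopSequences),
            // Any other name, including typos, is rejected with the valid set listed.
            other => Err(format!(
                "unsupported parameter name '{other}': must be one of: temperature, max_tokens, stop_sequences"
            )),
        }
    }
}

/// Hypothetical helper: validate a whole list, failing on the first bad name.
pub fn parse_unsupported_params(names: &[&str]) -> Result<Vec<UnsupportedParam>, String> {
    names
        .iter()
        .map(|name| name.parse::<UnsupportedParam>())
        .collect()
}
```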

Verification:
- All 2,788 tests pass (including new validation test)
- Zero clippy warnings
- Code compiles successfully

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Haiku 4.5 <noreply@anthropic.com>

* merge: resolve conflicts for PR #800 and #822 into staging (#881)

* fix(ci): secrets can't be used in step if conditions [skip-regression-check] (#787)

GitHub Actions step-level `if:` doesn't have access to `secrets` context.
Replace `if: secrets.X != ''` with `continue-on-error: true` and let
the Set token step handle the fallback.

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(ci): clean up staging pipeline — remove hacks, skip redundant checks [skip-regression-check] (#794)

- Remove continue-on-error from staging-ci.yml app token steps (secrets are configured)
- Skip test.yml and code_style.yml on PRs targeting staging (staging-ci.yml
  already runs tests before promoting, promotion PR gets full CI on main)
- Allow ironclaw-ci[bot] in Claude Code review for bot-created promotion PRs

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* fix(ci): run fmt + clippy on staging PRs, skip Windows clippy [skip-regression-check] (#802)

- Remove branches:[main] filter from code_style.yml so it runs on all PRs
- Gate clippy-windows with `if: github.base_ref == 'main'` (skip on staging PRs)
- Update rollup job to allow skipped clippy-windows
- Simplify claude-review.yml to only trigger on labeled event (avoids duplicate runs)

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* feat: persist user_id in save_job and expose job_id on routine runs (#709)

* feat: persist worker events to DB and fix activity tab rendering

In-process Worker (used by Scheduler::dispatch_job) now persists events
via save_job_event at key execution points: plan creation, LLM
responses, tool_use, tool_result, and job completion/failure/stuck.
Event data shapes match the container worker format so the gateway
activity tab renders them correctly.

Frontend: tool_result errors now show a red X icon with danger styling
instead of a silent empty output. The result event falls back to the
error field when message is absent.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: wire RoutineEngine into gateway for direct manual trigger firing

Replace the message-channel hack in routines_trigger_handler with a
direct call to RoutineEngine::fire_manual(), ensuring FullJob routines
dispatch correctly when triggered from the web UI. Inject the engine
into GatewayState from Agent::run after construction.

Also persists user_id in save_job for both PG and libSQL backends,
removes the source='sandbox' filter so all jobs are visible, and
exposes job_id on RoutineRunInfo for the frontend job link.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: remove stale gateway_state argument from Agent::new test call sites

The gateway_state parameter was removed from Agent::new during rebase
(replaced by post-construction set_routine_engine_slot), but three test
call sites still passed the extra None argument.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address PR review — restore sandbox source filter, remove blank lines

- Revert removal of `source = 'sandbox'` filter in all SandboxStore
  queries (8 sites across PG and libSQL). Sandbox-specific APIs should
  stay scoped to sandbox jobs; unified job listing for the Jobs tab
  should use a separate query path.
- Remove extra blank lines in agent_loop.rs and worker.rs that caused
  formatting CI failure.

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address review — regenerate Cargo.lock, add user_id regression test

- Regenerate Cargo.lock from main's lockfile to eliminate dependency
  version downgrades (anyhow, syn, etc.) that were churn from rebase.
- Add regression test verifying user_id round-trips through save_job
  and get_job in the libSQL backend.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* style: remove trailing blank line in libsql jobs.rs

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* test: add Postgres-side regression test for user_id persistence in save_job

Mirrors the existing libSQL test (test_save_job_persists_user_id) for the
Postgres backend. Gated behind #[cfg(feature = "postgres")] + #[ignore]
since it requires a running PostgreSQL instance (integration tier).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* refactor: unify three agentic loops into single AgenticLoop engine (#654)

Replace three independent copy-pasted agentic loops (dispatcher, worker,
container runtime) with a single shared engine in `agentic_loop.rs` that
all consumers customize via the `LoopDelegate` trait.

Phase 1 — Shared engine (`src/agent/agentic_loop.rs`, 205 lines):
  - `run_agentic_loop()` owns the core LLM → tool exec → repeat cycle
  - `LoopDelegate` trait (Send + Sync, &dyn dispatch) with 6 hook points
  - Tool intent nudge logic consolidated (was duplicated in 3 files)
  - Iteration limit + force-text behavior preserved

Phase 2 — Three delegate implementations:
  - `ChatDelegate` (dispatcher.rs): 3-phase approval flow, hooks, cost
    guard, context compaction, skill attenuation, interruption
  - `JobDelegate` (worker/job.rs): planning pre-loop phase, parallel
    JoinSet exec, mark_completed/stuck/failed, SSE streaming, self-repair
  - `ContainerDelegate` (worker/container.rs): sequential tool exec,
    HTTP-proxied LLM, container-safe tools, credential injection

Phase 3 — File moves and cleanup:
  - Delete `src/agent/worker.rs` — job logic moved to `src/worker/job.rs`
  - Rename `src/worker/runtime.rs` → `src/worker/container.rs`
  - Re-export `Worker`/`WorkerDeps` from `crate::worker` in `agent/mod.rs`
  - Update `scheduler.rs` imports to new worker location

Shared helpers (`src/tools/execute.rs`):
  - `execute_tool_with_safety()` replaces 4 copies of validate → timeout
    → execute → serialize
  - `process_tool_result()` replaces 3 copies of sanitize → wrap →
    ChatMessage (also used by thread_ops.rs approval resume paths)

Net result: -2,408 lines, zero duplicated loop logic, single code path
for tool intent nudge and completion detection.
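
To make the shape of the shared engine concrete, here is a heavily reduced, synchronous sketch. `run_agentic_loop`, `LoopDelegate`, and `LoopOutcome::MaxIterations` are named in the commit messages; everything else (the two hooks shown instead of six, `LlmResponse`, `LoopOutcome::Response`, `ScriptedDelegate`) is invented for illustration.

```rust
use std::collections::VecDeque;

#[derive(Debug, Clone, PartialEq)]
pub enum LlmResponse {
    Text(String),
    ToolCall { name: String },
}

#[derive(Debug, Clone, PartialEq)]
pub enum LoopOutcome {
    Response(String),
    MaxIterations,
}

/// Consumers customize the loop through this trait (reduced to two hooks here).
pub trait LoopDelegate {
    fn call_llm(&mut self, context: &[String]) -> LlmResponse;
    fn execute_tool(&mut self, name: &str) -> String;
}

/// Core cycle: ask the LLM, run any requested tool, repeat until text or the cap.
pub fn run_agentic_loop(
    delegate: &mut dyn LoopDelegate,
    context: &mut Vec<String>,
    max_iterations: usize,
) -> LoopOutcome {
    for _ in 0..max_iterations {
        match delegate.call_llm(context) {
            LlmResponse::Text(text) => return LoopOutcome::Response(text),
            LlmResponse::ToolCall { name } => {
                let output = delegate.execute_tool(&name);
                context.push(output);
            }
        }
    }
    LoopOutcome::MaxIterations
}

/// Example delegate that replays a fixed script of LLM responses.
pub struct ScriptedDelegate {
    pub responses: VecDeque<LlmResponse>,
}

impl LoopDelegate for ScriptedDelegate {
    fn call_llm(&mut self, _context: &[String]) -> LlmResponse {
        self.responses
            .pop_front()
            .unwrap_or_else(|| LlmResponse::Text(String::new()))
    }

    fn execute_tool(&mut self, name: &str) -> String {
        format!("result of {name}")
    }
}
```

The dispatcher, job worker, and container runtime would each supply their own delegate while sharing the single loop body.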

Closes #654

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address review feedback from Copilot

1. scheduler.rs: Replace `unwrap_or` fallback with proper error
   propagation when parsing tool output JSON — surfaces bugs instead
   of silently changing the output type.

2. worker/job.rs: Drop MutexGuard before the cancellation `.await` in
   `check_signals()` to avoid holding a lock across an async I/O call
   (prevents `await_holding_lock` lint).

3. worker/job.rs: Restore consecutive rate-limit counter
   (MAX_CONSECUTIVE_RATE_LIMITS = 10) so sustained rate limiting marks
   the job stuck with "Persistent rate limiting" instead of silently
   burning through max_iterations.
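
The consecutive rate-limit counter from item 3 could be modeled as below. The constant name and value come from the commit message; `RateLimitTracker`, its methods, and the exact threshold comparison are assumptions.

```rust
pub const MAX_CONSECUTIVE_RATE_LIMITS: u32 = 10;

// Hypothetical tracker; the real counter lives inside the job worker.
#[derive(Debug, Default)]
pub struct RateLimitTracker {
    consecutive: u32,
}

impl RateLimitTracker {
    /// Record a rate-limited LLM call.
    /// Returns true once the consecutive limit has been reached.
    pub fn on_rate_limited(&mut self) -> bool {
        self.consecutive += 1;
        self.consecutive >= MAX_CONSECUTIVE_RATE_LIMITS
    }

    /// Any successful call breaks the streak.
    pub fn on_success(&mut self) {
        self.consecutive = 0;
    }
}
```

When `on_rate_limited` returns true, the caller would stop the job with the "Persistent rate limiting" reason instead of continuing to spend iterations.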

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: incorporate staging changes — token budget tracking + mark_failed

Merge staging's changes into the refactored JobDelegate:
- Add token budget tracking in call_llm (update_context/add_tokens)
- mark_stuck → mark_failed for iteration cap and rate-limit exhaustion
  (aligns with staging's #788 fix)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address zmanian's PR review — eliminate type erasure, clean up

Address all 6 review points from zmanian on PR #800:

1. Replace LoopOutcome::Custom(Box<dyn Any>) with typed
   LoopOutcome::NeedApproval(Box<PendingApproval>) — eliminates
   type erasure and downcast, resolves clippy large_enum_variant.

2. Remove dead max_tool_iterations field from ChatDelegate struct.

3. Add on_tool_intent_nudge() hook to LoopDelegate trait with
   implementations in Job and Container delegates for observability.

4. Fix SSE events in job worker to emit raw sanitized content
   instead of XML-wrapped <tool_output> tags.

5. Remove 4 duplicate completion tests from job.rs that were
   already covered by the shared util module.

6. Avoid logging full tool results — use result_size_bytes in
   debug logs (execute.rs, job.rs).

Also updates path references in CLAUDE.md, COVERAGE_PLAN.md,
and add-sse-event.md command.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat(doctor): expand diagnostics from 7 to 16 health checks

* test: add unit tests for agentic_loop and execute shared modules

Add 16 tests covering the two new critical shared modules:

agentic_loop.rs (10 tests):
- Text response exits loop immediately
- Tool call → text response continuation
- LoopSignal::Stop exits before LLM call
- LoopSignal::InjectMessage adds user message to context
- Max iterations terminates with LoopOutcome::MaxIterations
- Tool intent nudge fires twice then caps
- before_llm_call early exit bypasses LLM
- truncate_for_preview: short string, long string, multibyte safety
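
For the multibyte-safety case, a truncation helper has to cut on a UTF-8 character boundary. The sketch below shows one way to do that; the real `truncate_for_preview` signature and return type are not given in the commit message, so this version is an assumption.

```rust
/// Return at most `max_bytes` bytes of `s` without splitting a character.
pub fn truncate_for_preview(s: &str, max_bytes: usize) -> &str {
    if s.len() <= max_bytes {
        return s;
    }
    // Walk back from the limit until the cut lands on a char boundary.
    // Index 0 is always a boundary, so the loop terminates.
    let mut end = max_bytes;
    while !s.is_char_boundary(end) {
        end -= 1;
    }
    &s[..end]
}
```

Slicing at an arbitrary byte index would panic on strings such as `"héllo"`, where `é` occupies two bytes.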

execute.rs (6 tests):
- execute_tool_with_safety success path
- Missing tool returns ToolError::NotFound
- Tool execution failure propagates
- Per-tool timeout enforcement (50ms)
- process_tool_result XML wrapping on success
- process_tool_result error formatting

All 2,777 unit tests pass, 0 clippy warnings.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* style: cargo fmt

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address code review — 9 issues across agentic loop, job worker, container

CRITICAL fixes:
- Rate-limit exhaustion now returns Err(LlmError::RateLimited) instead of
  Ok(Text("")), stopping the loop immediately with no ghost iteration.
  Below-threshold retries still use Text("") with an explicit empty-string
  guard in handle_text_response to skip injection.
- check_signals drains the entire message channel before returning,
  prioritizing Stop over UserMessage. Previously returned early on first
  UserMessage, silently dropping any queued Stop or additional messages.
- check_signals now detects all non-progressing job states (Cancelled,
  Failed, Stuck, Completed, Submitted, Accepted) instead of only
  Cancelled and Failed.
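
The drain-then-prioritize behavior of `check_signals` can be illustrated with a standard-library channel. The real implementation is async and returns a `LoopSignal`; the `Signal` and `Drained` types and the `drain_signals` name here are hypothetical.

```rust
use std::sync::mpsc::Receiver;

#[derive(Debug, Clone, PartialEq)]
pub enum Signal {
    Stop,
    UserMessage(String),
}

/// Everything found in the channel during one check.
#[derive(Debug, Default, PartialEq)]
pub struct Drained {
    pub stop: bool,
    pub messages: Vec<String>,
}

/// Empty the channel completely instead of returning on the first message,
/// so a Stop queued behind user messages is never lost.
pub fn drain_signals(rx: &Receiver<Signal>) -> Drained {
    let mut drained = Drained::default();
    while let Ok(signal) = rx.try_recv() {
        match signal {
            Signal::Stop => drained.stop = true,
            Signal::UserMessage(text) => drained.messages.push(text),
        }
    }
    drained
}
```

A caller would check `stop` first and only inject the collected messages if no stop was requested.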

HIGH fixes:
- Error path in process_tool_result_job applies truncate_for_preview to
  bound error strings in SSE/DB events (was unbounded).
- Document Send+Sync lifetime constraint on LoopDelegate trait.
- Test mock before_llm_call refactored from double-lock to single lock
  acquisition, eliminating deadlock risk on refactor.

MEDIUM fixes:
- CompletionReport includes actual iteration count via shared
  Arc<Mutex<u32>> tracker (was hardcoded 0).
- process_tool_result_job return type changed from Result<bool> to
  Result<()> — the bool was always false (dead API).
- Deduplicate truncate in container.rs; now uses truncate_for_preview
  from agentic_loop.

Verified: 0 clippy warnings, 2781 tests pass, cargo fmt clean.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Henry Park <henrypark133@gmail.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Illia Polosukhin <ilblackdragon@gmail.com>
Co-authored-by: Umesh Kumar Singh <brijbiharisingh1971@outlook.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>

* Revert "Feat/docker shell edition" + fix fmt/clippy (#886)

* Revert "Feat/docker shell edition (#804)"

This reverts commit c566faf28fb77c2fa4df92c2947fb48f1a25df9b.

* style: fix formatting issues from revert

Run cargo fmt to fix formatting across 7 files after the revert of
the docker shell edition feature.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* refactor: centralize test cre…
bkutasi pushed a commit to bkutasi/ironclaw that referenced this pull request Mar 28, 2026
* fix: Channel HTTP: server doesn't start after config change (no hot-r… (#779)

* fix: Channel HTTP: server doesn't start after config change (no hot-reload)

* review fixes

* review fixes

* fix linter

* fix code style

* fix: prevent session lock contention blocking message processing (#783)

* fix: prevent session lock contention blocking message processing

## Problem
After container restart, POST /api/chat/send returns 202 ACCEPTED but messages
don't appear in conversation_messages and agent never responds. Messages get
stuck in "stale state" after restart.

Root cause: Session lock was held for entire duration of chat_threads_handler
and chat_history_handler, including during slow database queries. This blocked
the agent loop from acquiring the session lock to process incoming messages,
causing them to hang indefinitely.

## Solution
1. **Release session lock early in chat_threads_handler**: Only acquire lock
   when reading active_thread at response time, not during DB queries for
   thread list. DB operations no longer block message processing.

2. **Release session lock early in chat_history_handler**: Only acquire lock
   when accessing in-memory thread state, not during paginated DB queries or
   thread ownership checks. DB operations no longer block message processing.

3. **Add comprehensive logging**: Track message flow from receipt through
   session resolution, thread hydration, and state transitions. Helps diagnose
   future issues:
   - Message queued to agent loop (chat_send_handler)
   - Processing message from channel (handle_message)
   - Hydrating thread from DB (maybe_hydrate_thread)
   - Resolving session and thread (resolve_thread)
   - Checking thread state (process_user_input)
   - Persisting user message (persist_user_message)
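
The lock-scoping change in items 1 and 2 follows a common pattern: do the slow work with no lock held, then take the lock only long enough to copy what is needed. The sketch below uses `std::sync::Mutex` and invented names (`Session`, `list_threads`, the `load_threads_from_db` closure); the real handlers are async.

```rust
use std::sync::Mutex;

// Hypothetical, reduced session state.
pub struct Session {
    pub active_thread: Option<String>,
}

/// Build a thread-list response without holding the session lock during the query.
pub fn list_threads<F>(
    session: &Mutex<Session>,
    load_threads_from_db: F,
) -> (Vec<String>, Option<String>)
where
    F: FnOnce() -> Vec<String>,
{
    // Slow part first: no lock is held, so message processing is not blocked.
    let threads = load_threads_from_db();

    // Short critical section: clone the one field we need, then release.
    let active_thread = {
        let guard = session.lock().unwrap();
        guard.active_thread.clone()
    };

    (threads, active_thread)
}
```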

## Impact
- Message processing no longer blocks on session lock contention
- API response times for thread list/history queries unaffected (DB queries
  still happen, but lock is not held)
- Better diagnostics for future debugging

## Testing
- All 2756 tests pass
- Code compiles with zero clippy warnings
- No changes to user-facing API or behavior, only lock timing

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

* security: redact PII from info-level logs

Downgrade user_id and channel logging to debug level to prevent exposing
Personally Identifiable Information (PII) in production logs.

The user_id field can contain sensitive information such as phone numbers
(e.g., for Signal messages). Logging PII in cleartext at the info level
creates a security and privacy risk, as these logs may be stored in
persistent storage, indexed by log management systems, or accessible to
unauthorized personnel.

Changes:
- Info level: logs only message_id (UUID) for tracking
- Debug level: logs user_id, channel, thread_id for troubleshooting

This maintains debugging capability for developers while protecting user
privacy in production logs.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Haiku 4.5 <noreply@anthropic.com>

* chore: sync main into staging (#855)

* fix(ci): secrets can't be used in step if conditions [skip-regression-check] (#787)

GitHub Actions step-level `if:` doesn't have access to `secrets` context.
Replace `if: secrets.X != ''` with `continue-on-error: true` and let
the Set token step handle the fallback.

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(ci): clean up staging pipeline — remove hacks, skip redundant checks [skip-regression-check] (#794)

- Remove continue-on-error from staging-ci.yml app token steps (secrets are configured)
- Skip test.yml and code_style.yml on PRs targeting staging (staging-ci.yml
  already runs tests before promoting, promotion PR gets full CI on main)
- Allow ironclaw-ci[bot] in Claude Code review for bot-created promotion PRs

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* fix(ci): run fmt + clippy on staging PRs, skip Windows clippy [skip-regression-check] (#802)

- Remove branches:[main] filter from code_style.yml so it runs on all PRs
- Gate clippy-windows with `if: github.base_ref == 'main'` (skip on staging PRs)
- Update rollup job to allow skipped clippy-windows
- Simplify claude-review.yml to only trigger on labeled event (avoids duplicate runs)

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* feat: persist user_id in save_job and expose job_id on routine runs (#709)

* feat: persist worker events to DB and fix activity tab rendering

In-process Worker (used by Scheduler::dispatch_job) now persists events
via save_job_event at key execution points: plan creation, LLM
responses, tool_use, tool_result, and job completion/failure/stuck.
Event data shapes match the container worker format so the gateway
activity tab renders them correctly.

Frontend: tool_result errors now show a red X icon with danger styling
instead of a silent empty output. The result event falls back to the
error field when message is absent.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: wire RoutineEngine into gateway for direct manual trigger firing

Replace the message-channel hack in routines_trigger_handler with a
direct call to RoutineEngine::fire_manual(), ensuring FullJob routines
dispatch correctly when triggered from the web UI. Inject the engine
into GatewayState from Agent::run after construction.

Also persists user_id in save_job for both PG and libSQL backends,
removes the source='sandbox' filter so all jobs are visible, and
exposes job_id on RoutineRunInfo for the frontend job link.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: remove stale gateway_state argument from Agent::new test call sites

The gateway_state parameter was removed from Agent::new during rebase
(replaced by post-construction set_routine_engine_slot), but three test
call sites still passed the extra None argument.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address PR review — restore sandbox source filter, remove blank lines

- Revert removal of `source = 'sandbox'` filter in all SandboxStore
  queries (8 sites across PG and libSQL). Sandbox-specific APIs should
  stay scoped to sandbox jobs; unified job listing for the Jobs tab
  should use a separate query path.
- Remove extra blank lines in agent_loop.rs and worker.rs that caused
  formatting CI failure.

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address review — regenerate Cargo.lock, add user_id regression test

- Regenerate Cargo.lock from main's lockfile to eliminate dependency
  version downgrades (anyhow, syn, etc.) that were churn from rebase.
- Add regression test verifying user_id round-trips through save_job
  and get_job in the libSQL backend.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* style: remove trailing blank line in libsql jobs.rs

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* test: add Postgres-side regression test for user_id persistence in save_job

Mirrors the existing libSQL test (test_save_job_persists_user_id) for the
Postgres backend. Gated behind #[cfg(feature = "postgres")] + #[ignore]
since it requires a running PostgreSQL instance (integration tier).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* feat(llm): per-provider unsupported parameter filtering (#749, #728) (#809)

Add declarative `unsupported_params` field to provider definitions in
providers.json. Parameters listed are stripped from requests before
sending, preventing 400 errors from providers that reject them (e.g.
gpt-5 family and kimi-k2.5 rejecting custom temperature values).

- Add `unsupported_params` to ProviderDefinition and RegistryProviderConfig
- Propagate from registry through config resolution
- Generic strip helpers handle temperature, max_tokens, stop_sequences
- Apply filtering in RigAdapter and AnthropicOAuthProvider
- Mark openai and tinfoil providers as not supporting temperature
- Update openai default model to gpt-5-mini

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Illia Polosukhin <ilblackdragon@gmail.com>

* fix: Chat input is hidden in mobile browser mode (#877)

* fix: stop XML-escaping tool output content (#598) (#874)

* fix(ci): secrets can't be used in step if conditions [skip-regression-check] (#787)

GitHub Actions step-level `if:` doesn't have access to `secrets` context.
Replace `if: secrets.X != ''` with `continue-on-error: true` and let
the Set token step handle the fallback.

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(ci): clean up staging pipeline — remove hacks, skip redundant checks [skip-regression-check] (#794)

- Remove continue-on-error from staging-ci.yml app token steps (secrets are configured)
- Skip test.yml and code_style.yml on PRs targeting staging (staging-ci.yml
  already runs tests before promoting, promotion PR gets full CI on main)
- Allow ironclaw-ci[bot] in Claude Code review for bot-created promotion PRs

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* fix(ci): run fmt + clippy on staging PRs, skip Windows clippy [skip-regression-check] (#802)

- Remove branches:[main] filter from code_style.yml so it runs on all PRs
- Gate clippy-windows with `if: github.base_ref == 'main'` (skip on staging PRs)
- Update rollup job to allow skipped clippy-windows
- Simplify claude-review.yml to only trigger on labeled event (avoids duplicate runs)

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* feat: persist user_id in save_job and expose job_id on routine runs (#709)

* feat: persist worker events to DB and fix activity tab rendering

In-process Worker (used by Scheduler::dispatch_job) now persists events
via save_job_event at key execution points: plan creation, LLM
responses, tool_use, tool_result, and job completion/failure/stuck.
Event data shapes match the container worker format so the gateway
activity tab renders them correctly.

Frontend: tool_result errors now show a red X icon with danger styling
instead of a silent empty output. The result event falls back to the
error field when message is absent.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: wire RoutineEngine into gateway for direct manual trigger firing

Replace the message-channel hack in routines_trigger_handler with a
direct call to RoutineEngine::fire_manual(), ensuring FullJob routines
dispatch correctly when triggered from the web UI. Inject the engine
into GatewayState from Agent::run after construction.

Also persists user_id in save_job for both PG and libSQL backends,
removes the source='sandbox' filter so all jobs are visible, and
exposes job_id on RoutineRunInfo for the frontend job link.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: remove stale gateway_state argument from Agent::new test call sites

The gateway_state parameter was removed from Agent::new during rebase
(replaced by post-construction set_routine_engine_slot), but three test
call sites still passed the extra None argument.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address PR review — restore sandbox source filter, remove blank lines

- Revert removal of `source = 'sandbox'` filter in all SandboxStore
  queries (8 sites across PG and libSQL). Sandbox-specific APIs should
  stay scoped to sandbox jobs; unified job listing for the Jobs tab
  should use a separate query path.
- Remove extra blank lines in agent_loop.rs and worker.rs that caused
  formatting CI failure.

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address review — regenerate Cargo.lock, add user_id regression test

- Regenerate Cargo.lock from main's lockfile to eliminate dependency
  version downgrades (anyhow, syn, etc.) that were churn from rebase.
- Add regression test verifying user_id round-trips through save_job
  and get_job in the libSQL backend.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* style: remove trailing blank line in libsql jobs.rs

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* test: add Postgres-side regression test for user_id persistence in save_job

Mirrors the existing libSQL test (test_save_job_persists_user_id) for the
Postgres backend. Gated behind #[cfg(feature = "postgres")] + #[ignore]
since it requires a running PostgreSQL instance (integration tier).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* feat(llm): per-provider unsupported parameter filtering (#749, #728) (#809)

Add declarative `unsupported_params` field to provider definitions in
providers.json. Parameters listed are stripped from requests before
sending, preventing 400 errors from providers that reject them (e.g.
gpt-5 family and kimi-k2.5 rejecting custom temperature values).

- Add `unsupported_params` to ProviderDefinition and RegistryProviderConfig
- Propagate from registry through config resolution
- Generic strip helpers handle temperature, max_tokens, stop_sequences
- Apply filtering in RigAdapter and AnthropicOAuthProvider
- Mark openai and tinfoil providers as not supporting temperature
- Update openai default model to gpt-5-mini

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* fix: stop XML-escaping tool output content in wrap_for_llm (#598)

Remove content escaping that corrupted JSON in tool output. The
<tool_output> structural boundary is preserved but content now passes
through raw, fixing downstream parse failures.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Henry Park <henrypark133@gmail.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(safety): allow empty string tool params (#848)

* fix(safety): allow empty string tool params

* fix(safety): preserve heuristic checks and add path context to tool validation

This follow-up refactor addresses PR review feedback by restoring
heuristic checks (whitespace ratio, character repetition) for tool
parameter validation and improving error reporting.

Changes:
- Restored heuristic warnings in validate_non_empty_input so they apply
  to both user input and tool parameters (when non-empty).
- Refactored check_strings to recursively build and pass JSON paths
  (e.g., "metadata.tags[1]").
- Updated validation errors to use the specific JSON path as the field
  name instead of the generic "input".
- Added regression tests for whitespace/repetition warnings and JSON
  path reporting in tool parameters.

This ensures the safety layer remains semantically neutral about empty
strings (fixing the memory_tree path: "" issue) while maintaining
rigorous protection and providing better developer ergonomics.
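
A rough sketch of the recursive path building, under stated assumptions: the `Json` enum below is a stand-in for whatever JSON type the validator really uses, the `is_ok` predicate is hypothetical, and the signature of the real check_strings is not shown in this log.

```rust
// Minimal stand-in for a JSON value (assumption; not the project's type).
enum Json {
    Null,
    Str(String),
    Array(Vec<Json>),
    Object(Vec<(String, Json)>),
}

/// Walks `value` and records the JSON path of every string rejected by `is_ok`.
fn check_strings(
    value: &Json,
    path: &str,
    is_ok: &dyn Fn(&str) -> bool,
    errors: &mut Vec<String>,
) {
    match value {
        Json::Str(s) => {
            if !is_ok(s) {
                // Fall back to the generic name only when there is no path.
                let field = if path.is_empty() { "input" } else { path };
                errors.push(field.to_string());
            }
        }
        Json::Array(items) => {
            for (i, item) in items.iter().enumerate() {
                check_strings(item, &format!("{}[{}]", path, i), is_ok, errors);
            }
        }
        Json::Object(fields) => {
            for (key, item) in fields {
                let child = if path.is_empty() {
                    key.clone()
                } else {
                    format!("{}.{}", path, key)
                };
                check_strings(item, &child, is_ok, errors);
            }
        }
        Json::Null => {}
    }
}

fn main() {
    let tags = Json::Array(vec![
        Json::Str("ok".to_string()),
        Json::Str("bad\0".to_string()),
    ]);
    let doc = Json::Object(vec![
        ("metadata".to_string(), Json::Object(vec![("tags".to_string(), tags)])),
        ("path".to_string(), Json::Str(String::new())), // empty string is allowed
        ("note".to_string(), Json::Null),
    ]);
    let mut errors = Vec::new();
    // Hypothetical rule: reject strings containing a NUL byte.
    check_strings(&doc, "", &|s: &str| !s.contains('\0'), &mut errors);
    println!("{:?}", errors);
}
```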

* style: run cargo fmt

* perf: optimize release and dist build profiles (#843)

* perf: optimize release and dist build profiles

Add [profile.release] with strip=true and panic="abort" for smaller,
faster release binaries. Upgrade [profile.dist] from lto="thin" to
lto="fat" with codegen-units=1 for maximum optimization in CI releases.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: remove panic=abort from release profile

Reviewers (zmanian, Copilot, Gemini) correctly flagged that panic=abort
in the release profile would kill the entire process on any tokio task
panic, breaking fault isolation for the long-running server. Removed
from release profile entirely.
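
The isolation at stake can be illustrated with plain threads. This is only an analogy under assumptions: the server uses tokio tasks rather than `std::thread`, and `run_isolated` is a hypothetical helper. With the default unwinding panic strategy the panic is contained and reported to the caller; with panic="abort" the same panic would terminate the whole process.

```rust
use std::thread;

/// Runs `job` on its own thread and reports whether it panicked.
/// Relies on the default `panic = "unwind"` strategy.
fn run_isolated<F>(job: F) -> Result<(), String>
where
    F: FnOnce() + Send + 'static,
{
    thread::spawn(job)
        .join()
        .map_err(|_| "worker panicked".to_string())
}

fn main() {
    // The panic message goes to stderr, but the process keeps running.
    let failed = run_isolated(|| panic!("boom"));
    let succeeded = run_isolated(|| {});
    println!("failed: {:?}, succeeded: {:?}", failed, succeeded);
}
```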

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* feat: add PR template with risk assessment (#837)

* feat: add PR template with risk assessment and review tracks

Add a pull request template that includes summary, change type,
validation checklist, security/database impact sections, blast radius,
and rollback plan. Update CONTRIBUTING.md with review track definitions
(A/B/C) based on change risk level.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: expand CONTRIBUTING.md with setup, workflow, and guidelines

Add getting started, development workflow, code style summary,
database change guidance, and dependency management sections.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* feat: add fuzzing targets for untrusted input parsers (#835)

* feat: add fuzzing targets for untrusted input parsers

Add cargo-fuzz infrastructure with 5 fuzz targets exercising
security-critical code paths:

- fuzz_safety_sanitizer: Aho-Corasick + regex injection detection
- fuzz_safety_validator: Input validation (length, encoding, patterns)
- fuzz_leak_detector: Secret leak scanning (API keys, tokens)
- fuzz_tool_params: Tool parameter JSON validation
- fuzz_config_env: TOML/JSON config parsing

Each target exercises real IronClaw business logic with invariant
assertions. Includes corpus directories and setup documentation.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: improve fuzz targets to exercise real IronClaw code paths

- fuzz_config_env: exercise SafetyLayer end-to-end (sanitize, validate,
  policy check) instead of generic TOML/JSON parsing
- fuzz_tool_params: add validate_tool_schema coverage alongside
  validate_tool_params
- Add "fuzz" to workspace exclude in root Cargo.toml
- Update README descriptions to match actual target behavior

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: replace redundant detect() call with meaningful invariant assertion

Replace the double sanitize()+detect() call with an assertion that
critical severity warnings always trigger content modification.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: rewrite fuzz_config_env to exercise IronClaw safety code directly

Replace SafetyLayer wrapper usage with direct Sanitizer, Validator, and
LeakDetector instantiation and invocation. Adds meaningful consistency
assertions (non-empty output, valid-means-no-errors, scan/clean agreement).
Removes the config construction that was only exercising struct instantiation.

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* fix(wasm): run leak scan before credential injection in tools wrapper (#791)

* fix(wasm): run leak scan before credential injection in tools wrapper

The tools WASM wrapper runs the LeakDetector on HTTP request headers
AFTER inject_host_credentials() has already substituted real secrets
(e.g., xoxb- Slack bot tokens). This causes the leak detector to
flag the tool's own legitimate outbound API calls as secret exfiltration.

Move the scan to run on raw_headers before any credential injection,
matching the fix already applied to the channels wrapper in #421.

Fixes the same class of bug as #421 (which only fixed channels/wasm/wrapper.rs).
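
The corrected ordering can be sketched as below. Everything here is illustrative: `looks_like_secret` is a toy stand-in for the LeakDetector, and `prepare_headers` is a hypothetical function, not the wrapper's actual code.

```rust
use std::collections::HashMap;

// Toy stand-in for the leak detector: flags values that look like Slack bot tokens.
fn looks_like_secret(value: &str) -> bool {
    value.contains("xoxb-")
}

/// Scans the headers supplied by the WASM tool, then injects host credentials.
fn prepare_headers(
    raw_headers: &HashMap<String, String>,
    host_credentials: &HashMap<String, String>,
) -> Result<HashMap<String, String>, String> {
    // 1. Scan only what the tool itself supplied, before any real secret is present.
    for (name, value) in raw_headers {
        if looks_like_secret(value) {
            return Err(format!("potential secret leak in header {}", name));
        }
    }
    // 2. Only now add host-managed credentials; these are never scanned.
    let mut headers = raw_headers.clone();
    for (name, value) in host_credentials {
        headers.insert(name.clone(), value.clone());
    }
    Ok(headers)
}

fn main() {
    let mut raw = HashMap::new();
    raw.insert("Accept".to_string(), "application/json".to_string());
    let mut creds = HashMap::new();
    creds.insert("Authorization".to_string(), "Bearer xoxb-fake-token".to_string());

    // The injected token does not trip the scan because the scan ran first.
    println!("{:?}", prepare_headers(&raw, &creds).is_ok());
}
```

With the old ordering the scan would see the injected Authorization value and reject the tool's own legitimate request.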

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* perf: inline leak scan to avoid Vec allocation on every HTTP request

Address review feedback: instead of cloning all header keys/values into
a Vec to pass to scan_http_request(), iterate over raw_headers directly
using scan_and_clean(). This also provides more specific error messages
(URL vs header vs body).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* style: fix cargo fmt formatting in leak scan loop

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* fix(setup): drain residual terminal events before secret input (#747) (#849)

* fix(ci): secrets can't be used in step if conditions [skip-regression-check] (#787)

GitHub Actions step-level `if:` doesn't have access to `secrets` context.
Replace `if: secrets.X != ''` with `continue-on-error: true` and let
the Set token step handle the fallback.

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(ci): clean up staging pipeline — remove hacks, skip redundant checks [skip-regression-check] (#794)

- Remove continue-on-error from staging-ci.yml app token steps (secrets are configured)
- Skip test.yml and code_style.yml on PRs targeting staging (staging-ci.yml
  already runs tests before promoting, promotion PR gets full CI on main)
- Allow ironclaw-ci[bot] in Claude Code review for bot-created promotion PRs

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* fix(ci): run fmt + clippy on staging PRs, skip Windows clippy [skip-regression-check] (#802)

- Remove branches:[main] filter from code_style.yml so it runs on all PRs
- Gate clippy-windows with `if: github.base_ref == 'main'` (skip on staging PRs)
- Update rollup job to allow skipped clippy-windows
- Simplify claude-review.yml to only trigger on labeled event (avoids duplicate runs)

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* feat: persist user_id in save_job and expose job_id on routine runs (#709)

* feat: persist worker events to DB and fix activity tab rendering

In-process Worker (used by Scheduler::dispatch_job) now persists events
via save_job_event at key execution points: plan creation, LLM
responses, tool_use, tool_result, and job completion/failure/stuck.
Event data shapes match the container worker format so the gateway
activity tab renders them correctly.

Frontend: tool_result errors now show a red X icon with danger styling
instead of a silent empty output. The result event falls back to the
error field when message is absent.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: wire RoutineEngine into gateway for direct manual trigger firing

Replace the message-channel hack in routines_trigger_handler with a
direct call to RoutineEngine::fire_manual(), ensuring FullJob routines
dispatch correctly when triggered from the web UI. Inject the engine
into GatewayState from Agent::run after construction.

Also persists user_id in save_job for both PG and libSQL backends,
removes the source='sandbox' filter so all jobs are visible, and
exposes job_id on RoutineRunInfo for the frontend job link.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: remove stale gateway_state argument from Agent::new test call sites

The gateway_state parameter was removed from Agent::new during rebase
(replaced by post-construction set_routine_engine_slot), but three test
call sites still passed the extra None argument.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address PR review — restore sandbox source filter, remove blank lines

- Revert removal of `source = 'sandbox'` filter in all SandboxStore
  queries (8 sites across PG and libSQL). Sandbox-specific APIs should
  stay scoped to sandbox jobs; unified job listing for the Jobs tab
  should use a separate query path.
- Remove extra blank lines in agent_loop.rs and worker.rs that caused
  formatting CI failure.

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address review — regenerate Cargo.lock, add user_id regression test

- Regenerate Cargo.lock from main's lockfile to eliminate dependency
  version downgrades (anyhow, syn, etc.) that were churn from rebase.
- Add regression test verifying user_id round-trips through save_job
  and get_job in the libSQL backend.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* style: remove trailing blank line in libsql jobs.rs

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* test: add Postgres-side regression test for user_id persistence in save_job

Mirrors the existing libSQL test (test_save_job_persists_user_id) for the
Postgres backend. Gated behind #[cfg(feature = "postgres")] + #[ignore]
since it requires a running PostgreSQL instance (integration tier).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* feat(llm): per-provider unsupported parameter filtering (#749, #728) (#809)

Add declarative `unsupported_params` field to provider definitions in
providers.json. Parameters listed are stripped from requests before
sending, preventing 400 errors from providers that reject them (e.g.
gpt-5 family and kimi-k2.5 rejecting custom temperature values).

- Add `unsupported_params` to ProviderDefinition and RegistryProviderConfig
- Propagate from registry through config resolution
- Generic strip helpers handle temperature, max_tokens, stop_sequences
- Apply filtering in RigAdapter and AnthropicOAuthProvider
- Mark openai and tinfoil providers as not supporting temperature
- Update openai default model to gpt-5-mini

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* fix: skip the regression check
[skip-regression-check]

---------

Co-authored-by: Henry Park <henrypark133@gmail.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Illia Polosukhin <ilblackdragon@gmail.com>

* feat(agent): add context size logging before LLM prompt (#810)

* fix(ci): secrets can't be used in step if conditions [skip-regression-check] (#787)

GitHub Actions step-level `if:` doesn't have access to `secrets` context.
Replace `if: secrets.X != ''` with `continue-on-error: true` and let
the Set token step handle the fallback.

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(ci): clean up staging pipeline — remove hacks, skip redundant checks [skip-regression-check] (#794)

- Remove continue-on-error from staging-ci.yml app token steps (secrets are configured)
- Skip test.yml and code_style.yml on PRs targeting staging (staging-ci.yml
  already runs tests before promoting, promotion PR gets full CI on main)
- Allow ironclaw-ci[bot] in Claude Code review for bot-created promotion PRs

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* fix(ci): run fmt + clippy on staging PRs, skip Windows clippy [skip-regression-check] (#802)

- Remove branches:[main] filter from code_style.yml so it runs on all PRs
- Gate clippy-windows with `if: github.base_ref == 'main'` (skip on staging PRs)
- Update rollup job to allow skipped clippy-windows
- Simplify claude-review.yml to only trigger on labeled event (avoids duplicate runs)

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* feat: persist user_id in save_job and expose job_id on routine runs (#709)

* feat: persist worker events to DB and fix activity tab rendering

In-process Worker (used by Scheduler::dispatch_job) now persists events
via save_job_event at key execution points: plan creation, LLM
responses, tool_use, tool_result, and job completion/failure/stuck.
Event data shapes match the container worker format so the gateway
activity tab renders them correctly.

Frontend: tool_result errors now show a red X icon with danger styling
instead of a silent empty output. The result event falls back to the
error field when message is absent.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: wire RoutineEngine into gateway for direct manual trigger firing

Replace the message-channel hack in routines_trigger_handler with a
direct call to RoutineEngine::fire_manual(), ensuring FullJob routines
dispatch correctly when triggered from the web UI. Inject the engine
into GatewayState from Agent::run after construction.

Also persists user_id in save_job for both PG and libSQL backends,
removes the source='sandbox' filter so all jobs are visible, and
exposes job_id on RoutineRunInfo for the frontend job link.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: remove stale gateway_state argument from Agent::new test call sites

The gateway_state parameter was removed from Agent::new during rebase
(replaced by post-construction set_routine_engine_slot), but three test
call sites still passed the extra None argument.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address PR review — restore sandbox source filter, remove blank lines

- Revert removal of `source = 'sandbox'` filter in all SandboxStore
  queries (8 sites across PG and libSQL). Sandbox-specific APIs should
  stay scoped to sandbox jobs; unified job listing for the Jobs tab
  should use a separate query path.
- Remove extra blank lines in agent_loop.rs and worker.rs that caused
  formatting CI failure.

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address review — regenerate Cargo.lock, add user_id regression test

- Regenerate Cargo.lock from main's lockfile to eliminate dependency
  version downgrades (anyhow, syn, etc.) that were churn from rebase.
- Add regression test verifying user_id round-trips through save_job
  and get_job in the libSQL backend.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* style: remove trailing blank line in libsql jobs.rs

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* test: add Postgres-side regression test for user_id persistence in save_job

Mirrors the existing libSQL test (test_save_job_persists_user_id) for the
Postgres backend. Gated behind #[cfg(feature = "postgres")] + #[ignore]
since it requires a running PostgreSQL instance (integration tier).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* feat(agent): add context size logging before LLM prompt

---------

Co-authored-by: Henry Park <henrypark133@gmail.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Illia Polosukhin <ilblackdragon@gmail.com>

* fix: preserve text before tool-call XML in forced-text responses (#852)

* fix: preserve text before tool-call XML in forced-text responses (#789)

Local models (Qwen3, DeepSeek, GLM) emit <tool_call> XML even when no
tools are available (force_text mode). The existing strip_xml_tag()
discards everything from an unclosed opening tag onward, producing an
empty string that triggers the "I'm not sure how to respond" fallback.

Add truncate_at_tool_tags() — a code-region-aware pre-processing step
that truncates at the first tool-call XML tag BEFORE clean_response()
runs, preserving all useful text before the tag. Protect all 7
clean_response() call sites. Case-insensitive matching handles models
that emit <TOOL_CALL> or <Tool_Call> variants.

Secondary fix: add has_native_thinking() model detection to skip
<think>/<final> system prompt injection for models with built-in
reasoning (Qwen3, QwQ, DeepSeek-R1, GLM-Z1, etc.), preventing
thinking-only responses that clean to empty.

Wire with_model_name(active_model_name()) at all 9 production sites
that construct Reasoning, so the runtime model name (not static config)
drives system prompt generation.

126 new/updated tests covering truncation edge cases, code-block
awareness, Unicode, case-insensitivity, StubLlm integration for
complete/plan/evaluate_success/respond_with_tools paths, model
detection, and conditional system prompt generation.

Closes #789

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address Copilot review — unclosed-only truncation, ASCII case folding

- truncate_at_tool_tags() now only truncates at UNCLOSED tool tags;
  properly closed tags (e.g. <tool_call>...</tool_call>) are left intact
  for clean_response() to strip normally, preserving any text after them
- Switch from to_lowercase() to to_ascii_lowercase() to prevent byte
  offset misalignment with non-ASCII characters whose lowercase form
  has different byte length (e.g. Kelvin sign U+212A)
- Add closing_tag_for() helper to derive closing tags from open patterns
- Fix doc comment: "fenced markdown code blocks or inline code spans"
  (not "indented", which find_code_regions() doesn't detect)
- Add regression tests: closed vs unclosed for each tag variant,
  Unicode + case-insensitive offset safety, and mixed closed/unclosed
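
A simplified sketch of the offset issue and the unclosed-only rule. It handles a single `<tool_call>` pattern and ignores code regions; `truncate_at_unclosed_tool_call` is a hypothetical name, not the real truncate_at_tool_tags().

```rust
/// Truncates `text` at the first `<tool_call>` tag that has no closing tag
/// after it. Matching ignores ASCII case. Closed tags are left in place.
fn truncate_at_unclosed_tool_call(text: &str) -> &str {
    // ASCII-only folding never changes byte lengths, so offsets found in
    // `folded` are valid offsets into `text`.
    let folded = text.to_ascii_lowercase();
    let mut from = 0;
    while let Some(rel) = folded[from..].find("<tool_call>") {
        let open = from + rel;
        let after = open + "<tool_call>".len();
        match folded[after..].find("</tool_call>") {
            // Closed tag: skip past it and keep looking.
            Some(close_rel) => from = after + close_rel + "</tool_call>".len(),
            // Unclosed tag: keep only the text before it.
            None => return text[..open].trim_end(),
        }
    }
    text
}

fn main() {
    // Kelvin sign U+212A is 3 bytes in UTF-8; its Unicode lowercase is ASCII 'k' (1 byte).
    let kelvin = "\u{212A}";
    println!("to_lowercase: {} byte(s)", kelvin.to_lowercase().len());
    println!("to_ascii_lowercase: {} byte(s)", kelvin.to_ascii_lowercase().len());

    let reply = "273 \u{212A} is cold <TOOL_CALL>{\"name\":";
    println!("{}", truncate_at_unclosed_tool_call(reply));
}
```

Had the folded copy been produced with to_lowercase(), the tag offset would sit two bytes earlier than in the original string, so the slice would cut in the wrong place.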

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: minor review items — consistent ascii_lowercase, closing_tag_for tests

- Switch has_native_thinking() from to_lowercase() to to_ascii_lowercase()
  for consistency with truncate_at_tool_tags() approach
- Add unit tests for closing_tag_for(): standard tags, space-suffixed
  patterns, pipe-delimited tags, and exhaustive coverage of all
  TOOL_TAG_PATTERNS entries
- Add test for mixed closed+unclosed tags of different types

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* Feat/docker shell edition (#804)

* fix(ci): secrets can't be used in step if conditions [skip-regression-check] (#787)

GitHub Actions step-level `if:` doesn't have access to `secrets` context.
Replace `if: secrets.X != ''` with `continue-on-error: true` and let
the Set token step handle the fallback.

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(ci): clean up staging pipeline — remove hacks, skip redundant checks [skip-regression-check] (#794)

- Remove continue-on-error from staging-ci.yml app token steps (secrets are configured)
- Skip test.yml and code_style.yml on PRs targeting staging (staging-ci.yml
  already runs tests before promoting, promotion PR gets full CI on main)
- Allow ironclaw-ci[bot] in Claude Code review for bot-created promotion PRs

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Henry Park <henrypark133@gmail.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(mcp): strip top-level null params before forwarding to MCP servers (#795)

* feat(llm): per-provider unsupported parameter filtering (#749, #728) (#809)

Add declarative `unsupported_params` field to provider definitions in
providers.json. Parameters listed are stripped from requests before
sending, preventing 400 errors from providers that reject them (e.g.
gpt-5 family and kimi-k2.5 rejecting custom temperature values).

- Add `unsupported_params` to ProviderDefinition and RegistryProviderConfig
- Propagate from registry through config resolution
- Generic strip helpers handle temperature, max_tokens, stop_sequences
- Apply filtering in RigAdapter and AnthropicOAuthProvider
- Mark openai and tinfoil providers as not supporting temperature
- Update openai default model to gpt-5-mini

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* fix(mcp): strip top-level null params before forwarding to MCP servers

LLMs frequently emit `"field": null` for optional parameters in tool
calls. Many MCP servers reject explicit nulls for fields that should
simply be absent — e.g. Notion returns 400 for `"sort": null` in a
search call, expecting the field to be omitted entirely.

Strip top-level null keys from the params object before calling
`call_tool()`. Only top-level keys are stripped; nested nulls are
preserved since they may be semantically meaningful.
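
A minimal sketch of the top-level-only stripping. The `Json` enum is a stand-in (the real code presumably operates on a proper JSON value type), and `strip_top_level_nulls` is a hypothetical name.

```rust
// Minimal stand-in for a JSON value (assumption; not the project's type).
#[derive(Debug, Clone, PartialEq)]
enum Json {
    Null,
    Str(String),
    Object(Vec<(String, Json)>),
}

/// Removes keys whose value is null from the top-level object only.
/// Nested objects are passed through untouched.
fn strip_top_level_nulls(params: Json) -> Json {
    match params {
        Json::Object(fields) => Json::Object(
            fields
                .into_iter()
                .filter(|(_, value)| *value != Json::Null)
                .collect(),
        ),
        other => other,
    }
}

fn main() {
    let params = Json::Object(vec![
        ("query".to_string(), Json::Str("notes".to_string())),
        ("sort".to_string(), Json::Null), // dropped
        (
            "filter".to_string(),
            Json::Object(vec![("value".to_string(), Json::Null)]), // nested null kept
        ),
    ]);
    println!("{:?}", strip_top_level_nulls(params));
}
```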

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Illia Polosukhin <ilblackdragon@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* Add event-triggered routines and workflow skill templates (#756)

* Add event-triggered routines and workflow skill templates

* fix(ci): secrets can't be used in step if conditions [skip-regression-check] (#787)

GitHub Actions step-level `if:` doesn't have access to `secrets` context.
Replace `if: secrets.X != ''` with `continue-on-error: true` and let
the Set token step handle the fallback.

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(ci): clean up staging pipeline — remove hacks, skip redundant checks [skip-regression-check] (#794)

- Remove continue-on-error from staging-ci.yml app token steps (secrets are configured)
- Skip test.yml and code_style.yml on PRs targeting staging (staging-ci.yml
  already runs tests before promoting, promotion PR gets full CI on main)
- Allow ironclaw-ci[bot] in Claude Code review for bot-created promotion PRs

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address PR review feedback for event_emit security and quality

Security fixes:
- Require approval (UnlessAutoApproved) for event_emit, matching routine_fire
- Enable sanitization on event_emit payload (external JSON reaches LLM)
- Remove user_id parameter from event_emit to prevent IDOR — always use ctx.user_id

Correctness fixes:
- Rename source → event_source in event_emit for consistency with routine_create
- Use json_value_as_filter_string for filter parsing (handles numbers/booleans)
- Case-insensitive matching for event source and event_type
- Add debug logging for missing filter keys in payload
- Fix skill_install_routine_webhook_sim test missing .with_skills()
- Fix schema_validator test for event_emit payload properties

Code quality:
- Move EventEmitTool struct/impl after RoutineHistoryTool (fix split layout)
- Deduplicate routine_to_info into RoutineInfo::from_routine in types.rs
- Add test section headers in e2e_routine_heartbeat.rs
- Clarify event_emit description to specify system_event routines only

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(ci): run fmt + clippy on staging PRs, skip Windows clippy [skip-regression-check] (#802)

- Remove branches:[main] filter from code_style.yml so it runs on all PRs
- Gate clippy-windows with `if: github.base_ref == 'main'` (skip on staging PRs)
- Update rollup job to allow skipped clippy-windows
- Simplify claude-review.yml to only trigger on labeled event (avoids duplicate runs)

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* feat: persist user_id in save_job and expose job_id on routine runs (#709)

* feat: persist worker events to DB and fix activity tab rendering

In-process Worker (used by Scheduler::dispatch_job) now persists events
via save_job_event at key execution points: plan creation, LLM
responses, tool_use, tool_result, and job completion/failure/stuck.
Event data shapes match the container worker format so the gateway
activity tab renders them correctly.

Frontend: tool_result errors now show a red X icon with danger styling
instead of a silent empty output. The result event falls back to the
error field when message is absent.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: wire RoutineEngine into gateway for direct manual trigger firing

Replace the message-channel hack in routines_trigger_handler with a
direct call to RoutineEngine::fire_manual(), ensuring FullJob routines
dispatch correctly when triggered from the web UI. Inject the engine
into GatewayState from Agent::run after construction.

Also persists user_id in save_job for both PG and libSQL backends,
removes the source='sandbox' filter so all jobs are visible, and
exposes job_id on RoutineRunInfo for the frontend job link.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: remove stale gateway_state argument from Agent::new test call sites

The gateway_state parameter was removed from Agent::new during rebase
(replaced by post-construction set_routine_engine_slot), but three test
call sites still passed the extra None argument.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address PR review — restore sandbox source filter, remove blank lines

- Revert removal of `source = 'sandbox'` filter in all SandboxStore
  queries (8 sites across PG and libSQL). Sandbox-specific APIs should
  stay scoped to sandbox jobs; unified job listing for the Jobs tab
  should use a separate query path.
- Remove extra blank lines in agent_loop.rs and worker.rs that caused
  formatting CI failure.

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address review — regenerate Cargo.lock, add user_id regression test

- Regenerate Cargo.lock from main's lockfile to eliminate dependency
  version downgrades (anyhow, syn, etc.) that were churn from rebase.
- Add regression test verifying user_id round-trips through save_job
  and get_job in the libSQL backend.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* style: remove trailing blank line in libsql jobs.rs

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* test: add Postgres-side regression test for user_id persistence in save_job

Mirrors the existing libSQL test (test_save_job_persists_user_id) for the
Postgres backend. Gated behind #[cfg(feature = "postgres")] + #[ignore]
since it requires a running PostgreSQL instance (integration tier).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* fix: make routine_system_event_emit test create routine before emitting

- Add routine_create step to trace fixture so event_emit has a matching
  routine to fire
- Assert fired_routines > 0, not just key presence (Copilot review)
- Add .with_auto_approve_tools(true) since event_emit now requires approval

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: renumber test headers after system_event test insertion

Test 4 was duplicated (routine_cooldown and heartbeat_findings).
Renumber heartbeat_findings to Test 5 and heartbeat_empty_skip to Test 6.

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: merge staging and add missing RoutineEngine args in test

RoutineEngine::new on staging requires `tools` and `safety` params.
Update system_event_trigger_matches_and_filters test to pass them.

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address new Copilot review comments

- Add .with_auto_approve_tools(true) to skill_install_routine_webhook_sim
  test so event_emit doesn't block on approval
- Fix module-level doc comment for event_emit to specify system_event trigger

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: deduplicate json_value_as_string helper

Remove private `json_value_as_string` from routine_engine.rs and use
the identical public `json_value_as_filter_string` from routine.rs,
eliminating divergence risk. (Copilot review)

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Henry Park <henrypark133@gmail.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: enable WASM credential injection in No-DB environments (#845)

* fix(wasm): enable credential injection in no-DB environments via env var fallback

When a secrets store is unavailable (e.g. no-DB mode), WASM channel
credentials were silently not injected, causing channels to start without
credentials. Fix by:

- Changing `inject_channel_credentials_from_secrets` to accept
  `Option<&dyn SecretsStore>` — secrets store is tried first when present
- Adding env var fallback (`inject_env_credentials`) for credentials not
  covered by the secrets store
- Enforcing a channel-name prefix security check on env var names to
  prevent WASM channels from reading unrelated host credentials
  (e.g. `AWS_SECRET_ACCESS_KEY`)
- Extracting pure `resolve_env_credentials` helper for testability
- Adding case-insensitive prefix matching for secrets store lookup

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(wasm): inject credentials at startup when no secrets store (setup.rs path)

The startup path (setup_wasm_channels -> register_channel) was guarded by
`if let Some(secrets) = secrets_store`, so in No-DB mode credentials were
never injected and the channel started without them.

Fix by:
- Changing inject_channel_credentials to accept Option<&dyn SecretsStore>
- Always calling it (removing the if-let guard) — env var fallback runs
  even when secrets_store is None
- Adding channel-name prefix security check to the env var fallback path
  (e.g. TELEGRAM_ for channel "telegram"), consistent with manager.rs

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(test): correct misleading comment on ICTEST1_UNRELATED_OTHER placeholder

* fix(wasm): guard against empty channel name in credential injection

An empty channel_name would produce prefix "_", allowing any env var
starting with "_" to pass the security check and be injected. Add an
early-return guard in resolve_env_credentials, inject_env_credentials,
and inject_channel_credentials. Add a test to cover this path.
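
A minimal sketch of the prefix check and the empty-name guard described above. The signature of `resolve_env_credentials` shown here is an assumption for illustration, not the project's actual API; the real helper may take different arguments and return a different type.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the pure helper: keep only the env vars whose
/// names start with "<CHANNEL>_" (e.g. TELEGRAM_ for channel "telegram").
fn resolve_env_credentials(
    channel_name: &str,
    env: &[(String, String)],
) -> HashMap<String, String> {
    // Guard: an empty channel name would yield the prefix "_" and let any
    // variable starting with "_" through, so return nothing instead.
    if channel_name.is_empty() {
        return HashMap::new();
    }
    let prefix = format!("{}_", channel_name.to_ascii_uppercase());
    env.iter()
        .filter(|(name, _)| name.starts_with(&prefix))
        .map(|(name, value)| (name.clone(), value.clone()))
        .collect()
}
```

With this shape, `AWS_SECRET_ACCESS_KEY` is filtered out for channel "telegram", and an empty channel name resolves to an empty map.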

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: lizican123 <lizican123@gmail.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: promote to main (#878)

* fix: replace unsafe env::set_var with thread-safe inject_single_var in SIGHUP handler

Fixes race condition where SIGHUP handler modifies global environment variables
while other threads may be reading them via Config::from_env().

Changes:
- Replace unsafe { std::env::set_var() } with ironclaw::config::inject_single_var()
- Uses INJECTED_VARS mutex instead of unsafe global state modification
- All reads via optional_env() check the thread-safe overlay first
- Prevents data races between SIGHUP reload and concurrent config reads
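
The overlay idea can be sketched roughly as follows. The names `INJECTED_VARS`, `inject_single_var`, and `optional_env` come from the commit message, but their bodies and signatures here are assumptions; the real `ironclaw::config` code may differ.

```rust
use std::collections::HashMap;
use std::sync::{Mutex, OnceLock};

// Process-wide overlay of injected values (assumed shape of INJECTED_VARS).
static INJECTED_VARS: OnceLock<Mutex<HashMap<String, String>>> = OnceLock::new();

fn overlay() -> &'static Mutex<HashMap<String, String>> {
    INJECTED_VARS.get_or_init(|| Mutex::new(HashMap::new()))
}

/// Record a value in the overlay instead of mutating the process environment.
fn inject_single_var(key: &str, value: &str) {
    let mut vars = overlay().lock().unwrap_or_else(|e| e.into_inner());
    vars.insert(key.to_string(), value.to_string());
}

/// Read the overlay first, then fall back to the real environment.
fn optional_env(key: &str) -> Option<String> {
    let injected = overlay()
        .lock()
        .unwrap_or_else(|e| e.into_inner())
        .get(key)
        .cloned();
    injected.or_else(|| std::env::var(key).ok())
}
```

Because writes only touch the mutex-guarded map, a reload never calls `std::env::set_var` while other threads read configuration.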

Verification:
- All 2,787 tests pass
- Zero clippy warnings
- Code compiles successfully

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

* fix: spawn webhook restart as background task to avoid blocking I/O across lock

Prevents holding Mutex lock during async I/O operations (TcpListener::bind,
task shutdown). The SIGHUP handler no longer blocks webhook processing during
listener restart.

Changes:
- Read old_addr and drop lock immediately
- Spawn restart_with_addr() as background task via tokio::spawn
- Lock is only held during the actual restart operation, not the signal handler

Benefits:
- SIGHUP handler returns immediately without blocking
- Webhook requests not delayed by listener restart I/O
- Lock contention significantly reduced

Verification:
- All 2,787 tests pass
- Zero clippy warnings
- Code compiles successfully

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

* fix: add graceful shutdown mechanism for SIGHUP handler background task

The SIGHUP handler previously ran an unbounded loop with no cancellation
mechanism. It now listens for a shutdown signal and exits cleanly during
graceful termination.

Changes:
- Create broadcast channel for shutdown signaling
- SIGHUP handler uses tokio::select! to wait for shutdown or SIGHUP
- Send shutdown signal to all background tasks after agent.run() completes
- Ensures clean task lifecycle and no orphaned background tasks

Benefits:
- Proper task cancellation during graceful shutdown
- Follows Tokio best practices for background task management
- No background tasks orphaned when runtime shuts down

Verification:
- All 2,787 tests pass
- Zero clippy warnings
- Code compiles successfully

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

* refactor: replace stringly-typed parameter filtering with typed enum and single helper

Fixes DRY violation where unsupported parameter filtering was duplicated across
rig_adapter.rs and anthropic_oauth.rs using string contains checks.

Changes:
- Add UnsupportedParam typed enum in provider.rs (Temperature, MaxTokens, StopSequences)
- Create strip_unsupported_completion_params() helper function
- Create strip_unsupported_tool_params() helper function
- Update rig_adapter.rs to use shared helpers
- Update anthropic_oauth.rs to use shared helpers
- Replace 60+ lines of duplicate stringly-typed logic

Benefits:
- Type safety: parameter names checked at compile time
- Single source of truth: adding a new param updates one place
- Reduced maintenance burden: no duplicate logic to keep in sync
- Better code clarity: named enum variant is self-documenting
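
As an illustration of the typed approach, here is a small sketch. The `UnsupportedParam` variants and the helper name come from the commit message; the trimmed-down `CompletionRequest` struct and the helper's signature are assumptions made for this example.

```rust
/// Parameters a provider may not accept (variant names from the commit message).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum UnsupportedParam {
    Temperature,
    MaxTokens,
    StopSequences,
}

/// Hypothetical, trimmed-down request type for illustration only.
#[derive(Debug, Default, PartialEq)]
struct CompletionRequest {
    temperature: Option<f32>,
    max_tokens: Option<u32>,
    stop_sequences: Option<Vec<String>>,
}

/// Clear every field the provider has declared unsupported.
fn strip_unsupported_completion_params(
    req: &mut CompletionRequest,
    unsupported: &[UnsupportedParam],
) {
    for param in unsupported {
        // The match is exhaustive, so adding a variant forces an update here.
        match param {
            UnsupportedParam::Temperature => req.temperature = None,
            UnsupportedParam::MaxTokens => req.max_tokens = None,
            UnsupportedParam::StopSequences => req.stop_sequences = None,
        }
    }
}
```

The exhaustive `match` is what gives the compile-time check: a new enum variant will not compile until the helper handles it.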

Verification:
- All 2,787 tests pass
- Zero clippy warnings
- Code compiles successfully

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

* docs: clarify intentional parameter asymmetry between completion and tool requests

Add documentation explaining why strip_unsupported_tool_params does not handle
StopSequences: the field doesn't exist in ToolCompletionRequest.

Changes:
- Add clarifying comments to strip_unsupported_tool_params()
- Explain why StopSequences is only in CompletionRequest
- Note that ToolCompletionRequest only supports Temperature and MaxTokens
- Inline comment confirms no action needed for StopSequences

This addresses the appearance of incomplete implementation without changing logic,
as the asymmetry is intentional and correct (ToolCompletionRequest lacks the field).

Verification:
- All 2,787 tests pass
- Zero clippy warnings
- Code compiles successfully

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

* perf: isolate webhook_secret to reduce lock contention on hot path

Move webhook_secret from shared HttpChannelState RwLock into its own Arc<RwLock<>>.
This eliminates contention between secret validation and other state operations.

Changes:
- Change webhook_secret field type from RwLock<Option<SecretString>> to Arc<RwLock<Option<SecretString>>>
- Update initialization in HttpChannel::new()
- Update comments to explain isolation rationale

Benefits:
- Reduce lock contention on webhook request hot path (secret validation)
- Rarely-changing field (SIGHUP only) isolated from frequent state accesses
- Other state operations (tx, pending_responses) no longer wait behind secret reads
- Minimal code change: only field declaration and initialization

The Arc wrapper allows cloning the RwLock handle to separate concerns. With this
change, secret validation on each webhook request takes the dedicated secret lock
rather than the shared HttpChannelState lock. This scales better under high request volume.

Verification:
- All 2,787 tests pass
- Zero clippy warnings
- Code compiles successfully

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

* fix: prevent partial state corruption on SIGHUP restart failure

Ensure atomicity of configuration reload: if webhook listener restart fails,
secret update is skipped to prevent inconsistent state.

Changes:
- Wait for restart_with_addr() to complete (don't spawn background task)
- Track restart result with restart_failed flag
- Only update secret if restart succeeded or wasn't needed
- Ensure listener and secret stay synchronized

Problem addressed:
- Before: restart spawned as background task, secret updated immediately
- If restart failed, secret was changed but listener still on old address
- This left system in inconsistent state (partial corruption)

Solution:
- Make restart blocking (SIGHUP handler can wait, it's not on request hot path)
- Atomically update secret only after successful restart
- Flag prevents race between restart and secret update

Benefits:
- Configuration changes are atomic (both succeed or both fail together)
- No partial state corruption on restart failure
- Failed restarts don't silently leave inconsistent state
- Secret and listener address stay in sync
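
The ordering can be sketched as below. `ReloadState` and `apply_reload` are hypothetical names, and the restart is modeled as a plain closure rather than the async `restart_with_addr()` the commit refers to.

```rust
/// Hypothetical reload state: the listener address and the webhook secret.
struct ReloadState {
    addr: String,
    secret: Option<String>,
}

/// Apply a new config. The secret is swapped only when the listener restart
/// succeeded or was not needed, so the two never drift apart.
fn apply_reload(
    state: &mut ReloadState,
    new_addr: &str,
    new_secret: Option<String>,
    restart: impl FnOnce(&str) -> Result<(), String>,
) -> Result<(), String> {
    if state.addr != new_addr {
        // Blocking restart: wait for the result before touching the secret.
        restart(new_addr)?;
        state.addr = new_addr.to_string();
    }
    state.secret = new_secret;
    Ok(())
}
```

If `restart` returns an error, the early return leaves both the address and the secret at their previous values.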

Verification:
- All 2,787 tests pass
- Zero clippy warnings
- Code compiles successfully

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

* refactor: generalize hot-secret-swapping with ChannelSecretUpdater trait

Decouple SIGHUP handler from HTTP channel internals by introducing a trait
for channels that support zero-downtime secret updates.

Changes:
- Add ChannelSecretUpdater trait in channels/channel.rs
- Implement ChannelSecretUpdater for HttpChannelState
- Export trait from channels module
- Update SIGHUP handler to use trait-based secret updater collection
- Replace explicit HTTP channel knowledge with generic updater loop

Benefits:
- SIGHUP handler no longer depends on HttpChannelState details
- Tight coupling removed: main.rs doesn't need HTTP channel imports
- Extensible: new channels can opt-in by implementing the trait
- Scalable: multiple channels supported without main.rs changes
- Maintainable: adding channels requires only trait implementation, not SIGHUP handler edits

Pattern:
- ChannelSecretUpdater trait defines the interface for all updaters
- Channels that support hot-secret-swapping implement the trait
- SIGHUP handler loops through all registered updaters generically
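
A rough sketch of the pattern, under stated assumptions: the trait name and `HttpChannelState` come from the commit message, while the method name `update_secret`, the synchronous signature, the plain `String` secret, and `reload_secrets` are invented for illustration. The real trait may well be async and use `SecretString`.

```rust
use std::sync::{Arc, RwLock};

/// Sketch of the trait; the method name and signature are assumptions.
trait ChannelSecretUpdater: Send + Sync {
    fn update_secret(&self, new_secret: Option<String>);
}

/// Hypothetical channel state holding the secret behind its own lock.
struct HttpChannelState {
    webhook_secret: Arc<RwLock<Option<String>>>,
}

impl ChannelSecretUpdater for HttpChannelState {
    fn update_secret(&self, new_secret: Option<String>) {
        *self.webhook_secret.write().unwrap_or_else(|e| e.into_inner()) = new_secret;
    }
}

/// What a SIGHUP handler could do: loop over updaters without knowing
/// which concrete channel type sits behind each one.
fn reload_secrets(updaters: &[Arc<dyn ChannelSecretUpdater>], new_secret: Option<String>) {
    for updater in updaters {
        updater.update_secret(new_secret.clone());
    }
}
```

A new channel type would opt in by implementing the trait and registering itself in the updater list; the loop itself stays unchanged.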

Verification:
- All 2,787 tests pass
- Zero clippy warnings
- Code compiles successfully

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

* feat: validate parameter names at deserialization time, not just tests

Add custom serde deserializer for unsupported_params that validates parameter
names at runtime when loading providers.json (or user overrides).

Changes:
- Add unsupported_params_de module with custom deserializer
- Only allows: "temperature", "max_tokens", "stop_sequences"
- Invalid parameter names cause immediate deserialization error
- Update ProviderDefinition to use custom deserializer
- Enhanced test with explicit parameter name validation
- Add new test that verifies invalid parameters are rejected

Problem solved:
- Before: Invalid param names (e.g., "temperrature") silently ignored
- Now: Rejected at deserialization time with clear error message
- Prevents runtime failures caused by typos in configuration

Example error:
  unsupported parameter name 'temperrature': must be one of: temperature, max_tokens, stop_sequences

Benefits:
- Fail-fast: errors caught when loading config, not at runtime
- Clear feedback: error message lists valid parameter names
- Type safety: validators run during deserialization
- Configuration errors detected immediately, not silently ignored
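
To illustrate the validation rule without pulling in serde, the sketch below uses `std::str::FromStr` instead of a custom serde deserializer; that substitution is deliberate and is not what the commit implements. `parse_unsupported_params` is a hypothetical helper; the accepted names and the error text follow the commit message.

```rust
use std::str::FromStr;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum UnsupportedParam {
    Temperature,
    MaxTokens,
    StopSequences,
}

impl FromStr for UnsupportedParam {
    type Err = String;

    fn from_str(name: &str) -> Result<Self, Self::Err> {
        match name {
            "temperature" => Ok(Self::Temperature),
            "max_tokens" => Ok(Self::MaxTokens),
            "stop_sequences" => Ok(Self::StopSequences),
            // Unknown names are rejected with a message listing the valid ones.
            other => Err(format!(
                "unsupported parameter name '{other}': must be one of: temperature, max_tokens, stop_sequences"
            )),
        }
    }
}

/// Hypothetical helper: validate a whole list, failing on the first unknown name.
fn parse_unsupported_params(names: &[&str]) -> Result<Vec<UnsupportedParam>, String> {
    names.iter().map(|n| n.parse::<UnsupportedParam>()).collect()
}
```

In a serde-based version the same `match` would presumably sit inside the custom deserializer, so a typo fails when the config is loaded.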

Verification:
- All 2,788 tests pass (including new validation test)
- Zero clippy warnings
- Code compiles successfully

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Haiku 4.5 <noreply@anthropic.com>

* merge: resolve conflicts for PR #800 and #822 into staging (#881)

* fix(ci): secrets can't be used in step if conditions [skip-regression-check] (#787)

GitHub Actions step-level `if:` doesn't have access to `secrets` context.
Replace `if: secrets.X != ''` with `continue-on-error: true` and let
the Set token step handle the fallback.

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(ci): clean up staging pipeline — remove hacks, skip redundant checks [skip-regression-check] (#794)

- Remove continue-on-error from staging-ci.yml app token steps (secrets are configured)
- Skip test.yml and code_style.yml on PRs targeting staging (staging-ci.yml
  already runs tests before promoting, promotion PR gets full CI on main)
- Allow ironclaw-ci[bot] in Claude Code review for bot-created promotion PRs

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* fix(ci): run fmt + clippy on staging PRs, skip Windows clippy [skip-regression-check] (#802)

- Remove branches:[main] filter from code_style.yml so it runs on all PRs
- Gate clippy-windows with `if: github.base_ref == 'main'` (skip on staging PRs)
- Update rollup job to allow skipped clippy-windows
- Simplify claude-review.yml to only trigger on labeled event (avoids duplicate runs)

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* feat: persist user_id in save_job and expose job_id on routine runs (#709)

* feat: persist worker events to DB and fix activity tab rendering

In-process Worker (used by Scheduler::dispatch_job) now persists events
via save_job_event at key execution points: plan creation, LLM
responses, tool_use, tool_result, and job completion/failure/stuck.
Event data shapes match the container worker format so the gateway
activity tab renders them correctly.

Frontend: tool_result errors now show a red X icon with danger styling
instead of a silent empty output. The result event falls back to the
error field when message is absent.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: wire RoutineEngine into gateway for direct manual trigger firing

Replace the message-channel hack in routines_trigger_handler with a
direct call to RoutineEngine::fire_manual(), ensuring FullJob routines
dispatch correctly when triggered from the web UI. Inject the engine
into GatewayState from Agent::run after construction.

Also persists user_id in save_job for both PG and libSQL backends,
removes the source='sandbox' filter so all jobs are visible, and
exposes job_id on RoutineRunInfo for the frontend job link.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: remove stale gateway_state argument from Agent::new test call sites

The gateway_state parameter was removed from Agent::new during rebase
(replaced by post-construction set_routine_engine_slot), but three test
call sites still passed the extra None argument.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address PR review — restore sandbox source filter, remove blank lines

- Revert removal of `source = 'sandbox'` filter in all SandboxStore
  queries (8 sites across PG and libSQL). Sandbox-specific APIs should
  stay scoped to sandbox jobs; unified job listing for the Jobs tab
  should use a separate query path.
- Remove extra blank lines in agent_loop.rs and worker.rs that caused
  formatting CI failure.

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address review — regenerate Cargo.lock, add user_id regression test

- Regenerate Cargo.lock from main's lockfile to eliminate dependency
  version downgrades (anyhow, syn, etc.) that were churn from rebase.
- Add regression test verifying user_id round-trips through save_job
  and get_job in the libSQL backend.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* style: remove trailing blank line in libsql jobs.rs

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* test: add Postgres-side regression test for user_id persistence in save_job

Mirrors the existing libSQL test (test_save_job_persists_user_id) for the
Postgres backend. Gated behind #[cfg(feature = "postgres")] + #[ignore]
since it requires a running PostgreSQL instance (integration tier).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* refactor: unify three agentic loops into single AgenticLoop engine (#654)

Replace three independent copy-pasted agentic loops (dispatcher, worker,
container runtime) with a single shared engine in `agentic_loop.rs` that
all consumers customize via the `LoopDelegate` trait.

Phase 1 — Shared engine (`src/agent/agentic_loop.rs`, 205 lines):
  - `run_agentic_loop()` owns the core LLM → tool exec → repeat cycle
  - `LoopDelegate` trait (Send + Sync, &dyn dispatch) with 6 hook points
  - Tool intent nudge logic consolidated (was duplicated in 3 files)
  - Iteration limit + force-text behavior preserved

Phase 2 — Three delegate implementations:
  - `ChatDelegate` (dispatcher.rs): 3-phase approval flow, hooks, cost
    guard, context compaction, skill attenuation, interruption
  - `JobDelegate` (worker/job.rs): planning pre-loop phase, parallel
    JoinSet exec, mark_completed/stuck/failed, SSE streaming, self-repair
  - `ContainerDelegate` (worker/container.rs): sequential tool exec,
    HTTP-proxied LLM, container-safe tools, credential injection

Phase 3 — File moves and cleanup:
  - Delete `src/agent/worker.rs` — job logic moved to `src/worker/job.rs`
  - Rename `src/worker/runtime.rs` → `src/worker/container.rs`
  - Re-export `Worker`/`WorkerDeps` from `crate::worker` in `agent/mod.rs`
  - Update `scheduler.rs` imports to new worker location

Shared helpers (`src/tools/execute.rs`):
  - `execute_tool_with_safety()` replaces 4 copies of validate → timeout
    → execute → serialize
  - `process_tool_result()` replaces 3 copies of sanitize → wrap →
    ChatMessage (also used by thread_ops.rs approval resume paths)

Net result: -2,408 lines, zero duplicated loop logic, single code path
for tool intent nudge and completion detection.

Closes #654

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address review feedback from Copilot

1. scheduler.rs: Replace `unwrap_or` fallback with proper error
   propagation when parsing tool output JSON — surfaces bugs instead
   of silently changing the output type.

2. worker/job.rs: Drop MutexGuard before the cancellation `.await` in
   `check_signals()` to avoid holding a lock across an async I/O call
   (prevents `await_holding_lock` lint).

3. worker/job.rs: Restore consecutive rate-limit counter
   (MAX_CONSECUTIVE_RATE_LIMITS = 10) so sustained rate limiting marks
   the job stuck with "Persistent rate limiting" instead of silently
   burning through max_iterations.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: incorporate staging changes — token budget tracking + mark_failed

Merge staging's changes into the refactored JobDelegate:
- Add token budget tracking in call_llm (update_context/add_tokens)
- mark_stuck → mark_failed for iteration cap and rate-limit exhaustion
  (aligns with staging's #788 fix)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address zmanian's PR review — eliminate type erasure, clean up

Address all 6 review points from zmanian on PR #800:

1. Replace LoopOutcome::Custom(Box<dyn Any>) with typed
   LoopOutcome::NeedApproval(Box<PendingApproval>) — eliminates
   type erasure and downcast, resolves clippy large_enum_variant.

2. Remove dead max_tool_iterations field from ChatDelegate struct.

3. Add on_tool_intent_nudge() hook to LoopDelegate trait with
   implementations in Job and Container delegates for observability.

4. Fix SSE events in job worker to emit raw sanitized content
   instead of XML-wrapped <tool_output> tags.

5. Remove 4 duplicate completion tests from job.rs that were
   already covered by the shared util module.

6. Avoid logging full tool results — use result_size_bytes in
   debug logs (execute.rs, job.rs).

Also updates path references in CLAUDE.md, COVERAGE_PLAN.md,
and add-sse-event.md command.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat(doctor): expand diagnostics from 7 to 16 health checks

* test: add unit tests for agentic_loop and execute shared modules

Add 16 tests covering the two new critical shared modules:

agentic_loop.rs (10 tests):
- Text response exits loop immediately
- Tool call → text response continuation
- LoopSignal::Stop exits before LLM call
- LoopSignal::InjectMessage adds user message to context
- Max iterations terminates with LoopOutcome::MaxIterations
- Tool intent nudge fires twice then caps
- before_llm_call early exit bypasses LLM
- truncate_for_preview: short string, long string, multibyte safety

execute.rs (6 tests):
- execute_tool_with_safety success path
- Missing tool returns ToolError::NotFound
- Tool execution failure propagates
- Per-tool timeout enforcement (50ms)
- process_tool_result XML wrapping on success
- process_tool_result error formatting

All 2,777 unit tests pass, 0 clippy warnings.
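
For context on the multibyte-safety case listed above, here is one way such a truncation helper could work. This is a guess at the behavior, not the project's `truncate_for_preview`: the real function may take different arguments or append an ellipsis.

```rust
/// Hypothetical sketch: cut `s` to at most `max_bytes` bytes without
/// splitting a UTF-8 character.
fn truncate_for_preview(s: &str, max_bytes: usize) -> &str {
    if s.len() <= max_bytes {
        return s;
    }
    // Walk back to the nearest character boundary; index 0 always is one.
    let mut end = max_bytes;
    while !s.is_char_boundary(end) {
        end -= 1;
    }
    &s[..end]
}
```

Slicing a `&str` at a non-boundary index panics, which is why the walk-back step matters for strings containing multibyte characters.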

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* style: cargo fmt

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address code review — 9 issues across agentic loop, job worker, container

CRITICAL fixes:
- Rate-limit exhaustion now returns Err(LlmError::RateLimited) instead of
  Ok(Text("")), stopping the loop immediately with no ghost iteration.
  Below-threshold retries still use Text("") with an explicit empty-string
  guard in handle_text_response to skip injection.
- check_signals drains the entire message channel before returning,
  prioritizing Stop over UserMessage. Previously returned early on first
  UserMessage, silently dropping any queued Stop or additional messages.
- check_signals now detects all non-progressing job states (Cancelled,
  Failed, Stuck, Completed, Submitted, Accepted) instead of only
  Cancelled and Failed.
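
The drain-then-decide behavior of `check_signals` can be sketched as follows. The message and signal enums here are simplified assumptions (the real code uses its own types and an async channel), and `InjectMessages` is a hypothetical variant standing in for however queued user messages are returned.

```rust
use std::sync::mpsc::{Receiver, TryRecvError};

/// Hypothetical message and signal types for illustration.
#[derive(Debug, PartialEq)]
enum WorkerMessage {
    Stop,
    UserMessage(String),
}

#[derive(Debug, PartialEq)]
enum LoopSignal {
    Continue,
    Stop,
    InjectMessages(Vec<String>),
}

/// Drain everything queued, then decide: Stop wins over any user messages.
fn check_signals(rx: &Receiver<WorkerMessage>) -> LoopSignal {
    let mut stop = false;
    let mut messages = Vec::new();
    loop {
        match rx.try_recv() {
            Ok(WorkerMessage::Stop) => stop = true,
            Ok(WorkerMessage::UserMessage(text)) => messages.push(text),
            // Empty or disconnected: nothing more to drain right now.
            Err(TryRecvError::Empty) | Err(TryRecvError::Disconnected) => break,
        }
    }
    if stop {
        LoopSignal::Stop
    } else if messages.is_empty() {
        LoopSignal::Continue
    } else {
        LoopSignal::InjectMessages(messages)
    }
}
```

Returning on the first `UserMessage` would leave a later `Stop` sitting in the queue; draining first is what prevents that.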

HIGH fixes:
- Error path in process_tool_result_job applies truncate_for_preview to
  bound error strings in SSE/DB events (was unbounded).
- Document Send+Sync lifetime constraint on LoopDelegate trait.
- Test mock before_llm_call refactored from double-lock to single lock
  acquisition, eliminating deadlock risk on refactor.

MEDIUM fixes:
- CompletionReport includes actual iteration count via shared
  Arc<Mutex<u32>> tracker (was hardcoded 0).
- process_tool_result_job return type changed from Result<bool> to
  Result<()> — the bool was always false (dead API).
- Deduplicate truncate in container.rs; now uses truncate_for_preview
  from agentic_loop.

Verified: 0 clippy warnings, 2781 tests pass, cargo fmt clean.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Henry Park <henrypark133@gmail.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Illia Polosukhin <ilblackdragon@gmail.com>
Co-authored-by: Umesh Kumar Singh <brijbiharisingh1971@outlook.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>

* Revert "Feat/docker shell edition" + fix fmt/clippy (#886)

* Revert "Feat/docker shell edition (#804)"

This reverts commit c566faf28fb77c2fa4df92c2947fb48f1a25df9b.

* style: fix formatting issues from revert

Run cargo fmt to fix formatting across 7 files after the revert of
the docker shell edition feature.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* refactor: centralize test credential constants into testing::credentials (#829)

* refactor: central…
drchirag1991 pushed a commit to drchirag1991/ironclaw that referenced this pull request Apr 8, 2026
…nearai#791)

* fix(wasm): run leak scan before credential injection in tools wrapper

The tools WASM wrapper runs the LeakDetector on HTTP request headers
AFTER inject_host_credentials() has already substituted real secrets
(e.g., xoxb- Slack bot tokens). This causes the leak detector to
flag the tool's own legitimate outbound API calls as secret exfiltration.

Move the scan to run on raw_headers before any credential injection,
matching the fix already applied to the channels wrapper in nearai#421.
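
The corrected ordering can be sketched like this. `looks_like_secret` and `prepare_headers` are hypothetical stand-ins; the real wrapper uses `LeakDetector` and `inject_host_credentials()`, whose signatures are not reproduced here.

```rust
/// Hypothetical detector: flags any value that looks like a Slack bot token.
fn looks_like_secret(value: &str) -> bool {
    value.contains("xoxb-")
}

/// Scan what the WASM tool supplied, and only then add host credentials.
fn prepare_headers(
    raw_headers: &[(String, String)],
    host_credentials: &[(String, String)],
) -> Result<Vec<(String, String)>, String> {
    // 1. Scan the raw headers: anything secret-like here came from the tool.
    for (name, value) in raw_headers {
        if looks_like_secret(value) {
            return Err(format!("potential secret leak in header '{name}'"));
        }
    }
    // 2. Inject host credentials afterwards; they are never scanned.
    let mut headers = raw_headers.to_vec();
    headers.extend(host_credentials.iter().cloned());
    Ok(headers)
}
```

Scanning after step 2 would flag the host's own `xoxb-` token, which is the false positive this change removes.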

Fixes the same class of bug as nearai#421 (which only fixed channels/wasm/wrapper.rs).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* perf: inline leak scan to avoid Vec allocation on every HTTP request

Address review feedback: instead of cloning all header keys/values into
a Vec to pass to scan_http_request(), iterate over raw_headers directly
using scan_and_clean(). This also provides more specific error messages
(URL vs header vs body).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* style: fix cargo fmt formatting in leak scan loop

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
drchirag1991 pushed a commit to drchirag1991/ironclaw that referenced this pull request Apr 8, 2026
…earai#1063)

* chore: promote staging to main (2026-03-10 15:19 UTC) (#865)

* fix: Channel HTTP: server doesn't start after config change (no hot-r… (#779)

* fix: Channel HTTP: server doesn't start after config change (no hot-reload)

* review fixes

* review fixes

* fix linter

* fix code style

* fix: prevent session lock contention blocking message processing (#783)

* fix: prevent session lock contention blocking message processing

## Problem
After container restart, POST /api/chat/send returns 202 ACCEPTED but messages
don't appear in conversation_messages and agent never responds. Messages get
stuck in "stale state" after restart.

Root cause: Session lock was held for entire duration of chat_threads_handler
and chat_history_handler, including during slow database queries. This blocked
the agent loop from acquiring the session lock to process incoming messages,
causing them to hang indefinitely.

## Solution
1. **Release session lock early in chat_threads_handler**: Only acquire lock
   when reading active_thread at response time, not during DB queries for
   thread list. DB operations no longer block message processing.

2. **Release session lock early in chat_history_handler**: Only acquire lock
   when accessing in-memory thread state, not during paginated DB queries or
   thread ownership checks. DB operations no longer block message processing.

3. **Add comprehensive logging**: Track message flow from receipt through
   session resolution, thread hydration, and state transitions. Helps diagnose
   future issues:
   - Message queued to agent loop (chat_send_handler)
   - Processing message from channel (handle_message)
   - Hydrating thread from DB (maybe_hydrate_thread)
   - Resolving session and thread (resolve_thread)
   - Checking thread state (process_user_input)
   - Persisting user message (persist_user_message)
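
The lock-scoping change in items 1 and 2 can be illustrated with a small sketch. `Session` and `list_threads` are hypothetical, and `std::sync::Mutex` with a closure stands in for the async session lock and database query the real handlers use.

```rust
use std::sync::{Arc, Mutex};

/// Hypothetical in-memory session state.
struct Session {
    active_thread: Option<String>,
}

/// Run the slow query with no lock held, then lock briefly for the
/// in-memory read needed at response time.
fn list_threads(
    session: &Arc<Mutex<Session>>,
    slow_db_query: impl FnOnce() -> Vec<String>,
) -> (Vec<String>, Option<String>) {
    // Slow work happens first, outside any critical section.
    let threads = slow_db_query();
    // The guard lives only inside this block.
    let active = {
        let guard = session.lock().unwrap_or_else(|e| e.into_inner());
        guard.active_thread.clone()
    };
    (threads, active)
}
```

While the query runs, another task (such as the agent loop) can still acquire the session lock.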

## Impact
- Message processing no longer blocks on session lock contention
- API response times for thread list/history queries unaffected (DB queries
  still happen, but lock is not held)
- Better diagnostics for future debugging

## Testing
- All 2756 tests pass
- Code compiles with zero clippy warnings
- No changes to user-facing API or behavior, only lock timing

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

* security: redact PII from info-level logs

Downgrade user_id and channel logging to debug level to prevent exposing
Personally Identifiable Information (PII) in production logs.

The user_id field can contain sensitive information such as phone numbers
(e.g., for Signal messages). Logging PII in cleartext at the info level
creates a security and privacy risk, as these logs may be stored in
persistent storage, indexed by log management systems, or accessible to
unauthorized personnel.

Changes:
- Info level: logs only message_id (UUID) for tracking
- Debug level: logs user_id, channel, thread_id for troubleshooting

This maintains debugging capability for developers while protecting user
privacy in production logs.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Haiku 4.5 <noreply@anthropic.com>

* chore: sync main into staging (#855)

* fix(ci): secrets can't be used in step if conditions [skip-regression-check] (#787)

GitHub Actions step-level `if:` doesn't have access to `secrets` context.
Replace `if: secrets.X != ''` with `continue-on-error: true` and let
the Set token step handle the fallback.

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(ci): clean up staging pipeline — remove hacks, skip redundant checks [skip-regression-check] (#794)

- Remove continue-on-error from staging-ci.yml app token steps (secrets are configured)
- Skip test.yml and code_style.yml on PRs targeting staging (staging-ci.yml
  already runs tests before promoting, promotion PR gets full CI on main)
- Allow ironclaw-ci[bot] in Claude Code review for bot-created promotion PRs

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* fix(ci): run fmt + clippy on staging PRs, skip Windows clippy [skip-regression-check] (#802)

- Remove branches:[main] filter from code_style.yml so it runs on all PRs
- Gate clippy-windows with `if: github.base_ref == 'main'` (skip on staging PRs)
- Update rollup job to allow skipped clippy-windows
- Simplify claude-review.yml to only trigger on labeled event (avoids duplicate runs)

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* feat: persist user_id in save_job and expose job_id on routine runs (#709)

* feat: persist worker events to DB and fix activity tab rendering

In-process Worker (used by Scheduler::dispatch_job) now persists events
via save_job_event at key execution points: plan creation, LLM
responses, tool_use, tool_result, and job completion/failure/stuck.
Event data shapes match the container worker format so the gateway
activity tab renders them correctly.

Frontend: tool_result errors now show a red X icon with danger styling
instead of a silent empty output. The result event falls back to the
error field when message is absent.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: wire RoutineEngine into gateway for direct manual trigger firing

Replace the message-channel hack in routines_trigger_handler with a
direct call to RoutineEngine::fire_manual(), ensuring FullJob routines
dispatch correctly when triggered from the web UI. Inject the engine
into GatewayState from Agent::run after construction.

Also persists user_id in save_job for both PG and libSQL backends,
removes the source='sandbox' filter so all jobs are visible, and
exposes job_id on RoutineRunInfo for the frontend job link.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: remove stale gateway_state argument from Agent::new test call sites

The gateway_state parameter was removed from Agent::new during rebase
(replaced by post-construction set_routine_engine_slot), but three test
call sites still passed the extra None argument.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address PR review — restore sandbox source filter, remove blank lines

- Revert removal of `source = 'sandbox'` filter in all SandboxStore
  queries (8 sites across PG and libSQL). Sandbox-specific APIs should
  stay scoped to sandbox jobs; unified job listing for the Jobs tab
  should use a separate query path.
- Remove extra blank lines in agent_loop.rs and worker.rs that caused
  formatting CI failure.

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address review — regenerate Cargo.lock, add user_id regression test

- Regenerate Cargo.lock from main's lockfile to eliminate dependency
  version downgrades (anyhow, syn, etc.) that were churn from rebase.
- Add regression test verifying user_id round-trips through save_job
  and get_job in the libSQL backend.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* style: remove trailing blank line in libsql jobs.rs

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* test: add Postgres-side regression test for user_id persistence in save_job

Mirrors the existing libSQL test (test_save_job_persists_user_id) for the
Postgres backend. Gated behind #[cfg(feature = "postgres")] + #[ignore]
since it requires a running PostgreSQL instance (integration tier).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* feat(llm): per-provider unsupported parameter filtering (#749, #728) (#809)

Add declarative `unsupported_params` field to provider definitions in
providers.json. Parameters listed are stripped from requests before
sending, preventing 400 errors from providers that reject them (e.g.
gpt-5 family and kimi-k2.5 rejecting custom temperature values).

- Add `unsupported_params` to ProviderDefinition and RegistryProviderConfig
- Propagate from registry through config resolution
- Generic strip helpers handle temperature, max_tokens, stop_sequences
- Apply filtering in RigAdapter and AnthropicOAuthProvider
- Mark openai and tinfoil providers as not supporting temperature
- Update openai default model to gpt-5-mini
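
The stripping step described above could look roughly like the sketch below. This is a minimal illustration only: `CompletionRequest` and `strip_unsupported_params` are hypothetical names, and the real request type and helper signatures are not shown in this log.

```rust
/// Hypothetical, simplified request type (assumption; not the real struct).
#[derive(Debug, Clone, PartialEq)]
pub struct CompletionRequest {
    pub temperature: Option<f32>,
    pub max_tokens: Option<u32>,
    pub stop_sequences: Option<Vec<String>>,
}

/// Clears every field named in `unsupported` so it is omitted from the
/// outgoing request. Unknown names are ignored (an assumption of this sketch).
pub fn strip_unsupported_params(req: &mut CompletionRequest, unsupported: &[&str]) {
    for name in unsupported {
        match *name {
            "temperature" => req.temperature = None,
            "max_tokens" => req.max_tokens = None,
            "stop_sequences" => req.stop_sequences = None,
            _ => {}
        }
    }
}
```

Keeping the list declarative means a provider entry only has to name the parameters it rejects; the adapter code stays the same for every provider.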

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Illia Polosukhin <ilblackdragon@gmail.com>

* fix: Chat input is hidden in mobile browser mode (#877)

* fix: stop XML-escaping tool output content (#598) (#874)

* fix: stop XML-escaping tool output content in wrap_for_llm (#598)

Remove content escaping that corrupted JSON in tool output. The
<tool_output> structural boundary is preserved but content now passes
through raw, fixing downstream parse failures.
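
A rough sketch of the resulting behavior, assuming a simplified wrapper: `wrap_tool_output` is a hypothetical stand-in, and the `name` attribute and tag layout are assumptions rather than the actual `wrap_for_llm` signature.

```rust
/// Wraps raw tool output in a structural boundary without escaping the content.
/// The `name` attribute and exact tag layout are assumptions of this sketch.
pub fn wrap_tool_output(tool_name: &str, content: &str) -> String {
    // Content is inserted verbatim so embedded JSON stays parseable.
    format!("<tool_output name=\"{tool_name}\">\n{content}\n</tool_output>")
}
```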

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Henry Park <henrypark133@gmail.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(safety): allow empty string tool params (#848)

* fix(safety): allow empty string tool params

* fix(safety): preserve heuristic checks and add path context to tool validation

This follow-up refactor addresses PR review feedback by restoring
heuristic checks (whitespace ratio, character repetition) for tool
parameter validation and improving error reporting.

Changes:
- Restored heuristic warnings in validate_non_empty_input so they apply
  to both user input and tool parameters (when non-empty).
- Refactored check_strings to recursively build and pass JSON paths
  (e.g., "metadata.tags[1]").
- Updated validation errors to use the specific JSON path as the field
  name instead of the generic "input".
- Added regression tests for whitespace/repetition warnings and JSON
  path reporting in tool parameters.
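
The recursive path building can be sketched as follows. This is illustrative only: the `Value` enum and `collect_string_paths` are hypothetical, std-only stand-ins, while the real `check_strings` presumably operates on a JSON library type.

```rust
/// Minimal JSON-like value used only for this illustration.
pub enum Value {
    Null,
    Str(String),
    Array(Vec<Value>),
    Object(Vec<(String, Value)>),
}

/// Walks `value` and records `(path, string)` for every string leaf,
/// building paths such as `metadata.tags[1]` along the way.
pub fn collect_string_paths(value: &Value, path: &str, out: &mut Vec<(String, String)>) {
    match value {
        Value::Str(s) => out.push((path.to_string(), s.clone())),
        Value::Array(items) => {
            for (i, item) in items.iter().enumerate() {
                collect_string_paths(item, &format!("{path}[{i}]"), out);
            }
        }
        Value::Object(fields) => {
            for (key, child) in fields {
                // Avoid a leading dot for top-level keys.
                let child_path = if path.is_empty() {
                    key.clone()
                } else {
                    format!("{path}.{key}")
                };
                collect_string_paths(child, &child_path, out);
            }
        }
        Value::Null => {}
    }
}
```

A validator can then report the recorded path as the field name, and an empty string leaf is simply another entry rather than an error.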

This ensures the safety layer remains semantically neutral about empty
strings (fixing the memory_tree path: "" issue) while maintaining
rigorous protection and providing better developer ergonomics.

* style: run cargo fmt

* perf: optimize release and dist build profiles (#843)

* perf: optimize release and dist build profiles

Add [profile.release] with strip=true and panic="abort" for smaller,
faster release binaries. Upgrade [profile.dist] from lto="thin" to
lto="fat" with codegen-units=1 for maximum optimization in CI releases.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: remove panic=abort from release profile

Reviewers (zmanian, Copilot, Gemini) correctly flagged that panic=abort
in the release profile would kill the entire process on any tokio task
panic, breaking fault isolation for the long-running server. Removed
from release profile entirely.
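
The fault-isolation concern can be illustrated with plain `std::thread` as an analogy for tokio tasks. `run_isolated` is a hypothetical helper, and the behavior shown assumes the default `panic = "unwind"` setting.

```rust
use std::thread;

/// Runs `job` on its own thread and reports whether it finished without panicking.
pub fn run_isolated<F>(job: F) -> bool
where
    F: FnOnce() + Send + 'static,
{
    // With the default `panic = "unwind"`, a panic is confined to the spawned
    // thread and surfaces here as `Err`. Under `panic = "abort"` the whole
    // process would exit instead of returning.
    thread::spawn(job).join().is_ok()
}
```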

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* feat: add PR template with risk assessment (#837)

* feat: add PR template with risk assessment and review tracks

Add a pull request template that includes summary, change type,
validation checklist, security/database impact sections, blast radius,
and rollback plan. Update CONTRIBUTING.md with review track definitions
(A/B/C) based on change risk level.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: expand CONTRIBUTING.md with setup, workflow, and guidelines

Add getting started, development workflow, code style summary,
database change guidance, and dependency management sections.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* feat: add fuzzing targets for untrusted input parsers (#835)

* feat: add fuzzing targets for untrusted input parsers

Add cargo-fuzz infrastructure with 5 fuzz targets exercising
security-critical code paths:

- fuzz_safety_sanitizer: Aho-Corasick + regex injection detection
- fuzz_safety_validator: Input validation (length, encoding, patterns)
- fuzz_leak_detector: Secret leak scanning (API keys, tokens)
- fuzz_tool_params: Tool parameter JSON validation
- fuzz_config_env: TOML/JSON config parsing

Each target exercises real IronClaw business logic with invariant
assertions. Includes corpus directories and setup documentation.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: improve fuzz targets to exercise real IronClaw code paths

- fuzz_config_env: exercise SafetyLayer end-to-end (sanitize, validate,
  policy check) instead of generic TOML/JSON parsing
- fuzz_tool_params: add validate_tool_schema coverage alongside
  validate_tool_params
- Add "fuzz" to workspace exclude in root Cargo.toml
- Update README descriptions to match actual target behavior

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: replace redundant detect() call with meaningful invariant assertion

Replace the double sanitize()+detect() call with an assertion that
critical severity warnings always trigger content modification.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: rewrite fuzz_config_env to exercise IronClaw safety code directly

Replace SafetyLayer wrapper usage with direct Sanitizer, Validator, and
LeakDetector instantiation and invocation. Adds meaningful consistency
assertions (non-empty output, valid-means-no-errors, scan/clean agreement).
Removes the config construction that was only exercising struct instantiation.

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* fix(wasm): run leak scan before credential injection in tools wrapper (#791)

* fix(wasm): run leak scan before credential injection in tools wrapper

The tools WASM wrapper runs the LeakDetector on HTTP request headers
AFTER inject_host_credentials() has already substituted real secrets
(e.g., xoxb- Slack bot tokens). This causes the leak detector to
flag the tool's own legitimate outbound API calls as secret exfiltration.

Move the scan to run on raw_headers before any credential injection,
matching the fix already applied to the channels wrapper in #421.

Fixes the same class of bug as #421 (which only fixed channels/wasm/wrapper.rs).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* perf: inline leak scan to avoid Vec allocation on every HTTP request

Address review feedback: instead of cloning all header keys/values into
a Vec to pass to scan_http_request(), iterate over raw_headers directly
using scan_and_clean(). This also provides more specific error messages
(URL vs header vs body).
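
A minimal sketch of the corrected ordering, under stated assumptions: `looks_like_secret` is a toy stand-in for the real leak detector, and `prepare_headers` is a hypothetical function, not the wrapper's actual API.

```rust
/// Toy stand-in for the leak detector: flags values containing a Slack bot token prefix.
pub fn looks_like_secret(value: &str) -> bool {
    value.contains("xoxb-")
}

/// Scans the tool-supplied headers first, then appends host-injected credentials.
pub fn prepare_headers(
    raw_headers: &[(String, String)],
    host_credentials: &[(String, String)],
) -> Result<Vec<(String, String)>, String> {
    // 1. Scan only what the WASM tool itself provided, iterating in place
    //    instead of cloning keys and values into a temporary Vec.
    for (name, value) in raw_headers {
        if looks_like_secret(value) {
            return Err(format!("potential secret leak in header `{name}`"));
        }
    }
    // 2. Inject credentials after the scan, so secrets added by the host
    //    are never mistaken for exfiltration.
    let mut headers = raw_headers.to_vec();
    headers.extend(host_credentials.iter().cloned());
    Ok(headers)
}
```

With this ordering a tool that smuggles a token into its own headers is still blocked, while the host's legitimate `Authorization` header passes through.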

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* style: fix cargo fmt formatting in leak scan loop

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* fix(setup): drain residual terminal events before secret input (#747) (#849)

* fix: skip the regression check
[skip-regression-check]

---------

Co-authored-by: Henry Park <henrypark133@gmail.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Illia Polosukhin <ilblackdragon@gmail.com>

* feat(agent): add context size logging before LLM prompt (#810)

* feat(agent): add context size logging before LLM prompt

---------

Co-authored-by: Henry Park <henrypark133@gmail.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Illia Polosukhin <ilblackdragon@gmail.com>

* fix: preserve text before tool-call XML in forced-text responses (#852)

* fix: preserve text before tool-call XML in forced-text responses (#789)

Local models (Qwen3, DeepSeek, GLM) emit <tool_call> XML even when no
tools are available (force_text mode). The existing strip_xml_tag()
discards everything from an unclosed opening tag onward, producing an
empty string that triggers the "I'm not sure how to respond" fallback.

Add truncate_at_tool_tags() — a code-region-aware pre-processing step
that truncates at the first tool-call XML tag BEFORE clean_response()
runs, preserving all useful text before the tag. Protect all 7
clean_response() call sites. Case-insensitive matching handles models
that emit <TOOL_CALL> or <Tool_Call> variants.

Secondary fix: add has_native_thinking() model detection to skip
<think>/<final> system prompt injection for models with built-in
reasoning (Qwen3, QwQ, DeepSeek-R1, GLM-Z1, etc.), preventing
thinking-only responses that clean to empty.

Wire with_model_name(active_model_name()) at all 9 production sites
that construct Reasoning, so the runtime model name (not static config)
drives system prompt generation.

126 new/updated tests covering truncation edge cases, code-block
awareness, Unicode, case-insensitivity, StubLlm integration for
complete/plan/evaluate_success/respond_with_tools paths, model
detection, and conditional system prompt generation.

Closes #789

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address Copilot review — unclosed-only truncation, ASCII case folding

- truncate_at_tool_tags() now only truncates at UNCLOSED tool tags;
  properly closed tags (e.g. <tool_call>...</tool_call>) are left intact
  for clean_response() to strip normally, preserving any text after them
- Switch from to_lowercase() to to_ascii_lowercase() to prevent byte
  offset misalignment with non-ASCII characters whose lowercase form
  has different byte length (e.g. Kelvin sign U+212A)
- Add closing_tag_for() helper to derive closing tags from open patterns
- Fix doc comment: "fenced markdown code blocks or inline code spans"
  (not "indented", which find_code_regions() doesn't detect)
- Add regression tests: closed vs unclosed for each tag variant,
  Unicode + case-insensitive offset safety, and mixed closed/unclosed
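
The unclosed-only truncation and the byte-offset argument can be sketched as below. This is a simplified illustration: `truncate_at_unclosed_tool_call` is a hypothetical function that handles a single tag and ignores code regions, unlike the `truncate_at_tool_tags()` described above.

```rust
/// Truncates `text` at the first unclosed `<tool_call>` tag, matching case-insensitively.
/// Closed tags are left in place for a later cleanup pass.
pub fn truncate_at_unclosed_tool_call(text: &str) -> &str {
    // ASCII-only folding never changes byte lengths, so offsets found in
    // `lower` are also valid offsets into `text`.
    let lower = text.to_ascii_lowercase();
    let mut search_from = 0;
    while let Some(rel) = lower[search_from..].find("<tool_call>") {
        let open = search_from + rel;
        let after_open = open + "<tool_call>".len();
        match lower[after_open..].find("</tool_call>") {
            // Closed tag: skip past it and keep looking.
            Some(close_rel) => search_from = after_open + close_rel + "</tool_call>".len(),
            // Unclosed tag: keep only the text before it.
            None => return text[..open].trim_end(),
        }
    }
    text
}
```

The Kelvin sign shows why full Unicode lowercasing is unsafe here: it occupies 3 bytes in UTF-8, while its lowercase form `k` occupies 1, so offsets computed on a `to_lowercase()` copy would no longer line up with the original string.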

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: minor review items — consistent ascii_lowercase, closing_tag_for tests

- Switch has_native_thinking() from to_lowercase() to to_ascii_lowercase()
  for consistency with truncate_at_tool_tags() approach
- Add unit tests for closing_tag_for(): standard tags, space-suffixed
  patterns, pipe-delimited tags, and exhaustive coverage of all
  TOOL_TAG_PATTERNS entries
- Add test for mixed closed+unclosed tags of different types

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* Feat/docker shell edition (#804)

---------

Co-authored-by: Henry Park <henrypark133@gmail.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(mcp): strip top-level null params before forwarding to MCP servers (#795)

* fix(mcp): strip top-level null params before forwarding to MCP servers

LLMs frequently emit `"field": null` for optional parameters in tool
calls. Many MCP servers reject explicit nulls for fields that should
simply be absent — e.g. Notion returns 400 for `"sort": null` in a
search call, expecting the field to be omitted entirely.

Strip top-level null keys from the params object before calling
`call_tool()`. Only top-level keys are stripped; nested nulls are
preserved since they may be semantically meaningful.
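
A minimal sketch of the top-level-only stripping, using a hypothetical std-only `Json` enum in place of the JSON type the MCP client actually uses:

```rust
/// Minimal JSON-like value used only for this illustration.
#[derive(Debug, Clone, PartialEq)]
pub enum Json {
    Null,
    Str(String),
    Object(Vec<(String, Json)>),
}

/// Drops top-level keys whose value is null; nested values are left untouched.
pub fn strip_top_level_nulls(params: Json) -> Json {
    match params {
        Json::Object(fields) => Json::Object(
            fields
                .into_iter()
                .filter(|(_, value)| *value != Json::Null)
                .collect(),
        ),
        // Non-object params are forwarded unchanged.
        other => other,
    }
}
```

For example, `{"query": "x", "sort": null, "filter": {"value": null}}` would become `{"query": "x", "filter": {"value": null}}`.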

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Illia Polosukhin <ilblackdragon@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* Add event-triggered routines and workflow skill templates (#756)

* Add event-triggered routines and workflow skill templates

* fix: address PR review feedback for event_emit security and quality

Security fixes:
- Require approval (UnlessAutoApproved) for event_emit, matching routine_fire
- Enable sanitization on event_emit payload (external JSON reaches LLM)
- Remove user_id parameter from event_emit to prevent IDOR — always use ctx.user_id

Correctness fixes:
- Rename source → event_source in event_emit for consistency with routine_create
- Use json_value_as_filter_string for filter parsing (handles numbers/booleans)
- Case-insensitive matching for event source and event_type
- Add debug logging for missing filter keys in payload
- Fix skill_install_routine_webhook_sim test missing .with_skills()
- Fix schema_validator test for event_emit payload properties

Code quality:
- Move EventEmitTool struct/impl after RoutineHistoryTool (fix split layout)
- Deduplicate routine_to_info into RoutineInfo::from_routine in types.rs
- Add test section headers in e2e_routine_heartbeat.rs
- Clarify event_emit description to specify system_event routines only

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: persist user_id in save_job and expose job_id on routine runs (#709)

* feat: persist worker events to DB and fix activity tab rendering

In-process Worker (used by Scheduler::dispatch_job) now persists events
via save_job_event at key execution points: plan creation, LLM
responses, tool_use, tool_result, and job completion/failure/stuck.
Event data shapes match the container worker format so the gateway
activity tab renders them correctly.

Frontend: tool_result errors now show a red X icon with danger styling
instead of a silent empty output. The result event falls back to the
error field when message is absent.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: wire RoutineEngine into gateway for direct manual trigger firing

Replace the message-channel hack in routines_trigger_handler with a
direct call to RoutineEngine::fire_manual(), ensuring FullJob routines
dispatch correctly when triggered from the web UI. Inject the engine
into GatewayState from Agent::run after construction.

Also persists user_id in save_job for both PG and libSQL backends,
removes the source='sandbox' filter so all jobs are visible, and
exposes job_id on RoutineRunInfo for the frontend job link.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: remove stale gateway_state argument from Agent::new test call sites

The gateway_state parameter was removed from Agent::new during rebase
(replaced by post-construction set_routine_engine_slot), but three test
call sites still passed the extra None argument.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address PR review — restore sandbox source filter, remove blank lines

- Revert removal of `source = 'sandbox'` filter in all SandboxStore
  queries (8 sites across PG and libSQL). Sandbox-specific APIs should
  stay scoped to sandbox jobs; unified job listing for the Jobs tab
  should use a separate query path.
- Remove extra blank lines in agent_loop.rs and worker.rs that caused
  formatting CI failure.

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address review — regenerate Cargo.lock, add user_id regression test

- Regenerate Cargo.lock from main's lockfile to eliminate dependency
  version downgrades (anyhow, syn, etc.) that were churn from rebase.
- Add regression test verifying user_id round-trips through save_job
  and get_job in the libSQL backend.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* style: remove trailing blank line in libsql jobs.rs

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* test: add Postgres-side regression test for user_id persistence in save_job

Mirrors the existing libSQL test (test_save_job_persists_user_id) for the
Postgres backend. Gated behind #[cfg(feature = "postgres")] + #[ignore]
since it requires a running PostgreSQL instance (integration tier).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* fix: make routine_system_event_emit test create routine before emitting

- Add routine_create step to trace fixture so event_emit has a matching
  routine to fire
- Assert fired_routines > 0, not just key presence (Copilot review)
- Add .with_auto_approve_tools(true) since event_emit now requires approval

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: renumber test headers after system_event test insertion

Test 4 was duplicated (routine_cooldown and heartbeat_findings).
Renumber heartbeat_findings to Test 5 and heartbeat_empty_skip to Test 6.

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: merge staging and add missing RoutineEngine args in test

RoutineEngine::new on staging requires `tools` and `safety` params.
Update system_event_trigger_matches_and_filters test to pass them.

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address new Copilot review comments

- Add .with_auto_approve_tools(true) to skill_install_routine_webhook_sim
  test so event_emit doesn't block on approval
- Fix module-level doc comment for event_emit to specify system_event trigger

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: deduplicate json_value_as_string helper

Remove private `json_value_as_string` from routine_engine.rs and use
the identical public `json_value_as_filter_string` from routine.rs,
eliminating divergence risk. (Copilot review)

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Henry Park <henrypark133@gmail.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: enable WASM credential injection in No-DB environments (#845)

* fix(wasm): enable credential injection in no-DB environments via env var fallback

When a secrets store is unavailable (e.g. no-DB mode), WASM channel
credentials were silently not injected, causing channels to start without
credentials. Fix by:

- Changing `inject_channel_credentials_from_secrets` to accept
  `Option<&dyn SecretsStore>` — secrets store is tried first when present
- Adding env var fallback (`inject_env_credentials`) for credentials not
  covered by the secrets store
- Enforcing a channel-name prefix security check on env var names to
  prevent WASM channels from reading unrelated host credentials
  (e.g. `AWS_SECRET_ACCESS_KEY`)
- Extracting pure `resolve_env_credentials` helper for testability
- Adding case-insensitive prefix matching for secrets store lookup

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(wasm): inject credentials at startup when no secrets store (setup.rs path)

The startup path (setup_wasm_channels -> register_channel) was guarded by
`if let Some(secrets) = secrets_store`, so in No-DB mode credentials were
never injected and the channel started without them.

Fix by:
- Changing inject_channel_credentials to accept Option<&dyn SecretsStore>
- Always calling it (removing the if-let guard) — env var fallback runs
  even when secrets_store is None
- Adding channel-name prefix security check to the env var fallback path
  (e.g. TELEGRAM_ for channel "telegram"), consistent with manager.rs

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(test): correct misleading comment on ICTEST1_UNRELATED_OTHER placeholder

* fix(wasm): guard against empty channel name in credential injection

An empty channel_name would produce prefix "_", allowing any env var
starting with "_" to pass the security check and be injected. Add an
early-return guard in resolve_env_credentials, inject_env_credentials,
and inject_channel_credentials. Add a test to cover this path.
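
The prefix check and the empty-name guard can be sketched roughly as follows. This is a std-only illustration, not the code from this commit: the function name `resolve_env_credentials_sketch`, its parameters, and the use of a `HashMap` in place of the process environment are all assumptions made for the example.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the pure `resolve_env_credentials` helper.
/// `env` replaces the real process environment so the function stays testable.
fn resolve_env_credentials_sketch(
    channel_name: &str,
    wanted: &[&str],
    env: &HashMap<String, String>,
) -> Vec<(String, String)> {
    // Guard: an empty channel name would produce the prefix "_" and let any
    // variable starting with "_" pass the check.
    if channel_name.is_empty() {
        return Vec::new();
    }
    // e.g. channel "telegram" may only read variables starting with "TELEGRAM_".
    let prefix = format!("{}_", channel_name.to_ascii_uppercase());
    wanted
        .iter()
        .filter(|name| name.to_ascii_uppercase().starts_with(&prefix))
        .filter_map(|name| env.get(*name).map(|v| ((*name).to_string(), v.clone())))
        .collect()
}
```

With this shape, a request for `AWS_SECRET_ACCESS_KEY` from a channel named `telegram` is filtered out before the environment is consulted.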

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: lizican123 <lizican123@gmail.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: promote to main (#878)

* fix: replace unsafe env::set_var with thread-safe inject_single_var in SIGHUP handler

Fixes race condition where SIGHUP handler modifies global environment variables
while other threads may be reading them via Config::from_env().

Changes:
- Replace unsafe { std::env::set_var() } with ironclaw::config::inject_single_var()
- Uses INJECTED_VARS mutex instead of unsafe global state modification
- All reads via optional_env() check the thread-safe overlay first
- Prevents data races between SIGHUP reload and concurrent config reads
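
A minimal sketch of the overlay idea, assuming a process-wide map guarded by a mutex. The names `inject_single_var`, `optional_env` and `INJECTED_VARS` come from the commit message, but the signatures and the body shown here are guesses for illustration only.

```rust
use std::collections::HashMap;
use std::sync::{Mutex, OnceLock};

/// Lazily initialised overlay; assumed shape of the INJECTED_VARS store.
fn injected_vars() -> &'static Mutex<HashMap<String, String>> {
    static INJECTED_VARS: OnceLock<Mutex<HashMap<String, String>>> = OnceLock::new();
    INJECTED_VARS.get_or_init(|| Mutex::new(HashMap::new()))
}

/// Record a value in the overlay instead of mutating the process environment.
fn inject_single_var(key: &str, value: &str) {
    let mut vars = injected_vars().lock().unwrap_or_else(|e| e.into_inner());
    vars.insert(key.to_string(), value.to_string());
}

/// Read the overlay first, then fall back to the real environment.
fn optional_env(key: &str) -> Option<String> {
    let overlay = injected_vars()
        .lock()
        .unwrap_or_else(|e| e.into_inner())
        .get(key)
        .cloned();
    overlay.or_else(|| std::env::var(key).ok())
}
```

Because writers only touch the mutex-protected map, no call to `std::env::set_var` is needed while other threads are reading configuration.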

Verification:
- All 2,787 tests pass
- Zero clippy warnings
- Code compiles successfully

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

* fix: spawn webhook restart as background task to avoid blocking I/O across lock

Prevents holding Mutex lock during async I/O operations (TcpListener::bind,
task shutdown). The SIGHUP handler no longer blocks webhook processing during
listener restart.

Changes:
- Read old_addr and drop lock immediately
- Spawn restart_with_addr() as background task via tokio::spawn
- Lock is only held during the actual restart operation, not the signal handler

Benefits:
- SIGHUP handler returns immediately without blocking
- Webhook requests not delayed by listener restart I/O
- Lock contention significantly reduced

Verification:
- All 2,787 tests pass
- Zero clippy warnings
- Code compiles successfully

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

* fix: add graceful shutdown mechanism for SIGHUP handler background task

The SIGHUP handler loop previously ran unbounded with no cancellation path. It now
listens for a shutdown signal and exits cleanly during graceful termination.

Changes:
- Create broadcast channel for shutdown signaling
- SIGHUP handler uses tokio::select! to wait for shutdown or SIGHUP
- Send shutdown signal to all background tasks after agent.run() completes
- Ensures clean task lifecycle and no orphaned background tasks

Benefits:
- Proper task cancellation during graceful shutdown
- Follows Tokio best practices for background task management
- No background tasks orphaned when runtime shuts down

Verification:
- All 2,787 tests pass
- Zero clippy warnings
- Code compiles successfully

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

* refactor: replace stringly-typed parameter filtering with typed enum and single helper

Fixes DRY violation where unsupported parameter filtering was duplicated across
rig_adapter.rs and anthropic_oauth.rs using string contains checks.

Changes:
- Add UnsupportedParam typed enum in provider.rs (Temperature, MaxTokens, StopSequences)
- Create strip_unsupported_completion_params() helper function
- Create strip_unsupported_tool_params() helper function
- Update rig_adapter.rs to use shared helpers
- Update anthropic_oauth.rs to use shared helpers
- Replace 60+ lines of duplicate stringly-typed logic
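
For illustration, the typed enum and one of the helpers might look roughly like this. The `CompletionRequest` struct below is a simplified stand-in with assumed field names and types; only the enum variants and the helper name are taken from the commit message.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum UnsupportedParam {
    Temperature,
    MaxTokens,
    StopSequences,
}

/// Simplified stand-in for the real request type (fields are assumptions).
#[derive(Debug, Default, PartialEq)]
struct CompletionRequest {
    temperature: Option<f32>,
    max_tokens: Option<u32>,
    stop_sequences: Option<Vec<String>>,
}

/// Clear every field the provider has declared as unsupported.
fn strip_unsupported_completion_params(
    req: &mut CompletionRequest,
    unsupported: &[UnsupportedParam],
) {
    for param in unsupported {
        // Exhaustive match: adding a variant forces this helper to be updated.
        match param {
            UnsupportedParam::Temperature => req.temperature = None,
            UnsupportedParam::MaxTokens => req.max_tokens = None,
            UnsupportedParam::StopSequences => req.stop_sequences = None,
        }
    }
}
```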

Benefits:
- Type safety: parameter names checked at compile time
- Single source of truth: adding a new param updates one place
- Reduced maintenance burden: no duplicate logic to keep in sync
- Better code clarity: named enum variant is self-documenting

Verification:
- All 2,787 tests pass
- Zero clippy warnings
- Code compiles successfully

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

* docs: clarify intentional parameter asymmetry between completion and tool requests

Add documentation explaining why strip_unsupported_tool_params does not handle
StopSequences: the field doesn't exist in ToolCompletionRequest.

Changes:
- Add clarifying comments to strip_unsupported_tool_params()
- Explain why StopSequences is only in CompletionRequest
- Note that ToolCompletionRequest only supports Temperature and MaxTokens
- Inline comment confirms no action needed for StopSequences

This addresses the appearance of incomplete implementation without changing logic,
as the asymmetry is intentional and correct (ToolCompletionRequest lacks the field).

Verification:
- All 2,787 tests pass
- Zero clippy warnings
- Code compiles successfully

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

* perf: isolate webhook_secret to reduce lock contention on hot path

Move webhook_secret from shared HttpChannelState RwLock into its own Arc<RwLock<>>.
This eliminates contention between secret validation and other state operations.

Changes:
- Change webhook_secret field type from RwLock<Option<SecretString>> to Arc<RwLock<Option<SecretString>>>
- Update initialization in HttpChannel::new()
- Update comments to explain isolation rationale

Benefits:
- Reduce lock contention on webhook request hot path (secret validation)
- Rarely-changing field (SIGHUP only) isolated from frequent state accesses
- Other state operations (tx, pending_responses) no longer wait behind secret reads
- Minimal code change: only field declaration and initialization

The Arc wrapper allows the RwLock handle to be cloned and held separately from the
rest of the state. With this change, each webhook request takes only the dedicated
secret lock for secret validation, not the shared HttpChannelState lock. This scales
better under high request volume.
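
A rough sketch of the isolation, using `String` in place of `SecretString` and an invented struct name. The field layout is an assumption, not the actual `HttpChannelState` definition.

```rust
use std::sync::{Arc, RwLock};

/// Hypothetical, reduced version of the channel state.
struct HttpSecretStateSketch {
    // Rarely written (SIGHUP only), so it gets its own lock.
    webhook_secret: Arc<RwLock<Option<String>>>,
    // Frequently touched state lives behind a different lock.
    pending_responses: RwLock<Vec<String>>,
}

/// Validate a presented secret while holding only the secret lock.
/// Note: plain `==` is not constant-time; real code would likely use a
/// constant-time comparison.
fn secret_matches(state: &HttpSecretStateSketch, presented: &str) -> bool {
    let guard = state
        .webhook_secret
        .read()
        .unwrap_or_else(|e| e.into_inner());
    guard.as_deref() == Some(presented)
}
```

Validation in this sketch succeeds even while another caller holds a write lock on `pending_responses`, which is the contention the change aims to remove.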

Verification:
- All 2,787 tests pass
- Zero clippy warnings
- Code compiles successfully

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

* fix: prevent partial state corruption on SIGHUP restart failure

Ensure atomicity of configuration reload: if webhook listener restart fails,
secret update is skipped to prevent inconsistent state.

Changes:
- Wait for restart_with_addr() to complete (don't spawn background task)
- Track restart result with restart_failed flag
- Only update secret if restart succeeded or wasn't needed
- Ensure listener and secret stay synchronized

Problem addressed:
- Before: restart spawned as background task, secret updated immediately
- If restart failed, secret was changed but listener still on old address
- This left system in inconsistent state (partial corruption)

Solution:
- Make restart blocking (SIGHUP handler can wait, it's not on request hot path)
- Atomically update secret only after successful restart
- Flag prevents race between restart and secret update
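
The ordering can be sketched as below. This is a synchronous simplification with invented names (`ChannelConfigSketch`, `apply_reload`); the commit tracks the result with a `restart_failed` flag, whereas this sketch uses an early return, which has the same effect of skipping the secret update on failure.

```rust
#[derive(Debug, PartialEq)]
struct ChannelConfigSketch {
    addr: String,
    secret: String,
}

/// Apply a reload: restart the listener first, and only then swap the secret.
/// `restart` stands in for the real listener restart and may fail.
fn apply_reload(
    current: &mut ChannelConfigSketch,
    new_addr: &str,
    new_secret: &str,
    restart: impl FnOnce(&str) -> Result<(), String>,
) -> Result<(), String> {
    if current.addr != new_addr {
        // On failure, return before touching any state.
        restart(new_addr)?;
        current.addr = new_addr.to_string();
    }
    // Reached only if the restart succeeded or was not needed.
    current.secret = new_secret.to_string();
    Ok(())
}
```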

Benefits:
- Configuration changes are atomic (both succeed or both fail together)
- No partial state corruption on restart failure
- Failed restarts don't silently leave inconsistent state
- Secret and listener address stay in sync

Verification:
- All 2,787 tests pass
- Zero clippy warnings
- Code compiles successfully

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

* refactor: generalize hot-secret-swapping with ChannelSecretUpdater trait

Decouple SIGHUP handler from HTTP channel internals by introducing a trait
for channels that support zero-downtime secret updates.

Changes:
- Add ChannelSecretUpdater trait in channels/channel.rs
- Implement ChannelSecretUpdater for HttpChannelState
- Export trait from channels module
- Update SIGHUP handler to use trait-based secret updater collection
- Replace explicit HTTP channel knowledge with generic updater loop

Benefits:
- SIGHUP handler no longer depends on HttpChannelState details
- Tight coupling removed: main.rs doesn't need HTTP channel imports
- Extensible: new channels can opt-in by implementing the trait
- Scalable: multiple channels supported without main.rs changes
- Maintainable: adding channels requires only trait implementation, not SIGHUP handler edits

Pattern:
- ChannelSecretUpdater trait defines the interface for all updaters
- Channels that support hot-secret-swapping implement the trait
- SIGHUP handler loops through all registered updaters generically
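
A minimal sketch of the pattern, assuming a synchronous `update_secret` method. The trait name is from the commit; the method name, its signature, and the `HttpChannelStateSketch` type are assumptions (the real trait may well be async).

```rust
use std::sync::{Arc, RwLock};

/// Assumed shape of the trait: one method that swaps the secret in place.
trait ChannelSecretUpdater: Send + Sync {
    fn update_secret(&self, new_secret: Option<String>);
}

/// Reduced stand-in for the HTTP channel state.
struct HttpChannelStateSketch {
    webhook_secret: RwLock<Option<String>>,
}

impl ChannelSecretUpdater for HttpChannelStateSketch {
    fn update_secret(&self, new_secret: Option<String>) {
        *self.webhook_secret.write().unwrap_or_else(|e| e.into_inner()) = new_secret;
    }
}

/// The handler only sees trait objects, never a concrete channel type.
fn on_sighup(updaters: &[Arc<dyn ChannelSecretUpdater>], new_secret: Option<String>) {
    for updater in updaters {
        updater.update_secret(new_secret.clone());
    }
}
```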

Verification:
- All 2,787 tests pass
- Zero clippy warnings
- Code compiles successfully

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

* feat: validate parameter names at deserialization time, not just tests

Add custom serde deserializer for unsupported_params that validates parameter
names at runtime when loading providers.json (or user overrides).

Changes:
- Add unsupported_params_de module with custom deserializer
- Only allows: "temperature", "max_tokens", "stop_sequences"
- Invalid parameter names cause immediate deserialization error
- Update ProviderDefinition to use custom deserializer
- Enhanced test with explicit parameter name validation
- Add new test that verifies invalid parameters are rejected

Problem solved:
- Before: Invalid param names (e.g., "temperrature") silently ignored
- Now: Rejected at deserialization time with clear error message
- Prevents runtime failures caused by typos in configuration

Example error:
  unsupported parameter name 'temperrature': must be one of: temperature, max_tokens, stop_sequences
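
The validation step itself can be illustrated without serde. The sketch below uses `FromStr` instead of the custom deserializer described above, so it shows only the name check and the error text; `parse_unsupported_params` is a hypothetical helper, not part of the change.

```rust
use std::str::FromStr;

#[derive(Debug, PartialEq)]
enum UnsupportedParam {
    Temperature,
    MaxTokens,
    StopSequences,
}

impl FromStr for UnsupportedParam {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "temperature" => Ok(Self::Temperature),
            "max_tokens" => Ok(Self::MaxTokens),
            "stop_sequences" => Ok(Self::StopSequences),
            // Unknown names are rejected instead of being silently ignored.
            other => Err(format!(
                "unsupported parameter name '{other}': must be one of: temperature, max_tokens, stop_sequences"
            )),
        }
    }
}

/// Fail on the first invalid name, mirroring fail-fast config loading.
fn parse_unsupported_params(names: &[&str]) -> Result<Vec<UnsupportedParam>, String> {
    names.iter().map(|n| n.parse::<UnsupportedParam>()).collect()
}
```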

Benefits:
- Fail-fast: errors caught when loading config, not at runtime
- Clear feedback: error message lists valid parameter names
- Type safety: validators run during deserialization
- Configuration errors detected immediately, not silently ignored

Verification:
- All 2,788 tests pass (including new validation test)
- Zero clippy warnings
- Code compiles successfully

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Haiku 4.5 <noreply@anthropic.com>

* merge: resolve conflicts for PR #800 and #822 into staging (#881)

* fix(ci): secrets can't be used in step if conditions [skip-regression-check] (#787)

GitHub Actions step-level `if:` doesn't have access to `secrets` context.
Replace `if: secrets.X != ''` with `continue-on-error: true` and let
the Set token step handle the fallback.

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(ci): clean up staging pipeline — remove hacks, skip redundant checks [skip-regression-check] (#794)

- Remove continue-on-error from staging-ci.yml app token steps (secrets are configured)
- Skip test.yml and code_style.yml on PRs targeting staging (staging-ci.yml
  already runs tests before promoting, promotion PR gets full CI on main)
- Allow ironclaw-ci[bot] in Claude Code review for bot-created promotion PRs

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* fix(ci): run fmt + clippy on staging PRs, skip Windows clippy [skip-regression-check] (#802)

- Remove branches:[main] filter from code_style.yml so it runs on all PRs
- Gate clippy-windows with `if: github.base_ref == 'main'` (skip on staging PRs)
- Update rollup job to allow skipped clippy-windows
- Simplify claude-review.yml to only trigger on labeled event (avoids duplicate runs)

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* feat: persist user_id in save_job and expose job_id on routine runs (#709)

* feat: persist worker events to DB and fix activity tab rendering

In-process Worker (used by Scheduler::dispatch_job) now persists events
via save_job_event at key execution points: plan creation, LLM
responses, tool_use, tool_result, and job completion/failure/stuck.
Event data shapes match the container worker format so the gateway
activity tab renders them correctly.

Frontend: tool_result errors now show a red X icon with danger styling
instead of a silent empty output. The result event falls back to the
error field when message is absent.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: wire RoutineEngine into gateway for direct manual trigger firing

Replace the message-channel hack in routines_trigger_handler with a
direct call to RoutineEngine::fire_manual(), ensuring FullJob routines
dispatch correctly when triggered from the web UI. Inject the engine
into GatewayState from Agent::run after construction.

Also persists user_id in save_job for both PG and libSQL backends,
removes the source='sandbox' filter so all jobs are visible, and
exposes job_id on RoutineRunInfo for the frontend job link.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: remove stale gateway_state argument from Agent::new test call sites

The gateway_state parameter was removed from Agent::new during rebase
(replaced by post-construction set_routine_engine_slot), but three test
call sites still passed the extra None argument.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address PR review — restore sandbox source filter, remove blank lines

- Revert removal of `source = 'sandbox'` filter in all SandboxStore
  queries (8 sites across PG and libSQL). Sandbox-specific APIs should
  stay scoped to sandbox jobs; unified job listing for the Jobs tab
  should use a separate query path.
- Remove extra blank lines in agent_loop.rs and worker.rs that caused
  formatting CI failure.

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address review — regenerate Cargo.lock, add user_id regression test

- Regenerate Cargo.lock from main's lockfile to eliminate dependency
  version downgrades (anyhow, syn, etc.) that were churn from rebase.
- Add regression test verifying user_id round-trips through save_job
  and get_job in the libSQL backend.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* style: remove trailing blank line in libsql jobs.rs

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* test: add Postgres-side regression test for user_id persistence in save_job

Mirrors the existing libSQL test (test_save_job_persists_user_id) for the
Postgres backend. Gated behind #[cfg(feature = "postgres")] + #[ignore]
since it requires a running PostgreSQL instance (integration tier).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* refactor: unify three agentic loops into single AgenticLoop engine (#654)

Replace three independent copy-pasted agentic loops (dispatcher, worker,
container runtime) with a single shared engine in `agentic_loop.rs` that
all consumers customize via the `LoopDelegate` trait.

Phase 1 — Shared engine (`src/agent/agentic_loop.rs`, 205 lines):
  - `run_agentic_loop()` owns the core LLM → tool exec → repeat cycle
  - `LoopDelegate` trait (Send + Sync, &dyn dispatch) with 6 hook points
  - Tool intent nudge logic consolidated (was duplicated in 3 files)
  - Iteration limit + force-text behavior preserved

Phase 2 — Three delegate implementations:
  - `ChatDelegate` (dispatcher.rs): 3-phase approval flow, hooks, cost
    guard, context compaction, skill attenuation, interruption
  - `JobDelegate` (worker/job.rs): planning pre-loop phase, parallel
    JoinSet exec, mark_completed/stuck/failed, SSE streaming, self-repair
  - `ContainerDelegate` (worker/container.rs): sequential tool exec,
    HTTP-proxied LLM, container-safe tools, credential injection

Phase 3 — File moves and cleanup:
  - Delete `src/agent/worker.rs` — job logic moved to `src/worker/job.rs`
  - Rename `src/worker/runtime.rs` → `src/worker/container.rs`
  - Re-export `Worker`/`WorkerDeps` from `crate::worker` in `agent/mod.rs`
  - Update `scheduler.rs` imports to new worker location

Shared helpers (`src/tools/execute.rs`):
  - `execute_tool_with_safety()` replaces 4 copies of validate → timeout
    → execute → serialize
  - `process_tool_result()` replaces 3 copies of sanitize → wrap →
    ChatMessage (also used by thread_ops.rs approval resume paths)

Net result: -2,408 lines, zero duplicated loop logic, single code path
for tool intent nudge and completion detection.
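
The shape of the shared engine can be sketched as follows. This is a heavily reduced, synchronous illustration: the real `LoopDelegate` is described as `Send + Sync` with 6 hook points, while this sketch has two invented hooks (`call_llm`, `execute_tool`), and `LlmReply` and `ScriptedDelegate` are hypothetical types added for the example.

```rust
enum LlmReply {
    Text(String),
    ToolCalls(Vec<String>),
}

#[derive(Debug, PartialEq)]
enum LoopOutcome {
    Response(String),
    MaxIterations,
}

/// Consumers customise the loop by implementing the hooks.
trait LoopDelegate {
    fn call_llm(&mut self, context: &[String]) -> LlmReply;
    fn execute_tool(&mut self, name: &str) -> String;
}

/// Core cycle: LLM call, tool execution, repeat until text or the limit.
fn run_agentic_loop(
    delegate: &mut dyn LoopDelegate,
    context: &mut Vec<String>,
    max_iterations: usize,
) -> LoopOutcome {
    for _ in 0..max_iterations {
        match delegate.call_llm(context) {
            LlmReply::Text(text) => return LoopOutcome::Response(text),
            LlmReply::ToolCalls(calls) => {
                for name in calls {
                    let output = delegate.execute_tool(&name);
                    context.push(output);
                }
            }
        }
    }
    LoopOutcome::MaxIterations
}

/// Test double that replays a fixed list of replies.
struct ScriptedDelegate {
    replies: Vec<LlmReply>,
}

impl LoopDelegate for ScriptedDelegate {
    fn call_llm(&mut self, _context: &[String]) -> LlmReply {
        if self.replies.is_empty() {
            LlmReply::Text(String::new())
        } else {
            self.replies.remove(0)
        }
    }

    fn execute_tool(&mut self, name: &str) -> String {
        format!("result of {name}")
    }
}
```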

Closes #654

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address review feedback from Copilot

1. scheduler.rs: Replace `unwrap_or` fallback with proper error
   propagation when parsing tool output JSON — surfaces bugs instead
   of silently changing the output type.

2. worker/job.rs: Drop MutexGuard before the cancellation `.await` in
   `check_signals()` to avoid holding a lock across an async I/O call
   (prevents `await_holding_lock` lint).

3. worker/job.rs: Restore consecutive rate-limit counter
   (MAX_CONSECUTIVE_RATE_LIMITS = 10) so sustained rate limiting marks
   the job stuck with "Persistent rate limiting" instead of silently
   burning through max_iterations.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: incorporate staging changes — token budget tracking + mark_failed

Merge staging's changes into the refactored JobDelegate:
- Add token budget tracking in call_llm (update_context/add_tokens)
- mark_stuck → mark_failed for iteration cap and rate-limit exhaustion
  (aligns with staging's #788 fix)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address zmanian's PR review — eliminate type erasure, clean up

Address all 6 review points from zmanian on PR #800:

1. Replace LoopOutcome::Custom(Box<dyn Any>) with typed
   LoopOutcome::NeedApproval(Box<PendingApproval>) — eliminates
   type erasure and downcast, resolves clippy large_enum_variant.

2. Remove dead max_tool_iterations field from ChatDelegate struct.

3. Add on_tool_intent_nudge() hook to LoopDelegate trait with
   implementations in Job and Container delegates for observability.

4. Fix SSE events in job worker to emit raw sanitized content
   instead of XML-wrapped <tool_output> tags.

5. Remove 4 duplicate completion tests from job.rs that were
   already covered by the shared util module.

6. Avoid logging full tool results — use result_size_bytes in
   debug logs (execute.rs, job.rs).
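
Point 1 can be illustrated with a small sketch. The variant names follow the commit message, but the `PendingApproval` fields and the `describe` helper are invented for the example.

```rust
/// Hypothetical payload; the real struct carries more fields.
#[derive(Debug, PartialEq)]
struct PendingApproval {
    tool_name: String,
}

#[derive(Debug, PartialEq)]
enum LoopOutcome {
    Response(String),
    MaxIterations,
    // Typed variant: callers match on it directly, with no `dyn Any` downcast.
    // Boxing keeps the enum small (cf. clippy's large_enum_variant lint).
    NeedApproval(Box<PendingApproval>),
}

fn describe(outcome: &LoopOutcome) -> String {
    match outcome {
        LoopOutcome::Response(text) => format!("response: {text}"),
        LoopOutcome::MaxIterations => "max iterations reached".to_string(),
        LoopOutcome::NeedApproval(pending) => {
            format!("awaiting approval for {}", pending.tool_name)
        }
    }
}
```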

Also updates path references in CLAUDE.md, COVERAGE_PLAN.md,
and add-sse-event.md command.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat(doctor): expand diagnostics from 7 to 16 health checks

* test: add unit tests for agentic_loop and execute shared modules

Add 16 tests covering the two new critical shared modules:

agentic_loop.rs (10 tests):
- Text response exits loop immediately
- Tool call → text response continuation
- LoopSignal::Stop exits before LLM call
- LoopSignal::InjectMessage adds user message to context
- Max iterations terminates with LoopOutcome::MaxIterations
- Tool intent nudge fires twice then caps
- before_llm_call early exit bypasses LLM
- truncate_for_preview: short string, long string, multibyte safety

execute.rs (6 tests):
- execute_tool_with_safety success path
- Missing tool returns ToolError::NotFound
- Tool execution failure propagates
- Per-tool timeout enforcement (50ms)
- process_tool_result XML wrapping on success
- process_tool_result error formatting

All 2,777 unit tests pass, 0 clippy warnings.
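
As an illustration of what the multibyte-safety test guards against, here is one possible way to truncate on a character boundary. The signature and the `...` suffix are assumptions; the actual `truncate_for_preview` may count bytes or use a different marker.

```rust
/// Keep at most `max_chars` characters, never splitting a UTF-8 sequence.
fn truncate_for_preview(s: &str, max_chars: usize) -> String {
    // `char_indices` yields byte offsets that are always valid slice points.
    match s.char_indices().nth(max_chars) {
        Some((byte_idx, _)) => format!("{}...", &s[..byte_idx]),
        None => s.to_string(),
    }
}
```

Slicing by a raw byte count instead (for example `&s[..2]` on `"héllo"`) would panic, because byte 2 falls inside the two-byte `é`.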

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* style: cargo fmt

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address code review — 9 issues across agentic loop, job worker, container

CRITICAL fixes:
- Rate-limit exhaustion now returns Err(LlmError::RateLimited) instead of
  Ok(Text("")), stopping the loop immediately with no ghost iteration.
  Below-threshold retries still use Text("") with an explicit empty-string
  guard in handle_text_response to skip injection.
- check_signals drains the entire message channel before returning,
  prioritizing Stop over UserMessage. Previously returned early on first
  UserMessage, silently dropping any queued Stop or additional messages.
- check_signals now detects all non-progressing job states (Cancelled,
  Failed, Stuck, Completed, Submitted, Accepted) instead of only
  Cancelled and Failed.
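
The drain-then-prioritise behaviour can be sketched with a std channel. `WorkerMessage`, the `LoopSignal` variants used here, and the choice to join multiple user messages with newlines are assumptions for the example; the commit only states that the channel is fully drained and that Stop wins over UserMessage.

```rust
use std::sync::mpsc::Receiver;

#[derive(Debug, PartialEq)]
enum WorkerMessage {
    Stop,
    UserMessage(String),
}

#[derive(Debug, PartialEq)]
enum LoopSignal {
    Continue,
    Stop,
    InjectMessage(String),
}

/// Drain everything that is queued, then decide once.
fn check_signals(rx: &Receiver<WorkerMessage>) -> LoopSignal {
    let mut stop_requested = false;
    let mut user_messages = Vec::new();
    // Do not return on the first message: a Stop may be queued behind it.
    while let Ok(msg) = rx.try_recv() {
        match msg {
            WorkerMessage::Stop => stop_requested = true,
            WorkerMessage::UserMessage(text) => user_messages.push(text),
        }
    }
    if stop_requested {
        LoopSignal::Stop
    } else if user_messages.is_empty() {
        LoopSignal::Continue
    } else {
        LoopSignal::InjectMessage(user_messages.join("\n"))
    }
}
```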

HIGH fixes:
- Error path in process_tool_result_job applies truncate_for_preview to
  bound error strings in SSE/DB events (was unbounded).
- Document Send+Sync lifetime constraint on LoopDelegate trait.
- Test mock before_llm_call refactored from double-lock to single lock
  acquisition, eliminating deadlock risk on refactor.

MEDIUM fixes:
- CompletionReport includes actual iteration count via shared
  Arc<Mutex<u32>> tracker (was hardcoded 0).
- process_tool_result_job return type changed from Result<bool> to
  Result<()> — the bool was always false (dead API).
- Deduplicate truncate in container.rs; now uses truncate_for_preview
  from agentic_loop.

Verified: 0 clippy warnings, 2781 tests pass, cargo fmt clean.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Henry Park <henrypark133@gmail.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Illia Polosukhin <ilblackdragon@gmail.com>
Co-authored-by: Umesh Kumar Singh <brijbiharisingh1971@outlook.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>

* Revert "Feat/docker shell edition" + fix fmt/clippy (#886)

* Revert "Feat/docker shell edition (#804)"

This reverts commit 1fc2b85fa70d8421a9395e69d491d0e8858046b8.

* style: fix formatting issues from revert

Run cargo fmt to fix formatting across 7 files after the revert of
the docker shell edition feature.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* refactor: centralize …
drchirag1991 pushed a commit to drchirag1991/ironclaw that referenced this pull request Apr 8, 2026

* chore: promote staging to main (2026-03-10 15:19 UTC) (#865)

* fix: Channel HTTP: server doesn't start after config change (no hot-r… (#779)

* fix: Channel HTTP: server doesn't start after config change (no hot-reload)

* review fixes

* review fixes

* fix linter

* fix code style

* fix: prevent session lock contention blocking message processing (#783)

* fix: prevent session lock contention blocking message processing

## Problem
After container restart, POST /api/chat/send returns 202 ACCEPTED but messages
don't appear in conversation_messages and agent never responds. Messages get
stuck in "stale state" after restart.

Root cause: Session lock was held for entire duration of chat_threads_handler
and chat_history_handler, including during slow database queries. This blocked
the agent loop from acquiring the session lock to process incoming messages,
causing them to hang indefinitely.

## Solution
1. **Release session lock early in chat_threads_handler**: Only acquire lock
   when reading active_thread at response time, not during DB queries for
   thread list. DB operations no longer block message processing.

2. **Release session lock early in chat_history_handler**: Only acquire lock
   when accessing in-memory thread state, not during paginated DB queries or
   thread ownership checks. DB operations no longer block message processing.

3. **Add comprehensive logging**: Track message flow from receipt through
   session resolution, thread hydration, and state transitions. Helps diagnose
   future issues:
   - Message queued to agent loop (chat_send_handler)
   - Processing message from channel (handle_message)
   - Hydrating thread from DB (maybe_hydrate_thread)
   - Resolving session and thread (resolve_thread)
   - Checking thread state (process_user_input)
   - Persisting user message (persist_user_message)
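
The lock-timing change in points 1 and 2 can be sketched as follows. This is a synchronous simplification using `std::sync::Mutex`; the `Session` struct, the `list_threads` function, and the `query_db` closure are hypothetical stand-ins for the real handlers and database calls.

```rust
use std::sync::Mutex;

/// Reduced stand-in for the in-memory session state.
struct Session {
    active_thread: Option<u64>,
}

/// Run the slow query with no lock held, then lock briefly for in-memory state.
fn list_threads(
    session: &Mutex<Session>,
    query_db: impl Fn() -> Vec<u64>,
) -> (Vec<u64>, Option<u64>) {
    // Slow part: the session lock is NOT held here, so other work can proceed.
    let threads = query_db();
    // Short critical section: copy out what is needed and release at once.
    let active = {
        let guard = session.lock().unwrap_or_else(|e| e.into_inner());
        guard.active_thread
    };
    (threads, active)
}
```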

## Impact
- Message processing no longer blocks on session lock contention
- API response times for thread list/history queries unaffected (DB queries
  still happen, but lock is not held)
- Better diagnostics for future debugging

## Testing
- All 2756 tests pass
- Code compiles with zero clippy warnings
- No changes to user-facing API or behavior, only lock timing

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

* security: redact PII from info-level logs

Downgrade user_id and channel logging to debug level to prevent exposing
Personally Identifiable Information (PII) in production logs.

The user_id field can contain sensitive information such as phone numbers
(e.g., for Signal messages). Logging PII in cleartext at the info level
creates a security and privacy risk, as these logs may be stored in
persistent storage, indexed by log management systems, or accessible to
unauthorized personnel.

Changes:
- Info level: logs only message_id (UUID) for tracking
- Debug level: logs user_id, channel, thread_id for troubleshooting

This maintains debugging capability for developers while protecting user
privacy in production logs.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Haiku 4.5 <noreply@anthropic.com>

* chore: sync main into staging (#855)

* fix(ci): secrets can't be used in step if conditions [skip-regression-check] (#787)

GitHub Actions step-level `if:` doesn't have access to `secrets` context.
Replace `if: secrets.X != ''` with `continue-on-error: true` and let
the Set token step handle the fallback.

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(ci): clean up staging pipeline — remove hacks, skip redundant checks [skip-regression-check] (#794)

- Remove continue-on-error from staging-ci.yml app token steps (secrets are configured)
- Skip test.yml and code_style.yml on PRs targeting staging (staging-ci.yml
  already runs tests before promoting, promotion PR gets full CI on main)
- Allow ironclaw-ci[bot] in Claude Code review for bot-created promotion PRs

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* fix(ci): run fmt + clippy on staging PRs, skip Windows clippy [skip-regression-check] (#802)

- Remove branches:[main] filter from code_style.yml so it runs on all PRs
- Gate clippy-windows with `if: github.base_ref == 'main'` (skip on staging PRs)
- Update rollup job to allow skipped clippy-windows
- Simplify claude-review.yml to only trigger on labeled event (avoids duplicate runs)

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* feat: persist user_id in save_job and expose job_id on routine runs (#709)

* feat: persist worker events to DB and fix activity tab rendering

In-process Worker (used by Scheduler::dispatch_job) now persists events
via save_job_event at key execution points: plan creation, LLM
responses, tool_use, tool_result, and job completion/failure/stuck.
Event data shapes match the container worker format so the gateway
activity tab renders them correctly.

Frontend: tool_result errors now show a red X icon with danger styling
instead of a silent empty output. The result event falls back to the
error field when message is absent.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: wire RoutineEngine into gateway for direct manual trigger firing

Replace the message-channel hack in routines_trigger_handler with a
direct call to RoutineEngine::fire_manual(), ensuring FullJob routines
dispatch correctly when triggered from the web UI. Inject the engine
into GatewayState from Agent::run after construction.

Also persists user_id in save_job for both PG and libSQL backends,
removes the source='sandbox' filter so all jobs are visible, and
exposes job_id on RoutineRunInfo for the frontend job link.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: remove stale gateway_state argument from Agent::new test call sites

The gateway_state parameter was removed from Agent::new during rebase
(replaced by post-construction set_routine_engine_slot), but three test
call sites still passed the extra None argument.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address PR review — restore sandbox source filter, remove blank lines

- Revert removal of `source = 'sandbox'` filter in all SandboxStore
  queries (8 sites across PG and libSQL). Sandbox-specific APIs should
  stay scoped to sandbox jobs; unified job listing for the Jobs tab
  should use a separate query path.
- Remove extra blank lines in agent_loop.rs and worker.rs that caused
  formatting CI failure.

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address review — regenerate Cargo.lock, add user_id regression test

- Regenerate Cargo.lock from main's lockfile to eliminate dependency
  version downgrades (anyhow, syn, etc.) that were churn from rebase.
- Add regression test verifying user_id round-trips through save_job
  and get_job in the libSQL backend.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* style: remove trailing blank line in libsql jobs.rs

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* test: add Postgres-side regression test for user_id persistence in save_job

Mirrors the existing libSQL test (test_save_job_persists_user_id) for the
Postgres backend. Gated behind #[cfg(feature = "postgres")] + #[ignore]
since it requires a running PostgreSQL instance (integration tier).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* feat(llm): per-provider unsupported parameter filtering (#749, #728) (#809)

Add declarative `unsupported_params` field to provider definitions in
providers.json. Parameters listed are stripped from requests before
sending, preventing 400 errors from providers that reject them (e.g.
gpt-5 family and kimi-k2.5 rejecting custom temperature values).

- Add `unsupported_params` to ProviderDefinition and RegistryProviderConfig
- Propagate from registry through config resolution
- Generic strip helpers handle temperature, max_tokens, stop_sequences
- Apply filtering in RigAdapter and AnthropicOAuthProvider
- Mark openai and tinfoil providers as not supporting temperature
- Update openai default model to gpt-5-mini
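
As a rough illustration of the stripping step described above, here is a minimal sketch. `CompletionParams` and `strip_unsupported` are hypothetical names made up for this example; the real helpers, types, and the providers.json schema may look different.

```rust
/// Hypothetical request shape. The field names mirror the parameters named
/// in the commit message, not the real IronClaw types.
#[derive(Debug, Clone, PartialEq)]
pub struct CompletionParams {
    pub temperature: Option<f32>,
    pub max_tokens: Option<u32>,
    pub stop_sequences: Option<Vec<String>>,
}

/// Clear every parameter that the provider lists as unsupported.
/// In this sketch, names that are not recognised are silently ignored.
pub fn strip_unsupported(params: &mut CompletionParams, unsupported: &[&str]) {
    for name in unsupported {
        match *name {
            "temperature" => params.temperature = None,
            "max_tokens" => params.max_tokens = None,
            "stop_sequences" => params.stop_sequences = None,
            _ => {}
        }
    }
}
```

Calling `strip_unsupported(&mut params, &["temperature"])` before sending would drop a custom temperature while leaving `max_tokens` untouched.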

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Illia Polosukhin <ilblackdragon@gmail.com>

* fix: Chat input is hidden in mobile browser mode (#877)

* fix: stop XML-escaping tool output content (#598) (#874)

* fix(ci): secrets can't be used in step if conditions [skip-regression-check] (#787)

GitHub Actions step-level `if:` doesn't have access to `secrets` context.
Replace `if: secrets.X != ''` with `continue-on-error: true` and let
the Set token step handle the fallback.

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(ci): clean up staging pipeline — remove hacks, skip redundant checks [skip-regression-check] (#794)

- Remove continue-on-error from staging-ci.yml app token steps (secrets are configured)
- Skip test.yml and code_style.yml on PRs targeting staging (staging-ci.yml
  already runs tests before promoting, promotion PR gets full CI on main)
- Allow ironclaw-ci[bot] in Claude Code review for bot-created promotion PRs

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* fix(ci): run fmt + clippy on staging PRs, skip Windows clippy [skip-regression-check] (#802)

- Remove branches:[main] filter from code_style.yml so it runs on all PRs
- Gate clippy-windows with `if: github.base_ref == 'main'` (skip on staging PRs)
- Update rollup job to allow skipped clippy-windows
- Simplify claude-review.yml to only trigger on labeled event (avoids duplicate runs)

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* feat: persist user_id in save_job and expose job_id on routine runs (#709)

* feat: persist worker events to DB and fix activity tab rendering

In-process Worker (used by Scheduler::dispatch_job) now persists events
via save_job_event at key execution points: plan creation, LLM
responses, tool_use, tool_result, and job completion/failure/stuck.
Event data shapes match the container worker format so the gateway
activity tab renders them correctly.

Frontend: tool_result errors now show a red X icon with danger styling
instead of a silent empty output. The result event falls back to the
error field when message is absent.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: wire RoutineEngine into gateway for direct manual trigger firing

Replace the message-channel hack in routines_trigger_handler with a
direct call to RoutineEngine::fire_manual(), ensuring FullJob routines
dispatch correctly when triggered from the web UI. Inject the engine
into GatewayState from Agent::run after construction.

Also persists user_id in save_job for both PG and libSQL backends,
removes the source='sandbox' filter so all jobs are visible, and
exposes job_id on RoutineRunInfo for the frontend job link.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: remove stale gateway_state argument from Agent::new test call sites

The gateway_state parameter was removed from Agent::new during rebase
(replaced by post-construction set_routine_engine_slot), but three test
call sites still passed the extra None argument.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address PR review — restore sandbox source filter, remove blank lines

- Revert removal of `source = 'sandbox'` filter in all SandboxStore
  queries (8 sites across PG and libSQL). Sandbox-specific APIs should
  stay scoped to sandbox jobs; unified job listing for the Jobs tab
  should use a separate query path.
- Remove extra blank lines in agent_loop.rs and worker.rs that caused
  formatting CI failure.

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address review — regenerate Cargo.lock, add user_id regression test

- Regenerate Cargo.lock from main's lockfile to eliminate dependency
  version downgrades (anyhow, syn, etc.) that were churn from rebase.
- Add regression test verifying user_id round-trips through save_job
  and get_job in the libSQL backend.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* style: remove trailing blank line in libsql jobs.rs

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* test: add Postgres-side regression test for user_id persistence in save_job

Mirrors the existing libSQL test (test_save_job_persists_user_id) for the
Postgres backend. Gated behind #[cfg(feature = "postgres")] + #[ignore]
since it requires a running PostgreSQL instance (integration tier).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* feat(llm): per-provider unsupported parameter filtering (#749, #728) (#809)

Add declarative `unsupported_params` field to provider definitions in
providers.json. Parameters listed are stripped from requests before
sending, preventing 400 errors from providers that reject them (e.g.
gpt-5 family and kimi-k2.5 rejecting custom temperature values).

- Add `unsupported_params` to ProviderDefinition and RegistryProviderConfig
- Propagate from registry through config resolution
- Generic strip helpers handle temperature, max_tokens, stop_sequences
- Apply filtering in RigAdapter and AnthropicOAuthProvider
- Mark openai and tinfoil providers as not supporting temperature
- Update openai default model to gpt-5-mini

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* fix: stop XML-escaping tool output content in wrap_for_llm (#598)

Remove content escaping that corrupted JSON in tool output. The
<tool_output> structural boundary is preserved but content now passes
through raw, fixing downstream parse failures.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Henry Park <henrypark133@gmail.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(safety): allow empty string tool params (#848)

* fix(safety): allow empty string tool params

* fix(safety): preserve heuristic checks and add path context to tool validation

This follow-up refactor addresses PR review feedback by restoring
heuristic checks (whitespace ratio, character repetition) for tool
parameter validation and improving error reporting.

Changes:
- Restored heuristic warnings in validate_non_empty_input so they apply
  to both user input and tool parameters (when non-empty).
- Refactored check_strings to recursively build and pass JSON paths
  (e.g., "metadata.tags[1]").
- Updated validation errors to use the specific JSON path as the field
  name instead of the generic "input".
- Added regression tests for whitespace/repetition warnings and JSON
  path reporting in tool parameters.
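
The recursive path building mentioned above could look roughly like the following sketch. The `Json` enum is a hypothetical stand-in for whatever JSON type the real `check_strings` walks, and `string_paths` is an illustrative name; only the path format (`metadata.tags[1]`) comes from the commit message.

```rust
use std::collections::BTreeMap;

/// Minimal stand-in for a JSON value (hypothetical, for illustration only).
pub enum Json {
    Null,
    Str(String),
    Array(Vec<Json>),
    Object(BTreeMap<String, Json>),
}

/// Walk the value and record the path of every string leaf,
/// e.g. "metadata.tags[1]". Empty strings are visited like any other.
pub fn string_paths(value: &Json, path: &str, out: &mut Vec<String>) {
    match value {
        Json::Null => {}
        Json::Str(_) => out.push(path.to_string()),
        Json::Array(items) => {
            for (i, item) in items.iter().enumerate() {
                string_paths(item, &format!("{path}[{i}]"), out);
            }
        }
        Json::Object(map) => {
            for (key, item) in map {
                // Top-level keys get no leading dot.
                let child = if path.is_empty() {
                    key.clone()
                } else {
                    format!("{path}.{key}")
                };
                string_paths(item, &child, out);
            }
        }
    }
}
```

A validator built this way can report the offending field by its path instead of the generic "input".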

This ensures the safety layer remains semantically neutral about empty
strings (fixing the memory_tree path: "" issue) while maintaining
rigorous protection and providing better developer ergonomics.

* style: run cargo fmt

* perf: optimize release and dist build profiles (#843)

* perf: optimize release and dist build profiles

Add [profile.release] with strip=true and panic="abort" for smaller,
faster release binaries. Upgrade [profile.dist] from lto="thin" to
lto="fat" with codegen-units=1 for maximum optimization in CI releases.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: remove panic=abort from release profile

Reviewers (zmanian, Copilot, Gemini) correctly flagged that panic=abort
in the release profile would kill the entire process on any tokio task
panic, breaking fault isolation for the long-running server. Removed
from release profile entirely.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* feat: add PR template with risk assessment (#837)

* feat: add PR template with risk assessment and review tracks

Add a pull request template that includes summary, change type,
validation checklist, security/database impact sections, blast radius,
and rollback plan. Update CONTRIBUTING.md with review track definitions
(A/B/C) based on change risk level.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: expand CONTRIBUTING.md with setup, workflow, and guidelines

Add getting started, development workflow, code style summary,
database change guidance, and dependency management sections.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* feat: add fuzzing targets for untrusted input parsers (#835)

* feat: add fuzzing targets for untrusted input parsers

Add cargo-fuzz infrastructure with 5 fuzz targets exercising
security-critical code paths:

- fuzz_safety_sanitizer: Aho-Corasick + regex injection detection
- fuzz_safety_validator: Input validation (length, encoding, patterns)
- fuzz_leak_detector: Secret leak scanning (API keys, tokens)
- fuzz_tool_params: Tool parameter JSON validation
- fuzz_config_env: TOML/JSON config parsing

Each target exercises real IronClaw business logic with invariant
assertions. Includes corpus directories and setup documentation.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: improve fuzz targets to exercise real IronClaw code paths

- fuzz_config_env: exercise SafetyLayer end-to-end (sanitize, validate,
  policy check) instead of generic TOML/JSON parsing
- fuzz_tool_params: add validate_tool_schema coverage alongside
  validate_tool_params
- Add "fuzz" to workspace exclude in root Cargo.toml
- Update README descriptions to match actual target behavior

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: replace redundant detect() call with meaningful invariant assertion

Replace the double sanitize()+detect() call with an assertion that
critical severity warnings always trigger content modification.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: rewrite fuzz_config_env to exercise IronClaw safety code directly

Replace SafetyLayer wrapper usage with direct Sanitizer, Validator, and
LeakDetector instantiation and invocation. Adds meaningful consistency
assertions (non-empty output, valid-means-no-errors, scan/clean agreement).
Removes the config construction that was only exercising struct instantiation.

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* fix(wasm): run leak scan before credential injection in tools wrapper (#791)

* fix(wasm): run leak scan before credential injection in tools wrapper

The tools WASM wrapper runs the LeakDetector on HTTP request headers
AFTER inject_host_credentials() has already substituted real secrets
(e.g., xoxb- Slack bot tokens). This causes the leak detector to
flag the tool's own legitimate outbound API calls as secret exfiltration.

Move the scan to run on raw_headers before any credential injection,
matching the fix already applied to the channels wrapper in #421.

Fixes the same class of bug as #421 (which only fixed channels/wasm/wrapper.rs).
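
The corrected ordering (scan, then inject) can be sketched as below. This is a simplified model, not the wrapper's actual code: `looks_like_secret` is a toy stand-in for the LeakDetector, and `prepare_headers` and the `Authorization` header are assumptions made for the example.

```rust
/// Toy detector: flags any value containing a Slack-style bot token prefix.
/// The real LeakDetector has its own patterns; this is illustration only.
pub fn looks_like_secret(value: &str) -> bool {
    value.contains("xoxb-")
}

/// Scan the headers supplied by the tool, then inject the host credential.
/// Returns Err if the tool itself tried to send something secret-looking.
pub fn prepare_headers(
    raw_headers: &[(String, String)],
    host_token: &str,
) -> Result<Vec<(String, String)>, String> {
    // 1. Scan first: only tool-controlled data is inspected.
    for (name, value) in raw_headers {
        if looks_like_secret(value) {
            return Err(format!("potential secret in header {name}"));
        }
    }
    // 2. Inject afterwards: the host-owned token never reaches the scan,
    //    so it cannot trigger a false positive.
    let mut headers = raw_headers.to_vec();
    headers.push(("Authorization".to_string(), format!("Bearer {host_token}")));
    Ok(headers)
}
```

With the steps swapped, the injected `Bearer xoxb-...` value would be scanned and every legitimate request would be rejected.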

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* perf: inline leak scan to avoid Vec allocation on every HTTP request

Address review feedback: instead of cloning all header keys/values into
a Vec to pass to scan_http_request(), iterate over raw_headers directly
using scan_and_clean(). This also provides more specific error messages
(URL vs header vs body).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* style: fix cargo fmt formatting in leak scan loop

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* fix(setup): drain residual terminal events before secret input (#747) (#849)

* fix(ci): secrets can't be used in step if conditions [skip-regression-check] (#787)

GitHub Actions step-level `if:` doesn't have access to `secrets` context.
Replace `if: secrets.X != ''` with `continue-on-error: true` and let
the Set token step handle the fallback.

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(ci): clean up staging pipeline — remove hacks, skip redundant checks [skip-regression-check] (#794)

- Remove continue-on-error from staging-ci.yml app token steps (secrets are configured)
- Skip test.yml and code_style.yml on PRs targeting staging (staging-ci.yml
  already runs tests before promoting, promotion PR gets full CI on main)
- Allow ironclaw-ci[bot] in Claude Code review for bot-created promotion PRs

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* fix(ci): run fmt + clippy on staging PRs, skip Windows clippy [skip-regression-check] (#802)

- Remove branches:[main] filter from code_style.yml so it runs on all PRs
- Gate clippy-windows with `if: github.base_ref == 'main'` (skip on staging PRs)
- Update rollup job to allow skipped clippy-windows
- Simplify claude-review.yml to only trigger on labeled event (avoids duplicate runs)

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* feat: persist user_id in save_job and expose job_id on routine runs (#709)

* feat: persist worker events to DB and fix activity tab rendering

In-process Worker (used by Scheduler::dispatch_job) now persists events
via save_job_event at key execution points: plan creation, LLM
responses, tool_use, tool_result, and job completion/failure/stuck.
Event data shapes match the container worker format so the gateway
activity tab renders them correctly.

Frontend: tool_result errors now show a red X icon with danger styling
instead of a silent empty output. The result event falls back to the
error field when message is absent.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: wire RoutineEngine into gateway for direct manual trigger firing

Replace the message-channel hack in routines_trigger_handler with a
direct call to RoutineEngine::fire_manual(), ensuring FullJob routines
dispatch correctly when triggered from the web UI. Inject the engine
into GatewayState from Agent::run after construction.

Also persists user_id in save_job for both PG and libSQL backends,
removes the source='sandbox' filter so all jobs are visible, and
exposes job_id on RoutineRunInfo for the frontend job link.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: remove stale gateway_state argument from Agent::new test call sites

The gateway_state parameter was removed from Agent::new during rebase
(replaced by post-construction set_routine_engine_slot), but three test
call sites still passed the extra None argument.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address PR review — restore sandbox source filter, remove blank lines

- Revert removal of `source = 'sandbox'` filter in all SandboxStore
  queries (8 sites across PG and libSQL). Sandbox-specific APIs should
  stay scoped to sandbox jobs; unified job listing for the Jobs tab
  should use a separate query path.
- Remove extra blank lines in agent_loop.rs and worker.rs that caused
  formatting CI failure.

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address review — regenerate Cargo.lock, add user_id regression test

- Regenerate Cargo.lock from main's lockfile to eliminate dependency
  version downgrades (anyhow, syn, etc.) that were churn from rebase.
- Add regression test verifying user_id round-trips through save_job
  and get_job in the libSQL backend.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* style: remove trailing blank line in libsql jobs.rs

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* test: add Postgres-side regression test for user_id persistence in save_job

Mirrors the existing libSQL test (test_save_job_persists_user_id) for the
Postgres backend. Gated behind #[cfg(feature = "postgres")] + #[ignore]
since it requires a running PostgreSQL instance (integration tier).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* feat(llm): per-provider unsupported parameter filtering (#749, #728) (#809)

Add declarative `unsupported_params` field to provider definitions in
providers.json. Parameters listed are stripped from requests before
sending, preventing 400 errors from providers that reject them (e.g.
gpt-5 family and kimi-k2.5 rejecting custom temperature values).

- Add `unsupported_params` to ProviderDefinition and RegistryProviderConfig
- Propagate from registry through config resolution
- Generic strip helpers handle temperature, max_tokens, stop_sequences
- Apply filtering in RigAdapter and AnthropicOAuthProvider
- Mark openai and tinfoil providers as not supporting temperature
- Update openai default model to gpt-5-mini

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* fix: skip the regression check
[skip-regression-check]

---------

Co-authored-by: Henry Park <henrypark133@gmail.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Illia Polosukhin <ilblackdragon@gmail.com>

* feat(agent): add context size logging before LLM prompt (#810)

* fix(ci): secrets can't be used in step if conditions [skip-regression-check] (#787)

GitHub Actions step-level `if:` doesn't have access to `secrets` context.
Replace `if: secrets.X != ''` with `continue-on-error: true` and let
the Set token step handle the fallback.

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(ci): clean up staging pipeline — remove hacks, skip redundant checks [skip-regression-check] (#794)

- Remove continue-on-error from staging-ci.yml app token steps (secrets are configured)
- Skip test.yml and code_style.yml on PRs targeting staging (staging-ci.yml
  already runs tests before promoting, promotion PR gets full CI on main)
- Allow ironclaw-ci[bot] in Claude Code review for bot-created promotion PRs

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* fix(ci): run fmt + clippy on staging PRs, skip Windows clippy [skip-regression-check] (#802)

- Remove branches:[main] filter from code_style.yml so it runs on all PRs
- Gate clippy-windows with `if: github.base_ref == 'main'` (skip on staging PRs)
- Update rollup job to allow skipped clippy-windows
- Simplify claude-review.yml to only trigger on labeled event (avoids duplicate runs)

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* feat: persist user_id in save_job and expose job_id on routine runs (#709)

* feat: persist worker events to DB and fix activity tab rendering

In-process Worker (used by Scheduler::dispatch_job) now persists events
via save_job_event at key execution points: plan creation, LLM
responses, tool_use, tool_result, and job completion/failure/stuck.
Event data shapes match the container worker format so the gateway
activity tab renders them correctly.

Frontend: tool_result errors now show a red X icon with danger styling
instead of a silent empty output. The result event falls back to the
error field when message is absent.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: wire RoutineEngine into gateway for direct manual trigger firing

Replace the message-channel hack in routines_trigger_handler with a
direct call to RoutineEngine::fire_manual(), ensuring FullJob routines
dispatch correctly when triggered from the web UI. Inject the engine
into GatewayState from Agent::run after construction.

Also persists user_id in save_job for both PG and libSQL backends,
removes the source='sandbox' filter so all jobs are visible, and
exposes job_id on RoutineRunInfo for the frontend job link.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: remove stale gateway_state argument from Agent::new test call sites

The gateway_state parameter was removed from Agent::new during rebase
(replaced by post-construction set_routine_engine_slot), but three test
call sites still passed the extra None argument.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address PR review — restore sandbox source filter, remove blank lines

- Revert removal of `source = 'sandbox'` filter in all SandboxStore
  queries (8 sites across PG and libSQL). Sandbox-specific APIs should
  stay scoped to sandbox jobs; unified job listing for the Jobs tab
  should use a separate query path.
- Remove extra blank lines in agent_loop.rs and worker.rs that caused
  formatting CI failure.

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address review — regenerate Cargo.lock, add user_id regression test

- Regenerate Cargo.lock from main's lockfile to eliminate dependency
  version downgrades (anyhow, syn, etc.) that were churn from rebase.
- Add regression test verifying user_id round-trips through save_job
  and get_job in the libSQL backend.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* style: remove trailing blank line in libsql jobs.rs

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* test: add Postgres-side regression test for user_id persistence in save_job

Mirrors the existing libSQL test (test_save_job_persists_user_id) for the
Postgres backend. Gated behind #[cfg(feature = "postgres")] + #[ignore]
since it requires a running PostgreSQL instance (integration tier).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* feat(agent): add context size logging before LLM prompt

---------

Co-authored-by: Henry Park <henrypark133@gmail.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Illia Polosukhin <ilblackdragon@gmail.com>

* fix: preserve text before tool-call XML in forced-text responses (#852)

* fix: preserve text before tool-call XML in forced-text responses (#789)

Local models (Qwen3, DeepSeek, GLM) emit <tool_call> XML even when no
tools are available (force_text mode). The existing strip_xml_tag()
discards everything from an unclosed opening tag onward, producing an
empty string that triggers the "I'm not sure how to respond" fallback.

Add truncate_at_tool_tags() — a code-region-aware pre-processing step
that truncates at the first tool-call XML tag BEFORE clean_response()
runs, preserving all useful text before the tag. Protect all 7
clean_response() call sites. Case-insensitive matching handles models
that emit <TOOL_CALL> or <Tool_Call> variants.

Secondary fix: add has_native_thinking() model detection to skip
<think>/<final> system prompt injection for models with built-in
reasoning (Qwen3, QwQ, DeepSeek-R1, GLM-Z1, etc.), preventing
thinking-only responses that clean to empty.

Wire with_model_name(active_model_name()) at all 9 production sites
that construct Reasoning, so the runtime model name (not static config)
drives system prompt generation.

126 new/updated tests covering truncation edge cases, code-block
awareness, Unicode, case-insensitivity, StubLlm integration for
complete/plan/evaluate_success/respond_with_tools paths, model
detection, and conditional system prompt generation.

Closes #789

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address Copilot review — unclosed-only truncation, ASCII case folding

- truncate_at_tool_tags() now only truncates at UNCLOSED tool tags;
  properly closed tags (e.g. <tool_call>...</tool_call>) are left intact
  for clean_response() to strip normally, preserving any text after them
- Switch from to_lowercase() to to_ascii_lowercase() to prevent byte
  offset misalignment with non-ASCII characters whose lowercase form
  has different byte length (e.g. Kelvin sign U+212A)
- Add closing_tag_for() helper to derive closing tags from open patterns
- Fix doc comment: "fenced markdown code blocks or inline code spans"
  (not "indented", which find_code_regions() doesn't detect)
- Add regression tests: closed vs unclosed for each tag variant,
  Unicode + case-insensitive offset safety, and mixed closed/unclosed
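
To make the offset argument above concrete, here is a much-reduced sketch of unclosed-only truncation. It is an assumption-laden simplification: it handles a single hard-coded tag, checks only the first occurrence, and has no code-region awareness, unlike the real `truncate_at_tool_tags()`.

```rust
/// Truncate `text` at the first `<tool_call>` tag if no closing tag follows,
/// matching the tag without regard to ASCII case.
pub fn truncate_at_unclosed_tool_call(text: &str) -> &str {
    // ASCII-only folding never changes byte lengths, so an offset found in
    // `lower` is also a valid offset into `text`. Full Unicode lowercasing
    // can shrink characters (the Kelvin sign U+212A becomes a one-byte 'k'),
    // which would shift every later offset.
    let lower = text.to_ascii_lowercase();
    match lower.find("<tool_call>") {
        Some(start) if !lower[start..].contains("</tool_call>") => &text[..start],
        _ => text,
    }
}
```

Text before an unclosed tag survives, while a properly closed tag is left in place for later cleanup.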

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: minor review items — consistent ascii_lowercase, closing_tag_for tests

- Switch has_native_thinking() from to_lowercase() to to_ascii_lowercase()
  for consistency with truncate_at_tool_tags() approach
- Add unit tests for closing_tag_for(): standard tags, space-suffixed
  patterns, pipe-delimited tags, and exhaustive coverage of all
  TOOL_TAG_PATTERNS entries
- Add test for mixed closed+unclosed tags of different types

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* Feat/docker shell edition (#804)

* fix(ci): secrets can't be used in step if conditions [skip-regression-check] (#787)

GitHub Actions step-level `if:` doesn't have access to `secrets` context.
Replace `if: secrets.X != ''` with `continue-on-error: true` and let
the Set token step handle the fallback.

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(ci): clean up staging pipeline — remove hacks, skip redundant checks [skip-regression-check] (#794)

- Remove continue-on-error from staging-ci.yml app token steps (secrets are configured)
- Skip test.yml and code_style.yml on PRs targeting staging (staging-ci.yml
  already runs tests before promoting, promotion PR gets full CI on main)
- Allow ironclaw-ci[bot] in Claude Code review for bot-created promotion PRs

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Henry Park <henrypark133@gmail.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(mcp): strip top-level null params before forwarding to MCP servers (#795)

* feat(llm): per-provider unsupported parameter filtering (#749, #728) (#809)

Add declarative `unsupported_params` field to provider definitions in
providers.json. Parameters listed are stripped from requests before
sending, preventing 400 errors from providers that reject them (e.g.
gpt-5 family and kimi-k2.5 rejecting custom temperature values).

- Add `unsupported_params` to ProviderDefinition and RegistryProviderConfig
- Propagate from registry through config resolution
- Generic strip helpers handle temperature, max_tokens, stop_sequences
- Apply filtering in RigAdapter and AnthropicOAuthProvider
- Mark openai and tinfoil providers as not supporting temperature
- Update openai default model to gpt-5-mini

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* fix(mcp): strip top-level null params before forwarding to MCP servers

LLMs frequently emit `"field": null` for optional parameters in tool
calls. Many MCP servers reject explicit nulls for fields that should
simply be absent — e.g. Notion returns 400 for `"sort": null` in a
search call, expecting the field to be omitted entirely.

Strip top-level null keys from the params object before calling
`call_tool()`. Only top-level keys are stripped; nested nulls are
preserved since they may be semantically meaningful.
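
A minimal sketch of the top-level-only stripping, assuming a small hypothetical `Json` enum in place of the real JSON type and an illustrative function name:

```rust
use std::collections::BTreeMap;

/// Minimal stand-in for a JSON value (hypothetical, for illustration only).
#[derive(Debug, Clone, PartialEq)]
pub enum Json {
    Null,
    Str(String),
    Object(BTreeMap<String, Json>),
}

/// Remove keys whose value is null from the top-level params object.
/// Nested objects are not visited, so nulls inside them are preserved.
pub fn strip_top_level_nulls(params: &mut BTreeMap<String, Json>) {
    params.retain(|_, value| !matches!(value, Json::Null));
}
```

For a search call carrying `"sort": null`, the `sort` key is dropped before forwarding, while a null nested inside `filter` stays as the model emitted it.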

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Illia Polosukhin <ilblackdragon@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* Add event-triggered routines and workflow skill templates (#756)

* Add event-triggered routines and workflow skill templates

* fix(ci): secrets can't be used in step if conditions [skip-regression-check] (#787)

GitHub Actions step-level `if:` doesn't have access to `secrets` context.
Replace `if: secrets.X != ''` with `continue-on-error: true` and let
the Set token step handle the fallback.

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(ci): clean up staging pipeline — remove hacks, skip redundant checks [skip-regression-check] (#794)

- Remove continue-on-error from staging-ci.yml app token steps (secrets are configured)
- Skip test.yml and code_style.yml on PRs targeting staging (staging-ci.yml
  already runs tests before promoting, promotion PR gets full CI on main)
- Allow ironclaw-ci[bot] in Claude Code review for bot-created promotion PRs

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address PR review feedback for event_emit security and quality

Security fixes:
- Require approval (UnlessAutoApproved) for event_emit, matching routine_fire
- Enable sanitization on event_emit payload (external JSON reaches LLM)
- Remove user_id parameter from event_emit to prevent IDOR — always use ctx.user_id

Correctness fixes:
- Rename source → event_source in event_emit for consistency with routine_create
- Use json_value_as_filter_string for filter parsing (handles numbers/booleans)
- Case-insensitive matching for event source and event_type
- Add debug logging for missing filter keys in payload
- Fix skill_install_routine_webhook_sim test missing .with_skills()
- Fix schema_validator test for event_emit payload properties
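
The filter-parsing and case-insensitive matching fixes could look roughly like the sketch below. The `Json` enum and `source_matches` are hypothetical stand-ins, and the body of `json_value_as_filter_string` is a guess at its behavior based only on the description above:

```rust
/// Hypothetical stand-in for a JSON scalar (assumption, not the real type).
#[derive(Debug, Clone, PartialEq)]
enum Json {
    Null,
    Bool(bool),
    Number(i64),
    String(String),
}

/// Render a scalar payload value as a string so it can be compared with a filter.
/// Numbers and booleans are stringified instead of being ignored.
fn json_value_as_filter_string(value: &Json) -> Option<String> {
    match value {
        Json::String(s) => Some(s.clone()),
        Json::Number(n) => Some(n.to_string()),
        Json::Bool(b) => Some(b.to_string()),
        Json::Null => None,
    }
}

/// Case-insensitive comparison for event source and event_type (illustrative name).
fn source_matches(expected: &str, actual: &str) -> bool {
    expected.eq_ignore_ascii_case(actual)
}
```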

Code quality:
- Move EventEmitTool struct/impl after RoutineHistoryTool (fix split layout)
- Deduplicate routine_to_info into RoutineInfo::from_routine in types.rs
- Add test section headers in e2e_routine_heartbeat.rs
- Clarify event_emit description to specify system_event routines only

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(ci): run fmt + clippy on staging PRs, skip Windows clippy [skip-regression-check] (#802)

- Remove branches:[main] filter from code_style.yml so it runs on all PRs
- Gate clippy-windows with `if: github.base_ref == 'main'` (skip on staging PRs)
- Update rollup job to allow skipped clippy-windows
- Simplify claude-review.yml to only trigger on labeled event (avoids duplicate runs)

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* feat: persist user_id in save_job and expose job_id on routine runs (#709)

* feat: persist worker events to DB and fix activity tab rendering

In-process Worker (used by Scheduler::dispatch_job) now persists events
via save_job_event at key execution points: plan creation, LLM
responses, tool_use, tool_result, and job completion/failure/stuck.
Event data shapes match the container worker format so the gateway
activity tab renders them correctly.

Frontend: tool_result errors now show a red X icon with danger styling
instead of a silent empty output. The result event falls back to the
error field when message is absent.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: wire RoutineEngine into gateway for direct manual trigger firing

Replace the message-channel hack in routines_trigger_handler with a
direct call to RoutineEngine::fire_manual(), ensuring FullJob routines
dispatch correctly when triggered from the web UI. Inject the engine
into GatewayState from Agent::run after construction.

Also persists user_id in save_job for both PG and libSQL backends,
removes the source='sandbox' filter so all jobs are visible, and
exposes job_id on RoutineRunInfo for the frontend job link.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: remove stale gateway_state argument from Agent::new test call sites

The gateway_state parameter was removed from Agent::new during rebase
(replaced by post-construction set_routine_engine_slot), but three test
call sites still passed the extra None argument.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address PR review — restore sandbox source filter, remove blank lines

- Revert removal of `source = 'sandbox'` filter in all SandboxStore
  queries (8 sites across PG and libSQL). Sandbox-specific APIs should
  stay scoped to sandbox jobs; unified job listing for the Jobs tab
  should use a separate query path.
- Remove extra blank lines in agent_loop.rs and worker.rs that caused
  formatting CI failure.

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address review — regenerate Cargo.lock, add user_id regression test

- Regenerate Cargo.lock from main's lockfile to eliminate dependency
  version downgrades (anyhow, syn, etc.) that were churn from rebase.
- Add regression test verifying user_id round-trips through save_job
  and get_job in the libSQL backend.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* style: remove trailing blank line in libsql jobs.rs

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* test: add Postgres-side regression test for user_id persistence in save_job

Mirrors the existing libSQL test (test_save_job_persists_user_id) for the
Postgres backend. Gated behind #[cfg(feature = "postgres")] + #[ignore]
since it requires a running PostgreSQL instance (integration tier).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* fix: make routine_system_event_emit test create routine before emitting

- Add routine_create step to trace fixture so event_emit has a matching
  routine to fire
- Assert fired_routines > 0, not just key presence (Copilot review)
- Add .with_auto_approve_tools(true) since event_emit now requires approval

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: renumber test headers after system_event test insertion

Test 4 was duplicated (routine_cooldown and heartbeat_findings).
Renumber heartbeat_findings to Test 5 and heartbeat_empty_skip to Test 6.

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: merge staging and add missing RoutineEngine args in test

RoutineEngine::new on staging requires `tools` and `safety` params.
Update system_event_trigger_matches_and_filters test to pass them.

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address new Copilot review comments

- Add .with_auto_approve_tools(true) to skill_install_routine_webhook_sim
  test so event_emit doesn't block on approval
- Fix module-level doc comment for event_emit to specify system_event trigger

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: deduplicate json_value_as_string helper

Remove private `json_value_as_string` from routine_engine.rs and use
the identical public `json_value_as_filter_string` from routine.rs,
eliminating divergence risk. (Copilot review)

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Henry Park <henrypark133@gmail.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: enable WASM credential injection in No-DB environments (#845)

* fix(wasm): enable credential injection in no-DB environments via env var fallback

When a secrets store is unavailable (e.g. no-DB mode), WASM channel
credentials were silently not injected, causing channels to start without
credentials. Fix by:

- Changing `inject_channel_credentials_from_secrets` to accept
  `Option<&dyn SecretsStore>` — secrets store is tried first when present
- Adding env var fallback (`inject_env_credentials`) for credentials not
  covered by the secrets store
- Enforcing a channel-name prefix security check on env var names to
  prevent WASM channels from reading unrelated host credentials
  (e.g. `AWS_SECRET_ACCESS_KEY`)
- Extracting pure `resolve_env_credentials` helper for testability
- Adding case-insensitive prefix matching for secrets store lookup

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(wasm): inject credentials at startup when no secrets store (setup.rs path)

The startup path (setup_wasm_channels -> register_channel) was guarded by
`if let Some(secrets) = secrets_store`, so in No-DB mode credentials were
never injected and the channel started without them.

Fix by:
- Changing inject_channel_credentials to accept Option<&dyn SecretsStore>
- Always calling it (removing the if-let guard) — env var fallback runs
  even when secrets_store is None
- Adding channel-name prefix security check to the env var fallback path
  (e.g. TELEGRAM_ for channel "telegram"), consistent with manager.rs

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(test): correct misleading comment on ICTEST1_UNRELATED_OTHER placeholder

* fix(wasm): guard against empty channel name in credential injection

An empty channel_name would produce prefix "_", allowing any env var
starting with "_" to pass the security check and be injected. Add an
early-return guard in resolve_env_credentials, inject_env_credentials,
and inject_channel_credentials. Add a test to cover this path.
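
A sketch of the prefix check and the empty-name guard, assuming a pure helper that receives the environment as an iterator. The signature is an assumption for illustration; the real `resolve_env_credentials` may take different inputs:

```rust
use std::collections::HashMap;

/// Select only the env vars a channel is allowed to receive (hypothetical signature).
fn resolve_env_credentials<I>(channel_name: &str, env: I) -> HashMap<String, String>
where
    I: IntoIterator<Item = (String, String)>,
{
    // Guard: an empty name would produce the prefix "_" and match unrelated variables.
    if channel_name.is_empty() {
        return HashMap::new();
    }
    // Channel "telegram" may only read variables starting with "TELEGRAM_".
    let prefix = format!("{}_", channel_name.to_ascii_uppercase());
    env.into_iter()
        .filter(|(name, _)| name.starts_with(&prefix))
        .collect()
}
```

Taking the environment as a parameter rather than reading `std::env::vars()` directly is what makes the helper testable without mutating process state.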

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: lizican123 <lizican123@gmail.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: promote to main (#878)

* fix: replace unsafe env::set_var with thread-safe inject_single_var in SIGHUP handler

Fixes race condition where SIGHUP handler modifies global environment variables
while other threads may be reading them via Config::from_env().

Changes:
- Replace unsafe { std::env::set_var() } with ironclaw::config::inject_single_var()
- Uses INJECTED_VARS mutex instead of unsafe global state modification
- All reads via optional_env() check the thread-safe overlay first
- Prevents data races between SIGHUP reload and concurrent config reads
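
The overlay approach can be sketched as follows. The names `INJECTED_VARS`, `inject_single_var`, and `optional_env` come from the message above, but the bodies and signatures here are assumptions, not the code in `ironclaw::config`:

```rust
use std::collections::HashMap;
use std::sync::{Mutex, OnceLock};

// Process-wide overlay consulted before the real environment (sketch).
static INJECTED_VARS: OnceLock<Mutex<HashMap<String, String>>> = OnceLock::new();

fn overlay() -> &'static Mutex<HashMap<String, String>> {
    INJECTED_VARS.get_or_init(|| Mutex::new(HashMap::new()))
}

/// Record an override without calling the unsafe `std::env::set_var`.
fn inject_single_var(key: &str, value: &str) {
    overlay()
        .lock()
        .expect("overlay lock poisoned")
        .insert(key.to_string(), value.to_string());
}

/// Read a variable, preferring the overlay and falling back to the environment.
fn optional_env(key: &str) -> Option<String> {
    let injected = overlay()
        .lock()
        .expect("overlay lock poisoned")
        .get(key)
        .cloned();
    injected.or_else(|| std::env::var(key).ok())
}
```

Because writes go through the mutex and never touch the process environment, concurrent readers see either the old or the new value, never a torn state.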

Verification:
- All 2,787 tests pass
- Zero clippy warnings
- Code compiles successfully

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

* fix: spawn webhook restart as background task to avoid blocking I/O across lock

Prevents holding Mutex lock during async I/O operations (TcpListener::bind,
task shutdown). The SIGHUP handler no longer blocks webhook processing during
listener restart.

Changes:
- Read old_addr and drop lock immediately
- Spawn restart_with_addr() as background task via tokio::spawn
- Lock is only held during the actual restart operation, not the signal handler

Benefits:
- SIGHUP handler returns immediately without blocking
- Webhook requests not delayed by listener restart I/O
- Lock contention significantly reduced

Verification:
- All 2,787 tests pass
- Zero clippy warnings
- Code compiles successfully

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

* fix: add graceful shutdown mechanism for SIGHUP handler background task

Prevents unbounded loop without cancellation token. The SIGHUP handler now
listens for a shutdown signal and exits cleanly during graceful termination.

Changes:
- Create broadcast channel for shutdown signaling
- SIGHUP handler uses tokio::select! to wait for shutdown or SIGHUP
- Send shutdown signal to all background tasks after agent.run() completes
- Ensures clean task lifecycle and no orphaned background tasks

Benefits:
- Proper task cancellation during graceful shutdown
- Follows Tokio best practices for background task management
- No background tasks orphaned when runtime shuts down

Verification:
- All 2,787 tests pass
- Zero clippy warnings
- Code compiles successfully

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

* refactor: replace stringly-typed parameter filtering with typed enum and single helper

Fixes DRY violation where unsupported parameter filtering was duplicated across
rig_adapter.rs and anthropic_oauth.rs using string contains checks.

Changes:
- Add UnsupportedParam typed enum in provider.rs (Temperature, MaxTokens, StopSequences)
- Create strip_unsupported_completion_params() helper function
- Create strip_unsupported_tool_params() helper function
- Update rig_adapter.rs to use shared helpers
- Update anthropic_oauth.rs to use shared helpers
- Replace 60+ lines of duplicate stringly-typed logic
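
A sketch of the typed enum plus a single helper. The `CompletionRequest` fields below are a simplified assumption; only the enum variants and the helper name are taken from the message:

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum UnsupportedParam {
    Temperature,
    MaxTokens,
    StopSequences,
}

/// Simplified request shape (assumption; the real struct has more fields).
#[derive(Debug, Clone, PartialEq, Default)]
struct CompletionRequest {
    temperature: Option<f32>,
    max_tokens: Option<u32>,
    stop_sequences: Option<Vec<String>>,
}

/// Clear every field the provider declares as unsupported.
fn strip_unsupported_completion_params(
    req: &mut CompletionRequest,
    unsupported: &[UnsupportedParam],
) {
    for param in unsupported {
        // Exhaustive match: adding a variant forces this helper to be updated.
        match param {
            UnsupportedParam::Temperature => req.temperature = None,
            UnsupportedParam::MaxTokens => req.max_tokens = None,
            UnsupportedParam::StopSequences => req.stop_sequences = None,
        }
    }
}
```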

Benefits:
- Type safety: parameter names checked at compile time
- Single source of truth: adding a new param updates one place
- Reduced maintenance burden: no duplicate logic to keep in sync
- Better code clarity: named enum variant is self-documenting

Verification:
- All 2,787 tests pass
- Zero clippy warnings
- Code compiles successfully

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

* docs: clarify intentional parameter asymmetry between completion and tool requests

Add documentation explaining why strip_unsupported_tool_params does not handle
StopSequences: the field doesn't exist in ToolCompletionRequest.

Changes:
- Add clarifying comments to strip_unsupported_tool_params()
- Explain why StopSequences is only in CompletionRequest
- Note that ToolCompletionRequest only supports Temperature and MaxTokens
- Inline comment confirms no action needed for StopSequences

This addresses the appearance of incomplete implementation without changing logic,
as the asymmetry is intentional and correct (ToolCompletionRequest lacks the field).

Verification:
- All 2,787 tests pass
- Zero clippy warnings
- Code compiles successfully

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

* perf: isolate webhook_secret to reduce lock contention on hot path

Move webhook_secret from shared HttpChannelState RwLock into its own Arc<RwLock<>>.
This eliminates contention between secret validation and other state operations.

Changes:
- Change webhook_secret field type from RwLock<Option<SecretString>> to Arc<RwLock<Option<SecretString>>>
- Update initialization in HttpChannel::new()
- Update comments to explain isolation rationale

Benefits:
- Reduce lock contention on webhook request hot path (secret validation)
- Rarely-changing field (SIGHUP only) isolated from frequent state accesses
- Other state operations (tx, pending_responses) no longer wait behind secret reads
- Minimal code change: only field declaration and initialization

The Arc wrapper allows cloning the RwLock handle to separate concerns. With this
change, each webhook request takes the dedicated secret lock for validation
instead of the shared HttpChannelState lock. This scales better under high request volume.

Verification:
- All 2,787 tests pass
- Zero clippy warnings
- Code compiles successfully

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

* fix: prevent partial state corruption on SIGHUP restart failure

Ensure atomicity of configuration reload: if webhook listener restart fails,
secret update is skipped to prevent inconsistent state.

Changes:
- Wait for restart_with_addr() to complete (don't spawn background task)
- Track restart result with restart_failed flag
- Only update secret if restart succeeded or wasn't needed
- Ensure listener and secret stay synchronized

Problem addressed:
- Before: restart spawned as background task, secret updated immediately
- If restart failed, secret was changed but listener still on old address
- This left system in inconsistent state (partial corruption)

Solution:
- Make restart blocking (SIGHUP handler can wait, it's not on request hot path)
- Atomically update secret only after successful restart
- Flag prevents race between restart and secret update
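
The ordering can be illustrated with a synchronous sketch. `ChannelConfig` and `apply_reload` are hypothetical names, and early return via `?` stands in for the `restart_failed` flag mentioned above:

```rust
/// Hypothetical view of the reloadable state (assumption).
#[derive(Debug, PartialEq)]
struct ChannelConfig {
    addr: String,
    secret: Option<String>,
}

/// Apply a reload so that the listener address and secret change together or not at all.
fn apply_reload<F>(
    current: &mut ChannelConfig,
    new_addr: &str,
    new_secret: Option<String>,
    restart: F,
) -> Result<(), String>
where
    F: FnOnce(&str) -> Result<(), String>,
{
    if current.addr != new_addr {
        // If the restart fails we return here, before the secret is touched.
        restart(new_addr)?;
        current.addr = new_addr.to_string();
    }
    // Reached only when the restart succeeded or was not needed.
    current.secret = new_secret;
    Ok(())
}
```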

Benefits:
- Configuration changes are atomic (both succeed or both fail together)
- No partial state corruption on restart failure
- Failed restarts don't silently leave inconsistent state
- Secret and listener address stay in sync

Verification:
- All 2,787 tests pass
- Zero clippy warnings
- Code compiles successfully

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

* refactor: generalize hot-secret-swapping with ChannelSecretUpdater trait

Decouple SIGHUP handler from HTTP channel internals by introducing a trait
for channels that support zero-downtime secret updates.

Changes:
- Add ChannelSecretUpdater trait in channels/channel.rs
- Implement ChannelSecretUpdater for HttpChannelState
- Export trait from channels module
- Update SIGHUP handler to use trait-based secret updater collection
- Replace explicit HTTP channel knowledge with generic updater loop

Benefits:
- SIGHUP handler no longer depends on HttpChannelState details
- Tight coupling removed: main.rs doesn't need HTTP channel imports
- Extensible: new channels can opt-in by implementing the trait
- Scalable: multiple channels supported without main.rs changes
- Maintainable: adding channels requires only trait implementation, not SIGHUP handler edits

Pattern:
- ChannelSecretUpdater trait defines the interface for all updaters
- Channels that support hot-secret-swapping implement the trait
- SIGHUP handler loops through all registered updaters generically
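
A rough sketch of the pattern. The trait name and `HttpChannelState` come from the message; the method name `update_secret`, its synchronous signature, the plain `String` secret, and `reload_secrets` are assumptions made for illustration (the real trait is likely async and uses `SecretString`):

```rust
use std::sync::{Arc, RwLock};

/// Interface for channels that can swap their secret without a restart (sketch).
trait ChannelSecretUpdater: Send + Sync {
    fn update_secret(&self, new_secret: Option<String>);
}

/// Simplified channel state holding the isolated secret lock.
struct HttpChannelState {
    webhook_secret: Arc<RwLock<Option<String>>>,
}

impl ChannelSecretUpdater for HttpChannelState {
    fn update_secret(&self, new_secret: Option<String>) {
        *self.webhook_secret.write().expect("secret lock poisoned") = new_secret;
    }
}

/// What the SIGHUP handler does conceptually: loop over updaters generically.
fn reload_secrets(updaters: &[Arc<dyn ChannelSecretUpdater>], new_secret: Option<String>) {
    for updater in updaters {
        updater.update_secret(new_secret.clone());
    }
}
```

The handler only sees `dyn ChannelSecretUpdater`, so a new channel opts in by implementing the trait and registering itself.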

Verification:
- All 2,787 tests pass
- Zero clippy warnings
- Code compiles successfully

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

* feat: validate parameter names at deserialization time, not just tests

Add custom serde deserializer for unsupported_params that validates parameter
names at runtime when loading providers.json (or user overrides).

Changes:
- Add unsupported_params_de module with custom deserializer
- Only allows: "temperature", "max_tokens", "stop_sequences"
- Invalid parameter names cause immediate deserialization error
- Update ProviderDefinition to use custom deserializer
- Enhanced test with explicit parameter name validation
- Add new test that verifies invalid parameters are rejected

Problem solved:
- Before: Invalid param names (e.g., "temperrature") silently ignored
- Now: Rejected at deserialization time with clear error message
- Prevents runtime failures caused by typos in configuration

Example error:
  unsupported parameter name 'temperrature': must be one of: temperature, max_tokens, stop_sequences
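
The validation rule itself can be sketched without serde. This is not the custom deserializer from the commit; it is a plain parsing function (hypothetical names) that produces the same kind of error message, which a serde `deserialize_with` hook could call:

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum UnsupportedParam {
    Temperature,
    MaxTokens,
    StopSequences,
}

const VALID_NAMES: [&str; 3] = ["temperature", "max_tokens", "stop_sequences"];

/// Parse one name, rejecting anything outside the allowed set.
fn parse_unsupported_param(name: &str) -> Result<UnsupportedParam, String> {
    match name {
        "temperature" => Ok(UnsupportedParam::Temperature),
        "max_tokens" => Ok(UnsupportedParam::MaxTokens),
        "stop_sequences" => Ok(UnsupportedParam::StopSequences),
        other => Err(format!(
            "unsupported parameter name '{}': must be one of: {}",
            other,
            VALID_NAMES.join(", ")
        )),
    }
}

/// Parse a whole list; the first invalid name aborts with its error.
fn parse_unsupported_params(names: &[&str]) -> Result<Vec<UnsupportedParam>, String> {
    names.iter().map(|name| parse_unsupported_param(name)).collect()
}
```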

Benefits:
- Fail-fast: errors caught when loading config, not at runtime
- Clear feedback: error message lists valid parameter names
- Type safety: validators run during deserialization
- Configuration errors detected immediately, not silently ignored

Verification:
- All 2,788 tests pass (including new validation test)
- Zero clippy warnings
- Code compiles successfully

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Haiku 4.5 <noreply@anthropic.com>

* merge: resolve conflicts for PR #800 and #822 into staging (#881)

* fix(ci): secrets can't be used in step if conditions [skip-regression-check] (#787)

GitHub Actions step-level `if:` doesn't have access to `secrets` context.
Replace `if: secrets.X != ''` with `continue-on-error: true` and let
the Set token step handle the fallback.

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(ci): clean up staging pipeline — remove hacks, skip redundant checks [skip-regression-check] (#794)

- Remove continue-on-error from staging-ci.yml app token steps (secrets are configured)
- Skip test.yml and code_style.yml on PRs targeting staging (staging-ci.yml
  already runs tests before promoting, promotion PR gets full CI on main)
- Allow ironclaw-ci[bot] in Claude Code review for bot-created promotion PRs

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* fix(ci): run fmt + clippy on staging PRs, skip Windows clippy [skip-regression-check] (#802)

- Remove branches:[main] filter from code_style.yml so it runs on all PRs
- Gate clippy-windows with `if: github.base_ref == 'main'` (skip on staging PRs)
- Update rollup job to allow skipped clippy-windows
- Simplify claude-review.yml to only trigger on labeled event (avoids duplicate runs)

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* feat: persist user_id in save_job and expose job_id on routine runs (#709)

* feat: persist worker events to DB and fix activity tab rendering

In-process Worker (used by Scheduler::dispatch_job) now persists events
via save_job_event at key execution points: plan creation, LLM
responses, tool_use, tool_result, and job completion/failure/stuck.
Event data shapes match the container worker format so the gateway
activity tab renders them correctly.

Frontend: tool_result errors now show a red X icon with danger styling
instead of a silent empty output. The result event falls back to the
error field when message is absent.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: wire RoutineEngine into gateway for direct manual trigger firing

Replace the message-channel hack in routines_trigger_handler with a
direct call to RoutineEngine::fire_manual(), ensuring FullJob routines
dispatch correctly when triggered from the web UI. Inject the engine
into GatewayState from Agent::run after construction.

Also persists user_id in save_job for both PG and libSQL backends,
removes the source='sandbox' filter so all jobs are visible, and
exposes job_id on RoutineRunInfo for the frontend job link.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: remove stale gateway_state argument from Agent::new test call sites

The gateway_state parameter was removed from Agent::new during rebase
(replaced by post-construction set_routine_engine_slot), but three test
call sites still passed the extra None argument.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address PR review — restore sandbox source filter, remove blank lines

- Revert removal of `source = 'sandbox'` filter in all SandboxStore
  queries (8 sites across PG and libSQL). Sandbox-specific APIs should
  stay scoped to sandbox jobs; unified job listing for the Jobs tab
  should use a separate query path.
- Remove extra blank lines in agent_loop.rs and worker.rs that caused
  formatting CI failure.

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address review — regenerate Cargo.lock, add user_id regression test

- Regenerate Cargo.lock from main's lockfile to eliminate dependency
  version downgrades (anyhow, syn, etc.) that were churn from rebase.
- Add regression test verifying user_id round-trips through save_job
  and get_job in the libSQL backend.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* style: remove trailing blank line in libsql jobs.rs

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* test: add Postgres-side regression test for user_id persistence in save_job

Mirrors the existing libSQL test (test_save_job_persists_user_id) for the
Postgres backend. Gated behind #[cfg(feature = "postgres")] + #[ignore]
since it requires a running PostgreSQL instance (integration tier).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* refactor: unify three agentic loops into single AgenticLoop engine (#654)

Replace three independent copy-pasted agentic loops (dispatcher, worker,
container runtime) with a single shared engine in `agentic_loop.rs` that
all consumers customize via the `LoopDelegate` trait.

Phase 1 — Shared engine (`src/agent/agentic_loop.rs`, 205 lines):
  - `run_agentic_loop()` owns the core LLM → tool exec → repeat cycle
  - `LoopDelegate` trait (Send + Sync, &dyn dispatch) with 6 hook points
  - Tool intent nudge logic consolidated (was duplicated in 3 files)
  - Iteration limit + force-text behavior preserved

Phase 2 — Three delegate implementations:
  - `ChatDelegate` (dispatcher.rs): 3-phase approval flow, hooks, cost
    guard, context compaction, skill attenuation, interruption
  - `JobDelegate` (worker/job.rs): planning pre-loop phase, parallel
    JoinSet exec, mark_completed/stuck/failed, SSE streaming, self-repair
  - `ContainerDelegate` (worker/container.rs): sequential tool exec,
    HTTP-proxied LLM, container-safe tools, credential injection

Phase 3 — File moves and cleanup:
  - Delete `src/agent/worker.rs` — job logic moved to `src/worker/job.rs`
  - Rename `src/worker/runtime.rs` → `src/worker/container.rs`
  - Re-export `Worker`/`WorkerDeps` from `crate::worker` in `agent/mod.rs`
  - Update `scheduler.rs` imports to new worker location

Shared helpers (`src/tools/execute.rs`):
  - `execute_tool_with_safety()` replaces 4 copies of validate → timeout
    → execute → serialize
  - `process_tool_result()` replaces 3 copies of sanitize → wrap →
    ChatMessage (also used by thread_ops.rs approval resume paths)

Net result: -2,408 lines, zero duplicated loop logic, single code path
for tool intent nudge and completion detection.
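
The shape of the shared engine can be sketched in a heavily simplified, synchronous form. Only two of the six hooks are shown, the hook names are guesses, and `&mut dyn` is used instead of the `Send + Sync` `&dyn` dispatch described above; `ScriptedDelegate` is a hypothetical test double:

```rust
use std::collections::VecDeque;

enum LlmOutput {
    Text(String),
    ToolCalls(Vec<String>),
}

#[derive(Debug, PartialEq)]
enum LoopOutcome {
    Response(String),
    MaxIterations,
}

/// Consumer-specific behavior plugged into the shared loop (simplified).
trait LoopDelegate {
    fn call_llm(&mut self, context: &[String]) -> LlmOutput;
    fn execute_tool(&mut self, call: &str) -> String;
}

/// Core cycle: LLM call, tool execution, repeat until text or the iteration cap.
fn run_agentic_loop(
    delegate: &mut dyn LoopDelegate,
    context: &mut Vec<String>,
    max_iterations: usize,
) -> LoopOutcome {
    for _ in 0..max_iterations {
        match delegate.call_llm(context.as_slice()) {
            LlmOutput::Text(text) => return LoopOutcome::Response(text),
            LlmOutput::ToolCalls(calls) => {
                for call in calls {
                    let result = delegate.execute_tool(&call);
                    context.push(result);
                }
            }
        }
    }
    LoopOutcome::MaxIterations
}

/// Test double that replays scripted outputs, then keeps requesting a no-op tool.
struct ScriptedDelegate {
    replies: VecDeque<LlmOutput>,
}

impl LoopDelegate for ScriptedDelegate {
    fn call_llm(&mut self, _context: &[String]) -> LlmOutput {
        self.replies
            .pop_front()
            .unwrap_or_else(|| LlmOutput::ToolCalls(vec!["noop".to_string()]))
    }

    fn execute_tool(&mut self, call: &str) -> String {
        format!("result of {call}")
    }
}
```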

Closes #654

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address review feedback from Copilot

1. scheduler.rs: Replace `unwrap_or` fallback with proper error
   propagation when parsing tool output JSON — surfaces bugs instead
   of silently changing the output type.

2. worker/job.rs: Drop MutexGuard before the cancellation `.await` in
   `check_signals()` to avoid holding a lock across an async I/O call
   (prevents `await_holding_lock` lint).

3. worker/job.rs: Restore consecutive rate-limit counter
   (MAX_CONSECUTIVE_RATE_LIMITS = 10) so sustained rate limiting marks
   the job stuck with "Persistent rate limiting" instead of silently
   burning through max_iterations.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: incorporate staging changes — token budget tracking + mark_failed

Merge staging's changes into the refactored JobDelegate:
- Add token budget tracking in call_llm (update_context/add_tokens)
- mark_stuck → mark_failed for iteration cap and rate-limit exhaustion
  (aligns with staging's #788 fix)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address zmanian's PR review — eliminate type erasure, clean up

Address all 6 review points from zmanian on PR #800:

1. Replace LoopOutcome::Custom(Box<dyn Any>) with typed
   LoopOutcome::NeedApproval(Box<PendingApproval>) — eliminates
   type erasure and downcast, resolves clippy large_enum_variant.

2. Remove dead max_tool_iterations field from ChatDelegate struct.

3. Add on_tool_intent_nudge() hook to LoopDelegate trait with
   implementations in Job and Container delegates for observability.

4. Fix SSE events in job worker to emit raw sanitized content
   instead of XML-wrapped <tool_output> tags.

5. Remove 4 duplicate completion tests from job.rs that were
   already covered by the shared util module.

6. Avoid logging full tool results — use result_size_bytes in
   debug logs (execute.rs, job.rs).

Also updates path references in CLAUDE.md, COVERAGE_PLAN.md,
and add-sse-event.md command.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat(doctor): expand diagnostics from 7 to 16 health checks

* test: add unit tests for agentic_loop and execute shared modules

Add 16 tests covering the two new critical shared modules:

agentic_loop.rs (10 tests):
- Text response exits loop immediately
- Tool call → text response continuation
- LoopSignal::Stop exits before LLM call
- LoopSignal::InjectMessage adds user message to context
- Max iterations terminates with LoopOutcome::MaxIterations
- Tool intent nudge fires twice then caps
- before_llm_call early exit bypasses LLM
- truncate_for_preview: short string, long string, multibyte safety
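
For the multibyte-safety case, a truncation helper has to cut on a character boundary. The sketch below assumes a byte limit and a `...` suffix; the real `truncate_for_preview` signature and suffix are not shown in this log:

```rust
/// Truncate to at most `max_bytes` of the original text without splitting a UTF-8 character.
fn truncate_for_preview(s: &str, max_bytes: usize) -> String {
    if s.len() <= max_bytes {
        return s.to_string();
    }
    // Walk back to the nearest char boundary; index 0 is always a boundary.
    let mut end = max_bytes;
    while !s.is_char_boundary(end) {
        end -= 1;
    }
    format!("{}...", &s[..end])
}
```

Slicing at an arbitrary byte index would panic on multibyte input, which is why the boundary walk is needed.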

execute.rs (6 tests):
- execute_tool_with_safety success path
- Missing tool returns ToolError::NotFound
- Tool execution failure propagates
- Per-tool timeout enforcement (50ms)
- process_tool_result XML wrapping on success
- process_tool_result error formatting

All 2,777 unit tests pass, 0 clippy warnings.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* style: cargo fmt

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address code review — 9 issues across agentic loop, job worker, container

CRITICAL fixes:
- Rate-limit exhaustion now returns Err(LlmError::RateLimited) instead of
  Ok(Text("")), stopping the loop immediately with no ghost iteration.
  Below-threshold retries still use Text("") with an explicit empty-string
  guard in handle_text_response to skip injection.
- check_signals drains the entire message channel before returning,
  prioritizing Stop over UserMessage. Previously returned early on first
  UserMessage, silently dropping any queued Stop or additional messages.
- check_signals now detects all non-progressing job states (Cancelled,
  Failed, Stuck, Completed, Submitted, Accepted) instead of only
  Cancelled and Failed.
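
The drain-then-prioritize behavior can be sketched with a standard-library channel. `WorkerMessage` and the `InjectMessages` variant are hypothetical, and the real `check_signals` is async and also inspects job state:

```rust
use std::sync::mpsc::Receiver;

#[derive(Debug, PartialEq)]
enum WorkerMessage {
    Stop,
    UserMessage(String),
}

#[derive(Debug, PartialEq)]
enum LoopSignal {
    Continue,
    Stop,
    InjectMessages(Vec<String>),
}

/// Drain everything queued so far, then let Stop win over user messages.
fn check_signals(rx: &Receiver<WorkerMessage>) -> LoopSignal {
    let mut stop = false;
    let mut messages = Vec::new();
    // Keep reading until the channel is empty instead of returning on the first item.
    while let Ok(msg) = rx.try_recv() {
        match msg {
            WorkerMessage::Stop => stop = true,
            WorkerMessage::UserMessage(text) => messages.push(text),
        }
    }
    if stop {
        LoopSignal::Stop
    } else if messages.is_empty() {
        LoopSignal::Continue
    } else {
        LoopSignal::InjectMessages(messages)
    }
}
```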

HIGH fixes:
- Error path in process_tool_result_job applies truncate_for_preview to
  bound error strings in SSE/DB events (was unbounded).
- Document Send+Sync lifetime constraint on LoopDelegate trait.
- Test mock before_llm_call refactored from double-lock to single lock
  acquisition, eliminating deadlock risk on refactor.

MEDIUM fixes:
- CompletionReport includes actual iteration count via shared
  Arc<Mutex<u32>> tracker (was hardcoded 0).
- process_tool_result_job return type changed from Result<bool> to
  Result<()> — the bool was always false (dead API).
- Deduplicate truncate in container.rs; now uses truncate_for_preview
  from agentic_loop.

Verified: 0 clippy warnings, 2781 tests pass, cargo fmt clean.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Henry Park <henrypark133@gmail.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Illia Polosukhin <ilblackdragon@gmail.com>
Co-authored-by: Umesh Kumar Singh <brijbiharisingh1971@outlook.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>

* Revert "Feat/docker shell edition" + fix fmt/clippy (#886)

* Revert "Feat/docker shell edition (#804)"

This reverts commit 1fc2b85fa70d8421a9395e69d491d0e8858046b8.

* style: fix formatting issues from revert

Run cargo fmt to fix formatting across 7 files after the revert of
the docker shell edition feature.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* refactor: centralize test cre…
drchirag1991 pushed a commit to drchirag1991/ironclaw that referenced this pull request Apr 8, 2026
* fix: Channel HTTP: server doesn't start after config change (no hot-r… (#779)

* fix: Channel HTTP: server doesn't start after config change (no hot-reload)

* review fixes

* review fixes

* fix linter

* fix code style

* fix: prevent session lock contention blocking message processing (#783)

* fix: prevent session lock contention blocking message processing

## Problem
After container restart, POST /api/chat/send returns 202 ACCEPTED but messages
don't appear in conversation_messages and agent never responds. Messages get
stuck in "stale state" after restart.

Root cause: Session lock was held for entire duration of chat_threads_handler
and chat_history_handler, including during slow database queries. This blocked
the agent loop from acquiring the session lock to process incoming messages,
causing them to hang indefinitely.

## Solution
1. **Release session lock early in chat_threads_handler**: Only acquire lock
   when reading active_thread at response time, not during DB queries for
   thread list. DB operations no longer block message processing.

2. **Release session lock early in chat_history_handler**: Only acquire lock
   when accessing in-memory thread state, not during paginated DB queries or
   thread ownership checks. DB operations no longer block message processing.

3. **Add comprehensive logging**: Track message flow from receipt through
   session resolution, thread hydration, and state transitions. Helps diagnose
   future issues:
   - Message queued to agent loop (chat_send_handler)
   - Processing message from channel (handle_message)
   - Hydrating thread from DB (maybe_hydrate_thread)
   - Resolving session and thread (resolve_thread)
   - Checking thread state (process_user_input)
   - Persisting user message (persist_user_message)
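
The lock-scoping change in items 1 and 2 can be sketched with a standard-library mutex. `Session`, `list_threads`, and the closure standing in for the DB query are hypothetical; the real handlers are async:

```rust
use std::sync::Mutex;

/// Minimal in-memory session state (assumption).
struct Session {
    active_thread: Option<String>,
}

/// Run the slow query with no lock held, then lock briefly to read in-memory state.
fn list_threads<F>(session: &Mutex<Session>, slow_query: F) -> (Vec<String>, Option<String>)
where
    F: FnOnce() -> Vec<String>,
{
    // The DB query runs first, outside any lock scope.
    let threads = slow_query();
    // The guard lives only inside this block and is dropped at its end.
    let active = {
        let guard = session.lock().expect("session lock poisoned");
        guard.active_thread.clone()
    };
    (threads, active)
}
```

While `slow_query` runs, other tasks such as the agent loop can still acquire the session lock.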

## Impact
- Message processing no longer blocks on session lock contention
- API response times for thread list/history queries are unaffected (DB queries
  still happen, but the lock is not held while they run)
- Better diagnostics for future debugging

## Testing
- All 2756 tests pass
- Code compiles with zero clippy warnings
- No changes to user-facing API or behavior, only lock timing

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

* security: redact PII from info-level logs

Downgrade user_id and channel logging to debug level to prevent exposing
Personally Identifiable Information (PII) in production logs.

The user_id field can contain sensitive information such as phone numbers
(e.g., for Signal messages). Logging PII in cleartext at the info level
creates a security and privacy risk, as these logs may be stored in
persistent storage, indexed by log management systems, or accessible to
unauthorized personnel.

Changes:
- Info level: logs only message_id (UUID) for tracking
- Debug level: logs user_id, channel, thread_id for troubleshooting

This maintains debugging capability for developers while protecting user
privacy in production logs.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Haiku 4.5 <noreply@anthropic.com>

* chore: sync main into staging (#855)

* fix(ci): secrets can't be used in step if conditions [skip-regression-check] (#787)

GitHub Actions step-level `if:` doesn't have access to the `secrets` context.
Replace `if: secrets.X != ''` with `continue-on-error: true` and let
the Set token step handle the fallback.

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(ci): clean up staging pipeline — remove hacks, skip redundant checks [skip-regression-check] (#794)

- Remove continue-on-error from staging-ci.yml app token steps (secrets are configured)
- Skip test.yml and code_style.yml on PRs targeting staging (staging-ci.yml
  already runs tests before promoting, promotion PR gets full CI on main)
- Allow ironclaw-ci[bot] in Claude Code review for bot-created promotion PRs

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* fix(ci): run fmt + clippy on staging PRs, skip Windows clippy [skip-regression-check] (#802)

- Remove branches:[main] filter from code_style.yml so it runs on all PRs
- Gate clippy-windows with `if: github.base_ref == 'main'` (skip on staging PRs)
- Update rollup job to allow skipped clippy-windows
- Simplify claude-review.yml to only trigger on labeled event (avoids duplicate runs)

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* feat: persist user_id in save_job and expose job_id on routine runs (#709)

* feat: persist worker events to DB and fix activity tab rendering

In-process Worker (used by Scheduler::dispatch_job) now persists events
via save_job_event at key execution points: plan creation, LLM
responses, tool_use, tool_result, and job completion/failure/stuck.
Event data shapes match the container worker format so the gateway
activity tab renders them correctly.

Frontend: tool_result errors now show a red X icon with danger styling
instead of a silent empty output. The result event falls back to the
error field when message is absent.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: wire RoutineEngine into gateway for direct manual trigger firing

Replace the message-channel hack in routines_trigger_handler with a
direct call to RoutineEngine::fire_manual(), ensuring FullJob routines
dispatch correctly when triggered from the web UI. Inject the engine
into GatewayState from Agent::run after construction.

Also persists user_id in save_job for both PG and libSQL backends,
removes the source='sandbox' filter so all jobs are visible, and
exposes job_id on RoutineRunInfo for the frontend job link.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: remove stale gateway_state argument from Agent::new test call sites

The gateway_state parameter was removed from Agent::new during rebase
(replaced by post-construction set_routine_engine_slot), but three test
call sites still passed the extra None argument.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address PR review — restore sandbox source filter, remove blank lines

- Revert removal of `source = 'sandbox'` filter in all SandboxStore
  queries (8 sites across PG and libSQL). Sandbox-specific APIs should
  stay scoped to sandbox jobs; unified job listing for the Jobs tab
  should use a separate query path.
- Remove extra blank lines in agent_loop.rs and worker.rs that caused
  formatting CI failure.

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address review — regenerate Cargo.lock, add user_id regression test

- Regenerate Cargo.lock from main's lockfile to eliminate dependency
  version downgrades (anyhow, syn, etc.) that were churn from rebase.
- Add regression test verifying user_id round-trips through save_job
  and get_job in the libSQL backend.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* style: remove trailing blank line in libsql jobs.rs

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* test: add Postgres-side regression test for user_id persistence in save_job

Mirrors the existing libSQL test (test_save_job_persists_user_id) for the
Postgres backend. Gated behind #[cfg(feature = "postgres")] + #[ignore]
since it requires a running PostgreSQL instance (integration tier).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* feat(llm): per-provider unsupported parameter filtering (#749, #728) (#809)

Add declarative `unsupported_params` field to provider definitions in
providers.json. Parameters listed are stripped from requests before
sending, preventing 400 errors from providers that reject them (e.g.
gpt-5 family and kimi-k2.5 rejecting custom temperature values).

- Add `unsupported_params` to ProviderDefinition and RegistryProviderConfig
- Propagate from registry through config resolution
- Generic strip helpers handle temperature, max_tokens, stop_sequences
- Apply filtering in RigAdapter and AnthropicOAuthProvider
- Mark openai and tinfoil providers as not supporting temperature
- Update openai default model to gpt-5-mini
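
A rough sketch of the stripping step, assuming the request carries the three parameters as optional fields. `CompletionRequest` and `strip_unsupported` are illustrative names only; the actual helpers in RigAdapter and AnthropicOAuthProvider may be shaped differently.

```rust
use std::collections::HashSet;

// Hypothetical request shape; only the three parameters named above are modeled.
#[derive(Debug, Clone, PartialEq)]
struct CompletionRequest {
    temperature: Option<f32>,
    max_tokens: Option<u32>,
    stop_sequences: Option<Vec<String>>,
}

// Clears every field whose name appears in the provider's `unsupported_params`,
// so the parameter is omitted from the outgoing request.
fn strip_unsupported(
    mut req: CompletionRequest,
    unsupported: &HashSet<&str>,
) -> CompletionRequest {
    if unsupported.contains("temperature") {
        req.temperature = None;
    }
    if unsupported.contains("max_tokens") {
        req.max_tokens = None;
    }
    if unsupported.contains("stop_sequences") {
        req.stop_sequences = None;
    }
    req
}
```

With `unsupported_params` set to `["temperature"]`, a request that sets a custom temperature is sent without it, while `max_tokens` and `stop_sequences` pass through unchanged.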

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Illia Polosukhin <ilblackdragon@gmail.com>

* fix: Chat input is hidden in mobile browser mode (#877)

* fix: stop XML-escaping tool output content (#598) (#874)

* fix: stop XML-escaping tool output content in wrap_for_llm (#598)

Remove content escaping that corrupted JSON in tool output. The
<tool_output> structural boundary is preserved but content now passes
through raw, fixing downstream parse failures.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Henry Park <henrypark133@gmail.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(safety): allow empty string tool params (#848)

* fix(safety): allow empty string tool params

* fix(safety): preserve heuristic checks and add path context to tool validation

This follow-up refactor addresses PR review feedback by restoring
heuristic checks (whitespace ratio, character repetition) for tool
parameter validation and improving error reporting.

Changes:
- Restored heuristic warnings in validate_non_empty_input so they apply
  to both user input and tool parameters (when non-empty).
- Refactored check_strings to recursively build and pass JSON paths
  (e.g., "metadata.tags[1]").
- Updated validation errors to use the specific JSON path as the field
  name instead of the generic "input".
- Added regression tests for whitespace/repetition warnings and JSON
  path reporting in tool parameters.

This ensures the safety layer remains semantically neutral about empty
strings (fixing the memory_tree path: "" issue) while maintaining
rigorous protection and providing better developer ergonomics.
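
The recursive path building can be illustrated with a small sketch. The `Json` enum is a stand-in so the example needs no external crates, and the `is_ok` predicate is a placeholder for the real validation rules; neither is taken from the actual code.

```rust
// Minimal stand-in for a JSON value (assumption: the real code uses a JSON library).
enum Json {
    Str(String),
    Array(Vec<Json>),
    Object(Vec<(String, Json)>),
}

// Walks the value and records the JSON path of every string that fails `is_ok`.
fn check_strings(
    value: &Json,
    path: &str,
    is_ok: &dyn Fn(&str) -> bool,
    bad: &mut Vec<String>,
) {
    match value {
        Json::Str(s) => {
            if !is_ok(s) {
                bad.push(path.to_string());
            }
        }
        Json::Array(items) => {
            for (i, item) in items.iter().enumerate() {
                // Array elements extend the path with an index, e.g. "tags[1]".
                check_strings(item, &format!("{}[{}]", path, i), is_ok, bad);
            }
        }
        Json::Object(fields) => {
            for (key, item) in fields {
                // Object fields extend the path with a dot, except at the root.
                let child = if path.is_empty() {
                    key.clone()
                } else {
                    format!("{}.{}", path, key)
                };
                check_strings(item, &child, is_ok, bad);
            }
        }
    }
}
```

Called with an empty root path, a failing string at index 1 of `tags` inside `metadata` is reported as `metadata.tags[1]`, while an empty string is accepted as long as the predicate allows it.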

* style: run cargo fmt

* perf: optimize release and dist build profiles (#843)

* perf: optimize release and dist build profiles

Add [profile.release] with strip=true and panic="abort" for smaller,
faster release binaries. Upgrade [profile.dist] from lto="thin" to
lto="fat" with codegen-units=1 for maximum optimization in CI releases.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: remove panic=abort from release profile

Reviewers (zmanian, Copilot, Gemini) correctly flagged that panic=abort
in the release profile would kill the entire process on any tokio task
panic, breaking fault isolation for the long-running server. Removed
from release profile entirely.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* feat: add PR template with risk assessment (#837)

* feat: add PR template with risk assessment and review tracks

Add a pull request template that includes summary, change type,
validation checklist, security/database impact sections, blast radius,
and rollback plan. Update CONTRIBUTING.md with review track definitions
(A/B/C) based on change risk level.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: expand CONTRIBUTING.md with setup, workflow, and guidelines

Add getting started, development workflow, code style summary,
database change guidance, and dependency management sections.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* feat: add fuzzing targets for untrusted input parsers (#835)

* feat: add fuzzing targets for untrusted input parsers

Add cargo-fuzz infrastructure with 5 fuzz targets exercising
security-critical code paths:

- fuzz_safety_sanitizer: Aho-Corasick + regex injection detection
- fuzz_safety_validator: Input validation (length, encoding, patterns)
- fuzz_leak_detector: Secret leak scanning (API keys, tokens)
- fuzz_tool_params: Tool parameter JSON validation
- fuzz_config_env: TOML/JSON config parsing

Each target exercises real IronClaw business logic with invariant
assertions. Includes corpus directories and setup documentation.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: improve fuzz targets to exercise real IronClaw code paths

- fuzz_config_env: exercise SafetyLayer end-to-end (sanitize, validate,
  policy check) instead of generic TOML/JSON parsing
- fuzz_tool_params: add validate_tool_schema coverage alongside
  validate_tool_params
- Add "fuzz" to workspace exclude in root Cargo.toml
- Update README descriptions to match actual target behavior

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: replace redundant detect() call with meaningful invariant assertion

Replace the double sanitize()+detect() call with an assertion that
critical severity warnings always trigger content modification.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: rewrite fuzz_config_env to exercise IronClaw safety code directly

Replace SafetyLayer wrapper usage with direct Sanitizer, Validator, and
LeakDetector instantiation and invocation. Adds meaningful consistency
assertions (non-empty output, valid-means-no-errors, scan/clean agreement).
Removes the config construction that was only exercising struct instantiation.

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* fix(wasm): run leak scan before credential injection in tools wrapper (#791)

* fix(wasm): run leak scan before credential injection in tools wrapper

The tools WASM wrapper runs the LeakDetector on HTTP request headers
AFTER inject_host_credentials() has already substituted real secrets
(e.g., xoxb- Slack bot tokens). This causes the leak detector to
flag the tool's own legitimate outbound API calls as secret exfiltration.

Move the scan to run on raw_headers before any credential injection,
matching the fix already applied to the channels wrapper in #421.

Fixes the same class of bug as #421 (which only fixed channels/wasm/wrapper.rs).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* perf: inline leak scan to avoid Vec allocation on every HTTP request

Address review feedback: instead of cloning all header keys/values into
a Vec to pass to scan_http_request(), iterate over raw_headers directly
using scan_and_clean(). This also provides more specific error messages
(URL vs header vs body).
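
The corrected ordering, with the scan inlined over the raw headers, might look roughly like the sketch below. `looks_like_secret` and `prepare_headers` are hypothetical stand-ins; the real code calls the LeakDetector and the credential-injection functions named above.

```rust
// Hypothetical detector: flags any value containing a Slack-style bot token prefix.
fn looks_like_secret(value: &str) -> bool {
    value.contains("xoxb-")
}

// Scans the headers supplied by the WASM tool, then injects the host credential.
// Returns an error naming the offending header if the tool itself sent a secret.
fn prepare_headers(
    raw_headers: &[(String, String)],
    host_token: &str,
) -> Result<Vec<(String, String)>, String> {
    // 1. Scan only what the tool provided, iterating in place without copying.
    for (name, value) in raw_headers {
        if looks_like_secret(value) {
            return Err(format!("potential secret leak in header {}", name));
        }
    }

    // 2. Only now add the host-managed credential; it is never scanned.
    let mut headers = raw_headers.to_vec();
    headers.push((
        "Authorization".to_string(),
        format!("Bearer {}", host_token),
    ));
    Ok(headers)
}
```

A tool that sends only ordinary headers gets the host token appended without tripping the detector, while a tool that places a token-like value in its own headers is still rejected.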

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* style: fix cargo fmt formatting in leak scan loop

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* fix(setup): drain residual terminal events before secret input (#747) (#849)

* fix: skip the regression check
[skip-regression-check]

---------

Co-authored-by: Henry Park <henrypark133@gmail.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Illia Polosukhin <ilblackdragon@gmail.com>

* feat(agent): add context size logging before LLM prompt (#810)

* feat(agent): add context size logging before LLM prompt

---------

Co-authored-by: Henry Park <henrypark133@gmail.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Illia Polosukhin <ilblackdragon@gmail.com>

* fix: preserve text before tool-call XML in forced-text responses (#852)

* fix: preserve text before tool-call XML in forced-text responses (#789)

Local models (Qwen3, DeepSeek, GLM) emit <tool_call> XML even when no
tools are available (force_text mode). The existing strip_xml_tag()
discards everything from an unclosed opening tag onward, producing an
empty string that triggers the "I'm not sure how to respond" fallback.

Add truncate_at_tool_tags() — a code-region-aware pre-processing step
that truncates at the first tool-call XML tag BEFORE clean_response()
runs, preserving all useful text before the tag. Protect all 7
clean_response() call sites. Case-insensitive matching handles models
that emit <TOOL_CALL> or <Tool_Call> variants.

Secondary fix: add has_native_thinking() model detection to skip
<think>/<final> system prompt injection for models with built-in
reasoning (Qwen3, QwQ, DeepSeek-R1, GLM-Z1, etc.), preventing
thinking-only responses that clean to empty.

Wire with_model_name(active_model_name()) at all 9 production sites
that construct Reasoning, so the runtime model name (not static config)
drives system prompt generation.

126 new/updated tests covering truncation edge cases, code-block
awareness, Unicode, case-insensitivity, StubLlm integration for
complete/plan/evaluate_success/respond_with_tools paths, model
detection, and conditional system prompt generation.

Closes #789

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address Copilot review — unclosed-only truncation, ASCII case folding

- truncate_at_tool_tags() now only truncates at UNCLOSED tool tags;
  properly closed tags (e.g. <tool_call>...</tool_call>) are left intact
  for clean_response() to strip normally, preserving any text after them
- Switch from to_lowercase() to to_ascii_lowercase() to prevent byte
  offset misalignment with non-ASCII characters whose lowercase form
  has different byte length (e.g. Kelvin sign U+212A)
- Add closing_tag_for() helper to derive closing tags from open patterns
- Fix doc comment: "fenced markdown code blocks or inline code spans"
  (not "indented", which find_code_regions() doesn't detect)
- Add regression tests: closed vs unclosed for each tag variant,
  Unicode + case-insensitive offset safety, and mixed closed/unclosed
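
A simplified sketch of why ASCII-only case folding matters here. It handles a single hard-coded `<tool_call>` tag and omits the code-region awareness described earlier; `truncate_at_unclosed_tool_call` is an illustrative name, not the actual function.

```rust
// Truncates `text` at the first `<tool_call>` tag that has no closing tag after it,
// matching case-insensitively. Text with a properly closed tag is returned as-is.
fn truncate_at_unclosed_tool_call(text: &str) -> &str {
    // ASCII-only lowering changes no byte lengths, so every offset found in
    // `lower` is also a valid offset into the original `text`.
    let lower = text.to_ascii_lowercase();
    match lower.find("<tool_call>") {
        Some(start) if !lower[start..].contains("</tool_call>") => {
            text[..start].trim_end()
        }
        _ => text,
    }
}
```

With `to_lowercase()`, the Kelvin sign (3 bytes in UTF-8) would become `k` (1 byte), shifting every later offset by two bytes and making a slice of the original string land in the wrong place.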

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: minor review items — consistent ascii_lowercase, closing_tag_for tests

- Switch has_native_thinking() from to_lowercase() to to_ascii_lowercase()
  for consistency with truncate_at_tool_tags() approach
- Add unit tests for closing_tag_for(): standard tags, space-suffixed
  patterns, pipe-delimited tags, and exhaustive coverage of all
  TOOL_TAG_PATTERNS entries
- Add test for mixed closed+unclosed tags of different types

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* Feat/docker shell edition (#804)

---------

Co-authored-by: Henry Park <henrypark133@gmail.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(mcp): strip top-level null params before forwarding to MCP servers (#795)

* fix(mcp): strip top-level null params before forwarding to MCP servers

LLMs frequently emit `"field": null` for optional parameters in tool
calls. Many MCP servers reject explicit nulls for fields that should
simply be absent — e.g. Notion returns 400 for `"sort": null` in a
search call, expecting the field to be omitted entirely.

Strip top-level null keys from the params object before calling
`call_tool()`. Only top-level keys are stripped; nested nulls are
preserved since they may be semantically meaningful.
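
A minimal sketch of the top-level stripping described above. The `Json` enum is a hypothetical stand-in (the real code presumably works on `serde_json::Value`), and `strip_top_level_nulls` is an illustrative name, not the function in the PR.

```rust
use std::collections::BTreeMap;

/// Hypothetical, minimal JSON value used only for this illustration.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum Json {
    Null,
    Bool(bool),
    Number(f64),
    Str(String),
    Object(BTreeMap<String, Json>),
}

/// Drop keys whose value is `Null` from the top level of an object.
/// Nested nulls are kept, and non-object values are returned unchanged.
fn strip_top_level_nulls(params: Json) -> Json {
    match params {
        Json::Object(map) => Json::Object(
            map.into_iter()
                .filter(|(_, value)| !matches!(value, Json::Null))
                .collect(),
        ),
        other => other,
    }
}
```

With this shape, `{"query": "notes", "sort": null, "filter": {"value": null}}` would lose `sort` but keep the nested `filter.value` null.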

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Illia Polosukhin <ilblackdragon@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* Add event-triggered routines and workflow skill templates (#756)

* Add event-triggered routines and workflow skill templates

* fix(ci): secrets can't be used in step if conditions [skip-regression-check] (#787)

GitHub Actions step-level `if:` doesn't have access to `secrets` context.
Replace `if: secrets.X != ''` with `continue-on-error: true` and let
the Set token step handle the fallback.

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(ci): clean up staging pipeline — remove hacks, skip redundant checks [skip-regression-check] (#794)

- Remove continue-on-error from staging-ci.yml app token steps (secrets are configured)
- Skip test.yml and code_style.yml on PRs targeting staging (staging-ci.yml
  already runs tests before promoting, promotion PR gets full CI on main)
- Allow ironclaw-ci[bot] in Claude Code review for bot-created promotion PRs

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address PR review feedback for event_emit security and quality

Security fixes:
- Require approval (UnlessAutoApproved) for event_emit, matching routine_fire
- Enable sanitization on event_emit payload (external JSON reaches LLM)
- Remove user_id parameter from event_emit to prevent IDOR — always use ctx.user_id

Correctness fixes:
- Rename source → event_source in event_emit for consistency with routine_create
- Use json_value_as_filter_string for filter parsing (handles numbers/booleans)
- Case-insensitive matching for event source and event_type
- Add debug logging for missing filter keys in payload
- Fix skill_install_routine_webhook_sim test missing .with_skills()
- Fix schema_validator test for event_emit payload properties

Code quality:
- Move EventEmitTool struct/impl after RoutineHistoryTool (fix split layout)
- Deduplicate routine_to_info into RoutineInfo::from_routine in types.rs
- Add test section headers in e2e_routine_heartbeat.rs
- Clarify event_emit description to specify system_event routines only

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(ci): run fmt + clippy on staging PRs, skip Windows clippy [skip-regression-check] (#802)

- Remove branches:[main] filter from code_style.yml so it runs on all PRs
- Gate clippy-windows with `if: github.base_ref == 'main'` (skip on staging PRs)
- Update rollup job to allow skipped clippy-windows
- Simplify claude-review.yml to only trigger on labeled event (avoids duplicate runs)

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* feat: persist user_id in save_job and expose job_id on routine runs (#709)

* feat: persist worker events to DB and fix activity tab rendering

In-process Worker (used by Scheduler::dispatch_job) now persists events
via save_job_event at key execution points: plan creation, LLM
responses, tool_use, tool_result, and job completion/failure/stuck.
Event data shapes match the container worker format so the gateway
activity tab renders them correctly.

Frontend: tool_result errors now show a red X icon with danger styling
instead of a silent empty output. The result event falls back to the
error field when message is absent.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: wire RoutineEngine into gateway for direct manual trigger firing

Replace the message-channel hack in routines_trigger_handler with a
direct call to RoutineEngine::fire_manual(), ensuring FullJob routines
dispatch correctly when triggered from the web UI. Inject the engine
into GatewayState from Agent::run after construction.

Also persists user_id in save_job for both PG and libSQL backends,
removes the source='sandbox' filter so all jobs are visible, and
exposes job_id on RoutineRunInfo for the frontend job link.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: remove stale gateway_state argument from Agent::new test call sites

The gateway_state parameter was removed from Agent::new during rebase
(replaced by post-construction set_routine_engine_slot), but three test
call sites still passed the extra None argument.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address PR review — restore sandbox source filter, remove blank lines

- Revert removal of `source = 'sandbox'` filter in all SandboxStore
  queries (8 sites across PG and libSQL). Sandbox-specific APIs should
  stay scoped to sandbox jobs; unified job listing for the Jobs tab
  should use a separate query path.
- Remove extra blank lines in agent_loop.rs and worker.rs that caused
  formatting CI failure.

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address review — regenerate Cargo.lock, add user_id regression test

- Regenerate Cargo.lock from main's lockfile to eliminate dependency
  version downgrades (anyhow, syn, etc.) that were churn from rebase.
- Add regression test verifying user_id round-trips through save_job
  and get_job in the libSQL backend.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* style: remove trailing blank line in libsql jobs.rs

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* test: add Postgres-side regression test for user_id persistence in save_job

Mirrors the existing libSQL test (test_save_job_persists_user_id) for the
Postgres backend. Gated behind #[cfg(feature = "postgres")] + #[ignore]
since it requires a running PostgreSQL instance (integration tier).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* fix: make routine_system_event_emit test create routine before emitting

- Add routine_create step to trace fixture so event_emit has a matching
  routine to fire
- Assert fired_routines > 0, not just key presence (Copilot review)
- Add .with_auto_approve_tools(true) since event_emit now requires approval

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: renumber test headers after system_event test insertion

Test 4 was duplicated (routine_cooldown and heartbeat_findings).
Renumber heartbeat_findings to Test 5 and heartbeat_empty_skip to Test 6.

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: merge staging and add missing RoutineEngine args in test

RoutineEngine::new on staging requires `tools` and `safety` params.
Update system_event_trigger_matches_and_filters test to pass them.

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address new Copilot review comments

- Add .with_auto_approve_tools(true) to skill_install_routine_webhook_sim
  test so event_emit doesn't block on approval
- Fix module-level doc comment for event_emit to specify system_event trigger

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: deduplicate json_value_as_string helper

Remove private `json_value_as_string` from routine_engine.rs and use
the identical public `json_value_as_filter_string` from routine.rs,
eliminating divergence risk. (Copilot review)

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Henry Park <henrypark133@gmail.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: enable WASM credential injection in No-DB environments (#845)

* fix(wasm): enable credential injection in no-DB environments via env var fallback

When a secrets store is unavailable (e.g. no-DB mode), WASM channel
credentials were silently not injected, causing channels to start without
credentials. Fix by:

- Changing `inject_channel_credentials_from_secrets` to accept
  `Option<&dyn SecretsStore>` — secrets store is tried first when present
- Adding env var fallback (`inject_env_credentials`) for credentials not
  covered by the secrets store
- Enforcing a channel-name prefix security check on env var names to
  prevent WASM channels from reading unrelated host credentials
  (e.g. `AWS_SECRET_ACCESS_KEY`)
- Extracting pure `resolve_env_credentials` helper for testability
- Adding case-insensitive prefix matching for secrets store lookup

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(wasm): inject credentials at startup when no secrets store (setup.rs path)

The startup path (setup_wasm_channels -> register_channel) was guarded by
`if let Some(secrets) = secrets_store`, so in No-DB mode credentials were
never injected and the channel started without them.

Fix by:
- Changing inject_channel_credentials to accept Option<&dyn SecretsStore>
- Always calling it (removing the if-let guard) — env var fallback runs
  even when secrets_store is None
- Adding channel-name prefix security check to the env var fallback path
  (e.g. TELEGRAM_ for channel "telegram"), consistent with manager.rs

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(test): correct misleading comment on ICTEST1_UNRELATED_OTHER placeholder

* fix(wasm): guard against empty channel name in credential injection

An empty channel_name would produce prefix "_", allowing any env var
starting with "_" to pass the security check and be injected. Add an
early-return guard in resolve_env_credentials, inject_env_credentials,
and inject_channel_credentials. Add a test to cover this path.
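
The prefix check and the empty-name guard can be sketched roughly as follows. `allowed_env_credentials` is a hypothetical helper written for illustration; it is not the signature of `resolve_env_credentials`.

```rust
/// Hypothetical sketch: return the env var names a channel is allowed to read.
/// A name qualifies only if it starts with the upper-cased channel name plus "_".
fn allowed_env_credentials<'a>(channel_name: &str, env_names: &[&'a str]) -> Vec<&'a str> {
    // Guard: an empty channel name would produce the prefix "_" and let any
    // env var starting with "_" through.
    if channel_name.is_empty() {
        return Vec::new();
    }
    let prefix = format!("{}_", channel_name.to_ascii_uppercase());
    env_names
        .iter()
        .copied()
        .filter(|name| name.starts_with(&prefix))
        .collect()
}
```

For channel `"telegram"`, `TELEGRAM_BOT_TOKEN` passes while `AWS_SECRET_ACCESS_KEY` is rejected; for an empty channel name nothing is returned.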

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: lizican123 <lizican123@gmail.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: promote to main (#878)

* fix: replace unsafe env::set_var with thread-safe inject_single_var in SIGHUP handler

Fixes race condition where SIGHUP handler modifies global environment variables
while other threads may be reading them via Config::from_env().

Changes:
- Replace unsafe { std::env::set_var() } with ironclaw::config::inject_single_var()
- Uses INJECTED_VARS mutex instead of unsafe global state modification
- All reads via optional_env() check the thread-safe overlay first
- Prevents data races between SIGHUP reload and concurrent config reads
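
As an illustration of the overlay approach, here is a sketch using only the standard library. The names `INJECTED_VARS`, `inject_single_var`, and `optional_env` come from the commit message; their signatures and bodies below are assumptions, not the ironclaw implementation.

```rust
use std::collections::HashMap;
use std::sync::{Mutex, OnceLock};

// Process-wide overlay of injected variables (assumed layout).
static INJECTED_VARS: OnceLock<Mutex<HashMap<String, String>>> = OnceLock::new();

fn overlay() -> &'static Mutex<HashMap<String, String>> {
    INJECTED_VARS.get_or_init(|| Mutex::new(HashMap::new()))
}

/// Record a value in the overlay instead of mutating the process environment.
fn inject_single_var(key: &str, value: &str) {
    overlay()
        .lock()
        .unwrap()
        .insert(key.to_string(), value.to_string());
}

/// Look up a variable: the overlay is checked first, then the real environment.
fn optional_env(key: &str) -> Option<String> {
    if let Some(value) = overlay().lock().unwrap().get(key) {
        return Some(value.clone());
    }
    std::env::var(key).ok()
}
```

Because writers only touch the mutex-protected map, a reload never calls `std::env::set_var` while other threads are reading the environment.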

Verification:
- All 2,787 tests pass
- Zero clippy warnings
- Code compiles successfully

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

* fix: spawn webhook restart as background task to avoid blocking I/O across lock

Prevents holding Mutex lock during async I/O operations (TcpListener::bind,
task shutdown). The SIGHUP handler no longer blocks webhook processing during
listener restart.

Changes:
- Read old_addr and drop lock immediately
- Spawn restart_with_addr() as background task via tokio::spawn
- Lock is only held during the actual restart operation, not the signal handler

Benefits:
- SIGHUP handler returns immediately without blocking
- Webhook requests not delayed by listener restart I/O
- Lock contention significantly reduced

Verification:
- All 2,787 tests pass
- Zero clippy warnings
- Code compiles successfully

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

* fix: add graceful shutdown mechanism for SIGHUP handler background task

Prevents unbounded loop without cancellation token. The SIGHUP handler now
listens for a shutdown signal and exits cleanly during graceful termination.

Changes:
- Create broadcast channel for shutdown signaling
- SIGHUP handler uses tokio::select! to wait for shutdown or SIGHUP
- Send shutdown signal to all background tasks after agent.run() completes
- Ensures clean task lifecycle and no orphaned background tasks

Benefits:
- Proper task cancellation during graceful shutdown
- Follows Tokio best practices for background task management
- No background tasks orphaned when runtime shuts down

Verification:
- All 2,787 tests pass
- Zero clippy warnings
- Code compiles successfully

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

* refactor: replace stringly-typed parameter filtering with typed enum and single helper

Fixes DRY violation where unsupported parameter filtering was duplicated across
rig_adapter.rs and anthropic_oauth.rs using string contains checks.

Changes:
- Add UnsupportedParam typed enum in provider.rs (Temperature, MaxTokens, StopSequences)
- Create strip_unsupported_completion_params() helper function
- Create strip_unsupported_tool_params() helper function
- Update rig_adapter.rs to use shared helpers
- Update anthropic_oauth.rs to use shared helpers
- Replace 60+ lines of duplicate stringly-typed logic
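
A rough sketch of the typed-enum approach listed above. The variant names and the helper name are taken from the commit message; the `CompletionRequest` fields and the helper's signature are assumptions made for illustration.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum UnsupportedParam {
    Temperature,
    MaxTokens,
    StopSequences,
}

/// Hypothetical request shape; the real struct has more fields.
#[derive(Debug, Default, PartialEq)]
struct CompletionRequest {
    temperature: Option<f32>,
    max_tokens: Option<u32>,
    stop_sequences: Option<Vec<String>>,
}

/// Clear every field the provider declares as unsupported.
fn strip_unsupported_completion_params(
    req: &mut CompletionRequest,
    unsupported: &[UnsupportedParam],
) {
    for param in unsupported {
        // The match is exhaustive, so adding a variant forces an update here.
        match param {
            UnsupportedParam::Temperature => req.temperature = None,
            UnsupportedParam::MaxTokens => req.max_tokens = None,
            UnsupportedParam::StopSequences => req.stop_sequences = None,
        }
    }
}
```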

Benefits:
- Type safety: parameter names checked at compile time
- Single source of truth: adding a new param updates one place
- Reduced maintenance burden: no duplicate logic to keep in sync
- Better code clarity: named enum variant is self-documenting

Verification:
- All 2,787 tests pass
- Zero clippy warnings
- Code compiles successfully

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

* docs: clarify intentional parameter asymmetry between completion and tool requests

Add documentation explaining why strip_unsupported_tool_params does not handle
StopSequences: the field doesn't exist in ToolCompletionRequest.

Changes:
- Add clarifying comments to strip_unsupported_tool_params()
- Explain why StopSequences is only in CompletionRequest
- Note that ToolCompletionRequest only supports Temperature and MaxTokens
- Inline comment confirms no action needed for StopSequences

This addresses the appearance of incomplete implementation without changing logic,
as the asymmetry is intentional and correct (ToolCompletionRequest lacks the field).

Verification:
- All 2,787 tests pass
- Zero clippy warnings
- Code compiles successfully

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

* perf: isolate webhook_secret to reduce lock contention on hot path

Move webhook_secret from shared HttpChannelState RwLock into its own Arc<RwLock<>>.
This eliminates contention between secret validation and other state operations.

Changes:
- Change webhook_secret field type from RwLock<Option<SecretString>> to Arc<RwLock<Option<SecretString>>>
- Update initialization in HttpChannel::new()
- Update comments to explain isolation rationale

Benefits:
- Reduce lock contention on webhook request hot path (secret validation)
- Rarely-changing field (SIGHUP only) isolated from frequent state accesses
- Other state operations (tx, pending_responses) no longer wait behind secret reads
- Minimal code change: only field declaration and initialization

The Arc wrapper allows cloning the RwLock handle to separate concerns. With this
change, each webhook request takes only the dedicated secret lock for validation,
not the shared HttpChannelState lock. This scales better under high request volume.

Verification:
- All 2,787 tests pass
- Zero clippy warnings
- Code compiles successfully

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

* fix: prevent partial state corruption on SIGHUP restart failure

Ensure atomicity of configuration reload: if webhook listener restart fails,
secret update is skipped to prevent inconsistent state.

Changes:
- Wait for restart_with_addr() to complete (don't spawn background task)
- Track restart result with restart_failed flag
- Only update secret if restart succeeded or wasn't needed
- Ensure listener and secret stay synchronized

Problem addressed:
- Before: restart spawned as background task, secret updated immediately
- If restart failed, secret was changed but listener still on old address
- This left system in inconsistent state (partial corruption)

Solution:
- Make restart blocking (SIGHUP handler can wait, it's not on request hot path)
- Atomically update secret only after successful restart
- Flag prevents race between restart and secret update

Benefits:
- Configuration changes are atomic (both succeed or both fail together)
- No partial state corruption on restart failure
- Failed restarts don't silently leave inconsistent state
- Secret and listener address stay in sync
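
The ordering described above (restart first, update the secret only on success) can be sketched like this. `ReloadState` and `apply_reload` are hypothetical names, and the restart step is passed in as a closure so the sketch stays synchronous; the real handler awaits `restart_with_addr()`.

```rust
/// Hypothetical snapshot of the reloadable state.
struct ReloadState {
    listen_addr: String,
    secret: Option<String>,
}

/// Apply a config reload so that the address and secret change together.
/// If the restart fails, the function returns early and the secret is untouched.
fn apply_reload(
    state: &mut ReloadState,
    new_addr: &str,
    new_secret: Option<String>,
    restart: impl FnOnce(&str) -> Result<(), String>,
) -> Result<(), String> {
    if state.listen_addr != new_addr {
        restart(new_addr)?; // early return keeps the old addr and the old secret
        state.listen_addr = new_addr.to_string();
    }
    // Reached only if the restart succeeded or was not needed.
    state.secret = new_secret;
    Ok(())
}
```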

Verification:
- All 2,787 tests pass
- Zero clippy warnings
- Code compiles successfully

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

* refactor: generalize hot-secret-swapping with ChannelSecretUpdater trait

Decouple SIGHUP handler from HTTP channel internals by introducing a trait
for channels that support zero-downtime secret updates.

Changes:
- Add ChannelSecretUpdater trait in channels/channel.rs
- Implement ChannelSecretUpdater for HttpChannelState
- Export trait from channels module
- Update SIGHUP handler to use trait-based secret updater collection
- Replace explicit HTTP channel knowledge with generic updater loop

Benefits:
- SIGHUP handler no longer depends on HttpChannelState details
- Tight coupling removed: main.rs doesn't need HTTP channel imports
- Extensible: new channels can opt-in by implementing the trait
- Scalable: multiple channels supported without main.rs changes
- Maintainable: adding channels requires only trait implementation, not SIGHUP handler edits

Pattern:
- ChannelSecretUpdater trait defines the interface for all updaters
- Channels that support hot-secret-swapping implement the trait
- SIGHUP handler loops through all registered updaters generically
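
A simplified sketch of the pattern. The trait and struct names come from the commit message, but the method name `update_secret`, its synchronous signature, the plain `String` secret, and the `reload_secrets` loop are assumptions; the real trait is likely async and uses `SecretString`.

```rust
use std::sync::{Arc, RwLock};

/// Interface for channels that support swapping their secret at runtime.
trait ChannelSecretUpdater: Send + Sync {
    fn update_secret(&self, new_secret: Option<String>);
}

/// Hypothetical, trimmed-down HTTP channel state.
struct HttpChannelState {
    webhook_secret: Arc<RwLock<Option<String>>>,
}

impl ChannelSecretUpdater for HttpChannelState {
    fn update_secret(&self, new_secret: Option<String>) {
        *self.webhook_secret.write().unwrap() = new_secret;
    }
}

/// What the SIGHUP handler does conceptually: loop over every registered
/// updater without knowing which channel type is behind it.
fn reload_secrets(updaters: &[Arc<dyn ChannelSecretUpdater>], new_secret: Option<String>) {
    for updater in updaters {
        updater.update_secret(new_secret.clone());
    }
}
```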

Verification:
- All 2,787 tests pass
- Zero clippy warnings
- Code compiles successfully

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

* feat: validate parameter names at deserialization time, not just tests

Add custom serde deserializer for unsupported_params that validates parameter
names at runtime when loading providers.json (or user overrides).

Changes:
- Add unsupported_params_de module with custom deserializer
- Only allows: "temperature", "max_tokens", "stop_sequences"
- Invalid parameter names cause immediate deserialization error
- Update ProviderDefinition to use custom deserializer
- Enhanced test with explicit parameter name validation
- Add new test that verifies invalid parameters are rejected

Problem solved:
- Before: Invalid param names (e.g., "temperrature") silently ignored
- Now: Rejected at deserialization time with clear error message
- Prevents runtime failures caused by typos in configuration

Example error:
  unsupported parameter name 'temperrature': must be one of: temperature, max_tokens, stop_sequences
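
The validation can be illustrated without serde by parsing names through `FromStr`. This is a sketch under that assumption: the real change uses a custom serde deserializer, and `parse_unsupported_params` is a hypothetical helper.

```rust
use std::str::FromStr;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum UnsupportedParam {
    Temperature,
    MaxTokens,
    StopSequences,
}

impl FromStr for UnsupportedParam {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "temperature" => Ok(Self::Temperature),
            "max_tokens" => Ok(Self::MaxTokens),
            "stop_sequences" => Ok(Self::StopSequences),
            other => Err(format!(
                "unsupported parameter name '{other}': must be one of: temperature, max_tokens, stop_sequences"
            )),
        }
    }
}

/// Parse a list of names, failing on the first unknown one.
fn parse_unsupported_params(names: &[&str]) -> Result<Vec<UnsupportedParam>, String> {
    names.iter().map(|name| name.parse()).collect()
}
```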

Benefits:
- Fail-fast: errors caught when loading config, not at runtime
- Clear feedback: error message lists valid parameter names
- Type safety: validators run during deserialization
- Configuration errors detected immediately, not silently ignored

Verification:
- All 2,788 tests pass (including new validation test)
- Zero clippy warnings
- Code compiles successfully

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Haiku 4.5 <noreply@anthropic.com>

* merge: resolve conflicts for PR #800 and #822 into staging (#881)

* fix(ci): secrets can't be used in step if conditions [skip-regression-check] (#787)

GitHub Actions step-level `if:` doesn't have access to `secrets` context.
Replace `if: secrets.X != ''` with `continue-on-error: true` and let
the Set token step handle the fallback.

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(ci): clean up staging pipeline — remove hacks, skip redundant checks [skip-regression-check] (#794)

- Remove continue-on-error from staging-ci.yml app token steps (secrets are configured)
- Skip test.yml and code_style.yml on PRs targeting staging (staging-ci.yml
  already runs tests before promoting, promotion PR gets full CI on main)
- Allow ironclaw-ci[bot] in Claude Code review for bot-created promotion PRs

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* fix(ci): run fmt + clippy on staging PRs, skip Windows clippy [skip-regression-check] (#802)

- Remove branches:[main] filter from code_style.yml so it runs on all PRs
- Gate clippy-windows with `if: github.base_ref == 'main'` (skip on staging PRs)
- Update rollup job to allow skipped clippy-windows
- Simplify claude-review.yml to only trigger on labeled event (avoids duplicate runs)

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* feat: persist user_id in save_job and expose job_id on routine runs (#709)

* feat: persist worker events to DB and fix activity tab rendering

In-process Worker (used by Scheduler::dispatch_job) now persists events
via save_job_event at key execution points: plan creation, LLM
responses, tool_use, tool_result, and job completion/failure/stuck.
Event data shapes match the container worker format so the gateway
activity tab renders them correctly.

Frontend: tool_result errors now show a red X icon with danger styling
instead of a silent empty output. The result event falls back to the
error field when message is absent.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: wire RoutineEngine into gateway for direct manual trigger firing

Replace the message-channel hack in routines_trigger_handler with a
direct call to RoutineEngine::fire_manual(), ensuring FullJob routines
dispatch correctly when triggered from the web UI. Inject the engine
into GatewayState from Agent::run after construction.

Also persists user_id in save_job for both PG and libSQL backends,
removes the source='sandbox' filter so all jobs are visible, and
exposes job_id on RoutineRunInfo for the frontend job link.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: remove stale gateway_state argument from Agent::new test call sites

The gateway_state parameter was removed from Agent::new during rebase
(replaced by post-construction set_routine_engine_slot), but three test
call sites still passed the extra None argument.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address PR review — restore sandbox source filter, remove blank lines

- Revert removal of `source = 'sandbox'` filter in all SandboxStore
  queries (8 sites across PG and libSQL). Sandbox-specific APIs should
  stay scoped to sandbox jobs; unified job listing for the Jobs tab
  should use a separate query path.
- Remove extra blank lines in agent_loop.rs and worker.rs that caused
  formatting CI failure.

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address review — regenerate Cargo.lock, add user_id regression test

- Regenerate Cargo.lock from main's lockfile to eliminate dependency
  version downgrades (anyhow, syn, etc.) that were churn from rebase.
- Add regression test verifying user_id round-trips through save_job
  and get_job in the libSQL backend.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* style: remove trailing blank line in libsql jobs.rs

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* test: add Postgres-side regression test for user_id persistence in save_job

Mirrors the existing libSQL test (test_save_job_persists_user_id) for the
Postgres backend. Gated behind #[cfg(feature = "postgres")] + #[ignore]
since it requires a running PostgreSQL instance (integration tier).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* refactor: unify three agentic loops into single AgenticLoop engine (#654)

Replace three independent copy-pasted agentic loops (dispatcher, worker,
container runtime) with a single shared engine in `agentic_loop.rs` that
all consumers customize via the `LoopDelegate` trait.

Phase 1 — Shared engine (`src/agent/agentic_loop.rs`, 205 lines):
  - `run_agentic_loop()` owns the core LLM → tool exec → repeat cycle
  - `LoopDelegate` trait (Send + Sync, &dyn dispatch) with 6 hook points
  - Tool intent nudge logic consolidated (was duplicated in 3 files)
  - Iteration limit + force-text behavior preserved

Phase 2 — Three delegate implementations:
  - `ChatDelegate` (dispatcher.rs): 3-phase approval flow, hooks, cost
    guard, context compaction, skill attenuation, interruption
  - `JobDelegate` (worker/job.rs): planning pre-loop phase, parallel
    JoinSet exec, mark_completed/stuck/failed, SSE streaming, self-repair
  - `ContainerDelegate` (worker/container.rs): sequential tool exec,
    HTTP-proxied LLM, container-safe tools, credential injection

Phase 3 — File moves and cleanup:
  - Delete `src/agent/worker.rs` — job logic moved to `src/worker/job.rs`
  - Rename `src/worker/runtime.rs` → `src/worker/container.rs`
  - Re-export `Worker`/`WorkerDeps` from `crate::worker` in `agent/mod.rs`
  - Update `scheduler.rs` imports to new worker location

Shared helpers (`src/tools/execute.rs`):
  - `execute_tool_with_safety()` replaces 4 copies of validate → timeout
    → execute → serialize
  - `process_tool_result()` replaces 3 copies of sanitize → wrap →
    ChatMessage (also used by thread_ops.rs approval resume paths)

Net result: -2,408 lines, zero duplicated loop logic, single code path
for tool intent nudge and completion detection.

Closes #654

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address review feedback from Copilot

1. scheduler.rs: Replace `unwrap_or` fallback with proper error
   propagation when parsing tool output JSON — surfaces bugs instead
   of silently changing the output type.

2. worker/job.rs: Drop MutexGuard before the cancellation `.await` in
   `check_signals()` to avoid holding a lock across an async I/O call
   (prevents `await_holding_lock` lint).

3. worker/job.rs: Restore consecutive rate-limit counter
   (MAX_CONSECUTIVE_RATE_LIMITS = 10) so sustained rate limiting marks
   the job stuck with "Persistent rate limiting" instead of silently
   burning through max_iterations.
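
The consecutive rate-limit counter from item 3 might look roughly like this. The constant name and value come from the text above; `RateLimitTracker` and its `record` method are hypothetical.

```rust
const MAX_CONSECUTIVE_RATE_LIMITS: u32 = 10;

/// Hypothetical tracker for back-to-back rate-limited LLM calls.
#[derive(Default)]
struct RateLimitTracker {
    consecutive: u32,
}

impl RateLimitTracker {
    /// Record one LLM call outcome. Returns `true` once the job should be
    /// given up because of persistent rate limiting.
    fn record(&mut self, rate_limited: bool) -> bool {
        if rate_limited {
            self.consecutive += 1;
        } else {
            // Any successful call resets the streak.
            self.consecutive = 0;
        }
        self.consecutive >= MAX_CONSECUTIVE_RATE_LIMITS
    }
}
```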

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: incorporate staging changes — token budget tracking + mark_failed

Merge staging's changes into the refactored JobDelegate:
- Add token budget tracking in call_llm (update_context/add_tokens)
- mark_stuck → mark_failed for iteration cap and rate-limit exhaustion
  (aligns with staging's #788 fix)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address zmanian's PR review — eliminate type erasure, clean up

Address all 6 review points from zmanian on PR #800:

1. Replace LoopOutcome::Custom(Box<dyn Any>) with typed
   LoopOutcome::NeedApproval(Box<PendingApproval>) — eliminates
   type erasure and downcast, resolves clippy large_enum_variant.

2. Remove dead max_tool_iterations field from ChatDelegate struct.

3. Add on_tool_intent_nudge() hook to LoopDelegate trait with
   implementations in Job and Container delegates for observability.

4. Fix SSE events in job worker to emit raw sanitized content
   instead of XML-wrapped <tool_output> tags.

5. Remove 4 duplicate completion tests from job.rs that were
   already covered by the shared util module.

6. Avoid logging full tool results — use result_size_bytes in
   debug logs (execute.rs, job.rs).

Also updates path references in CLAUDE.md, COVERAGE_PLAN.md,
and add-sse-event.md command.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat(doctor): expand diagnostics from 7 to 16 health checks

* test: add unit tests for agentic_loop and execute shared modules

Add 16 tests covering the two new critical shared modules:

agentic_loop.rs (10 tests):
- Text response exits loop immediately
- Tool call → text response continuation
- LoopSignal::Stop exits before LLM call
- LoopSignal::InjectMessage adds user message to context
- Max iterations terminates with LoopOutcome::MaxIterations
- Tool intent nudge fires twice then caps
- before_llm_call early exit bypasses LLM
- truncate_for_preview: short string, long string, multibyte safety

execute.rs (6 tests):
- execute_tool_with_safety success path
- Missing tool returns ToolError::NotFound
- Tool execution failure propagates
- Per-tool timeout enforcement (50ms)
- process_tool_result XML wrapping on success
- process_tool_result error formatting
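
For context on the "multibyte safety" case of `truncate_for_preview` listed above, here is one way such a helper could be written. The signature and the byte-based limit are assumptions, not the function in `agentic_loop.rs`.

```rust
/// Hypothetical sketch: cut `s` to at most `max_bytes` bytes without
/// splitting a multi-byte UTF-8 character.
fn truncate_for_preview(s: &str, max_bytes: usize) -> &str {
    if s.len() <= max_bytes {
        return s;
    }
    let mut end = max_bytes;
    // Back up to the nearest char boundary; index 0 is always a boundary,
    // so the loop terminates.
    while !s.is_char_boundary(end) {
        end -= 1;
    }
    &s[..end]
}
```

Slicing at an arbitrary byte index would panic in the middle of a character such as `é`, which is why the boundary check matters.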

All 2,777 unit tests pass, 0 clippy warnings.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* style: cargo fmt

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address code review — 9 issues across agentic loop, job worker, container

CRITICAL fixes:
- Rate-limit exhaustion now returns Err(LlmError::RateLimited) instead of
  Ok(Text("")), stopping the loop immediately with no ghost iteration.
  Below-threshold retries still use Text("") with an explicit empty-string
  guard in handle_text_response to skip injection.
- check_signals drains the entire message channel before returning,
  prioritizing Stop over UserMessage. Previously returned early on first
  UserMessage, silently dropping any queued Stop or additional messages.
- check_signals now detects all non-progressing job states (Cancelled,
  Failed, Stuck, Completed, Submitted, Accepted) instead of only
  Cancelled and Failed.
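
The drain-then-prioritize behavior of `check_signals` can be sketched with a standard-library channel. `WorkerMessage`, the `Continue` variant, and joining queued user messages with newlines are assumptions for illustration; the real code uses async channels and also inspects job state.

```rust
use std::sync::mpsc::Receiver;

/// Hypothetical messages a worker can receive.
#[derive(Debug, PartialEq)]
enum WorkerMessage {
    Stop,
    UserMessage(String),
}

#[derive(Debug, PartialEq)]
enum LoopSignal {
    Continue,
    Stop,
    InjectMessage(String),
}

/// Drain everything currently queued, then decide. Stop wins over any
/// user messages, so a queued Stop is never lost behind an earlier message.
fn check_signals(rx: &Receiver<WorkerMessage>) -> LoopSignal {
    let mut stop = false;
    let mut user_messages = Vec::new();
    while let Ok(msg) = rx.try_recv() {
        match msg {
            WorkerMessage::Stop => stop = true,
            WorkerMessage::UserMessage(text) => user_messages.push(text),
        }
    }
    if stop {
        LoopSignal::Stop
    } else if user_messages.is_empty() {
        LoopSignal::Continue
    } else {
        LoopSignal::InjectMessage(user_messages.join("\n"))
    }
}
```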

HIGH fixes:
- Error path in process_tool_result_job applies truncate_for_preview to
  bound error strings in SSE/DB events (was unbounded).
- Document Send+Sync lifetime constraint on LoopDelegate trait.
- Test mock before_llm_call refactored from double-lock to single lock
  acquisition, eliminating deadlock risk on refactor.

MEDIUM fixes:
- CompletionReport includes actual iteration count via shared
  Arc<Mutex<u32>> tracker (was hardcoded 0).
- process_tool_result_job return type changed from Result<bool> to
  Result<()> — the bool was always false (dead API).
- Deduplicate truncate in container.rs; now uses truncate_for_preview
  from agentic_loop.

Verified: 0 clippy warnings, 2781 tests pass, cargo fmt clean.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Henry Park <henrypark133@gmail.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Illia Polosukhin <ilblackdragon@gmail.com>
Co-authored-by: Umesh Kumar Singh <brijbiharisingh1971@outlook.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>

* Revert "Feat/docker shell edition" + fix fmt/clippy (#886)

* Revert "Feat/docker shell edition (#804)"

This reverts commit 1fc2b85fa70d8421a9395e69d491d0e8858046b8.

* style: fix formatting issues from revert

Run cargo fmt to fix formatting across 7 files after the revert of
the docker shell edition feature.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* refactor: centralize test credential constants into testing::credentials (#829)

* refactor: central…

Labels

- contributor: regular (2-5 merged PRs)
- risk: medium (Business logic, config, or moderate-risk modules)
- scope: channel/web (Web gateway channel)
- scope: ci (CI/CD workflows)
- scope: llm (LLM integration)
- scope: setup (Onboarding / setup)
- scope: tool/wasm (WASM tool sandbox)
- size: M (50-199 changed lines)


2 participants