
chore: promote staging to staging-promote/13774cc0-24076446124 (2026-04-07 11:19 UTC)#2108

Merged
henrypark133 merged 40 commits into staging-promote/13774cc0-24076446124 from staging-promote/f765958f-24078644272
Apr 10, 2026

Conversation

@ironclaw-ci ironclaw-ci bot commented Apr 7, 2026

Auto-promotion from staging CI

Batch range: a55aff980a4e235590c3af57ded2542512e2f9f6..f765958fc3edb44096061039458f9414681fdfa7
Promotion branch: staging-promote/f765958f-24078644272
Base: staging-promote/13774cc0-24076446124
Triggered by: Staging CI batch at 2026-04-07 11:19 UTC

Commits in this batch (42):

Current commits in this promotion (1)

Current base: staging-promote/13774cc0-24076446124
Current head: staging-promote/f765958f-24078644272
Current range: origin/staging-promote/13774cc0-24076446124..origin/staging-promote/f765958f-24078644272

Auto-updated by staging promotion metadata workflow

Waiting for gates:

  • Tests: pending
  • E2E: pending
  • Claude Code review: pending (will post comments on this PR)

Auto-created by staging-ci workflow

* fix(tools): gate claude_code and acp modes behind enabled flags (#1987)

CreateJobTool always exposed claude_code and acp as valid modes in its
schema and silently accepted them at runtime, even when
CLAUDE_CODE_ENABLED=false / ACP_ENABLED=false. This caused the LLM to
sometimes select disabled modes, spawning containers that fail.

- Add claude_code_enabled and acp_enabled flags to ContainerJobConfig
  (following the existing mcp_per_job_enabled pattern)
- Expose via ContainerJobManager accessors, query from CreateJobTool
  through the already-injected job_manager
- Dynamically build the mode enum in parameters_schema() — only show
  enabled modes; omit mode field entirely when only worker is available
- Conditionally include agent_name field only when ACP is enabled
- Add defense-in-depth guards in execute() rejecting disabled modes
  with ToolError::InvalidParameters
- Add 10 regression tests covering schema gating and runtime rejection
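
The gating described above can be sketched as follows. This is a minimal illustration, not the project's code: `ModeFlags`, `enabled_modes`, `schema_mode_enum`, and `check_mode` are hypothetical names, and the error is a plain `String` rather than `ToolError::InvalidParameters`. Only the mode names and the two flags come from the commit message.

```rust
/// Hypothetical stand-in for the two flags added to ContainerJobConfig.
#[derive(Debug, Clone, Copy)]
pub struct ModeFlags {
    pub claude_code_enabled: bool,
    pub acp_enabled: bool,
}

impl ModeFlags {
    /// Modes that may be offered; `worker` is assumed to be always available.
    pub fn enabled_modes(&self) -> Vec<&'static str> {
        let mut modes = vec!["worker"];
        if self.claude_code_enabled {
            modes.push("claude_code");
        }
        if self.acp_enabled {
            modes.push("acp");
        }
        modes
    }

    /// `None` means the `mode` field is left out of the schema entirely,
    /// which is the case when only `worker` is available.
    pub fn schema_mode_enum(&self) -> Option<Vec<&'static str>> {
        let modes = self.enabled_modes();
        if modes.len() > 1 {
            Some(modes)
        } else {
            None
        }
    }

    /// Runtime guard: reject a requested mode that is disabled or unknown,
    /// even if the schema never advertised it.
    pub fn check_mode(&self, mode: &str) -> Result<(), String> {
        if self.enabled_modes().iter().any(|m| *m == mode) {
            Ok(())
        } else {
            Err(format!("mode '{mode}' is not enabled"))
        }
    }
}
```

Deriving both the schema and the runtime check from one list keeps the two from drifting apart.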

* fix(tools,web): address review feedback on mode gating (#2003)

- Remove hardcoded "Set mode to claude_code" from CreateJobTool description;
  mode guidance is already provided dynamically via parameters_schema()
- Add check_mode_enabled() guard in jobs_restart_handler to reject disabled
  modes on job restart via REST API, closing the bypass path
- Add ContainerJobManager::is_mode_enabled(mode) to centralize mode validation
- Clean up fully-qualified paths in jobs.rs with proper use imports
- 5 regression tests (description, restart rejection, is_mode_enabled)

* fix: harden mode gating with defense-in-depth and synchronous persistence

- Add ModeDisabled variant to OrchestratorError and guard inside
  ContainerJobManager::create_job() so disabled modes are rejected
  even if callers forget to validate
- Make job mode persistence synchronous instead of fire-and-forget
  to prevent silent mode loss on transient DB errors (restarts would
  silently downgrade to worker mode)
- Refactor parameters_schema() to build serde_json::Map directly,
  removing an unreachable if-let guard and an .expect() call

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Rajul Bhatnagar <brajul@amazon.com>
Co-authored-by: serrrfirat <f@nuff.tech>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions github-actions bot added labels on Apr 7, 2026: scope: channel/web (Web gateway channel), scope: tool/builtin (Built-in tools), scope: orchestrator (Container orchestrator), size: L (200-499 changed lines), risk: medium (Business logic, config, or moderate-risk modules), contributor: core (20+ merged PRs)
serrrfirat and others added 22 commits April 7, 2026 19:11
* fix(web): emit Done status after response to fix SSE ordering (#2079)

Move the terminal "Done" status out of thread_ops and emit it only
after the gateway successfully responds via a new respond_then_done()
helper in agent_loop. This guarantees the browser receives the
assistant message before the turn-closing event, preventing the web UI
from appearing stuck.

Adds a regression test asserting the response event is captured before
the Done status in the ordered event log.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(web): add frontend safety net for lost SSE response events (#2079)

Track whether a `response` SSE event was received for the current turn.
When "Done" arrives without a preceding response, schedule a
loadHistory() call after 1500ms so the user sees the answer even if
the response event was lost to broadcast lag or a brief disconnect.

This is the second prong of the fix described in #2079 — the backend
ordering fix alone prevents the race, but this fallback handles
residual edge cases (proxy buffering, SSE reconnection gaps).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: address review findings — Done on all paths, fix frontend timer leaks

Backend:
- Send Done status when BeforeOutbound hook blocks the response, so the
  client still knows the turn is complete.
- Send Done status for empty/suppressed responses (e.g. approval handled
  via send_status) to match pre-refactor behavior.

Frontend:
- Set _turnResponseReceived on stream_chunk events so streaming
  responses don't trigger a spurious loadHistory() when Done arrives.
- Clear _doneWithoutResponseTimer on sendMessage() to prevent stale
  timers from a previous turn firing during the new one.
- Clear turn-tracking state on switchThread() to prevent cross-thread
  contamination of the timer and flag.
- Clear turn-tracking state on SSE reconnect (eventSource.onopen) to
  prevent stale timers from before the disconnect.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: address PR review — always emit Done, extract helper, add test

- respond_then_done now emits Done regardless of respond outcome so the
  client always knows the turn ended, even on delivery failure
- Extract send_done() helper to deduplicate the inline Done+warn blocks
  in the hook-blocked and empty-response paths
- Add done_emitted_for_empty_response test covering the empty-response
  branch ordering invariant
- Lift 1500ms magic number to DONE_WITHOUT_RESPONSE_TIMEOUT_MS constant
- Add comment explaining _turnResponseReceived single-thread tracking

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(agent): suppress Done while awaiting approval; introduce HandleOutcome

Distinguish "no response, turn complete" from "no response, turn paused"
in handle_message's return type so the run loop can decide whether to
emit the terminal Done status. The previous code lumped both into
Ok(Some("")), causing v1 NeedApproval to incorrectly emit Done after
ApprovalNeeded — which then tripped the new web UI safety net and
triggered a spurious loadHistory() under the live approval prompt.

- New HandleOutcome enum with Shutdown / Respond / NoResponse / Pending
- SubmissionResult::NeedApproval now maps to HandleOutcome::Pending
- Bridge handlers wrapped via HandleOutcome::from_legacy (their approval
  flows return non-empty descriptive text, so they never need Pending)
- Regression test no_done_emitted_while_awaiting_approval drives a
  v1 Always-approval probe and asserts no Done is captured
- Repaired pre-existing done_emitted_for_empty_response test, which
  asserted the wrong invariant: the dispatcher substitutes empty LLM
  responses with a fallback message, so a truly empty response never
  reaches the run loop. Renamed and updated to assert the ordering.
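
A rough sketch of the outcome type follows. The four variant names come from the commit message; everything else is assumed for illustration: the payload on `Respond`, the `Option<String>` shape taken by `from_legacy`, the treatment of `None` as shutdown, and the `emits_done` helper.

```rust
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum HandleOutcome {
    /// The agent is shutting down.
    Shutdown,
    /// A response should be delivered, followed by Done.
    Respond(String),
    /// No response, but the turn is complete.
    NoResponse,
    /// No response yet; the turn is paused (for example, awaiting approval).
    Pending,
}

impl HandleOutcome {
    /// Assumed legacy mapping: `None` = shutdown, empty text = no response.
    /// Legacy handlers never produce `Pending`.
    pub fn from_legacy(legacy: Option<String>) -> Self {
        match legacy {
            None => HandleOutcome::Shutdown,
            Some(text) if text.is_empty() => HandleOutcome::NoResponse,
            Some(text) => HandleOutcome::Respond(text),
        }
    }

    /// Whether the run loop should emit the terminal Done status.
    /// Treating `Shutdown` as "no Done" is an assumption of this sketch.
    pub fn emits_done(&self) -> bool {
        matches!(self, HandleOutcome::Respond(_) | HandleOutcome::NoResponse)
    }
}
```

Separating `NoResponse` from `Pending` is what lets the run loop skip Done while an approval prompt is still live.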

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: ilblackdragon@gmail.com <ilblackdragon@gmail.com>
* feat(slack): implement on_broadcast and fix message tool channel hints

Implement the on_broadcast callback for the Slack WASM channel, enabling
proactive message delivery to Slack channels/users via the message tool.
Previously this was a stub returning "not implemented".

The implementation:
- Uses the user_id parameter as the broadcast target (channel ID or user ID)
- Strips leading # from targets for convenience
- Warns when target doesn't look like a Slack ID (C/U/D/G prefix)
- Posts via chat.postMessage with host-injected Bearer token
- Tracks active threads for broadcast replies (consistent with on_respond)

Also fixes the message tool's channel parameter description to list 'slack'
alongside 'slack-relay' and clarifies that Slack targets must be IDs.

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(slack): extract shared post helper, harden broadcast validation

Address review findings on the slack broadcast implementation:

- Extract `post_slack_message()` shared by `on_respond` and `on_broadcast`,
  eliminating ~40 lines of duplicated HTTP-call-then-parse logic.
- Log `track_active_thread` errors at Warn level instead of silently
  swallowing them with `let _ =` (restores observability lost in original).
- Make non-ID broadcast targets a hard error instead of a soft warning —
  consistent with the message tool schema that says "must be an ID, not a
  name".
- Fix empty-target error message to not assume a name was provided.
- `resolve_broadcast_target` now returns `&str` (avoids allocation).
- Add 2 tests covering the resolve+validate pipeline end-to-end.
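
The resolve-then-validate pipeline might look roughly like the sketch below. The function names match the commit messages, but the bodies are assumptions: the uppercase-alphanumeric check on the rest of the ID, the error wording, and the `String` error type are illustrative. The `W` prefix is included because a later commit in this batch mentions it.

```rust
/// Returns true when `target` starts with a known Slack ID prefix and the
/// remainder is uppercase alphanumeric (the latter is an assumed rule).
pub fn looks_like_slack_id(target: &str) -> bool {
    let mut chars = target.chars();
    let has_known_prefix = matches!(chars.next(), Some('C' | 'U' | 'D' | 'G' | 'W'));
    let rest = chars.as_str();
    has_known_prefix
        && !rest.is_empty()
        && rest.chars().all(|c| c.is_ascii_uppercase() || c.is_ascii_digit())
}

/// Strips a leading `#`, then rejects empty and non-ID targets.
/// Returns a borrowed slice of the input, so no allocation is needed.
pub fn resolve_broadcast_target(raw: &str) -> Result<&str, String> {
    let trimmed = raw.trim();
    let target = trimmed.strip_prefix('#').unwrap_or(trimmed);
    if target.is_empty() {
        return Err("broadcast target is empty".to_string());
    }
    if !looks_like_slack_id(target) {
        return Err(format!(
            "'{target}' is not a Slack ID; targets must be IDs, not names"
        ));
    }
    Ok(target)
}
```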

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(slack): track broadcast message ts so replies are recognized

Address Gemini review: broadcast messages now track the Slack-returned
timestamp as an active thread (falling back from response.thread_id to
the posted message ts). This ensures that if a user replies to a
broadcast, the agent recognizes the reply as an active thread.

Also fix stale doc comment on looks_like_slack_id (was missing W prefix).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(staging): repair 4 categories of CI test failures

1. Telegram token test flaky race: add env mutex guard so concurrent
   tests that override IRONCLAW_TEST_TELEGRAM_API_BASE_URL don't
   pollute the unguarded read in the colon-preservation test.

2. SSE/connection E2E tests: #sse-status element was removed from HTML
   and replaced with #sse-dot colored indicator. Update tests to check
   the dot's CSS class instead of text content. Add SSE-ready wait to
   the page fixture so chat tests don't race against connection setup.

3. Tool approval E2E tests: API unified legacy pending_approval and
   engine v2 gates into a single pending_gate response field. Update
   all E2E test helpers to use the correct field name.

4. WASM tar.gz extraction bug: canonicalized extension names use
   underscores (web_search) but release archives use hyphens
   (web-search.wasm). Accept both filename forms when extracting.
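
For item 4, accepting both filename forms could be sketched as below. `archive_wasm_filenames` and `is_wanted_entry` are hypothetical helper names; only the underscore and hyphen convention and the `.wasm` suffix come from the description above.

```rust
/// Filenames to accept inside a release archive for a canonical
/// (underscore-separated) extension name. Only the underscore-to-hyphen
/// direction is generated.
pub fn archive_wasm_filenames(canonical_name: &str) -> Vec<String> {
    let underscored = format!("{canonical_name}.wasm");
    let hyphenated = format!("{}.wasm", canonical_name.replace('_', "-"));
    if hyphenated == underscored {
        vec![underscored]
    } else {
        vec![underscored, hyphenated]
    }
}

/// True when an archive entry's filename matches either accepted form.
pub fn is_wanted_entry(canonical_name: &str, entry_filename: &str) -> bool {
    archive_wasm_filenames(canonical_name)
        .iter()
        .any(|candidate| candidate.as_str() == entry_filename)
}
```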

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: address review feedback — stronger SSE signal, consistent naming

- SSE wait: use sseHasConnectedBefore JS flag (set in onopen) instead
  of checking #sse-dot CSS class, which defaults to connected state
  before SSE actually connects
- Rename _wait_for_pending_approval → _wait_for_pending_gate and
  _wait_for_no_pending_approval → _wait_for_no_pending_gate
- Update all docstrings/error messages to say pending_gate
- Deduplicate name.replace('_', '-') in tar.gz extraction and include
  both accepted filenames in the error message

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: address second round of review feedback

- Add comment documenting invariant: canonical names use underscores,
  archives may use hyphens, reverse is not supported
- Fix quote wrapping in single-name error case
- Remove dead SEL["sse_status"] selector from helpers.py
- Simplify SSE wait: use window.sseHasConnectedBefore === true
  (fails fast on rename instead of silent 10s timeout)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: use global binding for sseHasConnectedBefore, not window property

sseHasConnectedBefore is declared with let at global scope, which
does not create a window property. window.sseHasConnectedBefore
would always be undefined. Use typeof guard + direct reference instead.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(security): cap tar.gz entry pre-allocation to MAX_ENTRY_SIZE

The tar header's declared size is attacker-controlled. Without capping,
Vec::with_capacity could attempt a huge allocation and OOM before the
read_to_end take() limit kicks in.
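
A minimal sketch of the capped pre-allocation, using only the standard library. The 16 MiB value and the `read_entry` function are assumptions for illustration; the actual `MAX_ENTRY_SIZE` value is not given in the commit message.

```rust
use std::io::{self, Read};

/// Assumed limit for this sketch only.
pub const MAX_ENTRY_SIZE: u64 = 16 * 1024 * 1024;

/// Reads one archive entry. `declared_size` comes from the untrusted tar
/// header, so it is capped before it is used as an allocation hint.
pub fn read_entry<R: Read>(reader: R, declared_size: u64) -> io::Result<Vec<u8>> {
    let capacity = declared_size.min(MAX_ENTRY_SIZE) as usize;
    let mut buf = Vec::with_capacity(capacity);
    // `take` bounds the bytes actually read, independent of the header.
    reader.take(MAX_ENTRY_SIZE).read_to_end(&mut buf)?;
    Ok(buf)
}
```

With the cap in place, a header that declares an enormous size costs at most one `MAX_ENTRY_SIZE` allocation.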

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* style: cargo fmt

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: make test_sse_status_shows_connected non-redundant, document flag reset

- test_sse_status_shows_connected now checks #sse-dot CSS class (visual
  indicator) instead of re-checking sseHasConnectedBefore which the
  page fixture already guarantees
- Add comment to test_sse_reconnect_after_disconnect explaining why
  sseHasConnectedBefore is reset and that the history-reload path is
  covered by test_sse_reconnect_preserves_chat_history

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Improve channel onboarding and Telegram pairing flow

* fix: remove dead restart_required code, fix review findings, and stabilize polling E2E test

- Remove restart_required/needs_restart dead code from 6 files (no real
  extension uses it; all channels hot-activate at runtime)
- Remove dead extensions.configuredRestart i18n key from all 3 locales
- Fix pairing test asserting wrong upsert semantics (test expected
  idempotent behavior but impl always rotates codes)
- Fix pairing test using expired code for approval (req.code -> req_again.code)
- Fix missing i18n fallback for auth.extensionTokenPlaceholder
- Validate setup_url scheme (https?://) before assigning to <a>.href
- Replace hardcoded English "Approve"/"Pairing code is required" with i18n keys
- Demote misleading "bot is open to all users" Telegram log from Warn to Debug
- Move polling E2E test to run first (polling loop dies during refresh_active_channel)
- Add poll_interval_ms config field to Telegram WASM channel
- Fix conversations.rs compilation (missing ? operator)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: address remaining review comments (i18n regressions)

- Remove hardcoded pairing_instructions() function from server.rs;
  use onboarding metadata from ExtensionManager instead (fixes i18n
  regression where pairing instructions were always English)
- Restore i18n calls for stepper labels in renderWasmChannelStepper
  (was using hardcoded English strings instead of missions.step* keys)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: restart polling loop on channel refresh, address Copilot review

- Add WasmChannel::ensure_polling() that stops any stale polling task
  and starts a fresh one from the on_start config
- Call ensure_polling() in refresh_active_channel after re-running
  on_start, fixing the root cause of the dead polling loop in E2E tests
- Move polling test back to its original position (no longer order-dependent)
- Fix requires_pairing in channel_onboarding_for_state to use
  channel_requires_pairing() instead of legacy owner_id-only check
- Add rel='noopener noreferrer' to all setup_url target=_blank links

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: collapse nested if-let to satisfy clippy collapsible_if

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: return promise from approvePairing, stop polling unconditionally in ensure_polling

- Add missing `return` before apiFetch in approvePairing() so callers
  can await/chain the result
- Move poll_shutdown_tx.take() before the enabled check in
  ensure_polling() so switching from polling to webhook stops the old
  polling task
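
The second fix is about ordering: take and signal the old shutdown sender before checking whether polling is enabled. A simplified sketch follows, assuming a `std::sync::mpsc` channel and a hypothetical `PollingState` struct. The real `WasmChannel` presumably uses async primitives, and this version returns the receiver instead of spawning a task.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};

/// Hypothetical holder for the shutdown handle of the current polling task.
#[derive(Default)]
pub struct PollingState {
    poll_shutdown_tx: Option<Sender<()>>,
}

impl PollingState {
    /// Stops any previous polling task, then starts a new one only when
    /// polling is enabled. Returns the receiver a new task would listen on.
    pub fn ensure_polling(&mut self, enabled: bool) -> Option<Receiver<()>> {
        // Take the old sender first, so the stale task is stopped even
        // when the channel has switched from polling to webhook mode.
        if let Some(old_tx) = self.poll_shutdown_tx.take() {
            let _ = old_tx.send(());
        }
        if !enabled {
            return None;
        }
        let (tx, rx) = channel();
        self.poll_shutdown_tx = Some(tx);
        Some(rx)
    }
}
```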

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: remove trailing commas in JSON test fixtures after restart_required removal

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(web): intercept approval text input ("yes"/"no"/"always") in chat

When a tool requires approval in the web UI, typing "yes", "no", or
"always" in the chat input now resolves the approval card directly
instead of sending a regular message. This prevents duplicate approval
prompts and "No pending approval" errors that occurred when text went
through the backend message pipeline.

The frontend intercepts approval keywords in sendMessage() and routes
them through sendApprovalAction() — the same code path as clicking
the Approve/Deny/Always buttons on the card.

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: find most recent unresolved approval card for text interception

Address review feedback: instead of checking the last card and then
separately checking if it's resolved, find the most recent unresolved
card directly. Handles the edge case where the last card is resolved
(during 1.5s removal animation) but an earlier one isn't.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test: add E2E test for skip-resolved-card behavior

Addresses review feedback: adds a test where two approval cards are
visible, the newer one is resolved via button click, then typing "yes"
correctly targets the older unresolved card instead of falling through.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…n visibility bug (#2126)

Missions created via the agent were invisible in the gateway UI for
non-owner users because list_engine_missions/list_engine_threads fell
back to the engine owner's default_project_id instead of resolving
the authenticated user's per-user project.

Broader change: replace three inconsistent ownership patterns
(can_act_on, engine is_owned_by, raw string comparisons) with a
single Owned trait providing uniform is_owned_by(user_id) checks
across jobs, routines, and all internal code paths.

- Fix project_id resolution in list_engine_missions/list_engine_threads
- Add Owned trait to src/ownership with impls on AgentJobRecord,
  SandboxJobRecord, Routine, and JobContext
- Migrate all can_act_on calls in web handlers (jobs.rs, routines.rs)
- Migrate raw user_id comparisons in tenant.rs, commands.rs,
  routine_engine.rs, context/manager.rs, tools/builtin/job.rs
- Add missing ownership check in routines_trigger_handler
- Migrate active routines_runs_handler and verify_project_ownership
  in server.rs
- Remove dead can_act_on function and ownership_identity helper
- Add regression tests for concrete Owned impls on real types
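
One way such a trait could look is sketched here. Only the trait name `Owned` and the `is_owned_by(user_id)` method come from the description; `owner_user_id`, the simplified `JobRecord` and `RoutineRecord` structs, and the `authorize` helper are hypothetical.

```rust
/// Uniform ownership check shared by all user-owned resources.
pub trait Owned {
    /// Hypothetical accessor for the owning user's ID.
    fn owner_user_id(&self) -> &str;

    /// Single comparison point, replacing ad-hoc string comparisons.
    fn is_owned_by(&self, user_id: &str) -> bool {
        self.owner_user_id() == user_id
    }
}

/// Simplified stand-ins for the real record types.
pub struct JobRecord {
    pub user_id: String,
}

pub struct RoutineRecord {
    pub user_id: String,
}

impl Owned for JobRecord {
    fn owner_user_id(&self) -> &str {
        &self.user_id
    }
}

impl Owned for RoutineRecord {
    fn owner_user_id(&self) -> &str {
        &self.user_id
    }
}

/// Example handler-side guard that works for any `Owned` resource.
pub fn authorize<T: Owned>(resource: &T, caller: &str) -> Result<(), String> {
    if resource.is_owned_by(caller) {
        Ok(())
    } else {
        Err("not found".to_string())
    }
}
```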

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…2129)

* fix(e2e): canonicalize extension names in configure/setup + update tests

1. Bug fix: `configure()` and `get_setup_schema()` in ExtensionManager
   called `validate_extension_name()` which discards the canonical form.
   Hyphenated names from URL paths (e.g., `web-search`) were used raw
   for capabilities file lookups, causing "Capabilities file not found"
   errors. Now both methods canonicalize with `canonicalize_extension_name()`
   so `web-search` → `web_search` before any file I/O.

2. Bug fix: `extensions_setup_handler` in server.rs compared raw URL
   path param against canonical stored names in the kind lookup.

3. Test fix: `test_wasm_lifecycle.py` — update all `web-search` refs
   to `web_search` (canonical form returned by registry/API).

4. Test fix: `test_mcp_auth_flow.py` — update `mock-mcp` / `mock-mcp-400`
   to `mock_mcp` / `mock_mcp_400`.

5. Test fix: `test_chat.py` — `test_send_message_and_receive_response`
   now counts assistant messages before sending to avoid picking up the
   pre-existing onboarding greeting as the LLM response.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: use centralized SEL selector in wait_for_function JS call

Address review feedback: pass the selector from SEL dict into the
JS function instead of hardcoding '#chat-messages .message.assistant'.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: use send_chat_and_wait_for_terminal_message helper for robustness

Replace manual count+wait_for_function with the existing helper that
correctly handles streaming chunks and waits for the terminal message.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…2130)

* fix(engine): repair mission ACL regression and 4 stale engine tests

The mission access control in pause_mission/resume_mission had a
logic error that allowed any user to manage shared/system missions.
The `&& !mission.owner_id().is_shared()` condition short-circuited
the entire check for shared missions. Replaced with proper two-branch
logic using is_shared_owner().
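
The short-circuit can be reproduced in a few lines. The types below are hypothetical, and the rule applied to shared missions (only callers flagged `can_manage_shared` may manage them) is an assumption; the commit message says only that two-branch logic with `is_shared_owner()` replaced the single condition.

```rust
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum MissionOwner {
    User(String),
    Shared,
}

/// Hypothetical caller description.
pub struct Caller<'a> {
    pub user_id: &'a str,
    pub can_manage_shared: bool,
}

/// The flawed shape: a deny check of the form `not owner && not shared`.
/// For a shared mission the second operand is false, so nobody is denied.
pub fn buggy_is_denied(owner: &MissionOwner, caller: &Caller<'_>) -> bool {
    let is_shared = matches!(owner, MissionOwner::Shared);
    let is_owner = match owner {
        MissionOwner::User(id) => id.as_str() == caller.user_id,
        MissionOwner::Shared => false,
    };
    !is_owner && !is_shared
}

/// Two-branch version: shared and user-owned missions are checked separately.
pub fn can_manage(owner: &MissionOwner, caller: &Caller<'_>) -> bool {
    match owner {
        MissionOwner::Shared => caller.can_manage_shared,
        MissionOwner::User(id) => id.as_str() == caller.user_id,
    }
}
```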

Also fixed 4 test assertions that checked thread.messages instead of
thread.internal_messages (orchestrator stores working messages in the
internal transcript), and made the trace test resilient to event
ordering by filtering by EventKind.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(engine): extend ACL fix to update_mission, error on missing missions

Address review feedback:
- Fix same shared-mission ACL bug in update_mission (line 130)
- pause_mission/resume_mission now error on missing missions instead
  of silently proceeding, matching update_mission's pattern
- Update doc comments to reflect that the engine enforces shared
  ownership checks directly

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(ownership): remove silent cross-tenant credential fallback in WASM wrappers (#2069, #2070)

WASM tool credential resolution silently fell back to looking up secrets
under the hardcoded "default" scope when the calling user had no credential
configured, leaking the instance owner's API keys to other users without
error or audit trail.

- Remove "default" fallback in resolve_host_credentials(); return
  Err(ToolError::NotAuthorized) with actionable message instead of
  silently skipping missing credentials
- Fix resolve_websocket_identify_message() to accept owner_scope_id
  parameter instead of hardcoding "default"
- Document legacy broadcast metadata fallback with removal tracking
- Document setup.rs boot-time owner_id lookups as intentional
  instance-level resource ownership
- Add regression tests proving cross-tenant credentials do not leak

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

* fix(review): differentiate expired vs missing credential errors, exclude UrlPath from store check

Address PR review feedback:
- Filter out UrlPath credentials in the no-store check so tools with
  only UrlPath mappings don't incorrectly get NotAuthorized
- Match SecretError::Expired separately to produce "has expired" message
  instead of misleading "not found"
- Add tests for both: UrlPath-only no-store (Ok), expired credential
  (specific error message)

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

* fix(review): map backend SecretErrors to ExecutionFailed, not NotAuthorized

Address @serrrfirat review: Database, DecryptionFailed, KeychainError,
and other backend errors were incorrectly mapped to "not found". Now:
- NotFound → ToolError::NotAuthorized ("not found, configure via secrets set")
- Expired → ToolError::NotAuthorized ("has expired, refresh or re-set")
- All others → ToolError::ExecutionFailed (preserves real cause)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(review): AccessDenied → NotAuthorized, fix issue refs, reduce visibility

- Map SecretError::AccessDenied to ToolError::NotAuthorized (not
  ExecutionFailed) since it's an authorization failure
- Update legacy fallback comments to reference #2100 (the tracking
  issue) instead of #2069
- Revert resolve_websocket_identify_message to private — test uses
  super:: import instead of pub(crate) path
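
Taken together, the review rounds above converge on an error mapping along these lines. The variant names appear in the commit messages; the payload types, the message wording, and the `map_secret_error` function itself are assumptions of this sketch.

```rust
/// Simplified subset of the secret-store errors named above.
#[derive(Debug)]
pub enum SecretError {
    NotFound,
    Expired,
    AccessDenied,
    Database(String),
    DecryptionFailed(String),
}

#[derive(Debug, PartialEq, Eq)]
pub enum ToolError {
    NotAuthorized(String),
    ExecutionFailed(String),
}

/// Authorization-type failures become `NotAuthorized` with an actionable
/// hint; backend failures keep their real cause as `ExecutionFailed`.
pub fn map_secret_error(credential: &str, err: SecretError) -> ToolError {
    match err {
        SecretError::NotFound => ToolError::NotAuthorized(format!(
            "credential '{credential}' not found, configure via secrets set"
        )),
        SecretError::Expired => ToolError::NotAuthorized(format!(
            "credential '{credential}' has expired, refresh or re-set"
        )),
        SecretError::AccessDenied => ToolError::NotAuthorized(format!(
            "access to credential '{credential}' was denied"
        )),
        other => ToolError::ExecutionFailed(format!(
            "failed to resolve credential '{credential}': {other:?}"
        )),
    }
}
```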

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(review): redact secrets in Debug impl, document all-or-nothing invariant

- Replace #[derive(Debug)] on ResolvedHostCredential with custom impl
  that redacts secret_value and auth headers to prevent latent leakage
- Add comment documenting that all declared non-UrlPath credentials are
  required — tool execution fails on first missing credential rather
  than running with partial auth
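
A redacting `Debug` implementation can be written with the standard formatter helpers. The struct below is a simplified, hypothetical stand-in for `ResolvedHostCredential`; its real fields are not listed in the commit message.

```rust
use std::fmt;

/// Simplified stand-in: a credential name plus the secret itself.
pub struct ResolvedCredential {
    pub name: String,
    pub secret_value: String,
}

impl fmt::Debug for ResolvedCredential {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Print a fixed placeholder instead of the secret, so `{:?}` in a
        // log line or panic message cannot leak it.
        f.debug_struct("ResolvedCredential")
            .field("name", &self.name)
            .field("secret_value", &"[REDACTED]")
            .finish()
    }
}
```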

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Haiku 4.5 <noreply@anthropic.com>
* chore(engine): rename ENGINE_V2_TRACE to IRONCLAW_RECORD_TRACE

Aligns the trace-recording env var with the project-wide IRONCLAW_*
naming convention so it's discoverable alongside other ironclaw flags.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* unify engine v2 trace recording with v1 RecordingLlm

Address PR review feedback. Instead of having two separate trace
systems both named IRONCLAW_RECORD_TRACE (the v1 RecordingLlm in
src/llm/recording.rs and the engine v2 executor/trace.rs JSON dumper),
collapse them to one.

Engine v2's LlmBackend is wired to the host's full LLM provider chain,
which already includes RecordingLlm when IRONCLAW_RECORD_TRACE=1.
That means engine v2 LLM interactions are already captured by the
unified trace_*.json fixture file -- no engine-side env var, no second
JSON output, no risk of one flag enabling two divergent recorders.

Removed:
- is_trace_enabled() and write_trace() from executor/trace.rs
- the engine_trace_*.json write site in runtime/manager.rs

Kept (still useful, runs unconditionally for the self-improvement
mission):
- build_trace() / analyze_trace() / log_trace_summary()

Docs updated to point to RecordingLlm as the single trace mechanism.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ner (#2042)

* test: add Slack E2E tests, Rust integration tests, and smoke runner

Replicate the Telegram test infrastructure for the Slack WASM channel:
- Add Slack URL rewriting in wrapper.rs for test API redirection
- Create fake_slack_api.py mock server for E2E tests
- Add 12 Python E2E tests covering setup, DM, mentions, auth, threads, files
- Add 12 Rust integration tests for WASM channel behavior
- Add conftest.py fixtures for isolated Slack test instances
- Add local smoke test runner for pre-release validation with real Slack

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: wrap env::set_var/remove_var in unsafe blocks for Rust 1.83+

CI uses Rust 1.94 which requires unsafe blocks for std::env::set_var
and std::env::remove_var. Wrap the test-only calls in unsafe blocks
with safety comments.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address PR review feedback

- Replace fragile time.time()-1 fallback with explicit SmokeError in
  run_smoke.py attachment case (reviewer finding #1)
- Add OnceLock<Mutex> guard around env var mutation in wrapper.rs unit
  test to prevent parallel test races (reviewer finding #2)
- Extract duplicated git-worktree discovery into find_project_file()
  helper in slack_auth_integration.rs (reviewer finding #3)
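
The env-var guard from reviewer finding #2 can be sketched as follows. `env_lock` and `with_env_var` are hypothetical helper names, and the variable name used in any call is up to the test. The `unsafe` blocks are required in the Rust 2024 edition and are merely redundant in earlier editions, hence the `allow`.

```rust
use std::sync::{Mutex, MutexGuard, OnceLock};

/// Process-wide lock serializing tests that mutate environment variables.
fn env_lock() -> MutexGuard<'static, ()> {
    static ENV_LOCK: OnceLock<Mutex<()>> = OnceLock::new();
    ENV_LOCK
        .get_or_init(|| Mutex::new(()))
        .lock()
        // A panic in another test poisons the lock; the guard is still usable.
        .unwrap_or_else(|poisoned| poisoned.into_inner())
}

/// Sets `key` for the duration of `body`, then removes it again.
#[allow(unused_unsafe)]
pub fn with_env_var<T>(key: &str, value: &str, body: impl FnOnce() -> T) -> T {
    let _guard = env_lock();
    // SAFETY: every environment mutation in tests goes through `env_lock`,
    // so no other thread in this process touches the environment here.
    unsafe { std::env::set_var(key, value) };
    let result = body();
    // SAFETY: same lock is still held.
    unsafe { std::env::remove_var(key) };
    result
}
```

Note that this sketch does not restore a previous value and leaves the variable set if `body` panics; a drop guard would cover both cases.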

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* test(channels): generalize WASM HTTP test rewrites

* fix(channels): gate Slack test URL rewrites from release builds

* fix(ci): update wrapper test pairing store ctor

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* chore(ci): add Dependabot and pin GitHub Actions by SHA

Add automated dependency vulnerability scanning via Dependabot for both
Cargo crates (weekly) and GitHub Actions (weekly). Pin all 101 external
action references across 14 workflow files to full commit SHAs to prevent
supply-chain attacks via compromised tags.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(ci): harden workflows — persist-credentials, permissions, template injection

Address zizmor security audit findings:
- Add persist-credentials: false to all checkout steps (artipacked)
- Add explicit minimal permissions to all workflows (excessive-permissions)
- Move workflow-level write permissions to job level where possible
- Fix template injection in regression-test-check.yml by using env vars

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(ci): address PR review comments

- Group Dependabot updates by ecosystem to reduce PR noise (gemini)
- Add persist-credentials: false to docker.yml checkout (Copilot)
- Move inputs.tag and other expansions to env vars in docker.yml to
  eliminate template injection from workflow_dispatch user input

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(ci): restore git push auth and harden git fetch

Address PR #2043 review comments:
- staging-ci create-promotion-pr: generate App token before checkout
  and pass it to checkout so 'git push origin "$BRANCH"' works
- staging-ci update-tag: re-enable credential persistence so the
  'staging-tested' tag force-push succeeds (job is internal-only)
- release update-registry-checksums: re-enable credential persistence
  so the checksum-update branch push succeeds
- regression-test-check: add '--' to git fetch to prevent refs that
  start with '-' from being interpreted as options

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(ci): address serrrfirat PR review comments

- release.yml: move github.ref_name, needs.plan.outputs.tag, and
  needs.plan.outputs.tag-flag to env vars across plan, build-local-artifacts,
  build-global-artifacts, and host jobs. The tag pattern
  '[0-9]+.[0-9]+.[0-9]+*' has a trailing glob, so a tag like
  '1.2.3\$(curl evil)' could match and be shell-expanded.
- dependabot.yml: split Cargo groups into tokio-ecosystem, serialization,
  wasm, and everything-else to make regression bisection easier when
  CI fails on a Dependabot PR.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(ci): add missing job-level permissions for gh CLI calls

- resolve-promotion-base: add pull-requests: read for 'gh pr list'
- gate: add checks: read for 'gh api .../commits/{sha}/check-runs'

Both were dropped when workflow-level permissions moved to job level.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Zaki Manian <zaki@iqlusion.io>
* feat: port ratatui tui onto staging

* Add TUI model picker for /model

* Fix TUI CI lint failures

* Format /tools output as vertical list

* Restore TUI approval modal on thread switch

* Re-emit pending approval events on follow-up messages

* Improve TUI thread handling and activity UI

* Sort TUI resume conversations by activity

* fix(tui): address PR review feedback

* Add TUI thread detail modal for activity sidebar

* feat(tui): improve conversation scrolling UX

- Mouse wheel: 1-line increments (was 3-line jumps)
- PageUp/PageDown: full-page scroll based on viewport height (was 5 lines)
- Add scrollbar widget on conversation right edge (track │, thumb ┃)
- Add "↓ N more ↓ End to return" indicator when scrolled up
- Add auto-follow (pinned_to_bottom) that disengages on scroll-up
  and re-engages when reaching bottom or pressing End
- Clamp scroll offset to valid range (can't scroll past content)
- Add End key binding to jump to bottom
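
A minimal sketch of the clamping and auto-follow rules listed above, not
the actual TUI code. Only `pinned_to_bottom` is named in this commit; the
`ScrollState` type, its `offset_from_bottom` field, and the method names are
assumptions made for illustration.

```rust
/// Hypothetical scroll state for the conversation pane.
#[derive(Debug)]
pub struct ScrollState {
    /// Lines scrolled up from the newest content (0 = at the bottom).
    pub offset_from_bottom: usize,
    /// Auto-follow flag: true while the view tracks new output.
    pub pinned_to_bottom: bool,
}

impl ScrollState {
    pub fn new() -> Self {
        Self { offset_from_bottom: 0, pinned_to_bottom: true }
    }

    /// Largest valid offset: content that does not fit in the viewport.
    fn max_offset(content_lines: usize, viewport_height: usize) -> usize {
        content_lines.saturating_sub(viewport_height)
    }

    /// Scroll up, clamping so the view cannot move past the first line.
    /// Auto-follow disengages as soon as the offset is non-zero.
    pub fn scroll_up(&mut self, lines: usize, content_lines: usize, viewport_height: usize) {
        let max = Self::max_offset(content_lines, viewport_height);
        self.offset_from_bottom = self.offset_from_bottom.saturating_add(lines).min(max);
        self.pinned_to_bottom = self.offset_from_bottom == 0;
    }

    /// Scroll down; reaching the bottom re-engages auto-follow.
    pub fn scroll_down(&mut self, lines: usize) {
        self.offset_from_bottom = self.offset_from_bottom.saturating_sub(lines);
        self.pinned_to_bottom = self.offset_from_bottom == 0;
    }

    /// End key: jump to the bottom and re-engage auto-follow.
    pub fn jump_to_end(&mut self) {
        self.offset_from_bottom = 0;
        self.pinned_to_bottom = true;
    }
}
```

Under these assumptions, a PageUp would call `scroll_up` with the viewport
height as `lines`, and a mouse-wheel tick would pass 1.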

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(tui): use engine context pressure data for status bar

The context bar was using cumulative session tokens (total_input +
total_output) which grow unboundedly across turns, making the bar
always show 100% after a few exchanges. Now uses the actual context
window usage from ContextPressure events when available, falling back
to cumulative tokens only before the first engine update arrives.

Also syncs context_window from the engine's max_tokens so the limit
reflects the real model capability instead of name-based heuristics.
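
A rough sketch of the fallback described above, under stated assumptions:
the `ContextPressure` struct and its fields stand in for the real event
payload, and `context_bar_ratio` and `heuristic_window` are hypothetical
names, not identifiers from this branch.

```rust
/// Hypothetical stand-in for the data carried by a ContextPressure event.
#[derive(Debug, Clone, Copy)]
pub struct ContextPressure {
    pub used_tokens: u64,
    pub max_tokens: u64,
}

/// Fraction of the context window to draw in the status bar (0.0..=1.0).
///
/// Prefers the engine-reported usage; before the first engine update it
/// falls back to cumulative session tokens over a heuristic window size.
pub fn context_bar_ratio(
    pressure: Option<ContextPressure>,
    total_input: u64,
    total_output: u64,
    heuristic_window: u64,
) -> f64 {
    let (used, limit) = match pressure {
        Some(p) => (p.used_tokens, p.max_tokens),
        None => (total_input.saturating_add(total_output), heuristic_window),
    };
    if limit == 0 {
        // Avoid dividing by zero when no window size is known yet.
        return 0.0;
    }
    (used as f64 / limit as f64).min(1.0)
}
```

With cumulative totals alone the ratio saturates at 1.0 after a few turns,
which is the symptom this commit addresses; once a pressure snapshot is
available the cumulative totals are ignored.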

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(tui): render markdown in thread detail modal

The thread detail modal was displaying raw markdown text (plain
line splitting). Now uses render_markdown() for proper formatting
of headers, lists, bold, code blocks, etc.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat(tui): hydrate sidebar with engine threads and routines at startup

The TUI sidebar was empty until the first user message because
EngineThreadList and RoutineUpdate events were only sent after
processing a message. Now sends initial data right before the
message loop so the activity panel shows existing threads and
routines immediately on startup.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(tui): use owner_id for engine thread hydration at startup

list_engine_threads filters by user_id, so passing "" matched no
threads. Now uses self.owner_id() which matches the TUI channel's
user_id, so threads are visible in the sidebar immediately.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(tui): fix CI — type errors and formatting in TUI tests

Wrap `started_at` and `updated_at` in `Some(...)` to match
`Option<DateTime<Utc>>` after upstream struct change, and run
`cargo fmt` on files with formatting drift.

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(ci): resolve clippy warnings — collapsible ifs and needless borrow

Collapse three nested `if` blocks into `if && let` chains and remove
a needless `&` on the `process_list_threads` call, all in agent_loop.rs.

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(ci): add live_harness.rs with updated StatusUpdate patterns

The live_harness.rs file was added to staging after this branch diverged.
When CI merges the PR into staging, the file uses old StatusUpdate patterns
that don't account for the new `detail` and `call_id` fields added by this
branch. Add the file with `..` rest patterns to fix the merge-time compile
errors.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat(engine): add skill repair learning loop

* fix(engine): guard skill repair mission updates

* fix(engine): persist skill repair provenance

* fix(engine): address skill-repair PR review feedback

- Fix hex formatting: iterate GenericArray bytes individually instead of
  relying on Display impl which produces debug-like output
- Always recompute content hash from actual doc.content when archiving a
  revision to prevent drift from out-of-band writes
- Prune repair history on rollback to remove records for versions newer
  than the one being restored
- Combine collect_error_messages + collect_observed_actions into a single
  pass (collect_errors_and_actions) to avoid redundant event iteration
- Document bounded revision eviction policy (cap at 10)
- Add comment clarifying concurrent skill-repair / error-diagnosis triggers
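
For the hex-formatting item, a minimal sketch of per-byte encoding. It
takes a plain byte slice, which is an assumption; the real code iterates a
GenericArray from the hashing crate, and the name `to_hex` is hypothetical.

```rust
use std::fmt::Write;

/// Encode bytes as lowercase hex, two digits per byte.
pub fn to_hex(bytes: &[u8]) -> String {
    let mut out = String::with_capacity(bytes.len() * 2);
    for b in bytes {
        // Writing into a String cannot fail, so the Result is ignored.
        let _ = write!(out, "{:02x}", b);
    }
    out
}
```

The `{:02x}` width matters: without zero padding, a byte such as 0x0f would
be emitted as a single digit and the hash string would be ambiguous.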

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(engine): constrain skill repair updates

* fix(engine): keep insights on completed threads

* style(engine): satisfy fmt and clippy on mission.rs

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Illia Polosukhin <ilblackdragon@gmail.com>
* allow private local llm endpoints

* Fix private endpoint review issues

* Fix link-local clippy warning

* Tighten base URL validation follow-ups

* fix(config): non-blocking DNS validation and admin-only LLM key filtering

Address PR #1955 review feedback:

- Wrap to_socket_addrs() in tokio::task::block_in_place when called from
  a multi-threaded async runtime so the LLM utility handlers
  (/api/llm/test_connection, /api/llm/list_models) can no longer stall a
  worker thread on slow DNS. Synchronous callers (env config, CLI) are
  unaffected.

- Add ADMIN_ONLY_LLM_SETTING_KEYS + strip_admin_only_llm_keys helper as
  defense-in-depth: Config::from_db_with_toml and re_resolve_llm_with_secrets
  now take an is_operator flag and strip admin-only base-URL-bearing keys
  (llm_builtin_overrides, llm_custom_providers, ollama_base_url,
  openai_compatible_base_url) from the DB merge for non-operator users.
  This prevents future per-user resolve paths and any pre-existing legacy
  rows from reactivating a private/loopback endpoint via the operator
  validation policy. Existing call sites pass true (owner_id is the
  operator scope).
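
A minimal sketch of the read-side filter, assuming settings are held as a
string map. The four key names come from this commit; the signature
(including passing `is_operator` to the helper itself) and the map type are
assumptions, and the real helper may differ.

```rust
use std::collections::HashMap;

/// Base-URL-bearing keys that only an operator may supply.
pub const ADMIN_ONLY_LLM_SETTING_KEYS: [&str; 4] = [
    "llm_builtin_overrides",
    "llm_custom_providers",
    "ollama_base_url",
    "openai_compatible_base_url",
];

/// Drop admin-only keys from a DB settings map for non-operator users.
/// Operators keep every key.
pub fn strip_admin_only_llm_keys(settings: &mut HashMap<String, String>, is_operator: bool) {
    if is_operator {
        return;
    }
    settings.retain(|key, _| !ADMIN_ONLY_LLM_SETTING_KEYS.contains(&key.as_str()));
}
```

Keeping the key list in one constant lets a write-side gate and this
read-side filter share the same source of truth.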

Adds regression tests covering:
  * strip_admin_only_llm_keys removes all four keys, leaves others
  * validate_base_url is callable from a multi-thread tokio runtime
    without panicking on the strict short-circuit path
  * validate_operator_base_url remains callable from async handlers
  * re_resolve_llm filters admin-only keys when is_operator=false and
    keeps them when is_operator=true

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor(settings): de-dup admin-only LLM key list (#1955 review)

`handlers/settings.rs::is_admin_only_setting_key` now delegates to
`crate::config::helpers::ADMIN_ONLY_LLM_SETTING_KEYS` so the write-side
gate cannot drift from the read-side `strip_admin_only_llm_keys` filter.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat(workspace): admin system prompt shared with all users (#2088)

Introduce SYSTEM.md in a well-known __admin__ scope so admins can set a
system prompt that all tenants receive. Gated behind multi-tenant mode
(WorkspacePool sets admin_prompt_enabled on each workspace; owner
workspace in app.rs also gets the flag when has_any_users() is true).

New endpoints:
- GET  /api/admin/system-prompt — read admin system prompt
- PUT  /api/admin/system-prompt — set admin system prompt (64 KB limit)

Safety:
- SYSTEM.md added to injection scan list
- is_reserved_scope() guard on user creation (defense-in-depth)
- Multi-tenancy gate on both API and prompt assembly layers

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: remove review audit file from tracked files

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: add 64 KB size limit to admin system prompt PUT handler

Addresses PR review feedback:
- Enforce 64 KB limit on system prompt content to prevent token budget
  exhaustion (the content is injected into every user's system prompt)
- Add regression tests for the size limit (413 for oversized, not-413
  for at-limit)
- Document that is_multi_tenant is evaluated once at startup and the
  owner workspace requires a restart after the first user is created
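
A small sketch of the size check and the two regression cases, assuming
the limit is measured in bytes of the prompt content. The constant and
function names are hypothetical; only the 64 KB limit and the 413 status
come from this commit.

```rust
/// Assumed limit: 64 KB of prompt content, counted in bytes.
pub const MAX_SYSTEM_PROMPT_BYTES: usize = 64 * 1024;

/// HTTP 413 Payload Too Large.
pub const PAYLOAD_TOO_LARGE: u16 = 413;

/// Accept content at or below the limit; reject anything larger with 413.
pub fn check_prompt_size(content: &str) -> Result<(), u16> {
    if content.len() > MAX_SYSTEM_PROMPT_BYTES {
        Err(PAYLOAD_TOO_LARGE)
    } else {
        Ok(())
    }
}
```

The comparison is strict, so a prompt of exactly 64 KB is accepted, which
matches the "not-413 for at-limit" test case above.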

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: address remaining review feedback on admin system prompt

- Restore rustdoc comments stripped from document.rs (DocumentMetadata,
  HygieneMetadata, DocumentVersion, VersionSummary, PatchResult, etc.)
  to keep the diff focused on feature additions only
- Replace silent error swallowing (if let Ok) with discriminated match
  in admin prompt read: only DocumentNotFound is silent; other errors
  are logged at debug! level
- Cache admin system prompt on WorkspacePool to avoid an extra DB read
  on every turn; invalidated on PUT via invalidate_admin_prompt()
- Add cache invalidation integration test

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(workspace): tighten reserved-scope check and admin-prompt body limit

- is_reserved_scope: case-insensitive, whitespace-tolerant, and reserves
  the entire `__*__` namespace so future system scopes (alongside
  `__admin__`) cannot be impersonated by hand-crafted user IDs
- admin system-prompt route: layer-level DefaultBodyLimit of 128 KB
  rejects oversized payloads before JSON parse, complementing the
  in-handler 64 KB content cap
- system_prompt put_handler: clarify that the in-handler size check is
  a clearer-error fallback for the layer cap
- users_create_handler: drop the dead is_reserved_scope check on a
  freshly-minted UUID; the guard belongs at a code path that actually
  accepts user-supplied IDs
- expand is_reserved_scope tests for case, whitespace, and the wider
  `__*__` namespace
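
One way the tightened check could look, as a sketch rather than the actual
implementation. The trimming, case folding, and `__*__` rule follow the
bullets above; the minimum-length guard and the treatment of edge cases
such as `____` are assumptions.

```rust
/// Return true if `id` falls in the reserved `__*__` scope namespace.
pub fn is_reserved_scope(id: &str) -> bool {
    // Tolerate surrounding whitespace and mixed case.
    let normalized = id.trim().to_ascii_lowercase();
    // Explicit well-known scope, kept for readability.
    if normalized == "__admin__" {
        return true;
    }
    // Reserve the whole namespace; the length guard ensures the leading
    // and trailing double underscores do not overlap.
    normalized.len() >= 4 && normalized.starts_with("__") && normalized.ends_with("__")
}
```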

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Illia Polosukhin <ilblackdragon@gmail.com>
* Fix skill installs for invalid catalog names

* Fix clippy test module ordering

* fix: address PR review feedback

* fix: use PairingStore::new_noop() in SSRF test after merge with staging

The staging branch introduced a new test (test_http_request_rejects_private_ip_targets)
that calls PairingStore::new(), but this branch changed the signature to require
db and cache arguments. Use new_noop() since this is a test context.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: address PR #2040 review — remove expect() and dead strip_prefix

- Restructure download_key flow in skills_install_handler to use the
  value directly instead of round-tripping through Option + expect(),
  satisfying the no-expect-in-production-code rule.
- Remove dead strip_prefix("---\n") in render_skill_md — serde_yml does
  not emit a leading document marker for structs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(skills): preserve unknown frontmatter and tighten install matching

- rewrite install-recovery to mutate the `name` field via raw YAML
  Value rather than re-serializing the typed SkillManifest, so unknown
  frontmatter keys (vendor extensions, future fields) survive the
  install rewrite
- catalog_entry_is_installed: case-insensitive comparison for the
  display-name and normalized-slug branches, matching the slug branch
- normalize_skill_identifier: document non-ASCII handling
- normalizing-invalid-name log: warn -> debug (REPL/TUI rule)
- add round-trip test asserting unknown top-level keys, nested
  mappings, and sequences survive install recovery

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Illia Polosukhin <ilblackdragon@gmail.com>
The test used a hyphenated channel name ("test-failing-channel") but
canonicalize_extension_name() converts hyphens to underscores. This
caused configure() to look for "test_failing_channel.capabilities.json"
which didn't exist, returning an early Err before reaching the
activation code path the test was designed to exercise.
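
The mismatch can be reproduced with a small sketch. Only the
hyphen-to-underscore behaviour and the `.capabilities.json` suffix are
taken from the description above; the real canonicalize_extension_name()
may do more, and `capabilities_file_name` is a hypothetical helper.

```rust
/// Simplified stand-in: convert hyphens to underscores.
pub fn canonicalize_extension_name(name: &str) -> String {
    name.replace('-', "_")
}

/// File that configure() would look up for a given channel name.
pub fn capabilities_file_name(name: &str) -> String {
    format!("{}.capabilities.json", canonicalize_extension_name(name))
}
```

Here `capabilities_file_name("test-failing-channel")` yields
"test_failing_channel.capabilities.json", so a fixture written under the
hyphenated name is never found.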

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
chore: promote staging to staging-promote/bb2c3e1d-24154330911 (2026-04-08 21:41 UTC)

chore: promote staging to staging-promote/6895cdad-24185214226 (2026-04-09 13:31 UTC)

chore: promote staging to staging-promote/63a48e4e-24182836482 (2026-04-09 10:24 UTC)

chore: promote staging to staging-promote/288fe49a-24110798843 (2026-04-09 09:26 UTC)

chore: promote staging to staging-promote/79c1b0fd-24108317021 (2026-04-08 00:16 UTC)

chore: promote staging to staging-promote/86c15903-24100112892 (2026-04-07 22:53 UTC)

chore: promote staging to staging-promote/00fd2e88-24092158668 (2026-04-07 19:23 UTC)

chore: promote staging to staging-promote/f765958f-24078644272 (2026-04-07 16:22 UTC)
@henrypark133 henrypark133 merged commit b0f4a2b into staging-promote/13774cc0-24076446124 Apr 10, 2026
12 of 14 checks passed
@github-actions github-actions bot removed the size: L 200-499 changed lines label Apr 10, 2026
@henrypark133 henrypark133 deleted the staging-promote/f765958f-24078644272 branch April 10, 2026 22:47
@github-actions github-actions bot added the labels size: XL (500+ changed lines), size: L (200-499 changed lines), scope: agent (agent core: agent loop, router, scheduler), scope: channel (channel infrastructure), scope: channel/wasm (WASM channel runtime), scope: tool (tool infrastructure), scope: tool/wasm (WASM tool sandbox), scope: tool/builder (dynamic tool builder), scope: db (database trait / abstraction), scope: db/postgres (PostgreSQL backend), scope: workspace (persistent memory / workspace), scope: extensions (extension management), scope: setup (onboarding / setup), scope: pairing (pairing mode), scope: ci (CI/CD workflows), scope: docs (documentation), scope: dependencies (dependency updates), and risk: high (safety, secrets, auth, or critical infrastructure), and removed the labels size: XL (500+ changed lines) and risk: medium (business logic, config, or moderate-risk modules) Apr 10, 2026

Labels

contributor: core 20+ merged PRs risk: high Safety, secrets, auth, or critical infrastructure scope: agent Agent core (agent loop, router, scheduler) scope: channel/wasm WASM channel runtime scope: channel/web Web gateway channel scope: channel Channel infrastructure scope: ci CI/CD workflows scope: db/postgres PostgreSQL backend scope: db Database trait / abstraction scope: dependencies Dependency updates scope: docs Documentation scope: extensions Extension management scope: orchestrator Container orchestrator scope: pairing Pairing mode scope: setup Onboarding / setup scope: tool/builder Dynamic tool builder scope: tool/builtin Built-in tools scope: tool/wasm WASM tool sandbox scope: tool Tool infrastructure scope: workspace Persistent memory / workspace size: L 200-499 changed lines staging-promotion
