chore: promote staging to staging-promote/13774cc0-24076446124 (2026-04-07 11:19 UTC) #2108
henrypark133 merged 40 commits into staging-promote/13774cc0-24076446124 on Apr 10, 2026
* fix(tools): gate claude_code and acp modes behind enabled flags (#1987)
CreateJobTool always exposed claude_code and acp as valid modes in its schema and silently accepted them at runtime, even when CLAUDE_CODE_ENABLED=false / ACP_ENABLED=false. This caused the LLM to sometimes select disabled modes, spawning containers that fail.
- Add claude_code_enabled and acp_enabled flags to ContainerJobConfig (following the existing mcp_per_job_enabled pattern)
- Expose via ContainerJobManager accessors, query from CreateJobTool through the already-injected job_manager
- Dynamically build the mode enum in parameters_schema(): only show enabled modes; omit the mode field entirely when only worker is available
- Conditionally include the agent_name field only when ACP is enabled
- Add defense-in-depth guards in execute() rejecting disabled modes with ToolError::InvalidParameters
- Add 10 regression tests covering schema gating and runtime rejection
* fix(tools,web): address review feedback on mode gating (#2003)
- Remove hardcoded "Set mode to claude_code" from CreateJobTool description; mode guidance is already provided dynamically via parameters_schema()
- Add check_mode_enabled() guard in jobs_restart_handler to reject disabled modes on job restart via REST API, closing the bypass path
- Add ContainerJobManager::is_mode_enabled(mode) to centralize mode validation
- Clean up fully-qualified paths in jobs.rs with proper use imports
- 5 regression tests (description, restart rejection, is_mode_enabled)
* fix: harden mode gating with defense-in-depth and synchronous persistence
- Add ModeDisabled variant to OrchestratorError and guard inside ContainerJobManager::create_job() so disabled modes are rejected even if callers forget to validate
- Make job mode persistence synchronous instead of fire-and-forget to prevent silent mode loss on transient DB errors (restarts would silently downgrade to worker mode)
- Refactor parameters_schema() to build serde_json::Map directly, removing an unreachable if-let guard and an .expect() call
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Rajul Bhatnagar <brajul@amazon.com>
Co-authored-by: serrrfirat <f@nuff.tech>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
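The gating above amounts to one "is this mode enabled?" predicate that both the schema builder and the runtime guards consult. Below is a minimal Rust sketch. It assumes worker mode is always available. `JobMode`, `ModeFlags`, and `schema_mode_enum` are hypothetical names, and the real `parameters_schema()` builds a `serde_json::Map` rather than returning a `Vec`.

```rust
/// Hypothetical stand-in for the three job modes named in the commit.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum JobMode {
    Worker,
    ClaudeCode,
    Acp,
}

impl JobMode {
    pub fn as_str(self) -> &'static str {
        match self {
            JobMode::Worker => "worker",
            JobMode::ClaudeCode => "claude_code",
            JobMode::Acp => "acp",
        }
    }
}

/// Hypothetical mirror of the two flags added to ContainerJobConfig.
#[derive(Debug, Clone, Copy)]
pub struct ModeFlags {
    pub claude_code_enabled: bool,
    pub acp_enabled: bool,
}

impl ModeFlags {
    /// Single source of truth, in the spirit of ContainerJobManager::is_mode_enabled.
    pub fn is_mode_enabled(&self, mode: JobMode) -> bool {
        match mode {
            JobMode::Worker => true, // assumed always available
            JobMode::ClaudeCode => self.claude_code_enabled,
            JobMode::Acp => self.acp_enabled,
        }
    }

    /// Values for the schema's `mode` enum; `None` means "omit the field",
    /// because worker is the only choice left.
    pub fn schema_mode_enum(&self) -> Option<Vec<&'static str>> {
        let modes: Vec<&'static str> = [JobMode::Worker, JobMode::ClaudeCode, JobMode::Acp]
            .iter()
            .copied()
            .filter(|m| self.is_mode_enabled(*m))
            .map(JobMode::as_str)
            .collect();
        if modes.len() > 1 {
            Some(modes)
        } else {
            None
        }
    }

    /// Runtime guard (defense in depth): rejects a disabled mode even when the
    /// schema was bypassed, e.g. by a REST restart request.
    pub fn check_mode_enabled(&self, mode: JobMode) -> Result<(), String> {
        if self.is_mode_enabled(mode) {
            Ok(())
        } else {
            Err(format!("mode '{}' is disabled", mode.as_str()))
        }
    }
}
```

Returning `None` when only worker remains corresponds to omitting the `mode` field entirely. Routing `check_mode_enabled` through the same predicate keeps the restart path and `create_job()` from drifting away from the schema.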
* fix(web): emit Done status after response to fix SSE ordering (#2079)
Move the terminal "Done" status out of thread_ops and emit it only after the gateway successfully responds via a new respond_then_done() helper in agent_loop. This guarantees the browser receives the assistant message before the turn-closing event, preventing the web UI from appearing stuck.
Adds a regression test asserting the response event is captured before the Done status in the ordered event log.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(web): add frontend safety net for lost SSE response events (#2079)
Track whether a `response` SSE event was received for the current turn. When "Done" arrives without a preceding response, schedule a loadHistory() call after 1500ms so the user sees the answer even if the response event was lost to broadcast lag or a brief disconnect.
This is the second prong of the fix described in #2079: the backend ordering fix alone prevents the race, but this fallback handles residual edge cases (proxy buffering, SSE reconnection gaps).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: address review findings: Done on all paths, fix frontend timer leaks
Backend:
- Send Done status when BeforeOutbound hook blocks the response, so the client still knows the turn is complete.
- Send Done status for empty/suppressed responses (e.g. approval handled via send_status) to match pre-refactor behavior.
Frontend:
- Set _turnResponseReceived on stream_chunk events so streaming responses don't trigger a spurious loadHistory() when Done arrives.
- Clear _doneWithoutResponseTimer on sendMessage() to prevent stale timers from a previous turn firing during the new one.
- Clear turn-tracking state on switchThread() to prevent cross-thread contamination of the timer and flag.
- Clear turn-tracking state on SSE reconnect (eventSource.onopen) to prevent stale timers from before the disconnect.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: address PR review: always emit Done, extract helper, add test
- respond_then_done now emits Done regardless of respond outcome so the client always knows the turn ended, even on delivery failure
- Extract send_done() helper to deduplicate the inline Done+warn blocks in the hook-blocked and empty-response paths
- Add done_emitted_for_empty_response test covering the empty-response branch ordering invariant
- Lift 1500ms magic number to DONE_WITHOUT_RESPONSE_TIMEOUT_MS constant
- Add comment explaining _turnResponseReceived single-thread tracking
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(agent): suppress Done while awaiting approval; introduce HandleOutcome
Distinguish "no response, turn complete" from "no response, turn paused" in handle_message's return type so the run loop can decide whether to emit the terminal Done status. The previous code lumped both into Ok(Some("")), causing v1 NeedApproval to incorrectly emit Done after ApprovalNeeded, which then tripped the new web UI safety net and triggered a spurious loadHistory() under the live approval prompt.
- New HandleOutcome enum with Shutdown / Respond / NoResponse / Pending
- SubmissionResult::NeedApproval now maps to HandleOutcome::Pending
- Bridge handlers wrapped via HandleOutcome::from_legacy (their approval flows return non-empty descriptive text, so they never need Pending)
- Regression test no_done_emitted_while_awaiting_approval drives a v1 Always-approval probe and asserts no Done is captured
- Repaired pre-existing done_emitted_for_empty_response test, which asserted the wrong invariant: the dispatcher substitutes empty LLM responses with a fallback message, so a truly empty response never reaches the run loop. Renamed and updated to assert the ordering.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: ilblackdragon@gmail.com <ilblackdragon@gmail.com>
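The Done-emission rule that `HandleOutcome` enables can be modeled as a pure function from an outcome to the ordered events the client sees. This is a hedged sketch. The variant names come from the commit message, but `Event`, `events_for`, the `delivery_ok` flag, and the choice to emit nothing on `Shutdown` are assumptions made for illustration.

```rust
/// Variants named in the commit message; payloads are simplified.
#[derive(Debug)]
pub enum HandleOutcome {
    Shutdown,
    Respond(String),
    NoResponse,
    Pending,
}

/// Hypothetical client-visible events.
#[derive(Debug, PartialEq)]
pub enum Event {
    Response(String),
    Done,
}

/// Ordered events for one handled message. `delivery_ok` models whether
/// sending the response to the channel succeeded.
pub fn events_for(outcome: HandleOutcome, delivery_ok: bool) -> Vec<Event> {
    let mut events = Vec::new();
    match outcome {
        // Assumed: shutdown ends the loop without a turn-closing event.
        HandleOutcome::Shutdown => {}
        // respond_then_done: the response goes out first, then Done,
        // and Done is sent even if delivery failed.
        HandleOutcome::Respond(text) => {
            if delivery_ok {
                events.push(Event::Response(text));
            }
            events.push(Event::Done);
        }
        // Turn complete without a message: still close the turn.
        HandleOutcome::NoResponse => events.push(Event::Done),
        // Turn paused on an approval prompt: no Done, so the web UI does not
        // start its "Done without response" fallback timer.
        HandleOutcome::Pending => {}
    }
    events
}
```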
* feat(slack): implement on_broadcast and fix message tool channel hints
Implement the on_broadcast callback for the Slack WASM channel, enabling proactive message delivery to Slack channels/users via the message tool. Previously this was a stub returning "not implemented".
The implementation:
- Uses the user_id parameter as the broadcast target (channel ID or user ID)
- Strips leading # from targets for convenience
- Warns when target doesn't look like a Slack ID (C/U/D/G prefix)
- Posts via chat.postMessage with host-injected Bearer token
- Tracks active threads for broadcast replies (consistent with on_respond)
Also fixes the message tool's channel parameter description to list 'slack' alongside 'slack-relay' and clarifies that Slack targets must be IDs.
[skip-regression-check]
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(slack): extract shared post helper, harden broadcast validation
Address review findings on the slack broadcast implementation:
- Extract `post_slack_message()` shared by `on_respond` and `on_broadcast`, eliminating ~40 lines of duplicated HTTP-call-then-parse logic.
- Log `track_active_thread` errors at Warn level instead of silently swallowing them with `let _ =` (restores observability lost in original).
- Make non-ID broadcast targets a hard error instead of a soft warning, consistent with the message tool schema that says "must be an ID, not a name".
- Fix empty-target error message to not assume a name was provided.
- `resolve_broadcast_target` now returns `&str` (avoids allocation).
- Add 2 tests covering the resolve+validate pipeline end-to-end.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(slack): track broadcast message ts so replies are recognized
Address Gemini review: broadcast messages now track the Slack-returned timestamp as an active thread (falling back from response.thread_id to the posted message ts). This ensures that if a user replies to a broadcast, the agent recognizes the reply as an active thread.
Also fix stale doc comment on looks_like_slack_id (was missing W prefix).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
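Here is a sketch of the broadcast target handling described above. It strips one leading `#`, rejects empty input, and rejects anything that does not look like a Slack ID. The prefix set (C/U/D/G/W) comes from the commit messages. Treating the rest of the ID as uppercase ASCII alphanumerics is an assumption, and these bodies are illustrative, not the channel's actual code.

```rust
/// Illustrative version of resolve_broadcast_target: returns a borrowed
/// slice of the input (no allocation) or a user-facing error.
pub fn resolve_broadcast_target(raw: &str) -> Result<&str, String> {
    let trimmed = raw.trim();
    // Strip one leading '#', so "#C0123" and "C0123" resolve the same way.
    let target = trimmed.strip_prefix('#').unwrap_or(trimmed);
    if target.is_empty() {
        return Err("broadcast target is empty; pass a Slack channel or user ID".to_string());
    }
    if !looks_like_slack_id(target) {
        return Err(format!(
            "'{}' is not a Slack ID; pass a channel or user ID, not a name",
            target
        ));
    }
    Ok(target)
}

/// Assumed ID shape: a C/U/D/G/W prefix followed by uppercase ASCII
/// letters or digits.
pub fn looks_like_slack_id(s: &str) -> bool {
    let mut chars = s.chars();
    match chars.next() {
        Some('C') | Some('U') | Some('D') | Some('G') | Some('W') => {}
        _ => return false,
    }
    let rest = chars.as_str();
    !rest.is_empty() && rest.chars().all(|c| c.is_ascii_uppercase() || c.is_ascii_digit())
}
```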
* fix(staging): repair 4 categories of CI test failures
1. Telegram token test flaky race: add env mutex guard so concurrent
tests that override IRONCLAW_TEST_TELEGRAM_API_BASE_URL don't
pollute the unguarded read in the colon-preservation test.
2. SSE/connection E2E tests: #sse-status element was removed from HTML
and replaced with #sse-dot colored indicator. Update tests to check
the dot's CSS class instead of text content. Add SSE-ready wait to
the page fixture so chat tests don't race against connection setup.
3. Tool approval E2E tests: API unified legacy pending_approval and
engine v2 gates into a single pending_gate response field. Update
all E2E test helpers to use the correct field name.
4. WASM tar.gz extraction bug: canonicalized extension names use
underscores (web_search) but release archives use hyphens
(web-search.wasm). Accept both filename forms when extracting.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
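For the tar.gz fix (item 4), the accepted-name rule can be sketched as follows. The sketch assumes entries are matched on their final path component. `accepted_wasm_names` and `entry_matches` are hypothetical helpers. Only the underscore-to-hyphen direction is generated, which matches the invariant the follow-up commit documents.

```rust
/// File names accepted for a canonical (underscore) extension name.
/// Archives may use hyphens; the reverse mapping is not generated.
pub fn accepted_wasm_names(canonical: &str) -> Vec<String> {
    let primary = format!("{}.wasm", canonical);
    let hyphenated = format!("{}.wasm", canonical.replace('_', "-"));
    if hyphenated == primary {
        vec![primary] // no underscores, so only one form exists
    } else {
        vec![primary, hyphenated]
    }
}

/// Does an archive entry path (possibly nested) end in an accepted name?
pub fn entry_matches(entry_path: &str, accepted: &[String]) -> bool {
    let file_name = entry_path.rsplit('/').next().unwrap_or(entry_path);
    accepted.iter().any(|n| n.as_str() == file_name)
}
```

Listing both names from one helper also makes it easy to include every accepted filename in a "not found in archive" error, as the review round requested.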
* fix: address review feedback — stronger SSE signal, consistent naming
- SSE wait: use sseHasConnectedBefore JS flag (set in onopen) instead
of checking #sse-dot CSS class, which defaults to connected state
before SSE actually connects
- Rename _wait_for_pending_approval → _wait_for_pending_gate and
_wait_for_no_pending_approval → _wait_for_no_pending_gate
- Update all docstrings/error messages to say pending_gate
- Deduplicate name.replace('_', '-') in tar.gz extraction and include
both accepted filenames in the error message
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: address second round of review feedback
- Add comment documenting invariant: canonical names use underscores,
archives may use hyphens, reverse is not supported
- Fix quote wrapping in single-name error case
- Remove dead SEL["sse_status"] selector from helpers.py
- Simplify SSE wait: use window.sseHasConnectedBefore === true
(fails fast on rename instead of silent 10s timeout)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: use global binding for sseHasConnectedBefore, not window property
sseHasConnectedBefore is declared with let at global scope, which
does not create a window property. window.sseHasConnectedBefore
would always be undefined. Use typeof guard + direct reference instead.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(security): cap tar.gz entry pre-allocation to MAX_ENTRY_SIZE
The tar header's declared size is attacker-controlled. Without capping,
Vec::with_capacity could attempt a huge allocation and OOM before the
read_to_end take() limit kicks in.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
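The allocation cap can be illustrated with `std::io::Read::take`. In this sketch, the `MAX_ENTRY_SIZE` value and the read-one-extra-byte overflow check are assumptions, not details taken from the source. The header's declared size only sizes the initial buffer after being clamped, and `take()` bounds how many bytes are actually read.

```rust
use std::io::{self, Read};

/// Illustrative cap only; the real MAX_ENTRY_SIZE value is not given here.
pub const MAX_ENTRY_SIZE: u64 = 8 * 1024 * 1024;

/// Initial buffer size: trust the attacker-controlled header only up to the cap.
pub fn initial_capacity(declared_size: u64, max: u64) -> usize {
    declared_size.min(max) as usize
}

/// Read one archive entry whose header claims `declared_size` bytes.
pub fn read_entry<R: Read>(reader: R, declared_size: u64, max: u64) -> io::Result<Vec<u8>> {
    let mut buf = Vec::with_capacity(initial_capacity(declared_size, max));
    // Read at most max + 1 bytes; anything past `max` marks the entry as oversized.
    reader.take(max.saturating_add(1)).read_to_end(&mut buf)?;
    if buf.len() as u64 > max {
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            "archive entry exceeds size limit",
        ));
    }
    Ok(buf)
}
```

A caller would pass the header's size straight through, e.g. `read_entry(entry, header_size, MAX_ENTRY_SIZE)`. A forged multi-gigabyte header then costs at most `MAX_ENTRY_SIZE` bytes of up-front allocation.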
* style: cargo fmt
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: make test_sse_status_shows_connected non-redundant, document flag reset
- test_sse_status_shows_connected now checks #sse-dot CSS class (visual
indicator) instead of re-checking sseHasConnectedBefore which the
page fixture already guarantees
- Add comment to test_sse_reconnect_after_disconnect explaining why
sseHasConnectedBefore is reset and that the history-reload path is
covered by test_sse_reconnect_preserves_chat_history
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Improve channel onboarding and Telegram pairing flow
* fix: remove dead restart_required code, fix review findings, and stabilize polling E2E test
- Remove restart_required/needs_restart dead code from 6 files (no real extension uses it; all channels hot-activate at runtime)
- Remove dead extensions.configuredRestart i18n key from all 3 locales
- Fix pairing test asserting wrong upsert semantics (test expected idempotent behavior but impl always rotates codes)
- Fix pairing test using expired code for approval (req.code -> req_again.code)
- Fix missing i18n fallback for auth.extensionTokenPlaceholder
- Validate setup_url scheme (https?://) before assigning to <a>.href
- Replace hardcoded English "Approve"/"Pairing code is required" with i18n keys
- Demote misleading "bot is open to all users" Telegram log from Warn to Debug
- Move polling E2E test to run first (polling loop dies during refresh_active_channel)
- Add poll_interval_ms config field to Telegram WASM channel
- Fix conversations.rs compilation (missing ? operator)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: address remaining review comments (i18n regressions)
- Remove hardcoded pairing_instructions() function from server.rs; use onboarding metadata from ExtensionManager instead (fixes i18n regression where pairing instructions were always English)
- Restore i18n calls for stepper labels in renderWasmChannelStepper (was using hardcoded English strings instead of missions.step* keys)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: restart polling loop on channel refresh, address Copilot review
- Add WasmChannel::ensure_polling() that stops any stale polling task and starts a fresh one from the on_start config
- Call ensure_polling() in refresh_active_channel after re-running on_start, fixing the root cause of the dead polling loop in E2E tests
- Move polling test back to its original position (no longer order-dependent)
- Fix requires_pairing in channel_onboarding_for_state to use channel_requires_pairing() instead of legacy owner_id-only check
- Add rel='noopener noreferrer' to all setup_url target=_blank links
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: collapse nested if-let to satisfy clippy collapsible_if
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: return promise from approvePairing, stop polling unconditionally in ensure_polling
- Add missing `return` before apiFetch in approvePairing() so callers can await/chain the result
- Move poll_shutdown_tx.take() before the enabled check in ensure_polling() so switching from polling to webhook stops the old polling task
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: remove trailing commas in JSON test fixtures after restart_required removal
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
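The last `ensure_polling()` change is about ordering: take and signal the old shutdown handle before checking whether polling is enabled. The sketch below uses `std::sync::mpsc` as a stand-in for whatever channel type the real `WasmChannel` uses. `PollingState` and the returned receiver are hypothetical.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};

/// Hypothetical stand-in for the polling bookkeeping on WasmChannel.
pub struct PollingState {
    poll_shutdown_tx: Option<Sender<()>>,
}

impl PollingState {
    pub fn new() -> Self {
        PollingState { poll_shutdown_tx: None }
    }

    /// Stop any existing poller first, unconditionally, then start a fresh one
    /// only if polling is enabled. Returns the shutdown receiver the new
    /// polling task would watch (the caller would spawn that task).
    pub fn ensure_polling(&mut self, polling_enabled: bool) -> Option<Receiver<()>> {
        if let Some(old_tx) = self.poll_shutdown_tx.take() {
            // The old task may already have exited; ignore a closed channel.
            let _ = old_tx.send(());
        }
        if !polling_enabled {
            return None; // e.g. the channel switched to webhook delivery
        }
        let (tx, rx) = channel();
        self.poll_shutdown_tx = Some(tx);
        Some(rx)
    }

    pub fn is_polling(&self) -> bool {
        self.poll_shutdown_tx.is_some()
    }
}
```

Had the `take()` stayed after the enabled check, the `ensure_polling(false)` call would return early and leave the previous poller running.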
* fix(web): intercept approval text input ("yes"/"no"/"always") in chat
When a tool requires approval in the web UI, typing "yes", "no", or
"always" in the chat input now resolves the approval card directly
instead of sending a regular message. This prevents duplicate approval
prompts and "No pending approval" errors that occurred when text went
through the backend message pipeline.
The frontend intercepts approval keywords in sendMessage() and routes
them through sendApprovalAction() — the same code path as clicking
the Approve/Deny/Always buttons on the card.
[skip-regression-check]
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: find most recent unresolved approval card for text interception
Address review feedback: instead of checking the last card and then
separately checking if it's resolved, find the most recent unresolved
card directly. Handles the edge case where the last card is resolved
(during 1.5s removal animation) but an earlier one isn't.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* test: add E2E test for skip-resolved-card behavior
Addresses review feedback: adds a test where two approval cards are
visible, the newer one is resolved via button click, then typing "yes"
correctly targets the older unresolved card instead of falling through.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
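The interception logic reduces to two small decisions: whether the text is an approval keyword, and which card it should resolve. The real code is frontend JavaScript in `sendMessage()`, so this Rust model is only a sketch. The case-insensitive, whitespace-trimmed keyword match is an assumption, and `ApprovalAction`, `ApprovalCard`, and `find_target_card` are hypothetical names.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ApprovalAction {
    Approve,
    Deny,
    Always,
}

/// Map chat text to an approval action; anything else is a normal message.
pub fn parse_approval_keyword(text: &str) -> Option<ApprovalAction> {
    match text.trim().to_ascii_lowercase().as_str() {
        "yes" => Some(ApprovalAction::Approve),
        "no" => Some(ApprovalAction::Deny),
        "always" => Some(ApprovalAction::Always),
        _ => None,
    }
}

pub struct ApprovalCard {
    pub request_id: String,
    pub resolved: bool,
}

/// Most recent unresolved card, skipping resolved cards that are still on
/// screen (for example during their removal animation).
pub fn find_target_card(cards: &[ApprovalCard]) -> Option<&ApprovalCard> {
    cards.iter().rev().find(|c| !c.resolved)
}
```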
…n visibility bug (#2126)
Missions created via the agent were invisible in the gateway UI for non-owner users because list_engine_missions/list_engine_threads fell back to the engine owner's default_project_id instead of resolving the authenticated user's per-user project.
Broader change: replace three inconsistent ownership patterns (can_act_on, engine is_owned_by, raw string comparisons) with a single Owned trait providing uniform is_owned_by(user_id) checks across jobs, routines, and all internal code paths.
- Fix project_id resolution in list_engine_missions/list_engine_threads
- Add Owned trait to src/ownership with impls on AgentJobRecord, SandboxJobRecord, Routine, and JobContext
- Migrate all can_act_on calls in web handlers (jobs.rs, routines.rs)
- Migrate raw user_id comparisons in tenant.rs, commands.rs, routine_engine.rs, context/manager.rs, tools/builtin/job.rs
- Add missing ownership check in routines_trigger_handler
- Migrate active routines_runs_handler and verify_project_ownership in server.rs
- Remove dead can_act_on function and ownership_identity helper
- Add regression tests for concrete Owned impls on real types
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
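Here is a minimal sketch of the `Owned` idea: one trait with a default `is_owned_by` that every record type opts into, so handlers call one generic check instead of comparing strings ad hoc. The trait shape, `owner_user_id`, `ensure_owner`, and the simplified structs are all hypothetical, and the real trait in `src/ownership` may have a different signature.

```rust
/// Hypothetical shape of the Owned trait.
pub trait Owned {
    /// The user ID that owns this record.
    fn owner_user_id(&self) -> &str;

    /// Uniform ownership check shared by every implementor.
    fn is_owned_by(&self, user_id: &str) -> bool {
        self.owner_user_id() == user_id
    }
}

// Simplified stand-ins for the real record types.
pub struct Routine {
    pub user_id: String,
    pub name: String,
}

pub struct SandboxJobRecord {
    pub user_id: String,
    pub job_id: u64,
}

impl Owned for Routine {
    fn owner_user_id(&self) -> &str {
        &self.user_id
    }
}

impl Owned for SandboxJobRecord {
    fn owner_user_id(&self) -> &str {
        &self.user_id
    }
}

/// One generic guard a web handler could call before acting on any record.
pub fn ensure_owner<T: Owned>(item: &T, user_id: &str) -> Result<(), &'static str> {
    if item.is_owned_by(user_id) {
        Ok(())
    } else {
        Err("forbidden")
    }
}
```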
…2129)
* fix(e2e): canonicalize extension names in configure/setup + update tests
1. Bug fix: `configure()` and `get_setup_schema()` in ExtensionManager called `validate_extension_name()`, which discards the canonical form. Hyphenated names from URL paths (e.g., `web-search`) were used raw for capabilities file lookups, causing "Capabilities file not found" errors. Now both methods canonicalize with `canonicalize_extension_name()` so `web-search` → `web_search` before any file I/O.
2. Bug fix: `extensions_setup_handler` in server.rs compared the raw URL path param against canonical stored names in the kind lookup.
3. Test fix: `test_wasm_lifecycle.py`: update all `web-search` refs to `web_search` (canonical form returned by registry/API).
4. Test fix: `test_mcp_auth_flow.py`: update `mock-mcp` / `mock-mcp-400` to `mock_mcp` / `mock_mcp_400`.
5. Test fix: `test_chat.py`: `test_send_message_and_receive_response` now counts assistant messages before sending to avoid picking up the pre-existing onboarding greeting as the LLM response.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: use centralized SEL selector in wait_for_function JS call
Address review feedback: pass the selector from SEL dict into the JS function instead of hardcoding '#chat-messages .message.assistant'.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: use send_chat_and_wait_for_terminal_message helper for robustness
Replace manual count+wait_for_function with the existing helper that correctly handles streaming chunks and waits for the terminal message.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…2130)
* fix(engine): repair mission ACL regression and 4 stale engine tests
The mission access control in pause_mission/resume_mission had a logic error that allowed any user to manage shared/system missions. The `&& !mission.owner_id().is_shared()` condition short-circuited the entire check for shared missions. Replaced with proper two-branch logic using is_shared_owner().
Also fixed 4 test assertions that checked thread.messages instead of thread.internal_messages (orchestrator stores working messages in the internal transcript), and made the trace test resilient to event ordering by filtering by EventKind.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(engine): extend ACL fix to update_mission, error on missing missions
Address review feedback:
- Fix same shared-mission ACL bug in update_mission (line 130)
- pause_mission/resume_mission now error on missing missions instead of silently proceeding, matching update_mission's pattern
- Update doc comments to reflect that the engine enforces shared ownership checks directly
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
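The ACL bug is easiest to see side by side. Presumably the old guard denied access only when the caller was not the owner and the mission was not shared, so shared missions never reached the deny branch. The two-branch version below handles shared and per-user missions separately. `MissionOwner`, `Caller`, `can_manage_shared` (standing in for whatever `is_shared_owner()` checks), and `buggy_can_manage` are all assumptions for illustration.

```rust
/// Hypothetical model of mission ownership.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum MissionOwner {
    User(String),
    Shared, // shared/system mission
}

pub struct Caller {
    pub user_id: String,
    /// Stand-in for the real is_shared_owner() check.
    pub can_manage_shared: bool,
}

/// Presumed shape of the old check, kept only for comparison: denying only
/// when "not owner AND not shared" means shared missions always pass.
pub fn buggy_can_manage(owner: &MissionOwner, caller: &Caller) -> bool {
    let is_owner = *owner == MissionOwner::User(caller.user_id.clone());
    let is_shared = *owner == MissionOwner::Shared;
    is_owner || is_shared
}

/// Two-branch check: shared missions need the shared-owner capability,
/// per-user missions need an exact owner match.
pub fn can_manage_mission(owner: &MissionOwner, caller: &Caller) -> bool {
    match owner {
        MissionOwner::Shared => caller.can_manage_shared,
        MissionOwner::User(id) => *id == caller.user_id,
    }
}
```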
* fix(ownership): remove silent cross-tenant credential fallback in WASM wrappers (#2069, #2070)
WASM tool credential resolution silently fell back to looking up secrets under the hardcoded "default" scope when the calling user had no credential configured, leaking the instance owner's API keys to other users without error or audit trail.
- Remove "default" fallback in resolve_host_credentials(); return Err(ToolError::NotAuthorized) with actionable message instead of silently skipping missing credentials
- Fix resolve_websocket_identify_message() to accept owner_scope_id parameter instead of hardcoding "default"
- Document legacy broadcast metadata fallback with removal tracking
- Document setup.rs boot-time owner_id lookups as intentional instance-level resource ownership
- Add regression tests proving cross-tenant credentials do not leak
Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
* fix(review): differentiate expired vs missing credential errors, exclude UrlPath from store check
Address PR review feedback:
- Filter out UrlPath credentials in the no-store check so tools with only UrlPath mappings don't incorrectly get NotAuthorized
- Match SecretError::Expired separately to produce "has expired" message instead of misleading "not found"
- Add tests for both: UrlPath-only no-store (Ok), expired credential (specific error message)
Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
* fix(review): map backend SecretErrors to ExecutionFailed, not NotAuthorized
Address @serrrfirat review: Database, DecryptionFailed, KeychainError, and other backend errors were incorrectly mapped to "not found". Now:
- NotFound → ToolError::NotAuthorized ("not found, configure via secrets set")
- Expired → ToolError::NotAuthorized ("has expired, refresh or re-set")
- All others → ToolError::ExecutionFailed (preserves real cause)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(review): AccessDenied → NotAuthorized, fix issue refs, reduce visibility
- Map SecretError::AccessDenied to ToolError::NotAuthorized (not ExecutionFailed) since it's an authorization failure
- Update legacy fallback comments to reference #2100 (the tracking issue) instead of #2069
- Revert resolve_websocket_identify_message to private; test uses super:: import instead of pub(crate) path
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(review): redact secrets in Debug impl, document all-or-nothing invariant
- Replace #[derive(Debug)] on ResolvedHostCredential with custom impl that redacts secret_value and auth headers to prevent latent leakage
- Add comment documenting that all declared non-UrlPath credentials are required: tool execution fails on first missing credential rather than running with partial auth
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Haiku 4.5 <noreply@anthropic.com>
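The final error mapping and the redacting `Debug` impl can be sketched as follows. Variant names follow the commit messages. The message wording, the `ResolvedHostCredential` fields shown, and the `[REDACTED]` marker are illustrative assumptions.

```rust
use std::fmt;

/// Subset of the SecretError variants named above.
#[derive(Debug)]
pub enum SecretError {
    NotFound,
    Expired,
    AccessDenied,
    Database(String),
    DecryptionFailed,
    KeychainError(String),
}

#[derive(Debug, PartialEq)]
pub enum ToolError {
    NotAuthorized(String),
    ExecutionFailed(String),
}

pub fn map_secret_error(name: &str, err: SecretError) -> ToolError {
    match err {
        SecretError::NotFound => ToolError::NotAuthorized(format!(
            "credential '{}' not found; configure it via secrets set", name)),
        SecretError::Expired => ToolError::NotAuthorized(format!(
            "credential '{}' has expired; refresh or re-set it", name)),
        SecretError::AccessDenied => ToolError::NotAuthorized(format!(
            "access to credential '{}' was denied", name)),
        // Backend failures keep their real cause instead of posing as "not found".
        other => ToolError::ExecutionFailed(format!(
            "failed to read credential '{}': {:?}", name, other)),
    }
}

/// Simplified credential whose Debug output never includes the secret.
pub struct ResolvedHostCredential {
    pub host: String,
    pub secret_value: String,
}

impl fmt::Debug for ResolvedHostCredential {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        f.debug_struct("ResolvedHostCredential")
            .field("host", &self.host)
            .field("secret_value", &"[REDACTED]")
            .finish()
    }
}
```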
…workspace indexing) (#2127)
* chore(engine): rename ENGINE_V2_TRACE to IRONCLAW_RECORD_TRACE
Aligns the trace-recording env var with the project-wide IRONCLAW_* naming convention so it's discoverable alongside other ironclaw flags.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* unify engine v2 trace recording with v1 RecordingLlm
Address PR review feedback. Instead of having two separate trace systems both named IRONCLAW_RECORD_TRACE (the v1 RecordingLlm in src/llm/recording.rs and the engine v2 executor/trace.rs JSON dumper), collapse them to one.
Engine v2's LlmBackend is wired to the host's full LLM provider chain, which already includes RecordingLlm when IRONCLAW_RECORD_TRACE=1. That means engine v2 LLM interactions are already captured by the unified trace_*.json fixture file: no engine-side env var, no second JSON output, no risk of one flag enabling two divergent recorders.
Removed:
- is_trace_enabled() and write_trace() from executor/trace.rs
- the engine_trace_*.json write site in runtime/manager.rs
Kept (still useful, runs unconditionally for the self-improvement mission):
- build_trace() / analyze_trace() / log_trace_summary()
Docs updated to point to RecordingLlm as the single trace mechanism.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ner (#2042)
* test: add Slack E2E tests, Rust integration tests, and smoke runner
Replicate the Telegram test infrastructure for the Slack WASM channel:
- Add Slack URL rewriting in wrapper.rs for test API redirection
- Create fake_slack_api.py mock server for E2E tests
- Add 12 Python E2E tests covering setup, DM, mentions, auth, threads, files
- Add 12 Rust integration tests for WASM channel behavior
- Add conftest.py fixtures for isolated Slack test instances
- Add local smoke test runner for pre-release validation with real Slack
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: wrap env::set_var/remove_var in unsafe blocks for Rust 1.83+
CI uses Rust 1.94, which requires unsafe blocks for std::env::set_var and std::env::remove_var. Wrap the test-only calls in unsafe blocks with safety comments.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: address PR review feedback
- Replace fragile time.time()-1 fallback with explicit SmokeError in run_smoke.py attachment case (reviewer finding #1)
- Add OnceLock<Mutex> guard around env var mutation in wrapper.rs unit test to prevent parallel test races (reviewer finding #2)
- Extract duplicated git-worktree discovery into find_project_file() helper in slack_auth_integration.rs (reviewer finding #3)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* test(channels): generalize WASM HTTP test rewrites
* fix(channels): gate Slack test URL rewrites from release builds
* fix(ci): update wrapper test pairing store ctor
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
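The `OnceLock<Mutex>` guard from the review fixes can be sketched as a helper that serializes environment mutation across parallel tests and restores the previous value afterwards. `with_env_var` is hypothetical. The `unsafe` blocks are required under the Rust 2024 edition, where these functions are `unsafe`, and are merely redundant (a lint warning) under older editions. The SAFETY argument holds only if every test that touches the environment takes the same lock.

```rust
use std::sync::{Mutex, OnceLock};

/// Process-wide lock for tests that mutate environment variables.
fn env_lock() -> &'static Mutex<()> {
    static LOCK: OnceLock<Mutex<()>> = OnceLock::new();
    LOCK.get_or_init(|| Mutex::new(()))
}

/// Run `f` with `key` set to `value`, then restore the previous value.
/// (If `f` panics, the old value is not restored in this simple sketch.)
pub fn with_env_var<T>(key: &str, value: &str, f: impl FnOnce() -> T) -> T {
    // Recover from poisoning so one failed test doesn't block the others.
    let _guard = env_lock().lock().unwrap_or_else(|e| e.into_inner());
    let previous = std::env::var_os(key);
    // SAFETY: assumes every env-touching test in this binary holds env_lock(),
    // so no other thread reads or writes the environment concurrently.
    unsafe { std::env::set_var(key, value) };
    let result = f();
    match previous {
        Some(v) => unsafe { std::env::set_var(key, v) },
        None => unsafe { std::env::remove_var(key) },
    }
    result
}
```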
* chore(ci): add Dependabot and pin GitHub Actions by SHA
Add automated dependency vulnerability scanning via Dependabot for both Cargo crates (weekly) and GitHub Actions (weekly). Pin all 101 external action references across 14 workflow files to full commit SHAs to prevent supply-chain attacks via compromised tags.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(ci): harden workflows: persist-credentials, permissions, template injection
Address zizmor security audit findings:
- Add persist-credentials: false to all checkout steps (artipacked)
- Add explicit minimal permissions to all workflows (excessive-permissions)
- Move workflow-level write permissions to job level where possible
- Fix template injection in regression-test-check.yml by using env vars
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(ci): address PR review comments
- Group Dependabot updates by ecosystem to reduce PR noise (gemini)
- Add persist-credentials: false to docker.yml checkout (Copilot)
- Move inputs.tag and other expansions to env vars in docker.yml to eliminate template injection from workflow_dispatch user input
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(ci): restore git push auth and harden git fetch
Address PR #2043 review comments:
- staging-ci create-promotion-pr: generate App token before checkout and pass it to checkout so 'git push origin "$BRANCH"' works
- staging-ci update-tag: re-enable credential persistence so the 'staging-tested' tag force-push succeeds (job is internal-only)
- release update-registry-checksums: re-enable credential persistence so the checksum-update branch push succeeds
- regression-test-check: add '--' to git fetch to prevent refs that start with '-' from being interpreted as options
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(ci): address serrrfirat PR review comments
- release.yml: move github.ref_name, needs.plan.outputs.tag, and needs.plan.outputs.tag-flag to env vars across plan, build-local-artifacts, build-global-artifacts, and host jobs. The tag pattern '[0-9]+.[0-9]+.[0-9]+*' has a trailing glob, so a tag like '1.2.3\$(curl evil)' could match and be shell-expanded.
- dependabot.yml: split Cargo groups into tokio-ecosystem, serialization, wasm, and everything-else to make regression bisection easier when CI fails on a Dependabot PR.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(ci): add missing job-level permissions for gh CLI calls
- resolve-promotion-base: add pull-requests: read for 'gh pr list'
- gate: add checks: read for 'gh api .../commits/{sha}/check-runs'
Both were dropped when workflow-level permissions moved to job level.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Zaki Manian <zaki@iqlusion.io>
* feat: port ratatui tui onto staging
* Add TUI model picker for /model
* Fix TUI CI lint failures
* Format /tools output as vertical list
* Restore TUI approval modal on thread switch
* Re-emit pending approval events on follow-up messages
* Improve TUI thread handling and activity UI
* Sort TUI resume conversations by activity
* fix(tui): address PR review feedback
* Add TUI thread detail modal for activity sidebar
* feat(tui): improve conversation scrolling UX
- Mouse wheel: 1-line increments (was 3-line jumps)
- PageUp/PageDown: full-page scroll based on viewport height (was 5 lines)
- Add scrollbar widget on conversation right edge (track │, thumb ┃)
- Add "↓ N more ↓ End to return" indicator when scrolled up
- Add auto-follow (pinned_to_bottom) that disengages on scroll-up and re-engages when reaching bottom or pressing End
- Clamp scroll offset to valid range (can't scroll past content)
- Add End key binding to jump to bottom
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(tui): use engine context pressure data for status bar
The context bar was using cumulative session tokens (total_input + total_output), which grow unboundedly across turns, making the bar always show 100% after a few exchanges. Now uses the actual context window usage from ContextPressure events when available, falling back to cumulative tokens only before the first engine update arrives.
Also syncs context_window from the engine's max_tokens so the limit reflects the real model capability instead of name-based heuristics.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(tui): render markdown in thread detail modal
The thread detail modal was displaying raw markdown text (plain line splitting). Now uses render_markdown() for proper formatting of headers, lists, bold, code blocks, etc.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat(tui): hydrate sidebar with engine threads and routines at startup
The TUI sidebar was empty until the first user message because EngineThreadList and RoutineUpdate events were only sent after processing a message. Now sends initial data right before the message loop so the activity panel shows existing threads and routines immediately on startup.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(tui): use owner_id for engine thread hydration at startup
list_engine_threads filters by user_id, so passing "" matched no threads. Now uses self.owner_id(), which matches the TUI channel's user_id, so threads are visible in the sidebar immediately.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(tui): fix CI: type errors and formatting in TUI tests
Wrap `started_at` and `updated_at` in `Some(...)` to match `Option<DateTime<Utc>>` after upstream struct change, and run `cargo fmt` on files with formatting drift.
[skip-regression-check]
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(ci): resolve clippy warnings: collapsible ifs and needless borrow
Collapse three nested `if` blocks into `if && let` chains and remove a needless `&` on the `process_list_threads` call, all in agent_loop.rs.
[skip-regression-check]
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(ci): add live_harness.rs with updated StatusUpdate patterns
The live_harness.rs file was added to staging after this branch diverged. When CI merges the PR into staging, the file uses old StatusUpdate patterns that don't account for the new `detail` and `call_id` fields added by this branch. Add the file with `..` rest patterns to fix the merge-time compile errors.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
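The scrolling rules (1-line wheel steps, page-sized jumps, clamping, and auto-follow) fit in a small state machine. In this sketch, `offset` counts lines scrolled up from the bottom, which is an assumption about how the state is represented. `ScrollState` and its methods are hypothetical, not the TUI's actual types.

```rust
/// offset = lines scrolled up from the bottom (0 = newest content visible).
pub struct ScrollState {
    pub offset: usize,
    pub pinned_to_bottom: bool,
}

impl ScrollState {
    pub fn new() -> Self {
        ScrollState { offset: 0, pinned_to_bottom: true }
    }

    fn max_offset(content_lines: usize, viewport: usize) -> usize {
        content_lines.saturating_sub(viewport)
    }

    /// Scroll up by `lines`: 1 for the mouse wheel, `viewport` for PageUp.
    /// The offset is clamped so the view never scrolls past the content.
    pub fn scroll_up(&mut self, lines: usize, content_lines: usize, viewport: usize) {
        let max = Self::max_offset(content_lines, viewport);
        self.offset = self.offset.saturating_add(lines).min(max);
        self.pinned_to_bottom = self.offset == 0;
    }

    /// Scroll down; reaching the bottom re-engages auto-follow.
    pub fn scroll_down(&mut self, lines: usize) {
        self.offset = self.offset.saturating_sub(lines);
        if self.offset == 0 {
            self.pinned_to_bottom = true;
        }
    }

    /// End key: jump to the bottom and re-enable auto-follow.
    pub fn end(&mut self) {
        self.offset = 0;
        self.pinned_to_bottom = true;
    }

    /// New output arrived: when not pinned, grow the offset so the lines
    /// the user is reading stay in place; when pinned, stay at the bottom.
    pub fn on_new_lines(&mut self, added: usize) {
        if !self.pinned_to_bottom {
            self.offset += added;
        }
    }
}
```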
* feat(engine): add skill repair learning loop
* fix(engine): guard skill repair mission updates
* fix(engine): persist skill repair provenance
* fix(engine): address skill-repair PR review feedback
  - Fix hex formatting: iterate GenericArray bytes individually instead of relying on Display impl which produces debug-like output
  - Always recompute content hash from actual doc.content when archiving a revision to prevent drift from out-of-band writes
  - Prune repair history on rollback to remove records for versions newer than the one being restored
  - Combine collect_error_messages + collect_observed_actions into a single pass (collect_errors_and_actions) to avoid redundant event iteration
  - Document bounded revision eviction policy (cap at 10)
  - Add comment clarifying concurrent skill-repair / error-diagnosis triggers
  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(engine): constrain skill repair updates
* fix(engine): keep insights on completed threads
* style(engine): satisfy fmt and clippy on mission.rs
  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Illia Polosukhin <ilblackdragon@gmail.com>
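Two of the skill-repair review items, the bounded revision history (cap at 10) and pruning repair records newer than the version being restored, fit together in one small structure. The sketch below is hypothetical: `SkillHistory`, `Revision`, `RepairRecord`, and oldest-first eviction are assumptions chosen to illustrate the policy, not the engine's actual types.

```rust
use std::collections::VecDeque;

/// Cap taken from the "cap at 10" note; eviction order is an assumption.
const MAX_REVISIONS: usize = 10;

#[derive(Debug, Clone)]
pub struct Revision {
    pub version: u32,
    pub content: String,
}

#[derive(Debug, Clone)]
pub struct RepairRecord {
    pub version: u32,
    pub note: String,
}

#[derive(Debug, Default)]
pub struct SkillHistory {
    revisions: VecDeque<Revision>,
    repairs: Vec<RepairRecord>,
}

impl SkillHistory {
    /// Archive a revision, evicting the oldest ones once the cap is exceeded.
    pub fn archive(&mut self, rev: Revision) {
        self.revisions.push_back(rev);
        while self.revisions.len() > MAX_REVISIONS {
            self.revisions.pop_front();
        }
    }

    pub fn record_repair(&mut self, rec: RepairRecord) {
        self.repairs.push(rec);
    }

    /// Restore `version`. Repair records for newer versions are dropped so
    /// the history never describes repairs to content that no longer exists.
    /// Returns None (and changes nothing) if the revision was already evicted.
    pub fn rollback(&mut self, version: u32) -> Option<String> {
        let content = self
            .revisions
            .iter()
            .find(|r| r.version == version)?
            .content
            .clone();
        self.repairs.retain(|r| r.version <= version);
        Some(content)
    }

    pub fn revision_count(&self) -> usize {
        self.revisions.len()
    }

    pub fn repair_count(&self) -> usize {
        self.repairs.len()
    }
}
```

Looking the revision up before pruning means a rollback to an evicted version fails without touching the repair history, which keeps the two collections consistent.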
* allow private local llm endpoints
* Fix private endpoint review issues
* Fix link-local clippy warning
* Tighten base URL validation follow-ups
* fix(config): non-blocking DNS validation and admin-only LLM key filtering
  Address PR #1955 review feedback:
  - Wrap to_socket_addrs() in tokio::task::block_in_place when called from a multi-threaded async runtime so the LLM utility handlers (/api/llm/test_connection, /api/llm/list_models) can no longer stall a worker thread on slow DNS. Synchronous callers (env config, CLI) are unaffected.
  - Add ADMIN_ONLY_LLM_SETTING_KEYS + strip_admin_only_llm_keys helper as defense-in-depth: Config::from_db_with_toml and re_resolve_llm_with_secrets now take an is_operator flag and strip admin-only base-URL-bearing keys (llm_builtin_overrides, llm_custom_providers, ollama_base_url, openai_compatible_base_url) from the DB merge for non-operator users. This guards future per-user resolve paths and any pre-existing legacy rows from reactivating a private/loopback endpoint via the operator validation policy. Existing call sites pass true (owner_id is the operator scope).
  Adds regression tests covering:
    * strip_admin_only_llm_keys removes all four keys, leaves others
    * validate_base_url is callable from a multi-thread tokio runtime without panicking on the strict short-circuit path
    * validate_operator_base_url remains callable from async handlers
    * re_resolve_llm filters admin-only keys when is_operator=false and keeps them when is_operator=true
  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* refactor(settings): de-dup admin-only LLM key list (#1955 review)
  `handlers/settings.rs::is_admin_only_setting_key` now delegates to `crate::config::helpers::ADMIN_ONLY_LLM_SETTING_KEYS` so the write-side gate cannot drift from the read-side `strip_admin_only_llm_keys` filter.
  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
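The admin-only key filtering can be sketched with one shared list that both the read-side strip and the write-side check consult. Only the constant name, the helper names, and the four key names come from the commit message; the `HashMap<String, String>` settings type, the `db_settings_for_merge` caller, and the simplified `is_admin_only_setting_key` body are assumptions for illustration.

```rust
use std::collections::HashMap;

/// The four base-URL-bearing keys listed in the commit message. The real
/// constant lives in crate::config::helpers and may have a different type.
pub const ADMIN_ONLY_LLM_SETTING_KEYS: [&str; 4] = [
    "llm_builtin_overrides",
    "llm_custom_providers",
    "ollama_base_url",
    "openai_compatible_base_url",
];

/// Read side: remove every admin-only key from a DB settings map.
pub fn strip_admin_only_llm_keys(settings: &mut HashMap<String, String>) {
    settings.retain(|k, _| !ADMIN_ONLY_LLM_SETTING_KEYS.contains(&k.as_str()));
}

/// Hypothetical merge step: non-operator users never see admin-only keys,
/// while the operator scope (is_operator = true) keeps them.
pub fn db_settings_for_merge(
    mut db: HashMap<String, String>,
    is_operator: bool,
) -> HashMap<String, String> {
    if !is_operator {
        strip_admin_only_llm_keys(&mut db);
    }
    db
}

/// Write side (simplified): delegate to the same list so the two gates
/// cannot drift apart.
pub fn is_admin_only_setting_key(key: &str) -> bool {
    ADMIN_ONLY_LLM_SETTING_KEYS.contains(&key)
}
```

Keeping the list in one constant is the design point of the de-dup refactor: adding a fifth key later updates both the write-side gate and the read-side filter at once.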
* feat(workspace): admin system prompt shared with all users (#2088)
  Introduce SYSTEM.md in a well-known __admin__ scope so admins can set a system prompt that all tenants receive. Gated behind multi-tenant mode (WorkspacePool sets admin_prompt_enabled on each workspace; owner workspace in app.rs also gets the flag when has_any_users() is true).
  New endpoints:
  - GET /api/admin/system-prompt — read admin system prompt
  - PUT /api/admin/system-prompt — set admin system prompt (64 KB limit)
  Safety:
  - SYSTEM.md added to injection scan list
  - is_reserved_scope() guard on user creation (defense-in-depth)
  - Multi-tenancy gate on both API and prompt assembly layers
  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* chore: remove review audit file from tracked files
  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: add 64 KB size limit to admin system prompt PUT handler
  Addresses PR review feedback:
  - Enforce 64 KB limit on system prompt content to prevent token budget exhaustion (the content is injected into every user's system prompt)
  - Add regression tests for the size limit (413 for oversized, not-413 for at-limit)
  - Document that is_multi_tenant is evaluated once at startup and the owner workspace requires a restart after the first user is created
  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: address remaining review feedback on admin system prompt
  - Restore rustdoc comments stripped from document.rs (DocumentMetadata, HygieneMetadata, DocumentVersion, VersionSummary, PatchResult, etc.) to keep the diff focused on feature additions only
  - Replace silent error swallowing (if let Ok) with discriminated match in admin prompt read — only DocumentNotFound is silent, other errors logged at debug! level
  - Cache admin system prompt on WorkspacePool to avoid an extra DB read on every turn; invalidated on PUT via invalidate_admin_prompt()
  - Add cache invalidation integration test
  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(workspace): tighten reserved-scope check and admin-prompt body limit
  - is_reserved_scope: case-insensitive, whitespace-tolerant, and reserves the entire `__*__` namespace so future system scopes (alongside `__admin__`) cannot be impersonated by hand-crafted user IDs
  - admin system-prompt route: layer-level DefaultBodyLimit of 128 KB rejects oversized payloads before JSON parse, complementing the in-handler 64 KB content cap
  - system_prompt put_handler: clarify that the in-handler size check is a clearer-error fallback for the layer cap
  - users_create_handler: drop the dead is_reserved_scope check on a freshly-minted UUID; the guard belongs at a code path that actually accepts user-supplied IDs
  - expand is_reserved_scope tests for case, whitespace, and the wider `__*__` namespace
  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Illia Polosukhin <ilblackdragon@gmail.com>
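The tightened `is_reserved_scope` rules (whitespace-tolerant, case-insensitive, the whole `__*__` namespace) and the in-handler size check (413 above the limit, accepted at the limit) can be sketched as two plain functions. This is an assumption-laden illustration: the minimum length of 4 for the namespace pattern, reading "64 KB" as 64 * 1024 bytes, and returning the status code as a bare `u16` are all choices made here, not details from the PR.

```rust
/// Well-known scope name from the commit message.
pub const ADMIN_SCOPE: &str = "__admin__";

/// Assumed reading of "64 KB": 64 * 1024 bytes of UTF-8 content.
pub const MAX_ADMIN_PROMPT_BYTES: usize = 64 * 1024;

/// Sketch of the tightened check: trim surrounding whitespace, then reserve
/// every `__name__` identifier. Underscores have no case, so the namespace
/// test already covers `__ADMIN__`; the explicit comparison documents intent.
pub fn is_reserved_scope(scope: &str) -> bool {
    let s = scope.trim();
    s.eq_ignore_ascii_case(ADMIN_SCOPE)
        || (s.len() >= 4 && s.starts_with("__") && s.ends_with("__"))
}

/// In-handler size check; Err carries the HTTP status code to return.
/// Content exactly at the limit is accepted, one byte more is rejected.
pub fn check_admin_prompt_size(content: &str) -> Result<(), u16> {
    if content.len() > MAX_ADMIN_PROMPT_BYTES {
        Err(413) // Payload Too Large
    } else {
        Ok(())
    }
}
```

Reserving the whole namespace rather than a fixed list means a future system scope needs no change to this check.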
* Fix skill installs for invalid catalog names
* Fix clippy test module ordering
* fix: address PR review feedback
* fix: use PairingStore::new_noop() in SSRF test after merge with staging
  The staging branch introduced a new test (test_http_request_rejects_private_ip_targets) that calls PairingStore::new(), but this branch changed the signature to require db and cache arguments. Use new_noop() since this is a test context.
  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: address PR #2040 review — remove expect() and dead strip_prefix
  - Restructure download_key flow in skills_install_handler to use the value directly instead of round-tripping through Option + expect(), satisfying the no-expect-in-production-code rule.
  - Remove dead strip_prefix("---\n") in render_skill_md — serde_yml does not emit a leading document marker for structs.
  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(skills): preserve unknown frontmatter and tighten install matching
  - rewrite install-recovery to mutate the `name` field via raw YAML Value rather than re-serializing the typed SkillManifest, so unknown frontmatter keys (vendor extensions, future fields) survive the install rewrite
  - catalog_entry_is_installed: case-insensitive comparison for the display-name and normalized-slug branches, matching the slug branch
  - normalize_skill_identifier: document non-ASCII handling
  - normalizing-invalid-name log: warn -> debug (REPL/TUI rule)
  - add round-trip test asserting unknown top-level keys, nested mappings, and sequences survive install recovery
  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Illia Polosukhin <ilblackdragon@gmail.com>
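The install-matching change makes all three branches of `catalog_entry_is_installed` (slug, display name, normalized slug) case-insensitive. The sketch below shows that shape only. The normalization rule in `normalize_skill_identifier` is hypothetical: the commit says its non-ASCII handling was documented but does not say what it is, so mapping non-ASCII characters to separators is an assumption, as is the `CatalogEntry` struct.

```rust
/// Hypothetical normalization: ASCII letters and digits are lowercased;
/// every other character (including non-ASCII) becomes '-', runs of
/// separators collapse to one, and leading/trailing '-' are trimmed.
pub fn normalize_skill_identifier(name: &str) -> String {
    let mut out = String::new();
    for c in name.chars() {
        if c.is_ascii_alphanumeric() {
            out.push(c.to_ascii_lowercase());
        } else if !out.ends_with('-') {
            out.push('-');
        }
    }
    out.trim_matches('-').to_string()
}

/// Illustrative catalog entry; the real type is not shown in the PR.
pub struct CatalogEntry<'a> {
    pub slug: &'a str,
    pub display_name: &'a str,
}

/// Every branch compares case-insensitively, so "WEB-SEARCH" or
/// "web search" still counts as installed for slug "web-search".
pub fn catalog_entry_is_installed(entry: &CatalogEntry, installed: &[&str]) -> bool {
    let normalized = normalize_skill_identifier(entry.display_name);
    installed.iter().any(|name| {
        name.eq_ignore_ascii_case(entry.slug)
            || name.eq_ignore_ascii_case(entry.display_name)
            || name.eq_ignore_ascii_case(&normalized)
    })
}
```

Note that `eq_ignore_ascii_case` only folds ASCII letters, which is consistent with an ASCII-only normalization like the one assumed here.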
The test used a hyphenated channel name ("test-failing-channel") but
canonicalize_extension_name() converts hyphens to underscores. This
caused configure() to look for "test_failing_channel.capabilities.json"
which didn't exist, returning an early Err before reaching the
activation code path the test was designed to exercise.
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
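The mismatch described in this fix comes down to one name transformation: the test registered "test-failing-channel", but configure() looked up the capabilities file under the canonical name. The sketch below reproduces only the hyphen-to-underscore step the commit mentions; the real `canonicalize_extension_name()` may do more, and `capabilities_file_name` is a hypothetical helper.

```rust
/// Simplified stand-in for canonicalize_extension_name(): only the
/// hyphen-to-underscore step described in the commit message.
pub fn canonicalize_extension_name(name: &str) -> String {
    name.replace('-', "_")
}

/// Hypothetical helper: the capabilities file configure() would look for.
pub fn capabilities_file_name(channel: &str) -> String {
    format!("{}.capabilities.json", canonicalize_extension_name(channel))
}
```

Both spellings resolve to the same file, so a test fixture has to write `test_failing_channel.capabilities.json` (or use an underscore name from the start) to reach the activation path.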
…1770 chore: promote staging to staging-promote/bb2c3e1d-24154330911 (2026-04-08 21:41 UTC)
…6101 chore: promote staging to staging-promote/6895cdad-24185214226 (2026-04-09 13:31 UTC)
…4226 chore: promote staging to staging-promote/63a48e4e-24182836482 (2026-04-09 10:24 UTC)
…6482 chore: promote staging to staging-promote/288fe49a-24110798843 (2026-04-09 09:26 UTC)
…8843 chore: promote staging to staging-promote/79c1b0fd-24108317021 (2026-04-08 00:16 UTC)
…7021 chore: promote staging to staging-promote/86c15903-24100112892 (2026-04-07 22:53 UTC)
…2892 chore: promote staging to staging-promote/00fd2e88-24092158668 (2026-04-07 19:23 UTC)
…8668 chore: promote staging to staging-promote/f765958f-24078644272 (2026-04-07 16:22 UTC)
b0f4a2b into staging-promote/13774cc0-24076446124
12 of 14 checks passed
Auto-promotion from staging CI
Batch range: a55aff980a4e235590c3af57ded2542512e2f9f6..f765958fc3edb44096061039458f9414681fdfa7
Promotion branch: staging-promote/f765958f-24078644272
Base: staging-promote/13774cc0-24076446124
Triggered by: Staging CI batch at 2026-04-07 11:19 UTC
Commits in this batch (42):
Current commits in this promotion (1)
Current base: staging-promote/13774cc0-24076446124
Current head: staging-promote/f765958f-24078644272
Current range: origin/staging-promote/13774cc0-24076446124..origin/staging-promote/f765958f-24078644272
Auto-updated by staging promotion metadata workflow
Waiting for gates:
Auto-created by staging-ci workflow