chore: promote staging to staging-promote/ab38a0b2-24673295299 (2026-04-20 15:35 UTC)#2749
Conversation
* fix(gate): handle orphaned approval gates when thread is deleted (#2323) When a thread is deleted while an approval gate is pending, the gate becomes orphaned -- unresolvable on backend and undismissable on frontend. This fixes three root causes: 1. execute_pending_gate_action now detects missing threads and emits a gate_resolved event with resolution "expired" instead of erroring, so the frontend can dismiss the stale card. 2. PendingGateStore gains discard_for_thread() for bulk cleanup of all gates tied to a specific thread, regardless of user. 3. Frontend handleGateResolved now handles "expired" resolution to remove stale approval cards and re-enable chat input. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(gate): split Ok(None)/Err(_) arms and wire discard_for_thread into production Address review feedback on #2347: 1. Split the conflated `Ok(None) | Err(_)` match arm in `execute_pending_gate_action`: `Ok(None)` means the thread was genuinely deleted (emit "expired" gate resolution), while `Err(_)` indicates a transient DB failure (propagate the error so the caller can retry instead of permanently discarding the gate). 2. Wire `discard_for_thread()` into the conversation clear path (`clear_engine_conversation`), replacing the per-user `discard()` call. This ensures all pending gates for a thread are cleaned up regardless of which user created them, preventing orphaned gates. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(gate): pre-flight thread check before persisting AlwaysAllow Address review feedback on #2347: 1. **codex P2 — AlwaysAllow silent commit on thread-delete race.** When `resolve_gate` hit `Approved { always: true }`, it persisted `AlwaysAllow` to DB *before* calling `execute_pending_gate_action`. The rollback branch only fires on `result.is_err()`, but the thread-missing path now returns `Ok(BridgeOutcome::Respond(...))` for graceful `expired` UX — so the preference would stick even though the tool never ran. Added a pre-flight `load_thread` check in the `Approved` arm that short-circuits with `emit_gate_expired_dismissal` before any auto-approve / DB write occurs. 2. **Rebase onto staging (`scope_thread_id: Option<ExternalThreadId>`).** The PR predated #2561/#2473's identity-type tightening. The inline `pending.scope_thread_id.clone().or_else(|| Some(pending.thread_id.to_string()))` no longer type-checks against `Option<String>` on the wire field. Replaced with `Some(pending.effective_wire_thread_id())` — matches the five other `GateResolved` emit sites and satisfies the "don't re-derive identity values" invariant in `src/bridge/CLAUDE.md`. 3. **Extracted `emit_gate_expired_dismissal`** so the expired-SSE path is shared between the pre-flight check and `execute_pending_gate_action`'s `Ok(None)` arm. Documented the pre-flight contract on the helper itself. 4. **Regression test** `resolve_gate_approved_with_missing_thread_emits_expired_and_skips_persist` drives `resolve_gate` at the caller level with `Approved { always: true }` and no thread in the store. Asserts the first SSE event is `expired` (not `approved_always`), satisfying `.claude/rules/testing.md` → "Test Through the Caller, Not Just the Helper". Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Co-authored-by: Illia Polosukhin <ilblackdragon@gmail.com>
* docs(rules): add review-driven guidance for Claude Code Synthesizes recurring patterns from ~30 merged PRs, 147 bot review comments (Copilot/Gemini), human reviews, and ~50 issues filed in the past 2 weeks. Each rule cites the motivating PR/issue numbers. New files: - error-handling.md — silent-failure taxonomy (unwrap_or_default, .ok()?, poisoned caches), persist-then-reload atomicity, channel-edge error mapping. (#2526, #2633, #2653, #2673, #2546, #2407, #2408) - agent-evidence.md — side-effect claims must cite tool evidence, empty-fast outputs are errors, external-effect tools must read back, setup UI round-trip. (#2544, #2580, #2582, #2541, #2545, #2411, #2543, #2586) - lifecycle.md — discovery vs. activation, terminal auth rejection, list_installed vs. list_active, deactivation unwinds, snapshot rehydrate must re-validate. (#2556, #2557, #2558, #2564, #2419, PR #2617, PR #2631) Extended: - types.md — from_trusted boundary rule, validated-newtype template with shared validate(&str), serde(try_from) required for validated types, wire-stable enums (no Debug; serde alias for migrations), canonical wire-contract field naming. (PR #2685, #2681, #2687, #2678, #2669, #2665, #2683, #2702) - safety-and-sandbox.md — every new ingress scans pre-transform/pre- injection, bounded resources (interners/streams/fan-out caps), cache keys must be complete. (#2491, #2676, #2470, #2633, #2673, #2710, PR #2702) - review-discipline.md — PR scope discipline, guardrail scripts are code (regression tests, grouped-import parsing, CI has_code inclusion), absolute-path ban in committed docs, stale comments after refactors. (PR #2668, #2628, #2680, #2687, #2647, #2689, #2701) All new files carry paths: frontmatter so they auto-load only on matching files. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * refactor(rules): split agent-evidence into prompt + code rule agent-evidence.md mixed two concerns: runtime agent instruction (what the LLM should do when concluding a turn) and code-enforcement rules (what the dispatcher, engine, and tools must implement). Rules under .claude/rules/ only guide Claude Code when editing the repo — the runtime agent never reads them. Splits the two: - crates/ironclaw_engine/prompts/codeact_postamble.md — new section "Evidence before claiming side effects". Sits next to the existing "FINAL() answer quality" guidance; loaded via include_str! in executor/prompt.rs (no Rust change needed). - .claude/rules/tool-evidence.md — renamed from agent-evidence.md, keeps only the code invariants (engine v2 side-effect gate, empty-fast ToolError::EmptyResult, external-effect tools must read back, setup UI round-trip). Prompt tests pass unchanged; the postamble addition is pure text. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * prompt: tighten evidence rule to FINAL() claims only, not tool use Live-test validation of the "Evidence before claiming side effects" section (added in the prior commit) showed it inhibited legitimate tool use. With the original wording, `zizmor_scan_v2` live-recording timed out at 302s with zero responses; reverting the postamble restored healthy behavior (88s run, 8 shell calls including `cargo install zizmor` and full workflow analysis). The original phrasing conflated two things: what the agent should claim and what tools it should call. The rule is only about the claim. Re-tunes the section to: - Open with an explicit "this does not restrict tool calls" scope. - Drop the "<1ms = failure" heuristic (too broad — normal tools like `tool_info(schema)` are legitimately fast). - Drop the full enumeration of forbidden side-effect verbs; keep the rule narrower and clearer. - Shorten the code example (remove redundant early-return). Re-tuned run: agent is active (shell calls, real reasoning), live recording completes in ~9s. The remaining test failure is a pre-existing assertion bug (exact `t == "shell"` match against tool strings that now carry arguments like `"shell(cmd)"`) — reproduces with the old postamble too. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(live): fix tool-name assertions + re-record zizmor traces The two `zizmor_scan*` live tests had four broken tool-name assertions that silently failed to match: `tools.iter().any(|t| t == "shell")` against a tool list that now contains `"shell(cmd)"` strings (tool events carry args via `format_action_display_name` in `src/bridge/router.rs`). Two of the four were negative assertions checking for the absence of `tool_install` recovery loops — those silently passed even when a recovery loop actually ran. `sandbox_live_e2e.rs:203` already used the correct `t == "shell" || t.starts_with("shell(")` pattern; applied it consistently to all four sites. Verified live: - `IRONCLAW_LIVE_TEST=1 cargo test --test e2e_live -- zizmor_scan --ignored --test-threads=1` → 2 passed, 0 failed, 51.78s. Agent installs and runs zizmor end-to-end, producing real findings (exit code 14, dangerous triggers, excessive permissions, etc.). Traces re-recorded with the tuned postamble (commit 50d8517) and scrubbed: replaced `/home/illia/.cargo/bin/zizmor` with `/home/user/.cargo/bin/zizmor` per the developer-local-path ban in `.claude/rules/review-discipline.md`. No credentials, PII, or high-entropy secrets in either trace (only git SHAs from zizmor's workflow analysis output). Replay still passes: `cargo test --test e2e_live -- zizmor_scan --ignored` → 2/2 ok. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(replay): update zizmor_scan_v2 insta snapshot The engine v2 replay-snapshot gate (`engine_v2_tests::snapshot_zizmor_scan_v2`) failed against the re-recorded trace from 1691efe because the old snapshot encoded a broken run: - final_state: Failed - Missing Assistant message role - 3 issues: thread_failure (error), no_response (warning), llm_error (error) - 6 tool calls that never produced a final answer The new trace completes cleanly: - final_state: Done - System / User / Assistant roles present - 1 issue: mixed_mode (info) - 3 shell tool calls + successful `FINAL()` with real findings The snapshot was pinning a regression. Regenerated with `INSTA_UPDATE=always cargo test --test e2e_engine_v2 -- snapshot_zizmor_scan_v2`; passes on replay. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(rules): address PR #2714 review feedback - review-discipline: reword "Doc Absolute Paths" as a review convention (pre-commit only scans .rs; the rule misleadingly claimed enforcement). - safety-and-sandbox: broaden `paths:` frontmatter to include the actual ingress owners (`bridge`, `channels`, `workspace`, `agent`, engine crate) so the rule auto-loads where it applies. - tool-evidence: mark the side-effect gate, empty-fast rule, and `unverified` flag as target/aspirational invariants — neither `ToolError::EmptyResult`, an `unverified` field on `ToolOutput`, nor a byte-count field on `ActionRecord` exist today. Point at concrete interim conventions (`ToolError::ExecutionFailed`, `unverified: true` in the JSON result body). - types: scope "Validated newtypes must gate Deserialize" to *new* types, document the `CredentialName`/`ExtensionName` exception (they intentionally use `#[serde(transparent)]` + derived `Deserialize` under the `serde_does_not_revalidate` test). Clarify the `from_trusted` trust boundary (trusted = typed upstream, untrusted = raw JSON field even if the field *name* is "registry entry"). Switch `new` template to `impl Into<String>` to avoid an unnecessary clone when an owned `String` is passed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(rules): simplify types.md + split doc-hygiene; address review round 2 - types: collapse two templates into one canonical validated-newtype shape. New types use `#[serde(try_from = "String")]` with a shared `validate(&str)` helper — no more dual "transparent for some / try_from for others" guidance. `CredentialName`/`ExtensionName` are documented as the sole legacy exception (locked in by the `serde_does_not_revalidate` test); new code must not copy their `transparent` + `from_trusted` pattern. Removes the long "Using `from_trusted` safely" section and the separate "Validated newtypes must gate Deserialize" subsection that contradicted the Don'ts list. - doc-hygiene: new tiny rule file scoped to `**/*.md`, `**/*.py`, `docs/**` that carries the "no developer-local absolute paths in committed docs" convention. Removed from review-discipline.md where its `src/**/*.rs` scope meant the rule never loaded on the files it governed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(live): match hyphenated tool-install in attempted_relevant_tool The engine records `action_name` as the raw string the LLM emitted (`crates/ironclaw_engine/src/executor/structured.rs:381`), and the registry's lookup canonicalization only affects dispatch — not the name that reaches `StatusUpdate::ToolStarted`. The two other predicates in this file (`bad_recovery` at :420, `phase_b_recovery` at :531) already defend against both forms; this one should too, for consistency. Addresses PR #2714 review. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
#2742) * ci: validate Cargo.toml version before use in Docker workflows (#1901) Reject Cargo.toml versions that don't match strict semver before they reach any shell context, and stop splicing `${{ }}` expressions directly into `run:` blocks in Summary steps — pass values via `env:` and reference as shell variables instead. Also validate `inputs.tag` against Docker tag grammar in docker.yml. Closes #1901 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * ci: reject SemVer build metadata in Docker workflow version validator Docker tags forbid '+', so accepting SemVer build metadata in the validator would pass values like '1.2.3+build.7' through only to fail at docker push. Tighten the regex to MAJOR.MINOR.PATCH[-prerelease] and spell out the constraint in the error message. Addresses PR #2742 review feedback. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three parallel wiring gaps in the engine v2 Missions tab — each one was code that only handled the Projects drill-in (cr-*) path and silently no-op'd on the standalone tab, so the surface looked empty or frozen even when the backend returned data: - switchTab had no branch for 'missions', so opening the tab rendered the panel shell but never fetched data. /api/engine/missions returned rows; the table stayed empty. - close-mission-detail only hid the cr-detail drawer, so the Back button on the standalone detail view did nothing. - refreshMissionView refreshed the detail view or the project drill-in but not the Missions list, so fire/pause/resume actions succeeded but the list stayed stale. [skip-regression-check]
Code reviewFound 3 issues:
In The pre-flight check was added to prevent race condition #2347 (thread deleted between gate resolution and persistence). While the duplicate load is a deliberate trade-off for safety, consider refactoring to pass the loaded thread through to https://github.com/anthropics/ironclaw/blob/4577d0e82f4c0542a8c9f84a1e1e28b6e7bea72b/src/bridge/router.rs#L2461-L2467
In While this is in a cleanup path (not a user-request hot path), adding a timeout or spawning persistence removals concurrently would be more defensive.
Three new rule files (error-handling.md, lifecycle.md, types.md) establish best practices but lack CI enforcement:
These are guidance documents and acceptable as aspirational rules, but without automation (as noted in review-discipline.md "Guardrail Scripts Are Code"), they'll drift during reviews. No security vulnerabilities detected. Code changes for #2347 (orphaned gates), #2742 (Cargo version), and #2745 (missions tab) are well-tested and low-risk. |
Auto-promotion from staging CI
Batch range:
7fb41555a9e55677d1aaea29ca567a5b369c2b05..4577d0e82f4c0542a8c9f84a1e1e28b6e7bea72bPromotion branch:
staging-promote/4577d0e8-24675460727Base:
staging-promote/ab38a0b2-24673295299Triggered by: Staging CI batch at 2026-04-20 15:35 UTC
Commits in this batch (47):
onboardfails with "Failed to save settings to database", butironclawstarts successfully and applies migrations #846) (fix(setup): run migrations during onboard when DATABASE_URL preset (#846) #2309)Current commits in this promotion (4)
Current base:
staging-promote/ab38a0b2-24673295299Current head:
staging-promote/4577d0e8-24675460727Current range:
origin/staging-promote/ab38a0b2-24673295299..origin/staging-promote/4577d0e8-24675460727Auto-updated by staging promotion metadata workflow
Waiting for gates:
Auto-created by staging-ci workflow