Skip to content

chore: promote staging to staging-promote/ab38a0b2-24673295299 (2026-04-20 15:35 UTC)#2749

Merged
henrypark133 merged 4 commits intomainfrom
staging-promote/4577d0e8-24675460727
Apr 21, 2026
Merged

chore: promote staging to staging-promote/ab38a0b2-24673295299 (2026-04-20 15:35 UTC)#2749
henrypark133 merged 4 commits intomainfrom
staging-promote/4577d0e8-24675460727

Conversation

@ironclaw-ci
Copy link
Copy Markdown
Contributor

@ironclaw-ci ironclaw-ci Bot commented Apr 20, 2026

Auto-promotion from staging CI

Batch range: 7fb41555a9e55677d1aaea29ca567a5b369c2b05..4577d0e82f4c0542a8c9f84a1e1e28b6e7bea72b
Promotion branch: staging-promote/4577d0e8-24675460727
Base: staging-promote/ab38a0b2-24673295299
Triggered by: Staging CI batch at 2026-04-20 15:35 UTC

Commits in this batch (47):

Current commits in this promotion (4)

Current base: staging-promote/ab38a0b2-24673295299
Current head: staging-promote/4577d0e8-24675460727
Current range: origin/staging-promote/ab38a0b2-24673295299..origin/staging-promote/4577d0e8-24675460727

Auto-updated by staging promotion metadata workflow

Waiting for gates:

  • Tests: pending
  • E2E: pending
  • Claude Code review: pending (will post comments on this PR)

Auto-created by staging-ci workflow

zmanian and others added 4 commits April 20, 2026 23:55
* fix(gate): handle orphaned approval gates when thread is deleted (#2323)

When a thread is deleted while an approval gate is pending, the gate
becomes orphaned -- unresolvable on backend and undismissable on
frontend. This fixes three root causes:

1. execute_pending_gate_action now detects missing threads and emits a
   gate_resolved event with resolution "expired" instead of erroring,
   so the frontend can dismiss the stale card.

2. PendingGateStore gains discard_for_thread() for bulk cleanup of all
   gates tied to a specific thread, regardless of user.

3. Frontend handleGateResolved now handles "expired" resolution to
   remove stale approval cards and re-enable chat input.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(gate): split Ok(None)/Err(_) arms and wire discard_for_thread into production

Address review feedback on #2347:

1. Split the conflated `Ok(None) | Err(_)` match arm in
   `execute_pending_gate_action`: `Ok(None)` means the thread was
   genuinely deleted (emit "expired" gate resolution), while `Err(_)`
   indicates a transient DB failure (propagate the error so the caller
   can retry instead of permanently discarding the gate).

2. Wire `discard_for_thread()` into the conversation clear path
   (`clear_engine_conversation`), replacing the per-user `discard()`
   call. This ensures all pending gates for a thread are cleaned up
   regardless of which user created them, preventing orphaned gates.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(gate): pre-flight thread check before persisting AlwaysAllow

Address review feedback on #2347:

1. **codex P2 — AlwaysAllow silent commit on thread-delete race.**
   When `resolve_gate` hit `Approved { always: true }`, it persisted
   `AlwaysAllow` to DB *before* calling `execute_pending_gate_action`.
   The rollback branch only fires on `result.is_err()`, but the
   thread-missing path now returns `Ok(BridgeOutcome::Respond(...))` for
   graceful `expired` UX — so the preference would stick even though the
   tool never ran. Added a pre-flight `load_thread` check in the
   `Approved` arm that short-circuits with `emit_gate_expired_dismissal`
   before any auto-approve / DB write occurs.

2. **Rebase onto staging (`scope_thread_id: Option<ExternalThreadId>`).**
   The PR predated #2561/#2473's identity-type tightening. The inline
   `pending.scope_thread_id.clone().or_else(|| Some(pending.thread_id.to_string()))`
   no longer type-checks against `Option<String>` on the wire field.
   Replaced with `Some(pending.effective_wire_thread_id())` — matches
   the five other `GateResolved` emit sites and satisfies the
   "don't re-derive identity values" invariant in
   `src/bridge/CLAUDE.md`.

3. **Extracted `emit_gate_expired_dismissal`** so the expired-SSE path
   is shared between the pre-flight check and
   `execute_pending_gate_action`'s `Ok(None)` arm. Documented the
   pre-flight contract on the helper itself.

4. **Regression test**
   `resolve_gate_approved_with_missing_thread_emits_expired_and_skips_persist`
   drives `resolve_gate` at the caller level with `Approved { always: true }`
   and no thread in the store. Asserts the first SSE event is `expired`
   (not `approved_always`), satisfying `.claude/rules/testing.md` →
   "Test Through the Caller, Not Just the Helper".

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Illia Polosukhin <ilblackdragon@gmail.com>
* docs(rules): add review-driven guidance for Claude Code

Synthesizes recurring patterns from ~30 merged PRs, 147 bot review
comments (Copilot/Gemini), human reviews, and ~50 issues filed in the
past 2 weeks. Each rule cites the motivating PR/issue numbers.

New files:
- error-handling.md — silent-failure taxonomy (unwrap_or_default, .ok()?,
  poisoned caches), persist-then-reload atomicity, channel-edge error
  mapping. (#2526, #2633, #2653, #2673, #2546, #2407, #2408)
- agent-evidence.md — side-effect claims must cite tool evidence,
  empty-fast outputs are errors, external-effect tools must read back,
  setup UI round-trip. (#2544, #2580, #2582, #2541, #2545, #2411, #2543,
  #2586)
- lifecycle.md — discovery vs. activation, terminal auth rejection,
  list_installed vs. list_active, deactivation unwinds, snapshot
  rehydrate must re-validate. (#2556, #2557, #2558, #2564, #2419,
  PR #2617, PR #2631)

Extended:
- types.md — from_trusted boundary rule, validated-newtype template with
  shared validate(&str), serde(try_from) required for validated types,
  wire-stable enums (no Debug; serde alias for migrations), canonical
  wire-contract field naming. (PR #2685, #2681, #2687, #2678, #2669,
  #2665, #2683, #2702)
- safety-and-sandbox.md — every new ingress scans pre-transform/pre-
  injection, bounded resources (interners/streams/fan-out caps), cache
  keys must be complete. (#2491, #2676, #2470, #2633, #2673, #2710,
  PR #2702)
- review-discipline.md — PR scope discipline, guardrail scripts are
  code (regression tests, grouped-import parsing, CI has_code inclusion),
  absolute-path ban in committed docs, stale comments after refactors.
  (PR #2668, #2628, #2680, #2687, #2647, #2689, #2701)

All new files carry paths: frontmatter so they auto-load only on
matching files.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(rules): split agent-evidence into prompt + code rule

agent-evidence.md mixed two concerns: runtime agent instruction (what
the LLM should do when concluding a turn) and code-enforcement rules
(what the dispatcher, engine, and tools must implement). Rules under
.claude/rules/ only guide Claude Code when editing the repo — the
runtime agent never reads them.

Splits the two:

- crates/ironclaw_engine/prompts/codeact_postamble.md — new section
  "Evidence before claiming side effects". Sits next to the existing
  "FINAL() answer quality" guidance; loaded via include_str! in
  executor/prompt.rs (no Rust change needed).
- .claude/rules/tool-evidence.md — renamed from agent-evidence.md,
  keeps only the code invariants (engine v2 side-effect gate,
  empty-fast ToolError::EmptyResult, external-effect tools must read
  back, setup UI round-trip).

Prompt tests pass unchanged; the postamble addition is pure text.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* prompt: tighten evidence rule to FINAL() claims only, not tool use

Live-test validation of the "Evidence before claiming side effects"
section (added in the prior commit) showed it inhibited legitimate
tool use. With the original wording, `zizmor_scan_v2` live-recording
timed out at 302s with zero responses; reverting the postamble
restored healthy behavior (88s run, 8 shell calls including
`cargo install zizmor` and full workflow analysis).

The original phrasing conflated two things: what the agent should
claim and what tools it should call. The rule is only about the
claim. Re-tunes the section to:

- Open with an explicit "this does not restrict tool calls" scope.
- Drop the "<1ms = failure" heuristic (too broad — normal tools like
  `tool_info(schema)` are legitimately fast).
- Drop the full enumeration of forbidden side-effect verbs; keep the
  rule narrower and clearer.
- Shorten the code example (remove redundant early-return).

Re-tuned run: agent is active (shell calls, real reasoning), live
recording completes in ~9s. The remaining test failure is a
pre-existing assertion bug (exact `t == "shell"` match against tool
strings that now carry arguments like `"shell(cmd)"`) — reproduces
with the old postamble too.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(live): fix tool-name assertions + re-record zizmor traces

The two `zizmor_scan*` live tests had four broken tool-name assertions
that silently failed to match: `tools.iter().any(|t| t == "shell")`
against a tool list that now contains `"shell(cmd)"` strings (tool
events carry args via `format_action_display_name` in
`src/bridge/router.rs`). Two of the four were negative assertions
checking for the absence of `tool_install` recovery loops — those
silently passed even when a recovery loop actually ran. `sandbox_live_e2e.rs:203`
already used the correct `t == "shell" || t.starts_with("shell(")`
pattern; applied it consistently to all four sites.

Verified live:

- `IRONCLAW_LIVE_TEST=1 cargo test --test e2e_live -- zizmor_scan --ignored --test-threads=1`
  → 2 passed, 0 failed, 51.78s. Agent installs and runs zizmor
  end-to-end, producing real findings (exit code 14, dangerous
  triggers, excessive permissions, etc.).

Traces re-recorded with the tuned postamble (commit 50d8517) and
scrubbed: replaced `/home/illia/.cargo/bin/zizmor` with
`/home/user/.cargo/bin/zizmor` per the developer-local-path ban in
`.claude/rules/review-discipline.md`. No credentials, PII, or
high-entropy secrets in either trace (only git SHAs from zizmor's
workflow analysis output).

Replay still passes: `cargo test --test e2e_live -- zizmor_scan --ignored`
→ 2/2 ok.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(replay): update zizmor_scan_v2 insta snapshot

The engine v2 replay-snapshot gate (`engine_v2_tests::snapshot_zizmor_scan_v2`)
failed against the re-recorded trace from 1691efe because the old
snapshot encoded a broken run:

- final_state: Failed
- Missing Assistant message role
- 3 issues: thread_failure (error), no_response (warning), llm_error (error)
- 6 tool calls that never produced a final answer

The new trace completes cleanly:

- final_state: Done
- System / User / Assistant roles present
- 1 issue: mixed_mode (info)
- 3 shell tool calls + successful `FINAL()` with real findings

The snapshot was pinning a regression. Regenerated with
`INSTA_UPDATE=always cargo test --test e2e_engine_v2 -- snapshot_zizmor_scan_v2`;
passes on replay.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(rules): address PR #2714 review feedback

- review-discipline: reword "Doc Absolute Paths" as a review convention
  (pre-commit only scans .rs; the rule misleadingly claimed enforcement).
- safety-and-sandbox: broaden `paths:` frontmatter to include the actual
  ingress owners (`bridge`, `channels`, `workspace`, `agent`, engine
  crate) so the rule auto-loads where it applies.
- tool-evidence: mark the side-effect gate, empty-fast rule, and
  `unverified` flag as target/aspirational invariants — neither
  `ToolError::EmptyResult`, an `unverified` field on `ToolOutput`, nor a
  byte-count field on `ActionRecord` exist today. Point at concrete
  interim conventions (`ToolError::ExecutionFailed`, `unverified: true`
  in the JSON result body).
- types: scope "Validated newtypes must gate Deserialize" to *new*
  types, document the `CredentialName`/`ExtensionName` exception (they
  intentionally use `#[serde(transparent)]` + derived `Deserialize`
  under the `serde_does_not_revalidate` test). Clarify the
  `from_trusted` trust boundary (trusted = typed upstream, untrusted =
  raw JSON field even if the field *name* is "registry entry").
  Switch `new` template to `impl Into<String>` to avoid an unnecessary
  clone when an owned `String` is passed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(rules): simplify types.md + split doc-hygiene; address review round 2

- types: collapse two templates into one canonical validated-newtype
  shape. New types use `#[serde(try_from = "String")]` with a shared
  `validate(&str)` helper — no more dual "transparent for some /
  try_from for others" guidance. `CredentialName`/`ExtensionName` are
  documented as the sole legacy exception (locked in by the
  `serde_does_not_revalidate` test); new code must not copy their
  `transparent` + `from_trusted` pattern. Removes the long "Using
  `from_trusted` safely" section and the separate "Validated newtypes
  must gate Deserialize" subsection that contradicted the Don'ts list.
- doc-hygiene: new tiny rule file scoped to `**/*.md`, `**/*.py`,
  `docs/**` that carries the "no developer-local absolute paths in
  committed docs" convention. Removed from review-discipline.md where
  its `src/**/*.rs` scope meant the rule never loaded on the files it
  governed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(live): match hyphenated tool-install in attempted_relevant_tool

The engine records `action_name` as the raw string the LLM emitted
(`crates/ironclaw_engine/src/executor/structured.rs:381`), and the
registry's lookup canonicalization only affects dispatch — not the
name that reaches `StatusUpdate::ToolStarted`. The two other predicates
in this file (`bad_recovery` at :420, `phase_b_recovery` at :531)
already defend against both forms; this one should too, for
consistency. Addresses PR #2714 review.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
#2742)

* ci: validate Cargo.toml version before use in Docker workflows (#1901)

Reject Cargo.toml versions that don't match strict semver before they
reach any shell context, and stop splicing `${{ }}` expressions directly
into `run:` blocks in Summary steps — pass values via `env:` and
reference as shell variables instead. Also validate `inputs.tag`
against Docker tag grammar in docker.yml.

Closes #1901

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* ci: reject SemVer build metadata in Docker workflow version validator

Docker tags forbid '+', so accepting SemVer build metadata in the
validator would pass values like '1.2.3+build.7' through only to fail
at docker push. Tighten the regex to MAJOR.MINOR.PATCH[-prerelease] and
spell out the constraint in the error message.

Addresses PR #2742 review feedback.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three parallel wiring gaps in the engine v2 Missions tab — each one was
code that only handled the Projects drill-in (cr-*) path and silently
no-op'd on the standalone tab, so the surface looked empty or frozen
even when the backend returned data:

- switchTab had no branch for 'missions', so opening the tab rendered
  the panel shell but never fetched data. /api/engine/missions returned
  rows; the table stayed empty.
- close-mission-detail only hid the cr-detail drawer, so the Back
  button on the standalone detail view did nothing.
- refreshMissionView refreshed the detail view or the project drill-in
  but not the Missions list, so fire/pause/resume actions succeeded but
  the list stayed stale.

[skip-regression-check]
@github-actions github-actions Bot added scope: ci CI/CD workflows scope: docs Documentation size: L 200-499 changed lines risk: medium Business logic, config, or moderate-risk modules contributor: core 20+ merged PRs labels Apr 20, 2026
@claude
Copy link
Copy Markdown

claude Bot commented Apr 20, 2026

Code review

Found 3 issues:

  1. [HIGH:75] Duplicate thread load in approval path with auto-allow

In src/bridge/router.rs, the resolve_gate() function performs a pre-flight thread existence check before persisting an AlwaysAllow preference (lines 2450-2467), then immediately calls execute_pending_gate_action() which loads the same thread again (line 996-1002). This adds a redundant DB round-trip on every approval with auto-approval flag set.

The pre-flight check was added to prevent race condition #2347 (thread deleted between gate resolution and persistence). While the duplicate load is a deliberate trade-off for safety, consider refactoring to pass the loaded thread through to execute_pending_gate_action() to eliminate the redundancy.

https://github.com/anthropics/ironclaw/blob/4577d0e82f4c0542a8c9f84a1e1e28b6e7bea72b/src/bridge/router.rs#L2461-L2467
https://github.com/anthropics/ironclaw/blob/4577d0e82f4c0542a8c9f84a1e1e28b6e7bea72b/src/bridge/router.rs#L996-L1002

  1. [MEDIUM:70] Sequential persistence removal without backpressure

In src/gate/store.rs::discard_for_thread() (lines 245-251), when persistence is configured, gate keys are removed sequentially with no parallelism, timeout, or error budget. If the persistence layer is slow or hung, this blocks the clear_engine_conversation() call. The loop continues on individual failures (errors logged at debug level), but there's no test coverage for the persistence-configured code path.

While this is in a cleanup path (not a user-request hot path), adding a timeout or spawning persistence removals concurrently would be more defensive.

https://github.com/anthropics/ironclaw/blob/4577d0e82f4c0542a8c9f84a1e1e28b6e7bea72b/src/gate/store.rs#L245-L251

  1. [MEDIUM:65] Documentation rule enforcement gaps

Three new rule files (error-handling.md, lifecycle.md, types.md) establish best practices but lack CI enforcement:

  • error-handling.md documents four silent-failure anti-patterns (.unwrap_or_default, .ok()?, etc.) but references them by issue number without corresponding grep patterns or automated checks in CI.
  • lifecycle.md rule feat: adding Web UI #2 ("Terminal auth rejection") requires implementation changes to WASM channel reconnect logic that aren't shown in this diff.
  • types.md significantly changes the canonical newtype shape from #[serde(transparent)] + from_trusted to #[serde(try_from = "String")] but marks CredentialName and ExtensionName as legacy exceptions. The boundary between new code expectations and legacy exceptions could be clearer.

These are guidance documents and acceptable as aspirational rules, but without automation (as noted in review-discipline.md "Guardrail Scripts Are Code"), they'll drift during reviews.


No security vulnerabilities detected. Code changes for #2347 (orphaned gates), #2742 (Cargo version), and #2745 (missions tab) are well-tested and low-risk.

Base automatically changed from staging-promote/ab38a0b2-24673295299 to main April 21, 2026 03:18
@henrypark133 henrypark133 merged commit 4577d0e into main Apr 21, 2026
55 of 68 checks passed
@henrypark133 henrypark133 deleted the staging-promote/4577d0e8-24675460727 branch April 21, 2026 03:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

contributor: core 20+ merged PRs risk: medium Business logic, config, or moderate-risk modules scope: ci CI/CD workflows scope: docs Documentation size: L 200-499 changed lines staging-promotion

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants