Preserve paused leases across engine auth resume by henrypark133 · Pull Request #2631 · nearai/ironclaw

henrypark133 · 2026-04-18T07:51:44Z

Summary

persist paused_lease through pending gate creation, storage, and replay
prefer the paused lease when resuming gated engine actions instead of re-deriving a fresh lease
keep successful OAuth callbacks from clearing the pending engine auth gate before replay completes

Verification

CARGO_TARGET_DIR=/Users/henry/near_ai/near_claw/ironclaw-ci-regression-fix-2/.shared-target cargo test -p ironclaw_engine parse_outcome_gate_paused
CARGO_TARGET_DIR=/Users/henry/near_ai/near_claw/ironclaw-ci-regression-fix-2/.shared-target cargo test -p ironclaw_engine normalize_pause_outcome_transitions_thread_to_waiting

gemini-code-assist

Code Review

This pull request implements the persistence and reuse of capability leases when an execution is paused by a gate. It adds a "paused_lease" field to the ThreadOutcome, EngineError, and PendingGate structures, allowing the engine to restore the specific lease context upon resumption. The changes span the orchestrator, scripting, and structured executors, as well as the bridge and gateway logic, including a refactor of the OAuth callback handler to use a dedicated dispatch function. I have no feedback to provide.

…ceptance Closes two gaps blocking v2 engine becoming the default: **Phase 4 — token + cost accounting** - Delete orphaned `crates/ironclaw_engine/src/executor/compaction.rs` (176 lines). The Python orchestrator (`default.py::compact_if_needed`) has owned compaction policy since #1557; the Rust module had no callers anywhere in the workspace. - Wire `cost_usd` in `LlmBridgeAdapter` by calling `LlmProvider::calculate_cost()` at both the no-tools and with-tools response paths. The engine's `Thread::total_cost_usd` accumulator and `max_budget_usd` gates were already plumbed — only the adapter was hardcoding 0.0. - Persist `total_cost_usd` through `ThreadArchiveSummary` round-trip in `store_adapter.rs`. Previously, rehydrating an archived thread silently dropped the cost to 0.0. `#[serde(default)]` keeps existing archive files deserializing cleanly. **Phase 6 — mission lifecycle acceptance** Three new integration tests in `bridge/effect_adapter.rs` driving `execute_action()` end-to-end (per `.claude/rules/testing.md` "Test Through the Caller"): - `mission_full_lifecycle_via_execute_action` — create → list → complete → list, asserting the `Completed` status surfaces through `mission_list` after `mission_complete`. - `mission_fire_returns_thread_id_for_manual_cadence_via_execute_action` — fresh manual mission fires successfully and returns a UUID thread_id rather than `not_fired`. - `mission_list_returns_all_user_missions_via_execute_action` — all three created missions appear in `mission_list` output. **Regression tests for cost wiring** Three new tests in `bridge/llm_adapter.rs`: - `complete_no_tools_populates_cost_usd_through_adapter` - `complete_with_tools_populates_cost_usd_through_adapter` - `complete_routes_subcalls_through_cheap_provider_for_cost` — pins that `depth > 0` is priced with the cheap provider, not the primary. Coordinated with in-flight work: skipped paths owned by #2504 (auth E2E), #2631 (paused-lease resume), #2570 (mission re-fire), #2549 (mission_get), #2452 (tool_calls persistence), #2621 (replay snapshot). Verified: `cargo fmt`, `cargo clippy --all --benches --tests --examples --all-features` (0 warnings), `cargo test -p ironclaw_engine` (409 passed), `cargo test -p ironclaw --lib` (5079 passed). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…ceptance (#2660) * feat(engine-v2): Phase 4 cost tracking + Phase 6 mission lifecycle acceptance Closes two gaps blocking v2 engine becoming the default: **Phase 4 — token + cost accounting** - Delete orphaned `crates/ironclaw_engine/src/executor/compaction.rs` (176 lines). The Python orchestrator (`default.py::compact_if_needed`) has owned compaction policy since #1557; the Rust module had no callers anywhere in the workspace. - Wire `cost_usd` in `LlmBridgeAdapter` by calling `LlmProvider::calculate_cost()` at both the no-tools and with-tools response paths. The engine's `Thread::total_cost_usd` accumulator and `max_budget_usd` gates were already plumbed — only the adapter was hardcoding 0.0. - Persist `total_cost_usd` through `ThreadArchiveSummary` round-trip in `store_adapter.rs`. Previously, rehydrating an archived thread silently dropped the cost to 0.0. `#[serde(default)]` keeps existing archive files deserializing cleanly. **Phase 6 — mission lifecycle acceptance** Three new integration tests in `bridge/effect_adapter.rs` driving `execute_action()` end-to-end (per `.claude/rules/testing.md` "Test Through the Caller"): - `mission_full_lifecycle_via_execute_action` — create → list → complete → list, asserting the `Completed` status surfaces through `mission_list` after `mission_complete`. - `mission_fire_returns_thread_id_for_manual_cadence_via_execute_action` — fresh manual mission fires successfully and returns a UUID thread_id rather than `not_fired`. - `mission_list_returns_all_user_missions_via_execute_action` — all three created missions appear in `mission_list` output. **Regression tests for cost wiring** Three new tests in `bridge/llm_adapter.rs`: - `complete_no_tools_populates_cost_usd_through_adapter` - `complete_with_tools_populates_cost_usd_through_adapter` - `complete_routes_subcalls_through_cheap_provider_for_cost` — pins that `depth > 0` is priced with the cheap provider, not the primary. Coordinated with in-flight work: skipped paths owned by #2504 (auth E2E), #2631 (paused-lease resume), #2570 (mission re-fire), #2549 (mission_get), #2452 (tool_calls persistence), #2621 (replay snapshot). Verified: `cargo fmt`, `cargo clippy --all --benches --tests --examples --all-features` (0 warnings), `cargo test -p ironclaw_engine` (409 passed), `cargo test -p ironclaw --lib` (5079 passed). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(engine-v2): surface engine capability actions to LLM via available_actions Fixes the gap called out in the PR body: `EffectBridgeAdapter::available_actions` was only enumerating v1 `ToolRegistry` tools + latent OAuth actions, so engine-native capabilities like `missions` never appeared in the LLM's tools list even when a thread held an active lease for them. The LLM was therefore unable to call `mission_create` / `mission_list` / etc. via structured tool calls; the only ways to drive missions were CodeAct Python calls (which relied on the same `known_actions` set and hit the same gap) or `/routine` slash commands falling through to v1. Wire `CapabilityRegistry` into the adapter and iterate active leases to surface every leased, engine-registered capability action. Respects lease grant scope — a lease granting only `mission_list` does not leak `mission_create`. Skips the `"tools"` capability since that lease is already reconciled from the v1 path. Router wires the shared `Arc<CapabilityRegistry>` to both the adapter and `ThreadManager` at setup. Three new regression tests: - `available_actions_surfaces_leased_mission_capability` - `available_actions_respects_partial_lease_grant` - `available_actions_omits_capability_without_lease` Verified: `cargo fmt`, `cargo clippy --all --benches --tests --examples --all-features` (0 warnings), `cargo test -p ironclaw_engine` (409 passed), `cargo test -p ironclaw --lib` (5082 passed). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(engine-v2): close review gaps — archive round-trip, v1/engine merge, defensive filters Addresses gaps raised in PR #2660 review: - **Consolidate `use ironclaw_engine::{...}`** into a single grouped import in `effect_adapter.rs` (was split across two statements). - **Apply `is_v1_only_tool` / `is_v1_auth_tool` filters** to the engine capability path in `available_actions`. Defensive guardrail: a future engine capability that registers an action under a v1-denylisted name (`create_job`, `tool_auth`, ...) must not bypass the v2-isolation filters by virtue of coming through a different capability registry. - **`ThreadArchiveSummary` serialization round-trip tests** in `store_adapter.rs`: - `archive_summary_preserves_total_cost_usd_through_round_trip` — pins the regression the PR fixed (cost silently zeroed on rehydration). - `archive_summary_handles_legacy_json_without_total_cost_usd_field` — pins `#[serde(default)]` back-compat for archive files written before this PR. - **`available_actions` combined advertising tests** in `effect_adapter.rs`: - `available_actions_merges_v1_tools_with_engine_capabilities` — v1 tool + mission capability both surface on one call. - `available_actions_filters_v1_denylisted_names_from_engine_capabilities` — pins the new defensive filter. - **`cost_usd_from` subscription-billed-provider test** in `llm_adapter.rs`: - `complete_with_subscription_billed_provider_yields_zero_cost` — zero `cost_per_token` round-trips to exactly `0.0`, no NaN/Inf. Verified: `cargo fmt`, `cargo clippy --all --benches --tests --examples --all-features` (0 warnings), `cargo test -p ironclaw_engine` (409 passed), `cargo test -p ironclaw --lib` (5087 passed, +5 new). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(engine-v2): price cache tokens correctly in LlmBridgeAdapter Addresses PR #2660 review (gemini-code-assist + Copilot, L23/L115/L189): `cost_usd_from` only priced `input_tokens + output_tokens`, ignoring `cache_read_input_tokens` and `cache_creation_input_tokens`. For providers with prompt caching (Anthropic, OpenAI), this undercounted input cost and silently neutered the `max_budget_usd` gate. Extend the helper to mirror the canonical formula in `src/agent/cost_guard.rs::CostGuard::record_llm_call`: uncached_input = input_tokens - (cache_read + cache_write) cache_read_cost = input_rate * cache_read / cache_read_discount() cache_write_cost = input_rate * cache_write * cache_write_multiplier() cost = input_rate * uncached_input + cache_read_cost + cache_write_cost + output_rate * output_tokens All `LlmProvider` implementations already supply `cache_read_discount()` (default 1, Anthropic 10, OpenAI 2) and `cache_write_multiplier()` (default 1, Anthropic 1.25 for 5m / 2.0 for 1h) through the decorator chain, so no trait surgery is required. Regression test: `complete_prices_cache_tokens_with_discount_and_multiplier` uses Anthropic Sonnet 5m-TTL rates, exercises a 10k-input / 2k-read / 1k-write / 500-output response, and pins the correct total ($0.03285) against the old naive $0.0375 that would have undercounted ~14%. Verified: `cargo fmt`, `cargo clippy --all --benches --tests --examples --all-features` (0 warnings), `cargo test -p ironclaw_engine` (435 passed), `cargo test -p ironclaw --lib` (5136 passed). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

ilblackdragon · 2026-04-19T15:11:33Z

Rebased onto latest `origin/staging` and addressed the two issues blocking merge. Marked ready for review.

Rebase notes

Conflict shape: the PR touched `src/channels/web/server.rs` but since it was authored, PR #2645 (stage 4a) extracted the OAuth callback into `src/channels/web/features/oauth/mod.rs`. Cherry-picked the engine/bridge/gate changes cleanly (they applied auto-merge) and re-applied the server.rs changes surgically to the new post-refactor location:

Split `clear_auth_mode_for_thread` in `server.rs` into two: the public helper still clears both legacy session + engine pending auth, and a new `clear_session_auth_mode_for_thread` clears only the session half. Both `pub(crate)` so `features/oauth/` can reach them.
Updated `features/oauth/mod.rs::oauth_callback_handler` to call `clear_session_auth_mode_for_thread(..., None)` instead of `clear_auth_mode(...)`, preserving the engine gate across the callback so the `ExternalCallback` replay path can resolve it (preserving the paused_lease).

Deliberately did not port the PR's `dispatch_engine_external_callback` helper refactor — that helper lives in `platform/engine_dispatch.rs` which is introduced by the not-yet-merged PR #2665 (stage 4b). The inline `tx.send` flow stays for now; rolling up the refactor into this PR would couple it to #2665's merge order.

Clippy fix

Adding `paused_lease` to the `ThreadOutcome::GatePaused` variant tipped three clippy `large_enum_variant` lints past the threshold (lint fires at ~200 bytes, `CapabilityLease` is ~360 bytes):

`ThreadOutcome::GatePaused` itself
`ThreadSignal::ChildCompleted { outcome: ThreadOutcome }` (transitively)
`PreflightResult::GatePaused(ThreadOutcome)` (transitively)

Fixed by boxing `paused_lease` in `ThreadOutcome::GatePaused` — matches what the PR already does in `EngineError::GatePaused`. The single consumer that stores it unboxed (`PendingGate.paused_lease`) gets an inline `paused_lease.map(|b| *b)` unbox at the assignment site in `router.rs`.

Verification

`cargo fmt` ✓
`cargo clippy -p ironclaw_engine --all-targets -- -D warnings` → 0 warnings
`cargo clippy --all --benches --tests --examples --all-features` → 0 warnings
`cargo test -p ironclaw_engine parse_outcome_gate_paused` → ok (the PR's named test)
`cargo test -p ironclaw_engine normalize_pause_outcome_transitions_thread_to_waiting` → ok (the PR's other named test)
`cargo test -p ironclaw_engine` → all pass
`cargo test --test engine_v2_gate_integration` → all pass (including the PR's new gate test)
`cargo test -p ironclaw --lib` → 5158 passed. 1 flake on `extensions::manager::tests::test_telegram_token_colon_preserved_in_validation_url` — known flake (Flaky test: test_telegram_token_colon_preserved_in_validation_url leaks across parallel runs #2102), passes in isolation, pre-existing.

Engine v2 default critical path

With #2452 already merged and this PR ready, only #2622 (unify v2 extension auth resume flow) remains before the default flip. It's also DIRTY and needs rebase; happy to rebase that next if you want me to keep unblocking.

Copilot

Pull request overview

Updates engine v2 gate handling to persist and reuse the original capability lease across auth-gated pauses/resumes, and adjusts web OAuth callback cleanup so the engine pending-auth gate remains until replay completes.

Changes:

Add paused_lease plumbing across ThreadOutcome::GatePaused, EngineError::GatePaused, and persisted PendingGate state.
Prefer paused_lease when replaying/resuming a pending gate action instead of re-deriving a lease.
Split legacy session auth cleanup from engine gate cleanup to avoid clearing the engine pending-auth gate in OAuth callbacks before replay finishes.

Reviewed changes

Copilot reviewed 15 out of 15 changed files in this pull request and generated 2 comments.

Show a summary per file

File	Description
tests/engine_v2_gate_integration.rs	Updates test fixtures to include the new `paused_lease` field.
src/gate/store.rs	Updates pending-gate store tests to include `paused_lease`.
src/gate/persistence.rs	Updates persistence tests to include `paused_lease`.
src/gate/pending.rs	Adds `paused_lease` to persisted `PendingGate` model.
src/channels/web/server.rs	Splits auth cleanup into “session only” vs “session + engine gate” helpers.
src/channels/web/features/oauth/mod.rs	Uses session-only cleanup in OAuth callback to preserve engine gate until replay.
src/bridge/router.rs	Persists `paused_lease` into pending gates and prefers it during resume/replay.
src/bridge/effect_adapter.rs	Captures the current lease snapshot into `GatePaused` errors.
src/agent/agent_loop.rs	Allows `ExternalCallback` submissions to target a thread UUID similarly to approvals.
crates/ironclaw_engine/src/types/error.rs	Extends `EngineError::GatePaused` with `paused_lease`.
crates/ironclaw_engine/src/runtime/mission.rs	Updates test fixtures to include `paused_lease`.
crates/ironclaw_engine/src/runtime/messaging.rs	Extends `ThreadOutcome::GatePaused` with boxed `paused_lease`.
crates/ironclaw_engine/src/executor/structured.rs	Propagates `paused_lease` through GatePaused result classification and parsing.
crates/ironclaw_engine/src/executor/scripting.rs	Updates preflight GatePaused construction to include `paused_lease`.
crates/ironclaw_engine/src/executor/orchestrator.rs	Logs/serializes `paused_lease`, parses it from tool results, and adds a parsing test.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-04-19T15:15:54Z

+    let lease = if let Some(lease) = pending.paused_lease.clone() {
+        lease
+    } else {
+        state
+            .thread_manager
+            .leases
+            .find_lease_for_action(pending.thread_id, &pending.action_name)
+            .await
+            .ok_or_else(|| {
+                engine_err(
+                    "resume lease",
+                    format!("no active lease covers action '{}'", pending.action_name),
+                )
+            })?
+    };


When resuming a pending gate, pending.paused_lease is used directly without any validation (thread_id match / action coverage / revocation / expiry). This bypasses the normal LeaseManager::find_lease_for_action checks and could allow resuming with a stale or mismatched lease. Consider validating that the snapshot lease applies to pending.thread_id and pending.action_name (and is not revoked/expired), otherwise fall back to find_lease_for_action or fail closed.

Suggested change

let lease = if let Some(lease) = pending.paused_lease.clone() {

lease

} else {

state

.thread_manager

.leases

.find_lease_for_action(pending.thread_id, &pending.action_name)

.await

.ok_or_else(|| {

engine_err(

"resume lease",

format!("no active lease covers action '{}'", pending.action_name),

)

})?

};

let lease = state

.thread_manager

.leases

.find_lease_for_action(pending.thread_id, &pending.action_name)

.await

.ok_or_else(|| {

engine_err(

"resume lease",

format!("no active lease covers action '{}'", pending.action_name),

)

})?;

Fixed in eaf91a42. Extracted the lease-picking into a standalone resume_lease_for_pending_gate(pending, leases) helper that now validates the snapshot before use: thread_id match, action coverage, !revoked, and expires_at in the future. If any check fails, falls through to LeaseManager::find_lease_for_action — the snapshot is still preferred when valid (the original bug: no active lease at resume after restart) but stale snapshots no longer silently bypass revocation/expiry.

Copilot · 2026-04-19T15:15:54Z

+    let lease = if let Some(lease) = pending.paused_lease.clone() {
+        lease
+    } else {
+        state
+            .thread_manager


The new resume path that prefers pending.paused_lease over re-deriving via LeaseManager is a key behavioral change but doesn’t appear to be covered by a router-level test. Adding a test that reproduces the regression (e.g., paused action where no active lease exists at resume, but paused_lease allows completion) would help prevent future breaks.

Fixed in eaf91a42. Added 6 libsql-free tests in bridge::router::tests driving resume_lease_for_pending_gate end-to-end:

resume_lease_prefers_snapshot_even_when_lease_manager_empty — pins the original bug fix.

resume_lease_rejects_revoked_snapshot_and_falls_back — revoked snapshot + live lease → use live.

resume_lease_rejects_expired_snapshot_and_falls_back — expired snapshot + live lease → use live.

resume_lease_rejects_snapshot_with_wrong_thread_id — defensive: drift between snapshot and pending.

resume_lease_rejects_snapshot_missing_action_coverage — lease must actually grant the pending action.

resume_lease_returns_none_when_no_snapshot_and_no_active_lease — clean None for caller's error path.

…ase-replay # Conflicts: # src/channels/web/features/oauth/mod.rs # src/channels/web/server.rs

Addresses PR #2631 review comments from Copilot: 1. **Snapshot used without validation** (src/bridge/router.rs:684 orig): `pending.paused_lease.clone()` was used directly to resume a gated action. A gate can sit in the pending-gate store for hours or across process restarts; during that window the original lease may have been revoked, expired, or the pending record could have drifted off its original thread. Extract `snapshot_lease_still_valid` + `resume_lease_for_pending_gate` helpers. The snapshot must pass four checks before use: - `lease.thread_id == pending.thread_id` - `granted_actions.covers(&pending.action_name)` - `!revoked` - `expires_at` is unset or in the future If any check fails, fall through to `LeaseManager::find_lease_for_action` (the normal path). Matches the reviewer's suggestion to avoid silently resuming a stale snapshot; still prefers the snapshot when valid so the original bug (no active lease at resume after restart) stays fixed. 2. **No router-level regression test** for the snapshot-vs-fallback decision. Six new libsql-free tests in `bridge::router::tests`: - `resume_lease_prefers_snapshot_even_when_lease_manager_empty` — reproduces the original bug; snapshot must carry the resume. - `resume_lease_rejects_revoked_snapshot_and_falls_back` - `resume_lease_rejects_expired_snapshot_and_falls_back` - `resume_lease_rejects_snapshot_with_wrong_thread_id` - `resume_lease_rejects_snapshot_missing_action_coverage` - `resume_lease_returns_none_when_no_snapshot_and_no_active_lease` Verified: `cargo fmt`, `cargo clippy --all --benches --tests --examples --all-features` (0 warnings), `cargo test -p ironclaw_engine` (435 passed), `cargo test -p ironclaw --lib` (5182 passed, +6 new). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

CI rustfmt (nightly/stable 1.95.0) wants the `assert!(matches!())` in `orchestrator.rs::parse_outcome_gate_paused` collapsed to fewer lines. Local rustfmt 1.94 was happy with the expanded form; matching CI to unblock the fmt check. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Copilot

Pull request overview

Copilot reviewed 15 out of 15 changed files in this pull request and generated 1 comment.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-04-19T16:43:33Z

        let approval_thread_uuid = if matches!(
            submission,
            Submission::ExecApproval { .. }
                | Submission::ApprovalResponse { .. }
+                | Submission::ExternalCallback { .. }
                | Submission::GateAuthResolution { .. }
        ) {


Including Submission::ExternalCallback in approval_thread_uuid resolution can mutate sess.active_thread based solely on conversation_scope even when the request_id doesn’t match any pending auth gate. A crafted JSON ExternalCallback (parsed from message content) can therefore switch the user’s active thread and then fail with "No matching pending authentication gate found." Consider preventing thread switching unless the callback is known-valid for that thread (e.g., look up the pending gate by request_id first and only then set active_thread), or gate this cross-thread routing to trusted/internal submissions only.

Synthesizes recurring patterns from ~30 merged PRs, 147 bot review comments (Copilot/Gemini), human reviews, and ~50 issues filed in the past 2 weeks. Each rule cites the motivating PR/issue numbers. New files: - error-handling.md — silent-failure taxonomy (unwrap_or_default, .ok()?, poisoned caches), persist-then-reload atomicity, channel-edge error mapping. (#2526, #2633, #2653, #2673, #2546, #2407, #2408) - agent-evidence.md — side-effect claims must cite tool evidence, empty-fast outputs are errors, external-effect tools must read back, setup UI round-trip. (#2544, #2580, #2582, #2541, #2545, #2411, #2543, #2586) - lifecycle.md — discovery vs. activation, terminal auth rejection, list_installed vs. list_active, deactivation unwinds, snapshot rehydrate must re-validate. (#2556, #2557, #2558, #2564, #2419, PR #2617, PR #2631) Extended: - types.md — from_trusted boundary rule, validated-newtype template with shared validate(&str), serde(try_from) required for validated types, wire-stable enums (no Debug; serde alias for migrations), canonical wire-contract field naming. (PR #2685, #2681, #2687, #2678, #2669, #2665, #2683, #2702) - safety-and-sandbox.md — every new ingress scans pre-transform/pre- injection, bounded resources (interners/streams/fan-out caps), cache keys must be complete. (#2491, #2676, #2470, #2633, #2673, #2710, PR #2702) - review-discipline.md — PR scope discipline, guardrail scripts are code (regression tests, grouped-import parsing, CI has_code inclusion), absolute-path ban in committed docs, stale comments after refactors. (PR #2668, #2628, #2680, #2687, #2647, #2689, #2701) All new files carry paths: frontmatter so they auto-load only on matching files. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(rules): add review-driven guidance for Claude Code Synthesizes recurring patterns from ~30 merged PRs, 147 bot review comments (Copilot/Gemini), human reviews, and ~50 issues filed in the past 2 weeks. Each rule cites the motivating PR/issue numbers. New files: - error-handling.md — silent-failure taxonomy (unwrap_or_default, .ok()?, poisoned caches), persist-then-reload atomicity, channel-edge error mapping. (#2526, #2633, #2653, #2673, #2546, #2407, #2408) - agent-evidence.md — side-effect claims must cite tool evidence, empty-fast outputs are errors, external-effect tools must read back, setup UI round-trip. (#2544, #2580, #2582, #2541, #2545, #2411, #2543, #2586) - lifecycle.md — discovery vs. activation, terminal auth rejection, list_installed vs. list_active, deactivation unwinds, snapshot rehydrate must re-validate. (#2556, #2557, #2558, #2564, #2419, PR #2617, PR #2631) Extended: - types.md — from_trusted boundary rule, validated-newtype template with shared validate(&str), serde(try_from) required for validated types, wire-stable enums (no Debug; serde alias for migrations), canonical wire-contract field naming. (PR #2685, #2681, #2687, #2678, #2669, #2665, #2683, #2702) - safety-and-sandbox.md — every new ingress scans pre-transform/pre- injection, bounded resources (interners/streams/fan-out caps), cache keys must be complete. (#2491, #2676, #2470, #2633, #2673, #2710, PR #2702) - review-discipline.md — PR scope discipline, guardrail scripts are code (regression tests, grouped-import parsing, CI has_code inclusion), absolute-path ban in committed docs, stale comments after refactors. (PR #2668, #2628, #2680, #2687, #2647, #2689, #2701) All new files carry paths: frontmatter so they auto-load only on matching files. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * refactor(rules): split agent-evidence into prompt + code rule agent-evidence.md mixed two concerns: runtime agent instruction (what the LLM should do when concluding a turn) and code-enforcement rules (what the dispatcher, engine, and tools must implement). Rules under .claude/rules/ only guide Claude Code when editing the repo — the runtime agent never reads them. Splits the two: - crates/ironclaw_engine/prompts/codeact_postamble.md — new section "Evidence before claiming side effects". Sits next to the existing "FINAL() answer quality" guidance; loaded via include_str! in executor/prompt.rs (no Rust change needed). - .claude/rules/tool-evidence.md — renamed from agent-evidence.md, keeps only the code invariants (engine v2 side-effect gate, empty-fast ToolError::EmptyResult, external-effect tools must read back, setup UI round-trip). Prompt tests pass unchanged; the postamble addition is pure text. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * prompt: tighten evidence rule to FINAL() claims only, not tool use Live-test validation of the "Evidence before claiming side effects" section (added in the prior commit) showed it inhibited legitimate tool use. With the original wording, `zizmor_scan_v2` live-recording timed out at 302s with zero responses; reverting the postamble restored healthy behavior (88s run, 8 shell calls including `cargo install zizmor` and full workflow analysis). The original phrasing conflated two things: what the agent should claim and what tools it should call. The rule is only about the claim. Re-tunes the section to: - Open with an explicit "this does not restrict tool calls" scope. - Drop the "<1ms = failure" heuristic (too broad — normal tools like `tool_info(schema)` are legitimately fast). - Drop the full enumeration of forbidden side-effect verbs; keep the rule narrower and clearer. - Shorten the code example (remove redundant early-return). Re-tuned run: agent is active (shell calls, real reasoning), live recording completes in ~9s. The remaining test failure is a pre-existing assertion bug (exact `t == "shell"` match against tool strings that now carry arguments like `"shell(cmd)"`) — reproduces with the old postamble too. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(live): fix tool-name assertions + re-record zizmor traces The two `zizmor_scan*` live tests had four broken tool-name assertions that silently failed to match: `tools.iter().any(|t| t == "shell")` against a tool list that now contains `"shell(cmd)"` strings (tool events carry args via `format_action_display_name` in `src/bridge/router.rs`). Two of the four were negative assertions checking for the absence of `tool_install` recovery loops — those silently passed even when a recovery loop actually ran. `sandbox_live_e2e.rs:203` already used the correct `t == "shell" || t.starts_with("shell(")` pattern; applied it consistently to all four sites. Verified live: - `IRONCLAW_LIVE_TEST=1 cargo test --test e2e_live -- zizmor_scan --ignored --test-threads=1` → 2 passed, 0 failed, 51.78s. Agent installs and runs zizmor end-to-end, producing real findings (exit code 14, dangerous triggers, excessive permissions, etc.). Traces re-recorded with the tuned postamble (commit 50d8517) and scrubbed: replaced `/home/illia/.cargo/bin/zizmor` with `/home/user/.cargo/bin/zizmor` per the developer-local-path ban in `.claude/rules/review-discipline.md`. No credentials, PII, or high-entropy secrets in either trace (only git SHAs from zizmor's workflow analysis output). Replay still passes: `cargo test --test e2e_live -- zizmor_scan --ignored` → 2/2 ok. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(replay): update zizmor_scan_v2 insta snapshot The engine v2 replay-snapshot gate (`engine_v2_tests::snapshot_zizmor_scan_v2`) failed against the re-recorded trace from 1691efe because the old snapshot encoded a broken run: - final_state: Failed - Missing Assistant message role - 3 issues: thread_failure (error), no_response (warning), llm_error (error) - 6 tool calls that never produced a final answer The new trace completes cleanly: - final_state: Done - System / User / Assistant roles present - 1 issue: mixed_mode (info) - 3 shell tool calls + successful `FINAL()` with real findings The snapshot was pinning a regression. Regenerated with `INSTA_UPDATE=always cargo test --test e2e_engine_v2 -- snapshot_zizmor_scan_v2`; passes on replay. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(rules): address PR #2714 review feedback - review-discipline: reword "Doc Absolute Paths" as a review convention (pre-commit only scans .rs; the rule misleadingly claimed enforcement). - safety-and-sandbox: broaden `paths:` frontmatter to include the actual ingress owners (`bridge`, `channels`, `workspace`, `agent`, engine crate) so the rule auto-loads where it applies. - tool-evidence: mark the side-effect gate, empty-fast rule, and `unverified` flag as target/aspirational invariants — neither `ToolError::EmptyResult`, an `unverified` field on `ToolOutput`, nor a byte-count field on `ActionRecord` exist today. Point at concrete interim conventions (`ToolError::ExecutionFailed`, `unverified: true` in the JSON result body). - types: scope "Validated newtypes must gate Deserialize" to *new* types, document the `CredentialName`/`ExtensionName` exception (they intentionally use `#[serde(transparent)]` + derived `Deserialize` under the `serde_does_not_revalidate` test). Clarify the `from_trusted` trust boundary (trusted = typed upstream, untrusted = raw JSON field even if the field *name* is "registry entry"). Switch `new` template to `impl Into<String>` to avoid an unnecessary clone when an owned `String` is passed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(rules): simplify types.md + split doc-hygiene; address review round 2 - types: collapse two templates into one canonical validated-newtype shape. New types use `#[serde(try_from = "String")]` with a shared `validate(&str)` helper — no more dual "transparent for some / try_from for others" guidance. `CredentialName`/`ExtensionName` are documented as the sole legacy exception (locked in by the `serde_does_not_revalidate` test); new code must not copy their `transparent` + `from_trusted` pattern. Removes the long "Using `from_trusted` safely" section and the separate "Validated newtypes must gate Deserialize" subsection that contradicted the Don'ts list. - doc-hygiene: new tiny rule file scoped to `**/*.md`, `**/*.py`, `docs/**` that carries the "no developer-local absolute paths in committed docs" convention. Removed from review-discipline.md where its `src/**/*.rs` scope meant the rule never loaded on the files it governed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(live): match hyphenated tool-install in attempted_relevant_tool The engine records `action_name` as the raw string the LLM emitted (`crates/ironclaw_engine/src/executor/structured.rs:381`), and the registry's lookup canonicalization only affects dispatch — not the name that reaches `StatusUpdate::ToolStarted`. The two other predicates in this file (`bad_recovery` at :420, `phase_b_recovery` at :531) already defend against both forms; this one should too, for consistency. Addresses PR #2714 review. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

github-actions Bot added scope: agent Agent core (agent loop, router, scheduler) scope: channel/web Web gateway channel size: M 50-199 changed lines risk: medium Business logic, config, or moderate-risk modules contributor: core 20+ merged PRs labels Apr 18, 2026

gemini-code-assist Bot reviewed Apr 18, 2026

View reviewed changes

ilblackdragon mentioned this pull request Apr 19, 2026

feat(engine-v2): Phase 4 cost tracking + Phase 6 mission lifecycle acceptance #2660

Merged

5 tasks

Preserve paused leases across engine auth resume

8bb2ef5

ilblackdragon force-pushed the henry/v2-paused-lease-replay branch from 4bd24e9 to 8bb2ef5 Compare April 19, 2026 15:11

ilblackdragon marked this pull request as ready for review April 19, 2026 15:11

Copilot AI review requested due to automatic review settings April 19, 2026 15:11

Copilot started reviewing on behalf of ilblackdragon April 19, 2026 15:11 View session

Copilot AI reviewed Apr 19, 2026

View reviewed changes

ilblackdragon and others added 2 commits April 19, 2026 15:28

Merge remote-tracking branch 'origin/staging' into henry/v2-paused-le…

4ea0b61

…ase-replay # Conflicts: # src/channels/web/features/oauth/mod.rs # src/channels/web/server.rs

github-actions Bot added size: L 200-499 changed lines and removed size: M 50-199 changed lines labels Apr 19, 2026

Copilot AI review requested due to automatic review settings April 19, 2026 16:37

Copilot started reviewing on behalf of ilblackdragon April 19, 2026 16:38 View session

ilblackdragon approved these changes Apr 19, 2026

View reviewed changes

Copilot AI reviewed Apr 19, 2026

View reviewed changes

ilblackdragon merged commit 6419347 into staging Apr 19, 2026
22 checks passed

ilblackdragon deleted the henry/v2-paused-lease-replay branch April 19, 2026 16:45

This was referenced Apr 19, 2026

chore: promote staging to staging-promote/cad5e50f-24632925495 (2026-04-19 16:55 UTC) #2690

Merged

chore: promote staging to staging-promote/64193474-24634116619 (2026-04-19 17:30 UTC) #2691

Merged

ilblackdragon mentioned this pull request Apr 20, 2026

docs(rules): add review-driven guidance for Claude Code #2714

Merged

4 tasks

henrypark133 mentioned this pull request Apr 21, 2026

chore: release #2606

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Preserve paused leases across engine auth resume#2631

Preserve paused leases across engine auth resume#2631
ilblackdragon merged 4 commits intostagingfrom
henry/v2-paused-lease-replay

henrypark133 commented Apr 18, 2026

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

ilblackdragon commented Apr 19, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Apr 19, 2026

Uh oh!

ilblackdragon Apr 19, 2026

Uh oh!

Copilot AI Apr 19, 2026

Uh oh!

ilblackdragon Apr 19, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Apr 19, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

henrypark133 commented Apr 18, 2026

Summary

Verification

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

ilblackdragon commented Apr 19, 2026

Rebase notes

Clippy fix

Verification

Engine v2 default critical path

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Apr 19, 2026

Choose a reason for hiding this comment

Uh oh!

ilblackdragon Apr 19, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 19, 2026

Choose a reason for hiding this comment

Uh oh!

ilblackdragon Apr 19, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Copilot AI Apr 19, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants