chore: promote staging to staging-promote/11f00698-24612972670 (2026-04-19 09:02 UTC) by ironclaw-ci[bot] · Pull Request #2659 · nearai/ironclaw

ironclaw-ci · 2026-04-19T09:02:01Z

Auto-promotion from staging CI

Batch range: a53eac5c2dec6b6cd5c08189086093fde64aa9cb..ff119531d43506e014d1e2101899c983e2ff0ec8
Promotion branch: staging-promote/ff119531-24625403497
Base: staging-promote/11f00698-24612972670
Triggered by: Staging CI batch at 2026-04-19 09:02 UTC

Commits in this batch (122):

7be3b91 [codex] Label migration PRs with DB MIGRATION (#1967)
6f7575d Fix Telegram UTF-16 message splitting (#1961)
f0db0a3 chore: bump registry versions for github tool, whatsapp and telegram channels
92388b7 revert: undo 2 main-only commits to unblock staging-promote merge (#2297)
fda3767 chore: release (#2075)
be6de43 fix(ci): unblock v0.25.0 release — fix tag filter and publish config (#2306)
72829db chore: update WASM artifact SHA256 checksums [skip ci] (#2308)
a7401ec fix(gateway): scope chat approvals to the active thread (#2267)
4032f6d Fix paired Telegram owner scope routine visibility (#2258)
7d66d83 fix(engine): always append ActionResult for every tool call (#2322)
764e586 feat(engine): LLM council via per-call model override in CodeAct (#2320)
70862ed feat(config): default CLI_MODE to TUI instead of REPL (#2329)
207c4d4 ci: build docker image in release process (#2321)
cd9b60c fix: re-apply Telegram UTF-16 splitting and DB MIGRATION label (#2304)
88b87c0 feat: user-facing temperature setting (#2275)
fdb0a13 chore: sync staging and main (#2337)
3cb77fe fix: resolve cargo-deny failures (wildcard deps + rand advisory) (#2370)
ed2d6dc fix web chat refresh active thread (#2330)
66ccafb chore(engine): update monty to v0.0.11 (#2364)
4529f00 fix(engine): track consecutive action errors in orchestrator Tier 0 path (#2325) (#2340)
50fc280 fix(agent): detect and escalate repeated identical failing tool calls (#2240) (#2338)
3f6149c feat(cli): add ironclaw profile list subcommand (#2288)
625cd85 Suppress LLM_BACKEND warning when config.toml and .env values match (#2388)
a9cea6c fix(ci): skip NearAI URL DNS validation for non-NearAI backends (#2080)
160a75e fix(llm): image detail field + /v1 base URL normalization (#2380)
4dbb44c fix(docker): make runtime-staging the default Docker target for Railway (#2244)
532fc61 feat: admin management panel — web UI for users and usage monitoring (#1963)
16b7b06 feat(web): add ironclaw docs link (#2398)
ba81044 fix(mcp): Install NEAR AI MCP server from environment config (#2181)
57d7b54 Fix Feishu webhook auth refresh and extension card overflow (#2443)
7425dc0 fix(security): harden approval thread safety (TOCTOU + error handling) (#2366)
86a9d0b fix: image generation with nearai models (#1819)
46ff740 docs(setup): warn that Telegram open mode splits history (#2427)
aea2e87 fix(ux): actionable auth errors and improved CLI help for new users (#1852) (#2315)
ab1f279 fix: more strict check for registry to avoid false positives (#2222)
019c048 refactor(llm): promote decorator chain settings from NearAiConfig to top-level LlmConfig (#1749)
b6f5da8 perf(tunnel): reuse HTTP client in CustomTunnel health checks (#1201)
fe2b134 fix(engine): guard consecutive-error checks against None limit (#2460)
37669e6 fix(web): prevent browser crash from timer leaks, DOM growth, SSE buffer (#2406) (#2441)
1041094 style: fix cargo fmt line wrapping in thread_ops.rs (#2451)
0a9d816 fix(responses-api): thread creation, GET by ID, streaming delta, context injection (#2167)
1fa73a4 docs: add Responses API section to USER_MANAGEMENT_API (#2440)
5140279 docs: guide how to host ironclaw on google cloud (#2262)
d85c11d fix(telegram): route WASM owner_id fallback (#2349)
28c6a15 fix(ci): exclude test files from PR size classification (#2387)
d63601e fix(security): gate test URL rewriters behind #[cfg(test)] (fixes #2056) (#2401)
82d341d fix(sandbox): try Docker socket before CLI binary check (#2467)
2dc78b2 feat(db): add per-user CachedSettingsStore decorator (#2425)
8973d1b fix: use gateway owner_id for relay OAuth nonce storage (#2473)
16a0731 test(e2e): add Playwright persistence happy-path test (#2475)
... and 72 more (see compare view)

Current commits in this promotion (1)

Current base: staging-promote/11f00698-24612972670
Current head: staging-promote/ff119531-24625403497
Current range: origin/staging-promote/11f00698-24612972670..origin/staging-promote/ff119531-24625403497

ff11953 test(replay): promote engine traces to insta-backed snapshot gate (#2621)

Auto-updated by staging promotion metadata workflow

Waiting for gates:

Tests: pending
E2E: pending
Claude Code review: pending (will post comments on this PR)

Auto-created by staging-ci workflow

* test(replay): promote engine replay traces to insta-backed snapshot gate

Adds a ReplayOutcome snapshot type, a replay-gate CI workflow, and a developer script wrapper for cargo-insta. Replaces unreviewable 3,000-line JSON diffs on engine changes with a YAML snapshot of the observable run shape (tool sequence, final state, retrospective analyzer issues).

Why: engine v2 live-fixture traces had grown past reviewability. A single prompt-wording change could move the whole fixture, and reviewers had no way to see which behaviour actually changed. Splitting the fixture into a "replay driver" (JSON stays in tests/fixtures/) and a "regression snapshot" (YAML in tests/snapshots/) gives reviewers a narrow, stable diff to approve, while keeping the full recorded context for deterministic replay.

Changes:
- `tests/support/replay_outcome.rs`: ReplayOutcome + assert_replay_snapshot! macro; snapshots include retrospective analyzer output (TraceIssue severity/category) via a new `ironclaw::bridge::engine_retrospectives_for_test()` helper that runs `build_trace()` over engine threads
- `tests/e2e_engine_v2.rs`: three POC snapshot tests (single_tool_echo, tool_error_recovery, zizmor_scan_v2)
- `tests/e2e_bug_bash_snapshots.rs` + `tests/fixtures/llm_traces/bug_bash/`: bug-regression fixture template, mapped to open issues in the README
- `.github/workflows/replay-gate.yml`: cargo insta test --check on engine/agent/LLM/tools/bridge path changes; rejects committed .snap.new
- `scripts/replay-snap.sh`: review/accept/test/record wrappers around cargo-insta and IRONCLAW_RECORD_TRACE
- `scripts/trace-coverage.sh`: reports EventKind variants with snapshot coverage; `--strict` mode for future CI promotion
- `tests/e2e_live.rs`: `#[ignore]` swapped for `cfg_attr(not(feature="replay"), ignore)` so the replay CI job can run the scenarios without `-- --ignored`
- `Cargo.toml`: new `replay = ["libsql"]` feature; insta gains the `yaml` feature
- `tests/fixtures/llm_traces/README.md`: documents the two-role driver/snapshot split

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(replay): address PR #2621 review + swap cargo-insta installer

Review fixes:
- Replay gate was missing the bug-bash snapshot suite. Adds `tests/e2e_bug_bash_snapshots.rs` to the workflow paths trigger and the `cargo insta test --check` invocation so bug-regression snapshots are actually gated. (copilot-pull-request-reviewer)
- `cargo install cargo-insta --locked` added ~40s of cold-cache compile to the gate. Swapped for `taiki-e/install-action@v2`, which downloads a precompiled binary in a few seconds. Also updated `scripts/replay-snap.sh` to *fail closed* when cargo-insta is missing instead of silently auto-installing it. (gemini-code-assist)
- `engine_retrospectives_for_test` was `pub` and re-exported under the default-enabled `libsql` feature, contradicting its "not part of any public API" doc. Split the re-export, kept `reset_engine_state` as a plain `pub use`, and hid `engine_retrospectives_for_test` behind `#[doc(hidden)]`. It still needs to cross the crate boundary for integration tests (which live in a separate crate, so `#[cfg(test)]` doesn't reach them), but no longer appears in published docs. (copilot-pull-request-reviewer)
- Added an explicit "caller must serialize" note on `engine_retrospectives_for_test` explaining the `ENGINE_STATE` singleton and pointing new callers at `engine_v2_test_lock()` / `reset_engine_state()`. Matches what the existing snapshot tests already do. (gemini-code-assist)

Doc corrections:
- `snapshot_zizmor_scan_v2` doc claimed the snapshot pinned `ApprovalNeeded` events and response wording, but it doesn't. Rewrote to describe what the snapshot actually asserts (tool order, step count, retrospective issues, final state). (copilot-pull-request-reviewer)
- `llm_call_count` was documented as "bucketed" but passed through verbatim. Updated the field doc to reflect the raw value. Bucketing wasn't needed because fixtures are deterministic. (copilot-pull-request-reviewer)
- `src/bridge/router.rs` doc referenced a non-existent `ReplayOutcome.trace_issues` field; the struct uses `engine_threads`. Fixed the reference. (copilot-pull-request-reviewer)
- `scripts/trace-coverage.sh` header claimed CI runs it with `--strict`; the workflow runs it in advisory mode. Rewrote the header to match, with a pointer for when to promote to strict. (copilot-pull-request-reviewer)

No-change replies (rationale commented in the code):
- `event_kind_name` uses an exhaustive `match` on `EventKind` rather than `Debug` or a `strum` derive. The compile-time exhaustiveness check is the point: adding a new engine event should force a conscious decision about how the snapshot represents it, not a silent fallthrough. Added a comment making that intent explicit.
- `trace-coverage.sh` awk parser of `event.rs` is fragile. Agreed, but the script is advisory and its failure mode is false negatives (uncovered variants simply aren't gated). Documented the tradeoff and the rewrite-in-Rust escape hatch in the script header.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* ci(replay-gate): prime cache on staging, restrict PR runs to read-only

The second run on PR #2621 missed the cache ("No cache found" in the rust-cache restore step) even though the workflow is wired correctly. Root cause: the repo sits close to GitHub's 10 GB per-repo cache quota (~59 entries, many >500 MB), and the LRU policy evicts PR-scoped caches before they get reused.

Fix:
- Add `push: [staging, main]` so the gate runs (and saves a ~1.2 GB cache under the `replay-gate` key) on every merge to the branches PRs actually target. Subsequent PRs restore from that base-branch cache; GitHub Actions permits cross-ref restore when the restoring ref's base matches the saved ref.
- Set `save-if: ${{ github.event_name == 'push' }}` so PR runs only *read* the cache. Without this gate, each PR push would save its own copy and crowd out the primed base-branch cache, putting us right back in the eviction loop.

Expected effect: cold-cache 9m → warm ~2-3m once staging has a run with the new workflow. Base-branch prime run still pays 9m (no regression).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(replay): drop bug-bash fixture scaffolding

Replay fixtures can't reproduce the Phase 3 target bugs because the fixture *is* the LLM's output: handwriting a trace where the LLM emits a tool call doesn't test whether the real LLM would have emitted that call, only that the harness dispatches a scripted one. What `summarization_uses_tools.json` actually pinned was the happy path, not the #2541 bug.

Of the 7 open bug-bash issues, only #2544 ("plans and delegates but never executes") is catchable by replay, and only via a live-recorded fixture. The other six are LLM-behavior or infra-timing bugs outside replay's reach. Rather than ship regression theater, tear out the scaffolding.

Removed:
- tests/e2e_bug_bash_snapshots.rs
- tests/fixtures/llm_traces/bug_bash/
- tests/snapshots/replay__bug_bash_summarization_uses_tools.snap

Unwired:
- Replay-gate workflow paths + test list no longer mention bug_bash
- scripts/replay-snap.sh test command drops the extra --test flag

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* ci: switch to cargo-nextest with per-test timeouts

Nextest runs each integration test in its own process and runs test binaries in parallel, which is a big unlock for this repo:
- Engine v2 tests share a process-global `ENGINE_STATE` singleton (OnceLock), which the current test lock serialises inside a single test binary. Nextest's process-per-test model gives each test a clean state automatically, so the 16 engine_v2 tests stop running one-by-one.
- Cross-binary parallelism: `cargo test --test A --test B` runs binaries in sequence; nextest runs them concurrently. Measured locally: the replay-gate test set (3 binaries, 21 tests) went from ~30s sequential to **2.7s parallel**.

Adds `.config/nextest.toml` with:
- `slow-timeout = 60s / terminate-after 3` in the default profile so a hung test fails fast instead of blocking the workflow-level 25-minute cap.
- A `ci` profile with `fail-fast = false` (one flake shouldn't mask other failures), `failure-output = immediate-final`, `success-output = never` for readable Actions logs.
- Per-test 300s override for the handful of genuinely slow scenarios (zizmor scan, e2e_thread_scheduling).

Workflows updated:
- `replay-gate.yml`: installs cargo-nextest via taiki-e/install-action alongside cargo-insta (one step), runs `cargo insta test --test-runner nextest` with `NEXTEST_PROFILE=ci`.
- `test.yml`: all five `cargo test` invocations swapped for `cargo nextest run --profile ci`. Nextest doesn't execute doctests, so every nextest step is paired with a `cargo test --doc` follow-up to preserve coverage.

Local dev is unchanged: `cargo test` still works; nextest is only required in CI.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* ci: re-trigger replay-gate workflow after nextest migration

Previous push only modified workflow files and `.config/nextest.toml`; GitHub skipped the `pull_request` workflow events for that sync, so the nextest migration didn't actually get exercised in CI. Empty commit forces re-evaluation.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(replay): note nextest wiring in the fixtures README

Also forces a CI re-run: the previous empty commit had no matching paths, so the `pull_request.paths` filters skipped every workflow including replay-gate. Touching a file under `tests/fixtures/llm_traces/**` re-matches the filter and runs the nextest-based gate.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* ci(test): defer test.yml nextest migration

Staging restructured test.yml significantly while this PR was open (matrix-config dynamic matrix, `changes` code-detection job, composite install-cargo-component action, save-if restricted to base-branch pushes). The merge into staging had heavy conflicts for every nextest-swap hunk.

Rather than force a re-layering of the new staging structure on top of the nextest migration in this PR, revert test.yml to staging's current version. This PR now scopes the nextest change to just the replay-gate workflow (where it cleanly demonstrates the value) plus the shared `.config/nextest.toml` profile. Migrating the rest of test.yml to nextest is a follow-up that can rebase on the new structure without the heavy conflict surface.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Henry Park <henrypark133@gmail.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
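
For readers unfamiliar with the exhaustive-`match` argument in the no-change reply on `event_kind_name`, a minimal sketch follows. It is illustrative only: apart from `ApprovalNeeded`, which the review notes mention, the variant names and the snake_case strings are assumptions and not the engine's actual `EventKind` definition.

```rust
// Hypothetical stand-in for the engine's event enum. The real variants live in
// the project's event.rs and are not listed in this PR description.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum EventKind {
    ToolCallStarted,  // assumed variant, for illustration
    ToolCallFinished, // assumed variant, for illustration
    ApprovalNeeded,
}

/// Maps each variant to the stable name written into a snapshot.
/// There is deliberately no `_ =>` arm: adding a variant to `EventKind`
/// stops this function from compiling until someone decides how the
/// snapshot should represent the new event.
fn event_kind_name(kind: EventKind) -> &'static str {
    match kind {
        EventKind::ToolCallStarted => "tool_call_started",
        EventKind::ToolCallFinished => "tool_call_finished",
        EventKind::ApprovalNeeded => "approval_needed",
    }
}

fn main() {
    let events = [
        EventKind::ToolCallStarted,
        EventKind::ApprovalNeeded,
        EventKind::ToolCallFinished,
    ];
    // Collect the snapshot-facing names in event order.
    let names: Vec<&str> = events.iter().copied().map(event_kind_name).collect();
    println!("{}", names.join(","));
}
```

A `Debug`-based or derive-based name would keep compiling when a variant is added, which is exactly the silent fallthrough the reply argues against.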

ironclaw-ci bot added the staging-promotion label Apr 19, 2026

github-actions bot added labels on Apr 19, 2026: scope: ci (CI/CD workflows), scope: docs (Documentation), scope: dependencies (Dependency updates), size: L (200-499 changed lines), risk: medium (Business logic, config, or moderate-risk modules), contributor: core (20+ merged PRs)

ironclaw-ci[bot] wants to merge 1 commit into staging-promote/11f00698-24612972670 from staging-promote/ff119531-24625403497
