chore: promote staging to staging-promote/79ad2e38-24572736232 (2026-04-17 16:19 UTC) by ironclaw-ci[bot] · Pull Request #2587 · nearai/ironclaw

ironclaw-ci · 2026-04-17T16:19:08Z

Auto-promotion from staging CI

Batch range: a53eac5c2dec6b6cd5c08189086093fde64aa9cb..e65ba2e4d9f46149207ec8ceb09f25b31d07f8bd
Promotion branch: staging-promote/e65ba2e4-24575255629
Base: staging-promote/79ad2e38-24572736232
Triggered by: Staging CI batch at 2026-04-17 16:19 UTC

Commits in this batch (72):

a7401ec fix(gateway): scope chat approvals to the active thread (#2267)
4032f6d Fix paired Telegram owner scope routine visibility (#2258)
7d66d83 fix(engine): always append ActionResult for every tool call (#2322)
764e586 feat(engine): LLM council via per-call model override in CodeAct (#2320)
70862ed feat(config): default CLI_MODE to TUI instead of REPL (#2329)
207c4d4 ci: build docker image in release process (#2321)
cd9b60c fix: re-apply Telegram UTF-16 splitting and DB MIGRATION label (#2304)
88b87c0 feat: user-facing temperature setting (#2275)
fdb0a13 chore: sync staging and main (#2337)
3cb77fe fix: resolve cargo-deny failures (wildcard deps + rand advisory) (#2370)
ed2d6dc fix web chat refresh active thread (#2330)
66ccafb chore(engine): update monty to v0.0.11 (#2364)
4529f00 fix(engine): track consecutive action errors in orchestrator Tier 0 path (#2325) (#2340)
50fc280 fix(agent): detect and escalate repeated identical failing tool calls (#2240) (#2338)
3f6149c feat(cli): add ironclaw profile list subcommand (#2288)
625cd85 Suppress LLM_BACKEND warning when config.toml and .env values match (#2388)
a9cea6c fix(ci): skip NearAI URL DNS validation for non-NearAI backends (#2080)
160a75e fix(llm): image detail field + /v1 base URL normalization (#2380)
4dbb44c fix(docker): make runtime-staging the default Docker target for Railway (#2244)
532fc61 feat: admin management panel — web UI for users and usage monitoring (#1963)
16b7b06 feat(web): add ironclaw docs link (#2398)
ba81044 fix(mcp): Install NEAR AI MCP server from environment config (#2181)
57d7b54 Fix Feishu webhook auth refresh and extension card overflow (#2443)
7425dc0 fix(security): harden approval thread safety (TOCTOU + error handling) (#2366)
86a9d0b fix: image generation with nearai models (#1819)
46ff740 docs(setup): warn that Telegram open mode splits history (#2427)
aea2e87 fix(ux): actionable auth errors and improved CLI help for new users (#1852) (#2315)
ab1f279 fix: more strict check for registry to avoid false positives (#2222)
019c048 refactor(llm): promote decorator chain settings from NearAiConfig to top-level LlmConfig (#1749)
b6f5da8 perf(tunnel): reuse HTTP client in CustomTunnel health checks (#1201)
fe2b134 fix(engine): guard consecutive-error checks against None limit (#2460)
37669e6 fix(web): prevent browser crash from timer leaks, DOM growth, SSE buffer (#2406) (#2441)
1041094 style: fix cargo fmt line wrapping in thread_ops.rs (#2451)
0a9d816 fix(responses-api): thread creation, GET by ID, streaming delta, context injection (#2167)
1fa73a4 docs: add Responses API section to USER_MANAGEMENT_API (#2440)
5140279 docs: guide how to host ironclaw on google cloud (#2262)
d85c11d fix(telegram): route WASM owner_id fallback (#2349)
28c6a15 fix(ci): exclude test files from PR size classification (#2387)
d63601e fix(security): gate test URL rewriters behind #[cfg(test)] (fixes #2056) (#2401)
82d341d fix(sandbox): try Docker socket before CLI binary check (#2467)
2dc78b2 feat(db): add per-user CachedSettingsStore decorator (#2425)
8973d1b fix: use gateway owner_id for relay OAuth nonce storage (#2473)
16a0731 test(e2e): add Playwright persistence happy-path test (#2475)
ae1f698 fix(llm): map HTTP 413 to ContextLengthExceeded for auto-compaction (#2339)
4353493 fix(gateway): resolve assistant thread for threadless broadcasts (#2444)
be0b33b fix: duplicate reasoning_content fields in chat completions response (#2493)
62bb007 feat(tui): add multiline support and input clear (#2449)
7206bf0 feat(gateway): rich tool cards in history + thread processing indicator (#2477)
7008e9a feat(gate): persist "always approve" decisions to DB in v2 engine path (#2428)
427783d fix(engine): surface action errors to LLM with [ACTION FAILED] prefix (#2326)
... and 22 more (see compare view)

Current commits in this promotion (1)

Current base: staging-promote/79ad2e38-24572736232
Current head: staging-promote/e65ba2e4-24575255629
Current range: origin/staging-promote/79ad2e38-24572736232..origin/staging-promote/e65ba2e4-24575255629

e65ba2e fix(engine): security hardening for v2 orchestrator and Monty sandbox (#1958)

Auto-updated by staging promotion metadata workflow

Waiting for gates:

Tests: pending
E2E: pending
Claude Code review: pending (will post comments on this PR)

Auto-created by staging-ci workflow

@serrrfirat

…#1958)

* fix(engine): security hardening for v2 orchestrator and Monty sandbox

Address deferred security items C1/C2/C4/M2 from PR #1557 review:

C1/C2 — Orchestrator self-modification approval gates:
- memory_write tool now returns ApprovalRequirement::Always for protected orchestrator paths when ORCHESTRATOR_SELF_MODIFY=true, forcing human approval before any orchestrator or prompt overlay patch is written
- Store adapter validates Python syntax via Monty parser before persisting orchestrator patches, preventing broken code from consuming failure budget
- Content hash (SHA-256) stamped on all protected docs for audit trail

C4 — Sandbox security test coverage (6 new tests):
- sandbox_enforces_rlm_query_depth_limit: depth check at max recursion
- sandbox_rejects_final_injection: FINAL() captures payload literally
- sandbox_rejects_tool_name_injection: dynamic names can't bypass leases
- sandbox_context_variable_is_not_mutable: Python mutations don't affect Rust
- sandbox_handles_deep_recursion: infinite recursion terminates safely
- validate_syntax_rejects_broken_code: syntax validation unit test

M2 — Remove ownership-bypassing thread operations:
- Delete stop_thread_system() and inject_message_system() — dead code with no callers, was a privilege escalation footgun. Ownership-checking variants (stop_thread/inject_message) are the only API now.

Also: document child lease budget snapshot semantics (intentional design), add EngineError::InvalidInput variant, re-export validate_python_syntax.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* style: cargo fmt

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* style: collapse nested if per clippy suggestion

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* style: cargo fmt

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(engine): address PR #1958 review findings

Review feedback from Copilot and serrrfirat:

1. Use ORCHESTRATOR_FAILURES_TITLE constant consistently in both branches of the self-modify gate (was string literal in deny branch).
2. Normalize metadata to {} before stamping content_hash, so the audit trail is reliably present even for docs with null/non-object metadata.
3. Add 256KB size cap to validate_python_syntax() to prevent pathological inputs from causing CPU/memory pressure on the store write path.
4. Tighten sandbox_rejects_tool_name_injection test assertion to verify the expected "not_found" outcome, not just absence of "ESCAPED".
5. Change ApprovalRequirement::Always → UnlessAutoApproved for protected orchestrator writes. The v2 effect bridge maps Always to hard denial (LeaseDenied), making the self-modify path unusable. UnlessAutoApproved triggers the gate/pause flow so human approval is possible.
6. Extend is_protected_orchestrator_path() to cover physical workspace paths (engine/orchestrator/*) in addition to logical aliases. Prevents bypassing the approval gate by writing to the persisted file path.
7. Persist project_id and user_id in frontmatter (serialize_knowledge_doc) and restore them on load (deserialize_knowledge_doc). Previously all reloaded docs got project_id=nil and user_id="legacy", making them invisible to project-scoped queries after restart.
8. Synthesize MemoryDocs from raw .py orchestrator files on startup (synthesize_orchestrator_doc_from_py). Orchestrator versions are persisted as engine/orchestrator/v{N}.py but load_knowledge_docs could not parse them — they silently disappeared on restart.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(engine): PR #1958 round-2 review — close path-bypass, forgeable-metadata, and rehydration holes

Address the critical/high findings from @serrrfirat's second review on PR #1958. Every fix is paired with a caller-level regression test per the "Test Through the Caller" rule in .claude/rules/testing.md.
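For readers skimming the first-round review findings, the size cap from item 3 can be illustrated with a minimal sketch. It is not the code from this PR: the constant name `MAX_SYNTAX_CHECK_BYTES`, the `SyntaxCheckError` type, the `validate_with_size_cap` helper, and the reading of "256KB" as 256 * 1024 bytes are all assumptions, and the real Monty parser is replaced by a caller-supplied closure so the example stays self-contained.

```rust
/// Assumed interpretation of the 256KB cap from the review findings.
const MAX_SYNTAX_CHECK_BYTES: usize = 256 * 1024;

/// Hypothetical error type for this sketch only.
#[derive(Debug, PartialEq)]
enum SyntaxCheckError {
    /// Input exceeded the cap; the parser was never invoked.
    TooLarge { len: usize, max: usize },
    /// The parser ran and reported a problem.
    Parse(String),
}

/// Reject oversized input before handing it to `parse`.
/// `parse` stands in for the real syntax validator, which is not shown here.
fn validate_with_size_cap<F>(source: &str, parse: F) -> Result<(), SyntaxCheckError>
where
    F: FnOnce(&str) -> Result<(), String>,
{
    if source.len() > MAX_SYNTAX_CHECK_BYTES {
        return Err(SyntaxCheckError::TooLarge {
            len: source.len(),
            max: MAX_SYNTAX_CHECK_BYTES,
        });
    }
    parse(source).map_err(SyntaxCheckError::Parse)
}
```

The point of checking the length first is that the bound applies to what the parser is asked to allocate for, so an oversized patch is refused without any parsing work being done.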
## Critical — path normalization bypass

`is_protected_orchestrator_path` and `is_orchestrator_code_path` both did raw `starts_with` on user input. Paths like `engine/./orchestrator/v3.py`, `engine//orchestrator/v3.py`, and `engine/knowledge/../orchestrator/v3.py` resolved to the protected location but sidestepped the check, letting LLM-authored writes skip the approval gate *and* syntax validation.

Added `normalize_workspace_path` (in tools/builtin/memory.rs) and `normalize_path` (in bridge/store_adapter.rs): strip `.` and empty segments, reject `..` traversal outright. Both guards now normalize first, then match against both `.system/engine/orchestrator/` and the legacy `engine/orchestrator/` prefix. Traversal attempts are treated as protected so they route through the approval gate — and `execute()` rejects them with `InvalidParameters` before reaching the workspace.

## High — forgeable `is_system_internal` metadata

The gate previously trusted `doc.metadata.source == "compiled_in"` to mark a write as system-internal. An LLM tool call could craft a doc with that field and bypass both denial and validation.

Replaced with a `tokio::task_local!`-backed trusted-write scope in the new `crates/ironclaw_engine/src/runtime/internal_write.rs` module:

    runtime::with_trusted_internal_writes(async {
        store.save_memory_doc(&seed).await
    })

The orchestrator v0 seeder in `MissionManager::seed_orchestrator_v0` enters the scope; the store gate reads `is_trusted_internal_write_active()`. Task-locals cannot be set from outside a trusted callsite and do not propagate across `tokio::spawn`, so an untrusted caller cannot inherit the flag. A unit test asserts the no-propagation property.
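The normalize-then-match approach from the path normalization section above can be sketched roughly as follows. This is an illustration under stated assumptions, not the implementation in tools/builtin/memory.rs: the function names are borrowed from the commit message, but the signatures, the `PROTECTED_PREFIXES` constant, and the handling of leading slashes are guesses. The prefix match is done on a segment boundary, which is the behaviour a later commit in this PR describes; traversal is treated as protected, following the round-2 description.

```rust
/// The two prefixes named in the commit message (trailing slash dropped here).
const PROTECTED_PREFIXES: [&str; 2] = [".system/engine/orchestrator", "engine/orchestrator"];

/// Strip `.` and empty segments; reject any `..` segment outright.
/// Returns `None` when the path contains traversal.
fn normalize_workspace_path(raw: &str) -> Option<String> {
    let mut segments = Vec::new();
    for segment in raw.split('/') {
        match segment {
            "" | "." => continue,
            ".." => return None,
            other => segments.push(other),
        }
    }
    Some(segments.join("/"))
}

/// Normalize first, then match against the protected prefixes.
/// A prefix only counts when it ends on a `/` boundary, so a sibling
/// directory such as `engine/orchestrator_backup/` does not match.
fn is_protected_orchestrator_path(raw: &str) -> bool {
    match normalize_workspace_path(raw) {
        // Traversal attempts are treated as protected in this sketch.
        None => true,
        Some(path) => PROTECTED_PREFIXES.iter().any(|prefix| {
            path == *prefix
                || path
                    .strip_prefix(*prefix)
                    .map_or(false, |rest| rest.starts_with('/'))
        }),
    }
}
```

With this shape, `engine/./orchestrator/v3.py` and `engine//orchestrator/v3.py` both normalize to `engine/orchestrator/v3.py` and are caught by the guard, while `engine/knowledge/../orchestrator/v3.py` fails normalization.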
## High — rehydration rendered orchestrator invisible after restart

`synthesize_orchestrator_doc_from_py` returned a doc with `ProjectId::nil()`, but `list_memory_docs` filters by exact project, so persisted `.py` orchestrator files disappeared from project-scoped queries after a process restart and the runtime silently reverted to compiled-in defaults.

Override `HybridStore::list_shared_memory_docs` to surface docs flagged as "physically global" (by title — orchestrator, failure tracker, prompt overlay) for any project query, regardless of the stored project_id. Physical storage is one file per workspace; the override reflects that.

New end-to-end round-trip test writes an orchestrator via the store, rebuilds a fresh HybridStore from the same workspace, and asserts the rehydrated doc is visible to both the original project and an unrelated project.

## High — env var read on every gate check

Both `memory_write::requires_approval` and `save_memory_doc` read `ORCHESTRATOR_SELF_MODIFY` from the environment on every call. Env vars are global mutable state; a future sandbox escape could flip a security gate mid-flight.

Centralized in `runtime::self_modify_enabled()`: a process-wide `OnceLock<bool>` seeded on first read. Tool, store, engine loop, and self-improvement mission all share the same snapshot.

Tests need to flip the flag, so the module also exposes a `SelfModifyTestGuard` that overrides the snapshot and serializes concurrent tests via a process-wide `Mutex`. The override layer is compiled out of release builds (`cfg(debug_assertions)`).

## High — PR description / code mismatch + hard denial regression

Original PR said `ApprovalRequirement::Always`, code used `UnlessAutoApproved`. The `Always` path is mapped by the v2 effect bridge to `LeaseDenied` (permanent refusal), not to a resumable approval gate.
Fixed the code to `UnlessAutoApproved` (already in the previous round), but added an end-to-end regression test in `effect_adapter.rs` that drives `EffectBridgeAdapter::execute_action` with the real `MemoryWriteTool` and asserts the protected target produces `GatePaused(Approval)` — not `LeaseDenied`. Sibling test asserts that with self-modify disabled, the write surfaces as a non-resumable refusal so the agent doesn't loop on an unreachable approval.

## Medium — audit-hash scope

The content hash stamped on protected docs is **write-time audit only**: the workspace file is the trust boundary, and anyone with workspace access can edit the raw file bypassing save_memory_doc. Documented this explicitly in `save_memory_doc` so future readers don't mistake it for a runtime integrity check.

Also cleaned up the clippy `map_for_value` nit by switching to `if let Some(map) = ...`.

## Test coverage (all caller-level, new)

Unit tests (41 new):
- `memory.rs` path normalization & protected-path guard (dot-segment bypass, double-slash bypass, traversal, legacy path, canonical path, logical alias, unrelated path) — 9 cases
- `memory.rs` requires_approval branches (enabled + protected, disabled + protected, physical path, dot bypass, traversal bypass, unprotected, missing target) — 7 cases
- `store_adapter.rs` normalize_path, is_orchestrator_code_path, synthesize_orchestrator_doc_from_py, validate_orchestrator_content, is_protected_orchestrator_doc, is_globally_shared — 23 cases
- `internal_write.rs` trusted-write scope semantics — 3 cases
- `list_shared_memory_docs` override surfaces global vs project-scoped docs correctly — 2 cases

Integration tests (11 new, libsql feature):
- `dispatch.rs::integration_tests` — drives `ToolDispatcher::dispatch()` against the real `MemoryWriteTool` for all bypass paths (protected alias, physical path, dot segment, double slash, traversal, unprotected baseline) — 6 cases
- `store_adapter.rs::migration_tests::orchestrator_py_round_trips_through_restart` — full write → restart → load → cross-project query cycle
- `store_adapter.rs::migration_tests::knowledge_md_doc_round_trips_project_id_and_user_id` — asserts frontmatter `project_id`/`user_id` survive restart (was previously dropped, making docs invisible to project queries)
- `store_adapter.rs::migration_tests::invalid_python_orchestrator_is_rejected_at_write_time` — validator gate fires before persistence
- `effect_adapter.rs::tests::memory_write_orchestrator_target_paused_for_approval_when_self_modify_enabled` — the UnlessAutoApproved regression test
- `effect_adapter.rs::tests::memory_write_orchestrator_target_refused_when_self_modify_disabled` — asserts no gate pauses when self-modify is off

## Quality gate

- `cargo fmt` — clean
- `cargo clippy -p ironclaw --lib --tests --features libsql` — 0 warnings
- `cargo clippy -p ironclaw_engine --all-targets` — 0 warnings (crate-local)
- `cargo test -p ironclaw_engine` — 358/358 pass
- `cargo test -p ironclaw --lib --features libsql` — 4735/4735 pass
- `cargo test --test engine_v2_gate_integration --test engine_v2_skill_codeact` — 27/27 pass

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(engine): round-3 review — syntax validation in memory_write, Always-approval gate, global doc visibility

Critical: MemoryWriteTool::execute() now validates Python syntax for protected .py paths before writing to workspace (was bypassing the Store-level validator entirely).

High: ApprovalRequirement::Always now produces GatePaused(Approval) with allow_always=false instead of hard LeaseDenied; memory_write returns Always (not UnlessAutoApproved) for protected paths so session auto-approve cannot silently skip the gate.

High: Global docs (orchestrator, failures, prompt overlay) have project_id normalized to nil on save so they surface immediately from any project query, not just after restart.
Medium: Traversal paths no longer trigger spurious approval gates — requires_approval returns Never for normalization failures so execute() rejects them immediately as InvalidParameters.

Medium: Dead code removed from is_orchestrator_code_path (equality checks unreachable after .py suffix requirement).

Medium: Document prompt overlay validation skip and syntax validation threat model; expanded validate_python_syntax test coverage (size cap, empty input, unicode, error format).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(ci): update deny.toml — remove stale wasmtime advisories, add rand 0.8.5

Wasmtime was upgraded and no longer triggers RUSTSEC-2025-0046, RUSTSEC-2025-0118, RUSTSEC-2026-0020, RUSTSEC-2026-0021. The stale ignores caused cargo-deny to fail with "advisory was not encountered".

Added RUSTSEC-2026-0097 (rand 0.8.5 unsound aliased mutable ref in ThreadRng during reseed from custom logger) — transitive dep via monty/wasmtime; upgrade to rand 0.9+ tracked separately.

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(engine): path-boundary check in requires_approval, fix stale docstring

Address two Copilot review comments:
- requires_approval used raw starts_with on normalized path, matching unrelated paths like orchestrator_backup/. Added path-boundary checks.
- Updated stale docstring on the Always-approval gate test to reflect current behavior (Always→GatePaused, not UnlessAutoApproved).
[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(engine): deduplicate requires_approval gate, move syntax validation after param checks

- requires_approval now delegates to is_protected_orchestrator_path instead of duplicating the matching logic (reviewer concern about drift between the two checks)
- Syntax validation for protected .py paths moved after patch-mode parameter validation (empty old_string, missing new_string) so an empty-string replace can't create a huge intermediate string before being rejected

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(engine): address PR #1958 round-4 review findings

5 issues from serrrfirat review (2026-04-13):

1. High — clamp `always` in `resolve_gate` against `pending.resume_kind` so a caller-supplied `always: true` can no longer install a session-wide auto-approval on an `Approval { allow_always: false }` gate (orchestrator self-modify writes). Extracted to `clamp_always_to_resume_kind` with unit tests covering all three ResumeKind variants.
2. Medium — expand `validate_python_syntax` rustdoc to document that `MontyRun::new()` parses and prepares only (no heap/namespaces, no module-level execution) and explain the 256 KB size cap as a bound on parser allocation, not an execution-time safeguard.
3. Medium — remove the `doc.title == ORCHESTRATOR_FAILURES_TITLE` shortcut from the store-adapter self-modify gate. The two legitimate callers (`record_orchestrator_failure`, `reset_orchestrator_failures`) now wrap their `save_memory_doc` in `with_trusted_internal_writes`. Additionally reject untrusted writes to the failures title regardless of self-modify state, since no LLM-reachable code path should ever persist the system-internal tracker.
4. Medium — add a parity test asserting `normalize_path` (store adapter) and `normalize_workspace_path` (memory tool) agree on a canonical input set. Shared extraction isn't clean across the bridge/tool boundary; the test is the lighter guard against drift on this security boundary.
5. Medium — tighten `MemoryWriteTool::requires_approval` commentary with an explicit cross-reference to the `execute()` rejection site (~line 444) and a load-bearing invariant warning so a future refactor that weakens `execute()` will also be forced to flip this branch to `Always`.

Also collapse a clippy::collapsible_if that appeared in the traversal gate check.

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
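The clamp described in item 1 of the round-4 findings might look roughly like the sketch below. Only `Approval { allow_always }` and the function name `clamp_always_to_resume_kind` come from the commit message; the other two `ResumeKind` variants (`Authentication`, `External`) are hypothetical placeholders, and the choice to return `false` for them is an assumption rather than the PR's documented behaviour.

```rust
/// Stand-in for the engine's `ResumeKind`. Only the `Approval` variant is
/// taken from the commit message; the other two names are placeholders.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ResumeKind {
    Approval { allow_always: bool },
    Authentication, // hypothetical variant
    External,       // hypothetical variant
}

/// Honour a caller-supplied `always: true` only when the pending gate
/// itself permits a session-wide auto-approval.
fn clamp_always_to_resume_kind(requested_always: bool, kind: ResumeKind) -> bool {
    match kind {
        ResumeKind::Approval { allow_always } => requested_always && allow_always,
        // Assumption: non-approval gates never install an auto-approval.
        ResumeKind::Authentication | ResumeKind::External => false,
    }
}
```

Under this sketch, resolving an `Approval { allow_always: false }` gate with `always: true` still approves the single write but yields `false` for the session-wide flag, which matches the intent stated for orchestrator self-modify writes.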

ironclaw-ci bot added the staging-promotion label Apr 17, 2026

github-actions bot added the scope: tool/builtin (Built-in tools), size: XL (500+ changed lines), risk: medium (Business logic, config, or moderate-risk modules), and contributor: core (20+ merged PRs) labels Apr 17, 2026

Base automatically changed from staging-promote/79ad2e38-24572736232 to main April 18, 2026 01:00

henrypark133 merged commit e65ba2e into main Apr 18, 2026
39 of 48 checks passed

henrypark133 deleted the staging-promote/e65ba2e4-24575255629 branch April 18, 2026 01:00
