fix(engine): security hardening for v2 orchestrator and Monty sandbox by ilblackdragon · Pull Request #1958 · nearai/ironclaw

ilblackdragon · 2026-04-03T07:21:24Z

Summary

C1/C2: Approval gate + syntax validation for orchestrator self-modification when ORCHESTRATOR_SELF_MODIFY=true
C4: 6 sandbox security negative tests (rlm_query depth, FINAL injection, tool name injection, context mutation, deep recursion, syntax validation)
M2: Remove dead stop_thread_system/inject_message_system methods that bypassed ownership checks
Round 2: Close path-normalization bypass, forgeable-metadata escalation, and orchestrator rehydration holes surfaced in @serrrfirat's review
Document child lease budget snapshot semantics, add EngineError::InvalidInput variant

Addresses deferred security items from #1557 review. Related: #1956 (incident logging).

C1/C2 — Orchestrator approval gates (when self-modify is enabled)

Layer A — memory_write tool returns ApprovalRequirement::UnlessAutoApproved for protected orchestrator paths. The v2 effect bridge maps this to a resumable GatePaused(Approval) so human approval can pause + resume the write. (Earlier rounds used Always, which the bridge mapped to LeaseDenied — a permanent refusal that made the gate unusable; fixed.)

Layer B — Store adapter validates Python syntax via Monty parser before persisting orchestrator patches. Broken code is rejected upfront instead of consuming failure-budget slots (3 failures trigger auto-rollback). Content hash (SHA-256) stamped on all protected docs for write-time audit (not runtime integrity — the workspace file is the trust boundary).

Round 2 — Path bypass, forgeable metadata, rehydration, and env-var fragility

Path normalization bypass (critical): is_protected_orchestrator_path and is_orchestrator_code_path did raw starts_with, so engine/./orchestrator/v3.py, engine//orchestrator/v3.py, and engine/knowledge/../orchestrator/v3.py slipped past the gate. Both guards now normalize first (strip ./empty segments, reject .. traversal) and match against both .system/engine/orchestrator/ and the legacy engine/orchestrator/ prefix.
Forgeable is_system_internal (high): the gate previously exempted any doc whose metadata.source == "compiled_in". An LLM tool call could craft that field and bypass denial + validation. Replaced with a tokio::task_local!-backed trusted-write scope (ironclaw_engine::runtime::with_trusted_internal_writes) that MissionManager::seed_orchestrator_v0 enters; the store gate reads the task-local, which cannot be set from outside a trusted callsite and does not propagate across tokio::spawn.
Rehydration invisibility (high): synthesize_orchestrator_doc_from_py returned docs with ProjectId::nil(), but list_memory_docs filters by exact project, so persisted .py orchestrator files disappeared from project-scoped queries after restart. Overrode HybridStore::list_shared_memory_docs to surface globally-stored docs (orchestrator, failure tracker, prompt overlay) for any project query, matching the physically-global storage layout.
Env var read per call (high): ORCHESTRATOR_SELF_MODIFY was read on every gate check. Centralized in runtime::self_modify_enabled() — a process-wide OnceLock<bool> seeded on first read; tool, store, engine loop, and self-improvement mission all share the same snapshot. SelfModifyTestGuard provides a serialized dev-build override for tests (compiled out of release builds).
Frontmatter project_id / user_id loss: fixed by persisting both fields in serialize_knowledge_doc and parsing them in deserialize_knowledge_doc. Round-trip test in store_adapter::migration_tests::knowledge_md_doc_round_trips_project_id_and_user_id.

C4 — Sandbox security tests

Test	What it proves
`sandbox_enforces_rlm_query_depth_limit`	Depth check at max recursion refuses with clear error
`sandbox_rejects_final_injection`	FINAL() captures payload literally, no code execution
`sandbox_rejects_tool_name_injection`	Dynamic name construction can't bypass lease enforcement
`sandbox_context_variable_is_not_mutable`	Python-side mutations don't propagate to Rust thread state
`sandbox_handles_deep_recursion`	Infinite recursion terminates safely (no crash/hang)
`validate_syntax_rejects_broken_code`	Syntax validation unit test

M2 — Remove ownership-bypassing thread operations

stop_thread_system() and inject_message_system() had zero callers anywhere in the codebase — dead code that was a privilege escalation footgun. Removed entirely; ownership-checking variants are the only API.

Round 2 test coverage (caller-level, per `.claude/rules/testing.md`)

Unit tests (41 new):

path normalization & protected-path guard (9 cases), requires_approval branches (7 cases), store-adapter helper predicates + validate_orchestrator_content + synthesize_orchestrator_doc_from_py (23 cases), trusted-write scope semantics (3 cases), list_shared_memory_docs override (2 cases)

Integration tests (11 new, libsql feature):

tools::dispatch::integration_tests — drives ToolDispatcher::dispatch() against the real MemoryWriteTool for every bypass vector (logical alias, physical path, dot segment, double slash, traversal, unprotected baseline)
bridge::store_adapter::migration_tests::orchestrator_py_round_trips_through_restart — full write → restart → load → cross-project query cycle
bridge::store_adapter::migration_tests::knowledge_md_doc_round_trips_project_id_and_user_id — frontmatter project/user survive restart
bridge::store_adapter::migration_tests::invalid_python_orchestrator_is_rejected_at_write_time — validator gate fires before persistence
bridge::effect_adapter::tests::memory_write_orchestrator_target_paused_for_approval_when_self_modify_enabled — asserts UnlessAutoApproved produces GatePaused(Approval), not LeaseDenied (the round-1 miss)
bridge::effect_adapter::tests::memory_write_orchestrator_target_refused_when_self_modify_disabled — asserts no gate pauses when self-modify is off

Other

Document child lease budget snapshot semantics in derive_child_leases (intentional: independent budgets avoid cross-thread contention)
Add EngineError::InvalidInput variant for validation failures
Re-export validate_python_syntax from executor module

Test plan

cargo fmt — clean
cargo clippy -p ironclaw --lib --tests --features libsql — 0 warnings
cargo clippy -p ironclaw_engine --all-targets — 0 warnings
cargo test -p ironclaw_engine — 358/358 pass
cargo test -p ironclaw --lib --features libsql — 4735/4735 pass
cargo test --test engine_v2_gate_integration --test engine_v2_skill_codeact — 27/27 pass

🤖 Generated with Claude Code

Address deferred security items C1/C2/C4/M2 from PR #1557 review: C1/C2 — Orchestrator self-modification approval gates: - memory_write tool now returns ApprovalRequirement::Always for protected orchestrator paths when ORCHESTRATOR_SELF_MODIFY=true, forcing human approval before any orchestrator or prompt overlay patch is written - Store adapter validates Python syntax via Monty parser before persisting orchestrator patches, preventing broken code from consuming failure budget - Content hash (SHA-256) stamped on all protected docs for audit trail C4 — Sandbox security test coverage (6 new tests): - sandbox_enforces_rlm_query_depth_limit: depth check at max recursion - sandbox_rejects_final_injection: FINAL() captures payload literally - sandbox_rejects_tool_name_injection: dynamic names can't bypass leases - sandbox_context_variable_is_not_mutable: Python mutations don't affect Rust - sandbox_handles_deep_recursion: infinite recursion terminates safely - validate_syntax_rejects_broken_code: syntax validation unit test M2 — Remove ownership-bypassing thread operations: - Delete stop_thread_system() and inject_message_system() — dead code with no callers, was a privilege escalation footgun. Ownership-checking variants (stop_thread/inject_message) are the only API now. Also: document child lease budget snapshot semantics (intentional design), add EngineError::InvalidInput variant, re-export validate_python_syntax. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Copilot

Pull request overview

This PR hardens the engine v2 self-modifying orchestrator and Monty sandbox by adding explicit approval gating for orchestrator/prompt writes, validating orchestrator patch syntax before persistence, expanding negative security tests, and removing ownership-bypassing thread operations.

Changes:

Add approval requirement logic for memory_write when targeting protected orchestrator/prompt docs under ORCHESTRATOR_SELF_MODIFY=true.
Validate orchestrator patch Python syntax before persisting; stamp protected docs with a SHA-256 content hash for audit.
Expand Monty sandbox negative tests and remove bypass-style *_system thread operations; document child-lease budget snapshot semantics and add EngineError::InvalidInput.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 6 comments.

Show a summary per file

File	Description
`src/tools/builtin/memory.rs`	Adds approval gating for writes to protected orchestrator/prompt targets.
`src/bridge/store_adapter.rs`	Adds orchestrator syntax validation, protected-doc hashing, and self-modify enforcement in the store adapter.
`crates/ironclaw_engine/src/types/error.rs`	Introduces `EngineError::InvalidInput` for validation failures.
`crates/ironclaw_engine/src/runtime/manager.rs`	Removes ownership-bypassing thread stop/inject APIs.
`crates/ironclaw_engine/src/executor/scripting.rs`	Adds `validate_python_syntax()` and multiple sandbox security negative tests.
`crates/ironclaw_engine/src/executor/mod.rs`	Re-exports `validate_python_syntax`.
`crates/ironclaw_engine/src/capability/lease.rs`	Documents child lease budget snapshot semantics.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-04-03T07:32:12Z

+                // Self-modification is enabled — validate content before persisting.
+                // Skip validation for system-internal docs (failure tracker, compiled-in seed).
+                let is_system_internal = doc.title == ORCHESTRATOR_FAILURES_TITLE
+                    || doc
+                        .metadata


In this self-modify gate, the allow=false branch checks doc.title == "orchestrator:failures" while the allow=true branch uses ORCHESTRATOR_FAILURES_TITLE. Using the constant consistently in both branches avoids drift and makes the policy easier to audit.

Fixed in 2bdd5c0 — replaced string literal with ORCHESTRATOR_FAILURES_TITLE constant.

Addressed in 2bdd5c0 (round 1).

Copilot · 2026-04-03T07:32:12Z

+            stamped
+                .metadata
+                .as_object_mut()
+                .map(|m| m.insert("content_hash".into(), serde_json::Value::String(hash)));


content_hash stamping is best-effort because it only runs when stamped.metadata is already a JSON object. If legacy docs can have metadata: null (or non-object), the hash silently won’t be recorded. Consider normalizing metadata to {} when stamping protected docs so the audit trail is reliably present.

Suggested change

stamped

.metadata

.as_object_mut()

.map(|m| m.insert("content_hash".into(), serde_json::Value::String(hash)));

if !stamped.metadata.is_object() {

stamped.metadata = serde_json::Value::Object(serde_json::Map::new());

}

stamped

.metadata

.as_object_mut()

.expect("metadata normalized to object before stamping content_hash")

.insert("content_hash".into(), serde_json::Value::String(hash));

Fixed in 2bdd5c0 — metadata is now normalized to {} before stamping the content hash.

Addressed in 2bdd5c0 (round 1).

Copilot · 2026-04-03T07:32:12Z

+        // Stamp a content hash for audit trail on all protected docs.
+        if is_protected_orchestrator_doc(doc) {
+            use sha2::{Digest, Sha256};
+            let hash = format!("{:x}", Sha256::digest(doc.content.as_bytes()));
+            stamped
+                .metadata
+                .as_object_mut()
+                .map(|m| m.insert("content_hash".into(), serde_json::Value::String(hash)));
+        }


The stamped hash is computed over the full doc.content, but at runtime prompt overlays are truncated (see MAX_PROMPT_OVERLAY_CHARS in the engine prompt builder). For prompt overlays, this can make the audit hash differ from the effective content used in execution. Either enforce the same size cap at write-time for prompt docs or hash the truncated form that will actually be used.

The hash is an audit trail recording what was written to the store, not a runtime integrity check. Hashing the full content is the correct behavior — it lets you verify what was persisted matches what was submitted. The runtime truncation is a display concern, not a storage concern.

Copilot · 2026-04-03T07:32:13Z

+pub fn validate_python_syntax(code: &str) -> Result<(), String> {
+    let code_owned = code.to_string();
+    match std::panic::catch_unwind(std::panic::AssertUnwindSafe(|| {
+        MontyRun::new(code_owned, "validate.py", vec![])
+    })) {


validate_python_syntax() parses unbounded input without any size cap, and it allocates code.to_string() before parsing. Because this runs on the store write-path for orchestrator patches, a very large patch could cause avoidable CPU/memory pressure (even though no code is executed). Consider enforcing a maximum length for orchestrator source (or rejecting/streaming above a threshold) before invoking Monty’s parser.

Fixed in 2bdd5c0 — added a 256KB size cap (MAX_ORCHESTRATOR_SOURCE_BYTES) before invoking the Monty parser. The compiled-in default is ~2KB so this is generous but prevents pathological inputs.

Addressed in 2bdd5c0 (round 1).

Copilot · 2026-04-03T07:32:13Z

+        assert!(
+            !answer.starts_with("ESCAPED"),
+            "dynamic name construction should not bypass leases, got: {answer}",


sandbox_rejects_tool_name_injection is very permissive: it only asserts the final answer doesn’t start with "ESCAPED". As written it can pass for reasons unrelated to lease enforcement (e.g., the name lookup returns None, or globals().get() returns a non-callable placeholder). To make this a stronger regression test, assert the expected outcome explicitly (e.g., final_answer == "not_found" or an error string that indicates denial by lease/policy).

Suggested change

assert!(

!answer.starts_with("ESCAPED"),

"dynamic name construction should not bypass leases, got: {answer}",

assert_eq!(

answer,

"not_found",

"dynamic name construction should resolve to no callable tool and must not bypass leases, got: {answer}",

Fixed in 2bdd5c0 — added a second assertion verifying the expected outcome (not_found or blocked:*), not just absence of ESCAPED.

Addressed in 2bdd5c0 (round 1).

Copilot · 2026-04-03T07:32:13Z

+    fn requires_approval(&self, params: &serde_json::Value) -> ApprovalRequirement {
+        // When orchestrator self-modification is enabled, writing to protected
+        // paths (orchestrator code, prompt overlays) always requires explicit
+        // human approval — even if the session has auto-approve enabled.
+        let target = params.get("target").and_then(|v| v.as_str()).unwrap_or("");
+        if is_protected_orchestrator_path(target) {
+            let self_modify_enabled = std::env::var("ORCHESTRATOR_SELF_MODIFY")
+                .map(|v| v == "true" || v == "1")
+                .unwrap_or(false);
+            if self_modify_enabled {
+                return ApprovalRequirement::Always;
+            }
+            // When disabled, execute() returns NotAuthorized before any write.
+        }
+        ApprovalRequirement::Never
+    }


memory_write::requires_approval() returns ApprovalRequirement::Always for protected orchestrator targets when ORCHESTRATOR_SELF_MODIFY is enabled. In the engine v2 bridge (EffectBridgeAdapter::execute_action), ApprovalRequirement::Always is currently treated as a hard denial (it returns EngineError::LeaseDenied rather than pausing for interactive approval), so this may inadvertently make orchestrator patch writes impossible even with human approval. If protected writes are meant to be approvable (just not auto-approvable), the v2 adapter/gate logic needs to pause on Always instead of denying, or this tool should use a requirement variant that the adapter can pause on.

Fixed in 2bdd5c0 — changed to ApprovalRequirement::UnlessAutoApproved. The v2 effect bridge pauses on this variant (producing a resumable GatePaused approval gate) rather than hard-denying. This makes the protected self-modify path usable with human approval.

gemini-code-assist

Code Review

This pull request introduces security enhancements and validation logic for orchestrator self-modification. Key changes include the implementation of a Python syntax validator to prevent broken code from being persisted, the addition of content hashing for audit trails on protected documents, and a mandatory human approval requirement for modifications to orchestrator paths. Furthermore, the PR adds comprehensive sandbox security tests covering recursion limits, injection prevention, and state mutability, while removing several system-level thread management methods. I have no feedback to provide.

serrrfirat

Posting four review findings from the local audit.

serrrfirat · 2026-04-03T10:00:40Z

+                .map(|v| v == "true" || v == "1")
+                .unwrap_or(false);
+            if self_modify_enabled {
+                return ApprovalRequirement::Always;


High Severity

This ApprovalRequirement::Always is not approvable in engine v2 today. EffectBridgeAdapter::execute_action() still maps Always to EngineError::LeaseDenied, and only UnlessAutoApproved becomes GatePaused(Approval). That means with ORCHESTRATOR_SELF_MODIFY=true, a call like memory_write(target="prompt:codeact_preamble", ...) fails outright instead of pausing for human approval, so the protected self-modify path is unusable.

Suggested fix: either return UnlessAutoApproved here and rely on the approval gate to suppress the allow always option for these targets, or update the v2 effect bridge so Always produces a resumable approval gate instead of a hard denial.

Fixed in 2bdd5c0 — changed to UnlessAutoApproved. The v2 effect bridge maps this to a resumable GatePaused(Approval) gate, so the user sees an approval prompt and can approve/deny. Always was incorrectly hard-denying via LeaseDenied.

Addressed across c50d70a (round 2) and 37a67f3 (round 3). See commit messages for details.

serrrfirat · 2026-04-03T10:00:40Z

+        // When orchestrator self-modification is enabled, writing to protected
+        // paths (orchestrator code, prompt overlays) always requires explicit
+        // human approval — even if the session has auto-approve enabled.
+        let target = params.get("target").and_then(|v| v.as_str()).unwrap_or("");


Medium Severity

This check only protects the logical aliases (prompt:* / orchestrator:*). The persisted prompt overlay actually lives at engine/orchestrator/codeact-preamble-overlay.md, and memory_write still accepts arbitrary workspace paths. With self-modify enabled, writing that real path avoids the new approval gate while still mutating the runtime overlay.

Suggested fix: canonicalize target through the same engine-doc path mapping used by the store adapter, and treat the persisted engine paths as protected too.

Fixed in 2bdd5c0 — is_protected_orchestrator_path() now also matches physical workspace paths (engine/orchestrator/*). Writing to the persisted file path directly will now trigger the same approval gate.

Addressed across c50d70a (round 2) and 37a67f3 (round 3). See commit messages for details.

serrrfirat · 2026-04-03T10:00:40Z


-        self.docs.write().await.insert(doc.id, doc.clone());
-        self.persist_doc(doc).await;
+        let mut stamped = doc.clone();


High Severity

This persists frontmatter-based engine docs, but the restart path currently reconstructs them with project_id = nil and user_id = "legacy" in deserialize_knowledge_doc(). list_memory_docs() later filters by the real project/user pair, so after load_state_from_workspace() these persisted markdown docs disappear from list_shared_memory_docs(project_id) / list_memory_docs(...) and stop participating in prompt/orchestrator loading.

Suggested fix: persist project/user in frontmatter, or restore them during load_knowledge_docs() before inserting docs into self.docs. A restart round-trip test through HybridStore::load_state_from_workspace() would catch this.

Fixed in 2bdd5c0 — serialize_knowledge_doc() now persists project_id and user_id in frontmatter, and deserialize_knowledge_doc() restores them on load. Legacy docs without these fields fall back to nil/"legacy" for backward compatibility.

Addressed across c50d70a (round 2) and 37a67f3 (round 3). See commit messages for details.

serrrfirat · 2026-04-03T10:00:40Z

+        }
+
+        self.docs.write().await.insert(stamped.id, stamped.clone());
+        self.persist_doc(&stamped).await;


High Severity

This write path persists runtime orchestrator versions as raw engine/orchestrator/vN.py, but startup only reconstructs docs from frontmatter markdown or JSON. Those .py files get skipped by load_knowledge_docs(), so saved orchestrator patches do not survive a process restart and load_orchestrator() falls back to the compiled-in default.

Suggested fix: either persist orchestrator versions in a loadable MemoryDoc format, or synthesize MemoryDocs from the vN.py files during startup. This also needs a restart persistence test.

Fixed in 2bdd5c0 — added synthesize_orchestrator_doc_from_py() which reconstructs MemoryDocs from raw .py files during load_knowledge_docs(). Extracts the version number from the filename (v{N}.py) and creates a doc with the correct title, tag, and version metadata so load_orchestrator_from_docs() finds it on restart.

Addressed across c50d70a (round 2) and 37a67f3 (round 3). See commit messages for details.

Review feedback from Copilot and serrrfirat: 1. Use ORCHESTRATOR_FAILURES_TITLE constant consistently in both branches of the self-modify gate (was string literal in deny branch). 2. Normalize metadata to {} before stamping content_hash, so the audit trail is reliably present even for docs with null/non-object metadata. 3. Add 256KB size cap to validate_python_syntax() to prevent pathological inputs from causing CPU/memory pressure on the store write path. 4. Tighten sandbox_rejects_tool_name_injection test assertion to verify the expected "not_found" outcome, not just absence of "ESCAPED". 5. Change ApprovalRequirement::Always → UnlessAutoApproved for protected orchestrator writes. The v2 effect bridge maps Always to hard denial (LeaseDenied), making the self-modify path unusable. UnlessAutoApproved triggers the gate/pause flow so human approval is possible. 6. Extend is_protected_orchestrator_path() to cover physical workspace paths (engine/orchestrator/*) in addition to logical aliases. Prevents bypassing the approval gate by writing to the persisted file path. 7. Persist project_id and user_id in frontmatter (serialize_knowledge_doc) and restore them on load (deserialize_knowledge_doc). Previously all reloaded docs got project_id=nil and user_id="legacy", making them invisible to project-scoped queries after restart. 8. Synthesize MemoryDocs from raw .py orchestrator files on startup (synthesize_orchestrator_doc_from_py). Orchestrator versions are persisted as engine/orchestrator/v{N}.py but load_knowledge_docs could not parse them — they silently disappeared on restart. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

serrrfirat · 2026-04-06T16:59:17Z

I think the new .py orchestrator recovery path still misses the restarted-runtime case because the synthesized doc is created with project_id = nil, but the later lookup path still filters by the real project id.

Concrete path:

synthesize_orchestrator_doc_from_py() creates a MemoryDoc with project_id: ProjectId(uuid::Uuid::nil())
Store::list_shared_memory_docs(project_id) delegates to list_memory_docs(project_id, shared_owner_id())
HybridStore::list_memory_docs() then filters with doc.project_id == project_id && doc.user_id == user_id
LoopEngine::resolve_orchestrator() loads shared orchestrators via list_shared_memory_docs(self.thread.project_id)

So after restart, the synthesized runtime orchestrator docs still will not be returned for the real project, and the engine appears to fall back to the built-in orchestrator instead of the persisted .py version.

If the intent is for shared orchestrator docs to survive restart, I think this needs either:

preserving/deriving the real project id when synthesizing the doc, or
a corresponding store-query change so shared orchestrator docs are not filtered out by the nil project_id.

A restart regression test around the .py synthesis + reload path would also help lock this in.

serrrfirat · 2026-04-07T07:11:07Z

        || path.starts_with("prompt:")
+        // Physical workspace paths (match the paths in store_adapter::doc_workspace_path)
+        || path.starts_with("engine/orchestrator/")
+        || path == "engine/orchestrator"


Medium Severity — Path normalization bypass in orchestrator path guard

is_protected_orchestrator_path uses raw starts_with("engine/orchestrator/") on the user-supplied target parameter. Paths with dot-components or double slashes bypass this check while resolving to the same filesystem location:

engine/./orchestrator/v3.py → does NOT start with engine/orchestrator/ → bypasses check

engine//orchestrator/v3.py → same bypass

OS normalizes both to engine/orchestrator/v3.py on disk

On filesystem-backed workspaces, a prompt-injected LLM could write orchestrator code even with ORCHESTRATOR_SELF_MODIFY=false, because execute() uses the same unguarded check (line ~342).

Suggested fix — normalize before checking:

fn normalize_path(path: &str) -> String { path.split('/') .filter(|s| !s.is_empty() && *s != ".") .collect::<Vec<_>>() .join("/") } fn is_protected_orchestrator_path(path: &str) -> bool { let norm = normalize_path(path); norm.starts_with("orchestrator:") || norm.starts_with("prompt:") || norm.starts_with("engine/orchestrator/") || norm == "engine/orchestrator" }

Note: .. components should also be collapsed if the workspace doesn't reject them.

Addressed across c50d70a (round 2) and 37a67f3 (round 3). See commit messages for details.

serrrfirat · 2026-04-07T07:11:11Z

+                .map(|v| v == "true" || v == "1")
+                .unwrap_or(false);
+            if self_modify_enabled {
+                return ApprovalRequirement::UnlessAutoApproved;


Medium Severity — PR description says Always but code returns UnlessAutoApproved

The PR summary states:

Layer A — memory_write tool returns ApprovalRequirement::Always for protected orchestrator paths

But the code returns UnlessAutoApproved (line 297). These have different semantics:

Always → approval required every time, even when session auto-approve is enabled

UnlessAutoApproved → can be silently bypassed when a session has auto-approve on

The code comment at line 289 explains the intent well ("so the v2 engine gate system can pause for approval rather than hard-denying the call"), but the PR description is misleading for reviewers.

Suggested fix: Either update the PR description to say UnlessAutoApproved, or change to Always if the intent is to never skip approval for orchestrator self-modification.

PR description updated. Code now uses ApprovalRequirement::Always (as of 37a67f3) with the effect bridge mapping Always to GatePaused(Approval { allow_always: false }) instead of LeaseDenied. So the protected self-modify path always pauses for human approval and cannot be session-auto-approved.

serrrfirat · 2026-04-07T07:11:17Z

+/// (3 failures trigger auto-rollback). Runtime sandbox (Monty) handles all
+/// security enforcement — no blocklist here.
+fn validate_orchestrator_content(doc: &MemoryDoc) -> Result<(), EngineError> {
+    if doc.title.starts_with("orchestrator:")


Medium Severity — Missing unit tests for new store adapter functions

Four new functions in this file have no dedicated unit tests:

validate_orchestrator_content() (line 677) — should test: valid Python passes, broken Python rejected, failure-tracker title (orchestrator:failures) is exempt, oversized content handling

synthesize_orchestrator_doc_from_py() (line 637) — should test: valid v3.py path, non-orchestrator prefix returns None, non-numeric version (vfoo.py) returns None, edge case v0.py

Content hash stamping (~line 800) — should verify hash is present and correct on saved protected docs, and absent on non-protected docs

Frontmatter roundtrip with new fields — project_id and user_id were added to serialization (line 791) and deserialization (line 897). A roundtrip test should verify these survive serialize → deserialize

Per project rules (.claude/rules/review-discipline.md): "Regression test with every fix."

serrrfirat · 2026-04-07T07:11:25Z

        })
    }

+    fn requires_approval(&self, params: &serde_json::Value) -> ApprovalRequirement {


Medium Severity — No unit test for requires_approval approval gate

This new method has three meaningful code paths, none of which are tested:

Target ORCHESTRATOR_SELF_MODIFY Expected result

"orchestrator:main" "true" UnlessAutoApproved

"engine/orchestrator/v3.py" unset / "false" Never (defers to execute() block)

"daily_log" any Never

Suggested test:

#[test] fn requires_approval_gates_orchestrator_paths() { let tool = MemoryWriteTool::new(/* ... */); // Non-orchestrator path — never needs approval let params = serde_json::json!({"target": "daily_log"}); assert!(matches!(tool.requires_approval(&params), ApprovalRequirement::Never)); // Orchestrator path, self-modify disabled — no gate (execute blocks) std::env::remove_var("ORCHESTRATOR_SELF_MODIFY"); let params = serde_json::json!({"target": "orchestrator:main"}); assert!(matches!(tool.requires_approval(&params), ApprovalRequirement::Never)); // Orchestrator path, self-modify enabled — requires approval std::env::set_var("ORCHESTRATOR_SELF_MODIFY", "true"); assert!(matches!(tool.requires_approval(&params), ApprovalRequirement::UnlessAutoApproved)); std::env::remove_var("ORCHESTRATOR_SELF_MODIFY"); }

(Note: env var mutation in tests needs serialization via #[serial] or temp_env crate.)

Addressed across c50d70a (round 2) and 37a67f3 (round 3). See commit messages for details.

ilblackdragon · 2026-04-08T08:39:00Z

Code Review

Verdict: Mergeable after spec/code mismatch + rehydration hole are fixed

399/-44 across 7 files. Adds (a) approval gate + Python syntax pre-validation for orchestrator self-modification (ORCHESTRATOR_SELF_MODIFY=true), (b) SHA-256 content-hash stamping on protected docs, (c) .py reconstitution on restart, (d) frontmatter project_id/user_id round-tripping, (e) removal of dead stop_thread_system/inject_message_system ownership-bypass methods, and (f) 6 new Monty sandbox negative tests.

Scope clarification

The PR title is misleading. Does not touch src/orchestrator/ (sandbox HTTP API), src/sandbox/ (Docker), network proxy allowlist, container isolation, credential injection, or per-job tokens. "Orchestrator" here = engine v2 self-modifying agent loop; "Monty sandbox" = the in-process Python interpreter, not Docker. Suggest renaming to "engine v2 self-modify gates + Monty negative tests" before merge.

Do hardening measures close claimed gaps? Mostly yes, with caveats.

🚨 Verdict mismatch — MemoryWriteTool::requires_approval returns UnlessAutoApproved (src/tools/builtin/memory.rs:296), but the PR description claims ApprovalRequirement::Always. A session that auto-approves memory_write (or auto_approve_all) silently bypasses the gate. For self-modification of the agent loop, Always is the right choice — fix the code. Also, requires_approval reads ORCHESTRATOR_SELF_MODIFY via std::env::var on every call (memory.rs:300); cache at startup into Config to remove TOCTOU on env mutation.

🚨 synthesize_orchestrator_doc_from_py is a rehydration trust hole (src/bridge/store_adapter.rs:635-668). Mints a fresh DocId and user_id = shared_owner_id() from any engine/orchestrator/v{N}.py file found on disk. Anyone with workspace write (or a path-traversal in another tool) can drop a v999.py and get it loaded as a privileged orchestrator doc on next restart. The new content-hash stamping happens only on set_doc, not on this restart path, so persisted-from-disk patches have no audit hash. This is the most serious issue in the PR. Should: (a) verify file hash against a stored manifest of approved versions, (b) re-run validate_python_syntax on load, (c) refuse versions whose hash isn't recorded.

Syntax validation (crates/ironclaw_engine/src/executor/scripting.rs:65) is parse-only — while True: pass or imports of privileged builtins still pass. The PR comment honestly notes "no blocklist here" — fine as a scope limit, just be aware.

Sandbox tests: sandbox_handles_deep_recursion accepts either Ok or Err (scripting.rs:2390-2401), so a silent Ok with empty answer passes. Tighten to require non-empty blocked: answer or Err.

Dead code removal (M2): confirmed zero callers; clean win. Removing the _system variants closes a privilege-escalation footgun. Add a regression test asserting stop_thread/inject_message do check ownership, otherwise the bypass can re-enter via copy-paste.

Child lease budget doc (crates/ironclaw_engine/src/capability/lease.rs:187-195): documents an intentional weakening — combined parent+child usage can exceed max_uses. This is a real budget-bypass; expose derive_child_leases_strict for callers (orchestrator self-modify path) where global budgets matter.

Sandbox escape gaps (Monty)

sandbox_context_variable_is_not_mutable proves Python-side mutations don't propagate, but only because Monty is given a copy. No tests for: __builtins__ mutation, eval/exec of attacker-supplied strings, getattr chain to reach sys.modules, FINAL shadowing, PyO3-style attribute walks.
Tool-name injection test only covers globals().get(name); doesn't probe __import__, attribute access via captured tool closures.
The depth-limit test mutates thread.config.depth directly — verify the sandboxed code path actually re-reads thread depth on each rlm_query rather than a snapshot at executor entry, otherwise the test passes for the wrong reason.

Conventions

New EngineError::InvalidInput variant and pub use validate_python_syntax follow the crate's style.
No .unwrap()/.expect() in production. The .map(|m| m.insert(...)) at src/bridge/store_adapter.rs:1259 discards the result silently — let _ would be clearer.
No new info!/warn!. Compliant.

Test coverage for security boundaries

Six new tests, all in crates/ironclaw_engine/src/executor/scripting.rs:2189-2403. Gaps:

No test for MemoryWriteTool::requires_approval × ORCHESTRATOR_SELF_MODIFY=true/false × protected path matrix.
No test for validate_orchestrator_content end-to-end via HybridStore::set_doc.
No test for synthesize_orchestrator_doc_from_py path traversal / version parsing (v-1.py, v9999999999999999999.py, v.py).
No test that content-hash stamping survives reload.
No regression test for the M2 dead-code removal (ownership checks on stop_thread/inject_message).

Top suggestions

Resolve the Always vs UnlessAutoApproved mismatch at src/tools/builtin/memory.rs:296. Fix the code, not the description.
Fix synthesize_orchestrator_doc_from_py rehydration trust hole at src/bridge/store_adapter.rs:635-668.
Cache ORCHESTRATOR_SELF_MODIFY at startup instead of reading env on every approval check.
Tighten sandbox_handles_deep_recursion to require non-empty blocked: answer or Err.
Add Monty escape tests for __builtins__ mutation, eval/exec, getattr chain, FINAL shadowing.
Strict child-lease variant for orchestrator self-modify path.
Rename PR title before merge.

🤖 Generated with Claude Code

serrrfirat · 2026-04-09T12:18:28Z

High Severity

The new orchestrator rehydration path in src/bridge/store_adapter.rs synthesizes persisted .py docs with project_id = nil.

Later loads filter shared memory docs by the real project ID, so after a restart the persisted orchestrator versions may no longer be returned to load_orchestrator_from_docs(). The practical effect is that the engine can fall back to the compiled-in orchestrator instead of the persisted runtime version.

Suggested fix: preserve the original project ID when persisting/rehydrating these .py docs, or restore it from durable metadata. A restart regression test would make this failure mode obvious.

Medium Severity

The new content_hash audit metadata is not durable for orchestrator .py docs. The hash is stamped onto the in-memory MemoryDoc, but the persistence path writes raw Python and the load path reconstructs the doc without restoring that metadata.

Suggested fix: persist metadata alongside the .py artifact or serialize orchestrator docs in a format that round-trips the audit fields.

serrrfirat · 2026-04-10T10:40:28Z

+    path.starts_with("orchestrator:")
        || path.starts_with("prompt:")
+        // Physical workspace paths (match the paths in store_adapter::doc_workspace_path)
+        || path.starts_with("engine/orchestrator/")


Critical Severity — Path traversal bypass in is_protected_orchestrator_path()

is_protected_orchestrator_path() performs raw starts_with("engine/orchestrator/") on user-supplied target without any path normalization. An LLM-authored payload can bypass this guard using:

engine/./orchestrator/v3.py

engine//orchestrator/v3.py

engine/knowledge/../orchestrator/v3.py

None of these match starts_with("engine/orchestrator/") but all resolve to the same workspace location. This bypasses both the approval gate (requires_approval) AND the hard-deny when ORCHESTRATOR_SELF_MODIFY=false.

Suggested fix: Add a normalize_path() helper that collapses ., .., double slashes, and trailing slashes before the starts_with check. Apply it in both is_protected_orchestrator_path() and in the execute() guard.

Fixed in c50d70a — added normalize_workspace_path() that strips ./empty segments and rejects ../ traversal. Both is_protected_orchestrator_path and is_orchestrator_code_path normalize before matching. Caller-level tests drive ToolDispatcher::dispatch() with all bypass vectors (dot segment, double slash, traversal). See also 37a67f3 which further refines traversal handling in requires_approval.

serrrfirat · 2026-04-10T10:40:34Z

@@ -624,11 +627,67 @@ fn is_orchestrator_code_path(path: &str) -> bool {
    path.starts_with(ORCHESTRATOR_PREFIX) && path.ends_with(".py")
 }


Critical Severity — Path normalization bypass in is_orchestrator_code_path()

Same path normalization vulnerability as is_protected_orchestrator_path() — is_orchestrator_code_path() only checks starts_with(ORCHESTRATOR_PREFIX) AND ends_with(".py") on raw input.

A crafted path like engine/knowledge/../orchestrator/v99.py bypasses this check and skips syntax validation while still being written to the orchestrator directory by the workspace layer.

Combined with the is_protected_orchestrator_path() bypass, this creates a complete security bypass: write invalid/malicious Python to the orchestrator directory without approval and without syntax validation.

Suggested fix: Apply the same normalize_path() to inputs before checking is_orchestrator_code_path().

Fixed in c50d70a — is_orchestrator_code_path now normalizes via normalize_path() before matching. Matches both .system/engine/orchestrator/ and legacy engine/orchestrator/ prefixes. Dead equality checks removed in 37a67f3.

serrrfirat · 2026-04-10T10:40:40Z

+                // Skip validation for system-internal docs (failure tracker, compiled-in seed).
+                let is_system_internal = doc.title == ORCHESTRATOR_FAILURES_TITLE
+                    || doc
+                        .metadata


High Severity — Forgeable is_system_internal metadata allows privilege escalation

The is_system_internal check relies on doc.metadata.get("source").is_some_and(|s| s == "compiled_in") to whitelist system writes. Since MemoryDoc.metadata is a serde_json::Value that the caller constructs, an LLM-authored tool call could craft a doc with metadata: {"source": "compiled_in"} and bypass both the self-modify gate (when disabled) and syntax validation (when enabled). The metadata field is not authenticated or integrity-checked.

Suggested fix: Do not trust metadata.source for authorization decisions. Use a separate internal-only flag or check the call stack/context (e.g., a CallerContext enum that distinguishes system vs. LLM-originated writes).

Fixed in c50d70a — replaced metadata-based exemption with a tokio::task_local! trusted-write scope (ironclaw_engine::runtime::with_trusted_internal_writes). The task-local cannot be set from outside a trusted callsite and does not propagate across tokio::spawn. Test in internal_write.rs asserts the no-propagation property.

serrrfirat · 2026-04-10T10:40:45Z

+        // Stamp a content hash for audit trail on all protected docs.
+        if is_protected_orchestrator_doc(doc) {
+            use sha2::{Digest, Sha256};
+            let hash = format!("{:x}", Sha256::digest(doc.content.as_bytes()));


High Severity — Content hash provides false integrity assurance

The content hash is stamped at write time but never verified on read. An attacker who can modify the workspace file directly (bypassing save_memory_doc) can change the .py content without invalidating the hash.

Suggested fix: Either verify the hash on load in load_knowledge_docs() / synthesize_orchestrator_doc_from_py(), or document explicitly that the hash is write-time-only audit trail (not integrity enforcement).

Addressed in c50d70a — documented explicitly in save_memory_doc that the hash is write-time audit only, not a runtime integrity check. The workspace file is the trust boundary; anyone with workspace access can edit files directly.

serrrfirat · 2026-04-10T10:41:01Z

+        doc_type: DocType::Note,
+        title: ORCHESTRATOR_MAIN_TITLE.to_string(),
+        content: content.to_string(),
+        source_thread_id: None,


High Severity — Rehydration uses project_id = nil, docs invisible after restart

synthesize_orchestrator_doc_from_py() uses project_id: ProjectId(uuid::Uuid::nil()). If list_memory_docs() filters by the real project_id (lines 1276-1284), synthesized orchestrator docs will be invisible to project-scoped queries after restart.

This was reported twice by @serrrfirat (2026-04-06 and 2026-04-09) and remains unaddressed.

Suggested fix: Verify that the orchestrator loading path uses list_shared_memory_docs() or equivalent that does not filter by project_id. If it does filter, pass the actual project_id through a sideband (e.g., store it in a .meta.json alongside the .py file).

Fixed in c50d70a — overrode HybridStore::list_shared_memory_docs to surface physically-global docs (orchestrator, failures, prompt overlay) regardless of stored project_id. Further strengthened in 37a67f3: save_memory_doc normalizes project_id to nil for global docs at write time so they surface immediately from any project query, not just after restart.

serrrfirat · 2026-04-10T10:41:08Z

+        // can pause for approval rather than hard-denying the call.
+        let target = params.get("target").and_then(|v| v.as_str()).unwrap_or("");
+        if is_protected_orchestrator_path(target) {
+            let self_modify_enabled = std::env::var("ORCHESTRATOR_SELF_MODIFY")


High Severity — Environment variable check for security gate is fragile

requires_approval() reads ORCHESTRATOR_SELF_MODIFY from the environment on every call. Environment variables are global mutable state and can be modified by any thread. If the process runs untrusted code in-process (which Monty does), a race condition or future Monty escape could flip this env var.

More importantly, the env var check is duplicated in three places (requires_approval, execute in memory.rs, save_memory_doc in store_adapter.rs) with identical but copy-pasted logic. Any future divergence creates a bypass.

Suggested fix: Extract the env var check into a single shared function (e.g., fn is_orchestrator_self_modify_enabled() -> bool). Consider caching the value at startup rather than reading it per-call.

Fixed in c50d70a — centralized in ironclaw_engine::runtime::self_modify_enabled(): a process-wide OnceLock<bool> seeded once on first read. Tool, store, engine loop, and self-improvement mission all share the same snapshot. SelfModifyTestGuard provides serialized dev-build override (compiled out of release builds).

serrrfirat · 2026-04-10T10:41:15Z

+    }
+    let code_owned = code.to_string();
+    match std::panic::catch_unwind(std::panic::AssertUnwindSafe(|| {
+        MontyRun::new(code_owned, "validate.py", vec![])


Medium Severity — Syntax-only validation may provide false security confidence

validate_python_syntax() checks syntax only — it does not check semantic validity. Syntactically valid but semantically dangerous patterns (e.g., exec(compile(...)), __import__('os')) pass validation.

The comment says "Runtime sandbox (Monty) handles all security enforcement — no blocklist here" which is correct, but for a security PR, this should be documented more prominently.

Suggested fix: Add a prominent comment explicitly stating the threat model: syntax validation prevents consuming failure-budget slots, NOT preventing malicious code. The sandbox is the security boundary.

Addressed in 37a67f3 — expanded the rustdoc on validate_python_syntax with a Threat model section: syntax validation prevents auto-rollback DoS, NOT dangerous code execution. All security enforcement happens in the Monty sandbox at runtime.

serrrfirat · 2026-04-10T10:41:20Z

+    if doc.title.starts_with("orchestrator:")
+        && doc.title != ORCHESTRATOR_FAILURES_TITLE
+        && let Err(reason) = ironclaw_engine::executor::validate_python_syntax(&doc.content)
+    {


Medium Severity — validate_orchestrator_content() skips "prompt:" titled docs

is_protected_orchestrator_doc() checks both "orchestrator:" and "prompt:" titles, but validate_orchestrator_content() at line 677 only validates "orchestrator:" docs. A malicious prompt overlay (title "prompt:codeact_preamble") is stamped with a content hash but never syntax-validated.

If the engine ever supports Python prompt overlays, this gap would become a vulnerability.

Suggested fix: Either validate prompt overlays too (if they can contain code), or add a comment explicitly documenting that prompt overlays are assumed to be non-executable content.

Addressed in 37a67f3 — added a detailed comment explaining that prompt:* docs are markdown text, not Python code. Validating them as Python would reject every prompt overlay. If the engine ever supports Python prompt overlays, the validator must be updated.

…rity-hardening # Conflicts: # crates/ironclaw_engine/src/executor/scripting.rs

…ys-approval gate, global doc visibility Critical: MemoryWriteTool::execute() now validates Python syntax for protected .py paths before writing to workspace (was bypassing the Store-level validator entirely). High: ApprovalRequirement::Always now produces GatePaused(Approval) with allow_always=false instead of hard LeaseDenied; memory_write returns Always (not UnlessAutoApproved) for protected paths so session auto-approve cannot silently skip the gate. High: Global docs (orchestrator, failures, prompt overlay) have project_id normalized to nil on save so they surface immediately from any project query, not just after restart. Medium: Traversal paths no longer trigger spurious approval gates — requires_approval returns Never for normalization failures so execute() rejects them immediately as InvalidParameters. Medium: Dead code removed from is_orchestrator_code_path (equality checks unreachable after .py suffix requirement). Medium: Document prompt overlay validation skip and syntax validation threat model; expanded validate_python_syntax test coverage (size cap, empty input, unicode, error format). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…rity-hardening

…d 0.8.5 Wasmtime was upgraded and no longer triggers RUSTSEC-2025-0046, RUSTSEC-2025-0118, RUSTSEC-2026-0020, RUSTSEC-2026-0021. The stale ignores caused cargo-deny to fail with "advisory was not encountered". Added RUSTSEC-2026-0097 (rand 0.8.5 unsound aliased mutable ref in ThreadRng during reseed from custom logger) — transitive dep via monty/wasmtime; upgrade to rand 0.9+ tracked separately. [skip-regression-check] Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Copilot

Pull request overview

Copilot reviewed 14 out of 14 changed files in this pull request and generated 2 comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

ilblackdragon · 2026-04-12T13:06:06Z

+        if let Some(canonical) = normalize_workspace_path(target) {
+            let protected = canonical.starts_with(".system/engine/orchestrator")
+                || canonical.starts_with("engine/orchestrator");
+            if protected {


Fixed in eff8a22 — added path-boundary checks (equality + trailing-slash starts_with) so orchestrator_backup/ and similar won't trigger the gate.

ilblackdragon · 2026-04-12T13:06:08Z

+    /// PR #1958 reviewer-flagged regression: the original code returned
+    /// `ApprovalRequirement::Always` for protected orchestrator targets,
+    /// which `EffectBridgeAdapter::execute_action` maps to `LeaseDenied`
+    /// (a hard refusal). The fix changed it to `UnlessAutoApproved`, which
+    /// must produce a resumable `GatePaused(Approval)`. This test asserts
+    /// the full path: `requires_approval` → adapter → gate, with the
+    /// real tool wired into a real registry.


Fixed in eff8a22 — updated the docstring to reflect the current Always → GatePaused(Approval { allow_always: false }) behavior.

…rity-hardening # Conflicts: # deny.toml

…tring Address two Copilot review comments: - requires_approval used raw starts_with on normalized path, matching unrelated paths like orchestrator_backup/. Added path-boundary checks. - Updated stale docstring on the Always-approval gate test to reflect current behavior (Always→GatePaused, not UnlessAutoApproved). [skip-regression-check] Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Copilot

Pull request overview

Copilot reviewed 14 out of 14 changed files in this pull request and generated 1 comment.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

ilblackdragon · 2026-04-12T13:37:33Z

+            let final_content = if is_patch_mode {
+                let old_str = params
+                    .get("old_string")
+                    .and_then(|v| v.as_str())
+                    .unwrap_or("");
+                let new_str = params
+                    .get("new_string")
+                    .and_then(|v| v.as_str())
+                    .unwrap_or("");
+                let replace_all_flag = params
+                    .get("replace_all")
+                    .and_then(|v| v.as_bool())
+                    .unwrap_or(false);
+                let existing = workspace
+                    .read(&resolved_path)
+                    .await
+                    .map(|d| d.content)
+                    .unwrap_or_default();
+                if replace_all_flag {
+                    existing.replace(old_str, new_str)
+                } else {
+                    existing.replacen(old_str, new_str, 1)
+                }


Fixed in 7a3bf5b — moved the syntax validation block after patch-mode parameter validation (empty old_string / missing new_string are rejected first). Also deduplicated the matching logic: requires_approval now delegates to is_protected_orchestrator_path instead of duplicating the prefix checks.

Addressed comments

…on after param checks - requires_approval now delegates to is_protected_orchestrator_path instead of duplicating the matching logic (reviewer concern about drift between the two checks) - Syntax validation for protected .py paths moved after patch-mode parameter validation (empty old_string, missing new_string) so an empty-string replace can't create a huge intermediate string before being rejected [skip-regression-check] Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

serrrfirat

Found one issue in the security hardening path:

Severity	Category	File:Line	Finding	Suggested Fix
High	Security / approval bypass	`crates/ironclaw_engine/src/runtime/mission.rs:2348`	The self-improvement structured JSON path still bypasses the new protected-write approval gate. When `ORCHESTRATOR_SELF_MODIFY=true`, `process_self_improvement_output` accepts LLM-authored `prompt_additions`, updates `prompt:codeact_preamble`, and calls `store.save_memory_doc` directly. That store path checks the self-modify flag and validates protected orchestrator code, but it cannot pause for human approval, so it never goes through `MemoryWriteTool::requires_approval` / `EffectBridgeAdapter`'s `Always` gate. A self-improvement mission can therefore mutate a protected prompt overlay without the approval flow this PR is trying to enforce.	Route this path through the same approval machinery, either by requiring the mission to use `memory_write` for prompt overlays or by adding a resumable pending-approval gate before the direct save. Add a caller-level test proving `prompt_additions` are not persisted while approval is pending, even with self-modify enabled.

I did not run tests; this was a review-only pass over the PR diff and the relevant call paths.

serrrfirat · 2026-04-13T07:41:00Z

+                            context.current_call_id.as_deref(),
+                            parameters,
+                            ironclaw_engine::ResumeKind::Approval {
+                                allow_always: false,


High Severity — allow_always: false not enforced server-side

The PR sets allow_always: false in ResumeKind::Approval for orchestrator writes so the UI hides the "always approve" button. However, the router (src/bridge/router.rs ~line 1516) blindly passes the user-supplied always flag to auto_approve_tool() without checking pending.resume_kind.allow_always.

A malicious API call to the approval endpoint with always: true would permanently auto-approve memory_write for the session, silently bypassing the per-call approval gate on all subsequent orchestrator writes.

Suggested fix: In the router where GateResolution::Approved { always } is constructed, clamp: let always = always && matches!(pending.resume_kind, ResumeKind::Approval { allow_always: true }). Add a caller-level integration test.

Fixed in 20168104 (prior commit). The clamp_always_to_resume_kind helper in router.rs now clamps the caller-supplied always flag against pending.resume_kind.allow_always server-side. A crafted always: true on an Approval { allow_always: false } gate is clamped to false. Four unit tests cover all three ResumeKind variants. The staging merge (f259ff9f) preserved this helper over staging's inline match (functionally equivalent, but ours has dedicated tests).

serrrfirat · 2026-04-13T07:41:05Z

+    }
+    let code_owned = code.to_string();
+    match std::panic::catch_unwind(std::panic::AssertUnwindSafe(|| {
+        MontyRun::new(code_owned, "validate.py", vec![])


Medium Severity — validate_python_syntax creates full VM, not just parser

validate_python_syntax creates a full MontyRun for parsing. MontyRun::new() initializes the entire VM (allocator, runtime state, etc.) and may execute partial code before hitting a syntax error. This is heavier than a parse-only check. The 256KB size cap mitigates but doesn't eliminate resource risk.

Suggested fix: If Monty exposes a parse-only API, use it instead. If not, document that MontyRun::new() only parses (does not execute) so reviewers can verify. Consider a tighter size limit or a timeout.

Addressed in 20168104. Expanded the validate_python_syntax rustdoc to document that MontyRun::new() is a parse-and-prepare step — it runs the Ruff-based parser to check syntax, then sets up runtime scaffolding (allocator handles, namespace stubs) without populating the heap or executing module-level statements. The 256 KB MAX_ORCHESTRATOR_SOURCE_BYTES cap bounds parser allocation, not execution time. If Monty later exposes a dedicated parse-only entry point, we can switch; for now the resource exposure from the current approach is bounded by the size cap.

serrrfirat · 2026-04-13T07:41:13Z

-                        .and_then(|v| v.as_str())
-                        .is_some_and(|s| s == "compiled_in");
-                if !is_system_internal {
+            let trusted = ironclaw_engine::runtime::is_trusted_internal_write_active()


Medium Severity — orchestrator:failures title exemption bypasses gate from any code path

The line let trusted = is_trusted_internal_write_active() || doc.title == ORCHESTRATOR_FAILURES_TITLE; means any code path that calls save_memory_doc with title orchestrator:failures will bypass the self-modify gate — including potentially LLM-reachable paths through the self-improvement mission or engine code that calls the store directly.

Impact is limited to corrupting the failure count (not executable code), but it could trigger or prevent auto-rollback.

Suggested fix: Move the exemption inside with_trusted_internal_writes scope at the call sites (record_orchestrator_failure, reset_orchestrator_failures) instead of hardcoding it in the gate predicate.

Fixed in 20168104. Removed the doc.title == ORCHESTRATOR_FAILURES_TITLE shortcut from the store-adapter self-modify gate entirely. The two legitimate callers (record_orchestrator_failure, reset_orchestrator_failures) now wrap their save_memory_doc calls in with_trusted_internal_writes. Additionally, untrusted writes to the failures title are now rejected regardless of self-modify state, since no LLM-reachable code path should ever persist the system-internal tracker.

serrrfirat · 2026-04-13T07:41:18Z

+///
+/// Mirrors `normalize_workspace_path` in `tools::builtin::memory` — kept
+/// local here so the store adapter has no dependency on the tool layer.
+fn normalize_path(path: &str) -> Option<String> {


Medium Severity — Duplicate path normalization on security boundary

Two independent copies of the same path normalization logic exist: normalize_workspace_path in src/tools/builtin/memory.rs (line 64) and normalize_path here in store_adapter.rs. If one is updated and the other is not, a bypass path reopens. Both are on the same security boundary.

Suggested fix: Extract into a shared utility module (e.g., crate::util::normalize_workspace_path), or at minimum add a cross-reference test that asserts both produce identical results for a canonical set of inputs.

Addressed in 20168104. Added a parity test (normalize_path_and_normalize_workspace_path_agree_on_canonical_inputs) that asserts both normalizers produce identical results on a canonical input set including traversal attempts, double slashes, dot segments, and boundary cases. Shared extraction isn't clean across the bridge/tool boundary without introducing a shared crate dependency, so the parity test is the lighter guard against drift on this security boundary.

serrrfirat · 2026-04-13T07:41:23Z

+        // execute() rejects them immediately as InvalidParameters rather
+        // than triggering a spurious approval gate before the rejection.
+        let target = params.get("target").and_then(|v| v.as_str()).unwrap_or("");
+        if !self_modify_enabled() {


Medium Severity — Defense-in-depth gap in requires_approval

requires_approval returns Never when self-modify is disabled, relying solely on execute() to reject. This means the effect bridge will not pause the action — it proceeds directly to execute(). If execute() is ever refactored and the self-modify check is accidentally removed, orchestrator writes slip through without approval.

Suggested fix: Consider returning ApprovalRequirement::Always when is_protected_orchestrator_path(target) regardless of self-modify state, or document why the single execute() gate is sufficient with a cross-reference to the exact rejection line.

Addressed in 20168104. Tightened the commentary in requires_approval with an explicit cross-reference to the execute() rejection site (~line 444) and a load-bearing invariant warning: "Any refactor that moves or weakens that execute() rejection must also flip this branch to Always so the approval dialog becomes the backstop." This makes the dependency relationship between the two gates explicit in the source, so a future refactor that weakens execute() will be forced to update this branch.

5 issues from serrrfirat review (2026-04-13): 1. High — clamp `always` in `resolve_gate` against `pending.resume_kind` so a caller-supplied `always: true` can no longer install a session- wide auto-approval on an `Approval { allow_always: false }` gate (orchestrator self-modify writes). Extracted to `clamp_always_to_ resume_kind` with unit tests covering all three ResumeKind variants. 2. Medium — expand `validate_python_syntax` rustdoc to document that `MontyRun::new()` parses and prepares only (no heap/namespaces, no module-level execution) and explain the 256 KB size cap as a bound on parser allocation, not an execution-time safeguard. 3. Medium — remove the `doc.title == ORCHESTRATOR_FAILURES_TITLE` shortcut from the store-adapter self-modify gate. The two legitimate callers (`record_orchestrator_failure`, `reset_orchestrator_failures`) now wrap their `save_memory_doc` in `with_trusted_internal_writes`. Additionally reject untrusted writes to the failures title regardless of self-modify state, since no LLM-reachable code path should ever persist the system-internal tracker. 4. Medium — add a parity test asserting `normalize_path` (store adapter) and `normalize_workspace_path` (memory tool) agree on a canonical input set. Shared extraction isn't clean across the bridge/tool boundary; the test is the lighter guard against drift on this security boundary. 5. Medium — tighten `MemoryWriteTool::requires_approval` commentary with an explicit cross-reference to the `execute()` rejection site (~line 444) and a load-bearing invariant warning so a future refactor that weakens `execute()` will also be forced to flip this branch to `Always`. Also collapse a clippy::collapsible_if that appeared in the traversal gate check. [skip-regression-check] Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…rity-hardening Conflict resolutions: - deny.toml: drop duplicate RUSTSEC-2026-0097 (already present with detailed comment from PR #2370) - router.rs: keep our clamp_always_to_resume_kind helper with tests (functionally equivalent to staging's inline match, but tested) - store_adapter.rs: take staging's yaml_quoted_escape and debug logging for nil-project/legacy-user frontmatter; keep both list_shared_memory_docs (our security override) and list_memory_docs_by_owner (new trait method) Also bump telegram channel registry version (0.2.8→0.2.9) to satisfy pre-commit version check after staging's source changes.

…rity-hardening

Copilot

Pull request overview

Copilot reviewed 17 out of 17 changed files in this pull request and generated 2 comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

+    fn requires_approval(&self, params: &serde_json::Value) -> ApprovalRequirement {
+        // When orchestrator self-modification is enabled, writing to protected
+        // paths (orchestrator code, prompt overlays) requires explicit human
+        // approval on every call. Uses `Always` (not `UnlessAutoApproved`) so
+        // session auto-approve cannot silently skip the gate — the effect
+        // bridge maps `Always` to `GatePaused(Approval { allow_always: false })`.
+        //
+        // When self-modify is disabled we return `Never` deliberately: the
+        // approval gate would be cosmetic because `execute()` below rejects
+        // the write with `NotAuthorized` *before* any persistence or patch
+        // computation (see the `is_protected_orchestrator_path(target) &&
+        // !self_modify_enabled()` branch later in this file, ~line 444).
+        // Returning `Always` here would pop an approval dialog only to hard-
+        // deny immediately afterward — wasted UX, not extra security.
+        // **Invariant**: that `execute()` rejection is the load-bearing gate
+        // in the disabled case. Any refactor that moves or weakens it must
+        // also flip this branch to `Always` so the approval dialog becomes
+        // the backstop.
+        //
+        // Delegates to is_protected_orchestrator_path for consistency, but
+        // excludes traversal attempts (`..`) — those return Never here so
+        // execute() rejects them immediately as InvalidParameters rather
+        // than triggering a spurious approval gate before the rejection.
+        let target = params.get("target").and_then(|v| v.as_str()).unwrap_or("");
+        if !self_modify_enabled() {
+            return ApprovalRequirement::Never;
+        }
+        // Traversal attempts: normalization fails → execute() rejects.
+        // Don't gate-pause for something that will be rejected anyway.
+        if !target.starts_with("orchestrator:")
+            && !target.starts_with("prompt:")
+            && normalize_workspace_path(target).is_none()
+        {
+            return ApprovalRequirement::Never;
+        }
+        if is_protected_orchestrator_path(target) {
+            return ApprovalRequirement::Always;
+        }


+    let synthetic_lease = CapabilityLease {
+        id: LeaseId::new(),
+        thread_id: ThreadId::new(),
+        capability_name: "memory".into(),
+        granted_actions: GrantedActions::All,
+        granted_at: chrono::Utc::now(),
+        expires_at: None,
+        max_uses: None,
+        uses_remaining: None,
+        revoked: false,
+        revoked_reason: None,
+    };
+
+    let context = ThreadExecutionContext {
+        thread_id: ThreadId::new(),
+        thread_type: ThreadType::Mission,
+        project_id,
+        user_id: doc.user_id.clone(),
+        step_id: StepId::new(),
+        current_call_id: None,
+        source_channel: None,
+        user_timezone: None,
+    };


ilblackdragon and others added 4 commits April 2, 2026 23:49

style: cargo fmt

fe46157

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

style: collapse nested if per clippy suggestion

42d1a22

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

style: cargo fmt

018abf1

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Copilot AI review requested due to automatic review settings April 3, 2026 07:21

github-actions bot added scope: tool/builtin Built-in tools size: L 200-499 changed lines risk: medium Business logic, config, or moderate-risk modules contributor: core 20+ merged PRs labels Apr 3, 2026

Copilot started reviewing on behalf of ilblackdragon April 3, 2026 07:22 View session

Copilot AI reviewed Apr 3, 2026

View reviewed changes

gemini-code-assist bot reviewed Apr 3, 2026

View reviewed changes

serrrfirat reviewed Apr 3, 2026

View reviewed changes

ilblackdragon requested a review from serrrfirat April 6, 2026 16:42

serrrfirat reviewed Apr 7, 2026

View reviewed changes

serrrfirat reviewed Apr 10, 2026

View reviewed changes

ilblackdragon and others added 4 commits April 12, 2026 12:26

Merge remote-tracking branch 'origin/staging' into fix/engine-v2-secu…

1a91cc2

…rity-hardening # Conflicts: # crates/ironclaw_engine/src/executor/scripting.rs

Merge remote-tracking branch 'origin/staging' into fix/engine-v2-secu…

6e154e0

…rity-hardening

Copilot AI review requested due to automatic review settings April 12, 2026 06:19

Copilot started reviewing on behalf of ilblackdragon April 12, 2026 06:20 View session

Copilot AI reviewed Apr 12, 2026

View reviewed changes

ilblackdragon and others added 2 commits April 12, 2026 22:01

Merge remote-tracking branch 'origin/staging' into fix/engine-v2-secu…

a84bdc4

…rity-hardening # Conflicts: # deny.toml

Copilot AI review requested due to automatic review settings April 12, 2026 13:05

Copilot started reviewing on behalf of ilblackdragon April 12, 2026 13:06 View session

Copilot AI reviewed Apr 12, 2026

View reviewed changes

ilblackdragon requested a review from serrrfirat April 12, 2026 13:31

serrrfirat reviewed Apr 13, 2026

View reviewed changes

ilblackdragon and others added 3 commits April 14, 2026 16:01

Merge remote-tracking branch 'origin/staging' into fix/engine-v2-secu…

fa7af91

…rity-hardening

Copilot AI review requested due to automatic review settings April 17, 2026 15:16

Copilot started reviewing on behalf of ilblackdragon April 17, 2026 15:16 View session

Copilot AI reviewed Apr 17, 2026

View reviewed changes

ilblackdragon merged commit e65ba2e into staging Apr 17, 2026
19 checks passed

ilblackdragon deleted the fix/engine-v2-security-hardening branch April 17, 2026 15:31

github-actions bot mentioned this pull request Apr 17, 2026

chore: promote staging to staging-promote/79ad2e38-24572736232 (2026-04-17 16:19 UTC) #2587

Merged

Target	`ORCHESTRATOR_SELF_MODIFY`	Expected result
`"orchestrator:main"`	`"true"`	`UnlessAutoApproved`
`"engine/orchestrator/v3.py"`	unset / `"false"`	`Never` (defers to `execute()` block)
`"daily_log"`	any	`Never`

		@@ -624,11 +627,67 @@ fn is_orchestrator_code_path(path: &str) -> bool {
		path.starts_with(ORCHESTRATOR_PREFIX) && path.ends_with(".py")
		}

Conversation

ilblackdragon commented Apr 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

C1/C2 — Orchestrator approval gates (when self-modify is enabled)

Round 2 — Path bypass, forgeable metadata, rehydration, and env-var fragility

C4 — Sandbox security tests

M2 — Remove ownership-bypassing thread operations

Round 2 test coverage (caller-level, per .claude/rules/testing.md)

Other

Test plan

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Apr 3, 2026

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 3, 2026

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 3, 2026

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 3, 2026

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 3, 2026

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 3, 2026

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

serrrfirat left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ilblackdragon commented Apr 3, 2026 •

edited

Loading

Round 2 test coverage (caller-level, per `.claude/rules/testing.md`)