Skip to content

fix(engine): security hardening for v2 orchestrator and Monty sandbox#1958

Merged
ilblackdragon merged 17 commits intostagingfrom
fix/engine-v2-security-hardening
Apr 17, 2026
Merged

fix(engine): security hardening for v2 orchestrator and Monty sandbox#1958
ilblackdragon merged 17 commits intostagingfrom
fix/engine-v2-security-hardening

Conversation

@ilblackdragon
Copy link
Copy Markdown
Member

@ilblackdragon ilblackdragon commented Apr 3, 2026

Summary

  • C1/C2: Approval gate + syntax validation for orchestrator self-modification when ORCHESTRATOR_SELF_MODIFY=true
  • C4: 6 sandbox security negative tests (rlm_query depth, FINAL injection, tool name injection, context mutation, deep recursion, syntax validation)
  • M2: Remove dead stop_thread_system/inject_message_system methods that bypassed ownership checks
  • Round 2: Close path-normalization bypass, forgeable-metadata escalation, and orchestrator rehydration holes surfaced in @serrrfirat's review
  • Document child lease budget snapshot semantics, add EngineError::InvalidInput variant

Addresses deferred security items from #1557 review. Related: #1956 (incident logging).

C1/C2 — Orchestrator approval gates (when self-modify is enabled)

Layer Amemory_write tool returns ApprovalRequirement::UnlessAutoApproved for protected orchestrator paths. The v2 effect bridge maps this to a resumable GatePaused(Approval) so human approval can pause + resume the write. (Earlier rounds used Always, which the bridge mapped to LeaseDenied — a permanent refusal that made the gate unusable; fixed.)

Layer B — Store adapter validates Python syntax via Monty parser before persisting orchestrator patches. Broken code is rejected upfront instead of consuming failure-budget slots (3 failures trigger auto-rollback). Content hash (SHA-256) stamped on all protected docs for write-time audit (not runtime integrity — the workspace file is the trust boundary).

Round 2 — Path bypass, forgeable metadata, rehydration, and env-var fragility

  • Path normalization bypass (critical): is_protected_orchestrator_path and is_orchestrator_code_path did raw starts_with, so engine/./orchestrator/v3.py, engine//orchestrator/v3.py, and engine/knowledge/../orchestrator/v3.py slipped past the gate. Both guards now normalize first (strip ./empty segments, reject .. traversal) and match against both .system/engine/orchestrator/ and the legacy engine/orchestrator/ prefix.
  • Forgeable is_system_internal (high): the gate previously exempted any doc whose metadata.source == "compiled_in". An LLM tool call could craft that field and bypass denial + validation. Replaced with a tokio::task_local!-backed trusted-write scope (ironclaw_engine::runtime::with_trusted_internal_writes) that MissionManager::seed_orchestrator_v0 enters; the store gate reads the task-local, which cannot be set from outside a trusted callsite and does not propagate across tokio::spawn.
  • Rehydration invisibility (high): synthesize_orchestrator_doc_from_py returned docs with ProjectId::nil(), but list_memory_docs filters by exact project, so persisted .py orchestrator files disappeared from project-scoped queries after restart. Overrode HybridStore::list_shared_memory_docs to surface globally-stored docs (orchestrator, failure tracker, prompt overlay) for any project query, matching the physically-global storage layout.
  • Env var read per call (high): ORCHESTRATOR_SELF_MODIFY was read on every gate check. Centralized in runtime::self_modify_enabled() — a process-wide OnceLock<bool> seeded on first read; tool, store, engine loop, and self-improvement mission all share the same snapshot. SelfModifyTestGuard provides a serialized dev-build override for tests (compiled out of release builds).
  • Frontmatter project_id / user_id loss: fixed by persisting both fields in serialize_knowledge_doc and parsing them in deserialize_knowledge_doc. Round-trip test in store_adapter::migration_tests::knowledge_md_doc_round_trips_project_id_and_user_id.

C4 — Sandbox security tests

Test What it proves
sandbox_enforces_rlm_query_depth_limit Depth check at max recursion refuses with clear error
sandbox_rejects_final_injection FINAL() captures payload literally, no code execution
sandbox_rejects_tool_name_injection Dynamic name construction can't bypass lease enforcement
sandbox_context_variable_is_not_mutable Python-side mutations don't propagate to Rust thread state
sandbox_handles_deep_recursion Infinite recursion terminates safely (no crash/hang)
validate_syntax_rejects_broken_code Syntax validation unit test

M2 — Remove ownership-bypassing thread operations

stop_thread_system() and inject_message_system() had zero callers anywhere in the codebase — dead code that was a privilege escalation footgun. Removed entirely; ownership-checking variants are the only API.

Round 2 test coverage (caller-level, per .claude/rules/testing.md)

Unit tests (41 new):

  • path normalization & protected-path guard (9 cases), requires_approval branches (7 cases), store-adapter helper predicates + validate_orchestrator_content + synthesize_orchestrator_doc_from_py (23 cases), trusted-write scope semantics (3 cases), list_shared_memory_docs override (2 cases)

Integration tests (11 new, libsql feature):

  • tools::dispatch::integration_tests — drives ToolDispatcher::dispatch() against the real MemoryWriteTool for every bypass vector (logical alias, physical path, dot segment, double slash, traversal, unprotected baseline)
  • bridge::store_adapter::migration_tests::orchestrator_py_round_trips_through_restart — full write → restart → load → cross-project query cycle
  • bridge::store_adapter::migration_tests::knowledge_md_doc_round_trips_project_id_and_user_id — frontmatter project/user survive restart
  • bridge::store_adapter::migration_tests::invalid_python_orchestrator_is_rejected_at_write_time — validator gate fires before persistence
  • bridge::effect_adapter::tests::memory_write_orchestrator_target_paused_for_approval_when_self_modify_enabled — asserts UnlessAutoApproved produces GatePaused(Approval), not LeaseDenied (the round-1 miss)
  • bridge::effect_adapter::tests::memory_write_orchestrator_target_refused_when_self_modify_disabled — asserts no gate pauses when self-modify is off

Other

  • Document child lease budget snapshot semantics in derive_child_leases (intentional: independent budgets avoid cross-thread contention)
  • Add EngineError::InvalidInput variant for validation failures
  • Re-export validate_python_syntax from executor module

Test plan

  • cargo fmt — clean
  • cargo clippy -p ironclaw --lib --tests --features libsql — 0 warnings
  • cargo clippy -p ironclaw_engine --all-targets — 0 warnings
  • cargo test -p ironclaw_engine — 358/358 pass
  • cargo test -p ironclaw --lib --features libsql — 4735/4735 pass
  • cargo test --test engine_v2_gate_integration --test engine_v2_skill_codeact — 27/27 pass

🤖 Generated with Claude Code

ilblackdragon and others added 4 commits April 2, 2026 23:49
Address deferred security items C1/C2/C4/M2 from PR #1557 review:

C1/C2 — Orchestrator self-modification approval gates:
- memory_write tool now returns ApprovalRequirement::Always for protected
  orchestrator paths when ORCHESTRATOR_SELF_MODIFY=true, forcing human
  approval before any orchestrator or prompt overlay patch is written
- Store adapter validates Python syntax via Monty parser before persisting
  orchestrator patches, preventing broken code from consuming failure budget
- Content hash (SHA-256) stamped on all protected docs for audit trail

C4 — Sandbox security test coverage (6 new tests):
- sandbox_enforces_rlm_query_depth_limit: depth check at max recursion
- sandbox_rejects_final_injection: FINAL() captures payload literally
- sandbox_rejects_tool_name_injection: dynamic names can't bypass leases
- sandbox_context_variable_is_not_mutable: Python mutations don't affect Rust
- sandbox_handles_deep_recursion: infinite recursion terminates safely
- validate_syntax_rejects_broken_code: syntax validation unit test

M2 — Remove ownership-bypassing thread operations:
- Delete stop_thread_system() and inject_message_system() — dead code with
  no callers, was a privilege escalation footgun. Ownership-checking variants
  (stop_thread/inject_message) are the only API now.

Also: document child lease budget snapshot semantics (intentional design),
add EngineError::InvalidInput variant, re-export validate_python_syntax.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings April 3, 2026 07:21
@github-actions github-actions bot added scope: tool/builtin Built-in tools size: L 200-499 changed lines risk: medium Business logic, config, or moderate-risk modules contributor: core 20+ merged PRs labels Apr 3, 2026
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR hardens the engine v2 self-modifying orchestrator and Monty sandbox by adding explicit approval gating for orchestrator/prompt writes, validating orchestrator patch syntax before persistence, expanding negative security tests, and removing ownership-bypassing thread operations.

Changes:

  • Add approval requirement logic for memory_write when targeting protected orchestrator/prompt docs under ORCHESTRATOR_SELF_MODIFY=true.
  • Validate orchestrator patch Python syntax before persisting; stamp protected docs with a SHA-256 content hash for audit.
  • Expand Monty sandbox negative tests and remove bypass-style *_system thread operations; document child-lease budget snapshot semantics and add EngineError::InvalidInput.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
src/tools/builtin/memory.rs Adds approval gating for writes to protected orchestrator/prompt targets.
src/bridge/store_adapter.rs Adds orchestrator syntax validation, protected-doc hashing, and self-modify enforcement in the store adapter.
crates/ironclaw_engine/src/types/error.rs Introduces EngineError::InvalidInput for validation failures.
crates/ironclaw_engine/src/runtime/manager.rs Removes ownership-bypassing thread stop/inject APIs.
crates/ironclaw_engine/src/executor/scripting.rs Adds validate_python_syntax() and multiple sandbox security negative tests.
crates/ironclaw_engine/src/executor/mod.rs Re-exports validate_python_syntax.
crates/ironclaw_engine/src/capability/lease.rs Documents child lease budget snapshot semantics.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/bridge/store_adapter.rs Outdated
Comment on lines +1180 to +1184
// Self-modification is enabled — validate content before persisting.
// Skip validation for system-internal docs (failure tracker, compiled-in seed).
let is_system_internal = doc.title == ORCHESTRATOR_FAILURES_TITLE
|| doc
.metadata
Copy link

Copilot AI Apr 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this self-modify gate, the allow=false branch checks doc.title == "orchestrator:failures" while the allow=true branch uses ORCHESTRATOR_FAILURES_TITLE. Using the constant consistently in both branches avoids drift and makes the policy easier to audit.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 2bdd5c0 — replaced string literal with ORCHESTRATOR_FAILURES_TITLE constant.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in 2bdd5c0 (round 1).

Comment thread src/bridge/store_adapter.rs Outdated
Comment on lines +1199 to +1202
stamped
.metadata
.as_object_mut()
.map(|m| m.insert("content_hash".into(), serde_json::Value::String(hash)));
Copy link

Copilot AI Apr 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

content_hash stamping is best-effort because it only runs when stamped.metadata is already a JSON object. If legacy docs can have metadata: null (or non-object), the hash silently won’t be recorded. Consider normalizing metadata to {} when stamping protected docs so the audit trail is reliably present.

Suggested change
stamped
.metadata
.as_object_mut()
.map(|m| m.insert("content_hash".into(), serde_json::Value::String(hash)));
if !stamped.metadata.is_object() {
stamped.metadata = serde_json::Value::Object(serde_json::Map::new());
}
stamped
.metadata
.as_object_mut()
.expect("metadata normalized to object before stamping content_hash")
.insert("content_hash".into(), serde_json::Value::String(hash));

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 2bdd5c0 — metadata is now normalized to {} before stamping the content hash.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in 2bdd5c0 (round 1).

Comment thread src/bridge/store_adapter.rs Outdated
Comment on lines +1195 to +1203
// Stamp a content hash for audit trail on all protected docs.
if is_protected_orchestrator_doc(doc) {
use sha2::{Digest, Sha256};
let hash = format!("{:x}", Sha256::digest(doc.content.as_bytes()));
stamped
.metadata
.as_object_mut()
.map(|m| m.insert("content_hash".into(), serde_json::Value::String(hash)));
}
Copy link

Copilot AI Apr 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The stamped hash is computed over the full doc.content, but at runtime prompt overlays are truncated (see MAX_PROMPT_OVERLAY_CHARS in the engine prompt builder). For prompt overlays, this can make the audit hash differ from the effective content used in execution. Either enforce the same size cap at write-time for prompt docs or hash the truncated form that will actually be used.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The hash is an audit trail recording what was written to the store, not a runtime integrity check. Hashing the full content is the correct behavior — it lets you verify what was persisted matches what was submitted. The runtime truncation is a display concern, not a storage concern.

Comment on lines +63 to +67
pub fn validate_python_syntax(code: &str) -> Result<(), String> {
let code_owned = code.to_string();
match std::panic::catch_unwind(std::panic::AssertUnwindSafe(|| {
MontyRun::new(code_owned, "validate.py", vec![])
})) {
Copy link

Copilot AI Apr 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

validate_python_syntax() parses unbounded input without any size cap, and it allocates code.to_string() before parsing. Because this runs on the store write-path for orchestrator patches, a very large patch could cause avoidable CPU/memory pressure (even though no code is executed). Consider enforcing a maximum length for orchestrator source (or rejecting/streaming above a threshold) before invoking Monty’s parser.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 2bdd5c0 — added a 256KB size cap (MAX_ORCHESTRATOR_SOURCE_BYTES) before invoking the Monty parser. The compiled-in default is ~2KB so this is generous but prevents pathological inputs.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in 2bdd5c0 (round 1).

Comment on lines +2311 to +2313
assert!(
!answer.starts_with("ESCAPED"),
"dynamic name construction should not bypass leases, got: {answer}",
Copy link

Copilot AI Apr 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sandbox_rejects_tool_name_injection is very permissive: it only asserts the final answer doesn’t start with "ESCAPED". As written it can pass for reasons unrelated to lease enforcement (e.g., the name lookup returns None, or globals().get() returns a non-callable placeholder). To make this a stronger regression test, assert the expected outcome explicitly (e.g., final_answer == "not_found" or an error string that indicates denial by lease/policy).

Suggested change
assert!(
!answer.starts_with("ESCAPED"),
"dynamic name construction should not bypass leases, got: {answer}",
assert_eq!(
answer,
"not_found",
"dynamic name construction should resolve to no callable tool and must not bypass leases, got: {answer}",

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 2bdd5c0 — added a second assertion verifying the expected outcome (not_found or blocked:*), not just absence of ESCAPED.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in 2bdd5c0 (round 1).

Comment on lines +282 to +297
fn requires_approval(&self, params: &serde_json::Value) -> ApprovalRequirement {
// When orchestrator self-modification is enabled, writing to protected
// paths (orchestrator code, prompt overlays) always requires explicit
// human approval — even if the session has auto-approve enabled.
let target = params.get("target").and_then(|v| v.as_str()).unwrap_or("");
if is_protected_orchestrator_path(target) {
let self_modify_enabled = std::env::var("ORCHESTRATOR_SELF_MODIFY")
.map(|v| v == "true" || v == "1")
.unwrap_or(false);
if self_modify_enabled {
return ApprovalRequirement::Always;
}
// When disabled, execute() returns NotAuthorized before any write.
}
ApprovalRequirement::Never
}
Copy link

Copilot AI Apr 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

memory_write::requires_approval() returns ApprovalRequirement::Always for protected orchestrator targets when ORCHESTRATOR_SELF_MODIFY is enabled. In the engine v2 bridge (EffectBridgeAdapter::execute_action), ApprovalRequirement::Always is currently treated as a hard denial (it returns EngineError::LeaseDenied rather than pausing for interactive approval), so this may inadvertently make orchestrator patch writes impossible even with human approval. If protected writes are meant to be approvable (just not auto-approvable), the v2 adapter/gate logic needs to pause on Always instead of denying, or this tool should use a requirement variant that the adapter can pause on.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 2bdd5c0 — changed to ApprovalRequirement::UnlessAutoApproved. The v2 effect bridge pauses on this variant (producing a resumable GatePaused approval gate) rather than hard-denying. This makes the protected self-modify path usable with human approval.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces security enhancements and validation logic for orchestrator self-modification. Key changes include the implementation of a Python syntax validator to prevent broken code from being persisted, the addition of content hashing for audit trails on protected documents, and a mandatory human approval requirement for modifications to orchestrator paths. Furthermore, the PR adds comprehensive sandbox security tests covering recursion limits, injection prevention, and state mutability, while removing several system-level thread management methods. I have no feedback to provide.

Copy link
Copy Markdown
Collaborator

@serrrfirat serrrfirat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Posting four review findings from the local audit.

Comment thread src/tools/builtin/memory.rs Outdated
.map(|v| v == "true" || v == "1")
.unwrap_or(false);
if self_modify_enabled {
return ApprovalRequirement::Always;
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

High Severity

This ApprovalRequirement::Always is not approvable in engine v2 today. EffectBridgeAdapter::execute_action() still maps Always to EngineError::LeaseDenied, and only UnlessAutoApproved becomes GatePaused(Approval). That means with ORCHESTRATOR_SELF_MODIFY=true, a call like memory_write(target="prompt:codeact_preamble", ...) fails outright instead of pausing for human approval, so the protected self-modify path is unusable.

Suggested fix: either return UnlessAutoApproved here and rely on the approval gate to suppress the allow always option for these targets, or update the v2 effect bridge so Always produces a resumable approval gate instead of a hard denial.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 2bdd5c0 — changed to UnlessAutoApproved. The v2 effect bridge maps this to a resumable GatePaused(Approval) gate, so the user sees an approval prompt and can approve/deny. Always was incorrectly hard-denying via LeaseDenied.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed across c50d70a (round 2) and 37a67f3 (round 3). See commit messages for details.

// When orchestrator self-modification is enabled, writing to protected
// paths (orchestrator code, prompt overlays) always requires explicit
// human approval — even if the session has auto-approve enabled.
let target = params.get("target").and_then(|v| v.as_str()).unwrap_or("");
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Medium Severity

This check only protects the logical aliases (prompt:* / orchestrator:*). The persisted prompt overlay actually lives at engine/orchestrator/codeact-preamble-overlay.md, and memory_write still accepts arbitrary workspace paths. With self-modify enabled, writing that real path avoids the new approval gate while still mutating the runtime overlay.

Suggested fix: canonicalize target through the same engine-doc path mapping used by the store adapter, and treat the persisted engine paths as protected too.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 2bdd5c0is_protected_orchestrator_path() now also matches physical workspace paths (engine/orchestrator/*). Writing to the persisted file path directly will now trigger the same approval gate.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed across c50d70a (round 2) and 37a67f3 (round 3). See commit messages for details.


self.docs.write().await.insert(doc.id, doc.clone());
self.persist_doc(doc).await;
let mut stamped = doc.clone();
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

High Severity

This persists frontmatter-based engine docs, but the restart path currently reconstructs them with project_id = nil and user_id = "legacy" in deserialize_knowledge_doc(). list_memory_docs() later filters by the real project/user pair, so after load_state_from_workspace() these persisted markdown docs disappear from list_shared_memory_docs(project_id) / list_memory_docs(...) and stop participating in prompt/orchestrator loading.

Suggested fix: persist project/user in frontmatter, or restore them during load_knowledge_docs() before inserting docs into self.docs. A restart round-trip test through HybridStore::load_state_from_workspace() would catch this.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 2bdd5c0serialize_knowledge_doc() now persists project_id and user_id in frontmatter, and deserialize_knowledge_doc() restores them on load. Legacy docs without these fields fall back to nil/"legacy" for backward compatibility.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed across c50d70a (round 2) and 37a67f3 (round 3). See commit messages for details.

}

self.docs.write().await.insert(stamped.id, stamped.clone());
self.persist_doc(&stamped).await;
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

High Severity

This write path persists runtime orchestrator versions as raw engine/orchestrator/vN.py, but startup only reconstructs docs from frontmatter markdown or JSON. Those .py files get skipped by load_knowledge_docs(), so saved orchestrator patches do not survive a process restart and load_orchestrator() falls back to the compiled-in default.

Suggested fix: either persist orchestrator versions in a loadable MemoryDoc format, or synthesize MemoryDocs from the vN.py files during startup. This also needs a restart persistence test.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 2bdd5c0 — added synthesize_orchestrator_doc_from_py() which reconstructs MemoryDocs from raw .py files during load_knowledge_docs(). Extracts the version number from the filename (v{N}.py) and creates a doc with the correct title, tag, and version metadata so load_orchestrator_from_docs() finds it on restart.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed across c50d70a (round 2) and 37a67f3 (round 3). See commit messages for details.

Review feedback from Copilot and serrrfirat:

1. Use ORCHESTRATOR_FAILURES_TITLE constant consistently in both branches
   of the self-modify gate (was string literal in deny branch).

2. Normalize metadata to {} before stamping content_hash, so the audit
   trail is reliably present even for docs with null/non-object metadata.

3. Add 256KB size cap to validate_python_syntax() to prevent pathological
   inputs from causing CPU/memory pressure on the store write path.

4. Tighten sandbox_rejects_tool_name_injection test assertion to verify
   the expected "not_found" outcome, not just absence of "ESCAPED".

5. Change ApprovalRequirement::Always → UnlessAutoApproved for protected
   orchestrator writes. The v2 effect bridge maps Always to hard denial
   (LeaseDenied), making the self-modify path unusable. UnlessAutoApproved
   triggers the gate/pause flow so human approval is possible.

6. Extend is_protected_orchestrator_path() to cover physical workspace
   paths (engine/orchestrator/*) in addition to logical aliases. Prevents
   bypassing the approval gate by writing to the persisted file path.

7. Persist project_id and user_id in frontmatter (serialize_knowledge_doc)
   and restore them on load (deserialize_knowledge_doc). Previously all
   reloaded docs got project_id=nil and user_id="legacy", making them
   invisible to project-scoped queries after restart.

8. Synthesize MemoryDocs from raw .py orchestrator files on startup
   (synthesize_orchestrator_doc_from_py). Orchestrator versions are
   persisted as engine/orchestrator/v{N}.py but load_knowledge_docs
   could not parse them — they silently disappeared on restart.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@ilblackdragon ilblackdragon requested a review from serrrfirat April 6, 2026 16:42
Copy link
Copy Markdown
Collaborator

I think the new .py orchestrator recovery path still misses the restarted-runtime case because the synthesized doc is created with project_id = nil, but the later lookup path still filters by the real project id.

Concrete path:

  • synthesize_orchestrator_doc_from_py() creates a MemoryDoc with project_id: ProjectId(uuid::Uuid::nil())
  • Store::list_shared_memory_docs(project_id) delegates to list_memory_docs(project_id, shared_owner_id())
  • HybridStore::list_memory_docs() then filters with doc.project_id == project_id && doc.user_id == user_id
  • LoopEngine::resolve_orchestrator() loads shared orchestrators via list_shared_memory_docs(self.thread.project_id)

So after restart, the synthesized runtime orchestrator docs still will not be returned for the real project, and the engine appears to fall back to the built-in orchestrator instead of the persisted .py version.

If the intent is for shared orchestrator docs to survive restart, I think this needs either:

  1. preserving/deriving the real project id when synthesizing the doc, or
  2. a corresponding store-query change so shared orchestrator docs are not filtered out by the nil project_id.

A restart regression test around the .py synthesis + reload path would also help lock this in.

Comment thread src/tools/builtin/memory.rs Outdated
|| path.starts_with("prompt:")
// Physical workspace paths (match the paths in store_adapter::doc_workspace_path)
|| path.starts_with("engine/orchestrator/")
|| path == "engine/orchestrator"
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Medium Severity — Path normalization bypass in orchestrator path guard

is_protected_orchestrator_path uses raw starts_with("engine/orchestrator/") on the user-supplied target parameter. Paths with dot-components or double slashes bypass this check while resolving to the same filesystem location:

  • engine/./orchestrator/v3.py → does NOT start with engine/orchestrator/bypasses check
  • engine//orchestrator/v3.py → same bypass
  • OS normalizes both to engine/orchestrator/v3.py on disk

On filesystem-backed workspaces, a prompt-injected LLM could write orchestrator code even with ORCHESTRATOR_SELF_MODIFY=false, because execute() uses the same unguarded check (line ~342).

Suggested fix — normalize before checking:

fn normalize_path(path: &str) -> String {
    path.split('/')
        .filter(|s| !s.is_empty() && *s != ".")
        .collect::<Vec<_>>()
        .join("/")
}

fn is_protected_orchestrator_path(path: &str) -> bool {
    let norm = normalize_path(path);
    norm.starts_with("orchestrator:")
        || norm.starts_with("prompt:")
        || norm.starts_with("engine/orchestrator/")
        || norm == "engine/orchestrator"
}

Note: .. components should also be collapsed if the workspace doesn't reject them.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed across c50d70a (round 2) and 37a67f3 (round 3). See commit messages for details.

Comment thread src/tools/builtin/memory.rs Outdated
.map(|v| v == "true" || v == "1")
.unwrap_or(false);
if self_modify_enabled {
return ApprovalRequirement::UnlessAutoApproved;
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Medium Severity — PR description says Always but code returns UnlessAutoApproved

The PR summary states:

Layer Amemory_write tool returns ApprovalRequirement::Always for protected orchestrator paths

But the code returns UnlessAutoApproved (line 297). These have different semantics:

  • Always → approval required every time, even when session auto-approve is enabled
  • UnlessAutoApproved → can be silently bypassed when a session has auto-approve on

The code comment at line 289 explains the intent well ("so the v2 engine gate system can pause for approval rather than hard-denying the call"), but the PR description is misleading for reviewers.

Suggested fix: Either update the PR description to say UnlessAutoApproved, or change to Always if the intent is to never skip approval for orchestrator self-modification.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR description updated. Code now uses ApprovalRequirement::Always (as of 37a67f3) with the effect bridge mapping Always to GatePaused(Approval { allow_always: false }) instead of LeaseDenied. So the protected self-modify path always pauses for human approval and cannot be session-auto-approved.

/// (3 failures trigger auto-rollback). Runtime sandbox (Monty) handles all
/// security enforcement — no blocklist here.
fn validate_orchestrator_content(doc: &MemoryDoc) -> Result<(), EngineError> {
if doc.title.starts_with("orchestrator:")
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Medium Severity — Missing unit tests for new store adapter functions

Four new functions in this file have no dedicated unit tests:

  1. validate_orchestrator_content() (line 677) — should test: valid Python passes, broken Python rejected, failure-tracker title (orchestrator:failures) is exempt, oversized content handling
  2. synthesize_orchestrator_doc_from_py() (line 637) — should test: valid v3.py path, non-orchestrator prefix returns None, non-numeric version (vfoo.py) returns None, edge case v0.py
  3. Content hash stamping (~line 800) — should verify hash is present and correct on saved protected docs, and absent on non-protected docs
  4. Frontmatter roundtrip with new fieldsproject_id and user_id were added to serialization (line 791) and deserialization (line 897). A roundtrip test should verify these survive serialize → deserialize

Per project rules (.claude/rules/review-discipline.md): "Regression test with every fix."

})
}

fn requires_approval(&self, params: &serde_json::Value) -> ApprovalRequirement {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Medium Severity — No unit test for requires_approval approval gate

This new method has three meaningful code paths, none of which are tested:

Target ORCHESTRATOR_SELF_MODIFY Expected result
"orchestrator:main" "true" UnlessAutoApproved
"engine/orchestrator/v3.py" unset / "false" Never (defers to execute() block)
"daily_log" any Never

Suggested test:

#[test]
fn requires_approval_gates_orchestrator_paths() {
    let tool = MemoryWriteTool::new(/* ... */);

    // Non-orchestrator path — never needs approval
    let params = serde_json::json!({"target": "daily_log"});
    assert!(matches!(tool.requires_approval(&params), ApprovalRequirement::Never));

    // Orchestrator path, self-modify disabled — no gate (execute blocks)
    std::env::remove_var("ORCHESTRATOR_SELF_MODIFY");
    let params = serde_json::json!({"target": "orchestrator:main"});
    assert!(matches!(tool.requires_approval(&params), ApprovalRequirement::Never));

    // Orchestrator path, self-modify enabled — requires approval
    std::env::set_var("ORCHESTRATOR_SELF_MODIFY", "true");
    assert!(matches!(tool.requires_approval(&params), ApprovalRequirement::UnlessAutoApproved));
    std::env::remove_var("ORCHESTRATOR_SELF_MODIFY");
}

(Note: env var mutation in tests needs serialization via #[serial] or temp_env crate.)

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed across c50d70a (round 2) and 37a67f3 (round 3). See commit messages for details.

@ilblackdragon
Copy link
Copy Markdown
Member Author

Code Review

Verdict: Mergeable after spec/code mismatch + rehydration hole are fixed

399/-44 across 7 files. Adds (a) approval gate + Python syntax pre-validation for orchestrator self-modification (ORCHESTRATOR_SELF_MODIFY=true), (b) SHA-256 content-hash stamping on protected docs, (c) .py reconstitution on restart, (d) frontmatter project_id/user_id round-tripping, (e) removal of dead stop_thread_system/inject_message_system ownership-bypass methods, and (f) 6 new Monty sandbox negative tests.

Scope clarification

The PR title is misleading. Does not touch src/orchestrator/ (sandbox HTTP API), src/sandbox/ (Docker), network proxy allowlist, container isolation, credential injection, or per-job tokens. "Orchestrator" here = engine v2 self-modifying agent loop; "Monty sandbox" = the in-process Python interpreter, not Docker. Suggest renaming to "engine v2 self-modify gates + Monty negative tests" before merge.

Do hardening measures close claimed gaps? Mostly yes, with caveats.

🚨 Verdict mismatchMemoryWriteTool::requires_approval returns UnlessAutoApproved (src/tools/builtin/memory.rs:296), but the PR description claims ApprovalRequirement::Always. A session that auto-approves memory_write (or auto_approve_all) silently bypasses the gate. For self-modification of the agent loop, Always is the right choice — fix the code. Also, requires_approval reads ORCHESTRATOR_SELF_MODIFY via std::env::var on every call (memory.rs:300); cache at startup into Config to remove TOCTOU on env mutation.

🚨 synthesize_orchestrator_doc_from_py is a rehydration trust hole (src/bridge/store_adapter.rs:635-668). Mints a fresh DocId and user_id = shared_owner_id() from any engine/orchestrator/v{N}.py file found on disk. Anyone with workspace write (or a path-traversal in another tool) can drop a v999.py and get it loaded as a privileged orchestrator doc on next restart. The new content-hash stamping happens only on set_doc, not on this restart path, so persisted-from-disk patches have no audit hash. This is the most serious issue in the PR. Should: (a) verify file hash against a stored manifest of approved versions, (b) re-run validate_python_syntax on load, (c) refuse versions whose hash isn't recorded.

Syntax validation (crates/ironclaw_engine/src/executor/scripting.rs:65) is parse-only — while True: pass or imports of privileged builtins still pass. The PR comment honestly notes "no blocklist here" — fine as a scope limit, just be aware.

Sandbox tests: sandbox_handles_deep_recursion accepts either Ok or Err (scripting.rs:2390-2401), so a silent Ok with empty answer passes. Tighten to require non-empty blocked: answer or Err.

Dead code removal (M2): confirmed zero callers; clean win. Removing the _system variants closes a privilege-escalation footgun. Add a regression test asserting stop_thread/inject_message do check ownership, otherwise the bypass can re-enter via copy-paste.

Child lease budget doc (crates/ironclaw_engine/src/capability/lease.rs:187-195): documents an intentional weakening — combined parent+child usage can exceed max_uses. This is a real budget-bypass; expose derive_child_leases_strict for callers (orchestrator self-modify path) where global budgets matter.

Sandbox escape gaps (Monty)

  • sandbox_context_variable_is_not_mutable proves Python-side mutations don't propagate, but only because Monty is given a copy. No tests for: __builtins__ mutation, eval/exec of attacker-supplied strings, getattr chain to reach sys.modules, FINAL shadowing, PyO3-style attribute walks.
  • Tool-name injection test only covers globals().get(name); doesn't probe __import__, attribute access via captured tool closures.
  • The depth-limit test mutates thread.config.depth directly — verify the sandboxed code path actually re-reads thread depth on each rlm_query rather than a snapshot at executor entry, otherwise the test passes for the wrong reason.

Conventions

  • New EngineError::InvalidInput variant and pub use validate_python_syntax follow the crate's style.
  • No .unwrap()/.expect() in production. The .map(|m| m.insert(...)) at src/bridge/store_adapter.rs:1259 discards the result silently — let _ would be clearer.
  • No new info!/warn!. Compliant.

Test coverage for security boundaries

Six new tests, all in crates/ironclaw_engine/src/executor/scripting.rs:2189-2403. Gaps:

  1. No test for MemoryWriteTool::requires_approval × ORCHESTRATOR_SELF_MODIFY=true/false × protected path matrix.
  2. No test for validate_orchestrator_content end-to-end via HybridStore::set_doc.
  3. No test for synthesize_orchestrator_doc_from_py path traversal / version parsing (v-1.py, v9999999999999999999.py, v.py).
  4. No test that content-hash stamping survives reload.
  5. No regression test for the M2 dead-code removal (ownership checks on stop_thread/inject_message).

Top suggestions

  1. Resolve the Always vs UnlessAutoApproved mismatch at src/tools/builtin/memory.rs:296. Fix the code, not the description.
  2. Fix synthesize_orchestrator_doc_from_py rehydration trust hole at src/bridge/store_adapter.rs:635-668.
  3. Cache ORCHESTRATOR_SELF_MODIFY at startup instead of reading env on every approval check.
  4. Tighten sandbox_handles_deep_recursion to require non-empty blocked: answer or Err.
  5. Add Monty escape tests for __builtins__ mutation, eval/exec, getattr chain, FINAL shadowing.
  6. Strict child-lease variant for orchestrator self-modify path.
  7. Rename PR title before merge.

🤖 Generated with Claude Code

@serrrfirat
Copy link
Copy Markdown
Collaborator

High Severity

The new orchestrator rehydration path in src/bridge/store_adapter.rs synthesizes persisted .py docs with project_id = nil.

Later loads filter shared memory docs by the real project ID, so after a restart the persisted orchestrator versions may no longer be returned to load_orchestrator_from_docs(). The practical effect is that the engine can fall back to the compiled-in orchestrator instead of the persisted runtime version.

Suggested fix: preserve the original project ID when persisting/rehydrating these .py docs, or restore it from durable metadata. A restart regression test would make this failure mode obvious.


Medium Severity

The new content_hash audit metadata is not durable for orchestrator .py docs. The hash is stamped onto the in-memory MemoryDoc, but the persistence path writes raw Python and the load path reconstructs the doc without restoring that metadata.

Suggested fix: persist metadata alongside the .py artifact or serialize orchestrator docs in a format that round-trips the audit fields.

Comment thread src/tools/builtin/memory.rs Outdated
path.starts_with("orchestrator:")
|| path.starts_with("prompt:")
// Physical workspace paths (match the paths in store_adapter::doc_workspace_path)
|| path.starts_with("engine/orchestrator/")
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Critical Severity — Path traversal bypass in is_protected_orchestrator_path()

is_protected_orchestrator_path() performs raw starts_with("engine/orchestrator/") on user-supplied target without any path normalization. An LLM-authored payload can bypass this guard using:

  • engine/./orchestrator/v3.py
  • engine//orchestrator/v3.py
  • engine/knowledge/../orchestrator/v3.py

None of these match starts_with("engine/orchestrator/") but all resolve to the same workspace location. This bypasses both the approval gate (requires_approval) AND the hard-deny when ORCHESTRATOR_SELF_MODIFY=false.

Suggested fix: Add a normalize_path() helper that collapses ., .., double slashes, and trailing slashes before the starts_with check. Apply it in both is_protected_orchestrator_path() and in the execute() guard.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in c50d70a — added normalize_workspace_path() that strips ./empty segments and rejects ../ traversal. Both is_protected_orchestrator_path and is_orchestrator_code_path normalize before matching. Caller-level tests drive ToolDispatcher::dispatch() with all bypass vectors (dot segment, double slash, traversal). See also 37a67f3 which further refines traversal handling in requires_approval.

@@ -624,11 +627,67 @@ fn is_orchestrator_code_path(path: &str) -> bool {
path.starts_with(ORCHESTRATOR_PREFIX) && path.ends_with(".py")
}
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Critical Severity — Path normalization bypass in is_orchestrator_code_path()

Same path normalization vulnerability as is_protected_orchestrator_path()is_orchestrator_code_path() only checks starts_with(ORCHESTRATOR_PREFIX) AND ends_with(".py") on raw input.

A crafted path like engine/knowledge/../orchestrator/v99.py bypasses this check and skips syntax validation while still being written to the orchestrator directory by the workspace layer.

Combined with the is_protected_orchestrator_path() bypass, this creates a complete security bypass: write invalid/malicious Python to the orchestrator directory without approval and without syntax validation.

Suggested fix: Apply the same normalize_path() to inputs before checking is_orchestrator_code_path().

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in c50d70ais_orchestrator_code_path now normalizes via normalize_path() before matching. Matches both .system/engine/orchestrator/ and legacy engine/orchestrator/ prefixes. Dead equality checks removed in 37a67f3.

Comment thread src/bridge/store_adapter.rs Outdated
// Skip validation for system-internal docs (failure tracker, compiled-in seed).
let is_system_internal = doc.title == ORCHESTRATOR_FAILURES_TITLE
|| doc
.metadata
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

High Severity — Forgeable is_system_internal metadata allows privilege escalation

The is_system_internal check relies on doc.metadata.get("source").is_some_and(|s| s == "compiled_in") to whitelist system writes. Since MemoryDoc.metadata is a serde_json::Value that the caller constructs, an LLM-authored tool call could craft a doc with metadata: {"source": "compiled_in"} and bypass both the self-modify gate (when disabled) and syntax validation (when enabled). The metadata field is not authenticated or integrity-checked.

Suggested fix: Do not trust metadata.source for authorization decisions. Use a separate internal-only flag or check the call stack/context (e.g., a CallerContext enum that distinguishes system vs. LLM-originated writes).

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in c50d70a — replaced metadata-based exemption with a tokio::task_local! trusted-write scope (ironclaw_engine::runtime::with_trusted_internal_writes). The task-local cannot be set from outside a trusted callsite and does not propagate across tokio::spawn. Test in internal_write.rs asserts the no-propagation property.

// Stamp a content hash for audit trail on all protected docs.
if is_protected_orchestrator_doc(doc) {
use sha2::{Digest, Sha256};
let hash = format!("{:x}", Sha256::digest(doc.content.as_bytes()));
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

High Severity — Content hash provides false integrity assurance

The content hash is stamped at write time but never verified on read. An attacker who can modify the workspace file directly (bypassing save_memory_doc) can change the .py content without invalidating the hash.

Suggested fix: Either verify the hash on load in load_knowledge_docs() / synthesize_orchestrator_doc_from_py(), or document explicitly that the hash is write-time-only audit trail (not integrity enforcement).

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in c50d70a — documented explicitly in save_memory_doc that the hash is write-time audit only, not a runtime integrity check. The workspace file is the trust boundary; anyone with workspace access can edit files directly.

doc_type: DocType::Note,
title: ORCHESTRATOR_MAIN_TITLE.to_string(),
content: content.to_string(),
source_thread_id: None,
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

High Severity — Rehydration uses project_id = nil, docs invisible after restart

synthesize_orchestrator_doc_from_py() uses project_id: ProjectId(uuid::Uuid::nil()). If list_memory_docs() filters by the real project_id (lines 1276-1284), synthesized orchestrator docs will be invisible to project-scoped queries after restart.

This was reported twice by @serrrfirat (2026-04-06 and 2026-04-09) and remains unaddressed.

Suggested fix: Verify that the orchestrator loading path uses list_shared_memory_docs() or equivalent that does not filter by project_id. If it does filter, pass the actual project_id through a sideband (e.g., store it in a .meta.json alongside the .py file).

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in c50d70a — overrode HybridStore::list_shared_memory_docs to surface physically-global docs (orchestrator, failures, prompt overlay) regardless of stored project_id. Further strengthened in 37a67f3: save_memory_doc normalizes project_id to nil for global docs at write time so they surface immediately from any project query, not just after restart.

Comment thread src/tools/builtin/memory.rs Outdated
// can pause for approval rather than hard-denying the call.
let target = params.get("target").and_then(|v| v.as_str()).unwrap_or("");
if is_protected_orchestrator_path(target) {
let self_modify_enabled = std::env::var("ORCHESTRATOR_SELF_MODIFY")
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

High Severity — Environment variable check for security gate is fragile

requires_approval() reads ORCHESTRATOR_SELF_MODIFY from the environment on every call. Environment variables are global mutable state and can be modified by any thread. If the process runs untrusted code in-process (which Monty does), a race condition or future Monty escape could flip this env var.

More importantly, the env var check is duplicated in three places (requires_approval, execute in memory.rs, save_memory_doc in store_adapter.rs) with identical but copy-pasted logic. Any future divergence creates a bypass.

Suggested fix: Extract the env var check into a single shared function (e.g., fn is_orchestrator_self_modify_enabled() -> bool). Consider caching the value at startup rather than reading it per-call.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in c50d70a — centralized in ironclaw_engine::runtime::self_modify_enabled(): a process-wide OnceLock<bool> seeded once on first read. Tool, store, engine loop, and self-improvement mission all share the same snapshot. SelfModifyTestGuard provides serialized dev-build override (compiled out of release builds).

}
let code_owned = code.to_string();
match std::panic::catch_unwind(std::panic::AssertUnwindSafe(|| {
MontyRun::new(code_owned, "validate.py", vec![])
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Medium Severity — Syntax-only validation may provide false security confidence

validate_python_syntax() checks syntax only — it does not check semantic validity. Syntactically valid but semantically dangerous patterns (e.g., exec(compile(...)), __import__('os')) pass validation.

The comment says "Runtime sandbox (Monty) handles all security enforcement — no blocklist here" which is correct, but for a security PR, this should be documented more prominently.

Suggested fix: Add a prominent comment explicitly stating the threat model: syntax validation prevents consuming failure-budget slots, NOT preventing malicious code. The sandbox is the security boundary.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in 37a67f3 — expanded the rustdoc on validate_python_syntax with a Threat model section: syntax validation prevents auto-rollback DoS, NOT dangerous code execution. All security enforcement happens in the Monty sandbox at runtime.

if doc.title.starts_with("orchestrator:")
&& doc.title != ORCHESTRATOR_FAILURES_TITLE
&& let Err(reason) = ironclaw_engine::executor::validate_python_syntax(&doc.content)
{
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Medium Severity — validate_orchestrator_content() skips "prompt:" titled docs

is_protected_orchestrator_doc() checks both "orchestrator:" and "prompt:" titles, but validate_orchestrator_content() at line 677 only validates "orchestrator:" docs. A malicious prompt overlay (title "prompt:codeact_preamble") is stamped with a content hash but never syntax-validated.

If the engine ever supports Python prompt overlays, this gap would become a vulnerability.

Suggested fix: Either validate prompt overlays too (if they can contain code), or add a comment explicitly documenting that prompt overlays are assumed to be non-executable content.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in 37a67f3 — added a detailed comment explaining that prompt:* docs are markdown text, not Python code. Validating them as Python would reject every prompt overlay. If the engine ever supports Python prompt overlays, the validator must be updated.

ilblackdragon and others added 4 commits April 12, 2026 12:26
…rity-hardening

# Conflicts:
#	crates/ironclaw_engine/src/executor/scripting.rs
…ys-approval gate, global doc visibility

Critical: MemoryWriteTool::execute() now validates Python syntax for
protected .py paths before writing to workspace (was bypassing the
Store-level validator entirely).

High: ApprovalRequirement::Always now produces GatePaused(Approval)
with allow_always=false instead of hard LeaseDenied; memory_write
returns Always (not UnlessAutoApproved) for protected paths so
session auto-approve cannot silently skip the gate.

High: Global docs (orchestrator, failures, prompt overlay) have
project_id normalized to nil on save so they surface immediately
from any project query, not just after restart.

Medium: Traversal paths no longer trigger spurious approval gates —
requires_approval returns Never for normalization failures so
execute() rejects them immediately as InvalidParameters.

Medium: Dead code removed from is_orchestrator_code_path (equality
checks unreachable after .py suffix requirement).

Medium: Document prompt overlay validation skip and syntax validation
threat model; expanded validate_python_syntax test coverage (size
cap, empty input, unicode, error format).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…d 0.8.5

Wasmtime was upgraded and no longer triggers RUSTSEC-2025-0046,
RUSTSEC-2025-0118, RUSTSEC-2026-0020, RUSTSEC-2026-0021. The stale
ignores caused cargo-deny to fail with "advisory was not encountered".

Added RUSTSEC-2026-0097 (rand 0.8.5 unsound aliased mutable ref in
ThreadRng during reseed from custom logger) — transitive dep via
monty/wasmtime; upgrade to rand 0.9+ tracked separately.

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings April 12, 2026 06:19
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 14 out of 14 changed files in this pull request and generated 2 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/tools/builtin/memory.rs Outdated
Comment on lines +383 to +386
if let Some(canonical) = normalize_workspace_path(target) {
let protected = canonical.starts_with(".system/engine/orchestrator")
|| canonical.starts_with("engine/orchestrator");
if protected {
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in eff8a22 — added path-boundary checks (equality + trailing-slash starts_with) so orchestrator_backup/ and similar won't trigger the gate.

Comment thread src/bridge/effect_adapter.rs Outdated
Comment on lines +1879 to +1885
/// PR #1958 reviewer-flagged regression: the original code returned
/// `ApprovalRequirement::Always` for protected orchestrator targets,
/// which `EffectBridgeAdapter::execute_action` maps to `LeaseDenied`
/// (a hard refusal). The fix changed it to `UnlessAutoApproved`, which
/// must produce a resumable `GatePaused(Approval)`. This test asserts
/// the full path: `requires_approval` → adapter → gate, with the
/// real tool wired into a real registry.
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in eff8a22 — updated the docstring to reflect the current Always → GatePaused(Approval { allow_always: false }) behavior.

ilblackdragon and others added 2 commits April 12, 2026 22:01
…tring

Address two Copilot review comments:
- requires_approval used raw starts_with on normalized path, matching
  unrelated paths like orchestrator_backup/. Added path-boundary checks.
- Updated stale docstring on the Always-approval gate test to reflect
  current behavior (Always→GatePaused, not UnlessAutoApproved).

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings April 12, 2026 13:05
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 14 out of 14 changed files in this pull request and generated 1 comment.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/tools/builtin/memory.rs Outdated
Comment on lines +509 to +531
let final_content = if is_patch_mode {
let old_str = params
.get("old_string")
.and_then(|v| v.as_str())
.unwrap_or("");
let new_str = params
.get("new_string")
.and_then(|v| v.as_str())
.unwrap_or("");
let replace_all_flag = params
.get("replace_all")
.and_then(|v| v.as_bool())
.unwrap_or(false);
let existing = workspace
.read(&resolved_path)
.await
.map(|d| d.content)
.unwrap_or_default();
if replace_all_flag {
existing.replace(old_str, new_str)
} else {
existing.replacen(old_str, new_str, 1)
}
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 7a3bf5b — moved the syntax validation block after patch-mode parameter validation (empty old_string / missing new_string are rejected first). Also deduplicated the matching logic: requires_approval now delegates to is_protected_orchestrator_path instead of duplicating the prefix checks.

…on after param checks

- requires_approval now delegates to is_protected_orchestrator_path
  instead of duplicating the matching logic (reviewer concern about
  drift between the two checks)
- Syntax validation for protected .py paths moved after patch-mode
  parameter validation (empty old_string, missing new_string) so an
  empty-string replace can't create a huge intermediate string before
  being rejected

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Collaborator

@serrrfirat serrrfirat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Found one issue in the security hardening path:

Severity Category File:Line Finding Suggested Fix
High Security / approval bypass crates/ironclaw_engine/src/runtime/mission.rs:2348 The self-improvement structured JSON path still bypasses the new protected-write approval gate. When ORCHESTRATOR_SELF_MODIFY=true, process_self_improvement_output accepts LLM-authored prompt_additions, updates prompt:codeact_preamble, and calls store.save_memory_doc directly. That store path checks the self-modify flag and validates protected orchestrator code, but it cannot pause for human approval, so it never goes through MemoryWriteTool::requires_approval / EffectBridgeAdapter's Always gate. A self-improvement mission can therefore mutate a protected prompt overlay without the approval flow this PR is trying to enforce. Route this path through the same approval machinery, either by requiring the mission to use memory_write for prompt overlays or by adding a resumable pending-approval gate before the direct save. Add a caller-level test proving prompt_additions are not persisted while approval is pending, even with self-modify enabled.

I did not run tests; this was a review-only pass over the PR diff and the relevant call paths.

context.current_call_id.as_deref(),
parameters,
ironclaw_engine::ResumeKind::Approval {
allow_always: false,
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

High Severity — allow_always: false not enforced server-side

The PR sets allow_always: false in ResumeKind::Approval for orchestrator writes so the UI hides the "always approve" button. However, the router (src/bridge/router.rs ~line 1516) blindly passes the user-supplied always flag to auto_approve_tool() without checking pending.resume_kind.allow_always.

A malicious API call to the approval endpoint with always: true would permanently auto-approve memory_write for the session, silently bypassing the per-call approval gate on all subsequent orchestrator writes.

Suggested fix: In the router where GateResolution::Approved { always } is constructed, clamp: let always = always && matches!(pending.resume_kind, ResumeKind::Approval { allow_always: true }). Add a caller-level integration test.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 20168104 (prior commit). The clamp_always_to_resume_kind helper in router.rs now clamps the caller-supplied always flag against pending.resume_kind.allow_always server-side. A crafted always: true on an Approval { allow_always: false } gate is clamped to false. Four unit tests cover all three ResumeKind variants. The staging merge (f259ff9f) preserved this helper over staging's inline match (functionally equivalent, but ours has dedicated tests).

}
let code_owned = code.to_string();
match std::panic::catch_unwind(std::panic::AssertUnwindSafe(|| {
MontyRun::new(code_owned, "validate.py", vec![])
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Medium Severity — validate_python_syntax creates full VM, not just parser

validate_python_syntax creates a full MontyRun for parsing. MontyRun::new() initializes the entire VM (allocator, runtime state, etc.) and may execute partial code before hitting a syntax error. This is heavier than a parse-only check. The 256KB size cap mitigates but doesn't eliminate resource risk.

Suggested fix: If Monty exposes a parse-only API, use it instead. If not, document that MontyRun::new() only parses (does not execute) so reviewers can verify. Consider a tighter size limit or a timeout.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in 20168104. Expanded the validate_python_syntax rustdoc to document that MontyRun::new() is a parse-and-prepare step — it runs the Ruff-based parser to check syntax, then sets up runtime scaffolding (allocator handles, namespace stubs) without populating the heap or executing module-level statements. The 256 KB MAX_ORCHESTRATOR_SOURCE_BYTES cap bounds parser allocation, not execution time. If Monty later exposes a dedicated parse-only entry point, we can switch; for now the resource exposure from the current approach is bounded by the size cap.

Comment thread src/bridge/store_adapter.rs Outdated
.and_then(|v| v.as_str())
.is_some_and(|s| s == "compiled_in");
if !is_system_internal {
let trusted = ironclaw_engine::runtime::is_trusted_internal_write_active()
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Medium Severity — orchestrator:failures title exemption bypasses gate from any code path

The line let trusted = is_trusted_internal_write_active() || doc.title == ORCHESTRATOR_FAILURES_TITLE; means any code path that calls save_memory_doc with title orchestrator:failures will bypass the self-modify gate — including potentially LLM-reachable paths through the self-improvement mission or engine code that calls the store directly.

Impact is limited to corrupting the failure count (not executable code), but it could trigger or prevent auto-rollback.

Suggested fix: Move the exemption inside with_trusted_internal_writes scope at the call sites (record_orchestrator_failure, reset_orchestrator_failures) instead of hardcoding it in the gate predicate.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 20168104. Removed the doc.title == ORCHESTRATOR_FAILURES_TITLE shortcut from the store-adapter self-modify gate entirely. The two legitimate callers (record_orchestrator_failure, reset_orchestrator_failures) now wrap their save_memory_doc calls in with_trusted_internal_writes. Additionally, untrusted writes to the failures title are now rejected regardless of self-modify state, since no LLM-reachable code path should ever persist the system-internal tracker.

///
/// Mirrors `normalize_workspace_path` in `tools::builtin::memory` — kept
/// local here so the store adapter has no dependency on the tool layer.
fn normalize_path(path: &str) -> Option<String> {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Medium Severity — Duplicate path normalization on security boundary

Two independent copies of the same path normalization logic exist: normalize_workspace_path in src/tools/builtin/memory.rs (line 64) and normalize_path here in store_adapter.rs. If one is updated and the other is not, a bypass path reopens. Both are on the same security boundary.

Suggested fix: Extract into a shared utility module (e.g., crate::util::normalize_workspace_path), or at minimum add a cross-reference test that asserts both produce identical results for a canonical set of inputs.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in 20168104. Added a parity test (normalize_path_and_normalize_workspace_path_agree_on_canonical_inputs) that asserts both normalizers produce identical results on a canonical input set including traversal attempts, double slashes, dot segments, and boundary cases. Shared extraction isn't clean across the bridge/tool boundary without introducing a shared crate dependency, so the parity test is the lighter guard against drift on this security boundary.

// execute() rejects them immediately as InvalidParameters rather
// than triggering a spurious approval gate before the rejection.
let target = params.get("target").and_then(|v| v.as_str()).unwrap_or("");
if !self_modify_enabled() {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Medium Severity — Defense-in-depth gap in requires_approval

requires_approval returns Never when self-modify is disabled, relying solely on execute() to reject. This means the effect bridge will not pause the action — it proceeds directly to execute(). If execute() is ever refactored and the self-modify check is accidentally removed, orchestrator writes slip through without approval.

Suggested fix: Consider returning ApprovalRequirement::Always when is_protected_orchestrator_path(target) regardless of self-modify state, or document why the single execute() gate is sufficient with a cross-reference to the exact rejection line.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in 20168104. Tightened the commentary in requires_approval with an explicit cross-reference to the execute() rejection site (~line 444) and a load-bearing invariant warning: "Any refactor that moves or weakens that execute() rejection must also flip this branch to Always so the approval dialog becomes the backstop." This makes the dependency relationship between the two gates explicit in the source, so a future refactor that weakens execute() will be forced to update this branch.

ilblackdragon and others added 3 commits April 14, 2026 16:01
5 issues from serrrfirat review (2026-04-13):

1. High — clamp `always` in `resolve_gate` against `pending.resume_kind`
   so a caller-supplied `always: true` can no longer install a session-
   wide auto-approval on an `Approval { allow_always: false }` gate
   (orchestrator self-modify writes). Extracted to `clamp_always_to_
   resume_kind` with unit tests covering all three ResumeKind variants.

2. Medium — expand `validate_python_syntax` rustdoc to document that
   `MontyRun::new()` parses and prepares only (no heap/namespaces, no
   module-level execution) and explain the 256 KB size cap as a bound
   on parser allocation, not an execution-time safeguard.

3. Medium — remove the `doc.title == ORCHESTRATOR_FAILURES_TITLE`
   shortcut from the store-adapter self-modify gate. The two legitimate
   callers (`record_orchestrator_failure`, `reset_orchestrator_failures`)
   now wrap their `save_memory_doc` in `with_trusted_internal_writes`.
   Additionally reject untrusted writes to the failures title regardless
   of self-modify state, since no LLM-reachable code path should ever
   persist the system-internal tracker.

4. Medium — add a parity test asserting `normalize_path` (store
   adapter) and `normalize_workspace_path` (memory tool) agree on a
   canonical input set. Shared extraction isn't clean across the
   bridge/tool boundary; the test is the lighter guard against drift
   on this security boundary.

5. Medium — tighten `MemoryWriteTool::requires_approval` commentary
   with an explicit cross-reference to the `execute()` rejection site
   (~line 444) and a load-bearing invariant warning so a future
   refactor that weakens `execute()` will also be forced to flip this
   branch to `Always`.

Also collapse a clippy::collapsible_if that appeared in the traversal
gate check.

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…rity-hardening

Conflict resolutions:
- deny.toml: drop duplicate RUSTSEC-2026-0097 (already present with
  detailed comment from PR #2370)
- router.rs: keep our clamp_always_to_resume_kind helper with tests
  (functionally equivalent to staging's inline match, but tested)
- store_adapter.rs: take staging's yaml_quoted_escape and debug logging
  for nil-project/legacy-user frontmatter; keep both list_shared_memory_docs
  (our security override) and list_memory_docs_by_owner (new trait method)

Also bump telegram channel registry version (0.2.8→0.2.9) to satisfy
pre-commit version check after staging's source changes.
Copilot AI review requested due to automatic review settings April 17, 2026 15:16
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 17 out of 17 changed files in this pull request and generated 2 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +365 to +402
fn requires_approval(&self, params: &serde_json::Value) -> ApprovalRequirement {
// When orchestrator self-modification is enabled, writing to protected
// paths (orchestrator code, prompt overlays) requires explicit human
// approval on every call. Uses `Always` (not `UnlessAutoApproved`) so
// session auto-approve cannot silently skip the gate — the effect
// bridge maps `Always` to `GatePaused(Approval { allow_always: false })`.
//
// When self-modify is disabled we return `Never` deliberately: the
// approval gate would be cosmetic because `execute()` below rejects
// the write with `NotAuthorized` *before* any persistence or patch
// computation (see the `is_protected_orchestrator_path(target) &&
// !self_modify_enabled()` branch later in this file, ~line 444).
// Returning `Always` here would pop an approval dialog only to hard-
// deny immediately afterward — wasted UX, not extra security.
// **Invariant**: that `execute()` rejection is the load-bearing gate
// in the disabled case. Any refactor that moves or weakens it must
// also flip this branch to `Always` so the approval dialog becomes
// the backstop.
//
// Delegates to is_protected_orchestrator_path for consistency, but
// excludes traversal attempts (`..`) — those return Never here so
// execute() rejects them immediately as InvalidParameters rather
// than triggering a spurious approval gate before the rejection.
let target = params.get("target").and_then(|v| v.as_str()).unwrap_or("");
if !self_modify_enabled() {
return ApprovalRequirement::Never;
}
// Traversal attempts: normalization fails → execute() rejects.
// Don't gate-pause for something that will be rejected anyway.
if !target.starts_with("orchestrator:")
&& !target.starts_with("prompt:")
&& normalize_workspace_path(target).is_none()
{
return ApprovalRequirement::Never;
}
if is_protected_orchestrator_path(target) {
return ApprovalRequirement::Always;
}
Comment on lines +2477 to +2499
let synthetic_lease = CapabilityLease {
id: LeaseId::new(),
thread_id: ThreadId::new(),
capability_name: "memory".into(),
granted_actions: GrantedActions::All,
granted_at: chrono::Utc::now(),
expires_at: None,
max_uses: None,
uses_remaining: None,
revoked: false,
revoked_reason: None,
};

let context = ThreadExecutionContext {
thread_id: ThreadId::new(),
thread_type: ThreadType::Mission,
project_id,
user_id: doc.user_id.clone(),
step_id: StepId::new(),
current_call_id: None,
source_channel: None,
user_timezone: None,
};
@ilblackdragon ilblackdragon merged commit e65ba2e into staging Apr 17, 2026
19 checks passed
@ilblackdragon ilblackdragon deleted the fix/engine-v2-security-hardening branch April 17, 2026 15:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

contributor: core 20+ merged PRs risk: medium Business logic, config, or moderate-risk modules scope: tool/builtin Built-in tools size: XL 500+ changed lines

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants