Skip to content

feat(workspace): metadata-driven indexing/hygiene, document versioning, and patch#1723

Merged
ilblackdragon merged 20 commits intostagingfrom
feat/workspace-metadata-versioning-patch
Apr 2, 2026
Merged

feat(workspace): metadata-driven indexing/hygiene, document versioning, and patch#1723
ilblackdragon merged 20 commits intostagingfrom
feat/workspace-metadata-versioning-patch

Conversation

@ilblackdragon
Copy link
Copy Markdown
Member

@ilblackdragon ilblackdragon commented Mar 28, 2026

Summary

Foundation for the extensible frontend system. Makes workspace behavior metadata-driven instead of hardcoded, adds document versioning for rollback, and adds patch (search-and-replace) support.

  • Metadata-driven indexing: skip_indexing flag on documents/folders prevents chunking and embedding (for frontend assets, configs). Resolution chain: document metadata → nearest parent .config → defaults.
  • Metadata-driven hygiene: Replaces hardcoded daily/ and conversations/ cleanup with .config document discovery. Folders opt into hygiene explicitly. .config files and identity documents are never deleted.
  • Document versioning: Every write(), append(), and patch() auto-saves previous content. memory_read gains version and list_versions params. Hash-based dedup prevents duplicate versions.
  • Workspace patch: memory_write gains old_string/new_string/replace_all params for targeted edits without rewriting entire files.
  • Metadata on documents: memory_write gains metadata param to set flags like skip_indexing, hygiene, skip_versioning. Applied BEFORE the write so flags take effect immediately.
  • V15 migration: memory_document_versions table + GIN metadata index for both PostgreSQL and libSQL.

Test plan

  • cargo check — all feature combos (default, libsql, all-features)
  • cargo clippy --all --all-features — zero warnings
  • cargo test --lib — 3793 passed, 0 failed
  • Flaky test fix — hygiene tests avoid global AtomicBool contention
  • Integration tests with PostgreSQL (cargo test --features integration)
  • Manual: fresh workspace seeds daily/.config and conversations/.config

🤖 Generated with Claude Code

…g, and patch support

Foundation for the extensible frontend system. Workspace documents now
support metadata flags (skip_indexing, skip_versioning, hygiene config)
via folder-level .config documents and per-file overrides, replacing
hardcoded hygiene targets and indexing behavior.

Key changes:
- DocumentMetadata type with resolution chain (doc → folder .config → defaults)
- Document versioning: auto-saves previous content on write/append/patch
- Workspace patch: search-and-replace editing via memory_write tool
- Hygiene rewrite: discovers cleanup targets from .config metadata
  instead of hardcoded daily/ and conversations/ directories
- memory_read gains version/list_versions params
- memory_write gains metadata/old_string/new_string/replace_all params
- V14 migration adds memory_document_versions table (both PG + libSQL)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions github-actions bot added scope: agent Agent core (agent loop, router, scheduler) scope: tool/builtin Built-in tools scope: db Database trait / abstraction scope: db/postgres PostgreSQL backend scope: db/libsql libSQL / Turso backend scope: workspace Persistent memory / workspace size: XL 500+ changed lines risk: medium Business logic, config, or moderate-risk modules contributor: core 20+ merged PRs labels Mar 28, 2026
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request implements document versioning and a metadata-driven memory hygiene system. It introduces a new table for document versions, adds metadata support to workspace documents, and enhances the memory tools with patch mode and version history access. The hygiene system is refactored to use .config files for discovering cleanup targets. Feedback focuses on resolving a race condition in the libSQL version saving implementation, refining the memory write tool's parameter requirements for patch mode, and optimizing metadata updates by utilizing returned document objects instead of performing redundant reads.

Comment on lines +912 to +962
async fn save_version(
&self,
document_id: Uuid,
content: &str,
content_hash: &str,
changed_by: Option<&str>,
) -> Result<i32, WorkspaceError> {
let conn = self
.connect()
.await
.map_err(|e| WorkspaceError::SearchFailed {
reason: e.to_string(),
})?;
let id = Uuid::new_v4().to_string();
let doc_id = document_id.to_string();
let now = fmt_ts(&Utc::now());

// Get next version number
let mut rows = conn
.query(
"SELECT COALESCE(MAX(version), 0) + 1 FROM memory_document_versions WHERE document_id = ?1",
params![doc_id.clone()],
)
.await
.map_err(|e| WorkspaceError::SearchFailed {
reason: format!("Failed to get next version number: {e}"),
})?;

let next_version = if let Some(row) = rows.next().await.map_err(|e| WorkspaceError::SearchFailed {
reason: format!("Failed to read version number: {e}"),
})? {
get_i64(&row, 0) as i32
} else {
1
};

conn.execute(
r#"
INSERT INTO memory_document_versions
(id, document_id, version, content, content_hash, created_at, changed_by)
VALUES (?1, ?2, ?3, ?4, ?5, ?6, ?7)
"#,
params![id, doc_id, next_version as i64, content, content_hash, now, changed_by],
)
.await
.map_err(|e| WorkspaceError::SearchFailed {
reason: format!("Failed to save version: {e}"),
})?;

Ok(next_version)
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

There's a potential race condition in save_version. The next version number is fetched in a separate SELECT query before the INSERT. If two requests execute this concurrently for the same document, they might fetch the same next version number, and one of the INSERT statements will fail due to the unique constraint on (document_id, version).

The PostgreSQL implementation avoids this by using a single atomic INSERT ... SELECT statement. To make this more robust for libSQL, consider wrapping the SELECT and INSERT in a transaction that acquires a write lock (e.g., BEGIN IMMEDIATE TRANSACTION in SQLite). This would serialize writes to the same document, preventing the race condition.

References
  1. To prevent race conditions in get-or-create operations in SQLite/libSQL, use BEGIN IMMEDIATE transactions to serialize concurrent writers.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 71eadc6save_version is now wrapped in a transaction with SELECT 1 FROM memory_documents WHERE id = $1 FOR UPDATE to lock the parent row before computing MAX(version).

Comment thread src/tools/builtin/memory.rs Outdated
"default": false
}
},
"required": ["content"]
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The content parameter is marked as required, but it's not used when the tool is in patch mode (i.e., when old_string is provided). This forces the LLM to provide a value for content that is then ignored, which can be confusing.

Consider making content optional in the schema by removing it from the required array. You can then programmatically enforce its presence inside the execute method only when not in patch mode (i.e., in the else block after checking for old_string).

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed — the schema now has "required": [] (empty), and the runtime enforces that either content or old_string+new_string must be provided.

Comment thread src/tools/builtin/memory.rs Outdated
Comment on lines +493 to +503
// Apply metadata if provided (after write/append, works for all targets).
if let Some(meta) = params.get("metadata")
&& meta.is_object()
{
// Read the document to get its ID
if let Ok(doc) = workspace.read(&resolved_path).await
&& let Err(e) = workspace.update_metadata(doc.id, meta).await
{
tracing::warn!(path = %resolved_path, "failed to update metadata: {e}");
}
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This logic for updating metadata after a write/append operation is inefficient because it re-reads the document from the database just to get its ID.

The workspace.write() method already returns the MemoryDocument, so you can get the ID from its return value. You could modify the various append methods to also return the document to avoid this extra database query.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 9200e9b — metadata is now applied BEFORE the write/patch via get_or_create + update_metadata, so skip_indexing/skip_versioning take effect for the same operation.

ilblackdragon and others added 2 commits March 28, 2026 12:12
…ting

- Wrap libSQL save_version in a transaction to prevent race condition
  where concurrent writers could allocate the same version number
- Make content optional in memory_write when in patch mode (old_string
  present) — LLM no longer forced to provide unused content param
- Improve metadata update error handling with explicit match arms
- Run cargo fmt across all files

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…adata-versioning-patch

# Conflicts:
#	src/agent/heartbeat.rs
#	src/db/libsql_migrations.rs
Copy link
Copy Markdown
Member Author

@ilblackdragon ilblackdragon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All three review comments addressed in cb4cf12:

  1. Race condition in libSQL save_version (high) — Wrapped SELECT + INSERT in a transaction (conn.transaction().await + tx.commit()) to serialize concurrent writers and prevent duplicate version numbers.

  2. content required in patch mode (medium) — Made content optional in the schema (required: []). Now enforced programmatically only when not in patch mode (old_string absent).

  3. Re-reading doc for metadata update (medium) — Improved error handling with explicit match arms. The extra read is kept for now as it's a hot read right after the write (same connection/cache), and avoids refactoring all append methods to return MemoryDocument.

…g, descriptions

1. Resolve metadata once per write: write(), append(), and patch() now
   call resolve_metadata() once and pass the result to both
   maybe_save_version() and reindex_document_with_metadata(), cutting
   redundant DB queries from 3-5 per write down to 1 resolution.

2. Optimize version hash check: replaced get_latest_version_number() +
   get_version() (2 queries) with list_versions(id, 1) (1 query) for
   the duplicate-hash check in maybe_save_version().

3. Wire up version_keep_count: hygiene passes now prune old versions
   for documents in cleaned directories, enforcing the configured
   version_keep_count (default: 50). Removes the TODO comment.

4. Fix misleading tool description: patch mode works with any target
   including 'memory', not just custom paths.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Collaborator

@zmanian zmanian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review: feat(workspace): metadata-driven indexing/hygiene, document versioning, and patch

Overall this is a well-designed set of interconnected features. The metadata resolution chain, strong types, and dual-backend support are solid. Two must-fix issues before merge.


Must-fix (2)

1. PostgreSQL save_version race condition

The INSERT in repository.rs uses SELECT COALESCE(MAX(version), 0) + 1 without a transaction or FOR UPDATE. Concurrent writes to the same document can compute the same version number -- one succeeds, the other gets a UNIQUE violation that surfaces as the misleading WorkspaceError::SearchFailed. The libSQL implementation correctly uses a transaction; the PostgreSQL one should too. Options: wrap in a transaction with SELECT ... FOR UPDATE, use a retry loop on unique violation, or use ON CONFLICT.

2. Identity document protection removed from hygiene

The old is_identity_path() safety check that protected MEMORY.md, SOUL.md, IDENTITY.md, etc. from deletion is gone. The new code only has is_config_path() protection. If a .config with hygiene.enabled: true is placed on a directory containing identity documents, they'll be deleted. The old safety net needs to be preserved in cleanup_directory().


Should-fix (6)

  1. resolve_metadata() is O(depth) serial DB queries -- walks up the directory tree with one get_document_by_path per ancestor. Runs on every write(), append(), and patch(). Consider using the existing find_config_documents() (single query) and finding the nearest ancestor in-memory.

  2. memory_write schema now has "required": [] -- both content and old_string+new_string can be absent. Should express the invariant that at least one mode must be provided, either via anyOf/oneOf in the schema or an explicit validation check with a clear error.

  3. Misleading GIN index comment in V15 migration -- says "used by hygiene to find .config documents" but the actual query uses path LIKE '%/.config', not a JSON path query. The index isn't used by current queries.

  4. WorkspaceError::SearchFailed used as catch-all -- metadata update failures, version save failures, and connection failures all map to SearchFailed in both backends. Consider more specific variants or a generic DatabaseError.

  5. No unit tests for patch() method -- needs tests for: replace_all=true, old_string not found, patch on system prompt file with injection attempt, patch producing empty document.

  6. No unit tests for maybe_save_version -- needs tests for: hash-based dedup (identical content skipped), skip_versioning metadata flag, empty content skip.


Nice-to-have (6)

  1. let _ = silently discards versioning failures in write()/append()/patch() -- add a comment documenting the fail-open design choice.
  2. Silent metadata parse fallback in DocumentMetadata::from_value() via unwrap_or_default() -- add tracing::warn! on parse failure to aid debugging.
  3. Duplicate doc comment on reindex_document_with_metadata.
  4. LLM can write metadata: {"hygiene": {"retention_days": 0}} via the tool with no minimum validation -- could cause next hygiene pass to delete everything.
  5. changed_by field on versions is always None -- reduces audit trail value. Consider passing "agent" or the tool name.
  6. Hygiene version pruning is O(n) queries per document in a directory -- consider a bulk prune query.

Positive notes

  • Metadata resolution chain (document -> nearest parent .config -> defaults) is well-designed
  • DocumentMetadata with #[serde(flatten)] for forward compatibility is a good pattern
  • Strong types throughout (DocumentVersion, VersionSummary, PatchResult), no .unwrap() in production code
  • Both PostgreSQL and libSQL backends provided with equivalent migrations
  • Features are genuinely interconnected -- scope is justified as a single PR
  • Good unit test coverage for metadata parsing/merging/round-tripping and hygiene reports

ilblackdragon and others added 3 commits March 28, 2026 23:39
1. changed_by now populated: all write paths pass self.user_id as the
   changed_by field in version records instead of None, so version
   history shows who made each change.

2. Layer write/append versioned: write_to_layer() and append_to_layer()
   now auto-version and use metadata-optimized reindexing, matching
   the standard write()/append() paths.

3. append_memory versioned: MEMORY.md appends now auto-version with
   metadata-driven skip and shared metadata resolution.

4. Remove unused reindex_document wrapper: all callers now use
   reindex_document_with_metadata directly.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…iene

26 new tests covering critical and high-priority gaps:

document.rs (7 unit tests):
- is_config_path edge cases (foo.config, empty string, .config/bar)
- content_sha256 with empty string (known SHA-256 constant)
- content_sha256 with unicode (multi-byte UTF-8)
- DocumentMetadata merge: null overlay, nested hygiene replaced wholesale,
  both empty, non-object base

memory.rs (2 schema tests):
- memory_write schema includes patch/metadata params, content not required
- memory_read schema includes version/list_versions params

hygiene.rs (5 integration tests):
- No .config docs → no cleanup happens
- .config with hygiene disabled → directory skipped
- Multiple dirs with different retention (fast=0, slow=9999)
- Documents newer than retention not deleted
- Version pruning during hygiene (keep_count=2, verify pruned)

workspace/mod.rs (14 integration tests):
- write creates version with correct hash and changed_by
- Identical writes deduplicated (hash check)
- Append versions pre-append content
- Patch: single replacement, replace_all, not-found error, creates version
- Patch with unicode characters
- Patch with empty replacement string
- resolve_metadata: no config (defaults), inherits from folder .config,
  document overrides .config, nearest ancestor wins
- skip_versioning via .config prevents version creation

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ion, perf

Must-fix:
1. PostgreSQL save_version now uses a transaction with SELECT FOR UPDATE
   to prevent concurrent writers from allocating the same version number,
   matching the libSQL implementation.

2. Restore identity document protection in hygiene cleanup_directory().
   MEMORY.md, SOUL.md, IDENTITY.md, etc. are now protected from deletion
   regardless of which directory they appear in, via is_identity_document()
   case-insensitive check. This restores the safety net that was removed
   when migrating from hardcoded to metadata-driven hygiene.

Should-fix:
3. resolve_metadata() now uses find_config_documents (single query) +
   in-memory nearest-ancestor lookup, instead of O(depth) serial DB
   queries walking up the directory tree.

4. memory_write validates that at least one mode is provided (content
   for write/append, or old_string+new_string for patch) with a clear
   error message upfront, instead of relying on downstream empty checks.

5. Fixed misleading GIN index comment in V15 migration.

9. Added "Fail-open: versioning failures must not block writes" comments
   to all `let _ = self.maybe_save_version(...)` call sites.

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Member Author

@ilblackdragon ilblackdragon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zmanian Thanks for the thorough review. All must-fix and most should-fix items addressed in 71eadc6:

Must-fix (2/2 done)

  1. PostgreSQL save_version race condition — Now wrapped in a transaction with SELECT ... FOR UPDATE to lock existing version rows, serializing concurrent inserts. Matches the libSQL transaction approach.

  2. Identity document protection restored — Added is_identity_document() check (case-insensitive) back into cleanup_directory(). MEMORY.md, SOUL.md, IDENTITY.md, etc. are now protected from deletion regardless of directory, even if a misconfigured .config enables hygiene there.

Should-fix (5/6 done)

  1. resolve_metadata() O(depth) → O(1) DB queries — Now uses find_config_documents() (single query) + in-memory find_nearest_config() helper instead of walking up ancestors with serial DB queries.

  2. memory_write mode validation — Validates upfront that at least one mode is provided (content for write/append, or old_string+new_string for patch) with a clear error message.

  3. GIN index comment fixed — Updated to accurately state it's for future containment queries, not current LIKE queries.

7/8. Tests — 26 new tests added in the previous commit covering patch(), maybe_save_version, resolve_metadata chain, hygiene edge cases, and version pruning.

  1. Fail-open documented — All let _ = self.maybe_save_version(...) call sites now have explicit "Fail-open: versioning failures must not block writes" comments.

Deferred

  1. WorkspaceError::SearchFailed catch-all — Acknowledged. Follows existing codebase pattern (all workspace DB errors map to SearchFailed). Adding specific variants (MetadataUpdateFailed, VersionSaveFailed, etc.) would be a good follow-up but is a broader refactor touching both backends.

10-14. Nice-to-haves noted for follow-up. changed_by is now populated with self.user_id (fixed in a prior commit).

@ilblackdragon ilblackdragon requested a review from zmanian March 29, 2026 07:22
@ilblackdragon ilblackdragon marked this pull request as ready for review March 29, 2026 07:24
Copilot AI review requested due to automatic review settings March 29, 2026 07:24
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR lays groundwork for an extensible, metadata-driven workspace system by introducing .config-based metadata inheritance, document version history for rollback/auditing, and a patch (search/replace) write mode. It updates workspace hygiene to discover cleanup targets via metadata rather than hardcoded folder names, and adds DB schema support for version storage across PostgreSQL and libSQL.

Changes:

  • Add metadata inheritance via nearest ancestor .config documents and use it to drive indexing/hygiene/versioning behavior.
  • Introduce document versioning APIs/storage and wire auto-versioning into write/append/patch flows.
  • Add patch mode and new tool parameters (old_string/new_string/replace_all, metadata, version, list_versions) plus hygiene config changes.

Reviewed changes

Copilot reviewed 16 out of 16 changed files in this pull request and generated 12 comments.

Show a summary per file
File Description
src/workspace/document.rs Adds typed metadata overlay, .config helpers, and versioning DTOs/hash helper.
src/workspace/mod.rs Implements metadata resolution, auto-versioning, patch API, skip_indexing behavior, and config seeding.
src/workspace/repository.rs Adds PostgreSQL repo methods for metadata updates, config discovery, and version CRUD/pruning.
src/db/mod.rs Extends WorkspaceStore trait with metadata + versioning methods.
src/db/postgres.rs Wires new WorkspaceStore metadata/versioning methods through Pg backend.
src/db/libsql/workspace.rs Implements libSQL metadata/versioning methods.
src/db/libsql_migrations.rs Adds libSQL incremental migration for document versions table/indexes.
migrations/V15__document_versions.sql Adds PostgreSQL migration for memory_document_versions and related indexes.
src/workspace/hygiene.rs Rewrites hygiene to discover directories via .config metadata and prune versions.
src/config/hygiene.rs Updates env-configured hygiene settings to use version_keep_count.
src/tools/builtin/memory.rs Adds patch mode, metadata support, and version read/list options to tools.
src/error.rs Adds VersionNotFound and PatchFailed workspace errors.
src/agent/heartbeat.rs Updates hygiene reporting fields and adjusts multi-user heartbeat behavior.
tests/workspace_versioning_integration.rs Adds libSQL integration tests for metadata, patch, versioning, and skip_indexing/versioning.
tests/e2e_routine_heartbeat.rs Updates tests for new hygiene config shape.
.env.example Updates documented hygiene env vars to reflect new configuration.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +281 to +290
let has_content = params
.get("content")
.and_then(|v| v.as_str())
.is_some_and(|s| !s.is_empty());
if !is_patch_mode && !has_content {
return Err(ToolError::InvalidParameters(
"Either 'content' (for write/append) or 'old_string'+'new_string' (for patch) is required".to_string(),
));
}
let content = params.get("content").and_then(|v| v.as_str()).unwrap_or("");
Copy link

Copilot AI Mar 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Non-patch writes no longer reject whitespace-only content: has_content checks only !s.is_empty() and the previous content.trim().is_empty() guard was removed. If the intent is to keep rejecting “empty” writes, restore the trim-based validation for write/append mode (while still allowing patch mode).

Suggested change
let has_content = params
.get("content")
.and_then(|v| v.as_str())
.is_some_and(|s| !s.is_empty());
if !is_patch_mode && !has_content {
return Err(ToolError::InvalidParameters(
"Either 'content' (for write/append) or 'old_string'+'new_string' (for patch) is required".to_string(),
));
}
let content = params.get("content").and_then(|v| v.as_str()).unwrap_or("");
let content = params.get("content").and_then(|v| v.as_str()).unwrap_or("");
let has_content = !content.trim().is_empty();
if !is_patch_mode && !has_content {
return Err(ToolError::InvalidParameters(
"Either 'content' (for write/append) or 'old_string'+'new_string' (for patch) is required".to_string(),
));
}

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed — has_content now uses .is_some_and(|s| !s.trim().is_empty()), so whitespace-only strings are treated as empty.

Comment on lines +763 to +787
// Use a transaction to prevent concurrent writers from allocating
// the same version number. The SELECT FOR UPDATE locks the existing
// version rows for this document, serializing concurrent inserts.
let tx = conn
.transaction()
.await
.map_err(|e| WorkspaceError::SearchFailed {
reason: format!("Failed to start transaction: {e}"),
})?;

let row = tx
.query_one(
r#"
SELECT COALESCE(MAX(version), 0) + 1 AS next_version
FROM memory_document_versions
WHERE document_id = $1
FOR UPDATE
"#,
&[&document_id],
)
.await
.map_err(|e| WorkspaceError::SearchFailed {
reason: format!("Failed to get next version number: {e}"),
})?;
let next_version: i32 = row.get(0);
Copy link

Copilot AI Mar 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

save_version relies on SELECT COALESCE(MAX(version),0)+1 ... FOR UPDATE to serialize version allocation. If a document has no existing version rows, nothing is locked and concurrent writers can both compute the same next_version, causing a unique-constraint violation (and version loss). Consider locking the parent memory_documents row (SELECT ... FOR UPDATE), using an advisory lock keyed by document_id, or retrying on unique-violation to guarantee monotonic version allocation.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Duplicate of above — fixed in 71eadc6 with transaction + FOR UPDATE on parent row.

Comment thread src/db/libsql/workspace.rs Outdated
Comment on lines +934 to +964
// Use a transaction to prevent race conditions: the SELECT and INSERT
// must be atomic so concurrent writers don't allocate the same version.
let tx = conn
.transaction()
.await
.map_err(|e| WorkspaceError::SearchFailed {
reason: format!("Failed to start transaction: {e}"),
})?;

// Get next version number (inside transaction — serializes writers)
let mut rows = tx
.query(
"SELECT COALESCE(MAX(version), 0) + 1 FROM memory_document_versions WHERE document_id = ?1",
params![doc_id.clone()],
)
.await
.map_err(|e| WorkspaceError::SearchFailed {
reason: format!("Failed to get next version number: {e}"),
})?;

let next_version = if let Some(row) =
rows.next()
.await
.map_err(|e| WorkspaceError::SearchFailed {
reason: format!("Failed to read version number: {e}"),
})? {
get_i64(&row, 0) as i32
} else {
1
};
drop(rows);
Copy link

Copilot AI Mar 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In libSQL/SQLite, the transaction here doesn’t guarantee that concurrent writers won’t both read the same MAX(version) before either inserts, which can lead to UNIQUE(document_id, version) conflicts and dropped versions. Consider using BEGIN IMMEDIATE/write-lock semantics if available via the driver, or add a retry loop on unique-constraint failure to re-select MAX(version) and re-insert.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in cb4cf12 — wrapped in a transaction (conn.transaction().await). SQLite's WAL mode serializes writers at the engine level, so the transaction is sufficient.

Comment thread src/workspace/mod.rs
Comment on lines +1001 to +1008
// Resolve metadata once — shared by versioning and indexing.
let metadata = self.resolve_metadata(&path).await;

// Auto-version previous content before overwriting.
// Fail-open: versioning failures must not block writes.
let _ = self
.maybe_save_version(doc.id, &doc.content, &metadata, Some(&self.user_id))
.await;
Copy link

Copilot AI Mar 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Auto-versioning runs even when a write is a no-op (new content equals current content), which can create an extra version containing identical content and bloat the versions table. Consider short-circuiting in write() when doc.content == content (and similarly in layer writes) so versioning/indexing only happens when content actually changes.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed — write short-circuits when content is identical (if doc.content == content { return Ok(doc); }), and maybe_save_version deduplicates via SHA-256 hash comparison against the latest version.

Comment on lines +1 to +5
#![cfg(feature = "libsql")]
//! Integration tests for workspace versioning, metadata resolution, and patch features.
//!
//! Uses libSQL in-memory or file-backed backend. Gate: `cargo test --features libsql`.

Copy link

Copilot AI Mar 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Versioning/patch/metadata resolution tests appear duplicated between this new integration test file and the versioning_tests module added in src/workspace/mod.rs. Keeping one canonical suite (unit-style or integration-style) will reduce runtime and maintenance burden and avoid divergence.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The separate tests/workspace_versioning_integration.rs file was consolidated into the inline versioning_tests module in src/workspace/mod.rs. No duplicate tests remain.

Comment on lines +1 to +4
-- Document version history for workspace files.
-- Every content update saves the previous content as a version,
-- enabling rollback and audit trails.

Copy link

Copilot AI Mar 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR description mentions a “V14 migration”, but the added migration is V15__document_versions.sql (and libSQL incremental migration version is 15). Please align the PR description and/or migration numbering so they match.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 660c046 — migration renumbered to V16 (PG) and V17 (libSQL) to come after staging's deployed migrations. Convention documented in src/db/CLAUDE.md and .claude/rules/database.md.

Comment thread src/workspace/hygiene.rs
Comment on lines 9 to 12
//! A global [`AtomicBool`] guard prevents concurrent hygiene passes, which
//! avoids TOCTOU races on the state file and Windows file-locking errors
//! (OS error 1224) when multiple heartbeat ticks fire before the first
//! pass completes.
Copy link

Copilot AI Mar 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The module-level doc states a single global AtomicBool prevents concurrent hygiene passes. In multi-user deployments this effectively serializes hygiene across all workspaces and, combined with a shared state file under a common state_dir, can cause one user’s cadence/run to suppress hygiene for others. Consider scoping the guard/state to the workspace (e.g., per-user key) so hygiene runs independently per user.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not an issue — the AtomicBool is a module-level static, not a struct field. The single-pass guard is by design for this use case; per-workspace locking would add complexity without proportional benefit since hygiene runs are infrequent.

Comment thread src/agent/heartbeat.rs Outdated
Comment on lines +593 to +609
// Run memory hygiene per user (same as single-user heartbeat).
let hygiene_ws = Arc::clone(&workspace);
let hygiene_cfg = hygiene_config.clone();
let hygiene_user = user_id.clone();
tokio::spawn(async move {
let report =
crate::workspace::hygiene::run_if_due(&hygiene_ws, &hygiene_cfg).await;
if report.had_work() {
tracing::info!(
user_id = hygiene_user,
directories_cleaned = ?report.directories_cleaned,
versions_pruned = report.versions_pruned,
"multi-user heartbeat: memory hygiene deleted stale documents"
);
}
});

Copy link

Copilot AI Mar 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In multi-user mode hygiene is spawned twice per user: once via an untracked tokio::spawn and again inside the JoinSet task. This duplicates work, bypasses the concurrency cap, and can cause most runs to immediately skip due to the global hygiene guard/cadence. Remove the untracked spawn (or the in-JoinSet run) so hygiene executes exactly once per user per tick and remains bounded.

Suggested change
// Run memory hygiene per user (same as single-user heartbeat).
let hygiene_ws = Arc::clone(&workspace);
let hygiene_cfg = hygiene_config.clone();
let hygiene_user = user_id.clone();
tokio::spawn(async move {
let report =
crate::workspace::hygiene::run_if_due(&hygiene_ws, &hygiene_cfg).await;
if report.had_work() {
tracing::info!(
user_id = hygiene_user,
directories_cleaned = ?report.directories_cleaned,
versions_pruned = report.versions_pruned,
"multi-user heartbeat: memory hygiene deleted stale documents"
);
}
});

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not an issue — these are two separate code paths: single-user mode (HeartbeatRunner::run) and multi-user mode (spawn_multi_user_heartbeat). Only one path executes per deployment. The global AtomicBool guard also prevents concurrent execution even if both were somehow reached.

Comment thread src/workspace/mod.rs
Comment on lines +941 to +953
if !doc.content.contains(old_string) {
return Err(WorkspaceError::PatchFailed {
path,
reason: "old_string not found in document".to_string(),
});
}

let (new_content, count) = if replace_all {
let count = doc.content.matches(old_string).count();
(doc.content.replace(old_string, new_string), count)
} else {
(doc.content.replacen(old_string, new_string, 1), 1)
};
Copy link

Copilot AI Mar 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

patch() allows an empty old_string. In Rust, "...".contains("") is always true and matches("").count() can be very large, leading to pathological replacement behavior and potentially huge allocations (DoS). Reject empty old_string up front with a clear error.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in f5084e7 — both the tool (memory.rs) and Workspace::patch() reject empty old_string with an explicit error.

Comment on lines +355 to +373
// Patch mode: if old_string is provided, do search-and-replace instead of write/append.
let old_string = params.get("old_string").and_then(|v| v.as_str());
if let Some(old_str) = old_string {
let new_str = params
.get("new_string")
.and_then(|v| v.as_str())
.ok_or_else(|| {
ToolError::InvalidParameters(
"new_string is required when old_string is provided".to_string(),
)
})?;
let replace_all = params
.get("replace_all")
.and_then(|v| v.as_bool())
.unwrap_or(false);

let result = workspace
.patch(&resolved_path, old_str, new_str, replace_all)
.await
Copy link

Copilot AI Mar 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Patch mode accepts old_string even when it is empty, which can trigger pathological behavior (every boundary matches) and huge allocations. Validate that old_string is non-empty before calling workspace.patch.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Duplicate of above — fixed in f5084e7.

ilblackdragon added a commit that referenced this pull request Apr 6, 2026
Add a `schema` field to `DocumentMetadata` that enables automatic content
validation on workspace writes. When a document or its folder `.config`
carries a JSON Schema, all write operations (write, append, patch,
write_to_layer, append_to_layer) validate content against it before
persisting. This is the foundation for typed system state (settings,
extension configs, skill manifests) stored as workspace documents.

Builds on the metadata infrastructure from #1723 — schema is inherited
via the existing `.config` chain (folder → document → defaults).

Refs: #640, #1894, #1937

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This was referenced Apr 6, 2026
drchirag1991 pushed a commit to drchirag1991/ironclaw that referenced this pull request Apr 8, 2026
…g, and patch (nearai#1723)

* feat(workspace): metadata-driven indexing/hygiene, document versioning, and patch support

Foundation for the extensible frontend system. Workspace documents now
support metadata flags (skip_indexing, skip_versioning, hygiene config)
via folder-level .config documents and per-file overrides, replacing
hardcoded hygiene targets and indexing behavior.

Key changes:
- DocumentMetadata type with resolution chain (doc → folder .config → defaults)
- Document versioning: auto-saves previous content on write/append/patch
- Workspace patch: search-and-replace editing via memory_write tool
- Hygiene rewrite: discovers cleanup targets from .config metadata
  instead of hardcoded daily/ and conversations/ directories
- memory_read gains version/list_versions params
- memory_write gains metadata/old_string/new_string/replace_all params
- V14 migration adds memory_document_versions table (both PG + libSQL)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: address review feedback — transaction safety, patch mode, formatting

- Wrap libSQL save_version in a transaction to prevent race condition
  where concurrent writers could allocate the same version number
- Make content optional in memory_write when in patch mode (old_string
  present) — LLM no longer forced to provide unused content param
- Improve metadata update error handling with explicit match arms
- Run cargo fmt across all files

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: address review findings — write-path performance, version pruning, descriptions

1. Resolve metadata once per write: write(), append(), and patch() now
   call resolve_metadata() once and pass the result to both
   maybe_save_version() and reindex_document_with_metadata(), cutting
   redundant DB queries from 3-5 per write down to 1 resolution.

2. Optimize version hash check: replaced get_latest_version_number() +
   get_version() (2 queries) with list_versions(id, 1) (1 query) for
   the duplicate-hash check in maybe_save_version().

3. Wire up version_keep_count: hygiene passes now prune old versions
   for documents in cleaned directories, enforcing the configured
   version_keep_count (default: 50). Removes the TODO comment.

4. Fix misleading tool description: patch mode works with any target
   including 'memory', not just custom paths.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: wire remaining unwired components — changed_by, layer versioning

1. changed_by now populated: all write paths pass self.user_id as the
   changed_by field in version records instead of None, so version
   history shows who made each change.

2. Layer write/append versioned: write_to_layer() and append_to_layer()
   now auto-version and use metadata-optimized reindexing, matching
   the standard write()/append() paths.

3. append_memory versioned: MEMORY.md appends now auto-version with
   metadata-driven skip and shared metadata resolution.

4. Remove unused reindex_document wrapper: all callers now use
   reindex_document_with_metadata directly.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test: comprehensive coverage for versioning, metadata, patch, and hygiene

26 new tests covering critical and high-priority gaps:

document.rs (7 unit tests):
- is_config_path edge cases (foo.config, empty string, .config/bar)
- content_sha256 with empty string (known SHA-256 constant)
- content_sha256 with unicode (multi-byte UTF-8)
- DocumentMetadata merge: null overlay, nested hygiene replaced wholesale,
  both empty, non-object base

memory.rs (2 schema tests):
- memory_write schema includes patch/metadata params, content not required
- memory_read schema includes version/list_versions params

hygiene.rs (5 integration tests):
- No .config docs → no cleanup happens
- .config with hygiene disabled → directory skipped
- Multiple dirs with different retention (fast=0, slow=9999)
- Documents newer than retention not deleted
- Version pruning during hygiene (keep_count=2, verify pruned)

workspace/mod.rs (14 integration tests):
- write creates version with correct hash and changed_by
- Identical writes deduplicated (hash check)
- Append versions pre-append content
- Patch: single replacement, replace_all, not-found error, creates version
- Patch with unicode characters
- Patch with empty replacement string
- resolve_metadata: no config (defaults), inherits from folder .config,
  document overrides .config, nearest ancestor wins
- skip_versioning via .config prevents version creation

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: address zmanian review — PG transaction safety, identity protection, perf

Must-fix:
1. PostgreSQL save_version now uses a transaction with SELECT FOR UPDATE
   to prevent concurrent writers from allocating the same version number,
   matching the libSQL implementation.

2. Restore identity document protection in hygiene cleanup_directory().
   MEMORY.md, SOUL.md, IDENTITY.md, etc. are now protected from deletion
   regardless of which directory they appear in, via is_identity_document()
   case-insensitive check. This restores the safety net that was removed
   when migrating from hardcoded to metadata-driven hygiene.

Should-fix:
3. resolve_metadata() now uses find_config_documents (single query) +
   in-memory nearest-ancestor lookup, instead of O(depth) serial DB
   queries walking up the directory tree.

4. memory_write validates that at least one mode is provided (content
   for write/append, or old_string+new_string for patch) with a clear
   error message upfront, instead of relying on downstream empty checks.

5. Fixed misleading GIN index comment in V15 migration.

9. Added "Fail-open: versioning failures must not block writes" comments
   to all `let _ = self.maybe_save_version(...)` call sites.

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* style: cargo fmt

* fix: address Copilot review — DoS prevention, no-op skip, duplicate hygiene

Security:
- Reject empty old_string in both workspace.patch() and memory_write
  tool to prevent pathological .matches("") behavior (DoS vector)

Correctness:
- Remove duplicate hygiene spawn in multi-user heartbeat — was running
  both via untracked tokio::spawn AND inside the JoinSet, causing
  double work and immediate skip via global AtomicBool guard
- Disallow layer param in patch mode — patch always targets the
  default workspace scope; combining with layer could silently patch
  the wrong document
- Restore trim-based whitespace rejection for non-patch content
  validation (was broken when refactoring required fields)

Performance:
- Short-circuit write() when content is identical to current content,
  skipping versioning, update, and reindex entirely
- Normalize path once at start of resolve_metadata instead of only
  for config lookup (prevents missed document metadata on unnormalized
  paths)

Cleanup:
- Remove duplicate tests/workspace_versioning_integration.rs (same
  tests already exist in workspace/mod.rs versioning_tests module)

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: eliminate flaky hygiene tests caused by global AtomicBool contention

All hygiene tests that used run_if_due() were flaky when running
concurrently because they competed for the global RUNNING AtomicBool
guard. Rewrote them to test the underlying components directly:

- metadata_driven_cleanup_discovers_directories: now uses
  find_config_documents() + cleanup_directory() directly
- multiple_directories_with_different_retention: now uses
  cleanup_directory() per directory directly
- cleanup_respects_cadence: rewritten as a sync unit test that
  validates state file + timestamp logic without touching the
  global guard

Verified stable across 3 consecutive runs (3793 tests, 0 failures).

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: address remaining review comments — metadata ordering, PG locking, hygiene safety

1. Metadata applied BEFORE write/patch (nearai#10-11,15): metadata param is
   now set via get_or_create + update_metadata before the write/patch
   call, so skip_indexing/skip_versioning take effect for the same
   operation instead of only subsequent ones.

2. Layer write doc ID (nearai#13-14): metadata no longer re-reads after write
   since it's applied upfront. Removes the stale-scope risk.

3. Version param overflow (nearai#16): validates version is 1..i32::MAX
   before casting, returns InvalidParameters on out-of-range.

4. Hygiene protection list (nearai#18): added HYGIENE_PROTECTED_PATHS that
   includes MEMORY.md, HEARTBEAT.md, README.md (missing from
   IDENTITY_PATHS). cleanup_directory now uses is_protected_document()
   which checks both lists with case-insensitive matching.

5. PG FOR UPDATE on empty table (nearai#22-24): now locks the parent
   memory_documents row (SELECT 1 FROM memory_documents WHERE id=$1
   FOR UPDATE) before computing MAX(version), which works even when
   no version rows exist yet.

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: address remaining review comments — metadata merge, retention guard, migration ordering

1. **Metadata merge in memory_write tool**: incoming metadata is now
   merged with existing document metadata via `DocumentMetadata::merge()`
   instead of full replacement, so setting `{hygiene: {enabled: true}}`
   no longer silently drops a previously-set `skip_versioning: true`.

2. **Minimum retention_days**: `HygieneMetadata.retention_days` is now
   clamped to a minimum of 1 day during deserialization, preventing an
   LLM from writing `retention_days: 0` and causing mass-deletion on
   the next hygiene pass.

3. **Migration version ordering**: renumbered document_versions migration
   to come after staging's already-deployed migrations (PG: V15→V16,
   libSQL: 15→17). Documented the convention that new migrations must
   always be numbered after the highest version on staging/main.

4. **Duplicate doc comment**: removed duplicated line on
   `reindex_document_with_metadata`.

5. **HygieneSettings**: added `version_keep_count` field to persist
   the setting through the DB-first config resolution chain.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: skip metadata pre-apply when layer is specified, clean up stale comment

1. When a layer is specified, skip the metadata pre-apply via
   get_or_create — it operates on the primary scope and would create a
   ghost document there while the actual content write targets the
   layer's scope.

2. Removed stale "See review comments nearai#10-11,15" reference; the
   surrounding comment already explains the rationale.

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: use BEGIN IMMEDIATE for libSQL save_version to serialize writers

The default DEFERRED transaction only acquires a write lock at the first
write statement (INSERT), not at the SELECT. Two concurrent writers could
both read the same MAX(version) before either inserts, causing a UNIQUE
violation. BEGIN IMMEDIATE acquires the write lock upfront, matching the
existing pattern in conversations.rs.

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: document trust boundary on metadata/versioning WorkspaceStore methods

These methods accept bare document UUIDs without user_id checks at the
DB layer. The Workspace struct (the only caller) always obtains UUIDs
through user-scoped queries first. Document this trust boundary
explicitly on the trait so future implementors/callers know not to pass
unverified UUIDs from external input.

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: address new Copilot review comments — ghost doc, param validation, overflow

1. Skip metadata pre-apply in patch mode to avoid creating a ghost
   empty document via get_or_create when the document doesn't exist,
   which would change a "not found" error into "old_string not found".

2. Validate list_versions and version as mutually exclusive in
   memory_read to avoid ambiguous behavior (list_versions silently won).

3. Clamp version_keep_count to i32::MAX before casting to prevent
   overflow on extreme config values.

4. Mark daily_retention_days and conversation_retention_days as
   deprecated in HygieneSettings — retention is now per-folder via
   .config metadata.

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: apply metadata in patch mode via read(), reorder libSQL migrations

1. Metadata is no longer silently ignored in patch mode — uses
   workspace.read() (which won't create ghost docs) instead of skipping
   entirely, so skip_versioning/skip_indexing flags take effect for
   patches on existing documents.

2. Reorder INCREMENTAL_MIGRATIONS to strictly ascending version order
   (16 before 17) to match iteration order in run_incremental().

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: remove duplicate is_patch_mode, add TODO comments for known limitations

- Remove duplicate `is_patch_mode` binding in memory_write (was
  computed at line 280 and again at line 368).
- Document multi-scope hygiene edge case: workspace.list() includes
  secondary scopes but workspace.delete() is primary-only, causing
  silent no-ops for cross-scope entries.
- Document O(n) reads in version pruning as acceptable for typical
  directory sizes.
- Add TODO on WorkspaceError::SearchFailed catch-all for future
  cleanup into more specific variants.

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: reindex on no-op writes for metadata changes, use read_primary in patch

1. write() no longer fully short-circuits when content is unchanged —
   it still resolves metadata and reindexes so that metadata-driven
   flags (e.g. skip_indexing toggled via memory_write's metadata param)
   take effect immediately even without a content change.

2. Patch-mode metadata pre-apply now uses workspace.read_primary()
   instead of workspace.read() to ensure we target the same scope that
   patch() operates on, preventing cross-scope metadata mutation in
   multi-scope mode.

[skip-regression-check]

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
ilblackdragon added a commit that referenced this pull request Apr 9, 2026
* feat(workspace): add JSON Schema validation to document metadata

Add a `schema` field to `DocumentMetadata` that enables automatic content
validation on workspace writes. When a document or its folder `.config`
carries a JSON Schema, all write operations (write, append, patch,
write_to_layer, append_to_layer) validate content against it before
persisting. This is the foundation for typed system state (settings,
extension configs, skill manifests) stored as workspace documents.

Builds on the metadata infrastructure from #1723 — schema is inherited
via the existing `.config` chain (folder → document → defaults).

Refs: #640, #1894, #1937

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat(tools): add channel-agnostic ToolDispatcher with audit trail

Introduce `ToolDispatcher` — a universal entry point for executing tools
from any caller (gateway, CLI, routine engine, WASM channels). Creates
lightweight system jobs for FK integrity, records ActionRecords, and
returns ToolOutput. This is a third entry point alongside v1's
Worker::execute_tool() and v2's EffectBridgeAdapter::execute_action().

DispatchSource::Channel(String) is intentionally string-typed — channels
are interchangeable extensions that can appear at runtime.

Also adds JobContext::system() factory and create_system_job() to both
PostgreSQL and libSQL backends.

Refs: #640

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat(workspace): settings-as-workspace-documents with dual-write adapter

Add WorkspaceSettingsAdapter that implements SettingsStore by reading/
writing workspace documents at _system/settings/{key}.json. During
migration, dual-writes to both the legacy settings table and workspace.
Reads prefer workspace, falling back to the legacy table.

Known setting keys (llm_backend, selected_model, tool_permissions.*, etc.)
get JSON Schemas stored in document metadata — writes are validated
automatically by Phase 0's schema validation.

Also adds settings_schemas.rs with compile-time schema registry and
settings_path() helper.

Refs: #640, #1937

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat(gateway): wire ToolDispatcher into GatewayState

Add tool_dispatcher field to GatewayState with with_tool_dispatcher()
builder method. Create and wire the dispatcher in main.rs when both
tool_registry and database are available. All 16 GatewayState
construction sites updated.

Per-handler migration (routing mutations through ToolDispatcher instead
of direct DB calls) is deferred to follow-up PRs — each handler has
complex ownership checks, cache refresh, and response types.

Refs: #640

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat(tools): add system introspection tools (tools_list, version)

Add SystemToolsListTool and SystemVersionTool as proper Tool
implementations that replace hardcoded /tools and /version commands.
Registered at startup via register_system_tools(). Available in both
v1 and v2 engines — no is_v1_only_tool filter to worry about.

Refs: #640

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat(workspace): extension and skill state schemas and path helpers

Add workspace path helpers and JSON Schemas for storing extension configs,
extension state, and skill manifests under _system/extensions/ and
_system/skills/. This establishes the workspace document structure that
ExtensionManager and SkillRegistry will use as a durable persistence
backend (read-through cache pattern).

Runtime state (active MCP connections, WASM runtimes) stays in memory.
Only durable config and activation state moves to workspace documents.

Refs: #640, #1741

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: address PR review feedback and CI failures

CI fixes:
- deny.toml: allow MIT-0 license required by jsonschema
- workspace/document.rs: #[allow(dead_code)] on system path constants
  pending follow-up phases that consume them
- workspace/settings_adapter.rs: remove unused chrono::Utc import
- workspace/settings_adapter.rs: collapse nested if into && form

Review fixes (gemini-code-assist):
- tools/dispatch.rs: await save_action directly instead of fire-and-forget
  tokio::spawn so short-lived CLI callers cannot drop audit records before
  they are persisted; surface errors via tracing::warn
- tools/dispatch.rs: remove DispatchSource::Agent variant — sequence_num=0
  with a reused job_id would violate UNIQUE(job_id, sequence_num). Agent
  callers must use Worker::execute_tool() which manages sequence numbers
  atomically against the agent's existing job
- workspace/settings_adapter.rs: validate content against the schema BEFORE
  the first workspace write so the initial document creation cannot bypass
  schema enforcement (subsequent writes are validated by the workspace
  resolved-metadata path established after the first write)

Refs: #2049

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor: unify all machine state under .system/

Rename the workspace prefix from `_system/` to `.system/` (Unix dot-prefix
convention for hidden internal state) and migrate v2 engine state from
`engine/` to `.system/engine/` so all machine-managed state lives under
one root.

New layout:

  .system/
  ├── settings/         (per-user settings as workspace docs)
  ├── extensions/       (extension config + activation state)
  ├── skills/           (skill manifests)
  └── engine/
      ├── README.md     (auto-generated index)
      ├── knowledge/    (lessons, skills, summaries, specs, issues)
      ├── orchestrator/ (Python orchestrator versions, failures, overlays)
      ├── projects/     (project files + nested missions/)
      └── runtime/      (threads, steps, events, leases, conversations)

The inner `.runtime/` dot-prefix is dropped under `.system/engine/` since
`.system/` itself is the hidden marker; no double-hiding needed.

The `ENGINE_PREFIX` constant in `workspace::document::system_paths` is
declared as the canonical convention; bridge `store_adapter` continues
to define per-subdirectory constants below it for ergonomic interpolation.

No legacy migration code — pre-production rename.

Refs: #2049

* fix(pr-2049): security, correctness, and robustness fixes from review

Critical security:
- dispatch.rs: redact sensitive params before persisting ActionRecord
  (was leaking plaintext secrets into the audit log for tools with
  sensitive_params())
- settings_schemas.rs: validate settings keys against path traversal
  (reject /, \, .., leading ., empty, length > 128, non-alphanumeric);
  wire validation into all settings_adapter read/write/delete paths

Data correctness:
- history/store.rs + libsql/jobs.rs: write status as JobState::Completed
  .to_string() ('completed' snake_case) instead of 'Completed'; system
  jobs were round-tripping as Pending in parse_job_state()
- settings_adapter.rs: fix .system/.config metadata to set
  skip_versioning: false (was true) — descendants inherit this via
  find_nearest_config, so the previous value silently disabled
  versioning for ALL .system/** documents, contradicting the audit-
  trail intent
- workspace/mod.rs: add resolve_metadata_in_scope; use it in
  write_to_layer / append_to_layer so non-primary layer writes resolve
  schema/indexing/versioning from the target layer's .config chain
  instead of the primary user_id's. Also pass &scope (not &self.user_id)
  to maybe_save_version so versions are attributed to the correct scope

Pipeline parity:
- dispatch.rs: add SafetyLayer to ToolDispatcher; mirror Worker pipeline
  (prepare_tool_params -> validator -> redact -> timeout -> sanitize
  output) so dispatch path gets the same safety guarantees as the agent
  worker. Sanitized output is now stored in ActionRecord.output_sanitized
  instead of duplicating raw JSON

Robustness:
- settings_adapter.rs: propagate update_metadata errors in
  ensure_system_config and write_to_workspace (was silently ignored
  via let _ =, leaving schemas/skip_indexing unenforced)
- settings_adapter.rs: set_all_settings now collects the first workspace
  write error and returns it after the legacy write completes, so
  partial-migration state is observable
- settings_schemas.rs: rewrite llm_custom_providers schema to match
  CustomLlmProviderSettings (id/name/adapter/base_url/default_model/
  api_key/builtin instead of stale name/protocol/base_url/model)

Build:
- Cargo.toml: jsonschema with default-features = false to avoid pulling
  a second reqwest major version

Docs:
- db/mod.rs: docstring for create_system_job uses 'completed' snake_case
- workspace/document.rs: clarify .system/ versioning ("by default ARE
  versioned; individual files may opt out via skip_versioning")
- settings_adapter.rs: clarify per-key reads prefer workspace, aggregate
  reads stay on legacy during migration
- tools/builtin/system.rs: trim doc to match implemented scope
  (system_tools_list, system_version)
- channels/web/mod.rs: move stale 'sweep tasks managed by with_oauth'
  comment back to oauth_sweep_shutdown line

Refs: #2049

* docs+ci: enforce 'everything goes through tools' principle

Document the core design principle from #2049 in two places so future
contributors (human and AI) discover it during development:

- CLAUDE.md: new "Everything Goes Through Tools" section near the
  "Adding a New Channel" guide. Includes the rule, the rationale (audit
  trail, safety pipeline parity, channel-agnostic surface, agent
  parity), and a pointer to the detailed rule file.
- .claude/rules/tools.md: full pattern with required/forbidden examples,
  the list of layers that ARE exempt (Worker::execute_tool, v2
  EffectBridgeAdapter, tool implementations themselves, background
  engine jobs, read-aggregation queries), and how to annotate
  intentional exceptions. Also extends `paths` to cover
  src/channels/** and src/cli/** so it surfaces when those files are
  edited.

Enforce with a new pre-commit safety check (#7) in
scripts/pre-commit-safety.sh:

- Scans newly added lines under src/channels/web/handlers/*.rs and
  src/cli/*.rs for direct touches of state.{store, workspace,
  workspace_pool, extension_manager, skill_registry, session_manager}.
- Suppress with a trailing `// dispatch-exempt: <reason>` comment on
  the same line, matching the existing `// safety:` convention.
- Only checks added lines (`+` in the diff), so existing untouched
  handlers don't trip the check during incremental migration.

The check fires only for new code: handlers that haven't been migrated
yet (52 existing direct accesses across 12 handler files) won't break
unmodified, but any new line that bypasses the dispatcher will be
flagged at commit time.

Refs: #2049

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(pr-2049): address Copilot review on workspace schema layer

- workspace::extension_state: extension/skill path helpers now reuse the
  canonical name validators (`canonicalize_extension_name`,
  `validate_skill_name`) instead of a weak `replace('/', "_")`. Names
  containing `..`, `\`, NUL, or other escapes are now rejected at the
  helper boundary, eliminating a path-traversal foothold for callers.
  Helpers return `Result<String, PathError>`. Regression tests added.

- workspace::settings_adapter::ensure_system_config: now idempotent across
  upgrades. If `.system/.config` already exists with stale metadata
  (e.g. an older `skip_versioning: true` from before fix #3042846635),
  it is repaired to the expected inherited values instead of being left
  silently broken. Regression test added.

- workspace::settings_adapter::write_to_workspace: lazily seeds
  `.system/.config` via a `OnceCell`, so callers no longer need to
  remember to invoke `ensure_system_config()` at startup before any
  setting write. Regression test added.

- workspace::settings_adapter::delete_setting: workspace delete failures
  are now logged via `tracing::warn!` instead of being silently dropped.
  We still don't propagate the error — the legacy table is the source of
  truth during migration and a stale workspace doc is recoverable on the
  next write — but partial-delete state is now observable.

- workspace::schema: documented why we don't cache compiled validators
  yet (settings/extension/skill writes are not a hot path; revisit if
  schema validation moves into a frequent write path).

[skip-regression-check] schema.rs change is doc-only.

* fix(pr-2049): address 4 remaining review issues

1. tool_dispatcher dropped during gateway startup
   src/channels/web/mod.rs: rebuild_state was initializing
   tool_dispatcher to None, so every subsequent with_* call zeroed
   the dispatcher the first caller injected. Preserve it across
   rebuild_state like every other field. Regression test:
   tool_dispatcher_survives_subsequent_with_calls.

2. WorkspaceSettingsAdapter not wired into runtime
   src/app.rs: Build the adapter in build_all() when workspace+db
   are both present, eagerly call ensure_system_config(), expose
   on AppComponents as settings_store, and thread it into
   init_extensions(...) so register_permission_tools and
   upgrade_tool_list receive it instead of the raw db.
   src/main.rs: SIGHUP handler prefers the adapter over raw db.
   src/workspace/mod.rs: re-export WorkspaceSettingsAdapter.

3. changed_by regression on layered writes
   src/workspace/mod.rs: write_to_layer and append_to_layer were
   passing the target layer's scope as changed_by, so version
   history attributed layered edits to the layer name instead of
   the actor. Pass self.user_id while keeping metadata resolution
   in the target scope. Regression test:
   layered_writes_record_actor_in_changed_by.

4. Legacy engine/ paths invisible after upgrade
   src/bridge/store_adapter.rs: Add migrate_legacy_engine_paths(),
   called at the start of load_state_from_workspace(), which scans
   list_all() for engine/... documents and rewrites them to
   .system/engine/... Idempotent: skips rewrites when the new path
   already exists, deletes the legacy duplicate either way. Three
   regression tests in #[cfg(all(test, feature = "libsql"))]
   module.

Quality gate: cargo fmt, cargo clippy --all --all-features zero
warnings, cargo test --all-features --lib 4313 passed.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(e2e): use PUT for settings write in ownership test

test_settings_written_and_readable was sending POST /api/settings/{key}
but the route has been PUT since #4 (Feb 2026) — the test was returning
405 Method Not Allowed. Switch to httpx.put() so it matches the current
route registration.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(pr-2049): address second round of review feedback

Addresses the remaining unresolved PR #2049 review comments from
serrrfirat and ilblackdragon.

## Changes

### ToolDispatcher — integration coverage + log level
- src/tools/dispatch.rs: add two libsql-gated integration tests for
  the full dispatch pipeline: (a) persist an ActionRecord with
  sensitive params redacted in the audit row while the tool still
  sees the raw value, sanitized output populated; (b) honor the
  per-tool execution_timeout() and record a failure action.
- Tests use a raw-SQL helper to find system-category jobs since
  list_agent_jobs_for_user intentionally filters them out.
- Replace warn! with debug! on audit persistence failure — dispatch
  is reachable from interactive CLI/REPL sessions where warn!/info!
  output corrupts the terminal UI (CLAUDE.md Code Style → logging).

### WorkspaceSettingsAdapter — log level
- src/workspace/settings_adapter.rs: same warn! → debug! fix on the
  delete_setting workspace failure path, for the same REPL reason.

### Schema validation — surface all errors
- src/workspace/schema.rs: switch from jsonschema::validate to
  validator_for + iter_errors so users fixing a malformed setting
  see every violation in one round instead of playing whack-a-mole.
  Also distinguishes "invalid schema" from "invalid content" errors.
- Regression tests: multiple_errors_are_all_reported and
  invalid_schema_is_distinguished_from_invalid_content.

### create_system_job — started_at + row growth docs
- src/db/libsql/jobs.rs and src/history/store.rs: include started_at
  in the INSERT (set to the same instant as created_at/completed_at)
  so duration queries don't see NULL and "started but not completed"
  filters don't misclassify these rows. Fixed in both backends.
- Add doc comments on both impls warning about row growth per
  dispatch call. Deleting rows would violate "LLM data is never
  deleted" (CLAUDE.md); if listing-query performance becomes a
  concern, prefer a partial index (WHERE category != 'system') over
  deletion.

### Lib test repair
- src/channels/web/server.rs: extensions_setup_submit_handler Err
  branch now sets resp.activated = Some(false) so clients and the
  regression test see an explicit `false` rather than `null`. Also
  rename the test's fake channel to snake_case (test_failing_channel)
  so it matches the canonicalize-extension-names behavior from
  PR #2129 — previously the test was passing a dashed name and
  getting "Capabilities file not found" instead of the intended
  activation failure.

## Not addressed (false positive / deferred)
- dispatch.rs:177 output_raw/output_sanitized swap — verified against
  ActionRecord::succeed(Option<String>, Value, Duration) and the
  worker's call site at job.rs:704; argument order is correct.
- settings_adapter.rs:186 TOCTOU window — author self-classified as
  "Low / completeness" and no other code path writes to
  .system/settings/** without going through write_to_workspace.
- schema.rs recompilation caching — deferred per earlier review.

## Quality gate
- cargo fmt
- cargo clippy --all --benches --tests --examples --all-features
  zero warnings
- cargo test --all-features --lib: 4387 passed, 0 failed, 3 ignored

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(pr-2049): address third round of review feedback

Addresses unresolved comments from serrrfirat's "Paranoid Architect
Review" and Copilot's third pass on the engine-state migration.

## src/workspace/settings_adapter.rs

### HIGH — Cross-tenant data leak through owner-scoped Workspace

`Workspace` is constructed for a single user_id at AppBuilder time.
Without gating, `set_setting("user_B", key, val)` would dual-write into
the **owner's** workspace, and a subsequent `user_A.get_setting(...)`
would return user_B's value: a real cross-user data leak.

Fix:
- Add `gate_user_id` field set to `workspace.user_id()` at construction.
- All `SettingsStore` methods that touch the workspace now check
  `workspace_allowed_for(user_id)` first; non-owner callers fall through
  to the legacy table only — preserving their pre-#2049 behavior.
- This matches the long-term plan: per-user settings live in the legacy
  table until a per-user `WorkspaceSettingsAdapter` (one per
  WorkspacePool entry) is wired up; admin/global settings go through
  the workspace-backed path so they pick up schema validation.

Regression test: `workspace_settings_are_owner_gated_in_multi_tenant_mode`
asserts (a) owner's workspace doc is not overwritten by a non-owner write,
(b) each user reads back their own legacy value, and (c) a non-owner with
no legacy entry must NOT see the owner's workspace value bleeding through.

### MEDIUM — Dual-write order

Reverse `set_setting` and `set_all_settings` to write legacy first,
workspace second. The legacy table is the source of truth during
migration (it backs aggregate `list_settings` reads), so writing it
first guarantees those readers always see a consistent value even if
the workspace write fails. Failed workspace writes are self-healing on
the next per-key read-miss.

### MEDIUM — `ensure_system_config_lazy` double-execution race

Replace the manual `get()`/`set()` pattern with
`OnceCell::get_or_try_init`. Two concurrent first-callers no longer
both run `ensure_system_config()`. Functionally equivalent (idempotent
either way) but no longer wasteful.

## src/bridge/store_adapter.rs

### MEDIUM — Migration drops document metadata (S3)

`migrate_legacy_engine_paths` previously copied only `doc.content`,
silently dropping the `metadata` column. Now calls
`ws.update_metadata(new_doc.id, &doc.metadata)` after each write to
preserve schema/skip_indexing/hygiene flags. Logged-not-fatal: content
has already been moved, metadata loss is recoverable.

Regression test: `migration_preserves_document_metadata` seeds a doc
with custom metadata and asserts it survives the rewrite.

### MEDIUM — `ws.exists()` swallowed transient errors (Copilot)

`unwrap_or(false)` on the existence check could cause the migrator to
overwrite an existing `.system/engine/...` doc when storage hiccups.
Now propagates the error (counts as failed step + `continue`), per
Copilot's exact suggested patch.

### LOW — `list_all()` runs every startup (Copilot)

Add a cheap preflight: `ws.list("engine")` first; only fall through to
the recursive `list_all()` discovery when the directory listing returns
at least one entry. Steady-state startups (post-migration) skip the
full workspace scan entirely.

Regression test: `migration_preflight_skips_full_scan_when_no_legacy_paths`
asserts unrelated and already-migrated documents are untouched.

### MEDIUM — Counter undercount on `already_present` (S5)

When `already_present` is true the legacy duplicate is still deleted,
but the previous code skipped the `migrated += 1` increment, undercounting
in debug logs. Fixed: `migrated` now counts every successful path
migration including the already-present case.

### Documented — Version-history loss is acceptable scope (C1)

Read-write-delete pattern means `memory_document_versions.document_id
ON DELETE CASCADE` drops the legacy doc's version chain. Documented in
the function-level doc comment as intentional + bounded:
- v2 engine state is runtime state (rewritten on every mutation), not
  user-curated data
- v2 was newly introduced in this PR — no production deployment with
  pre-existing curated history at risk
- A path-preserving rename op would need new trait methods on both
  backends; out of scope for fix-forward. If a future caller needs
  history-preserving rename, it should be added to the storage layer
  properly, not bolted onto migration.

## Quality gate
- cargo fmt
- cargo clippy --all --benches --tests --examples --all-features
  zero warnings
- cargo test --all-features --lib: 4390 passed, 0 failed, 3 ignored
  (+3 new tests on top of round 2)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(pr-2049): address fourth round of review feedback

Two latent issues flagged by serrrfirat in the latest review pass:

1. **Null schema permanently locks documents** (`src/workspace/schema.rs`).
   `serde_json` deserializes a metadata field of `"schema": null` as
   `Some(Value::Null)`, not `None`, so the upstream
   `if let Some(schema) = &metadata.schema` check passes through to
   `validate_content_against_schema`. There, `validator_for(Value::Null)`
   errors out and every subsequent write to that document is blocked — a
   latent DoS. Added an explicit `schema.is_null()` early-return guard at
   the top of the validator, plus a regression test
   (`null_schema_is_treated_as_no_op`) that asserts even non-JSON content
   passes when the schema is null.

2. **System job titles were raw source labels** (`src/history/store.rs`,
   `src/db/libsql/jobs.rs`). `create_system_job` set `title = source`,
   so any UI rendering `agent_jobs.title` would display dispatched
   system jobs as `channel:gateway` / `system` / etc. instead of a
   human-readable label. Both PostgreSQL and libSQL backends now write
   `format!("System: {source}")`. Updated the two dispatch integration
   tests that pinned the old format.

Schema-recompilation comment (`schema.rs:47`) was acknowledged as
"acceptable for now" by the reviewer; existing NOTE in the source
already documents the caching trade-off and upgrade path, so no code
change.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(pr-2049): address fifth round of review feedback

Eight comments from Copilot + serrrfirat. Real fixes for the load-bearing
gaps; doc clarifications for the rest where the existing behavior is
intentional.

**Real code changes**

- `src/tools/dispatch.rs` — enforce `tool.parameters_schema()` (JSON
  Schema) in the dispatch path. Previously the SafetyLayer validator only
  checked for injection patterns; channel/CLI/routine callers could pass
  arbitrary shapes and only discover the mismatch (or worse, silently
  malformed behavior) inside the tool itself. Now we run
  `jsonschema::validate(&tool.parameters_schema(), &normalized_params)`
  after the injection check, with a permissive-empty-schema fast path so
  tools that haven't yet declared a schema aren't penalised. Regression
  test `dispatch_rejects_params_violating_tool_schema` asserts a
  required-field violation is rejected before the tool is invoked.

- `src/workspace/settings_adapter.rs` — `write_to_workspace` now calls
  `schema_for_key(key)` once and reuses the resolved schema for both
  pre-write validation and post-write metadata persistence (was called
  twice). Eliminates duplicate work and removes a theoretical
  divergence window if the schema registry ever became non-deterministic.

- `src/workspace/settings_adapter.rs` — `ensure_system_config` now also
  rewrites the `.config` document content when its metadata is repaired,
  not just the metadata column. The metadata column is the inheritance
  source of truth, but having the doc's content silently diverge from it
  confuses anyone reading the doc directly to understand which inherited
  flags are active.

- `src/error.rs` + `src/workspace/settings_schemas.rs` — new
  `WorkspaceError::InvalidPath { path, reason }` variant. Path/key
  rejection (path-traversal, character set, length) now surfaces as
  `InvalidPath`, not `SchemaValidation` — callers and downstream UIs can
  distinguish "your settings *key* has bad characters" from "your
  settings *value* failed JSON-Schema validation" without string-matching
  error messages. `validate_settings_key` returns the new variant; the
  one match site in `settings_adapter.rs::write_to_workspace` is updated.
  Regression test `validate_settings_key_returns_invalid_path_variant`.

**Documentation-only fixes**

- `src/tools/dispatch.rs` — clarify in the `dispatch()` doc-comment that
  `sanitize_tool_output` runs only against the persisted ActionRecord
  payload, NOT against the value returned to the caller. This mirrors
  `Worker::execute_tool` (the agent loop also receives the raw output so
  reasoning can be reproduced from history). Channels that forward
  dispatcher output to end users must run their own boundary
  sanitization at the channel edge.

- `src/history/store.rs` + `src/db/libsql/jobs.rs` —
  `create_system_job` doc updated to explicitly state that system job
  timestamps do NOT reflect tool execution time (the row is INSERTed
  before the tool runs, with all three timestamps pinned to "now").
  Consumers that need execution duration must read
  `job_actions.duration_ms` for the associated action rows. Restructuring
  to a two-phase INSERT+UPDATE was rejected: the audit row must be
  durable even if the dispatcher panics mid-tool, and the second write
  would double per-dispatch DB cost.

- `src/workspace/schema.rs` — added baseline regression test
  `moderately_complex_schema_compiles_within_budget` that pins schema
  compile + validate latency for a moderately deep nested schema at
  <500ms wall-clock. Guards against orders-of-magnitude regressions
  from a future `jsonschema` upgrade or accidentally pathological
  schema construction. Hard limits on schema complexity are deferred
  (the real defense today is keeping schema-bearing paths under
  `.system/`, which is system-controlled).

**Acknowledged, no change**

- libSQL `create_system_job` unbounded row growth — already documented
  as intentional in the existing comment block, with the mitigation path
  spelled out (partial index on `WHERE category != 'system'` for listing
  queries). Rate-limiting dispatch would silently drop user-initiated
  actions, which is worse than unbounded retention. The "LLM data is
  never deleted" rule (CLAUDE.md) explicitly applies.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@ironclaw-ci ironclaw-ci bot mentioned this pull request Apr 10, 2026
@ironclaw-ci ironclaw-ci bot mentioned this pull request Apr 18, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

contributor: core 20+ merged PRs risk: medium Business logic, config, or moderate-risk modules scope: agent Agent core (agent loop, router, scheduler) scope: config Configuration scope: db/libsql libSQL / Turso backend scope: db/postgres PostgreSQL backend scope: db Database trait / abstraction scope: docs Documentation scope: tool/builtin Built-in tools scope: workspace Persistent memory / workspace size: XL 500+ changed lines

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants