perf: fix multi-tenant inference latency (per-conversation locking + workspace indexing) #2127
Conversation
Engine state files under engine/.runtime/ are execution state blobs, not semantic documents — they should never trigger FTS/vector chunking. This eliminates the chunk INSERT burst that floods the DB connection pool under multi-tenant load. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
ConversationManager::handle_user_message() was holding write locks on conversations and channel_user_index maps while calling save_conversation(), which triggers slow workspace writes. Under multi-tenant load this fully serialized all concurrent tenants through this critical section. Now only in-memory HashMap mutations happen under the write locks; the DB persist call happens after locks are released. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
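The restructuring described here can be sketched in plain std Rust (the real code uses tokio and the project's own types; `Manager`, the entry layout, and `save_conversation`'s signature below are hypothetical stand-ins): mutate the maps under the write lock, copy out what the persist needs, and only call the slow save after the guard is dropped.

```rust
use std::collections::HashMap;
use std::sync::RwLock;

// Hypothetical stand-in for the slow workspace/DB persist.
fn save_conversation(_id: u64, _entries: &[String]) {
    // slow I/O would happen here
}

struct Manager {
    conversations: RwLock<HashMap<u64, Vec<String>>>,
}

impl Manager {
    fn handle_user_message(&self, id: u64, msg: &str) {
        // Phase 1: in-memory mutation only, under the write lock.
        let snapshot = {
            let mut map = self.conversations.write().unwrap();
            let entries = map.entry(id).or_default();
            entries.push(msg.to_string());
            entries.clone() // copy out what the persist needs
        }; // write guard dropped here, before any slow work
        // Phase 2: slow persist runs without blocking other tenants.
        save_conversation(id, &snapshot);
    }
}

fn main() {
    let m = Manager { conversations: RwLock::new(HashMap::new()) };
    m.handle_user_message(1, "hi");
    assert_eq!(m.conversations.read().unwrap()[&1], vec!["hi".to_string()]);
    println!("persisted outside the lock");
}
```

The inner block scopes the guard deliberately: cloning the entries costs an allocation, but it bounds the critical section to pure in-memory work.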
At 10 concurrent tenants, the previous default of 10 connections is insufficient — workspace chunk INSERTs alone can exhaust the pool, causing persist failures and cold-loads on subsequent turns. Configurable via DATABASE_POOL_SIZE env var. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replace the global RwLock<HashMap<ConversationId, ConversationSurface>> with RwLock<HashMap<ConversationId, Arc<Mutex<ConversationSurface>>>>. The global RwLock is now a directory only — held briefly for HashMap lookups and inserts. The tokio::sync::Mutex is per-conversation — held for the full duration of a message including async DB calls, but only blocks operations on the same conversation. A single get_conversation_lock() helper encapsulates the two-step access pattern. All methods use this one entry point — the lock ordering invariant (global RwLock released before per-conv Mutex acquired) is enforced in one place. Fixes: - Cross-tenant serialization: different conversations now run fully in parallel - TOCTOU double-spawn race in handle_user_message (per-conv Mutex held continuously from active_thread check through spawn+track) - record_thread_outcome 7s post-execution gap - clear_conversation and get_or_create_conversation global lock contention Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
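The two-step access pattern can be sketched std-only as below (the PR uses `tokio::sync::Mutex` so the per-conversation guard can be held across await points; `ConversationSurface`'s field here is invented for illustration):

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex, RwLock};

#[derive(Default)]
struct ConversationSurface {
    entries: Vec<String>, // invented field, for illustration only
}

struct ConversationManager {
    // Global RwLock is a directory only: held briefly for lookups/inserts.
    conversations: RwLock<HashMap<String, Arc<Mutex<ConversationSurface>>>>,
}

impl ConversationManager {
    fn new() -> Self {
        Self { conversations: RwLock::new(HashMap::new()) }
    }

    // Single entry point for the two-step pattern. The global read/write
    // guard is always dropped before the caller can lock the per-conversation
    // Mutex, so the ordering invariant is enforced in one place.
    fn get_conversation_lock(&self, id: &str) -> Arc<Mutex<ConversationSurface>> {
        if let Some(conv) = self.conversations.read().unwrap().get(id) {
            return Arc::clone(conv);
        } // read guard dropped here
        let mut map = self.conversations.write().unwrap();
        Arc::clone(map.entry(id.to_string()).or_default())
    }
}

fn main() {
    let mgr = ConversationManager::new();
    let conv = mgr.get_conversation_lock("tenant-a");
    conv.lock().unwrap().entries.push("hello".into());
    // A second lookup hands back the same underlying conversation.
    let again = mgr.get_conversation_lock("tenant-a");
    assert_eq!(again.lock().unwrap().entries.len(), 1);
    println!("same surface reused");
}
```

Because callers only ever receive the `Arc`, the global map guard can never be held at the same time as a per-conversation guard.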
C1: record_thread_outcome + clear_conversation now propagate Err when conversation is not in memory instead of silently returning Ok. W1: user ConversationEntry added after thread operation succeeds, not before — prevents orphaned entries on async failure paths. W2: document lock acquisition order (conversations before channel_user_index) to prevent future deadlocks. W3: bootstrap_user skips conversations already in memory — prevents concurrent bootstraps orphaning existing Arc holders. W4: document known limitation in record_thread_outcome where in-memory mutations are not rolled back on save_conversation failure. W5: document list_conversations best-effort snapshot semantics. Tests: concurrent handle_user_message test (T1) and record_thread_outcome with unknown conversation_id test (T4). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
C2: is_engine_runtime_path() now covers engine/projects/ and engine/orchestrator/failures.json — these machine-generated blobs were still triggering FTS/vector chunking despite the runtime path guard. Semantic content (engine/knowledge/, orchestrator scripts) still indexed. C3: DocumentMetadata for runtime paths now sets skip_versioning: Some(true) — prevents O(steps x tenants) version rows under multi-tenant load. Tests: T2 verifies projects/ skip indexing + knowledge/ still indexed. T3 verifies runtime path writes produce zero version rows. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Reflects the new default in src/config/database.rs. Operators copying .env.example as a starting point would otherwise explicitly pin to the old value of 10. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
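The env-override-with-default pattern this describes might look like the following sketch (the constant and function names are hypothetical; the real field lives in src/config/database.rs):

```rust
// Hypothetical sketch of the DATABASE_POOL_SIZE override.
const DEFAULT_POOL_SIZE: u32 = 30;

fn pool_size_from(raw: Option<&str>) -> u32 {
    raw.and_then(|v| v.parse().ok()).unwrap_or(DEFAULT_POOL_SIZE)
}

fn database_pool_size() -> u32 {
    pool_size_from(std::env::var("DATABASE_POOL_SIZE").ok().as_deref())
}

fn main() {
    assert_eq!(pool_size_from(None), 30);        // new default
    assert_eq!(pool_size_from(Some("50")), 50);  // env override wins
    assert_eq!(pool_size_from(Some("ten")), 30); // unparsable falls back
    println!("pool size = {}", database_pool_size());
}
```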
Code Review
This pull request introduces a granular locking strategy for the ConversationManager by wrapping individual conversations in Arc<Mutex<...>>, enabling parallel processing of messages across different conversations. It also optimizes database performance by increasing the default connection pool size and excluding machine-generated runtime state files from indexing and versioning. A performance concern was raised regarding list_conversations, which currently iterates over all conversations to filter by user ID; utilizing the channel_user_index for pre-filtering is suggested to reduce lock contention and improve efficiency in multi-tenant environments.
Pull request overview
Reduces multi-tenant inference latency by removing expensive workspace indexing from engine runtime-state writes and by eliminating cross-tenant serialization in the in-memory conversation router via per-conversation locking.
Changes:
- Skip FTS/vector indexing and versioning for machine-generated engine runtime-state paths on `Workspace::write()`.
- Replace a global conversations `RwLock` with per-conversation `Mutex`es in `ConversationManager` to allow parallel turns across conversations and eliminate double-spawn races.
- Increase the default `DATABASE_POOL_SIZE` (10 → 30) and update `.env.example`.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| `src/workspace/mod.rs` | Adds engine runtime path classification and short-circuits write-time metadata/versioning/indexing for those paths; adds tests for indexing/version suppression. |
| `crates/ironclaw_engine/src/runtime/conversation.rs` | Introduces per-conversation mutex locking and refactors message routing/outcome recording; adds concurrency-focused tests. |
| `src/config/database.rs` | Raises default DB pool size to reduce contention under load. |
| `.env.example` | Documents the new default pool size and deployment guidance. |
- list_conversations: use channel_user_index to pre-filter by user_id
before acquiring per-conv locks — O(matched) instead of O(all)
(gemini-code-assist suggestion)
- is_engine_runtime_path: add !path.contains('..') guard to prevent
path traversal bypass (e.g. engine/.runtime/../knowledge/foo.md)
(Copilot)
- bootstrap_user: return count of newly inserted conversations rather
than total from store — accurate after idempotency guard (Copilot)
- handle_user_message: defer entries clone to None branch only —
inject/resume fast paths no longer pay the O(entries) allocation
(Copilot)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
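The `list_conversations` pre-filter from the first bullet can be sketched as follows (types are simplified stand-ins for the real `ConversationManager` state; the real code then takes per-conversation locks only for the matched ids):

```rust
use std::collections::HashMap;

// O(matched) pre-filter: consult channel_user_index first, then touch
// only that user's conversations instead of scanning the whole map.
fn list_conversations_for_user(
    channel_user_index: &HashMap<String, Vec<u64>>, // user_id -> conversation ids
    conversations: &HashMap<u64, String>,           // id -> conversation (stand-in)
    user_id: &str,
) -> Vec<String> {
    channel_user_index
        .get(user_id)
        .map(|ids| {
            ids.iter()
                .filter_map(|id| conversations.get(id).cloned())
                .collect()
        })
        .unwrap_or_default()
}

fn main() {
    let mut index = HashMap::new();
    index.insert("alice".to_string(), vec![1, 2]);
    let mut convs = HashMap::new();
    convs.insert(1, "conv-1".to_string());
    convs.insert(2, "conv-2".to_string());
    convs.insert(3, "conv-3".to_string()); // another user's; never touched
    let mut got = list_conversations_for_user(&index, &convs, "alice");
    got.sort();
    assert_eq!(got, vec!["conv-1".to_string(), "conv-2".to_string()]);
    assert!(list_conversations_for_user(&index, &convs, "bob").is_empty());
    println!("{:?}", got);
}
```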
Thanks for the review @gemini-code-assist! The

That sounds great, @henrypark133. Reducing the lock scope to O(user's conversations) is a significant improvement for multi-tenant performance. The implementation details look solid, and the added tests should help prevent regressions in this area. Thanks for the update!
- bootstrap_user: upsert channel_user_index even when conversation is already in convs — index may be stale after a prior rollback in get_or_create_conversation
- is_engine_runtime_path: add engine/README.md (auto-generated, regenerated at engine-turn frequency, should not accumulate versions)
- is_engine_runtime_path: clarify normalize_path() does not resolve '..' so the guard is load-bearing

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
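Putting the path rules together, the classifier as of this commit might look like the sketch below (prefixes taken from this PR's description; the real implementation in src/workspace/mod.rs may differ in shape):

```rust
// Sketch of the runtime-path classifier with the traversal guard.
fn is_engine_runtime_path(path: &str) -> bool {
    // normalize_path() does not resolve `..`, so this guard is load-bearing:
    // "engine/.runtime/../knowledge/foo.md" must not bypass indexing.
    if path.contains("..") {
        return false;
    }
    path.starts_with("engine/.runtime/")
        || path.starts_with("engine/projects/")
        || path == "engine/orchestrator/failures.json"
        || path == "engine/README.md"
}

fn main() {
    assert!(is_engine_runtime_path("engine/.runtime/state.json"));
    assert!(is_engine_runtime_path("engine/projects/p1/plan.json"));
    assert!(is_engine_runtime_path("engine/README.md"));
    assert!(!is_engine_runtime_path("engine/.runtime/../knowledge/foo.md"));
    assert!(!is_engine_runtime_path("engine/knowledge/notes.md")); // still indexed
    println!("classifier ok");
}
```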
Pull request overview
Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.
Before this PR, engine runtime paths were indexed normally and accumulated chunks in memory_chunks. The early-exit guard prevents new chunks but left pre-existing ones polluting FTS/vector search results and consuming storage indefinitely. Now delete_chunks() is called on every runtime path write — a no-op once the document has no chunks, a one-time cleanup on the first write after deploying this change. Fail-open so chunk deletion can't block state writes. Raised by Copilot in PR review. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
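The fail-open shape described here, sketched with a hypothetical `delete_chunks` stand-in for the real store call:

```rust
// Fail-open cleanup: a failed chunk DELETE is logged and swallowed so it
// can never block the state write itself.
fn delete_chunks(doc_path: &str) -> Result<usize, String> {
    if doc_path.is_empty() {
        Err("store unavailable".to_string())
    } else {
        Ok(0) // no-op once the document has no chunks
    }
}

fn write_runtime_state(path: &str, _bytes: &[u8]) -> Result<(), String> {
    // Best-effort: one-time cleanup on the first write after deploy,
    // a no-op thereafter; errors never propagate.
    if let Err(e) = delete_chunks(path) {
        eprintln!("chunk cleanup failed for {path}: {e} (continuing)");
    }
    // ... the actual state write would happen here ...
    Ok(())
}

fn main() {
    assert!(write_runtime_state("engine/.runtime/state.json", b"{}").is_ok());
    assert!(write_runtime_state("", b"{}").is_ok()); // cleanup failed, write still ok
    println!("fail-open verified");
}
```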
Problem
Multi-tenant conversational turns were taking ~20s E2E for messages that required only ~1.7s of actual LLM inference. Trace analysis revealed two bottlenecks in the bridge layer:
- `bridge::router` blocked before launching the Python orchestrator
- thread execution finished before logging `completed`

Root cause: every engine state write triggered full FTS/vector document chunking via `workspace.write()`, flooding the shared DB connection pool (default: 10 connections) with chunk INSERTs. This cascaded into lock contention in `ConversationManager` where a global `RwLock` was held across slow DB writes — serializing all concurrent tenants.

Changes
Fix 1 — Skip workspace indexing for engine runtime state
Engine execution state files (`engine/.runtime/`, `engine/projects/`, `engine/orchestrator/failures.json`, `engine/README.md`) are machine-generated blobs, not semantic documents. They no longer trigger `reindex_document_with_metadata()`, version row creation, or FTS/vector indexing on write. `normalize_path()` does not resolve `..` segments — an explicit `!path.contains("..")` guard prevents path traversal bypass.

Fix 2 — Per-conversation mutex in `ConversationManager`

Replaces
`RwLock<HashMap<ConversationId, ConversationSurface>>` with `RwLock<HashMap<ConversationId, Arc<Mutex<ConversationSurface>>>>`. The global `RwLock` is now a directory only (held for brief HashMap lookups). A single `get_conversation_lock()` helper encapsulates the two-step access pattern and enforces the lock ordering invariant in one place.

- `handle_user_message`: TOCTOU double-spawn race eliminated
- `record_thread_outcome`: 7s post-execution gap eliminated
- `record_thread_outcome` and `clear_conversation` now propagate `Err` when the conversation is not in memory (previously silently returned `Ok`)
- `bootstrap_user` upserts `channel_user_index` even for already-present conversations (repairs stale index entries after rollback)
- `list_conversations` pre-filters via `channel_user_index` before acquiring per-conv locks — O(matched) not O(all)
- `entries_snapshot` clone deferred to the `None` branch only (inject/resume paths skip the allocation)

Fix 3 —
`DATABASE_POOL_SIZE` default 10 → 30

Immediate pool headroom while structural fixes take effect. Configurable via env var; `.env.example` updated.

Tests
- `concurrent_handle_user_message_spawns_one_thread` — two concurrent messages to the same conversation; asserts exactly one thread spawned (validates TOCTOU fix)
- `record_thread_outcome_unknown_conv_returns_err` — validates error propagation fix
- `engine_projects_path_skips_indexing_but_knowledge_does_not` — validates workspace fix coverage
- `runtime_path_writes_produce_no_versions` — validates versioning suppression

267 tests passing, 0 failures, 0 clippy warnings
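The scenario behind `concurrent_handle_user_message_spawns_one_thread` boils down to holding one continuous per-conversation lock from the active-thread check through spawn+track. A std-thread sketch, with an invented field name standing in for the real surface state:

```rust
use std::sync::{Arc, Mutex};
use std::thread;

#[derive(Default)]
struct ConversationSurface {
    active_thread: Option<u64>, // invented field name, for illustration
}

// The check and the "spawn + track" happen under one continuous
// per-conversation lock, so concurrent messages to the same conversation
// cannot both observe "no active thread".
fn handle_user_message(conv: &Arc<Mutex<ConversationSurface>>, thread_id: u64) -> bool {
    let mut guard = conv.lock().unwrap(); // held from check through track
    if guard.active_thread.is_some() {
        return false; // another message already spawned the thread
    }
    guard.active_thread = Some(thread_id);
    true
}

fn main() {
    let conv = Arc::new(Mutex::new(ConversationSurface::default()));
    let handles: Vec<_> = (0..8)
        .map(|i| {
            let c = Arc::clone(&conv);
            thread::spawn(move || handle_user_message(&c, i))
        })
        .collect();
    let spawned = handles
        .into_iter()
        .map(|h| h.join().unwrap())
        .filter(|&ok| ok)
        .count();
    assert_eq!(spawned, 1);
    println!("exactly one spawn");
}
```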
Known limitations (documented in code)
- `list_conversations` is a best-effort snapshot, not point-in-time consistent (accepted trade-off of two-phase collect)
- `get_or_create_conversation` has a narrow concurrent-creation race where a caller that races between insert and rollback may hold a transiently unpersisted `ConversationId` (documented with comment)
- `record_thread_outcome` in-memory mutations are not rolled back if `save_conversation` fails (accepted, documented)
- `clear_engine_conversation` in `router.rs` has a pre-existing TOCTOU where threads spawned concurrently may be cleared without being stopped (pre-existing, out of scope)