perf: fix multi-tenant inference latency (per-conversation locking + workspace indexing)#2127

Merged
henrypark133 merged 11 commits into staging from fix/multitenant-inference-latency-v2 on Apr 8, 2026
Conversation

henrypark133 (Collaborator) commented Apr 7, 2026

Problem

Multi-tenant conversational turns were taking ~20s E2E for messages that required only ~1.7s of actual LLM inference. Trace analysis revealed two bottlenecks in the bridge layer:

  • 6s pre-LLM gap: bridge::router blocked before launching the Python orchestrator
  • 7s post-execution gap: bridge blocked after thread execution finished before logging completed

Root cause: every engine state write triggered full FTS/vector document chunking via workspace.write(), flooding the shared DB connection pool (default: 10 connections) with chunk INSERTs. This cascaded into lock contention in ConversationManager where a global RwLock was held across slow DB writes — serializing all concurrent tenants.

Changes

Fix 1 — Skip workspace indexing for engine runtime state

Engine execution state files (engine/.runtime/, engine/projects/, engine/orchestrator/failures.json, engine/README.md) are machine-generated blobs, not semantic documents. Writes to them no longer trigger reindex_document_with_metadata(), version-row creation, or FTS/vector indexing. Because normalize_path() does not resolve .. segments, an explicit !path.contains("..") guard prevents path-traversal bypass of the classifier (e.g. engine/.runtime/../knowledge/foo.md).
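A minimal sketch of the classifier described above, assuming the four listed paths; the real implementation in src/workspace/mod.rs may differ in shape:

```rust
// Hypothetical reconstruction of the runtime-path classifier.
// normalize_path() does not resolve `..`, so traversal is rejected outright
// rather than relying on normalization.
fn is_engine_runtime_path(path: &str) -> bool {
    // e.g. "engine/.runtime/../knowledge/foo.md" must NOT be classified
    // as runtime state, or semantic content would silently skip indexing.
    if path.contains("..") {
        return false;
    }
    path.starts_with("engine/.runtime/")
        || path.starts_with("engine/projects/")
        || path == "engine/orchestrator/failures.json"
        || path == "engine/README.md"
}
```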

Fix 2 — Per-conversation mutex in ConversationManager

Replaces RwLock<HashMap<ConversationId, ConversationSurface>> with RwLock<HashMap<ConversationId, Arc<Mutex<ConversationSurface>>>>. The global RwLock is now a directory only (held for brief HashMap lookups). A single get_conversation_lock() helper encapsulates the two-step access pattern and enforces the lock ordering invariant in one place.

  • Different conversations now run fully in parallel
  • Closes the TOCTOU double-spawn race in handle_user_message
  • Fixes record_thread_outcome 7s post-execution gap
  • record_thread_outcome and clear_conversation now propagate Err when conversation not in memory (previously silently returned Ok)
  • bootstrap_user upserts channel_user_index even for already-present conversations (repairs stale index entries after rollback)
  • list_conversations pre-filters via channel_user_index before acquiring per-conv locks — O(matched) not O(all)
  • entries_snapshot clone deferred to None branch only (inject/resume paths skip the allocation)
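The two-step access pattern can be sketched as follows. This is an illustrative reconstruction using std::sync primitives so the example is self-contained; the PR uses tokio::sync::Mutex so the per-conversation lock can be held across await points, and the real ConversationSurface is richer than this stub:

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex, RwLock};

type ConversationId = u64;

#[derive(Default)]
struct ConversationSurface {
    entries: Vec<String>,
}

#[derive(Default)]
struct ConversationManager {
    // The global RwLock is a directory only: held just long enough to look
    // up (or insert) the per-conversation Arc, never across slow work.
    conversations: RwLock<HashMap<ConversationId, Arc<Mutex<ConversationSurface>>>>,
}

impl ConversationManager {
    // Single entry point enforcing the lock-ordering invariant: the global
    // RwLock is released before the per-conversation Mutex is acquired.
    fn get_conversation_lock(&self, id: ConversationId) -> Arc<Mutex<ConversationSurface>> {
        if let Some(conv) = self.conversations.read().unwrap().get(&id) {
            return Arc::clone(conv); // fast path: read lock only
        } // read guard dropped here
        let mut map = self.conversations.write().unwrap();
        Arc::clone(map.entry(id).or_default())
    } // write guard dropped here; caller locks the Mutex afterwards
}
```

Holding the per-conversation Mutex for a whole turn blocks only operations on that same conversation, which is what lets different tenants proceed in parallel.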

Fix 3 — DATABASE_POOL_SIZE default 10→30

Immediate pool headroom while structural fixes take effect. Configurable via env var. .env.example updated.
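The default-with-override shape can be sketched as below; the function names are illustrative and the actual parsing lives in src/config/database.rs, which may differ in detail:

```rust
// Pure helper so the fallback logic is testable without touching the
// process environment.
fn pool_size_from(raw: Option<&str>) -> u32 {
    raw.and_then(|v| v.parse().ok()).unwrap_or(30) // default raised from 10
}

// Reads the DATABASE_POOL_SIZE override described in the PR.
fn database_pool_size() -> u32 {
    pool_size_from(std::env::var("DATABASE_POOL_SIZE").ok().as_deref())
}
```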

Tests

  • concurrent_handle_user_message_spawns_one_thread — two concurrent messages to same conversation; asserts exactly one thread spawned (validates TOCTOU fix)
  • record_thread_outcome_unknown_conv_returns_err — validates error propagation fix
  • engine_projects_path_skips_indexing_but_knowledge_does_not — validates workspace fix coverage
  • runtime_path_writes_produce_no_versions — validates versioning suppression

267 tests passing, 0 failures, 0 clippy warnings
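The idea behind the double-spawn test can be modeled in miniature (illustrative names, OS threads standing in for the real async runtime): the active-thread check and the spawn happen under one continuously held per-conversation lock, so two racing messages spawn exactly one worker.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex};
use std::thread;

// Returns how many workers two racing messages end up spawning.
fn race_two_messages() -> usize {
    let spawned = Arc::new(AtomicUsize::new(0));
    let active = Arc::new(Mutex::new(false)); // per-conversation "active thread" flag
    let handles: Vec<_> = (0..2)
        .map(|_| {
            let (spawned, active) = (Arc::clone(&spawned), Arc::clone(&active));
            thread::spawn(move || {
                // Lock held from the check through the (simulated) spawn+track,
                // closing the check-then-act window.
                let mut is_active = active.lock().unwrap();
                if !*is_active {
                    spawned.fetch_add(1, Ordering::SeqCst);
                    *is_active = true;
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    spawned.load(Ordering::SeqCst)
}
```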

Known limitations (documented in code)

  • list_conversations is a best-effort snapshot, not point-in-time consistent (accepted trade-off of two-phase collect)
  • get_or_create_conversation has a narrow concurrent-creation race where a caller that races between insert and rollback may hold a transiently unpersisted ConversationId (documented with comment)
  • record_thread_outcome in-memory mutations are not rolled back if save_conversation fails (accepted, documented)
  • clear_engine_conversation in router.rs has a pre-existing TOCTOU where threads spawned concurrently may be cleared without being stopped (pre-existing, out of scope)

henrypark133 and others added 7 commits April 7, 2026 15:07
Engine state files under engine/.runtime/ are execution state blobs,
not semantic documents — they should never trigger FTS/vector chunking.
This eliminates the chunk INSERT burst that floods the DB connection
pool under multi-tenant load.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
ConversationManager::handle_user_message() was holding write locks on
conversations and channel_user_index maps while calling save_conversation(),
which triggers slow workspace writes. Under multi-tenant load this fully
serialized all concurrent tenants through this critical section.

Now only in-memory HashMap mutations happen under the write locks; the
DB persist call happens after locks are released.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
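The before/after shape this commit describes can be sketched as follows; the types are illustrative and save_conversation here is a synchronous stand-in for the real async workspace write:

```rust
use std::collections::HashMap;
use std::sync::RwLock;

struct Manager {
    conversations: RwLock<HashMap<u64, Vec<String>>>,
}

impl Manager {
    fn handle_user_message(&self, id: u64, msg: &str) -> Vec<String> {
        // Only the in-memory mutation happens under the write lock...
        let snapshot = {
            let mut convs = self.conversations.write().unwrap();
            let entries = convs.entry(id).or_default();
            entries.push(msg.to_string());
            entries.clone()
        }; // ...which is released here...
        // ...so the slow persist no longer serializes other tenants.
        self.save_conversation(id, &snapshot);
        snapshot
    }

    fn save_conversation(&self, _id: u64, _entries: &[String]) {
        // stand-in for the slow workspace write
    }
}
```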
At 10 concurrent tenants, the previous default of 10 connections is
insufficient — workspace chunk INSERTs alone can exhaust the pool,
causing persist failures and cold-loads on subsequent turns.

Configurable via DATABASE_POOL_SIZE env var.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replace the global RwLock<HashMap<ConversationId, ConversationSurface>> with
RwLock<HashMap<ConversationId, Arc<Mutex<ConversationSurface>>>>.

The global RwLock is now a directory only — held briefly for HashMap lookups
and inserts. The tokio::sync::Mutex is per-conversation — held for the full
duration of a message including async DB calls, but only blocks operations on
the same conversation.

A single get_conversation_lock() helper encapsulates the two-step access
pattern. All methods use this one entry point — the lock ordering invariant
(global RwLock released before per-conv Mutex acquired) is enforced in one place.

Fixes:
- Cross-tenant serialization: different conversations now run fully in parallel
- TOCTOU double-spawn race in handle_user_message (per-conv Mutex held
  continuously from active_thread check through spawn+track)
- record_thread_outcome 7s post-execution gap
- clear_conversation and get_or_create_conversation global lock contention

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
C1: record_thread_outcome + clear_conversation now propagate Err when
conversation is not in memory instead of silently returning Ok.

W1: user ConversationEntry added after thread operation succeeds, not
before — prevents orphaned entries on async failure paths.

W2: document lock acquisition order (conversations before
channel_user_index) to prevent future deadlocks.

W3: bootstrap_user skips conversations already in memory — prevents
concurrent bootstraps orphaning existing Arc holders.

W4: document known limitation in record_thread_outcome where in-memory
mutations are not rolled back on save_conversation failure.

W5: document list_conversations best-effort snapshot semantics.

Tests: concurrent handle_user_message test (T1) and record_thread_outcome
with unknown conversation_id test (T4).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
C2: is_engine_runtime_path() now covers engine/projects/ and
engine/orchestrator/failures.json — these machine-generated blobs were
still triggering FTS/vector chunking despite the runtime path guard.
Semantic content (engine/knowledge/, orchestrator scripts) still indexed.

C3: DocumentMetadata for runtime paths now sets skip_versioning: Some(true)
— prevents O(steps x tenants) version rows under multi-tenant load.

Tests: T2 verifies projects/ skip indexing + knowledge/ still indexed.
T3 verifies runtime path writes produce zero version rows.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Reflects the new default in src/config/database.rs. Operators copying
.env.example as a starting point would otherwise explicitly pin to the
old value of 10.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings April 7, 2026 22:51
github-actions bot added labels on Apr 7, 2026: scope: workspace (Persistent memory / workspace), size: XL (500+ changed lines), risk: low (Changes to docs, tests, or low-risk modules), contributor: core (20+ merged PRs)
gemini-code-assist bot (Contributor) left a comment
Code Review

This pull request introduces a granular locking strategy for the ConversationManager by wrapping individual conversations in Arc<Mutex<...>>, enabling parallel processing of messages across different conversations. It also optimizes database performance by increasing the default connection pool size and excluding machine-generated runtime state files from indexing and versioning. A performance concern was raised regarding list_conversations, which currently iterates over all conversations to filter by user ID; utilizing the channel_user_index for pre-filtering is suggested to reduce lock contention and improve efficiency in multi-tenant environments.

Comment thread crates/ironclaw_engine/src/runtime/conversation.rs
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Copilot AI (Contributor) left a comment
Pull request overview

Reduces multi-tenant inference latency by removing expensive workspace indexing from engine runtime-state writes and by eliminating cross-tenant serialization in the in-memory conversation router via per-conversation locking.

Changes:

  • Skip FTS/vector indexing and versioning for machine-generated engine runtime-state paths on Workspace::write().
  • Replace a global conversations RwLock with per-conversation Mutexes in ConversationManager to allow parallel turns across conversations and eliminate double-spawn races.
  • Increase the default DATABASE_POOL_SIZE (10 → 30) and update .env.example.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

File Description
src/workspace/mod.rs Adds engine runtime path classification and short-circuits write-time metadata/versioning/indexing for those paths; adds tests for indexing/version suppression.
crates/ironclaw_engine/src/runtime/conversation.rs Introduces per-conversation mutex locking and refactors message routing/outcome recording; adds concurrency-focused tests.
src/config/database.rs Raises default DB pool size to reduce contention under load.
.env.example Documents the new default pool size and deployment guidance.


Comment thread src/workspace/mod.rs
Comment thread crates/ironclaw_engine/src/runtime/conversation.rs
Comment thread crates/ironclaw_engine/src/runtime/conversation.rs Outdated
- list_conversations: use channel_user_index to pre-filter by user_id
  before acquiring per-conv locks — O(matched) instead of O(all)
  (gemini-code-assist suggestion)

- is_engine_runtime_path: add !path.contains('..') guard to prevent
  path traversal bypass (e.g. engine/.runtime/../knowledge/foo.md)
  (Copilot)

- bootstrap_user: return count of newly inserted conversations rather
  than total from store — accurate after idempotency guard (Copilot)

- handle_user_message: defer entries clone to None branch only —
  inject/resume fast paths no longer pay the O(entries) allocation
  (Copilot)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
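The list_conversations pre-filter can be sketched with illustrative types (the real channel_user_index maps users to their conversation IDs; only matched conversations are then touched):

```rust
use std::collections::{HashMap, HashSet};

// O(matched), not O(all): consult the index first, then inspect only the
// user's conversations instead of iterating every conversation in memory.
fn list_conversations_for_user(
    channel_user_index: &HashMap<String, HashSet<u64>>,
    conversations: &HashMap<u64, String>,
    user_id: &str,
) -> Vec<u64> {
    let Some(ids) = channel_user_index.get(user_id) else {
        return Vec::new(); // no index entry: no per-conversation locks taken at all
    };
    let mut out: Vec<u64> = ids
        .iter()
        .filter(|id| conversations.contains_key(id)) // best-effort snapshot
        .copied()
        .collect();
    out.sort_unstable();
    out
}
```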
henrypark133 (Collaborator, Author) commented

Thanks for the review @gemini-code-assist! The list_conversations suggestion was a good catch — applied your exact suggestion in 60608d8. It now pre-filters via channel_user_index before acquiring any per-conversation locks, reducing lock scope from O(all conversations) to O(user's conversations). Thread resolved.

gemini-code-assist (Contributor) replied

That sounds great, @henrypark133. Reducing the lock scope to O(user's conversations) is a significant improvement for multi-tenant performance. The implementation details look solid, and the added tests should help prevent regressions in this area. Thanks for the update!

- bootstrap_user: upsert channel_user_index even when conversation is
  already in convs — index may be stale after a prior rollback in
  get_or_create_conversation

- is_engine_runtime_path: add engine/README.md (auto-generated,
  regenerated at engine-turn frequency, should not accumulate versions)

- is_engine_runtime_path: clarify normalize_path() does not resolve '..'
  so the guard is load-bearing

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings April 7, 2026 23:27
Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.



Comment thread src/workspace/mod.rs
Before this PR, engine runtime paths were indexed normally and accumulated
chunks in memory_chunks. The early-exit guard prevents new chunks but
left pre-existing ones polluting FTS/vector search results and consuming
storage indefinitely.

Now delete_chunks() is called on every runtime path write — a no-op once
the document has no chunks, a one-time cleanup on the first write after
deploying this change. Fail-open so chunk deletion can't block state writes.

Raised by Copilot in PR review.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
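The fail-open behavior this commit describes can be sketched as follows; delete_chunks is passed in as a closure purely for illustration, and the real call site in Workspace::write() differs:

```rust
// A failed chunk cleanup must never block the runtime-state write itself:
// log and continue rather than propagate the error.
fn write_runtime_state(
    delete_chunks: impl Fn(&str) -> Result<(), String>,
    path: &str,
) -> Result<(), String> {
    // Best-effort: a no-op once the document has no chunks, a one-time
    // cleanup on the first write after deploying the change.
    if let Err(e) = delete_chunks(path) {
        eprintln!("chunk cleanup failed for {path}: {e} (continuing)");
    }
    Ok(()) // the state write proceeds regardless
}
```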
