perf: fix multi-tenant inference latency (per-conversation locking + workspace indexing)#2127

Merged
henrypark133 merged 11 commits into staging from fix/multitenant-inference-latency-v2 on Apr 8, 2026
Conversation

henrypark133 (Collaborator) commented Apr 7, 2026

Problem

Multi-tenant conversational turns were taking ~20s E2E for messages that required only ~1.7s of actual LLM inference. Trace analysis revealed two bottlenecks in the bridge layer:

  • 6s pre-LLM gap: bridge::router blocked before launching the Python orchestrator
  • 7s post-execution gap: bridge blocked after thread execution finished before logging completed

Root cause: every engine state write triggered full FTS/vector document chunking via workspace.write(), flooding the shared DB connection pool (default: 10 connections) with chunk INSERTs. This cascaded into lock contention in ConversationManager where a global RwLock was held across slow DB writes — serializing all concurrent tenants.

Changes

Fix 1 — Skip workspace indexing for engine runtime state

Engine execution state files (engine/.runtime/, engine/projects/, engine/orchestrator/failures.json, engine/README.md) are machine-generated blobs, not semantic documents. Writes to them no longer trigger reindex_document_with_metadata(), version-row creation, or FTS/vector indexing. Because normalize_path() does not resolve .. segments, an explicit !path.contains("..") guard prevents path-traversal bypass of the classifier (e.g. engine/.runtime/../knowledge/foo.md).
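A minimal sketch of the classifier described above, assuming the four listed paths; the real implementation in src/workspace/mod.rs may differ in shape:

```rust
// Hypothetical reconstruction of the runtime-path classifier.
// normalize_path() does not resolve `..`, so traversal is rejected outright
// rather than relying on normalization.
fn is_engine_runtime_path(path: &str) -> bool {
    // e.g. "engine/.runtime/../knowledge/foo.md" must NOT be classified
    // as runtime state, or semantic content would silently skip indexing.
    if path.contains("..") {
        return false;
    }
    path.starts_with("engine/.runtime/")
        || path.starts_with("engine/projects/")
        || path == "engine/orchestrator/failures.json"
        || path == "engine/README.md"
}
```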

Fix 2 — Per-conversation mutex in ConversationManager

Replaces RwLock<HashMap<ConversationId, ConversationSurface>> with RwLock<HashMap<ConversationId, Arc<Mutex<ConversationSurface>>>>. The global RwLock is now a directory only (held for brief HashMap lookups). A single get_conversation_lock() helper encapsulates the two-step access pattern and enforces the lock ordering invariant in one place.

  • Different conversations now run fully in parallel
  • Closes the TOCTOU double-spawn race in handle_user_message
  • Fixes record_thread_outcome 7s post-execution gap
  • record_thread_outcome and clear_conversation now propagate Err when conversation not in memory (previously silently returned Ok)
  • bootstrap_user upserts channel_user_index even for already-present conversations (repairs stale index entries after rollback)
  • list_conversations pre-filters via channel_user_index before acquiring per-conv locks — O(matched) not O(all)
  • entries_snapshot clone deferred to None branch only (inject/resume paths skip the allocation)
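The two-step access pattern can be sketched as follows. This is an illustrative reconstruction using std::sync primitives so the example is self-contained; the PR uses tokio::sync::Mutex so the per-conversation lock can be held across await points, and the real ConversationSurface is richer than this stub:

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex, RwLock};

type ConversationId = u64;

#[derive(Default)]
struct ConversationSurface {
    entries: Vec<String>,
}

#[derive(Default)]
struct ConversationManager {
    // The global RwLock is a directory only: held just long enough to look
    // up (or insert) the per-conversation Arc, never across slow work.
    conversations: RwLock<HashMap<ConversationId, Arc<Mutex<ConversationSurface>>>>,
}

impl ConversationManager {
    // Single entry point enforcing the lock-ordering invariant: the global
    // RwLock is released before the per-conversation Mutex is acquired.
    fn get_conversation_lock(&self, id: ConversationId) -> Arc<Mutex<ConversationSurface>> {
        if let Some(conv) = self.conversations.read().unwrap().get(&id) {
            return Arc::clone(conv); // fast path: read lock only
        } // read guard dropped here
        let mut map = self.conversations.write().unwrap();
        Arc::clone(map.entry(id).or_default())
    } // write guard dropped here; caller locks the Mutex afterwards
}
```

Holding the per-conversation Mutex for a whole turn blocks only operations on that same conversation, which is what lets different tenants proceed in parallel.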

Fix 3 — DATABASE_POOL_SIZE default 10→30

Immediate pool headroom while structural fixes take effect. Configurable via env var. .env.example updated.
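The default-with-override shape can be sketched as below; the function names are illustrative and the actual parsing lives in src/config/database.rs, which may differ in detail:

```rust
// Pure helper so the fallback logic is testable without touching the
// process environment.
fn pool_size_from(raw: Option<&str>) -> u32 {
    raw.and_then(|v| v.parse().ok()).unwrap_or(30) // default raised from 10
}

// Reads the DATABASE_POOL_SIZE override described in the PR.
fn database_pool_size() -> u32 {
    pool_size_from(std::env::var("DATABASE_POOL_SIZE").ok().as_deref())
}
```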

Tests

  • concurrent_handle_user_message_spawns_one_thread — two concurrent messages to same conversation; asserts exactly one thread spawned (validates TOCTOU fix)
  • record_thread_outcome_unknown_conv_returns_err — validates error propagation fix
  • engine_projects_path_skips_indexing_but_knowledge_does_not — validates workspace fix coverage
  • runtime_path_writes_produce_no_versions — validates versioning suppression

267 tests passing, 0 failures, 0 clippy warnings
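The idea behind the double-spawn test can be modeled in miniature (illustrative names, OS threads standing in for the real async runtime): the active-thread check and the spawn happen under one continuously held per-conversation lock, so two racing messages spawn exactly one worker.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex};
use std::thread;

// Returns how many workers two racing messages end up spawning.
fn race_two_messages() -> usize {
    let spawned = Arc::new(AtomicUsize::new(0));
    let active = Arc::new(Mutex::new(false)); // per-conversation "active thread" flag
    let handles: Vec<_> = (0..2)
        .map(|_| {
            let (spawned, active) = (Arc::clone(&spawned), Arc::clone(&active));
            thread::spawn(move || {
                // Lock held from the check through the (simulated) spawn+track,
                // closing the check-then-act window.
                let mut is_active = active.lock().unwrap();
                if !*is_active {
                    spawned.fetch_add(1, Ordering::SeqCst);
                    *is_active = true;
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    spawned.load(Ordering::SeqCst)
}
```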

Known limitations (documented in code)

  • list_conversations is a best-effort snapshot, not point-in-time consistent (accepted trade-off of two-phase collect)
  • get_or_create_conversation has a narrow concurrent-creation race where a caller that races between insert and rollback may hold a transiently unpersisted ConversationId (documented with comment)
  • record_thread_outcome in-memory mutations are not rolled back if save_conversation fails (accepted, documented)
  • clear_engine_conversation in router.rs has a pre-existing TOCTOU where threads spawned concurrently may be cleared without being stopped (pre-existing, out of scope)

henrypark133 and others added 7 commits April 7, 2026 15:07
Engine state files under engine/.runtime/ are execution state blobs,
not semantic documents — they should never trigger FTS/vector chunking.
This eliminates the chunk INSERT burst that floods the DB connection
pool under multi-tenant load.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
ConversationManager::handle_user_message() was holding write locks on
conversations and channel_user_index maps while calling save_conversation(),
which triggers slow workspace writes. Under multi-tenant load this fully
serialized all concurrent tenants through this critical section.

Now only in-memory HashMap mutations happen under the write locks; the
DB persist call happens after locks are released.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
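The before/after shape this commit describes can be sketched as follows; the types are illustrative and save_conversation here is a synchronous stand-in for the real async workspace write:

```rust
use std::collections::HashMap;
use std::sync::RwLock;

struct Manager {
    conversations: RwLock<HashMap<u64, Vec<String>>>,
}

impl Manager {
    fn handle_user_message(&self, id: u64, msg: &str) -> Vec<String> {
        // Only the in-memory mutation happens under the write lock...
        let snapshot = {
            let mut convs = self.conversations.write().unwrap();
            let entries = convs.entry(id).or_default();
            entries.push(msg.to_string());
            entries.clone()
        }; // ...which is released here...
        // ...so the slow persist no longer serializes other tenants.
        self.save_conversation(id, &snapshot);
        snapshot
    }

    fn save_conversation(&self, _id: u64, _entries: &[String]) {
        // stand-in for the slow workspace write
    }
}
```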
At 10 concurrent tenants, the previous default of 10 connections is
insufficient — workspace chunk INSERTs alone can exhaust the pool,
causing persist failures and cold-loads on subsequent turns.

Configurable via DATABASE_POOL_SIZE env var.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replace the global RwLock<HashMap<ConversationId, ConversationSurface>> with
RwLock<HashMap<ConversationId, Arc<Mutex<ConversationSurface>>>>.

The global RwLock is now a directory only — held briefly for HashMap lookups
and inserts. The tokio::sync::Mutex is per-conversation — held for the full
duration of a message including async DB calls, but only blocks operations on
the same conversation.

A single get_conversation_lock() helper encapsulates the two-step access
pattern. All methods use this one entry point — the lock ordering invariant
(global RwLock released before per-conv Mutex acquired) is enforced in one place.

Fixes:
- Cross-tenant serialization: different conversations now run fully in parallel
- TOCTOU double-spawn race in handle_user_message (per-conv Mutex held
  continuously from active_thread check through spawn+track)
- record_thread_outcome 7s post-execution gap
- clear_conversation and get_or_create_conversation global lock contention

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
C1: record_thread_outcome + clear_conversation now propagate Err when
conversation is not in memory instead of silently returning Ok.

W1: user ConversationEntry added after thread operation succeeds, not
before — prevents orphaned entries on async failure paths.

W2: document lock acquisition order (conversations before
channel_user_index) to prevent future deadlocks.

W3: bootstrap_user skips conversations already in memory — prevents
concurrent bootstraps orphaning existing Arc holders.

W4: document known limitation in record_thread_outcome where in-memory
mutations are not rolled back on save_conversation failure.

W5: document list_conversations best-effort snapshot semantics.

Tests: concurrent handle_user_message test (T1) and record_thread_outcome
with unknown conversation_id test (T4).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
C2: is_engine_runtime_path() now covers engine/projects/ and
engine/orchestrator/failures.json — these machine-generated blobs were
still triggering FTS/vector chunking despite the runtime path guard.
Semantic content (engine/knowledge/, orchestrator scripts) still indexed.

C3: DocumentMetadata for runtime paths now sets skip_versioning: Some(true)
— prevents O(steps x tenants) version rows under multi-tenant load.

Tests: T2 verifies projects/ skip indexing + knowledge/ still indexed.
T3 verifies runtime path writes produce zero version rows.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Reflects the new default in src/config/database.rs. Operators copying
.env.example as a starting point would otherwise explicitly pin to the
old value of 10.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings April 7, 2026 22:51
github-actions bot added labels on Apr 7, 2026: scope: workspace (Persistent memory / workspace), size: XL (500+ changed lines), risk: low (Changes to docs, tests, or low-risk modules), contributor: core (20+ merged PRs)
gemini-code-assist bot (Contributor) left a comment
Code Review

This pull request introduces a granular locking strategy for the ConversationManager by wrapping individual conversations in Arc<Mutex<...>>, enabling parallel processing of messages across different conversations. It also optimizes database performance by increasing the default connection pool size and excluding machine-generated runtime state files from indexing and versioning. A performance concern was raised regarding list_conversations, which currently iterates over all conversations to filter by user ID; utilizing the channel_user_index for pre-filtering is suggested to reduce lock contention and improve efficiency in multi-tenant environments.

Comment thread crates/ironclaw_engine/src/runtime/conversation.rs
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Copilot AI (Contributor) left a comment
Pull request overview

Reduces multi-tenant inference latency by removing expensive workspace indexing from engine runtime-state writes and by eliminating cross-tenant serialization in the in-memory conversation router via per-conversation locking.

Changes:

  • Skip FTS/vector indexing and versioning for machine-generated engine runtime-state paths on Workspace::write().
  • Replace a global conversations RwLock with per-conversation Mutexes in ConversationManager to allow parallel turns across conversations and eliminate double-spawn races.
  • Increase the default DATABASE_POOL_SIZE (10 → 30) and update .env.example.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

File Description
src/workspace/mod.rs Adds engine runtime path classification and short-circuits write-time metadata/versioning/indexing for those paths; adds tests for indexing/version suppression.
crates/ironclaw_engine/src/runtime/conversation.rs Introduces per-conversation mutex locking and refactors message routing/outcome recording; adds concurrency-focused tests.
src/config/database.rs Raises default DB pool size to reduce contention under load.
.env.example Documents the new default pool size and deployment guidance.


Comment thread src/workspace/mod.rs
Comment thread crates/ironclaw_engine/src/runtime/conversation.rs
Comment thread crates/ironclaw_engine/src/runtime/conversation.rs Outdated
- list_conversations: use channel_user_index to pre-filter by user_id
  before acquiring per-conv locks — O(matched) instead of O(all)
  (gemini-code-assist suggestion)

- is_engine_runtime_path: add !path.contains('..') guard to prevent
  path traversal bypass (e.g. engine/.runtime/../knowledge/foo.md)
  (Copilot)

- bootstrap_user: return count of newly inserted conversations rather
  than total from store — accurate after idempotency guard (Copilot)

- handle_user_message: defer entries clone to None branch only —
  inject/resume fast paths no longer pay the O(entries) allocation
  (Copilot)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
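The list_conversations pre-filter can be sketched with illustrative types (the real channel_user_index maps users to their conversation IDs; only matched conversations are then touched):

```rust
use std::collections::{HashMap, HashSet};

// O(matched), not O(all): consult the index first, then inspect only the
// user's conversations instead of iterating every conversation in memory.
fn list_conversations_for_user(
    channel_user_index: &HashMap<String, HashSet<u64>>,
    conversations: &HashMap<u64, String>,
    user_id: &str,
) -> Vec<u64> {
    let Some(ids) = channel_user_index.get(user_id) else {
        return Vec::new(); // no index entry: no per-conversation locks taken at all
    };
    let mut out: Vec<u64> = ids
        .iter()
        .filter(|id| conversations.contains_key(id)) // best-effort snapshot
        .copied()
        .collect();
    out.sort_unstable();
    out
}
```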
henrypark133 (Collaborator, Author) commented

Thanks for the review @gemini-code-assist! The list_conversations suggestion was a good catch — applied your exact suggestion in 60608d8. It now pre-filters via channel_user_index before acquiring any per-conversation locks, reducing lock scope from O(all conversations) to O(user's conversations). Thread resolved.

gemini-code-assist (Contributor) replied

That sounds great, @henrypark133. Reducing the lock scope to O(user's conversations) is a significant improvement for multi-tenant performance. The implementation details look solid, and the added tests should help prevent regressions in this area. Thanks for the update!

- bootstrap_user: upsert channel_user_index even when conversation is
  already in convs — index may be stale after a prior rollback in
  get_or_create_conversation

- is_engine_runtime_path: add engine/README.md (auto-generated,
  regenerated at engine-turn frequency, should not accumulate versions)

- is_engine_runtime_path: clarify normalize_path() does not resolve '..'
  so the guard is load-bearing

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings April 7, 2026 23:27
Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.



Comment thread src/workspace/mod.rs
Before this PR, engine runtime paths were indexed normally and accumulated
chunks in memory_chunks. The early-exit guard prevents new chunks but
left pre-existing ones polluting FTS/vector search results and consuming
storage indefinitely.

Now delete_chunks() is called on every runtime path write — a no-op once
the document has no chunks, a one-time cleanup on the first write after
deploying this change. Fail-open so chunk deletion can't block state writes.

Raised by Copilot in PR review.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
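The fail-open behavior this commit describes can be sketched as follows; delete_chunks is passed in as a closure purely for illustration, and the real call site in Workspace::write() differs:

```rust
// A failed chunk cleanup must never block the runtime-state write itself:
// log and continue rather than propagate the error.
fn write_runtime_state(
    delete_chunks: impl Fn(&str) -> Result<(), String>,
    path: &str,
) -> Result<(), String> {
    // Best-effort: a no-op once the document has no chunks, a one-time
    // cleanup on the first write after deploying the change.
    if let Err(e) = delete_chunks(path) {
        eprintln!("chunk cleanup failed for {path}: {e} (continuing)");
    }
    Ok(()) // the state write proceeds regardless
}
```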
