Skip to content

fix: prevent cross-session nested SQLite transactions during bootstrap#261

Merged
jalehman merged 4 commits intoMartian-Engineering:mainfrom
100yenadmin:fix/txn-race-bootstrap
Apr 5, 2026
Merged

fix: prevent cross-session nested SQLite transactions during bootstrap#261
jalehman merged 4 commits intoMartian-Engineering:mainfrom
100yenadmin:fix/txn-race-bootstrap

Conversation

@100yenadmin
Copy link
Copy Markdown
Contributor

Summary

  • add a per-DB async transaction mutex for all explicit transaction entry points
  • prevent cross-session nested-transaction SQLite failures on a shared DatabaseSync handle
  • add regression coverage for cross-store / cross-session contention

Root cause

Lossless-Claw shares one SQLite DatabaseSync connection across sessions, but transaction serialization was only per session. Some transaction scopes span awaited async work (especially bootstrap), so a second session could issue BEGIN on the same connection while the first transaction was still open, triggering:

cannot start a transaction within a transaction

What changed

  • added acquireTransactionLock(db) in src/transaction-mutex.ts
  • wrapped all explicit transaction entry points with the per-DB mutex:
    • ConversationStore.withTransaction()
    • SummaryStore.replaceContextRangeWithSummary()
    • applyScopedDoctorRepair()
  • added test/transaction-mutex.test.ts

Testing

  • npx vitest run test/transaction-mutex.test.ts
  • npx vitest run test/summary-store.test.ts test/migration.test.ts test/fts5-sanitize.test.ts

Notes / caveats

  • This is a containment hotfix for Fix cross-session SQLite transaction collisions during bootstrap/restart #260, not a full transaction-architecture refactor.
  • Adversarial review found the mutex is non-reentrant: nested same-DB transaction entry would hang instead of throwing. I did not find an in-tree current call path that triggers that for the patched entry points, so I’m shipping this as the narrow hotfix and treating reentrancy hardening as follow-up work.

…ed-transaction failures

Fixes Martian-Engineering#260

Root cause: Multiple async sessions share one synchronous DatabaseSync handle.
SQLite's transaction state is per-connection, so concurrent async code paths
that both issue BEGIN while the other is mid-transaction (awaiting async work)
cause 'cannot start a transaction within a transaction' errors.

Fix: Introduce acquireTransactionLock() — a per-database async mutex using
a WeakMap<DatabaseSync, promise-chain>. Applied to all three explicit
transaction entry points:

- ConversationStore.withTransaction() — BEGIN IMMEDIATE
- SummaryStore.replaceContextRangeWithSummary() — BEGIN
- lcm-doctor-apply.ts applyScopedDoctorRepair() — BEGIN IMMEDIATE

The mutex serializes transaction acquisition per DB instance while allowing
different databases to proceed independently.

Includes regression tests covering:
- Concurrent withTransaction from multiple sessions on one DB
- Concurrent replaceContextRangeWithSummary calls
- Cross-store (ConversationStore + SummaryStore) concurrent transactions
- Error propagation without mutex deadlock
- 10-session stress test
- Independent database isolation
Copilot AI review requested due to automatic review settings April 4, 2026 00:42
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Introduces a per-database async mutex to serialize explicit SQLite transaction entry points across sessions, preventing cannot start a transaction within a transaction failures when multiple sessions share one DatabaseSync connection during bootstrap/other awaited transaction scopes.

Changes:

  • Added acquireTransactionLock(db) (per-DatabaseSync mutex via WeakMap) to serialize transaction entry.
  • Wrapped explicit transaction entry points (ConversationStore.withTransaction, SummaryStore.replaceContextRangeWithSummary, applyScopedDoctorRepair) with the mutex.
  • Added regression tests covering same-DB serialization, cross-store contention, and basic concurrency stress.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
src/transaction-mutex.ts Adds per-DB async mutex used to serialize transaction entry across sessions.
src/store/conversation-store.ts Wraps withTransaction() with the per-DB mutex to prevent cross-session nested BEGIN IMMEDIATE.
src/store/summary-store.ts Wraps replaceContextRangeWithSummary() with the per-DB mutex to prevent nested BEGIN.
src/plugin/lcm-doctor-apply.ts Wraps applyScopedDoctorRepair() transaction with the per-DB mutex.
test/transaction-mutex.test.ts Adds regression coverage for concurrent transaction entry across sessions/stores.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@100yenadmin
Copy link
Copy Markdown
Contributor Author

100yenadmin commented Apr 4, 2026

TLDR: last patch fix created permanent lockout loop potential for any agent using LCM. For users experiencing the bug, the fix is to switch to legacy and agent will become responsive again. Bug can be recreated via fast restart, /new or /reset. Extremely difficult to debug ie code looked right but timing issues were unforeseeable until tested in production.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated no new comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@100yenadmin
Copy link
Copy Markdown
Contributor Author

@jalehman discovered this bug last night after rollout. critical bug that locks user main agent up + subagents (if they are not isolated) and leaves them unresponsive without any log/errors.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated no new comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@jalehman jalehman merged commit 65c76f1 into Martian-Engineering:main Apr 5, 2026
1 check passed
@github-actions github-actions bot mentioned this pull request Apr 5, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants