Skip to content

fix: prevent message loss when new message arrives as idle timeout fires#802

Open
dptts wants to merge 3 commits intoqwibitai:mainfrom
dptts:fix/message-loss-race
Open

fix: prevent message loss when new message arrives as idle timeout fires#802
dptts wants to merge 3 commits intoqwibitai:mainfrom
dptts:fix/message-loss-race

Conversation

@dptts
Copy link
Copy Markdown

@dptts dptts commented Mar 7, 2026

Type of Change

  • Skill - adds a new skill in .claude/skills/
  • Fix - bug fix or security fix to source code
  • Simplification - reduces or simplifies source code

Description

I added added support for agents to respond to webhooks and found a race condition. GroupQueue.sendMessage() could silently drop incoming messages.

Sequence:

  1. Container is active processing messages (state.active = true)
  2. Agent finishes its turn and enters idle-wait — the orchestrator's idle timer fires and calls closeStdin(), which writes a _close sentinel to the IPC directory and signals the container to wind down
  3. A new user message arrives before the container process has actually exited — state.active is still true
  4. sendMessage() passes its guard (!state.active || !state.groupFolder || !state.isTaskContainer) and writes the new message's IPC file into the input directory
  5. The container reads _close, exits — the IPC file is never processed. Message silently lost.

Fix

Add a closing: boolean flag to GroupState. closeStdin() sets it; notifyContainerStart() clears it (reset for the next run). The sendMessage() guard gains || state.closing, making it return false during the window between idle-timeout and container exit. The caller then treats this as a queued pending message, which is processed in the next container run.

Evidence

The regression test in this PR (src/group-queue.test.ts) was written to fail on unfixed code and pass on the fix.

For Skills

  • I have not made any changes to source code
  • My skill contains instructions for Claude to follow (not pre-built code)
  • I tested this skill on a fresh clone

dptts and others added 2 commits March 7, 2026 19:27
Demonstrates the message-loss race fixed in the parent commit:
after closeStdin() fires (idle timer), sendMessage() must return
false even while state.active is still true.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Copy link
Copy Markdown

@apatella apatella left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fix logic looks reasonable — the race between an incoming message and the idle timeout firing is a real edge case worth handling.

Two files need Prettier formatting before CI will pass:

  • src/group-queue.ts
  • src/group-queue.test.ts

Quick fix:

npx prettier --write src/group-queue.ts src/group-queue.test.ts

Once formatted, CI should be green. Nice catch on this bug! 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

PR: Fix Bug fix Status: Needs Review Ready for maintainer review

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants