fix: prevent message loss when container times out after pipe #694

Closed
neocode24 wants to merge 1 commit into nanocoai:main from neocode24:fix/pipe-message-loss

@neocode24
Contributor

Summary

  • Separate lastPipedTimestamp from lastAgentTimestamp: piping follow-up messages to an active container no longer advances the authoritative cursor. Only processGroupMessages does. If the container times out before processing piped messages, they are re-queued on the next cycle.
  • Set pendingMessages = true on pipe: ensures the drain loop re-checks for unprocessed messages after container exit.
  • Add onPipeCallback to reset idle timer: piped messages now reset the container idle timer, preventing premature timeout immediately after receiving input.
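
The idle-timer bullet can be pictured with a minimal sketch. This is not the PR's code: `IdleTimer`, `ContainerPipe`, and `setOnPipeCallback` are hypothetical names chosen for illustration, and time is passed in explicitly instead of using real timers so the behaviour is easy to follow.

```typescript
// Hypothetical sketch: an idle timer that a pipe callback can reset.
class IdleTimer {
  private readonly timeoutMs: number;
  private lastActivity: number;

  constructor(timeoutMs: number, now: number) {
    this.timeoutMs = timeoutMs;
    this.lastActivity = now;
  }

  // Record activity, pushing the idle deadline forward.
  reset(now: number): void {
    this.lastActivity = now;
  }

  // True once timeoutMs has elapsed since the last recorded activity.
  isExpired(now: number): boolean {
    return now - this.lastActivity >= this.timeoutMs;
  }
}

// Hypothetical stand-in for the IPC pipe to an active container.
class ContainerPipe {
  private onPipeCallback: (() => void) | null = null;
  readonly delivered: string[] = [];

  // Register a callback to run every time a message is piped.
  setOnPipeCallback(cb: () => void): void {
    this.onPipeCallback = cb;
  }

  // Deliver a message, then notify the listener (e.g. to reset the idle timer).
  pipe(message: string): void {
    this.delivered.push(message);
    this.onPipeCallback?.();
  }
}
```

Wiring the two together with `pipe.setOnPipeCallback(() => timer.reset(Date.now()))` would mean each piped message buys the container a fresh idle window.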

Problem

When a user sends follow-up messages while the agent is processing, the message loop pipes them to the active container via IPC. Previously, lastAgentTimestamp was advanced immediately on pipe. If the container then timed out (e.g., due to network instability), those piped messages were permanently lost — the cursor had already moved past them.
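
To make the failure mode concrete, here is a small sketch of the old behaviour as described above. It assumes numeric timestamps and an in-memory message list; `ChatMessage`, `messagesSince`, and `pipeAndAdvance` are hypothetical helpers, not functions from the repository.

```typescript
// Hypothetical message shape; the real project may store timestamps differently.
interface ChatMessage {
  timestamp: number;
  text: string;
}

// Return messages newer than the cursor, mimicking a "since last cursor" query.
function messagesSince(all: ChatMessage[], cursor: number): ChatMessage[] {
  return all.filter((m) => m.timestamp > cursor);
}

// Old behaviour (as described above): piping advances the authoritative cursor.
function pipeAndAdvance(all: ChatMessage[], cursor: number): number {
  const pending = messagesSince(all, cursor);
  return pending.length > 0 ? pending[pending.length - 1].timestamp : cursor;
}

const inbox: ChatMessage[] = [
  { timestamp: 1, text: "initial request" },
  { timestamp: 2, text: "follow-up A" },
  { timestamp: 3, text: "follow-up B" },
];

// The agent is already working on message 1 when the follow-ups arrive.
let lastAgentTimestamp = 1;
lastAgentTimestamp = pipeAndAdvance(inbox, lastAgentTimestamp);

// Suppose the container times out here without handling the piped messages.
// The next cycle asks for anything newer than the cursor and finds nothing.
const recoverable = messagesSince(inbox, lastAgentTimestamp);
```

Because the cursor already sits at the last follow-up, `recoverable` is empty and nothing remains to re-queue.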

Test plan

  • npx tsc --noEmit passes
  • npx vitest run — no regressions
  • Send follow-up messages while agent is processing, then force-kill the container → messages should be re-processed on next cycle

🤖 Generated with Claude Code

When follow-up messages are piped to an active container via IPC, the
message loop was advancing lastAgentTimestamp immediately. If the
container then timed out before processing the piped messages, those
messages were permanently skipped — the cursor had already moved past
them.

Three changes fix this:

1. Separate lastPipedTimestamp from lastAgentTimestamp: piping only
   advances a temporary cursor to prevent re-piping the same messages.
   Only processGroupMessages advances the authoritative cursor. On
   container exit, lastPipedTimestamp is cleared so the messages get
   re-processed.

2. Set pendingMessages=true on pipe: ensures the drain loop re-checks
   for unprocessed messages after the container exits.

3. Add onPipeCallback to reset the idle timer: when messages are piped,
   the container needs time to process them. Without resetting the idle
   timer, the container could time out immediately after receiving input.
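
Changes 1 and 2 above can be sketched roughly as follows. This is an illustration under assumptions, not the code in this commit: `GroupCursor`, `QueuedMessage`, and `onContainerExit` are hypothetical, timestamps are assumed to be numbers, and only the field names `lastAgentTimestamp`, `lastPipedTimestamp`, and `pendingMessages` and the function name `processGroupMessages` come from the description.

```typescript
// Hypothetical message shape with numeric timestamps.
interface QueuedMessage {
  timestamp: number;
  text: string;
}

class GroupCursor {
  lastAgentTimestamp = 0; // authoritative cursor
  lastPipedTimestamp: number | null = null; // temporary cursor, only guards against re-piping
  pendingMessages = false;

  // Pipe messages not yet piped or processed; the authoritative cursor stays put.
  pipe(all: QueuedMessage[]): QueuedMessage[] {
    const from = Math.max(this.lastAgentTimestamp, this.lastPipedTimestamp ?? 0);
    const batch = all.filter((m) => m.timestamp > from);
    if (batch.length > 0) {
      this.lastPipedTimestamp = batch[batch.length - 1].timestamp;
      this.pendingMessages = true; // ask the drain loop to re-check after exit
    }
    return batch;
  }

  // On container exit, forget what was piped so unprocessed messages are seen again.
  onContainerExit(): void {
    this.lastPipedTimestamp = null;
  }

  // Only this path advances the authoritative cursor.
  processGroupMessages(all: QueuedMessage[]): QueuedMessage[] {
    const batch = all.filter((m) => m.timestamp > this.lastAgentTimestamp);
    if (batch.length > 0) {
      this.lastAgentTimestamp = batch[batch.length - 1].timestamp;
    }
    this.pendingMessages = false;
    return batch;
  }
}
```

With this split, a container that dies after a pipe leaves `lastAgentTimestamp` behind the piped messages, so the next `processGroupMessages` call picks them up again.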

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@TomGranot
Collaborator

Message loss on container timeout is a real issue — glad someone is looking at this. @gavrielc would be good to review the approach here.

@Andy-NanoClaw-AI added the labels "PR: Fix" (Bug fix) and "Status: Needs Review" (Ready for maintainer review) on Mar 5, 2026
klapom added a commit to klapom/nanoclaw that referenced this pull request Mar 5, 2026
…res)

Bug fixes applied:
- nanocoai#636: task-scheduler recalculates next_run before enqueue
- nanocoai#655: LIMIT 200 on message queries to prevent OOM
- nanocoai#670: rateLimitResetAt field in ContainerOutput interface
- nanocoai#694: ANTHROPIC_MODEL passthrough to container env
- nanocoai#700: session rotation at 5MB JSONL threshold
- nanocoai#701: session retry on corrupted resume (clear + retry)
- nanocoai#708: update_task MCP tool in ipc-mcp-stdio
- nanocoai#719: outputChain .catch() to prevent group hang
- nanocoai#729: fix send_message description (remove incorrect scheduled-task note)
- nanocoai#735: datePrefix() injects current date/time into all agent prompts
- nanocoai#738: ANTHROPIC_MODEL from .env passed to agent container
- nanocoai#746: systemd OnFailure restart prevention logic (container hardening)
- nanocoai#751: DM-with-bot JID normalization
- nanocoai#754: setOnPipeCallback to reset idle timer on piped messages
- nanocoai#756: cursorBeforePipe rollback on container crash

Features added:
- nanocoai#723: streaming infrastructure (STREAM_TEXT markers, onStreamDelta)
- nanocoai#742: container hardening (entrypoint.sh privilege drop, env sanitize)
- nanocoai#680: add-cli skill (CLI send binary)
- nanocoai#727: add-memory skill extracted to .claude/skills/add-memory/
- nanocoai#744: add-s3-storage skill extracted to .claude/skills/add-s3-storage/

Test fixes:
- Mock fs.promises in container-runner.test.ts to prevent real I/O
- Add ANTHROPIC_MODEL to config mock
- Fix cpSync expectation: { recursive: true, force: true }
- Fix isActive() to use state.active instead of state.process
- Fix container-runtime error message: Docker → Container runtime

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@Andy-NanoClaw-AI added the labels "Status: Blocked" (Blocked by merge conflicts or dependencies) and "Status: Needs Review" (Ready for maintainer review), and removed the labels "Status: Needs Review" and "Status: Blocked", on Mar 14, 2026
@neocode24
Contributor Author

Closing for now — upstream structure has changed significantly since this PR. The fix is maintained in our fork. Will revisit if the issue resurfaces upstream.

Labels

"PR: Fix" (Bug fix), "Status: Blocked" (Blocked by merge conflicts or dependencies)

3 participants