Skip to content

fix: atomic claim prevents scheduled tasks from executing twice#657

Merged
gavrielc merged 9 commits intonanocoai:mainfrom
gabi-simons:fix/scheduler-race-condition-final
Mar 4, 2026
Merged

fix: atomic claim prevents scheduled tasks from executing twice#657
gavrielc merged 9 commits intonanocoai:mainfrom
gabi-simons:fix/scheduler-race-condition-final

Conversation

@gabi-simons
Copy link
Copy Markdown
Collaborator

@gabi-simons gabi-simons commented Mar 2, 2026

Summary

Fixes the scheduler race condition where tasks execute twice because enqueueTask() only deduplicates against pendingTasks — a running task has already been shifted out, so the next poll re-queues it.

Closes #138, #211, #300, #578, #669

The fix

3-line fix in group-queue.ts: Add runningTaskId to GroupState. enqueueTask() now checks both the pending array and the currently-running task ID, rejecting duplicates at the queue layer where the state lives.

Drift fix in task-scheduler.ts: computeNextRun() anchors interval tasks to scheduled_time + N*interval instead of Date.now() + interval, preventing cumulative timing drift. Includes a guard against zero/negative interval values.

Why not "atomic claim" (previous approach in this PR)

The first commits implemented claimDueTasks() — a SQLite transaction that advanced next_run before execution. Per review, this had real downsides:

  • Once-task crash risk: process dies after claim → task orphaned forever
  • No retry on failure: next_run already skipped ahead
  • Misleading transaction: Node.js is single-threaded, better-sqlite3 is synchronous — no interleaving possible

The dedup belongs in GroupQueue, not the DB layer.

Test plan

  • 334/334 tests pass (npx vitest run)
  • New regression test: rejects duplicate enqueue of a currently-running task
  • Drift test: interval anchored to scheduled time, not wall clock
  • Deploy to staging and verify no duplicate task runs over 24h

Credits

Drift logic from PR #601 by @taslim. Issues reported by @baijunjie (#138), @Michaelliv (#300), and @kk17 (#669).

🤖 Generated with Claude Code

…coai#138)

Replace the two-phase getDueTasks() + deferred updateTaskAfterRun() with
an atomic SQLite transaction (claimDueTasks) that advances next_run
BEFORE dispatching tasks to the queue. This eliminates the race window
where subsequent scheduler polls re-discover in-progress tasks.

Key changes:
- claimDueTasks(): SELECT + UPDATE in a single db.transaction(), so no
  poll can read stale next_run values. Once-tasks get next_run=NULL;
  recurring tasks get next_run advanced to the future.
- computeNextRun(): anchors interval tasks to the scheduled time (not
  Date.now()) to prevent cumulative drift. Includes a while-loop to
  skip missed intervals and a guard against invalid interval values.
- updateTaskAfterRun(): simplified to only record last_run/last_result
  since next_run is already handled by the claim.

Closes nanocoai#138, nanocoai#211, nanocoai#300, nanocoai#578

Co-authored-by: @taslim (PR nanocoai#601)
Co-authored-by: @baijunjie (Issue nanocoai#138)
Co-authored-by: @Michaelliv (Issue nanocoai#300)
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
@gabi-simons gabi-simons requested a review from gavrielc as a code owner March 2, 2026 20:43
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@gavrielc
Copy link
Copy Markdown
Collaborator

gavrielc commented Mar 2, 2026

The drift fix (anchoring to scheduled_time + N*interval) is a solid improvement — worth keeping regardless of approach.

But I think the dedup is being solved at the wrong layer. The GroupQueue already deduplicates pending tasks by ID (group-queue.ts:94), but that check misses currently-running tasks (they've been shifted out of pendingTasks). So the next poll re-discovers the task and re-queues it. That's the actual bug.

Advancing next_run before execution works around it, but has real downsides:

  • Once-task crash risk: process dies after claim (next_run = NULL) but before execution → task is orphaned forever (active status, but no poll finds it).
  • No retry on failure: execution fails, but next_run already skipped ahead.
  • Transaction framing is misleading: Node.js is single-threaded, better-sqlite3 is synchronous — no interleaving is possible. The transaction provides no protection over sequential statements.

The fix is ~3 lines in group-queue.ts — track the running task ID in GroupState and include it in the dedup check:

// Add to GroupState: runningTaskId: string | null
// In enqueueTask: if (state.isTaskContainer && state.runningTaskId === taskId) return;
// Set/clear in runTask()

This keeps next_run accurate (updated post-execution), preserves crash recovery, and avoids coupling scheduler logic into the DB layer. The computeNextRun drift fix can be applied in updateTaskAfterRun where it belongs.

gabi-simons and others added 2 commits March 3, 2026 19:52
…on (nanocoai#138)

Previous commits implemented an "atomic claim" approach (claimDueTasks)
that advanced next_run before execution. Per Gavriel's review, this
solved the symptom at the wrong layer and introduced crash-recovery
risks for once-tasks.

This commit reverts claimDueTasks and instead fixes the actual bug:
GroupQueue.enqueueTask() only checked pendingTasks for duplicates, but
running tasks had already been shifted out. Adding runningTaskId to
GroupState closes that gap with a 3-line fix at the correct layer.

The computeNextRun() drift fix is retained, applied post-execution
where it belongs.

Closes nanocoai#138, nanocoai#211, nanocoai#300, nanocoai#578

Co-authored-by: @taslim (PR nanocoai#601)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
xicv pushed a commit to xicv/nanoclaw that referenced this pull request Mar 3, 2026
- nanocoai#657: computeNextRun() prevents interval drift on scheduled tasks
- nanocoai#655: auto-retry with fresh session when resume fails (breaks infinite loops)
- nanocoai#636: update next_run before enqueue to prevent duplicate task runs
- nanocoai#651: reduce container stop timeout to 1s for faster restarts
- nanocoai#622: add jq to container Dockerfile for safe JSON parsing
- nanocoai#627: inject date/time context into all agent prompts
@Koshkoshinsk
Copy link
Copy Markdown
Collaborator

E2E Test Results

@gabi-simons — Full end-to-end testing completed. All checks pass.

Summary

# Check Expected Result
1 Phase 2 (main): duplicate execution 2+ runs 2 runs (10:21:33 & 10:21:48) — bug reproduced
2 Phase 3 (PR): "Task already running, skipping" Duplicate rejected 3 duplicates rejected
3 Phase 3 (PR): single execution in DB 1 row 1 row (success, 49793ms)
4 Phase 4: npm test 0 failures 351 tests passed (30 files, 0 failures)

Test Details

  • Bug reproduction (main): Reduced poll interval to 15s, created an interval task with sleep 25. On main, the task was enqueued twice within 15 seconds — confirmed duplicate execution.
  • Fix verification (PR branch): Same setup. Logs showed Task already running, skipping on 3 consecutive polls while the container was active. DB confirmed exactly 1 execution.
  • Unit tests: All 351 tests pass including the 4 new tests (computeNextRun anchoring, once-task null return, missed interval skip, duplicate enqueue rejection).
  • Edge case review: Verified that different tasks for the same group are not blocked, runningTaskId clears properly in finally on both success and failure paths, and messages queue correctly while tasks run.

No regressions found.

@gabi-simons gabi-simons requested a review from Koshkoshinsk March 4, 2026 10:40
gabi-simons and others added 5 commits March 4, 2026 13:36
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Taslim <9999802+taslim@users.noreply.github.com>
Co-Authored-By: BaiJunjie <7956480+baijunjie@users.noreply.github.com>
Co-Authored-By: Michael <13676242+Michaelliv@users.noreply.github.com>
Co-Authored-By: Kyle Zhike Chen <3477852+kk17@users.noreply.github.com>
@gavrielc gavrielc merged commit f794185 into nanocoai:main Mar 4, 2026
1 check passed
@gavrielc
Copy link
Copy Markdown
Collaborator

gavrielc commented Mar 4, 2026

Great fix. Thanks!!!

garettmd added a commit to garettmd/nanoclaw that referenced this pull request Mar 4, 2026
Merge upstream/main which includes:
- fix: scheduler race condition preventing duplicate task execution (nanocoai#657)
- refactor: multi-channel architecture with self-registering channels (nanocoai#500)
- feat: local whisper skill (nanocoai#702)
- fix: WhatsApp error handling (nanocoai#695)

Adapted our Telegram channel to use the new channel registry pattern
(registerChannel + factory). Removed direct channel instantiation from
index.ts in favor of the upstream barrel import approach.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
doggybee pushed a commit to doggybee/nanoclaw-upstream that referenced this pull request Mar 5, 2026
…coai#657)

* fix: atomic claim prevents scheduled tasks from executing twice (nanocoai#138)

Replace the two-phase getDueTasks() + deferred updateTaskAfterRun() with
an atomic SQLite transaction (claimDueTasks) that advances next_run
BEFORE dispatching tasks to the queue. This eliminates the race window
where subsequent scheduler polls re-discover in-progress tasks.

Key changes:
- claimDueTasks(): SELECT + UPDATE in a single db.transaction(), so no
  poll can read stale next_run values. Once-tasks get next_run=NULL;
  recurring tasks get next_run advanced to the future.
- computeNextRun(): anchors interval tasks to the scheduled time (not
  Date.now()) to prevent cumulative drift. Includes a while-loop to
  skip missed intervals and a guard against invalid interval values.
- updateTaskAfterRun(): simplified to only record last_run/last_result
  since next_run is already handled by the claim.

Closes nanocoai#138, nanocoai#211, nanocoai#300, nanocoai#578

Co-authored-by: @taslim (PR nanocoai#601)
Co-authored-by: @baijunjie (Issue nanocoai#138)
Co-authored-by: @Michaelliv (Issue nanocoai#300)
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* style: apply prettier formatting

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: track running task ID in GroupQueue to prevent duplicate execution (nanocoai#138)

Previous commits implemented an "atomic claim" approach (claimDueTasks)
that advanced next_run before execution. Per Gavriel's review, this
solved the symptom at the wrong layer and introduced crash-recovery
risks for once-tasks.

This commit reverts claimDueTasks and instead fixes the actual bug:
GroupQueue.enqueueTask() only checked pendingTasks for duplicates, but
running tasks had already been shifted out. Adding runningTaskId to
GroupState closes that gap with a 3-line fix at the correct layer.

The computeNextRun() drift fix is retained, applied post-execution
where it belongs.

Closes nanocoai#138, nanocoai#211, nanocoai#300, nanocoai#578

Co-authored-by: @taslim (PR nanocoai#601)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs: add changelog entry for scheduler duplicate fix

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs: add contributors for scheduler race condition fix

Co-Authored-By: Taslim <9999802+taslim@users.noreply.github.com>
Co-Authored-By: BaiJunjie <7956480+baijunjie@users.noreply.github.com>
Co-Authored-By: Michael <13676242+Michaelliv@users.noreply.github.com>
Co-Authored-By: Kyle Zhike Chen <3477852+kk17@users.noreply.github.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: gavrielc <gabicohen22@yahoo.com>
Co-authored-by: Taslim <9999802+taslim@users.noreply.github.com>
Co-authored-by: BaiJunjie <7956480+baijunjie@users.noreply.github.com>
Co-authored-by: Michael <13676242+Michaelliv@users.noreply.github.com>
Co-authored-by: Kyle Zhike Chen <3477852+kk17@users.noreply.github.com>
jenskock pushed a commit to jenskock/nanoclaw that referenced this pull request Mar 6, 2026
…coai#657)

* fix: atomic claim prevents scheduled tasks from executing twice (nanocoai#138)

Replace the two-phase getDueTasks() + deferred updateTaskAfterRun() with
an atomic SQLite transaction (claimDueTasks) that advances next_run
BEFORE dispatching tasks to the queue. This eliminates the race window
where subsequent scheduler polls re-discover in-progress tasks.

Key changes:
- claimDueTasks(): SELECT + UPDATE in a single db.transaction(), so no
  poll can read stale next_run values. Once-tasks get next_run=NULL;
  recurring tasks get next_run advanced to the future.
- computeNextRun(): anchors interval tasks to the scheduled time (not
  Date.now()) to prevent cumulative drift. Includes a while-loop to
  skip missed intervals and a guard against invalid interval values.
- updateTaskAfterRun(): simplified to only record last_run/last_result
  since next_run is already handled by the claim.

Closes nanocoai#138, nanocoai#211, nanocoai#300, nanocoai#578

Co-authored-by: @taslim (PR nanocoai#601)
Co-authored-by: @baijunjie (Issue nanocoai#138)
Co-authored-by: @Michaelliv (Issue nanocoai#300)
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* style: apply prettier formatting

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: track running task ID in GroupQueue to prevent duplicate execution (nanocoai#138)

Previous commits implemented an "atomic claim" approach (claimDueTasks)
that advanced next_run before execution. Per Gavriel's review, this
solved the symptom at the wrong layer and introduced crash-recovery
risks for once-tasks.

This commit reverts claimDueTasks and instead fixes the actual bug:
GroupQueue.enqueueTask() only checked pendingTasks for duplicates, but
running tasks had already been shifted out. Adding runningTaskId to
GroupState closes that gap with a 3-line fix at the correct layer.

The computeNextRun() drift fix is retained, applied post-execution
where it belongs.

Closes nanocoai#138, nanocoai#211, nanocoai#300, nanocoai#578

Co-authored-by: @taslim (PR nanocoai#601)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs: add changelog entry for scheduler duplicate fix

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs: add contributors for scheduler race condition fix

Co-Authored-By: Taslim <9999802+taslim@users.noreply.github.com>
Co-Authored-By: BaiJunjie <7956480+baijunjie@users.noreply.github.com>
Co-Authored-By: Michael <13676242+Michaelliv@users.noreply.github.com>
Co-Authored-By: Kyle Zhike Chen <3477852+kk17@users.noreply.github.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: gavrielc <gabicohen22@yahoo.com>
Co-authored-by: Taslim <9999802+taslim@users.noreply.github.com>
Co-authored-by: BaiJunjie <7956480+baijunjie@users.noreply.github.com>
Co-authored-by: Michael <13676242+Michaelliv@users.noreply.github.com>
Co-authored-by: Kyle Zhike Chen <3477852+kk17@users.noreply.github.com>
terrylica pushed a commit to terrylica/nanoclaw that referenced this pull request Mar 8, 2026
# [1.3.0](v1.2.0...v1.3.0) (2026-03-08)

### Bug Fixes

* add-voice-transcription skill drops WhatsApp registerChannel call ([nanocoai#766](https://github.com/terrylica/nanoclaw/issues/766)) ([47ad2e6](47ad2e6))
* aggressive false positive prevention — 5-layer MiniMax pipeline, devil's advocate round, FP learning ([8bfa372](8bfa372))
* atomic claim prevents scheduled tasks from executing twice ([nanocoai#657](https://github.com/terrylica/nanoclaw/issues/657)) ([f794185](f794185)), closes [nanocoai#138](https://github.com/terrylica/nanoclaw/issues/138) [nanocoai#138](https://github.com/terrylica/nanoclaw/issues/138) [nanocoai#211](https://github.com/terrylica/nanoclaw/issues/211) [nanocoai#300](https://github.com/terrylica/nanoclaw/issues/300) [nanocoai#578](https://github.com/terrylica/nanoclaw/issues/578) [nanocoai#601](https://github.com/terrylica/nanoclaw/issues/601) [nanocoai#138](https://github.com/terrylica/nanoclaw/issues/138) [nanocoai#300](https://github.com/terrylica/nanoclaw/issues/300) [nanocoai#138](https://github.com/terrylica/nanoclaw/issues/138)
* cc-skills now reads label strategy + content types; Claude JSON parsing hardened ([fd7fc7f](fd7fc7f))
* correct misleading send_message tool description for scheduled tasks ([nanocoai#729](https://github.com/terrylica/nanoclaw/issues/729)) ([ec0e42b](ec0e42b))
* **db:** add LIMIT to unbounded message history queries ([nanocoai#692](https://github.com/terrylica/nanoclaw/issues/692)) ([nanocoai#735](https://github.com/terrylica/nanoclaw/issues/735)) ([74b02c8](74b02c8))
* format src/index.ts to pass CI prettier check ([nanocoai#711](https://github.com/terrylica/nanoclaw/issues/711)) ([df2bac6](df2bac6)), closes [nanocoai#710](https://github.com/terrylica/nanoclaw/issues/710)
* grant write permissions to CLAUDE.md maintenance claude -p call ([9ddb433](9ddb433))
* rename _chatJid to chatJid in onMessage callback ([1436186](1436186))
* use 'state' instead of 'stateReason' for gh compatibility on bigblack ([a4f2e92](a4f2e92))
* **whatsapp:** add error handling to messages.upsert handler ([nanocoai#695](https://github.com/terrylica/nanoclaw/issues/695)) ([5e3d8b6](5e3d8b6))
* **whatsapp:** write pairing code to file for immediate access ([nanocoai#745](https://github.com/terrylica/nanoclaw/issues/745)) ([be19911](be19911))

### Features

* add /add-ollama skill for local model inference ([nanocoai#712](https://github.com/terrylica/nanoclaw/issues/712)) ([298c3ea](298c3ea))
* add ast-grep rules for Python static analysis ([a548761](a548761))
* add mise deploy task for bigblack deployment ([c39a1f4](c39a1f4))
* add NDJSON telemetry logging for all Telegram messages ([7f64ea6](7f64ea6))
* add update_task tool and return task ID from schedule_task ([68123fd](68123fd))
* cc-skills integration — enhanced issue creation with taxonomy-aware labels, type-specific templates, and discovery provenance ([602e65d](602e65d))
* CLAUDE.md maintenance creates GitHub issues with full link to Telegram ([ba34620](ba34620))
* CLAUDE.md maintenance, devil's advocate fix, OpenGrep + proactive scanning ([ce66e88](ce66e88))
* confidence scoring, verification scripts, log rotation — 3 more FP prevention layers ([0ff2c3c](0ff2c3c))
* iterative MiniMax self-validation (3 adversarial rounds) ([fc05aff](fc05aff))
* Phase 0 — enable Telegram channel and Docker Compose deployment ([ebbf59c](ebbf59c))
* Phase 2 — MiniMax orchestrator loop for continuous validation ([17e90a3](17e90a3))
* proactive algo correctness scanning with full Telegram + GitHub issue reporting ([4b68c3e](4b68c3e))
* **skills:** add image vision skill for WhatsApp ([nanocoai#770](https://github.com/terrylica/nanoclaw/issues/770)) ([af937d6](af937d6))
* **skills:** add pdf-reader skill ([nanocoai#772](https://github.com/terrylica/nanoclaw/issues/772)) ([0b260ec](0b260ec))
* **skills:** add use-local-whisper skill package ([nanocoai#702](https://github.com/terrylica/nanoclaw/issues/702)) ([03f792b](03f792b))
* timezone-aware context injection for agent prompts ([nanocoai#691](https://github.com/terrylica/nanoclaw/issues/691)) ([632713b](632713b)), closes [nanocoai#483](https://github.com/terrylica/nanoclaw/issues/483) [nanocoai#483](https://github.com/terrylica/nanoclaw/issues/483) [nanocoai#526](https://github.com/terrylica/nanoclaw/issues/526)
* whole-repo scanning instead of 3-file batches ([1ace951](1ace951))
* wire trace UUIDs into all Telegram notifications ([b48f0e9](b48f0e9))
ortalis97 pushed a commit to ortalis97/alfred that referenced this pull request Mar 8, 2026
…coai#657)

* fix: atomic claim prevents scheduled tasks from executing twice (nanocoai#138)

Replace the two-phase getDueTasks() + deferred updateTaskAfterRun() with
an atomic SQLite transaction (claimDueTasks) that advances next_run
BEFORE dispatching tasks to the queue. This eliminates the race window
where subsequent scheduler polls re-discover in-progress tasks.

Key changes:
- claimDueTasks(): SELECT + UPDATE in a single db.transaction(), so no
  poll can read stale next_run values. Once-tasks get next_run=NULL;
  recurring tasks get next_run advanced to the future.
- computeNextRun(): anchors interval tasks to the scheduled time (not
  Date.now()) to prevent cumulative drift. Includes a while-loop to
  skip missed intervals and a guard against invalid interval values.
- updateTaskAfterRun(): simplified to only record last_run/last_result
  since next_run is already handled by the claim.

Closes nanocoai#138, nanocoai#211, nanocoai#300, nanocoai#578

Co-authored-by: @taslim (PR nanocoai#601)
Co-authored-by: @baijunjie (Issue nanocoai#138)
Co-authored-by: @Michaelliv (Issue nanocoai#300)
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* style: apply prettier formatting

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: track running task ID in GroupQueue to prevent duplicate execution (nanocoai#138)

Previous commits implemented an "atomic claim" approach (claimDueTasks)
that advanced next_run before execution. Per Gavriel's review, this
solved the symptom at the wrong layer and introduced crash-recovery
risks for once-tasks.

This commit reverts claimDueTasks and instead fixes the actual bug:
GroupQueue.enqueueTask() only checked pendingTasks for duplicates, but
running tasks had already been shifted out. Adding runningTaskId to
GroupState closes that gap with a 3-line fix at the correct layer.

The computeNextRun() drift fix is retained, applied post-execution
where it belongs.

Closes nanocoai#138, nanocoai#211, nanocoai#300, nanocoai#578

Co-authored-by: @taslim (PR nanocoai#601)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs: add changelog entry for scheduler duplicate fix

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs: add contributors for scheduler race condition fix

Co-Authored-By: Taslim <9999802+taslim@users.noreply.github.com>
Co-Authored-By: BaiJunjie <7956480+baijunjie@users.noreply.github.com>
Co-Authored-By: Michael <13676242+Michaelliv@users.noreply.github.com>
Co-Authored-By: Kyle Zhike Chen <3477852+kk17@users.noreply.github.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: gavrielc <gabicohen22@yahoo.com>
Co-authored-by: Taslim <9999802+taslim@users.noreply.github.com>
Co-authored-by: BaiJunjie <7956480+baijunjie@users.noreply.github.com>
Co-authored-by: Michael <13676242+Michaelliv@users.noreply.github.com>
Co-authored-by: Kyle Zhike Chen <3477852+kk17@users.noreply.github.com>
idgmatrix pushed a commit to Gurufin-AI/nanoclaw that referenced this pull request Mar 9, 2026
…coai#657)

* fix: atomic claim prevents scheduled tasks from executing twice (nanocoai#138)

Replace the two-phase getDueTasks() + deferred updateTaskAfterRun() with
an atomic SQLite transaction (claimDueTasks) that advances next_run
BEFORE dispatching tasks to the queue. This eliminates the race window
where subsequent scheduler polls re-discover in-progress tasks.

Key changes:
- claimDueTasks(): SELECT + UPDATE in a single db.transaction(), so no
  poll can read stale next_run values. Once-tasks get next_run=NULL;
  recurring tasks get next_run advanced to the future.
- computeNextRun(): anchors interval tasks to the scheduled time (not
  Date.now()) to prevent cumulative drift. Includes a while-loop to
  skip missed intervals and a guard against invalid interval values.
- updateTaskAfterRun(): simplified to only record last_run/last_result
  since next_run is already handled by the claim.

Closes nanocoai#138, nanocoai#211, nanocoai#300, nanocoai#578

Co-authored-by: @taslim (PR nanocoai#601)
Co-authored-by: @baijunjie (Issue nanocoai#138)
Co-authored-by: @Michaelliv (Issue nanocoai#300)
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* style: apply prettier formatting

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: track running task ID in GroupQueue to prevent duplicate execution (nanocoai#138)

Previous commits implemented an "atomic claim" approach (claimDueTasks)
that advanced next_run before execution. Per Gavriel's review, this
solved the symptom at the wrong layer and introduced crash-recovery
risks for once-tasks.

This commit reverts claimDueTasks and instead fixes the actual bug:
GroupQueue.enqueueTask() only checked pendingTasks for duplicates, but
running tasks had already been shifted out. Adding runningTaskId to
GroupState closes that gap with a 3-line fix at the correct layer.

The computeNextRun() drift fix is retained, applied post-execution
where it belongs.

Closes nanocoai#138, nanocoai#211, nanocoai#300, nanocoai#578

Co-authored-by: @taslim (PR nanocoai#601)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs: add changelog entry for scheduler duplicate fix

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs: add contributors for scheduler race condition fix

Co-Authored-By: Taslim <9999802+taslim@users.noreply.github.com>
Co-Authored-By: BaiJunjie <7956480+baijunjie@users.noreply.github.com>
Co-Authored-By: Michael <13676242+Michaelliv@users.noreply.github.com>
Co-Authored-By: Kyle Zhike Chen <3477852+kk17@users.noreply.github.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: gavrielc <gabicohen22@yahoo.com>
Co-authored-by: Taslim <9999802+taslim@users.noreply.github.com>
Co-authored-by: BaiJunjie <7956480+baijunjie@users.noreply.github.com>
Co-authored-by: Michael <13676242+Michaelliv@users.noreply.github.com>
Co-authored-by: Kyle Zhike Chen <3477852+kk17@users.noreply.github.com>
bebekim pushed a commit to bebekim/goodclaw that referenced this pull request Mar 14, 2026
…coai#657)

* fix: atomic claim prevents scheduled tasks from executing twice (nanocoai#138)

Replace the two-phase getDueTasks() + deferred updateTaskAfterRun() with
an atomic SQLite transaction (claimDueTasks) that advances next_run
BEFORE dispatching tasks to the queue. This eliminates the race window
where subsequent scheduler polls re-discover in-progress tasks.

Key changes:
- claimDueTasks(): SELECT + UPDATE in a single db.transaction(), so no
  poll can read stale next_run values. Once-tasks get next_run=NULL;
  recurring tasks get next_run advanced to the future.
- computeNextRun(): anchors interval tasks to the scheduled time (not
  Date.now()) to prevent cumulative drift. Includes a while-loop to
  skip missed intervals and a guard against invalid interval values.
- updateTaskAfterRun(): simplified to only record last_run/last_result
  since next_run is already handled by the claim.

Closes nanocoai#138, nanocoai#211, nanocoai#300, nanocoai#578

Co-authored-by: @taslim (PR nanocoai#601)
Co-authored-by: @baijunjie (Issue nanocoai#138)
Co-authored-by: @Michaelliv (Issue nanocoai#300)
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* style: apply prettier formatting

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: track running task ID in GroupQueue to prevent duplicate execution (nanocoai#138)

Previous commits implemented an "atomic claim" approach (claimDueTasks)
that advanced next_run before execution. Per Gavriel's review, this
solved the symptom at the wrong layer and introduced crash-recovery
risks for once-tasks.

This commit reverts claimDueTasks and instead fixes the actual bug:
GroupQueue.enqueueTask() only checked pendingTasks for duplicates, but
running tasks had already been shifted out. Adding runningTaskId to
GroupState closes that gap with a 3-line fix at the correct layer.

The computeNextRun() drift fix is retained, applied post-execution
where it belongs.

Closes nanocoai#138, nanocoai#211, nanocoai#300, nanocoai#578

Co-authored-by: @taslim (PR nanocoai#601)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs: add changelog entry for scheduler duplicate fix

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs: add contributors for scheduler race condition fix

Co-Authored-By: Taslim <9999802+taslim@users.noreply.github.com>
Co-Authored-By: BaiJunjie <7956480+baijunjie@users.noreply.github.com>
Co-Authored-By: Michael <13676242+Michaelliv@users.noreply.github.com>
Co-Authored-By: Kyle Zhike Chen <3477852+kk17@users.noreply.github.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: gavrielc <gabicohen22@yahoo.com>
Co-authored-by: Taslim <9999802+taslim@users.noreply.github.com>
Co-authored-by: BaiJunjie <7956480+baijunjie@users.noreply.github.com>
Co-authored-by: Michael <13676242+Michaelliv@users.noreply.github.com>
Co-authored-by: Kyle Zhike Chen <3477852+kk17@users.noreply.github.com>
bogdano2 pushed a commit to bogdano2/nanoclaw that referenced this pull request Mar 17, 2026
…coai#657)

* fix: atomic claim prevents scheduled tasks from executing twice (nanocoai#138)

Replace the two-phase getDueTasks() + deferred updateTaskAfterRun() with
an atomic SQLite transaction (claimDueTasks) that advances next_run
BEFORE dispatching tasks to the queue. This eliminates the race window
where subsequent scheduler polls re-discover in-progress tasks.

Key changes:
- claimDueTasks(): SELECT + UPDATE in a single db.transaction(), so no
  poll can read stale next_run values. Once-tasks get next_run=NULL;
  recurring tasks get next_run advanced to the future.
- computeNextRun(): anchors interval tasks to the scheduled time (not
  Date.now()) to prevent cumulative drift. Includes a while-loop to
  skip missed intervals and a guard against invalid interval values.
- updateTaskAfterRun(): simplified to only record last_run/last_result
  since next_run is already handled by the claim.

Closes nanocoai#138, nanocoai#211, nanocoai#300, nanocoai#578

Co-authored-by: @taslim (PR nanocoai#601)
Co-authored-by: @baijunjie (Issue nanocoai#138)
Co-authored-by: @Michaelliv (Issue nanocoai#300)
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* style: apply prettier formatting

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: track running task ID in GroupQueue to prevent duplicate execution (nanocoai#138)

Previous commits implemented an "atomic claim" approach (claimDueTasks)
that advanced next_run before execution. Per Gavriel's review, this
solved the symptom at the wrong layer and introduced crash-recovery
risks for once-tasks.

This commit reverts claimDueTasks and instead fixes the actual bug:
GroupQueue.enqueueTask() only checked pendingTasks for duplicates, but
running tasks had already been shifted out. Adding runningTaskId to
GroupState closes that gap with a 3-line fix at the correct layer.

The computeNextRun() drift fix is retained, applied post-execution
where it belongs.

Closes nanocoai#138, nanocoai#211, nanocoai#300, nanocoai#578

Co-authored-by: @taslim (PR nanocoai#601)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs: add changelog entry for scheduler duplicate fix

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs: add contributors for scheduler race condition fix

Co-Authored-By: Taslim <9999802+taslim@users.noreply.github.com>
Co-Authored-By: BaiJunjie <7956480+baijunjie@users.noreply.github.com>
Co-Authored-By: Michael <13676242+Michaelliv@users.noreply.github.com>
Co-Authored-By: Kyle Zhike Chen <3477852+kk17@users.noreply.github.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: gavrielc <gabicohen22@yahoo.com>
Co-authored-by: Taslim <9999802+taslim@users.noreply.github.com>
Co-authored-by: BaiJunjie <7956480+baijunjie@users.noreply.github.com>
Co-authored-by: Michael <13676242+Michaelliv@users.noreply.github.com>
Co-authored-by: Kyle Zhike Chen <3477852+kk17@users.noreply.github.com>
moey823 added a commit to moey823/nanoclaw that referenced this pull request Mar 26, 2026
Merged upstream changes including:
- DB query limits (nanocoai#692/nanocoai#735)
- Timezone-aware context injection (nanocoai#691)
- SDK bump to 0.2.68
- update_task tool and task ID passthrough
- Sender allowlist (nanocoai#705)
- Atomic task claiming (nanocoai#657)

Resolved conflicts in index.ts (sender allowlist + our steering/cancel),
ipc.ts (idempotency check + taskId passthrough), and task-scheduler.ts
(computeNextRun refactor + our cost tracking/alerting).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
onlyforart referenced this pull request in onlyforart/nanoclaw Mar 27, 2026
* fix: atomic claim prevents scheduled tasks from executing twice (#138)

Replace the two-phase getDueTasks() + deferred updateTaskAfterRun() with
an atomic SQLite transaction (claimDueTasks) that advances next_run
BEFORE dispatching tasks to the queue. This eliminates the race window
where subsequent scheduler polls re-discover in-progress tasks.

Key changes:
- claimDueTasks(): SELECT + UPDATE in a single db.transaction(), so no
  poll can read stale next_run values. Once-tasks get next_run=NULL;
  recurring tasks get next_run advanced to the future.
- computeNextRun(): anchors interval tasks to the scheduled time (not
  Date.now()) to prevent cumulative drift. Includes a while-loop to
  skip missed intervals and a guard against invalid interval values.
- updateTaskAfterRun(): simplified to only record last_run/last_result
  since next_run is already handled by the claim.

Closes #138, #211, #300, #578

Co-authored-by: @taslim (PR #601)
Co-authored-by: @baijunjie (Issue #138)
Co-authored-by: @Michaelliv (Issue #300)
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* style: apply prettier formatting

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: track running task ID in GroupQueue to prevent duplicate execution (#138)

Previous commits implemented an "atomic claim" approach (claimDueTasks)
that advanced next_run before execution. Per Gavriel's review, this
solved the symptom at the wrong layer and introduced crash-recovery
risks for once-tasks.

This commit reverts claimDueTasks and instead fixes the actual bug:
GroupQueue.enqueueTask() only checked pendingTasks for duplicates, but
running tasks had already been shifted out. Adding runningTaskId to
GroupState closes that gap with a 3-line fix at the correct layer.

The computeNextRun() drift fix is retained, applied post-execution
where it belongs.

Closes #138, #211, #300, #578

Co-authored-by: @taslim (PR #601)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs: add changelog entry for scheduler duplicate fix

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs: add contributors for scheduler race condition fix

Co-Authored-By: Taslim <9999802+taslim@users.noreply.github.com>
Co-Authored-By: BaiJunjie <7956480+baijunjie@users.noreply.github.com>
Co-Authored-By: Michael <13676242+Michaelliv@users.noreply.github.com>
Co-Authored-By: Kyle Zhike Chen <3477852+kk17@users.noreply.github.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: gavrielc <gabicohen22@yahoo.com>
Co-authored-by: Taslim <9999802+taslim@users.noreply.github.com>
Co-authored-by: BaiJunjie <7956480+baijunjie@users.noreply.github.com>
Co-authored-by: Michael <13676242+Michaelliv@users.noreply.github.com>
Co-authored-by: Kyle Zhike Chen <3477852+kk17@users.noreply.github.com>
XiRoSe pushed a commit to XiRoSe/nova-agent that referenced this pull request Apr 9, 2026
dm-j pushed a commit to dm-j/nanoclaw that referenced this pull request Apr 13, 2026
…coai#657)

* fix: atomic claim prevents scheduled tasks from executing twice (nanocoai#138)

Replace the two-phase getDueTasks() + deferred updateTaskAfterRun() with
an atomic SQLite transaction (claimDueTasks) that advances next_run
BEFORE dispatching tasks to the queue. This eliminates the race window
where subsequent scheduler polls re-discover in-progress tasks.

Key changes:
- claimDueTasks(): SELECT + UPDATE in a single db.transaction(), so no
  poll can read stale next_run values. Once-tasks get next_run=NULL;
  recurring tasks get next_run advanced to the future.
- computeNextRun(): anchors interval tasks to the scheduled time (not
  Date.now()) to prevent cumulative drift. Includes a while-loop to
  skip missed intervals and a guard against invalid interval values.
- updateTaskAfterRun(): simplified to only record last_run/last_result
  since next_run is already handled by the claim.

Closes nanocoai#138, nanocoai#211, nanocoai#300, nanocoai#578

Co-authored-by: @taslim (PR nanocoai#601)
Co-authored-by: @baijunjie (Issue nanocoai#138)
Co-authored-by: @Michaelliv (Issue nanocoai#300)
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* style: apply prettier formatting

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: track running task ID in GroupQueue to prevent duplicate execution (nanocoai#138)

Previous commits implemented an "atomic claim" approach (claimDueTasks)
that advanced next_run before execution. Per Gavriel's review, this
solved the symptom at the wrong layer and introduced crash-recovery
risks for once-tasks.

This commit reverts claimDueTasks and instead fixes the actual bug:
GroupQueue.enqueueTask() only checked pendingTasks for duplicates, but
running tasks had already been shifted out. Adding runningTaskId to
GroupState closes that gap with a 3-line fix at the correct layer.

The computeNextRun() drift fix is retained, applied post-execution
where it belongs.

Closes nanocoai#138, nanocoai#211, nanocoai#300, nanocoai#578

Co-authored-by: @taslim (PR nanocoai#601)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs: add changelog entry for scheduler duplicate fix

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs: add contributors for scheduler race condition fix

Co-Authored-By: Taslim <9999802+taslim@users.noreply.github.com>
Co-Authored-By: BaiJunjie <7956480+baijunjie@users.noreply.github.com>
Co-Authored-By: Michael <13676242+Michaelliv@users.noreply.github.com>
Co-Authored-By: Kyle Zhike Chen <3477852+kk17@users.noreply.github.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: gavrielc <gabicohen22@yahoo.com>
Co-authored-by: Taslim <9999802+taslim@users.noreply.github.com>
Co-authored-by: BaiJunjie <7956480+baijunjie@users.noreply.github.com>
Co-authored-by: Michael <13676242+Michaelliv@users.noreply.github.com>
Co-authored-by: Kyle Zhike Chen <3477852+kk17@users.noreply.github.com>
talmosko-code pushed a commit to talmosko-code/nanoclaw that referenced this pull request Apr 28, 2026
…coai#657)

* fix: atomic claim prevents scheduled tasks from executing twice (nanocoai#138)

Replace the two-phase getDueTasks() + deferred updateTaskAfterRun() with
an atomic SQLite transaction (claimDueTasks) that advances next_run
BEFORE dispatching tasks to the queue. This eliminates the race window
where subsequent scheduler polls re-discover in-progress tasks.

Key changes:
- claimDueTasks(): SELECT + UPDATE in a single db.transaction(), so no
  poll can read stale next_run values. Once-tasks get next_run=NULL;
  recurring tasks get next_run advanced to the future.
- computeNextRun(): anchors interval tasks to the scheduled time (not
  Date.now()) to prevent cumulative drift. Includes a while-loop to
  skip missed intervals and a guard against invalid interval values.
- updateTaskAfterRun(): simplified to only record last_run/last_result
  since next_run is already handled by the claim.

Closes nanocoai#138, nanocoai#211, nanocoai#300, nanocoai#578

Co-authored-by: @taslim (PR nanocoai#601)
Co-authored-by: @baijunjie (Issue nanocoai#138)
Co-authored-by: @Michaelliv (Issue nanocoai#300)
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* style: apply prettier formatting

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: track running task ID in GroupQueue to prevent duplicate execution (nanocoai#138)

Previous commits implemented an "atomic claim" approach (claimDueTasks)
that advanced next_run before execution. Per Gavriel's review, this
solved the symptom at the wrong layer and introduced crash-recovery
risks for once-tasks.

This commit reverts claimDueTasks and instead fixes the actual bug:
GroupQueue.enqueueTask() only checked pendingTasks for duplicates, but
running tasks had already been shifted out. Adding runningTaskId to
GroupState closes that gap with a 3-line fix at the correct layer.

The computeNextRun() drift fix is retained, applied post-execution
where it belongs.

Closes nanocoai#138, nanocoai#211, nanocoai#300, nanocoai#578

Co-authored-by: @taslim (PR nanocoai#601)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs: add changelog entry for scheduler duplicate fix

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs: add contributors for scheduler race condition fix

Co-Authored-By: Taslim <9999802+taslim@users.noreply.github.com>
Co-Authored-By: BaiJunjie <7956480+baijunjie@users.noreply.github.com>
Co-Authored-By: Michael <13676242+Michaelliv@users.noreply.github.com>
Co-Authored-By: Kyle Zhike Chen <3477852+kk17@users.noreply.github.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: gavrielc <gabicohen22@yahoo.com>
Co-authored-by: Taslim <9999802+taslim@users.noreply.github.com>
Co-authored-by: BaiJunjie <7956480+baijunjie@users.noreply.github.com>
Co-authored-by: Michael <13676242+Michaelliv@users.noreply.github.com>
Co-authored-by: Kyle Zhike Chen <3477852+kk17@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Scheduled tasks execute twice due to race between scheduler loop and queue

3 participants