Skip to content

GH#15317: fix dispatch claim system — 5 design flaws causing duplicate dispatch#15324

Closed
marcusquinn wants to merge 1 commit intomainfrom
bugfix/dispatch-claim-dedup
Closed

GH#15317: fix dispatch claim system — 5 design flaws causing duplicate dispatch#15324
marcusquinn wants to merge 1 commit intomainfrom
bugfix/dispatch-claim-dedup

Conversation

@marcusquinn
Copy link
Copy Markdown
Owner

@marcusquinn marcusquinn commented Apr 1, 2026

Summary

Fixes 5 design flaws in the dispatch claim system that caused duplicate worker dispatches, claim comment spam, and same-runner dispatch loops.

Closes #15317

Root Causes Fixed

Fix 1: Deterministic "Dispatching worker" comment (pulse-wrapper.sh)

Problem: The "Dispatching worker" comment was LLM-posted by the worker session. Workers could crash before posting, leaving no persistent signal for Layer 5 dedup. Without this signal, the issue would be re-dispatched every pulse cycle.

Fix: dispatch_with_dedup() now posts the comment deterministically after confirming the worker PID is alive. Workers no longer need to post this comment.

Evidence: #2051 had 29 DISPATCH_CLAIM comments over 6 hours because workers kept dying before posting.

Fix 2: Remove self-skip in Layer 5 (dispatch-dedup-helper.sh)

Problem: has_dispatch_comment() skipped comments from self_login. The same runner's dispatch comment was invisible to its own dedup check, allowing re-dispatch every cycle.

Fix: All dispatch comments block regardless of author. The self_login parameter is kept for backward compatibility but no longer filters.

Evidence: #2040 had alex-solovyev dispatch twice (20:33 and 22:57) because Layer 5 skipped its own dispatch comment.

Fix 3: Remove self-reclaim from claim protocol (dispatch-claim-helper.sh)

Problem: After 30s, the same runner could "reclaim" its own stale claim, enabling re-dispatch loops: claim → dispatch → worker dies → 30s → self-reclaim → dispatch again.

Fix: Stale same-runner claims are now treated as lost (exit 1). Both the stale claim and the fresh one are deleted. Re-dispatch requires explicit kill/failure comment from the pulse.

Evidence: #2051 had 25 claims from alex-solovyev over 6 hours via self-reclaim loop.

Fix 4: Clean up DISPATCH_CLAIM comment after dispatch (pulse-wrapper.sh)

Problem: Winner claim comments persisted forever as "audit trail", producing comment spam on issues.

Fix: After posting the deterministic dispatch comment, dispatch_with_dedup() deletes the DISPATCH_CLAIM comment. The claim served its purpose (8s consensus window); the dispatch comment is the persistent signal.

Fix 5: Document automatic dispatch comment posting (pulse.md, pulse-sweep.md, automate.md)

Problem: pulse.md instructed the LLM to post "Dispatching worker" manually — could produce duplicates or be skipped entirely.

Fix: Updated docs to note that dispatch_with_dedup() posts this comment automatically. LLM sessions no longer need to post it.

Files Changed

  • .agents/scripts/dispatch-claim-helper.sh — Remove self-reclaim, add stale-self detection
  • .agents/scripts/dispatch-dedup-helper.sh — Remove self-skip in has_dispatch_comment()
  • .agents/scripts/pulse-wrapper.sh — Post deterministic dispatch comment, clean up claim comment
  • .agents/scripts/commands/pulse.md — Document automatic dispatch comment
  • .agents/scripts/commands/pulse-sweep.md — Document automatic dispatch comment
  • .agents/automate.md — Document automatic dispatch comment
  • .agents/scripts/tests/test-dispatch-claim-helper.sh — Tests for new self-reclaim behavior

Runtime Testing

Risk level: High (dispatch coordination, state machines, cross-machine locking)

  • ShellCheck: zero violations on all modified scripts
  • Unit tests: 17/17 pass (test-dispatch-claim-helper.sh)
  • Logic verified: self-reclaim loop eliminated, Layer 5 now blocks self-dispatches
  • Integration test: requires live multi-runner environment (not available in this session)

Key Decisions

  • Self-reclaim → exit 1 (not exit 2): Treating stale self-claims as errors (fail-open) would recreate the loop. Explicit rejection forces the pulse to use the kill/failure comment flow for re-dispatch.
  • self_login kept in has_dispatch_comment signature: Backward compatibility — callers pass it; removing it would require updating all call sites.
  • Claim comment deleted, not minimized: Minimized comments still appear in the timeline. Deletion keeps the issue clean.

aidevops.sh v3.5.580 plugin for OpenCode v1.3.13 with claude-sonnet-4-6 spent 5m on this as a headless worker.

Summary by CodeRabbit

  • New Features

    • Dispatch comments are now posted automatically after worker launch instead of requiring manual creation.
  • Bug Fixes

    • Prevents duplicate "Dispatching worker" comments that could interfere with deduplication checks.
    • Improved detection and cleanup of stale dispatch claims.
  • Documentation

    • Updated dispatch workflow guidelines to reflect automatic comment posting and remove manual posting instructions.

…317)

Five design flaws caused duplicate dispatches, claim spam, and dispatch loops:

1. 'Dispatching worker' comment was LLM-posted, not deterministic — workers
   could die before posting, leaving no persistent signal for Layer 5 dedup.
   Now posted by dispatch_with_dedup() after confirming worker PID is alive.

2. Layer 5 (has_dispatch_comment) skipped self-posted comments — same runner's
   dispatch comment was invisible to its own dedup check. Removed self-skip.

3. Self-reclaim allowed same runner to bypass claim dedup after 30s — created
   dispatch loops. Replaced with stale-self detection that rejects + cleans up.

4. DISPATCH_CLAIM comments never cleaned up — winner claims persisted forever.
   Now deleted after deterministic dispatch comment is posted.

5. Pulse LLM instructed to post 'Dispatching worker' manually — could produce
   duplicates or be skipped. Updated pulse.md/pulse-sweep.md/automate.md to
   document automatic posting.

Evidence: awardsapp #2051 had 29 DISPATCH_CLAIM comments over 6 hours,
aidevops #15285 had duplicate workers dispatched.
@marcusquinn marcusquinn added bug Auto-created from TODO.md tag origin:worker Auto-created by pulse labelless backfill (t2112) labels Apr 1, 2026
@gemini-code-assist
Copy link
Copy Markdown

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Apr 1, 2026

Caution

Review failed

Pull request was closed or merged during review

Walkthrough

The dispatch claim system is restructured to eliminate duplicate worker dispatches by eliminating self-claim reclaim logic, making dispatch comment posting deterministic via the wrapper rather than relying on worker sessions, blocking on all recent dispatch comments regardless of author, and cleaning up stale claim artifacts post-dispatch.

Changes

Cohort / File(s) Summary
Documentation Updates
.agents/automate.md, .agents/scripts/commands/pulse.md, .agents/scripts/commands/pulse-sweep.md
Updated audit-trail guidance to reflect that "Dispatching worker" comments are now posted automatically by dispatch_with_dedup() per GH#15317; removed instructions for manual dispatch comment posting and added warnings against duplicate comments interfering with Layer 5 dedup.
Claim Self-Reclaim Elimination
.agents/scripts/dispatch-claim-helper.sh
Replaced age-threshold-based self-reclaim logic with nonce-based stale detection; when oldest claim belongs to same runner but has different nonce, emits CLAIM_STALE_SELF, deletes both stale and fresh claims, and returns failure (1) instead of success (0).
Dedup Comment Blocking
.agents/scripts/dispatch-dedup-helper.sh
Removed author-based skipping in has_dispatch_comment(); now blocks on any recent "Dispatching worker" comment regardless of who posted it, eliminating self-posted comment bypass.
Deterministic Dispatch Posting
.agents/scripts/pulse-wrapper.sh
Captured claim output to extract comment IDs; moved dispatch comment posting from worker session to dispatch_with_dedup() after worker launch confirmation; added cleanup to delete stale DISPATCH_CLAIM comments post-dispatch.
Claim Behavior Tests
.agents/scripts/tests/test-dispatch-claim-helper.sh
Replaced self-reclaim and self-loss test cases with stale self-claim rejection tests; updated expectations to verify both stale and fresh claim deletion and CLAIM_STALE_SELF emission on same-runner nonce mismatch.

Sequence Diagram

sequenceDiagram
    actor Runner
    participant PulseWrapper as pulse-wrapper.sh
    participant ClaimHelper as dispatch-claim-helper.sh
    participant DedupHelper as dispatch-dedup-helper.sh
    participant GH as GitHub API
    
    Runner->>PulseWrapper: dispatch_with_dedup()
    PulseWrapper->>ClaimHelper: claim (with nonce)
    alt Same Runner, Stale Nonce
        ClaimHelper->>ClaimHelper: Detect CLAIM_STALE_SELF
        ClaimHelper->>GH: Delete stale claim
        ClaimHelper->>GH: Delete fresh claim
        ClaimHelper-->>PulseWrapper: return 1 (CLAIM_STALE_SELF)
    else Claim Success
        ClaimHelper->>GH: Create DISPATCH_CLAIM comment
        ClaimHelper-->>PulseWrapper: return 0 + comment_id
    end
    
    alt Stale Claim
        PulseWrapper-->>Runner: Fail-open, exit 2
    else Claim Won
        PulseWrapper->>DedupHelper: has_dispatch_comment()
        DedupHelper->>GH: Query recent dispatch comments
        alt Recent Dispatch Comment Exists (Any Author)
            DedupHelper-->>PulseWrapper: return 0 (blocked)
            PulseWrapper-->>Runner: Skip dispatch
        else No Dispatch Comment
            DedupHelper-->>PulseWrapper: return 1 (proceed)
            PulseWrapper->>GH: Launch worker
            PulseWrapper->>GH: Post deterministic "Dispatching worker" comment
            PulseWrapper->>GH: Delete DISPATCH_CLAIM (via comment_id)
            PulseWrapper-->>Runner: Success
        end
    end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Poem

🔄 No more duplicates dancing in the light,
Claims now reject when their nonce isn't right.
Dispatch comments posted with deterministic grace—
No worker ghosts haunting our issue space. ✨

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately summarizes the main change: fixing the dispatch claim system to prevent duplicate dispatch events, directly addressing the core issue #15317.
Linked Issues check ✅ Passed All five design flaw fixes match the linked issue #15317 objectives: deterministic dispatch comment posting, removal of self-skip in Layer 5, prevention of self-reclaim, claim comment cleanup, and documentation updates.
Out of Scope Changes check ✅ Passed All changes are scoped to fixing the dispatch claim system design flaws. Documentation updates to pulse.md, pulse-sweep.md, and automate.md directly support the objective of preventing manual dispatch bypasses.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch bugfix/dispatch-claim-dedup

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@marcusquinn
Copy link
Copy Markdown
Owner Author

Completion Summary

  • What: Fixed 5 design flaws in dispatch claim system causing duplicate workers, claim spam, and dispatch loops
  • Issue: fix: dispatch claim system fails to prevent duplicate dispatch — 5 design flaws #15317
  • Files changed: dispatch-claim-helper.sh, dispatch-dedup-helper.sh, pulse-wrapper.sh, pulse.md, pulse-sweep.md, automate.md, test-dispatch-claim-helper.sh
  • Testing: ShellCheck clean, 17/17 unit tests pass
  • Key decisions: Self-reclaim exits 1 (not fail-open) to prevent loops; dispatch comment posted deterministically by wrapper, not LLM worker; claim comment deleted after dispatch to prevent spam

This summary was written by the worker at PR creation time for the deterministic merge pass.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Apr 1, 2026

🔍 Code Quality Report

�[0;35m[MONITOR]�[0m Code Review Monitoring Report

SonarCloud: 0 bugs, 0 vulnerabilities, 1 code smells

Wed Apr 1 21:54:46 UTC 2026: Code review monitoring started
Wed Apr 1 21:54:46 UTC 2026: SonarCloud - Bugs: 0, Vulnerabilities: 0, Code Smells: 1

📈 Current Quality Metrics

  • BUGS: 0
  • CODE SMELLS: 1
  • VULNERABILITIES: 0

Generated on: Wed Apr 1 21:54:49 UTC 2026


Generated by AI DevOps Framework Code Review Monitoring

@sonarqubecloud
Copy link
Copy Markdown

sonarqubecloud Bot commented Apr 1, 2026

@codacy-production
Copy link
Copy Markdown

Up to standards ✅

🟢 Issues 0 issues

Results:
0 new issues

View in Codacy

TIP This summary will be updated as you push new changes. Give us feedback

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Auto-created from TODO.md tag origin:worker Auto-created by pulse labelless backfill (t2112)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

fix: dispatch claim system fails to prevent duplicate dispatch — 5 design flaws

1 participant