Description
The pulse system currently relies on the LLM supervisor to transition status:blocked → status:available when blockers resolve. The LLM supervisor is gated behind _should_run_llm_supervisor() which requires either a 1-hour backlog stall (PULSE_LLM_STALL_THRESHOLD=3600) or a 24-hour daily sweep. This creates a systemic delay in dependency chains: when a worker completes a task and closes its issue, downstream tasks that were blocked by it remain status:blocked for up to 1 hour — even though the resolution check is entirely deterministic.
Observed impact
A managed private repo has a 15-task dependency chain (sequential phases where each task is blocked-by its predecessor). When the pulse dispatches workers for the first available tasks and they complete successfully (PRs merged, issues closed), the downstream tasks should become available for dispatch immediately. Instead:
- Workers complete 3 tasks, closing issues and merging PRs (~20 min total)
- _should_run_llm_supervisor() sees the backlog decreased (3 fewer issues) → records "progress" → skips the LLM supervisor
- The backlog count stabilizes (remaining work is all status:blocked)
- The stall timer starts counting from the last snapshot update
- ~48 minutes later, the LLM supervisor finally runs, detects "status:blocked but blockers resolved", and transitions labels
- The next deterministic fill floor cycle (2 min later) dispatches workers
- Total delay: ~50 minutes per dependency layer
For a 15-task chain with 5 dependency layers, this means ~4 hours of idle time waiting for label transitions that are fundamentally a set membership check.
Why this is a deterministic operation
is_blocked_by_unresolved() (pulse-wrapper.sh:8630) already does the exact check: parse blocked-by:tNNN / blocked-by:#NNN from the issue body, check if the referenced issues are still open. This is a pure function — no judgment, no edge cases, no ambiguity. It belongs in the deterministic pass, not behind the LLM gate.
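The parsing shape described above can be sketched as follows. This is a minimal illustration, not the actual implementation: `parse_blocked_by` is a hypothetical name, and the real regex lives inside `is_blocked_by_unresolved()` and may differ in detail.

```shell
# Hypothetical sketch of the blocked-by parser. Reads an issue body on
# stdin and prints one blocker issue number per line, deduplicated.
# The accepted forms mirror the issue text: blocked-by:tNNN / blocked-by:#NNN.
parse_blocked_by() {
  grep -oiE 'blocked-by:[#t]?[0-9]+' | grep -oE '[0-9]+' | sort -un
}
```

Because the output is a bare list of issue numbers, the "still open?" check reduces to membership against the open-issue set, which is the whole point of the proposal.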
Proposed Solution: Cached Dependency Graph
A two-phase approach that separates the expensive graph construction (infrequent) from the cheap resolution check (every cycle).
Phase 1: Graph construction (infrequent, O(B) API calls)
Parse all status:blocked issue bodies for blocked-by references. Build a forward + reverse dependency map and cache it to disk:
{
  "built_at": 1775658447,
  "repo_slug": "owner/repo",
  "forward": {
    "105": [104],
    "106": [104, 105],
    "108": [101, 107],
    "109": [108]
  },
  "reverse": {
    "101": [108],
    "104": [105, 106],
    "105": [106],
    "107": [108],
    "108": [109]
  }
}
When to rebuild:
- On LLM supervisor runs (daily sweep or stall — the supervisor already reads issue state)
- On a standalone 1-hour cadence via file-based timestamp gate (decouples from LLM scheduling)
- Incrementally: only read bodies for issues added to status:blocked since last build
Data source options (in order of preference):
- GitHub's native sub-issues/blocked-by GraphQL API — issue-sync-helper.sh already syncs addBlockedBy relationships (line 1094). Query these directly instead of parsing body text. Zero body reads needed.
- Prefetch body parsing — if the prefetch is extended to include issue bodies (currently fetches number, title, url, assignees, labels, updatedAt only).
- Dedicated body fetch — O(B) API calls where B = blocked issues. Acceptable at <500 issues.
Cache location: ${PULSE_DIR}/dependency-graph.json (per-repo, written atomically via temp+rename).
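Graph construction from parsed edges could be a single jq pass. The sketch below is illustrative: the function name and the tab-separated "issue, blocker" edge format are assumptions, but the output matches the cache layout shown above, including the atomic temp+rename write.

```shell
# Hypothetical Phase 1 sketch: read "issue<TAB>blocker" pairs on stdin,
# build the forward + reverse maps, and write the cache atomically.
build_graph_cache() {
  local repo_slug="$1" cache="$2"
  jq -Rn --arg repo "$repo_slug" '
    [inputs | split("\t") | {issue: .[0], blocker: (.[1] | tonumber)}] as $edges
    | {built_at: (now | floor),
       repo_slug: $repo,
       forward: ($edges | group_by(.issue)
                 | map({key: .[0].issue, value: map(.blocker)}) | from_entries),
       reverse: ($edges | group_by(.blocker)
                 | map({key: (.[0].blocker | tostring),
                        value: map(.issue | tonumber)}) | from_entries)}
  ' > "${cache}.tmp" && mv "${cache}.tmp" "$cache"
}
```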
Phase 2: Graph resolution (every 2-min cycle, zero additional API calls)
The prefetch already fetches all open issues every cycle (for build_ranked_dispatch_candidates_json). That data contains the set of all open issue numbers. Checking "is blocker X closed?" = "is X absent from the open issues set?" — a pure set membership check.
resolve_blocked_by_graph():
1. Read cached dependency graph from disk
2. If cache is missing or stale (>2h), skip (LLM supervisor will rebuild)
3. Build set of open issue numbers from prefetch data (already in memory)
4. For each entry in forward map:
- If ALL blockers are NOT in the open set → all resolved
- gh issue edit --remove-label "status:blocked" --add-label "status:available"
- Post comment: "Blockers resolved (#{X}, #{Y} closed). Unblocked for dispatch."
5. Remove resolved entries from the cached graph (avoid re-checking)
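Step 4 above is the core set-membership check, and it fits in one jq filter. A sketch, with the caveat that `newly_unblocked` is a hypothetical name and the open-set argument shape (a JSON array of open issue numbers) is an assumption about how the prefetch data would be passed in:

```shell
# Sketch of step 4: print every issue in the forward map whose blockers
# are ALL absent from the open-issue set (i.e. all closed).
newly_unblocked() {
  local graph="$1" open_json="$2"   # open_json: JSON array, e.g. "[105,106]"
  jq -r --argjson open "$open_json" '
    ($open | map(tostring)) as $o
    | .forward | to_entries[]
    | select([.value[] | tostring | IN($o[])] | any | not)
    | .key
  ' "$graph"
}
# The caller would then swap labels for each printed issue number, e.g.:
#   gh issue edit "$n" --remove-label "status:blocked" --add-label "status:available"
```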
Cost at scale:
| Blocked issues | Graph edges | API calls per cycle | Computation |
|---|---|---|---|
| 50 | ~80 | 0 (label swaps only when something changes) | ~80 set lookups |
| 500 | ~800 | 0 | ~800 set lookups |
| 1,000 | ~1,500 | 0 | ~1,500 set lookups |
| 5,000 | ~8,000 | 0 | ~8,000 set lookups |
The only API calls are the label transition itself (one gh issue edit per newly-unblocked issue), and those only fire when something actually changes — typically 1-3 per cycle at most.
Simpler Variant: Event-Driven Post-Merge Forward-Unblock
Instead of scanning the full graph every cycle, trigger resolution only when the merge pass closes an issue:
After merge_ready_prs_all_repos() closes issue X:
1. Look up X in the reverse map → get downstream issues [Y, Z]
2. For each downstream issue:
- Check if ALL its blockers (from forward map) are closed
- If yes → swap labels
3. Next fill floor cycle (2 min) dispatches worker for Y/Z
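The event-driven lookup reuses both maps from the cache. A sketch under the same assumptions as above (hypothetical function name, cache layout as in the JSON example):

```shell
# Sketch: after issue $closed is closed by the merge pass, check only its
# downstream issues (reverse map), and print those whose blockers
# (forward map) are now all absent from the open-issue set.
downstream_unblocked() {
  local closed="$1" graph="$2" open_json="$3"
  jq -r --arg c "$closed" --argjson open "$open_json" '
    ($open | map(tostring)) as $o
    | . as $g
    | ($g.reverse[$c] // [])[]
    | tostring
    | select([($g.forward[.] // [])[] | tostring | IN($o[])] | any | not)
  ' "$graph"
}
```

Note the `// []` fallbacks: a closed issue with no downstream entries, or a downstream issue missing from the forward map, degrades to a no-op rather than an error.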
Cost per merge: O(D) where D = number of downstream issues of the closed blocker. Typically 1-3. Scales with merge volume, not total issue count.
Trade-off: More targeted (fewer checks) but only triggers on PR merges. Issues closed without a PR merge (manual close, duplicate, etc.) wouldn't trigger forward-unblock until the next graph scan. The full graph scan (Phase 2) could run at a lower frequency (every 10 min) as a catch-all.
Recommendation: Implement both — event-driven forward-unblock for the fast path, periodic graph scan as the safety net.
Integration point in main()
main()
├── _run_preflight_stages() # existing — fetches open issues
├── merge_ready_prs_all_repos() # existing — merges ready PRs
│ └── (event-driven forward-unblock after each merge) # NEW
├── resolve_blocked_by_graph() # NEW — periodic graph scan (~60 lines)
│ ├── read cached dependency graph
│ ├── build open-issue set from prefetch data
│ ├── for each blocked issue: check if all blockers ∉ open set
│ └── swap labels for newly-resolved issues
├── apply_deterministic_fill_floor() # existing — dispatches available issues
└── (LLM supervisor, if triggered)
└── rebuild_dependency_graph() as side effect # NEW (~40 lines)
Cache Maintenance and Staleness
- Rebuild triggers: LLM supervisor run, standalone 1-hour cadence, new status:blocked issue detected in prefetch
- Incremental updates: Track last_build_epoch and only read bodies for issues with updatedAt > last_build_epoch
- Staleness bound: Worst case, a newly-blocked issue waits one rebuild cycle (1 hour) before entering the graph. The LLM supervisor remains the safety net for anything the cache misses.
- Invalidation: When resolve_blocked_by_graph() transitions labels, remove the entry from the forward map and write back. Prevents re-processing.
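The invalidation write-back can be a one-line jq delete with the same temp+rename pattern as construction. Function name is illustrative; a real implementation might also prune the reverse map, which this sketch deliberately leaves untouched for the event-driven path:

```shell
# Sketch: drop a resolved issue from the forward map and write the cache
# back atomically, so the next cycle does not re-process it.
remove_resolved_entry() {
  local graph="$1" n="$2"
  jq --arg n "$n" 'del(.forward[$n])' "$graph" > "${graph}.tmp" \
    && mv "${graph}.tmp" "$graph"
}
```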
Risk Assessment
Low risk:
- Resolution is a pure deterministic check — no judgment, no model calls, no new edge cases
- Uses existing is_blocked_by_unresolved() parsing logic for graph construction
- Label swaps are the same operation the LLM supervisor already performs
- Cache staleness is bounded and the LLM supervisor remains the safety net
- Zero impact on the LLM supervisor path — it can still handle blocked-by if it runs
Potential concerns and mitigations:
- Stale cache serves wrong data: Bounded by rebuild cadence (1h). Worst case: an issue stays blocked one extra hour — same as current behavior.
- Race condition on label swap: The dispatch dedup guards (7-layer) already handle this. A label swap during an active dispatch cycle is safe.
- Graph construction cost at scale: With GitHub native sub-issues API as the data source, graph construction is a single GraphQL query — not O(B) REST calls. Fall back to body parsing only if native API is unavailable.
Files to Modify
EDIT: .agents/scripts/pulse-wrapper.sh — add resolve_blocked_by_graph(), integrate into main() between merge pass and fill floor
EDIT: .agents/scripts/pulse-wrapper.sh — add rebuild_dependency_graph(), call from LLM supervisor post-run and standalone cadence
EDIT: .agents/scripts/pulse-wrapper.sh — extend merge_ready_prs_all_repos() with event-driven forward-unblock hook
NEW: ${PULSE_DIR}/dependency-graph.json — cached graph (runtime artifact, not committed)
Verification
# After implementation, on a repo with blocked-by chains:
# 1. Create two issues: A (no blockers) and B (blocked-by A)
# 2. Let the pulse dispatch a worker for A
# 3. Worker completes A, PR merged, issue closed
# 4. Within 2-4 minutes (not 1 hour): B should have status:available
# 5. Next fill floor cycle dispatches worker for B
# Verify graph cache:
jq . "${PULSE_DIR}/dependency-graph.json"
# Verify resolution log:
grep "resolve_blocked_by_graph\|forward-unblock" "$LOGFILE"
Environment
- aidevops: 3.6.174
- AI Assistant: Claude Code (claude-opus-4-6)
- OS: Ubuntu 24.04.4 LTS
- Shell: bash 5.2.21
- gh CLI: 2.89.0
Related
pulse-wrapper.sh:8630 — is_blocked_by_unresolved() — existing blocked-by parser (reuse for graph construction)
pulse-wrapper.sh:9429 — _should_run_llm_supervisor() — the stall gate that causes the delay
pulse-wrapper.sh:10512-10526 — deterministic merge pass + fill floor — the integration point
issue-sync-helper.sh:1048-1102 — GitHub native sub-issues sync (addBlockedBy GraphQL) — potential data source
pulse.md:275 — "status:blocked but blockers resolved → remove label, add status:available" — the LLM instruction being moved to deterministic
- GH#17779 — _is_task_committed_to_main() false positive fix — related blocked-by dispatch bug
Review Notes (Approved — tier:standard)
Reviewer: claude-opus-4-6 via /review-issue-pr
Validation
- Reproducible: Yes — code confirms _should_run_llm_supervisor() (line 9598) gates blocked-by resolution behind a 1h stall threshold. is_blocked_by_unresolved() (line 8799) is only called defensively at dispatch time to skip blocked issues — it never transitions labels.
- Not duplicate: Confirmed. No prior issues address deterministic blocked-by resolution. GH#17779 is related but distinct.
- Classification: Enhancement (current behavior is by design; the issue correctly identifies it should be deterministic).
Design Corrections
- addBlockedBy GraphQL claim is inaccurate. The issue states issue-sync-helper.sh already syncs addBlockedBy relationships at line 1094. This doesn't exist in the current codebase (1462-line file, no blocked/dependency/addBlocked references). Drop data source option 1 from the plan. Body parsing (option 3) is the realistic data source.
- The "progress suppresses LLM" paradox is the real insight. Line 9654 shows that when total_now < total_before, the snapshot is updated and the LLM is skipped. Workers completing tasks = fewer open issues = "progress" = LLM suppressed. But the remaining issues are all status:blocked and can't progress without the LLM. This is the core bug — call it out in commit messages.
Implementation Guidance for Worker
Approach: self-maintaining dependency graph (per @robstiles addendum). The incremental maintenance makes the graph cheap enough that a cache-less intermediate step adds no value. Implement the full graph lifecycle directly:
- Cold start (no cache): Fetch ALL status:blocked issue bodies (one-time O(B) cost). Build forward + reverse maps. Write to ${PULSE_DIR}/dependency-graph.json.
- Steady state (every 2-min cycle, ~0 API calls): Read cached graph. Diff prefetch labels against graph — fetch body only for newly status:blocked issues (typically 0-1). Remove entries no longer status:blocked. Run resolution check against open-issue set from PULSE_PREFETCH_CACHE_FILE (already on disk, zero API calls). Swap labels for resolved issues.
- Supervisor backstop: Full graph rebuild on LLM supervisor runs. Catches body edits, manual label changes, drift. Overwrites cache — consistency check, not primary path.
- Event-driven forward-unblock: After merge_ready_prs_all_repos() closes an issue, look up its reverse map entries and check if downstream issues are now fully unblocked.
Integration point: Between merge_ready_prs_all_repos() and apply_deterministic_fill_floor() in main() (~line 10688).
Data source: PULSE_PREFETCH_CACHE_FILE (~/.aidevops/logs/pulse-prefetch-cache.json) for the open-issue set. Issue bodies via gh issue view for blocked-by parsing (reuse is_blocked_by_unresolved() logic).
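The steady-state label diff could be sketched as a jq comparison between the prefetch and the cached graph. The prefetch field names (`number`, `labels[].name`) are assumptions consistent with the fields listed earlier in this issue; the real PULSE_PREFETCH_CACHE_FILE schema may differ.

```shell
# Sketch: print issue numbers that carry status:blocked in the prefetch
# but are not yet tracked in the cached graph's forward map. These are
# the (typically 0-1) issues whose bodies need a one-off fetch.
new_blocked_issues() {
  local prefetch="$1" graph="$2"
  jq -r --slurpfile g "$graph" '
    ($g[0].forward | keys) as $known
    | .[]
    | select(.labels | any(.name == "status:blocked"))
    | .number | tostring
    | select(IN($known[]) | not)
  ' "$prefetch"
}
```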
Known limitations (acceptable):
- Cross-repo blocked-by not supported (existing limitation, separate enhancement)
- Body edits between graph rebuilds not detected until next supervisor run (rare in practice)
- Stall detection unaffected — label swaps don't change open issue count
Tier: tier:standard — straightforward engineering using existing functions and data sources, no novel design needed.