enhancement: deterministic blocked-by resolution via cached dependency graph

## Description

The pulse system currently relies on the LLM supervisor to transition `status:blocked` → `status:available` when blockers resolve. The LLM supervisor is gated behind `_should_run_llm_supervisor()` which requires either a 1-hour backlog stall (`PULSE_LLM_STALL_THRESHOLD=3600`) or a 24-hour daily sweep. This creates a systemic delay in dependency chains: when a worker completes a task and closes its issue, downstream tasks that were blocked by it remain `status:blocked` for up to 1 hour — even though the resolution check is entirely deterministic.

### Observed impact

A managed private repo has a 15-task dependency chain (sequential phases where each task is blocked-by its predecessor). When the pulse dispatches workers for the first available tasks and they complete successfully (PRs merged, issues closed), the downstream tasks should become available for dispatch immediately. Instead:

1. Workers complete 3 tasks, closing issues and merging PRs (~20 min total)
2. `_should_run_llm_supervisor()` sees the backlog decreased (3 fewer issues) → records "progress" → **skips the LLM supervisor**
3. The backlog count stabilizes (remaining work is all `status:blocked`)
4. The stall timer starts counting from the last snapshot update
5. **~48 minutes later**, the LLM supervisor finally runs, detects "status:blocked but blockers resolved", and transitions labels
6. The next deterministic fill floor cycle (2 min later) dispatches workers
7. **Total delay: ~50 minutes per dependency layer**

For a 15-task chain with 5 dependency layers, this means ~4 hours of idle time waiting for label transitions that are fundamentally a set membership check.

### Why this is a deterministic operation

`is_blocked_by_unresolved()` (pulse-wrapper.sh:8630) already does the exact check: parse `blocked-by:tNNN` / `blocked-by:#NNN` from the issue body, check if the referenced issues are still open. This is a pure function — no judgment, no edge cases, no ambiguity. It belongs in the deterministic pass, not behind the LLM gate.

## Proposed Solution: Cached Dependency Graph

A two-phase approach that separates the expensive graph construction (infrequent) from the cheap resolution check (every cycle).

### Phase 1: Graph construction (infrequent, O(B) API calls)

Parse all `status:blocked` issue bodies for `blocked-by` references. Build a forward + reverse dependency map and cache it to disk:

```json
{
  "built_at": 1775658447,
  "repo_slug": "owner/repo",
  "forward": {
    "105": [104],
    "106": [104, 105],
    "108": [101, 107],
    "109": [108]
  },
  "reverse": {
    "101": [108],
    "104": [105, 106],
    "105": [106],
    "107": [108],
    "108": [109]
  }
}
```

**When to rebuild:**
- On LLM supervisor runs (daily sweep or stall — the supervisor already reads issue state)
- On a standalone 1-hour cadence via file-based timestamp gate (decouples from LLM scheduling)
- Incrementally: only read bodies for issues added to `status:blocked` since last build

**Data source options (in order of preference):**
1. GitHub's native sub-issues/blocked-by GraphQL API — `issue-sync-helper.sh` already syncs `addBlockedBy` relationships (line 1094). Query these directly instead of parsing body text. Zero body reads needed.
2. Prefetch body parsing — if the prefetch is extended to include issue bodies (currently fetches `number, title, url, assignees, labels, updatedAt` only).
3. Dedicated body fetch — O(B) API calls where B = blocked issues. Acceptable at <500 issues.

**Cache location:** `${PULSE_DIR}/dependency-graph.json` (per-repo, written atomically via temp+rename).

### Phase 2: Graph resolution (every 2-min cycle, zero additional API calls)

The prefetch already fetches all open issues every cycle (for `build_ranked_dispatch_candidates_json`). That data contains the set of all open issue numbers. Checking "is blocker X closed?" = "is X absent from the open issues set?" — a pure set membership check.

```
resolve_blocked_by_graph():
  1. Read cached dependency graph from disk
  2. If cache is missing or stale (>2h), skip (LLM supervisor will rebuild)
  3. Build set of open issue numbers from prefetch data (already in memory)
  4. For each entry in forward map:
     - If ALL blockers are NOT in the open set → all resolved
     - gh issue edit --remove-label "status:blocked" --add-label "status:available"
     - Post comment: "Blockers resolved (#{X}, #{Y} closed). Unblocked for dispatch."
  5. Remove resolved entries from the cached graph (avoid re-checking)
```

**Cost at scale:**

| Blocked issues | Graph edges | API calls per cycle | Computation |
|----------------|-------------|--------------------|-|
| 50 | ~80 | 0 (label swaps only when something changes) | ~80 set lookups |
| 500 | ~800 | 0 | ~800 set lookups |
| 1,000 | ~1,500 | 0 | ~1,500 set lookups |
| 5,000 | ~8,000 | 0 | ~8,000 set lookups |

The only API calls are the label transition itself (one `gh issue edit` per newly-unblocked issue), and those only fire when something actually changes — typically 1-3 per cycle at most.

### Simpler Variant: Event-Driven Post-Merge Forward-Unblock

Instead of scanning the full graph every cycle, trigger resolution only when the merge pass closes an issue:

```
After merge_ready_prs_all_repos() closes issue X:
  1. Look up X in the reverse map → get downstream issues [Y, Z]
  2. For each downstream issue:
     - Check if ALL its blockers (from forward map) are closed
     - If yes → swap labels
  3. Next fill floor cycle (2 min) dispatches worker for Y/Z
```

**Cost per merge:** O(D) where D = number of downstream issues of the closed blocker. Typically 1-3. Scales with merge volume, not total issue count.

**Trade-off:** More targeted (fewer checks) but only triggers on PR merges. Issues closed without a PR merge (manual close, duplicate, etc.) wouldn't trigger forward-unblock until the next graph scan. The full graph scan (Phase 2) could run at a lower frequency (every 10 min) as a catch-all.

**Recommendation:** Implement both — event-driven forward-unblock for the fast path, periodic graph scan as the safety net.

### Integration point in `main()`

```
main()
├── _run_preflight_stages()           # existing — fetches open issues
├── merge_ready_prs_all_repos()       # existing — merges ready PRs
│   └── (event-driven forward-unblock after each merge)  # NEW
├── resolve_blocked_by_graph()        # NEW — periodic graph scan (~60 lines)
│   ├── read cached dependency graph
│   ├── build open-issue set from prefetch data
│   ├── for each blocked issue: check if all blockers ∉ open set
│   └── swap labels for newly-resolved issues
├── apply_deterministic_fill_floor()  # existing — dispatches available issues
└── (LLM supervisor, if triggered)
        └── rebuild_dependency_graph() as side effect  # NEW (~40 lines)
```

### Cache Maintenance and Staleness

- **Rebuild triggers:** LLM supervisor run, standalone 1-hour cadence, new `status:blocked` issue detected in prefetch
- **Incremental updates:** Track `last_build_epoch` and only read bodies for issues with `updatedAt > last_build_epoch`
- **Staleness bound:** Worst case, a newly-blocked issue waits one rebuild cycle (1 hour) before entering the graph. The LLM supervisor remains the safety net for anything the cache misses.
- **Invalidation:** When `resolve_blocked_by_graph()` transitions labels, remove the entry from the forward map and write back. Prevents re-processing.

### Risk Assessment

**Low risk:**
- Resolution is a pure deterministic check — no judgment, no model calls, no new edge cases
- Uses existing `is_blocked_by_unresolved()` parsing logic for graph construction
- Label swaps are the same operation the LLM supervisor already performs
- Cache staleness is bounded and the LLM supervisor remains the safety net
- Zero impact on the LLM supervisor path — it can still handle blocked-by if it runs

**Potential concerns and mitigations:**
- **Stale cache serves wrong data:** Bounded by rebuild cadence (1h). Worst case: an issue stays blocked one extra hour — same as current behavior.
- **Race condition on label swap:** The dispatch dedup guards (7-layer) already handle this. A label swap during an active dispatch cycle is safe.
- **Graph construction cost at scale:** With GitHub native sub-issues API as the data source, graph construction is a single GraphQL query — not O(B) REST calls. Fall back to body parsing only if native API is unavailable.

### Files to Modify

- `EDIT: .agents/scripts/pulse-wrapper.sh` — add `resolve_blocked_by_graph()`, integrate into `main()` between merge pass and fill floor
- `EDIT: .agents/scripts/pulse-wrapper.sh` — add `rebuild_dependency_graph()`, call from LLM supervisor post-run and standalone cadence
- `EDIT: .agents/scripts/pulse-wrapper.sh` — extend `merge_ready_prs_all_repos()` with event-driven forward-unblock hook
- `NEW: ${PULSE_DIR}/dependency-graph.json` — cached graph (runtime artifact, not committed)

### Verification

```bash
# After implementation, on a repo with blocked-by chains:
# 1. Create two issues: A (no blockers) and B (blocked-by A)
# 2. Let the pulse dispatch a worker for A
# 3. Worker completes A, PR merged, issue closed
# 4. Within 2-4 minutes (not 1 hour): B should have status:available
# 5. Next fill floor cycle dispatches worker for B

# Verify graph cache:
cat "${PULSE_DIR}/dependency-graph.json" | jq .

# Verify resolution log:
grep "resolve_blocked_by_graph\|forward-unblock" "$LOGFILE"
```

## Environment

- aidevops: 3.6.174
- AI Assistant: Claude Code (claude-opus-4-6)
- OS: Ubuntu 24.04.4 LTS
- Shell: bash 5.2.21
- gh CLI: 2.89.0

## Related

- `pulse-wrapper.sh:8630` — `is_blocked_by_unresolved()` — existing blocked-by parser (reuse for graph construction)
- `pulse-wrapper.sh:9429` — `_should_run_llm_supervisor()` — the stall gate that causes the delay
- `pulse-wrapper.sh:10512-10526` — deterministic merge pass + fill floor — the integration point
- `issue-sync-helper.sh:1048-1102` — GitHub native sub-issues sync (`addBlockedBy` GraphQL) — potential data source
- `pulse.md:275` — "status:blocked but blockers resolved → remove label, add status:available" — the LLM instruction being moved to deterministic
- GH#17779 — `_is_task_committed_to_main()` false positive fix — related blocked-by dispatch bug

---

## Review Notes (Approved — tier:standard)

**Reviewer:** claude-opus-4-6 via `/review-issue-pr`

### Validation

- **Reproducible:** Yes — code confirms `_should_run_llm_supervisor()` (line 9598) gates blocked-by resolution behind a 1h stall threshold. `is_blocked_by_unresolved()` (line 8799) is only called defensively at dispatch time to *skip* blocked issues — it never *transitions* labels.
- **Not duplicate:** Confirmed. No prior issues address deterministic blocked-by resolution. GH#17779 is related but distinct.
- **Classification:** Enhancement (current behavior is by design; the issue correctly identifies it should be deterministic).

### Design Corrections

1. **`addBlockedBy` GraphQL claim is inaccurate.** The issue states `issue-sync-helper.sh` already syncs `addBlockedBy` relationships at line 1094. This doesn't exist in the current codebase (1462-line file, no `blocked`/`dependency`/`addBlocked` references). **Drop data source option 1 from the plan.** Body parsing (option 3) is the realistic data source.

2. **The "progress suppresses LLM" paradox is the real insight.** Line 9654 shows that when `total_now < total_before`, the snapshot is updated and the LLM is skipped. Workers completing tasks = fewer open issues = "progress" = LLM suppressed. But the remaining issues are all `status:blocked` and can't progress without the LLM. This is the core bug — call it out in commit messages.

### Implementation Guidance for Worker

**Approach: self-maintaining dependency graph** (per @robstiles addendum). The incremental maintenance makes the graph cheap enough that a cache-less intermediate step adds no value. Implement the full graph lifecycle directly:

1. **Cold start (no cache):** Fetch ALL `status:blocked` issue bodies (one-time O(B) cost). Build forward + reverse maps. Write to `${PULSE_DIR}/dependency-graph.json`.
2. **Steady state (every 2-min cycle, ~0 API calls):** Read cached graph. Diff prefetch labels against graph — fetch body only for newly `status:blocked` issues (typically 0-1). Remove entries no longer `status:blocked`. Run resolution check against open-issue set from `PULSE_PREFETCH_CACHE_FILE` (already on disk, zero API calls). Swap labels for resolved issues.
3. **Supervisor backstop:** Full graph rebuild on LLM supervisor runs. Catches body edits, manual label changes, drift. Overwrites cache — consistency check, not primary path.
4. **Event-driven forward-unblock:** After `merge_ready_prs_all_repos()` closes an issue, look up its reverse map entries and check if downstream issues are now fully unblocked.

**Integration point:** Between `merge_ready_prs_all_repos()` and `apply_deterministic_fill_floor()` in `main()` (~line 10688).

**Data source:** `PULSE_PREFETCH_CACHE_FILE` (`~/.aidevops/logs/pulse-prefetch-cache.json`) for the open-issue set. Issue bodies via `gh issue view` for blocked-by parsing (reuse `is_blocked_by_unresolved()` logic).

**Known limitations (acceptable):**
- Cross-repo `blocked-by` not supported (existing limitation, separate enhancement)
- Body edits between graph rebuilds not detected until next supervisor run (rare in practice)
- Stall detection unaffected — label swaps don't change open issue count

**Tier: `tier:standard`** — straightforward engineering using existing functions and data sources, no novel design needed.


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

enhancement: deterministic blocked-by resolution via cached dependency graph #17871

Description

Observed impact

Why this is a deterministic operation

Proposed Solution: Cached Dependency Graph

Phase 1: Graph construction (infrequent, O(B) API calls)

Phase 2: Graph resolution (every 2-min cycle, zero additional API calls)

Simpler Variant: Event-Driven Post-Merge Forward-Unblock

Integration point in `main()`

Cache Maintenance and Staleness

Risk Assessment

Files to Modify

Verification

Environment

Related

Review Notes (Approved — tier:standard)

Validation

Design Corrections

Implementation Guidance for Worker

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Blocked issues	Graph edges	API calls per cycle	Computation
50	~80	0 (label swaps only when something changes)	~80 set lookups
500	~800	0	~800 set lookups
1,000	~1,500	0	~1,500 set lookups
5,000	~8,000	0	~8,000 set lookups

enhancement: deterministic blocked-by resolution via cached dependency graph #17871

Description

Description

Observed impact

Why this is a deterministic operation

Proposed Solution: Cached Dependency Graph

Phase 1: Graph construction (infrequent, O(B) API calls)

Phase 2: Graph resolution (every 2-min cycle, zero additional API calls)

Simpler Variant: Event-Driven Post-Merge Forward-Unblock

Integration point in main()

Cache Maintenance and Staleness

Risk Assessment

Files to Modify

Verification

Environment

Related

Review Notes (Approved — tier:standard)

Validation

Design Corrections

Implementation Guidance for Worker

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions

Integration point in `main()`