t1160.6: Add claude to orphan process detection in pulse.sh Phase 5

**Task ID:** `t1160.6` | **Status:** completed | **Estimate:** `~15m`
**Assignee:** @marcusquinn | **Started:** 2026-02-21T06:53:18Z | **Completed:** 2026-02-21
**Tags:** `auto-dispatch`

## Description

Add `claude` to orphan process detection in pulse.sh Phase 5. The existing `pgrep` pattern only matches `opencode`.
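
The change amounts to widening the process pattern used by the orphan sweep. The following is a minimal sketch, not the actual pulse.sh code: the `WORKER_CLIS` array and both function names are hypothetical, and it assumes the sweep calls `pgrep -f` with an extended-regex alternation.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; names and pattern are illustrative, not copied from pulse.sh.

# Worker CLIs whose processes the orphan sweep should consider.
WORKER_CLIS=(opencode claude)

# Join the CLI names into an alternation such as "opencode|claude".
worker_cli_pattern() {
  local IFS='|'
  printf '%s\n' "${WORKER_CLIS[*]}"
}

# List PIDs whose full command line matches any worker CLI.
# pgrep exits non-zero when nothing matches, so that case is mapped to empty output.
list_worker_pids() {
  pgrep -f "$(worker_cli_pattern)" || true
}
```

Matching on the bare word `claude` is broad: it also matches unrelated processes that carry that string on their command line. A real sweep would likely narrow the candidates further, for example by parent PID or working directory.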

<details><summary>Plan: Context &amp; Architecture</summary>


**Phases**

**Phase 0: No-Regression Refactor (t1160.1-t1160.7) ~5.5h**
Pure refactoring of the dispatch stack. No behavior change. All existing OpenCode dispatch continues to work identically.

- [x] Audit complete: 12+ CLI branches identified across 6 supervisor modules
- [ ] t1160.1 Create `build_cli_cmd()` abstraction: a single function replaces all duplicated branches
- [ ] t1160.2 Add `SUPERVISOR_CLI` env var: explicit override, default auto-detect
- [ ] t1160.3 Claude CLI branching in runner-helper.sh (currently OpenCode-only)
- [ ] t1160.4 Claude CLI branching in contest-helper.sh (currently OpenCode-only)
- [ ] t1160.5 Fix email-signature-parser-helper.sh (currently Claude-only, should use resolve_ai_cli)
- [ ] t1160.6 Add `claude` to orphan process detection in pulse.sh Phase 5
- [ ] t1160.7 Integration test: `SUPERVISOR_CLI=claude` full dispatch cycle

**Verification gate:** Run existing supervisor test suite + manual pulse with both `SUPERVISOR_CLI=opencode` and `SUPERVISOR_CLI=claude`. Both must produce identical outcomes for the same task.
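
To make the Phase 0 abstraction concrete, here is a rough sketch of how an explicit `SUPERVISOR_CLI` override with auto-detect fallback could feed a single command builder. Everything beyond the names `build_cli_cmd` and `SUPERVISOR_CLI` is an assumption: `resolve_supervisor_cli`, the `CLI_CMD` array, and the `opencode run <prompt>` invocation are illustrative and not taken from the supervisor code.

```shell
#!/usr/bin/env bash
# Illustrative sketch only: function bodies and the CLI_CMD array are hypothetical.

# Print the CLI to dispatch with: an explicit SUPERVISOR_CLI wins, otherwise auto-detect.
resolve_supervisor_cli() {
  local cli
  case "${SUPERVISOR_CLI:-}" in
    opencode | claude)
      printf '%s\n' "$SUPERVISOR_CLI"
      return 0
      ;;
    "") ;; # unset or empty: fall through to auto-detection
    *)
      echo "unsupported SUPERVISOR_CLI: ${SUPERVISOR_CLI}" >&2
      return 1
      ;;
  esac
  # OpenCode is checked first because the plan keeps it as the primary CLI.
  for cli in opencode claude; do
    if command -v "$cli" >/dev/null 2>&1; then
      printf '%s\n' "$cli"
      return 0
    fi
  done
  echo "no supported AI CLI found on PATH" >&2
  return 1
}

# Fill the global CLI_CMD array with the argv for a one-shot prompt.
build_cli_cmd() {
  local cli="$1" prompt="$2"
  case "$cli" in
    opencode) CLI_CMD=(opencode run "$prompt") ;; # assumption: `opencode run <prompt>`
    claude) CLI_CMD=(claude -p "$prompt") ;;
    *) return 1 ;;
  esac
}
```

A caller could then run `build_cli_cmd "$(resolve_supervisor_cli)" "$prompt" && "${CLI_CMD[@]}"`, which keeps every CLI-specific branch in one place.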

**Phase 1: Claude Code Config Parity in setup.sh (t1161) ~4h**
Make `aidevops setup` and `aidevops update` deploy equivalent configuration to Claude Code.

- [ ] t1161.1 `generate-claude-commands.sh`: slash commands to `~/.claude/commands/`
- [ ] t1161.2 Automated MCP registration via `claude mcp add-json`
- [ ] t1161.3 Enhanced `~/.claude/settings.json` with tool permissions (merge, don't overwrite hooks)
- [ ] t1161.4 Wire `update_claude_config()` into setup.sh (conditional on `claude` binary)

**Key design decisions:**

- Slash commands generated from same source as OpenCode commands, with minor format adaptation (OpenCode `agent: Build+` frontmatter ignored by Claude Code)
- MCP registration uses existing `configs/mcp-templates/` `claude_code_command` entries
- `settings.json` merge strategy: read existing, deep-merge new permissions, preserve hooks
- Entire phase conditional on `command -v claude`: no-op if Claude Code not installed

**Verification gate:** Fresh `aidevops setup` on a machine with both CLIs produces working configs for both. Claude Code interactive session has slash commands and MCPs available.
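
The "conditional on `command -v claude`" decision can be sketched as a small guard. This is only an illustration: `maybe_update_claude_config` is a hypothetical wrapper name, and the `update_claude_config` body below is a stand-in that prints a message rather than the real function planned in t1161.4.

```shell
#!/usr/bin/env bash
# Stand-in for the real update_claude_config(); it only prints a marker line here.
update_claude_config() {
  echo "updating Claude Code config"
}

# Hypothetical wrapper: silently do nothing when the claude binary is absent.
maybe_update_claude_config() {
  if ! command -v claude >/dev/null 2>&1; then
    return 0 # Claude Code not installed: no-op, no error
  fi
  update_claude_config
}
```

Returning 0 in the absent case matters when setup.sh runs under `set -e`, since a missing optional CLI should not abort the rest of setup.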

**Phase 2: Worker MCP Isolation for Claude CLI (t1162) ~2h**
When dispatching workers via `claude -p`, provide equivalent MCP isolation to OpenCode's `generate_worker_mcp_config()`.

- [ ] t1162 Create `generate_worker_mcp_config_claude()`: builds temporary JSON for `--mcp-config`
- [ ] Use `--strict-mcp-config` to prevent workers from using user's global MCP config
- [ ] Cleanup: remove temp config files after worker exits

**Verification gate:** Worker dispatched via Claude CLI gets exactly the MCPs specified, not the user's full set.
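
The three Phase 2 items (temporary config, strict mode, cleanup) fit together roughly as follows. This is a sketch under stated assumptions rather than the planned implementation: the function signature, the `run_worker_with_isolated_mcp` wrapper, and the top-level `mcpServers` key are assumptions, and the caller is assumed to pass an already valid JSON object of server definitions.

```shell
#!/usr/bin/env bash
# Sketch under stated assumptions; not the project's implementation.

# Write a temporary MCP config containing only the given servers and print its path.
# $1: JSON object mapping server names to definitions (assumed to be valid JSON).
generate_worker_mcp_config_claude() {
  local servers_json="$1" config_file
  config_file="$(mktemp "${TMPDIR:-/tmp}/worker-mcp.XXXXXX")" || return 1
  printf '{"mcpServers": %s}\n' "$servers_json" >"$config_file"
  printf '%s\n' "$config_file"
}

# Run one worker with only that config, then remove the temp file whatever the outcome.
run_worker_with_isolated_mcp() {
  local prompt="$1" servers_json="$2" config_file status=0
  config_file="$(generate_worker_mcp_config_claude "$servers_json")" || return 1
  claude -p "$prompt" --mcp-config "$config_file" --strict-mcp-config || status=$?
  rm -f "$config_file"
  return "$status"
}
```

Capturing the exit status before cleanup means the worker's own result is what the dispatcher sees, while the temporary file is removed on both success and failure.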

**Phase 3: OAuth-Aware Dispatch (t1163) ~2h**
The value proposition: workers on Max subscription = no per-token cost for Anthropic models.

- [ ] t1163 Detect OAuth: `claude -p "OK" --output-format text` succeeds without `ANTHROPIC_API_KEY`
- [ ] `SUPERVISOR_PREFER_OAUTH` env var (default: true)
- [ ] When true + dispatching Anthropic models + OAuth available → use `claude` CLI
- [ ] When dispatching non-Anthropic models (OpenRouter, Groq, etc.) → always use `opencode`
- [ ] Budget tracker: record Claude CLI dispatches as `subscription` billing type
- [ ] Leverage `--max-budget-usd` for per-worker cost caps
- [ ] Leverage `--fallback-model` for native fallback
- [ ] Auth failure detection: if Claude CLI returns auth error, fall back to OpenCode + API key

**Verification gate:** Mixed batch with Anthropic + non-Anthropic tasks routes correctly. Anthropic tasks go via Claude CLI (OAuth), non-Anthropic via OpenCode. Auth failure triggers automatic fallback.
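
The routing rules above reduce to a small decision function plus a probe. The sketch below is hedged accordingly: `detect_claude_oauth` and `choose_dispatch_cli` are hypothetical names, and it assumes model identifiers carry a provider prefix such as `anthropic/`, which the plan does not specify.

```shell
#!/usr/bin/env bash
# Illustrative only; function names and the "anthropic/" prefix convention are assumptions.

# Probe for OAuth: succeeds if claude answers with ANTHROPIC_API_KEY removed from its environment.
detect_claude_oauth() {
  command -v claude >/dev/null 2>&1 || return 1
  env -u ANTHROPIC_API_KEY claude -p "OK" --output-format text >/dev/null 2>&1
}

# choose_dispatch_cli MODEL OAUTH_AVAILABLE
# Prints "claude" only for Anthropic models when OAuth is preferred and available.
choose_dispatch_cli() {
  local model="$1" oauth_available="$2"
  local prefer_oauth="${SUPERVISOR_PREFER_OAUTH:-true}"
  case "$model" in
    anthropic/*)
      if [ "$prefer_oauth" = "true" ] && [ "$oauth_available" = "true" ]; then
        echo claude
        return 0
      fi
      ;;
  esac
  # Non-Anthropic models, or no usable OAuth: stay on OpenCode.
  echo opencode
}
```

Passing the OAuth result in as an argument keeps the decision pure, so the probe can be run once per pulse cycle and its cached result reused for every task in the batch.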

**Phase 4: End-to-End Verification (t1164) ~2h**
Comprehensive testing of the complete dual-CLI architecture before proceeding to containerization.

- [ ] t1164 Full regression suite:
  - Pure OpenCode batch (existing behavior, must be identical)
  - Pure Claude CLI batch (all Anthropic models)
  - Mixed batch (Anthropic via Claude, non-Anthropic via OpenCode)
  - OAuth failure scenario (Claude CLI auth expires mid-batch → fallback to OpenCode)
  - Config parity check (both CLIs have equivalent slash commands, MCPs)
  - Cost tracking verification (subscription vs token billing recorded correctly)

**Verification gate:** All scenarios pass. No regressions to existing workflows. Cost tracking accurate.

**Phase 5: Containerized Multi-Subscription Scaling (t1165) ~6h**
Scale beyond a single subscription's rate limits by running Claude Code CLI instances in containers, each with its own OAuth token.

- [ ] t1165.1 Container image design:
  - Base: Node.js LTS (Claude CLI requires Node)
  - Install: `claude` CLI, `git`, `gh`, core unix tools
  - Volume mounts: repo checkout (read-write), `~/.aidevops/agents/` (read-only)
  - Token injection: `CLAUDE_CODE_OAUTH_TOKEN` env var from `claude setup-token`
  - Permissions: `--permission-mode bypassPermissions` (trusted container)
  - No MCP servers inside container (injected via `--mcp-config` per dispatch)
- [ ] t1165.2 Container pool manager:
  - `container-pool-helper.sh [create|destroy|list|dispatch|health|scale]`
  - Pool config: `~/.config/aidevops/container-pool.json` (image, count, tokens, hosts)
  - Dispatch strategy: round-robin across healthy containers, skip rate-limited ones
  - Health checks: periodic `claude -p "OK"` inside each container
  - Rate limit tracking: per-container request count + 429 detection
  - Auto-scaling: spawn new containers when all existing ones are rate-limited
- [ ] t1165.3 Remote container support:
  - OrbStack remote VMs or SSH to any Docker host
  - Tailscale for secure networking between hosts
  - Credential forwarding: OAuth tokens via encrypted env vars, never in image
  - Log collection: `docker logs` piped to supervisor log directory
  - Worktree sync: git push from host, git pull inside container (or bind mount for local)
- [ ] t1165.4 Integration test: multi-container batch

**Verification gate:** Batch of 6+ tasks dispatched across 3+ containers. Each container uses its own OAuth token. Rate-limited containers are skipped. Logs aggregated correctly. Workers produce valid PRs.
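
The pool's dispatch strategy (round-robin across healthy containers, skipping rate-limited ones) can be sketched as a pure selection function. The function name, the status strings, and passing statuses as positional arguments are all assumptions made for illustration; `container-pool-helper.sh` is still a planned script.

```shell
#!/usr/bin/env bash
# Hypothetical selection helper for the planned container pool.

# pick_next_container LAST_INDEX STATUS...
# Prints the index of the next container whose status is "healthy", scanning
# round-robin from the slot after LAST_INDEX. Returns 1 if none is healthy,
# which is the point where auto-scaling would spawn a new container.
pick_next_container() {
  local last="$1"
  shift
  local -a statuses=("$@")
  local n="${#statuses[@]}" i idx
  [ "$n" -gt 0 ] || return 1
  for ((i = 1; i <= n; i++)); do
    idx=$(((last + i) % n))
    if [ "${statuses[idx]}" = "healthy" ]; then
      printf '%s\n' "$idx"
      return 0
    fi
  done
  return 1
}
```

For example, `pick_next_container 0 healthy rate_limited healthy` skips the rate-limited container at index 1 and selects index 2; passing `-1` as the last index starts the scan at the first container.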

**Risks**

| Risk | Likelihood | Impact | Mitigation |
|------|-----------|--------|------------|
| Claude CLI behavior differs from OpenCode in subtle ways | Medium | High | Phase 0 integration test (t1160.7) catches differences before production use |
| OAuth token expires mid-batch | Medium | Medium | Auth failure detection + automatic fallback to OpenCode + API key |
| Claude Code updates break our generated config | Low | Medium | `update_claude_config()` is idempotent, re-runs on every `aidevops update` |
| Claude Code rewrites "OpenCode" in AGENTS.md at load time | Confirmed | Low | Cosmetic only — doesn't affect functionality. Documented as known behavior. |
| Container networking issues (DNS, port conflicts) | Medium | Medium | OrbStack handles networking; fallback to host-only dispatch |
| Multiple subscriptions = multiple billing accounts to manage | Low | Low | Container pool config tracks which token belongs to which account |
| Rate limit changes by Anthropic | Low | High | Per-container rate tracking adapts automatically; pool manager skips limited containers |

</details>

<details><summary>Plan: Decision Log</summary>

| Date | Decision | Rationale |
|------|----------|-----------|
| 2026-02-18 | OpenCode stays primary, Claude Code is fallback | OpenCode supports multi-provider routing (OpenRouter, Groq, DeepSeek). Claude CLI is Anthropic-only. Keep the broader capability as primary. |
| 2026-02-18 | Phase 0 is pure refactor with no behavior change | The 12+ duplicated CLI branches are a maintenance burden and bug risk. Centralizing into `build_cli_cmd()` is valuable regardless of Claude CLI support. |
| 2026-02-18 | `SUPERVISOR_CLI` env var for explicit override | Auto-detection is the default, but operators need a way to force a specific CLI for testing or when both are installed but one is preferred. |
| 2026-02-18 | Config parity is conditional on `command -v claude` | Users without Claude Code installed should not see errors or slowdowns. The entire Claude Code config path is a no-op if the binary is absent. |
| 2026-02-18 | `--strict-mcp-config` for worker MCP isolation | Prevents workers from accidentally using the user's full MCP set. Each worker gets exactly the MCPs it needs, nothing more. |
| 2026-02-18 | OAuth detection via test invocation, not token file inspection | Claude Code stores OAuth in the macOS keychain (not a file we can inspect). The only reliable test is whether `claude -p` succeeds without `ANTHROPIC_API_KEY`. Cache the result for the pulse cycle. |
| 2026-02-18 | Containerization as Phase 5 (after everything else is tested) | Containers add complexity (networking, volume mounts, token management). Only pursue after the single-host dual-CLI path is proven stable. |
| 2026-02-18 | `CLAUDE_CODE_OAUTH_TOKEN` env var for container auth | `claude setup-token` generates long-lived tokens specifically for headless/CI use. Each container gets a unique token from a separate subscription account. |
| 2026-02-18 | OrbStack as container runtime | Already installed (v2.0.5), supports both local containers and remote VMs, lighter than Docker Desktop on macOS. |
| 2026-02-18 | All tasks model:opus | Sensitive infrastructure work touching the dispatch core. Wrong decisions here break all autonomous orchestration. Opus-tier reasoning is warranted. |

</details>

<details><summary>Plan: Discoveries</summary>

- Claude Code CLI already supports inline agent definitions via `--agents JSON --agent name` — this is more flexible than OpenCode's file-based agent config for worker dispatch.
- `--output-format json` returns `total_cost_usd` and full `modelUsage` breakdown per invocation — better cost tracking than OpenCode provides natively.
- Claude Code rewrites "OpenCode" references to "Claude Code" when loading AGENTS.md files. This is a Claude Code behavior, not something in our codebase. The deployed file at `~/.aidevops/agents/AGENTS.md` correctly says "OpenCode". Confirmed by comparing on-disk content vs system prompt content.
- The t1022 revert (PR #1329) left a residual "Claude Code" reference in `.agents/aidevops/architecture.md:44` that should be corrected.
- `claude setup-token` is the key to containerized auth — generates long-lived tokens specifically for headless/CI environments, injected via `CLAUDE_CODE_OAUTH_TOKEN` env var.

</details>

---
*Synced from TODO.md by issue-sync-helper.sh*

t1160.6: Add claude to orphan process detection in pulse.sh Phase 5 #1752
