t1160.6: Add claude to orphan process detection in pulse.sh Phase 5 #1752

@github-actions

Task ID: t1160.6 | Status: completed | Estimate: ~15m
Assignee: @marcusquinn | Started: 2026-02-21T06:53:18Z | Completed: 2026-02-21
Tags: auto-dispatch

Description

Add claude to orphan process detection in pulse.sh Phase 5. The current pgrep pattern only matches opencode, so claude worker processes are never picked up by the orphan check.
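
A minimal sketch of the change, assuming Phase 5 of pulse.sh finds candidate workers with `pgrep -f` and a regular-expression pattern. The helper names below are hypothetical and the real pulse.sh code may be structured differently; only pgrep and the two CLI names come from the task description.

```shell
#!/bin/sh
# Hypothetical helper: join CLI names into one extended-regex alternation
# that pgrep can use, e.g. "opencode|claude".
worker_cli_pattern() {
    pattern=""
    for cli in "$@"; do
        if [ -z "$pattern" ]; then
            pattern="$cli"
        else
            pattern="$pattern|$cli"
        fi
    done
    printf '%s\n' "$pattern"
}

# Hypothetical helper: list PIDs whose full command line matches any
# worker CLI. pgrep exits non-zero when nothing matches, so that case
# is treated as "no candidates" rather than an error.
list_worker_pids() {
    pgrep -f "$(worker_cli_pattern opencode claude)" || true
}
```

Because `-f` matches the whole command line, a bare `claude` pattern would also match interactive Claude Code sessions. The real check would presumably need an extra filter (for example on the parent PID or on a supervisor-specific argument) before treating a match as an orphan.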

Plan: Context & Architecture

Phases

Phase 0: No-Regression Refactor (t1160.1-t1160.7) ~5.5h
Pure refactoring of the dispatch stack. No behavior change. All existing OpenCode dispatch continues to work identically.

  • Audit complete: 12+ CLI branches identified across 6 supervisor modules
  • t1160.1 Create build_cli_cmd() abstraction — single function replaces all duplicated branches
  • t1160.2 Add SUPERVISOR_CLI env var — explicit override, default auto-detect
  • t1160.3 Claude CLI branching in runner-helper.sh (currently OpenCode-only)
  • t1160.4 Claude CLI branching in contest-helper.sh (currently OpenCode-only)
  • t1160.5 Fix email-signature-parser-helper.sh (currently Claude-only, should use resolve_ai_cli)
  • t1160.6 Add claude to orphan process detection in pulse.sh Phase 5
  • t1160.7 Integration test: SUPERVISOR_CLI=claude full dispatch cycle

Verification gate: Run existing supervisor test suite + manual pulse with both SUPERVISOR_CLI=opencode and SUPERVISOR_CLI=claude. Both must produce identical outcomes for the same task.
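
The Phase 0 abstraction could look roughly like the sketch below. It assumes auto-detection prefers opencode when both CLIs are installed, and that the basic invocations are `opencode run <prompt>` and `claude -p <prompt>`. The function bodies are illustrative, not the supervisor's actual code.

```shell
#!/bin/sh
# Pick the worker CLI: an explicit SUPERVISOR_CLI wins, otherwise auto-detect.
# Assumption: opencode is tried first because it is the primary CLI.
resolve_supervisor_cli() {
    if [ -n "${SUPERVISOR_CLI:-}" ]; then
        printf '%s\n' "$SUPERVISOR_CLI"
    elif command -v opencode >/dev/null 2>&1; then
        printf 'opencode\n'
    elif command -v claude >/dev/null 2>&1; then
        printf 'claude\n'
    else
        echo "no supported AI CLI found" >&2
        return 1
    fi
}

# Single place that knows each CLI's argument shape.
# Prints one argument per line (assumes the prompt contains no newline).
build_cli_cmd() {
    cli=$1
    prompt=$2
    case "$cli" in
        opencode) printf '%s\n' opencode run "$prompt" ;;
        claude)   printf '%s\n' claude -p "$prompt" ;;
        *)        echo "unsupported CLI: $cli" >&2; return 1 ;;
    esac
}
```

A caller would run `build_cli_cmd "$(resolve_supervisor_cli)" "$prompt"` instead of repeating its own `if opencode ... else claude` branch, which is the duplication the audit found.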

Phase 1: Claude Code Config Parity in setup.sh (t1161) ~4h
Make aidevops setup and aidevops update deploy equivalent configuration to Claude Code.

  • t1161.1 generate-claude-commands.sh — slash commands to ~/.claude/commands/
  • t1161.2 Automated MCP registration via claude mcp add-json
  • t1161.3 Enhanced ~/.claude/settings.json with tool permissions (merge, don't overwrite hooks)
  • t1161.4 Wire update_claude_config() into setup.sh (conditional on claude binary)

Key design decisions:

  • Slash commands generated from same source as OpenCode commands, with minor format adaptation (OpenCode agent: Build+ frontmatter ignored by Claude Code)
  • MCP registration uses existing configs/mcp-templates/ claude_code_command entries
  • settings.json merge strategy: read existing, deep-merge new permissions, preserve hooks
  • Entire phase conditional on command -v claude — no-op if Claude Code not installed

Verification gate: Fresh aidevops setup on a machine with both CLIs produces working configs for both. Claude Code interactive session has slash commands and MCPs available.
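
The settings.json merge strategy and the `command -v claude` guard might be sketched as follows. Using jq for the merge is an assumption, as are the function names and the idea of passing the generated permissions as a separate JSON file.

```shell
#!/bin/sh
# Deep-merge new settings into existing ones and print the result.
# jq's `*` merges objects recursively; arrays are replaced, not concatenated.
# Existing hooks are restored after the merge so they are never overwritten.
merge_claude_settings() {
    existing=$1   # e.g. ~/.claude/settings.json
    incoming=$2   # generated permissions fragment (hypothetical file)
    jq -S -s '
        .[0] as $old | .[1] as $new
        | ($old * $new)
        | if ($old | has("hooks")) then .hooks = $old.hooks else . end
    ' "$existing" "$incoming"
}

# The whole config path is a no-op when Claude Code is not installed.
update_claude_settings() {
    command -v claude >/dev/null 2>&1 || return 0
    settings="$HOME/.claude/settings.json"
    [ -f "$settings" ] || { mkdir -p "$HOME/.claude" && echo '{}' > "$settings"; }
    tmp=$(mktemp) || return 1
    if merge_claude_settings "$settings" "$1" > "$tmp"; then
        mv "$tmp" "$settings"
    else
        rm -f "$tmp"
        return 1
    fi
}
```

Writing to a temporary file and moving it into place keeps settings.json intact if the merge fails. If permission arrays should be unioned rather than replaced, the jq filter would need an explicit array merge.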

Phase 2: Worker MCP Isolation for Claude CLI (t1162) ~2h
When dispatching workers via claude -p, provide equivalent MCP isolation to OpenCode's generate_worker_mcp_config().

  • t1162 Create generate_worker_mcp_config_claude() — builds temporary JSON for --mcp-config
  • Use --strict-mcp-config to prevent workers from using user's global MCP config
  • Cleanup: remove temp config files after worker exits

Verification gate: Worker dispatched via Claude CLI gets exactly the MCPs specified, not the user's full set.
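
A rough sketch of the Phase 2 flow, assuming the caller already has the per-worker server definitions as a JSON object. The `dispatch_claude_worker` wrapper and the temp-file naming are hypothetical; `--mcp-config` and `--strict-mcp-config` are the flags named in the plan.

```shell
#!/bin/sh
# Write a temporary MCP config holding only the servers this worker needs
# and print its path. $1 is the JSON object that maps server names to
# their definitions (the value of "mcpServers").
generate_worker_mcp_config_claude() {
    servers_json=$1
    [ -n "$servers_json" ] || servers_json='{}'
    config=$(mktemp "${TMPDIR:-/tmp}/worker-mcp.XXXXXX") || return 1
    printf '{"mcpServers": %s}\n' "$servers_json" > "$config"
    printf '%s\n' "$config"
}

# Run one worker with strict MCP isolation, then remove the temp file
# whether or not the worker succeeded.
dispatch_claude_worker() {
    prompt=$1
    servers_json=$2
    config=$(generate_worker_mcp_config_claude "$servers_json") || return 1
    status=0
    claude -p "$prompt" --mcp-config "$config" --strict-mcp-config || status=$?
    rm -f "$config"
    return "$status"
}
```

Passing an empty object yields a worker with no MCP servers at all, which is a convenient way to check that the user's global MCP config is not leaking in.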

Phase 3: OAuth-Aware Dispatch (t1163) ~2h
The value proposition: workers on Max subscription = no per-token cost for Anthropic models.

  • t1163 Detect OAuth: claude -p "OK" --output-format text succeeds without ANTHROPIC_API_KEY
  • SUPERVISOR_PREFER_OAUTH env var (default: true)
  • When true + dispatching Anthropic models + OAuth available → use claude CLI
  • When dispatching non-Anthropic models (OpenRouter, Groq, etc.) → always use opencode
  • Budget tracker: record Claude CLI dispatches as subscription billing type
  • Leverage --max-budget-usd for per-worker cost caps
  • Leverage --fallback-model for native fallback
  • Auth failure detection: if Claude CLI returns auth error, fall back to OpenCode + API key

Verification gate: Mixed batch with Anthropic + non-Anthropic tasks routes correctly. Anthropic tasks go via Claude CLI (OAuth), non-Anthropic via OpenCode. Auth failure triggers automatic fallback.
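
The routing rules above reduce to a small decision function. The sketch below assumes Anthropic models can be recognised by an `anthropic/` or `claude-` prefix, which is a guess at the naming scheme, and that OAuth availability has already been determined and is passed in as "true" or "false".

```shell
#!/bin/sh
# Hypothetical check: treat "anthropic/..." and "claude-..." model names
# as Anthropic models. The real naming scheme may differ.
is_anthropic_model() {
    case "$1" in
        anthropic/*|claude-*) return 0 ;;
        *) return 1 ;;
    esac
}

# Decide which CLI dispatches a task.
#   $1 = model name
#   $2 = "true" if OAuth was detected for the claude CLI
choose_dispatch_cli() {
    model=$1
    oauth_available=$2
    if [ "${SUPERVISOR_PREFER_OAUTH:-true}" = "true" ] \
        && [ "$oauth_available" = "true" ] \
        && is_anthropic_model "$model"; then
        printf 'claude\n'
    else
        printf 'opencode\n'
    fi
}
```

Every path that does not satisfy all three conditions falls through to opencode, so the existing behavior stays the default. The auth-failure fallback would sit one level up: if a claude dispatch returns an auth error, retry the same task through opencode with the API key.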

Phase 4: End-to-End Verification (t1164) ~2h
Comprehensive testing of the complete dual-CLI architecture before proceeding to containerization.

  • t1164 Full regression suite:
    • Pure OpenCode batch (existing behavior, must be identical)
    • Pure Claude CLI batch (all Anthropic models)
    • Mixed batch (Anthropic via Claude, non-Anthropic via OpenCode)
    • OAuth failure scenario (Claude CLI auth expires mid-batch → fallback to OpenCode)
    • Config parity check (both CLIs have equivalent slash commands, MCPs)
    • Cost tracking verification (subscription vs token billing recorded correctly)

Verification gate: All scenarios pass. No regressions to existing workflows. Cost tracking accurate.

Phase 5: Containerized Multi-Subscription Scaling (t1165) ~6h
Scale beyond a single subscription's rate limits by running Claude Code CLI instances in containers, each with its own OAuth token.

  • t1165.1 Container image design:
    • Base: Node.js LTS (Claude CLI requires Node)
    • Install: claude CLI, git, gh, core unix tools
    • Volume mounts: repo checkout (read-write), ~/.aidevops/agents/ (read-only)
    • Token injection: CLAUDE_CODE_OAUTH_TOKEN env var from claude setup-token
    • Permissions: --permission-mode bypassPermissions (trusted container)
    • No MCP servers inside container (injected via --mcp-config per dispatch)
  • t1165.2 Container pool manager:
    • container-pool-helper.sh [create|destroy|list|dispatch|health|scale]
    • Pool config: ~/.config/aidevops/container-pool.json (image, count, tokens, hosts)
    • Dispatch strategy: round-robin across healthy containers, skip rate-limited ones
    • Health checks: periodic claude -p "OK" inside each container
    • Rate limit tracking: per-container request count + 429 detection
    • Auto-scaling: spawn new containers when all existing ones are rate-limited
  • t1165.3 Remote container support:
    • OrbStack remote VMs or SSH to any Docker host
    • Tailscale for secure networking between hosts
    • Credential forwarding: OAuth tokens via encrypted env vars, never in image
    • Log collection: docker logs piped to supervisor log directory
    • Worktree sync: git push from host, git pull inside container (or bind mount for local)
  • t1165.4 Integration test: multi-container batch

Verification gate: Batch of 6+ tasks dispatched across 3+ containers. Each container uses its own OAuth token. Rate-limited containers are skipped. Logs aggregated correctly. Workers produce valid PRs.
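
The dispatch strategy for the pool manager (round-robin across healthy containers, skipping rate-limited ones) could be sketched like this. The `NAME=STATE` argument format and the `healthy` state label are invented for illustration; the real container-pool-helper.sh would presumably read state from container-pool.json.

```shell
#!/bin/sh
# Usage: next_container LAST NAME=STATE...
# Prints the next healthy container after LAST, wrapping around.
# Returns 1 when every container is rate-limited.
# Assumes container names contain no whitespace.
next_container() {
    last=$1
    shift
    before=""        # healthy containers up to and including LAST
    after=""         # healthy containers listed after LAST
    seen_last=false
    for entry in "$@"; do
        name=${entry%%=*}
        state=${entry#*=}
        if [ "$state" = "healthy" ]; then
            if [ "$seen_last" = true ]; then
                after="$after $name"
            else
                before="$before $name"
            fi
        fi
        if [ "$name" = "$last" ]; then
            seen_last=true
        fi
    done
    # Prefer the first healthy container after LAST, else wrap around.
    for name in $after $before; do
        printf '%s\n' "$name"
        return 0
    done
    return 1
}
```

A non-zero return is the natural trigger for the auto-scaling rule: when no healthy container is left, the pool manager would spawn a new one instead of dispatching.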

Risks

Risk | Likelihood | Impact | Mitigation
Claude CLI behavior differs from OpenCode in subtle ways | Medium | High | Phase 0 integration test (t1160.7) catches differences before production use
OAuth token expires mid-batch | Medium | Medium | Auth failure detection + automatic fallback to OpenCode + API key
Claude Code updates break our generated config | Low | Medium | update_claude_config() is idempotent, re-runs on every aidevops update
Claude Code rewrites "OpenCode" in AGENTS.md at load time | Confirmed | Low | Cosmetic only; doesn't affect functionality. Documented as known behavior.
Container networking issues (DNS, port conflicts) | Medium | Medium | OrbStack handles networking; fallback to host-only dispatch
Multiple subscriptions = multiple billing accounts to manage | Low | Low | Container pool config tracks which token belongs to which account
Rate limit changes by Anthropic | Low | High | Per-container rate tracking adapts automatically; pool manager skips limited containers

Plan: Decision Log

Date | Decision | Rationale
2026-02-18 | OpenCode stays primary, Claude Code is fallback | OpenCode supports multi-provider routing (OpenRouter, Groq, DeepSeek). Claude CLI is Anthropic-only. Keep the broader capability as primary.
2026-02-18 | Phase 0 is pure refactor with no behavior change | The 12+ duplicated CLI branches are a maintenance burden and bug risk. Centralizing into build_cli_cmd() is valuable regardless of Claude CLI support.
2026-02-18 | SUPERVISOR_CLI env var for explicit override | Auto-detection is the default, but operators need a way to force a specific CLI for testing or when both are installed but one is preferred.
2026-02-18 | Config parity is conditional on command -v claude | Users without Claude Code installed should not see errors or slowdowns. The entire Claude Code config path is a no-op if the binary is absent.
2026-02-18 | --strict-mcp-config for worker MCP isolation | Prevents workers from accidentally using the user's full MCP set. Each worker gets exactly the MCPs it needs, nothing more.
2026-02-18 | OAuth detection via test invocation, not token file inspection | Claude Code stores OAuth in the macOS keychain (not a file we can inspect). The only reliable test is whether claude -p succeeds without ANTHROPIC_API_KEY. Cache the result for the pulse cycle.
2026-02-18 | Containerization as Phase 5 (after everything else is tested) | Containers add complexity (networking, volume mounts, token management). Only pursue after the single-host dual-CLI path is proven stable.
2026-02-18 | CLAUDE_CODE_OAUTH_TOKEN env var for container auth | claude setup-token generates long-lived tokens specifically for headless/CI use. Each container gets a unique token from a separate subscription account.
2026-02-18 | OrbStack as container runtime | Already installed (v2.0.5), supports both local containers and remote VMs, lighter than Docker Desktop on macOS.
2026-02-18 | All tasks model:opus | Sensitive infrastructure work touching the dispatch core. Wrong decisions here break all autonomous orchestration. Opus-tier reasoning is warranted.
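
The OAuth-detection decision in the log (probe with claude -p, then cache the answer for the pulse cycle) might look like the sketch below. The cache-file mechanism and the OAUTH_CACHE_FILE variable are hypothetical; the plan only says the result should be cached.

```shell
#!/bin/sh
# Probe once per pulse cycle: does `claude -p` work without an API key?
# The result ("true" or "false") is cached in a file so later dispatches
# skip the probe. OAUTH_CACHE_FILE is a hypothetical name; the caller
# would delete the file at the start of each pulse cycle.
detect_claude_oauth() {
    cache=${OAUTH_CACHE_FILE:-"${TMPDIR:-/tmp}/supervisor-oauth-status"}
    if [ -f "$cache" ]; then
        cat "$cache"
        return 0
    fi
    result=false
    if command -v claude >/dev/null 2>&1; then
        # Run in a subshell so unsetting the key does not leak to the caller.
        if (unset ANTHROPIC_API_KEY; claude -p "OK" --output-format text) \
            >/dev/null 2>&1; then
            result=true
        fi
    fi
    printf '%s\n' "$result" > "$cache"
    printf '%s\n' "$result"
}
```

Caching matters because each probe is a real model invocation. Without the cache, every dispatch in a batch would pay that latency before routing.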

Plan: Discoveries

  • Claude Code CLI already supports inline agent definitions via --agents JSON --agent name — this is more flexible than OpenCode's file-based agent config for worker dispatch.
  • --output-format json returns total_cost_usd and full modelUsage breakdown per invocation — better cost tracking than OpenCode provides natively.
  • Claude Code rewrites "OpenCode" references to "Claude Code" when loading AGENTS.md files. This is a Claude Code behavior, not something in our codebase. The deployed file at ~/.aidevops/agents/AGENTS.md correctly says "OpenCode". Confirmed by comparing on-disk content vs system prompt content.
  • The t1022 revert (PR #1329, "t1022: Make AGENTS.md tool-agnostic") left a residual "Claude Code" reference in .agents/aidevops/architecture.md:44 that should be corrected.
  • claude setup-token is the key to containerized auth — generates long-lived tokens specifically for headless/CI environments, injected via CLAUDE_CODE_OAUTH_TOKEN env var.
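
For the cost-tracking discovery, a minimal sketch of reading total_cost_usd from the JSON result. It assumes jq is available on the supervisor host; the function name is hypothetical.

```shell
#!/bin/sh
# Read one `claude -p --output-format json` result on stdin and print
# its total_cost_usd field, or 0 when the field is missing.
extract_total_cost() {
    jq -r '.total_cost_usd // 0'
}

# Example (not run here):
#   claude -p "task prompt" --output-format json | extract_total_cost
```

The budget tracker could store this value alongside the billing type, so subscription dispatches and per-token dispatches remain comparable.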


Synced from TODO.md by issue-sync-helper.sh

Metadata

Labels

  • auto-dispatch: Auto-created from TODO.md tag
  • dispatched:opus: Task dispatched to opus model
  • implemented:opus: Task implemented by opus model
