
t1357: Mission system: autonomous long-running project orchestration #2494

@marcusquinn

Task ID: t1357 | Status: open | Estimate: ~28h (ai:20h test:5h read:3h) | Plan: p034
Logged: 2026-02-27
Tags: plan feature orchestration mission

Description

Mission system: autonomous long-running project orchestration. A /mission command and orchestration agent takes a high-level goal, decomposes it into milestones/features, manages resources (accounts, credentials, infrastructure), and drives autonomous execution over hours/days. Two modes: POC (skip ceremony, commit to main) and Full (standard worktree/PR/review). Budget analysis recommends outcome levels for given constraints. Missions start homeless in ~/.aidevops/missions/ and migrate to todo/missions/ when a repo exists. Self-organising folder structure with temporary agents/scripts. Inspired by Factory.ai Missions but extended to the full project lifecycle (research, procurement, communication, infrastructure).

Subtasks

  • t1357.1 Create mission state file template — templates/mission-template.md with YAML frontmatter (id, title, status, mode, budget, model_routing, preferences), milestone/feature tracking, resource requirements, budget tracking table, decision log, mission agents section. #auto-dispatch ~2h model:sonnet ref:GH#2495 pr:t1357.1: Create mission state file template #2507 completed:2026-02-27
  • t1357.2 Create /mission command — scripts/commands/mission.md with interactive scoping interview (reuses /define probe techniques), mode selection (POC/Full), budget input, constraint gathering, milestone decomposition using opus-tier reasoning, mission file creation, optional repo creation (aidevops init + git init). Headless mode for supervisor dispatch. #auto-dispatch ~6h model:opus ref:GH#2496 pr:t1357.2: Create /mission command #2508 completed:2026-02-27
  • t1357.3 Create mission orchestrator agent — agent doc with self-organisation guidance, file/folder management patterns, temporary agent creation (draft tier), improvement feedback to aidevops, reference patterns for using existing aidevops capabilities, research guidance for unknown domains. #auto-dispatch ~4h model:opus blocked-by:t1357.1 ref:GH#2497 pr:t1357.3: Add mission orchestrator agent for autonomous multi-day project execution #2510 completed:2026-02-27
  • t1357.4 Add POC mode to /full-loop — skip worktrees (commit to main for dedicated repos, single branch for existing repos), skip brief requirement, skip PR review loops, skip postflight, informal commits. Flag: --poc or detected from mission mode. #auto-dispatch ~2h model:sonnet blocked-by:t1357.2 ref:GH#2498
  • t1357.5 Integrate mission awareness into pulse supervisor — add "check active missions" phase to pulse cycle. For each active mission: check current milestone status, dispatch undispatched features, detect milestone completion and trigger validation, advance milestones, track budget spend. Mission features become regular TODO entries with mission:mNNN tag. #auto-dispatch ~4h model:opus blocked-by:t1357.2,t1357.3 ref:GH#2499 pr:t1357.3: Add mission orchestrator agent for autonomous multi-day project execution #2510 completed:2026-02-28
  • t1357.6 Create milestone validation worker — specialised worker dispatched after all features in a milestone complete. Pulls mission branch, runs full test suite + build, optionally runs Playwright browser tests (UI missions), reports pass/fail with specific issues, creates fix tasks on failure linked to milestone. #auto-dispatch ~4h model:sonnet blocked-by:t1357.5 ref:GH#2500 pr:t1357.6: Add milestone validation worker for mission orchestrator #2519 completed:2026-02-28
  • t1357.7 Implement budget analysis and recommendation engine — mission agent analyses likely outcomes for requested budget (time/money/tokens). Recommends budget scales: "For $X/Yh you get basic MVP; for $A/Bh you get production-ready with tests; for $C/Dh you get polished with docs and monitoring." Uses model routing cost data, pattern history for estimation, and task complexity heuristics. Integrates with budget-tracker-helper.sh. #auto-dispatch ~4h model:opus blocked-by:t1357.2 ref:GH#2501 pr:t1357.7: Add budget analysis and recommendation engine for mission system #2516 completed:2026-02-27
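
The t1357.1 template's YAML frontmatter can be pictured with a minimal Python sketch. The field names (id, title, status, mode, budget, model_routing, preferences) come from the subtask above, and `status: planning` matches the acceptance check further down; the `MissionFrontmatter` class, the value shapes, and the flat serialiser are hypothetical and do not reproduce templates/mission-template.md.

```python
import json
from dataclasses import dataclass, field


@dataclass
class MissionFrontmatter:
    """Hypothetical model of the mission.md frontmatter (field names from t1357.1)."""

    id: str
    title: str
    status: str = "planning"   # initial status assumed from the acceptance check
    mode: str = "full"         # "poc" or "full"
    budget: dict = field(default_factory=dict)          # e.g. {"usd": 200, "hours": 40}
    model_routing: dict = field(default_factory=dict)   # e.g. {"workers": "sonnet"}
    preferences: dict = field(default_factory=dict)

    def to_markdown(self) -> str:
        """Render a '---'-delimited YAML frontmatter block (one level of nesting)."""
        lines = [
            "---",
            f"id: {self.id}",
            f"title: {json.dumps(self.title)}",  # a JSON string is a valid YAML scalar
            f"status: {self.status}",
            f"mode: {self.mode}",
        ]
        for name in ("budget", "model_routing", "preferences"):
            mapping = getattr(self, name)
            if not mapping:
                lines.append(f"{name}: {{}}")
                continue
            lines.append(f"{name}:")
            lines.extend(f"  {key}: {value}" for key, value in mapping.items())
        lines.append("---")
        return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(MissionFrontmatter(id="m001", title="Build a CRM", mode="poc").to_markdown())
```

A real template would more likely lean on a YAML library for nested values; the hand-written renderer only keeps the sketch dependency-free.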

Plan: Purpose

Close the gap between "I have an idea" and "autonomous execution." Current aidevops handles task-level work (/full-loop) and supervisor dispatch (/pulse), but nothing takes a high-level goal and drives it to completion over days. Missions extend beyond code into research, procurement, infrastructure setup, and 3rd-party communication — making aidevops a true autonomous project agent.
Inspired by Factory.ai Missions (multi-day autonomous coding, Feb 2026) but significantly broader in scope. Factory solves "multi-day coding tasks." This solves "autonomous project lifecycle from idea to delivery."

Plan: Context & Architecture

Context

Key design decisions:

  • Mission state in git (markdown), not a database — consistent with "GitHub + TODO.md are the database" principle
  • Orchestrator as pulse extension, not separate daemon — avoids new process management
  • POC mode is a flag, not a separate system — same pipeline, fewer gates
  • Milestones sequential, features within milestones parallelisable — Factory found this works better than broad parallelism
  • One orchestrator layer, not recursive — Factory notes recursive depth as open question; one layer suffices for our scale
  • Budget analysis before execution — mission agent should tell you what you'll get for your budget before starting
  • Mission-specific agents are draft-tier — temporary tools, promoted if generally useful

Factory.ai Missions analysis (Feb 2026):

  • Median mission: ~2 hours. 65% run >1 hour. 37% run >4 hours. 14% run >24 hours.
  • Missions use ~1.7x the tokens per message of normal sessions (19K vs 11K)
  • Multi-model: orchestrator (opus), workers (sonnet), validators (varies), research (cheapest)
  • Key insight: "serial execution with targeted parallelization has worked better than broad parallelism"
  • Open questions they identified: parallelization balance, correctness over long horizons, worker scope, recursive management depth

What aidevops already has (strong overlap):

  • Worker dispatch, task decomposition, fresh context per worker, multi-model routing, git as source of truth, validation (preflight/postflight), failure recovery, skill/memory, browser QA, task briefs, worker efficiency, autonomous operation

What's genuinely new:

  • Mission-level orchestration (goal → milestones → features → validation → completion)
  • Milestone validation (pause after milestone N, validate integration, then proceed)
  • Mission state persistence (durable entity grouping tasks into a coherent goal)
  • Automatic re-planning (validation failure → create fix tasks)
  • POC/shortcut mode
  • Budget feasibility analysis and outcome-level recommendations
  • Self-organising mission folders
  • Autonomous procurement (payment agent)
  • 3rd-party communication (email agent)
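
The "milestones sequential, features parallelisable" decision, combined with milestone validation and automatic re-planning, can be sketched as a single orchestrator loop. This is a minimal illustration, not the pulse implementation: `run_feature`, `validate`, and `make_fix_tasks` are hypothetical callables standing in for worker dispatch, the validation worker, and fix-task creation, the thread pool is only a local stand-in for headless workers, and the retry cap is an assumption.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Sequence


def run_mission(
    milestones: Sequence[Sequence[str]],
    run_feature: Callable[[str], None],
    validate: Callable[[int], list[str]],
    make_fix_tasks: Callable[[int, list[str]], list[str]],
    max_fix_rounds: int = 3,   # assumption: cap on fix/re-validate loops
) -> list[int]:
    """Run milestones in order; features inside one milestone run in parallel.

    Returns the indices of milestones that passed validation. Stops at the
    first milestone that still fails after max_fix_rounds rounds of fixes.
    """
    passed: list[int] = []
    for index, features in enumerate(milestones):
        pending = list(features)
        for _ in range(max_fix_rounds + 1):
            # Targeted parallelism: only within the current milestone.
            with ThreadPoolExecutor() as pool:
                list(pool.map(run_feature, pending))  # list() surfaces worker errors
            issues = validate(index)
            if not issues:
                passed.append(index)
                break
            # Automatic re-planning: validation failures become fix tasks.
            pending = make_fix_tasks(index, issues)
        else:
            return passed  # milestone never validated: halt instead of advancing
    return passed
```

Keeping one loop at this level reflects the "one orchestrator layer" decision: workers never spawn their own sub-orchestrators.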

Architecture

/mission "Build a CRM with contacts, deals, and email"
    │
    ▼
Phase 1: SCOPING (interactive interview, opus-tier)
    ├── Goal, mode (POC/Full), budget, constraints, preferences
    ├── Existing repo / new repo / homeless (no repo yet)
    └── Budget analysis: "For $X you get Y; for $A you get B"
    │
    ▼
Phase 2: DECOMPOSITION (opus-tier)
    ├── Research phase (if needed)
    ├── 3-7 milestones (sequential)
    ├── 2-5 features per milestone (parallelisable)
    ├── Resource requirements (accounts, services, credentials)
    └── Creates mission.md + TODO entries + GitHub issues
    │
    ▼
Phase 3: EXECUTION (autonomous, pulse-integrated)
    ├── For each milestone (sequential):
    │   ├── Dispatch features as workers
    │   ├── Self-organise: create agents/scripts as needed
    │   ├── Track budget (time, money, tokens)
    │   └── On complete → milestone validation
    │       ├── Pass → advance
    │       └── Fail → create fix tasks, re-validate
    │
    ▼
Phase 4: COMPLETION
    ├── Final validation, budget reconciliation
    ├── Offer improvements back to aidevops
    └── Summary report
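
Phase 2's decomposition bounds (3-7 sequential milestones, 2-5 parallelisable features per milestone) are concrete enough to check mechanically. The sketch below is a hypothetical guard the orchestrator could run before writing mission.md; the function name and the list-of-lists input shape are assumptions.

```python
def check_decomposition(milestones: list[list[str]]) -> list[str]:
    """Return human-readable problems; an empty list means the plan fits.

    Bounds come from the Phase 2 diagram: 3-7 milestones, 2-5 features each.
    """
    problems: list[str] = []
    if not 3 <= len(milestones) <= 7:
        problems.append(f"expected 3-7 milestones, got {len(milestones)}")
    for number, features in enumerate(milestones, start=1):
        if not 2 <= len(features) <= 5:
            problems.append(
                f"milestone {number}: expected 2-5 features, got {len(features)}"
            )
    return problems


plan = [
    ["data model", "contacts CRUD"],
    ["deals pipeline", "deal stages", "reporting"],
    ["email sync", "email templates"],
]
print(check_decomposition(plan))  # → []
```
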

Plan: Progress
  • (2026-02-27) Phase 1: Foundation — template, command, orchestrator agent ~12h
    • t1357.1 Mission state file template ~2h
    • t1357.2 /mission command ~6h
    • t1357.3 Mission orchestrator agent ~4h
  • Phase 2: Execution modes — POC mode, pulse integration ~6h
    • t1357.4 POC mode in /full-loop ~2h
    • t1357.5 Pulse integration ~4h
  • Phase 3: Validation & budget — milestone validation, budget engine ~8h
    • t1357.6 Milestone validation worker ~4h
    • t1357.7 Budget analysis engine ~4h
  • Dependent features ~24h
    • t1358 Payment agent ~8h
    • t1359 Browser QA in validation ~4h
    • t1360 Email agent for missions ~4h
    • t1361 Skill learning ~4h
    • t1362 Progress dashboard ~4h

Plan: Decision Log

(To be populated during implementation)

Task Brief

t1357: Mission System — Autonomous Long-Running Project Orchestration

Origin

  • Created: 2026-02-27
  • Session: Claude Code interactive session
  • Created by: marcusquinn (human) + ai-interactive
  • Conversation context: Analysis of Factory.ai Missions (multi-day autonomous coding) led to a broader vision: an autonomous project agent that can research, procure, communicate, build, and self-organise across days/weeks. Not just "multi-day coding" but a full project lifecycle from idea to delivery.

What

A /mission command and mission orchestration agent that takes a high-level goal ("Build a CRM", "Migrate this codebase to TypeScript", "Research and prototype a recommendation engine"), decomposes it into milestones and features, manages resources (accounts, credentials, payments, infrastructure), and drives autonomous execution over hours to days — with two modes:

  1. POC mode — fast iteration, skip ceremony (briefs, PRs, reviews), commit to main or a single branch
  2. Full mode — production-quality with standard worktree/PR/review workflows
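
The difference between the two modes, as described for t1357.4, reduces to a small decision table. The sketch below models it in Python; `LoopGates`, `gates_for`, and the commit-target strings are hypothetical names, and in aidevops the mode would arrive via the `--poc` flag or the mission's own mode rather than a function argument.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoopGates:
    """Which /full-loop gates run for a worker (hypothetical model)."""

    use_worktree: bool
    require_brief: bool
    pr_review: bool
    postflight: bool
    commit_target: str


def gates_for(mode: str, dedicated_repo: bool) -> LoopGates:
    """Map a mission mode to /full-loop gates, following t1357.4.

    POC skips worktrees, the brief requirement, PR review loops and postflight.
    Commits land on main in a dedicated repo, or on a single branch in an
    existing repo. Full mode keeps every gate.
    """
    if mode == "full":
        return LoopGates(True, True, True, True, commit_target="worktree branch + PR")
    if mode == "poc":
        target = "main" if dedicated_repo else "single mission branch"
        return LoopGates(False, False, False, False, commit_target=target)
    raise ValueError(f"unknown mission mode: {mode!r}")


if __name__ == "__main__":
    print(gates_for("poc", dedicated_repo=True))
```

Modelling POC as gate values rather than a separate code path mirrors the "POC mode is a flag, not a separate system" decision.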

The mission agent must:

  • Analyse budget feasibility and recommend budget scales for various outcome levels
  • Self-organise its files and folders as needs are discovered
  • Create temporary agents and scripts for the mission, and offer improvements back to aidevops
  • Use browser automation for reviewing its own progress and visual research
  • Handle email, secrets, and account management for 3rd-party interactions
  • Know and respect budgets (time, money, tokens) with model provider options
  • Reference aidevops patterns for how to do things it already has working examples for
  • Research its own examples when aidevops doesn't have them
  • Know user preferences and constraints
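
Respecting budgets across time, money, and tokens implies tracking spend per dimension and alerting as limits approach. A minimal Python sketch, assuming alert points at 50/80/100% and free-form dimension keys (both hypothetical; the real integration point named in t1357.7 is budget-tracker-helper.sh, whose interface is not shown here):

```python
from dataclasses import dataclass, field


@dataclass
class BudgetTracker:
    """Track spend per dimension and report each threshold crossing once."""

    limits: dict[str, float]                         # e.g. {"usd": 200, "hours": 40}
    thresholds: tuple[float, ...] = (0.5, 0.8, 1.0)  # assumption: alert points
    spent: dict[str, float] = field(default_factory=dict)
    _alerted: set[tuple[str, float]] = field(default_factory=set, repr=False)

    def record(self, dimension: str, amount: float) -> list[str]:
        """Add spend for one dimension; return alerts for newly crossed thresholds."""
        if dimension not in self.limits:
            raise KeyError(f"no budget set for {dimension!r}")
        self.spent[dimension] = self.spent.get(dimension, 0.0) + amount
        used = self.spent[dimension] / self.limits[dimension]
        alerts = []
        for level in self.thresholds:
            if used >= level and (dimension, level) not in self._alerted:
                self._alerted.add((dimension, level))
                alerts.append(f"{dimension}: {used:.0%} of budget used (>= {level:.0%})")
        return alerts


if __name__ == "__main__":
    tracker = BudgetTracker(limits={"usd": 200, "hours": 40, "tokens": 2_000_000})
    print(tracker.record("usd", 110))  # → ['usd: 55% of budget used (>= 50%)']
```
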

Mission homes:

  • ~/.aidevops/missions/{id}/ — homeless missions (no repo yet, POC drafting)
  • todo/missions/{id}/ — missions attached to a project repo
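
Resolving the two mission homes and migrating a homeless mission into a repo can be sketched with `pathlib` and `shutil.move`. The function names and the `home` override (added only so the sketch is testable) are hypothetical, and committing the moved folder to git is deliberately left out.

```python
import shutil
from pathlib import Path
from typing import Optional


def mission_dir(mission_id: str, repo_root: Optional[Path] = None,
                home: Optional[Path] = None) -> Path:
    """Resolve a mission folder: homeless under ~/.aidevops/missions/,
    repo-attached under <repo>/todo/missions/."""
    if repo_root is None:
        return (home or Path.home()) / ".aidevops" / "missions" / mission_id
    return repo_root / "todo" / "missions" / mission_id


def attach_to_repo(mission_id: str, repo_root: Path,
                   home: Optional[Path] = None) -> Path:
    """Move a homeless mission into the repo once the repo exists.

    Committing the moved folder to git is left to the caller.
    """
    src = mission_dir(mission_id, home=home)
    dst = mission_dir(mission_id, repo_root=repo_root)
    if not src.is_dir():
        raise FileNotFoundError(src)
    if dst.exists():
        raise FileExistsError(dst)  # never clobber an already-attached mission
    dst.parent.mkdir(parents=True, exist_ok=True)
    shutil.move(str(src), str(dst))
    return dst
```
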

Mission folder structure:

{mission-id}/
├── mission.md          # State file (source of truth)
├── research/           # Gathered research, comparisons, references
├── agents/             # Mission-specific temporary agents
├── scripts/            # Mission-specific temporary scripts
└── assets/             # Screenshots, PDFs, exports, visual research
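
One way to read "self-organise its files and folders as needs are discovered" is to create only mission.md up front and add each subfolder the first time something is saved into it. The sketch below assumes that lazy behaviour; `init_mission` and `save_artifact` are hypothetical helpers, not existing aidevops scripts.

```python
from pathlib import Path

# Subfolder names from the mission folder structure above.
SUBDIRS = ("research", "agents", "scripts", "assets")


def init_mission(root: Path, frontmatter: str) -> Path:
    """Create the mission folder and mission.md; never overwrite existing state."""
    root.mkdir(parents=True, exist_ok=True)
    state = root / "mission.md"
    if not state.exists():
        state.write_text(frontmatter)
    return state


def save_artifact(root: Path, kind: str, name: str, content: str) -> Path:
    """Write a file into research/, agents/, scripts/ or assets/.

    The subfolder is created on first use, so the tree grows only as needs appear.
    """
    if kind not in SUBDIRS:
        raise ValueError(f"unknown artifact kind: {kind!r}")
    if name in ("", ".", "..") or Path(name).name != name:
        raise ValueError(f"artifact name must be a bare file name: {name!r}")
    folder = root / kind
    folder.mkdir(parents=True, exist_ok=True)
    path = folder / name
    path.write_text(content)
    return path
```
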

Why

Current aidevops handles task-level work (/full-loop) and supervisor-level dispatch (/pulse), but nothing takes a high-level goal and drives it to completion over days. The gap between "I have an idea" and "tasks are in TODO.md ready for dispatch" requires manual decomposition. Missions close this gap and extend beyond code into research, procurement, and infrastructure setup — making aidevops a true autonomous project agent.

Factory.ai Missions validates the market need, but its scope is narrower (coding only). Our vision includes the full project lifecycle.

How (Approach)

Phase 1: Foundation (t1357.1-t1357.3)

  • Create mission state file template (templates/mission-template.md)
  • Create /mission command (scripts/commands/mission.md) with interactive scoping interview
  • Create mission orchestrator agent doc with self-organisation guidance

Phase 2: Execution Modes (t1357.4-t1357.5)

  • Add POC mode to /full-loop (skip worktrees, skip review, commit to main/branch)
  • Integrate mission awareness into pulse supervisor
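
The "check active missions" pulse phase from t1357.5 can be sketched as two steps: collect TODO entries tagged `mission:mNNN`, then pick the next action for the current milestone. The TODO line shape (`- [ ] t1400 Title ... mission:m001`), the milestone-to-task mapping (presumably read from mission.md), and all function names are assumptions for illustration; budget tracking is omitted.

```python
import re

# Assumed TODO.md line shape: "- [ ] t1400 Title ... mission:m001"
TASK_RE = re.compile(r"^\s*- \[(?P<done>[ x])\] (?P<id>t\d+(?:\.\d+)*)\b(?P<rest>.*)$")
MISSION_TAG_RE = re.compile(r"\bmission:(m\d+)\b")


def mission_tasks(todo_text: str, mission_id: str) -> dict[str, bool]:
    """Map task id -> done flag for TODO entries tagged mission:<mission_id>."""
    tasks: dict[str, bool] = {}
    for line in todo_text.splitlines():
        match = TASK_RE.match(line)
        if not match:
            continue
        tag = MISSION_TAG_RE.search(match["rest"])
        if tag and tag.group(1) == mission_id:
            tasks[match["id"]] = match["done"] == "x"
    return tasks


def pulse_step(milestones: list[list[str]], tasks: dict[str, bool],
               dispatched: set[str], validated: set[int]) -> tuple[str, int, list[str]]:
    """Decide the next action for one active mission.

    Returns (action, milestone number starting at 1, task ids). Only the first
    milestone not yet validated is considered, so milestones stay sequential.
    """
    for number, feature_ids in enumerate(milestones, start=1):
        all_done = all(tasks.get(t, False) for t in feature_ids)
        if all_done and number in validated:
            continue                                # passed: advance
        if all_done:
            return ("validate", number, [])         # hand off to validation worker
        undispatched = [t for t in feature_ids
                        if not tasks.get(t, False) and t not in dispatched]
        return ("dispatch" if undispatched else "wait", number, undispatched)
    return ("complete", len(milestones), [])
```
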

Phase 3: Validation & Budget (t1357.6-t1357.7)

  • Create milestone validation worker
  • Implement budget analysis and recommendation engine (time/money/tokens)
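
The milestone validation worker (t1357.6) runs the full test suite and build, reports pass/fail with specific issues, and creates fix tasks on failure. A minimal Python sketch of that core loop follows; the check commands (for example `["npm", "test"]`) are project-specific assumptions, the fix-task title format is hypothetical, and the optional Playwright step is left out.

```python
import subprocess
from dataclasses import dataclass


@dataclass
class CheckResult:
    name: str
    ok: bool
    output: str


def validate_milestone(checks: dict[str, list[str]], cwd: str = ".") -> list[CheckResult]:
    """Run each check command (test suite, build, ...) and record pass/fail."""
    results = []
    for name, argv in checks.items():
        proc = subprocess.run(argv, cwd=cwd, capture_output=True, text=True)
        output = (proc.stdout + proc.stderr).strip()
        results.append(CheckResult(name, proc.returncode == 0, output))
    return results


def fix_task_titles(milestone: int, results: list[CheckResult]) -> list[str]:
    """Turn each failed check into a fix-task title linked to the milestone."""
    titles = []
    for result in results:
        if result.ok:
            continue
        last_line = result.output.splitlines()[-1] if result.output else "no output"
        titles.append(f"Fix {result.name} failure in milestone {milestone}: {last_line}")
    return titles
```
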

Dependent Features (t1358-t1362)

  • Payment agent for autonomous procurement
  • Mission-aware browser QA in milestone validation
  • Email agent for 3rd-party communication during missions
  • Mission skill learning (auto-capture reusable patterns)
  • Mission progress dashboard (CLI + browser)

Key patterns to follow:

  • scripts/commands/define.md — interview technique for scoping
  • scripts/commands/pulse.md — supervisor dispatch pattern
  • scripts/commands/full-loop.md — worker execution pattern
  • workflows/plans.md — planning and task decomposition
  • tools/build-agent/build-agent.md — agent creation lifecycle (draft tier)
  • reference/orchestration.md — model routing and dispatch

Key design decisions:

  • Mission state in git (markdown), not a database — consistent with "GitHub + TODO.md are the database"
  • Orchestrator as pulse extension, not separate daemon
  • POC mode is a flag, not a separate system
  • Milestones sequential, features within milestones parallelisable
  • One orchestrator layer (no recursive sub-orchestrators)
  • Missions start homeless in ~/.aidevops/missions/, migrate to todo/missions/ when a repo exists
  • Mission agents/scripts are temporary (draft tier), with promotion path to aidevops shared

Acceptance Criteria

  • /mission "description" starts an interactive scoping interview
    verify:
      method: bash
      run: "test -f ~/.aidevops/agents/scripts/commands/mission.md"
  • Mission state file created in correct location (homeless or repo-attached)
    verify:
      method: codebase
      pattern: "status: planning"
      path: "templates/mission-template.md"
  • POC mode commits directly to main (dedicated repo) or single branch (existing repo)
  • Full mode uses standard worktree + PR workflow
  • Budget analysis recommends outcome levels for given budget
  • Mission self-organises its folder (research/, agents/, scripts/, assets/)
  • Pulse supervisor dispatches mission features as workers
  • Milestone validation runs after all features in a milestone complete
  • Mission agents created in draft tier with promotion path
  • Budget tracking (time, money, tokens) with alerts at thresholds
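
The two `verify` methods used above (`bash` with a `run` command, `codebase` with a `pattern` and `path`) could be evaluated by a small runner. This sketch assumes `bash` passes on exit status 0 and treats `pattern` as a regular expression searched in the file at `path`; neither rule is specified in this issue, and `run_verify` is a hypothetical name.

```python
import re
import subprocess
from pathlib import Path


def run_verify(spec: dict[str, str], repo_root: Path) -> bool:
    """Evaluate one acceptance-criterion verify block.

    bash     -> run the command with bash; pass on exit status 0
    codebase -> pass if `pattern` (assumed regex) occurs in the file at `path`
    """
    method = spec["method"]
    if method == "bash":
        proc = subprocess.run(["bash", "-c", spec["run"]], cwd=repo_root,
                              capture_output=True)
        return proc.returncode == 0
    if method == "codebase":
        target = repo_root / spec["path"]
        return target.is_file() and re.search(spec["pattern"], target.read_text()) is not None
    raise ValueError(f"unsupported verify method: {method!r}")
```
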

Context & Decisions

  • Factory.ai Missions validated the concept but their scope is coding-only. Our vision extends to full project lifecycle (research, procurement, communication, infrastructure).
  • Milestones are sequential with parallel features within — Factory found "serial execution with targeted parallelization has worked better than broad parallelism."
  • One orchestrator layer, not recursive — Factory notes recursive management depth as an open question; for our scale, one layer suffices.
  • POC mode exists because most missions start as proof-of-concept. The ceremony of briefs/PRs/reviews is valuable for production work but counterproductive for exploration.
  • Budget analysis is critical — the mission agent should tell you "for $200 and 40h, you'll get X; for $500 and 80h, you'll get Y" before starting.
  • Mission-specific agents are draft-tier by design. They're temporary tools for the mission. If they prove generally useful, they get promoted to custom/ or shared/.
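
The "for $200 and 40h you'll get X; for $500 and 80h you'll get Y" output can be sketched once each outcome tier has cost and time estimates. Per t1357.7 those estimates would come from model routing cost data, pattern history, and complexity heuristics; here they are plain inputs, and `OutcomeTier`, `recommend`, and `best_within` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class OutcomeTier:
    name: str
    est_usd: float
    est_hours: float


def recommend(tiers: list[OutcomeTier], budget_usd: float,
              budget_hours: float) -> list[str]:
    """Describe every tier, cheapest first, and flag which fit the budget."""
    lines = []
    for tier in sorted(tiers, key=lambda t: (t.est_usd, t.est_hours)):
        fits = tier.est_usd <= budget_usd and tier.est_hours <= budget_hours
        lines.append(
            f"For ${tier.est_usd:.0f}/{tier.est_hours:.0f}h you get {tier.name}"
            f" ({'fits' if fits else 'over budget'})"
        )
    return lines


def best_within(tiers: list[OutcomeTier], budget_usd: float,
                budget_hours: float) -> Optional[OutcomeTier]:
    """Return the most ambitious tier that fits both limits, or None."""
    fitting = [t for t in tiers
               if t.est_usd <= budget_usd and t.est_hours <= budget_hours]
    return max(fitting, key=lambda t: (t.est_usd, t.est_hours), default=None)
```

Listing every tier, including those over budget, matches the intent of showing the user what a larger budget would buy before execution starts.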

Relevant Files

  • scripts/commands/define.md — interview pattern to reuse for mission scoping
  • scripts/commands/pulse.md — supervisor dispatch to extend with mission awareness
  • scripts/commands/full-loop.md — worker execution to add POC mode
  • workflows/plans.md — planning patterns for milestone/feature decomposition
  • templates/brief-template.md — brief format (used in full mode, skipped in POC)
  • tools/build-agent/build-agent.md — agent lifecycle tiers (draft for mission agents)
  • reference/orchestration.md — model routing for mission orchestrator/workers
  • tools/ai-assistants/headless-dispatch.md — worker dispatch patterns
  • tools/browser/browser-automation.md — browser QA for milestone validation
  • services/email/ — email capabilities for 3rd-party communication
  • tools/credentials/ — secret management for mission accounts

Dependencies

  • Blocked by: None (greenfield)
  • Blocks: None directly, but enables a new class of autonomous work
  • External: None for MVP; payment agent (t1358) needs virtual card provider; email agent (t1360) needs SES or similar configured

Estimate Breakdown

| Phase | Time | Notes |
| --- | --- | --- |
| Research/read | 2h | Existing patterns, Factory analysis |
| t1357.1 Mission template | 2h | State file format |
| t1357.2 /mission command | 6h | Interactive scoping + decomposition |
| t1357.3 Mission orchestrator agent | 4h | Self-organisation, guidance |
| t1357.4 POC mode in /full-loop | 2h | Skip ceremony flags |
| t1357.5 Pulse integration | 4h | Mission-aware dispatch |
| t1357.6 Milestone validation | 4h | Integration testing worker |
| t1357.7 Budget analysis engine | 4h | Feasibility + recommendations |
| Total | ~28h | |

| Dependent features | Time | Notes |
| --- | --- | --- |
| t1358 Payment agent | 8h | Virtual cards, budget enforcement |
| t1359 Mission browser QA | 4h | Visual validation in milestones |
| t1360 Email agent for missions | 4h | 3rd-party communication |
| t1361 Mission skill learning | 4h | Auto-capture reusable patterns |
| t1362 Mission progress dashboard | 4h | CLI + browser progress view |
| Dependent total | ~24h | |
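
The row totals can be checked with a few lines of Python. The hours are copied from the estimate breakdown above; nothing new is assumed. The ~28h core total also matches the ai:20h + test:5h + read:3h split in the task header.

```python
# Hours per row, copied from the estimate breakdown above.
core = {
    "Research/read": 2,
    "t1357.1 Mission template": 2,
    "t1357.2 /mission command": 6,
    "t1357.3 Mission orchestrator agent": 4,
    "t1357.4 POC mode in /full-loop": 2,
    "t1357.5 Pulse integration": 4,
    "t1357.6 Milestone validation": 4,
    "t1357.7 Budget analysis engine": 4,
}
dependent = {
    "t1358 Payment agent": 8,
    "t1359 Mission browser QA": 4,
    "t1360 Email agent for missions": 4,
    "t1361 Mission skill learning": 4,
    "t1362 Mission progress dashboard": 4,
}
print(sum(core.values()), sum(dependent.values()))  # → 28 24
```
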

Synced from TODO.md by issue-sync-helper.sh

Metadata

Assignees: No one assigned
Labels: enhancement, mission, orchestration, plan (auto-created from TODO.md tags); status:blocked (waiting on blocker task)
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
