t1359: Mission-aware browser QA in milestone validation #2503

@marcusquinn

Task ID: t1359 | Status: open | Estimate: ~4h
Logged: 2026-02-27
Tags: feature browser mission auto-dispatch

Description

Mission-aware browser QA in milestone validation: extend the milestone validation worker (t1357.6) with Playwright-based visual testing. The worker launches the app, navigates flows, screenshots key pages, and compares them against the acceptance criteria to detect layout bugs, broken links, and missing content. It integrates with the existing browser automation stack (Playwright > Stagehand > DevTools).

Blocked by: t1357.6

Task Brief

t1359: Mission-Aware Browser QA in Milestone Validation

Origin

  • Created: 2026-02-28
  • Session: Claude Code headless worker
  • Created by: ai-worker (dispatched from issue t1359: Mission-aware browser QA in milestone validation #2503)
  • Conversation context: Part of the features that depend on the mission system (t1357). The milestone validation worker (t1357.6) runs tests/build/lint after milestone features complete, but has only a stub for browser testing: it runs the project's existing Playwright test suite if present. t1359 adds mission-aware visual QA: launch the app, navigate flows defined in the mission's acceptance criteria, screenshot key pages, and detect layout bugs, broken links, and missing content.

What

A browser-qa-worker.sh script and workflows/browser-qa.md documentation that provide Playwright-based visual QA for milestone validation. The worker:

  1. Launches the app (or connects to a running instance at a given URL)
  2. Navigates key user flows (from mission acceptance criteria or a configurable flow list)
  3. Screenshots every page visited (stored in {mission-dir}/assets/qa/)
  4. Checks for broken links (HTTP status on all <a> hrefs)
  5. Checks for missing/error content (empty body, error states, 404 pages)
  6. Detects console errors (JS exceptions, failed network requests)
  7. Captures ARIA snapshots for structural validation
  8. Outputs a structured pass/fail report with screenshot paths
  9. Integrates with milestone-validation-worker.sh via a new --browser-qa flag
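As an illustration of step 4, the broken-link check could be sketched as below. This is a minimal sketch, not the worker's actual code: the function names (is_broken_status, link_status, report_broken_links) are hypothetical, the 10-second timeout is an arbitrary choice, and it assumes the hrefs have already been extracted from the page and are supplied one per line on stdin.

```shell
# Returns 0 (true) when an HTTP status code should be reported as broken.
is_broken_status() {
  local code=$1
  case "$code" in
    2??|3??) return 1 ;;  # success or redirect: not broken
    *) return 0 ;;        # 4xx, 5xx, or 000 (curl received no response)
  esac
}

# Prints the HTTP status code for one URL.
# curl writes 000 when it cannot connect or the request times out.
link_status() {
  local url=$1
  curl --silent --output /dev/null --write-out '%{http_code}' \
    --max-time 10 "$url" || true
}

# Reads hrefs on stdin (one per line) and prints "STATUS URL"
# for every link that looks broken.
report_broken_links() {
  local url code
  while IFS= read -r url; do
    [[ -z "$url" ]] && continue
    code=$(link_status "$url")
    if is_broken_status "$code"; then
      printf '%s %s\n' "$code" "$url"
    fi
  done
}
```

Treating redirects as healthy is a judgment call; a stricter check could pass --location to curl and classify only the final status.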

Why

The existing --browser-tests flag in milestone-validation-worker.sh only runs the project's own Playwright test suite (if it has one). Many projects — especially POC-mode missions — won't have a pre-written test suite. The browser QA worker provides zero-config visual validation: give it a URL and acceptance criteria, and it checks that the app renders correctly, links work, and key flows are navigable. This closes the gap between "tests pass" and "the app actually works visually."

How (Approach)

New files

  1. .agents/scripts/browser-qa-worker.sh (~400-600 lines)

    • Uses Playwright via npx playwright (Node.js) for headless browser automation
    • Generates a temporary .mjs script that Node.js runs, using the Playwright API
    • Checks: page loads, screenshots, broken links, console errors, ARIA snapshots
    • Configurable via CLI flags and mission file criteria
    • Outputs JSON report + human-readable summary
  2. .agents/workflows/browser-qa.md (~100-150 lines)

    • Documents the browser QA system
    • Integration guide with milestone validation
    • Configuration options and examples

Modified files

  1. .agents/scripts/milestone-validation-worker.sh

    • Add --browser-qa flag (distinct from existing --browser-tests)
    • Add --browser-qa-flows flag for custom flow definitions
    • Add run_browser_qa() function that invokes browser-qa-worker.sh
    • Keep existing run_browser_tests() unchanged (backward compatible)
  2. .agents/workflows/milestone-validation.md

    • Add browser QA to the "Optional Checks" table
    • Document the new flags
  3. .agents/subagent-index.toon

    • Add browser-qa-worker.sh to scripts section
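The changes to milestone-validation-worker.sh might look roughly like the sketch below. The flag names --browser-qa, --browser-qa-flows and --browser-url, and the worker's --url flag, come from this brief. Everything else is an assumption made for illustration: the variable names, the BROWSER_QA_WORKER override, and the worker's --flows flag are hypothetical.

```shell
BROWSER_QA=false
BROWSER_QA_FLOWS=""
BROWSER_URL=""

# Parse only the flags relevant to browser QA.
parse_args() {
  while [[ $# -gt 0 ]]; do
    case "$1" in
      --browser-qa)
        BROWSER_QA=true
        shift
        ;;
      --browser-qa-flows)
        [[ $# -ge 2 ]] || { echo "missing value for $1" >&2; return 2; }
        BROWSER_QA_FLOWS=$2
        shift 2
        ;;
      --browser-url)
        [[ $# -ge 2 ]] || { echo "missing value for $1" >&2; return 2; }
        BROWSER_URL=$2
        shift 2
        ;;
      *)
        shift  # other flags are assumed to be handled elsewhere
        ;;
    esac
  done
}

# Delegate to browser-qa-worker.sh; run_browser_tests() is left untouched.
run_browser_qa() {
  # BROWSER_QA_WORKER is a hypothetical override, useful for testing.
  local worker=${BROWSER_QA_WORKER:-$HOME/.aidevops/agents/scripts/browser-qa-worker.sh}
  local -a args
  if [[ "$BROWSER_QA" != true ]]; then
    return 0  # browser QA stays opt-in
  fi
  if [[ ! -x "$worker" ]]; then
    echo "browser-qa-worker.sh not found or not executable: $worker" >&2
    return 1
  fi
  args=(--url "$BROWSER_URL")
  if [[ -n "$BROWSER_QA_FLOWS" ]]; then
    args+=(--flows "$BROWSER_QA_FLOWS")  # --flows is a hypothetical worker flag
  fi
  "$worker" "${args[@]}"
}
```

Keeping the delegation in its own function means the existing --browser-tests path does not change.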

Key design decisions

  • Separate script, not inline: Browser QA is complex enough to warrant its own script. The milestone validation worker delegates to it, keeping both scripts focused.
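  • Playwright via npx: No new dependencies, since Playwright is already in the browser automation stack. Uses npx playwright, which fetches the Playwright package on demand if it is not installed locally. Browser binaries may still need a one-time npx playwright install.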
  • Temporary .mjs script: The QA logic runs as a Node.js script generated by the shell wrapper. This gives full Playwright API access while keeping the orchestration in bash.
  • Zero-config default: With just --browser-qa --browser-url http://localhost:3000, the worker crawls the homepage, checks links, screenshots pages, and reports errors. Custom flows are optional.
  • Mission-aware: When invoked from milestone validation, reads the mission file's acceptance criteria to determine what flows to test.
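To make the "temporary .mjs script" decision concrete, here is a minimal sketch of how the shell wrapper could generate a per-run script. The function names (js_string, generate_qa_script), the work_dir parameter, and the home.png filename are hypothetical. The escaping is deliberately simplified, and the generated JavaScript covers only one page load, a screenshot, and console/network error capture, using the Playwright calls chromium.launch, page.on, page.goto and page.screenshot.

```shell
# Escape a value as a double-quoted JavaScript string literal.
# Simplification: handles backslashes and double quotes only, not newlines.
js_string() {
  local s=$1
  s=${s//\\/\\\\}
  s=${s//\"/\\\"}
  printf '"%s"' "$s"
}

# Write a per-run Playwright script and print its path.
# work_dir should be a directory from which Node.js can resolve 'playwright'.
generate_qa_script() {
  local url=$1 out_dir=$2 work_dir=$3 tmp script
  tmp=$(mktemp "${work_dir%/}/browser-qa.XXXXXX") || return 1
  script="${tmp}.mjs"
  mv "$tmp" "$script"
  cat >"$script" <<EOF
import { chromium } from 'playwright';

const targetUrl = $(js_string "$url");
const outDir = $(js_string "$out_dir");
const errors = [];

const browser = await chromium.launch();
const page = await browser.newPage();
// Collect console errors, uncaught exceptions, and failed requests.
page.on('console', (msg) => { if (msg.type() === 'error') errors.push(msg.text()); });
page.on('pageerror', (err) => errors.push(err.message));
page.on('requestfailed', (req) => errors.push('request failed: ' + req.url()));

await page.goto(targetUrl, { waitUntil: 'load' });
await page.screenshot({ path: outDir + '/home.png', fullPage: true });
await browser.close();

console.log(JSON.stringify({ url: targetUrl, consoleErrors: errors }));
EOF
  printf '%s\n' "$script"
}
```

A caller would typically run the printed path with Node.js and delete the file afterwards, for example from a trap on EXIT. Writing the file under work_dir rather than the system temp directory is one way to keep the playwright import resolvable.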

Acceptance Criteria

  • browser-qa-worker.sh --url http://localhost:3000 runs headless Playwright, screenshots the page, checks for console errors and broken links
    verify:
      method: bash
      run: "test -f ~/.aidevops/agents/scripts/browser-qa-worker.sh && bash ~/.aidevops/agents/scripts/browser-qa-worker.sh --help | grep -q 'browser-qa-worker'"
  • milestone-validation-worker.sh accepts --browser-qa flag and delegates to browser-qa-worker.sh
    verify:
      method: bash
      run: "grep -q 'browser-qa' ~/.aidevops/agents/scripts/milestone-validation-worker.sh"
  • Screenshots are saved to configurable output directory (default: {mission-dir}/assets/qa/)
  • Broken link detection reports HTTP status codes for all <a> hrefs on visited pages
  • Console error detection captures JS exceptions and failed network requests
  • ShellCheck passes on all new/modified .sh files with zero violations
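The output-directory criterion can be illustrated with a small sketch. The default of {mission-dir}/assets/qa/ is from the criteria above. The function names and the screenshot naming scheme (a filename derived from the page URL) are assumptions for illustration only.

```shell
# Print the screenshot directory: an explicit override wins,
# otherwise fall back to {mission-dir}/assets/qa.
resolve_output_dir() {
  local mission_dir=$1 override=${2:-}
  if [[ -n "$override" ]]; then
    printf '%s\n' "$override"
  else
    printf '%s\n' "${mission_dir%/}/assets/qa"
  fi
}

# Derive a filesystem-safe screenshot path from a page URL (hypothetical scheme).
screenshot_path() {
  local out_dir=$1 page_url=$2 slug
  slug=${page_url#*://}           # drop the scheme
  slug=${slug//[^A-Za-z0-9]/-}    # replace anything that is not alphanumeric
  slug=${slug%-}                  # trim one trailing dash left by a final "/"
  printf '%s/%s.png\n' "${out_dir%/}" "$slug"
}
```

For example, screenshot_path "missions/m1/assets/qa" "http://localhost:3000/about" prints missions/m1/assets/qa/localhost-3000-about.png.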

Context & Decisions

  • The existing --browser-tests flag runs the project's own Playwright test suite. --browser-qa is complementary — it provides generic visual QA that works even without a test suite.
  • Playwright is the fastest browser tool in the stack (about 1.4s for navigate plus screenshot). Stagehand would add AI overhead (about 7s) that deterministic checks do not need.
  • ARIA snapshots (~0.01s, 50-200 tokens) are preferred over screenshots for structural validation. Screenshots are captured for human review but not used for automated pass/fail decisions.
  • The worker generates a temporary .mjs file rather than shipping a permanent JS file because the QA logic needs to be parameterised per-run (URLs, flows, output paths).
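The split between automated checks and human-review artifacts could be reflected in the report roughly as follows. This is a sketch under stated assumptions: the function names and JSON field names are hypothetical, and string values are not JSON-escaped. The verdict depends only on error counts, while the screenshot directory is reported for reviewers and never affects pass/fail.

```shell
# Print "pass" or "fail" from the deterministic check counts only.
qa_verdict() {
  local console_errors=$1 broken_links=$2 content_errors=$3
  if (( console_errors + broken_links + content_errors > 0 )); then
    echo fail
  else
    echo pass
  fi
}

# Print a one-line JSON summary. Screenshots are referenced for human
# review but play no part in the verdict.
qa_report_json() {
  local url=$1 console_errors=$2 broken_links=$3 content_errors=$4 screenshot_dir=$5
  local verdict
  verdict=$(qa_verdict "$console_errors" "$broken_links" "$content_errors")
  printf '{"url":"%s","verdict":"%s","console_errors":%d,"broken_links":%d,"content_errors":%d,"screenshot_dir":"%s"}\n' \
    "$url" "$verdict" "$console_errors" "$broken_links" "$content_errors" "$screenshot_dir"
}
```

A milestone validation caller could then map a "fail" verdict to a non-zero exit code.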

Relevant Files

  • scripts/milestone-validation-worker.sh — existing validation worker to extend
  • workflows/milestone-validation.md — existing validation docs to update
  • workflows/mission-orchestrator.md — orchestrator that invokes validation
  • tools/browser/browser-automation.md — browser tool selection guide
  • tools/browser/playwright.md — Playwright reference
  • scripts/accessibility/playwright-contrast.mjs — existing Playwright script pattern

Dependencies

  • Blocked by: t1357.6 (milestone validation worker) — MERGED
  • Blocks: None
  • External: Playwright (npx playwright) — already in the stack

Synced from TODO.md by issue-sync-helper.sh

Metadata

Labels

  • auto-dispatch: Auto-created from TODO.md tag
  • browser: Auto-created from TODO.md tag
  • enhancement: Auto-created from TODO.md tag
  • mission: Auto-created from TODO.md tag
  • status:done: Task is complete
