Quality Gates

Quality gates prevent agents from degrading codebase health. They address the "diffusion of responsibility" problem where every agent assumes test/typecheck/lint failures are pre-existing and ignores them, leading to cumulative quality drift.

The system has three layers:

Baseline-Diff Gate — captures quality metrics before the agent starts, compares after, blocks promotion if the agent made things worse
Quality Ratchet — monotonic thresholds committed to the repo that can only tighten (improve), never loosen
Prompt-Level Guidance — TDD workflow, boy scout rule, and self-check instructions baked into agent templates

How It Works

┌──────────────────────────────────────────────────────────────────┐
│                    Agent Session Lifecycle                        │
│                                                                  │
│  1. createWorktree()                                             │
│     └─ Capture quality baseline on main                          │
│        (test counts, typecheck errors, lint errors)              │
│        → .agent/quality-baseline.json                            │
│                                                                  │
│  2. spawnAgent()                                                 │
│     └─ Inject baseline into template context                     │
│        Agent sees: "You started with 0 failing tests,            │
│        3 type errors. Don't make it worse."                      │
│                                                                  │
│  3. Agent works (TDD: red → green → refactor)                    │
│     └─ Self-checks delta before committing                       │
│                                                                  │
│  4. Post-session backstop                                        │
│     └─ Quality delta check:                                      │
│        captureCurrentQuality() → computeQualityDelta()           │
│        If delta > 0 → BLOCK promotion, post diagnostic           │
│        If delta < 0 → Log boy-scout improvement                  │
│                                                                  │
│  5. QA agent (baseline-aware)                                    │
│     └─ Distinguishes agent-caused vs pre-existing failures       │
│                                                                  │
│  6. Merge queue                                                  │
│     └─ Quality ratchet check before merge                        │
│        If thresholds violated → reject                           │
│        If metrics improved → tighten ratchet                     │
│                                                                  │
│  7. CI                                                           │
│     └─ quality-ratchet job validates every PR                    │
└──────────────────────────────────────────────────────────────────┘

Configuration

Enable quality gates in .agentfactory/config.yaml:

quality:
  baselineEnabled: true    # Capture baseline + post-session delta check
  ratchetEnabled: true     # Enforce quality ratchet in merge queue
  boyscoutRule: true       # Include "fix issues in files you touch" instructions
  tddWorkflow: true        # Include red/green/refactor workflow instructions

All settings default to false/true respectively. The quality section is optional — omitting it disables baseline capture and ratchet enforcement while keeping prompt-level guidance enabled.

Baseline-Diff Gate

What It Captures

The orchestrator runs three commands on the worktree and parses their output:

Metric	Command	Parser
Test counts (total, passed, failed, skipped)	`{testCommand} -- --reporter=json`	Vitest/Jest JSON output
Typecheck error count	`{validateCommand}`	Count `error TS\d+` lines in stderr
Lint error count	`{lintCommand}` (if configured)	ESLint summary line or error line count

Commands are resolved from the repository config (testCommand, validateCommand) or default to pnpm test, pnpm typecheck.

When It Runs

Baseline capture happens once per worktree, at the end of createWorktree(), after the worktree is initialized on main. The result is stored in .agent/quality-baseline.json inside the worktree.

Delta check happens after the session backstop, before auto-transition. The orchestrator re-runs the same commands on the agent's branch, computes the delta, and decides:

Delta	Result
All metrics same or better	Gate passes, promotion proceeds
Any failure metric increased	Gate fails, `agent.status = 'failed'`, diagnostic comment posted
Metrics improved	Gate passes, improvement logged

Delta Computation

testFailuresDelta  = current.tests.failed    - baseline.tests.failed
typeErrorsDelta    = current.typecheck.errors - baseline.typecheck.errors
lintErrorsDelta    = current.lint.errors      - baseline.lint.errors
testCountDelta     = current.tests.total      - baseline.tests.total

passed = testFailuresDelta <= 0
      && typeErrorsDelta   <= 0
      && lintErrorsDelta   <= 0

A negative testCountDelta (tests were removed) generates a warning in the report but does not fail the gate.

Failure Handling

When the quality gate fails, the orchestrator:

Posts a markdown comment to the issue with a comparison table
Sets agent.status = 'failed' and agent.workResult = 'failed'
The auto-transition block transitions the issue to the failure status
The escalation governor triggers a retry — the retry agent receives the baseline in its prompt and knows not to regress

Graceful Degradation

If baseline capture fails during worktree creation, a warning is logged but the worktree is created normally. The agent simply won't have baseline data in its prompt and no post-session delta check runs.
If the post-session delta check fails (e.g., test command hangs), the error is caught and logged. The session proceeds without quality gating.
If no .agent/quality-baseline.json exists (older worktree, manual worktree), the delta check is skipped.

Quality Ratchet

The quality ratchet is a committed JSON file that stores the best-known quality thresholds. It lives at .agentfactory/quality-ratchet.json:

{
  "version": 1,
  "updatedAt": "2026-03-30T00:00:00Z",
  "updatedBy": "SUP-456",
  "thresholds": {
    "testCount": { "min": 847 },
    "testFailures": { "max": 0 },
    "typecheckErrors": { "max": 12 },
    "lintErrors": { "max": 34 }
  }
}

Rules

Threshold	Direction	Meaning
`testCount.min`	Can only go up	Agents cannot delete tests without adding replacements
`testFailures.max`	Can only go down	Each merge that fixes a failure tightens the floor
`typecheckErrors.max`	Can only go down	Type errors ratchet toward zero
`lintErrors.max`	Can only go down	Lint errors ratchet toward zero

Enforcement Points

Merge queue (merge-worker.ts): After tests pass but before finalize, the worker loads the ratchet, checks thresholds, and rejects the merge if any are violated. If metrics improved, it auto-tightens the ratchet and commits the updated file (included in the merge).

CI (.github/workflows/ci.yml): The quality-ratchet job runs on every PR. It reads the ratchet file, runs test/typecheck, and fails if thresholds are violated. This job only runs when the ratchet file exists (if: hashFiles(...)).

Initializing the Ratchet

To create the initial ratchet file, use the initializeQualityRatchet() function from @renseiai/agentfactory:

import { captureQualityBaseline, initializeQualityRatchet } from '@renseiai/agentfactory'

const baseline = captureQualityBaseline('/path/to/repo', { packageManager: 'pnpm' })
const ratchet = initializeQualityRatchet('/path/to/repo', baseline)
// Commit .agentfactory/quality-ratchet.json

Or capture baseline metrics manually and create the file by hand.

Prompt-Level Guidance

TDD Workflow

Development agents receive explicit TDD instructions:

Write failing tests for the acceptance criteria (RED)
Implement the minimum code to pass (GREEN)
Refactor, keeping tests green (REFACTOR)
Run the full suite before committing

This is injected via the {{> partials/quality-baseline}} partial, conditional on qualityBaseline being present in the template context.

Boy Scout Rule

Agents are instructed: "If you modify a file that has pre-existing lint warnings, type errors, or failing tests, fix them while you are there." This is scoped to files the agent is already touching to prevent merge conflicts with concurrent agents.

Self-Check Before Commit

The commit-push-pr partial includes a quality delta self-check step that reminds the agent to verify it hasn't worsened metrics before committing. This catches regressions early, saving a full QA cycle.

QA Baseline Awareness

QA agents receive the same baseline data and are instructed to distinguish agent-caused failures from pre-existing ones:

If failures exceed baseline → HARD FAIL (agent made it worse)
If failures equal baseline → pre-existing, do not fail QA for these

Baseline vs Ratchet

These are complementary, not alternatives:

Aspect	Baseline-Diff Gate	Quality Ratchet
Scope	Per-session	Repository-wide
Prevents	Agent making things worse	Cumulative drift across all agents
Baseline source	Captured at session start	Committed in repo
Update frequency	Every session	On merge (auto-tighten)
Enforcement	Orchestrator post-session	Merge queue + CI

Use both: the baseline-diff catches regressions in-session, and the ratchet prevents drift across many agents and branches.

Key Files

File	Purpose
`packages/core/src/orchestrator/quality-baseline.ts`	Capture metrics, compute deltas, format reports
`packages/core/src/orchestrator/quality-ratchet.ts`	Load, check, update, initialize ratchet file
`packages/core/src/templates/defaults/partials/quality-baseline.yaml`	TDD + boy scout + baseline instructions
`packages/core/src/orchestrator/orchestrator.ts`	Integration: capture, inject, check
`packages/core/src/merge-queue/merge-worker.ts`	Ratchet enforcement at merge time
`packages/core/src/config/repository-config.ts`	`quality` config section schema
`.agentfactory/config.yaml`	Per-repo quality gate settings
`.agentfactory/quality-ratchet.json`	Committed ratchet thresholds
`.github/workflows/ci.yml`	`quality-ratchet` CI job

API Reference

`captureQualityBaseline(worktreePath, config?)`

Runs test/typecheck/lint commands and returns structured metrics.

interface QualityBaseline {
  timestamp: string
  commitSha: string
  tests: { total: number; passed: number; failed: number; skipped: number }
  typecheck: { errorCount: number; exitCode: number }
  lint: { errorCount: number; warningCount: number }
}

`computeQualityDelta(baseline, current)`

Pure arithmetic comparison. Returns { passed: boolean, testFailuresDelta, typeErrorsDelta, lintErrorsDelta, testCountDelta }.

`formatQualityReport(baseline, current, delta)`

Markdown table comparing baseline vs current with pass/fail badge.

`loadQualityRatchet(repoRoot)`

Reads and validates .agentfactory/quality-ratchet.json. Returns null if not found.

`checkQualityRatchet(ratchet, current)`

Compares current metrics against ratchet thresholds. Returns { passed: boolean, violations: [...] }.

`updateQualityRatchet(repoRoot, current, identifier)`

Tightens thresholds if current metrics are better. Returns true if ratchet was updated.

`initializeQualityRatchet(repoRoot, baseline)`

Creates the initial ratchet file from a baseline snapshot.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Quality Gates

How It Works

Configuration

Baseline-Diff Gate

What It Captures

When It Runs

Delta Computation

Failure Handling

Graceful Degradation

Quality Ratchet

Rules

Enforcement Points

Initializing the Ratchet

Prompt-Level Guidance

TDD Workflow

Boy Scout Rule

Self-Check Before Commit

QA Baseline Awareness

Baseline vs Ratchet

Key Files

API Reference

`captureQualityBaseline(worktreePath, config?)`

`computeQualityDelta(baseline, current)`

`formatQualityReport(baseline, current, delta)`

`loadQualityRatchet(repoRoot)`

`checkQualityRatchet(ratchet, current)`

`updateQualityRatchet(repoRoot, current, identifier)`

`initializeQualityRatchet(repoRoot, baseline)`

FilesExpand file tree

quality-gates.md

Latest commit

History

quality-gates.md

File metadata and controls

Quality Gates

How It Works

Configuration

Baseline-Diff Gate

What It Captures

When It Runs

Delta Computation

Failure Handling

Graceful Degradation

Quality Ratchet

Rules

Enforcement Points

Initializing the Ratchet

Prompt-Level Guidance

TDD Workflow

Boy Scout Rule

Self-Check Before Commit

QA Baseline Awareness

Baseline vs Ratchet

Key Files

API Reference

captureQualityBaseline(worktreePath, config?)

computeQualityDelta(baseline, current)

formatQualityReport(baseline, current, delta)

loadQualityRatchet(repoRoot)

checkQualityRatchet(ratchet, current)

updateQualityRatchet(repoRoot, current, identifier)

initializeQualityRatchet(repoRoot, baseline)

`captureQualityBaseline(worktreePath, config?)`

`computeQualityDelta(baseline, current)`

`formatQualityReport(baseline, current, delta)`

`loadQualityRatchet(repoRoot)`

`checkQualityRatchet(ratchet, current)`

`updateQualityRatchet(repoRoot, current, identifier)`

`initializeQualityRatchet(repoRoot, baseline)`