Quality gates prevent agents from degrading codebase health. They address the "diffusion of responsibility" problem where every agent assumes test/typecheck/lint failures are pre-existing and ignores them, leading to cumulative quality drift.
The system has three layers:
- Baseline-Diff Gate — captures quality metrics before the agent starts, compares after, blocks promotion if the agent made things worse
- Quality Ratchet — monotonic thresholds committed to the repo that can only tighten (improve), never loosen
- Prompt-Level Guidance — TDD workflow, boy scout rule, and self-check instructions baked into agent templates
┌──────────────────────────────────────────────────────────────────┐
│ Agent Session Lifecycle │
│ │
│ 1. createWorktree() │
│ └─ Capture quality baseline on main │
│ (test counts, typecheck errors, lint errors) │
│ → .agent/quality-baseline.json │
│ │
│ 2. spawnAgent() │
│ └─ Inject baseline into template context │
│ Agent sees: "You started with 0 failing tests, │
│ 3 type errors. Don't make it worse." │
│ │
│ 3. Agent works (TDD: red → green → refactor) │
│ └─ Self-checks delta before committing │
│ │
│ 4. Post-session backstop │
│ └─ Quality delta check: │
│ captureCurrentQuality() → computeQualityDelta() │
│ If delta > 0 → BLOCK promotion, post diagnostic │
│ If delta < 0 → Log boy-scout improvement │
│ │
│ 5. QA agent (baseline-aware) │
│ └─ Distinguishes agent-caused vs pre-existing failures │
│ │
│ 6. Merge queue │
│ └─ Quality ratchet check before merge │
│ If thresholds violated → reject │
│ If metrics improved → tighten ratchet │
│ │
│ 7. CI │
│ └─ quality-ratchet job validates every PR │
└──────────────────────────────────────────────────────────────────┘
Enable quality gates in .agentfactory/config.yaml:
quality:
baselineEnabled: true # Capture baseline + post-session delta check
ratchetEnabled: true # Enforce quality ratchet in merge queue
boyscoutRule: true # Include "fix issues in files you touch" instructions
tddWorkflow: true # Include red/green/refactor workflow instructionsAll settings default to false/true respectively. The quality section is optional — omitting it disables baseline capture and ratchet enforcement while keeping prompt-level guidance enabled.
The orchestrator runs three commands on the worktree and parses their output:
| Metric | Command | Parser |
|---|---|---|
| Test counts (total, passed, failed, skipped) | {testCommand} -- --reporter=json |
Vitest/Jest JSON output |
| Typecheck error count | {validateCommand} |
Count error TS\d+ lines in stderr |
| Lint error count | {lintCommand} (if configured) |
ESLint summary line or error line count |
Commands are resolved from the repository config (testCommand, validateCommand) or default to pnpm test, pnpm typecheck.
Baseline capture happens once per worktree, at the end of createWorktree(), after the worktree is initialized on main. The result is stored in .agent/quality-baseline.json inside the worktree.
Delta check happens after the session backstop, before auto-transition. The orchestrator re-runs the same commands on the agent's branch, computes the delta, and decides:
| Delta | Result |
|---|---|
| All metrics same or better | Gate passes, promotion proceeds |
| Any failure metric increased | Gate fails, agent.status = 'failed', diagnostic comment posted |
| Metrics improved | Gate passes, improvement logged |
testFailuresDelta = current.tests.failed - baseline.tests.failed
typeErrorsDelta = current.typecheck.errors - baseline.typecheck.errors
lintErrorsDelta = current.lint.errors - baseline.lint.errors
testCountDelta = current.tests.total - baseline.tests.total
passed = testFailuresDelta <= 0
&& typeErrorsDelta <= 0
&& lintErrorsDelta <= 0
A negative testCountDelta (tests were removed) generates a warning in the report but does not fail the gate.
When the quality gate fails, the orchestrator:
- Posts a markdown comment to the issue with a comparison table
- Sets
agent.status = 'failed'andagent.workResult = 'failed' - The auto-transition block transitions the issue to the failure status
- The escalation governor triggers a retry — the retry agent receives the baseline in its prompt and knows not to regress
- If baseline capture fails during worktree creation, a warning is logged but the worktree is created normally. The agent simply won't have baseline data in its prompt and no post-session delta check runs.
- If the post-session delta check fails (e.g., test command hangs), the error is caught and logged. The session proceeds without quality gating.
- If no
.agent/quality-baseline.jsonexists (older worktree, manual worktree), the delta check is skipped.
The quality ratchet is a committed JSON file that stores the best-known quality thresholds. It lives at .agentfactory/quality-ratchet.json:
{
"version": 1,
"updatedAt": "2026-03-30T00:00:00Z",
"updatedBy": "SUP-456",
"thresholds": {
"testCount": { "min": 847 },
"testFailures": { "max": 0 },
"typecheckErrors": { "max": 12 },
"lintErrors": { "max": 34 }
}
}| Threshold | Direction | Meaning |
|---|---|---|
testCount.min |
Can only go up | Agents cannot delete tests without adding replacements |
testFailures.max |
Can only go down | Each merge that fixes a failure tightens the floor |
typecheckErrors.max |
Can only go down | Type errors ratchet toward zero |
lintErrors.max |
Can only go down | Lint errors ratchet toward zero |
Merge queue (merge-worker.ts): After tests pass but before finalize, the worker loads the ratchet, checks thresholds, and rejects the merge if any are violated. If metrics improved, it auto-tightens the ratchet and commits the updated file (included in the merge).
CI (.github/workflows/ci.yml): The quality-ratchet job runs on every PR. It reads the ratchet file, runs test/typecheck, and fails if thresholds are violated. This job only runs when the ratchet file exists (if: hashFiles(...)).
To create the initial ratchet file, use the initializeQualityRatchet() function from @renseiai/agentfactory:
import { captureQualityBaseline, initializeQualityRatchet } from '@renseiai/agentfactory'
const baseline = captureQualityBaseline('/path/to/repo', { packageManager: 'pnpm' })
const ratchet = initializeQualityRatchet('/path/to/repo', baseline)
// Commit .agentfactory/quality-ratchet.jsonOr capture baseline metrics manually and create the file by hand.
Development agents receive explicit TDD instructions:
- Write failing tests for the acceptance criteria (RED)
- Implement the minimum code to pass (GREEN)
- Refactor, keeping tests green (REFACTOR)
- Run the full suite before committing
This is injected via the {{> partials/quality-baseline}} partial, conditional on qualityBaseline being present in the template context.
Agents are instructed: "If you modify a file that has pre-existing lint warnings, type errors, or failing tests, fix them while you are there." This is scoped to files the agent is already touching to prevent merge conflicts with concurrent agents.
The commit-push-pr partial includes a quality delta self-check step that reminds the agent to verify it hasn't worsened metrics before committing. This catches regressions early, saving a full QA cycle.
QA agents receive the same baseline data and are instructed to distinguish agent-caused failures from pre-existing ones:
- If failures exceed baseline → HARD FAIL (agent made it worse)
- If failures equal baseline → pre-existing, do not fail QA for these
These are complementary, not alternatives:
| Aspect | Baseline-Diff Gate | Quality Ratchet |
|---|---|---|
| Scope | Per-session | Repository-wide |
| Prevents | Agent making things worse | Cumulative drift across all agents |
| Baseline source | Captured at session start | Committed in repo |
| Update frequency | Every session | On merge (auto-tighten) |
| Enforcement | Orchestrator post-session | Merge queue + CI |
Use both: the baseline-diff catches regressions in-session, and the ratchet prevents drift across many agents and branches.
| File | Purpose |
|---|---|
packages/core/src/orchestrator/quality-baseline.ts |
Capture metrics, compute deltas, format reports |
packages/core/src/orchestrator/quality-ratchet.ts |
Load, check, update, initialize ratchet file |
packages/core/src/templates/defaults/partials/quality-baseline.yaml |
TDD + boy scout + baseline instructions |
packages/core/src/orchestrator/orchestrator.ts |
Integration: capture, inject, check |
packages/core/src/merge-queue/merge-worker.ts |
Ratchet enforcement at merge time |
packages/core/src/config/repository-config.ts |
quality config section schema |
.agentfactory/config.yaml |
Per-repo quality gate settings |
.agentfactory/quality-ratchet.json |
Committed ratchet thresholds |
.github/workflows/ci.yml |
quality-ratchet CI job |
Runs test/typecheck/lint commands and returns structured metrics.
interface QualityBaseline {
timestamp: string
commitSha: string
tests: { total: number; passed: number; failed: number; skipped: number }
typecheck: { errorCount: number; exitCode: number }
lint: { errorCount: number; warningCount: number }
}Pure arithmetic comparison. Returns { passed: boolean, testFailuresDelta, typeErrorsDelta, lintErrorsDelta, testCountDelta }.
Markdown table comparing baseline vs current with pass/fail badge.
Reads and validates .agentfactory/quality-ratchet.json. Returns null if not found.
Compares current metrics against ratchet thresholds. Returns { passed: boolean, violations: [...] }.
Tightens thresholds if current metrics are better. Returns true if ratchet was updated.
Creates the initial ratchet file from a baseline snapshot.