
Add quality-playbook skill#1168

Merged
aaronpowell merged 1 commit into github:staged from andrewstellman:add-quality-playbook-skill
Mar 25, 2026

Conversation

Contributor

@andrewstellman andrewstellman commented Mar 25, 2026

## Pull Request Checklist

  • I have read and followed the CONTRIBUTING.md guidelines.
  • I have read and followed the Guidance for submissions involving paid services.
  • My contribution adds a new instruction, prompt, agent, skill, or workflow file in the correct directory.
  • The file follows the required naming convention.
  • The content is clearly structured and follows the example format.
  • I have tested my instructions, prompt, agent, skill, or workflow with GitHub Copilot.
  • I have run npm start and verified that README.md is up to date.
  • I am targeting the staged branch for this pull request.

## Quality Playbook skill

A skill that revives traditional quality engineering practices — the kind most teams cut decades ago — and uses AI to make them cheap enough to run on every project.

Most AI testing tools work from source code: they generate test stubs based on what the code does. This skill does something different. It explores a codebase first — reading specs, schemas, defensive patterns, architecture, even developer chat history — to understand what the code is supposed to do, then generates a complete quality infrastructure to verify it. That's the gap between "does this function return the right value?" and "does this system fulfill its purpose?" — and it's the problem quality engineering was invented to solve.

### What it generates

Six deliverables that together form a repeatable quality system:

| Deliverable | What It Does |
| --- | --- |
| QUALITY.md | Quality constitution — defines what "correct" means for this specific project, with coverage targets tied to rationale so future sessions can't argue standards down |
| Functional tests | Spec-traced tests in the project's native language and test framework, derived from what the spec says should happen — not from what the code does |
| RUN_CODE_REVIEW.md | Code review protocol with anti-hallucination guardrails: cite line numbers, grep before claiming something is missing, flag uncertainty as QUESTION not BUG |
| RUN_INTEGRATION_TESTS.md | Integration test protocol with a runnable test matrix — an AI agent can read this file and execute it as an autonomous test runner |
| RUN_SPEC_AUDIT.md | Council of Three protocol: three independent AI models audit code against specs (in testing, 74% of defects were caught by only one model — single-model review misses most problems) |
| AGENTS.md | Bootstrap file so every future AI session inherits the quality system instead of starting from scratch |

### What it doesn't do

This is not a test stub generator, a linter, or a code review bot. It doesn't mechanically produce tests from source code. It doesn't scan for vulnerabilities. It doesn't score your existing test suite. It builds the quality system that tells you what to test, why, and how to know when it's working — the piece that sits between "here are specifications" and "here are tests" that no existing tool generates.

Existing tools fall into four buckets, none of which do what this skill does:

  • Test generators (Diffblue, Qodo, Early.ai) generate tests mechanically from source code, including tests that verify bugs
  • Code review tools (CodeRabbit, Greptile) flag problems in code already written, single-model
  • Spec-driven frameworks (GitHub Spec Kit, Kiro) manage the process but don't generate the testing infrastructure
  • Open-source skills (agentic-bootstrap, Star Chamber) generate personas or do standalone review, not quality infrastructure

### Three things this does that nothing else does

Each maps to a traditional quality practice that got cut for cost:

1. Forensic inversion of defensive patterns (root cause analysis)
Every tool I surveyed treats try/except blocks and null checks as evidence of robustness. This skill inverts that: defensive code is scar tissue from past failures. Every try/except block, every null check, every retry loop points to a failure mode that belongs in the test plan. Instead of proving the code is safe, the skill reads the confessions.
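As a minimal sketch of the inversion (the retry helper and derived test below are hypothetical illustrations, not output from the skill):

```python
import time

# A defensive pattern found during exploration: the broad except and
# retry loop confess that fetch() sometimes fails transiently -- that
# failure mode belongs in the test plan.
def fetch_with_retry(fetch, retries=3, delay=0):
    """Return fetch()'s result, retrying on any exception."""
    for attempt in range(retries):
        try:
            return fetch()
        except Exception:
            if attempt == retries - 1:
                raise
            time.sleep(delay)

# The derived test exercises the confessed failure mode directly,
# instead of treating the retry loop as proof of robustness.
def test_retry_recovers_from_transient_failure():
    calls = {"n": 0}
    def flaky():
        calls["n"] += 1
        if calls["n"] < 3:
            raise ConnectionError("transient")
        return "ok"
    assert fetch_with_retry(flaky, retries=3) == "ok"
    assert calls["n"] == 3  # failed twice, recovered on the third call
```

The point is the direction of reasoning: the defensive code is read as evidence of a past failure, and the test reproduces that failure on purpose.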

2. Coverage theater prevention (test plan review)
Explicitly defines and flags fake tests: existence-only checks that prove something is there without verifying what it is, presence-only assertions, mocked validators that bypass the thing you're testing, single-variant testing. The difference between 95% coverage that catches nothing and tests that actually find bugs.
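For instance, an existence-only check versus an assertion that pins the required value (all names below are invented for illustration):

```python
# Hypothetical helper: sum of qty * price across line items.
def compute_total(items):
    return sum(i["qty"] * i["price"] for i in items)

# Coverage theater: this "passes" as long as the key exists,
# even if the computed total is completely wrong.
def test_total_exists():
    order = {"total": compute_total([{"qty": 2, "price": 3.50}])}
    assert "total" in order  # existence-only: proves nothing about correctness

# A real test pins the value the spec requires.
def test_total_is_correct():
    items = [{"qty": 2, "price": 3.50}, {"qty": 1, "price": 1.00}]
    assert compute_total(items) == 8.00  # verifies the actual requirement
```

Both tests count identically toward a coverage percentage; only the second one can fail when the arithmetic is wrong.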

3. Generated protocols that AI agents execute autonomously (quality planning)
The integration test protocol isn't just documentation. When you tell Copilot to run the tests, it reads the protocol, follows the execution plan step by step, reports progress, and produces a final summary with a ship-or-no-ship recommendation. The documentation becomes the test runner's instructions.
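As a sketch of the idea, a generated RUN_INTEGRATION_TESTS.md might carry a matrix like this (the scenarios, commands, and port below are invented for illustration, not taken from the skill's actual templates):

```markdown
## Test Matrix

| # | Scenario           | Command                               | Expected |
|---|--------------------|---------------------------------------|----------|
| 1 | Health check       | curl -s localhost:8080/health         | HTTP 200 |
| 2 | Reject empty order | curl -s -d '{}' localhost:8080/orders | HTTP 400 |

For each row: run the command, compare the result against Expected,
record PASS/FAIL, then report a ship/no-ship recommendation.
```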

### Testing

Tested against five codebases across four languages. The skill had never seen any of them:

| Project | Language | Tests | Key Result |
| --- | --- | --- | --- |
| Octobatch (batch orchestrator) | Python | Full playbook | All passing |
| BlazorMatchGame (game UI) | C# | 30 functional | Found 2 real bugs: ghost match vulnerability, timer lifecycle leak |
| Spring PetClinic (REST API) | Java | 44 functional | Found telephone validation gap — unreported bug |
| Gson (Google's JSON library) | Java | 53 new + 4,638 existing | All pass, zero regressions |
| Javalin (web framework) | Kotlin/Java | 48 functional + 13 integration groups | 309 total tests, zero failures |

Also submitted to anthropics/skills#659. The skill follows the shared Agent Skills specification and works with both Copilot and Claude Code.

### Background

This skill emerged from my experiments with agentic engineering, which I cover in my Radar series on AI-driven development. The full story covers how it was built, why it works, and the quality engineering theory behind it.


## Type of Contribution

  • New instruction file.
  • New prompt file.
  • New agent file.
  • New plugin.
  • New skill file.
  • New agentic workflow.
  • Update to existing instruction, prompt, agent, plugin, skill, or workflow.
  • Other (please specify):

## Additional Notes

  • Skill validated with npm run skill:validate (247 skills valid, 0 invalid)
  • Built with npm run build successfully
  • No paid services or API keys required — the skill itself is just prompt engineering and reference docs

By submitting this pull request, I confirm that my contribution abides by the Code of Conduct and will be licensed under the MIT License.

Copilot AI review requested due to automatic review settings March 25, 2026 16:49

Copilot AI left a comment


Pull request overview

Adds a new quality-playbook skill under skills/ that guides an agent through exploring a codebase and generating a multi-artifact quality system (constitution, functional tests, review/integration/spec-audit protocols, and bootstrap docs), and registers it in the skills index.

Changes:

  • Introduces skills/quality-playbook/SKILL.md defining the skill’s purpose, phases, and generated artifacts.
  • Adds bundled reference documentation under skills/quality-playbook/references/ to support the skill’s workflows and verification steps.
  • Updates docs/README.skills.md to include the new skill and list its bundled assets.

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| skills/quality-playbook/SKILL.md | Defines the new quality-playbook skill instructions, phases, and outputs. |
| skills/quality-playbook/LICENSE.txt | Bundles license terms for the skill content. |
| skills/quality-playbook/references/constitution.md | Template/guidance for generating QUALITY.md. |
| skills/quality-playbook/references/defensive_patterns.md | Guidance for finding defensive patterns and converting them into scenarios/tests. |
| skills/quality-playbook/references/functional_tests.md | Detailed guide for generating spec-traced functional tests. |
| skills/quality-playbook/references/review_protocols.md | Templates for code review and integration test protocols. |
| skills/quality-playbook/references/schema_mapping.md | Guidance for schema-valid mutation mapping for boundary tests. |
| skills/quality-playbook/references/spec_audit.md | "Council of Three" spec audit protocol template. |
| skills/quality-playbook/references/verification.md | Verification checklist/benchmarks for generated artifacts. |
| docs/README.skills.md | Registers the new skill in the skills catalog and lists bundled assets. |

```java
// Java — RIGHT: tests the requirement using a schema-valid mutation
@Test
void testBadValueNotInOutput() {
    fixture.setField(null); // Optional<String> accepts null
```

Copilot AI Mar 25, 2026


This comment is misleading: Optional<String> should not be null. If the field is optional, the schema-valid mutation should usually be Optional.empty() (or null only if the actual field type is a nullable String). Adjust the example/comment to match the intended type.

Suggested change
```diff
-fixture.setField(null); // Optional<String> accepts null
+fixture.setField(Optional.empty()); // Optional<String> uses Optional.empty() for missing values
```


```python
# Python (pytest)
def test_defensive_pattern_name(self, fixture):
```

Copilot AI Mar 25, 2026


The Python pytest example is shown as a standalone function but includes a self parameter. Either remove self or wrap the snippet in a class Test...: block so it matches valid pytest patterns.

Suggested change
```diff
-def test_defensive_pattern_name(self, fixture):
+def test_defensive_pattern_name(fixture):
```


```python
# Python
def test_config_validation(self, tmp_path):
```

Copilot AI Mar 25, 2026


The Python pytest example is shown as a top-level function but includes a self parameter. In pytest, top-level tests should not take self; either remove self or wrap the example in a class Test...: block (and keep self only in that case).

Suggested change
```diff
-def test_config_validation(self, tmp_path):
+def test_config_validation(tmp_path):
```

```python
# Python (pytest)
@pytest.mark.parametrize("variant", [variant_a, variant_b, variant_c])
def test_feature_works(self, variant):
```

Copilot AI Mar 25, 2026


This pytest parametrization example defines a top-level test function but includes a self parameter. Either remove self or move the example into a test class to avoid generating invalid pytest code.

Suggested change
```diff
-def test_feature_works(self, variant):
+def test_feature_works(variant):
```

Comment on lines +369 to +374
```python
# Python — WRONG: tests the validation mechanism
def test_bad_value_rejected(self, fixture):
    fixture.field = "invalid"  # Schema rejects this!
    try:
        process(fixture)
        fail("Expected validation error")
```

Copilot AI Mar 25, 2026


The Python snippet uses fail("Expected validation error"), but fail is not defined in pytest. Consider using pytest.fail(...) or with pytest.raises(ValidationError): ... so the example is runnable if copied.

Suggested change
```diff
-# Python — WRONG: tests the validation mechanism
-def test_bad_value_rejected(self, fixture):
-    fixture.field = "invalid"  # Schema rejects this!
-    try:
-        process(fixture)
-        fail("Expected validation error")
+import pytest
+
+# Python — WRONG: tests the validation mechanism
+def test_bad_value_rejected(self, fixture):
+    fixture.field = "invalid"  # Schema rejects this!
+    try:
+        process(fixture)
+        pytest.fail("Expected validation error")
```

Comment on lines +26 to +41
```python
# Python — WRONG: tests the validation mechanism, not the requirement
def test_bad_value_rejected(self, fixture):
    fixture.field = "invalid"  # Pydantic rejects this before processing!
    try:
        process(fixture)
        fail("Expected validation error")
    except ValidationError:
        pass  # Tells you nothing about the output

# Python — RIGHT: tests the requirement using a schema-valid mutation
def test_bad_value_not_in_output(self, fixture):
    fixture.field = None  # Schema accepts None for Optional fields
    output = process(fixture)
    assert field_property not in output  # Bad data absent
    assert expected_type in output  # Rest still works
```

Copilot AI Mar 25, 2026


The Python examples are written as top-level pytest tests but include a self parameter and call fail(...) which is not defined. Either remove self and use pytest.fail(...)/pytest.raises(...), or wrap the examples in a class Test...: block if you want to demonstrate method-style tests.

@andrewstellman force-pushed the add-quality-playbook-skill branch from 1a9dae9 to 3157098 on March 25, 2026 at 18:55
@andrewstellman force-pushed the add-quality-playbook-skill branch from 58c300c to ad4a92e on March 25, 2026 at 20:52
@aaronpowell aaronpowell merged commit 50f87bd into github:staged Mar 25, 2026
8 checks passed