
Add quality-playbook skill#1168

Merged
aaronpowell merged 1 commit into github:staged from andrewstellman:add-quality-playbook-skill
Mar 25, 2026

Conversation

Contributor

@andrewstellman andrewstellman commented Mar 25, 2026

## Pull Request Checklist

  • I have read and followed the CONTRIBUTING.md guidelines.
  • I have read and followed the Guidance for submissions involving paid services.
  • My contribution adds a new instruction, prompt, agent, skill, or workflow file in the correct directory.
  • The file follows the required naming convention.
  • The content is clearly structured and follows the example format.
  • I have tested my instructions, prompt, agent, skill, or workflow with GitHub Copilot.
  • I have run npm start and verified that README.md is up to date.
  • I am targeting the staged branch for this pull request.

## Quality Playbook skill

A skill that revives traditional quality engineering practices — the kind most teams cut decades ago — and uses AI to make them cheap enough to run on every project.

Most AI testing tools work from source code: they generate test stubs based on what the code does. This skill does something different. It explores a codebase first — reading specs, schemas, defensive patterns, architecture, even developer chat history — to understand what the code is supposed to do, then generates a complete quality infrastructure to verify it. That's the gap between "does this function return the right value?" and "does this system fulfill its purpose?" — and it's the problem quality engineering was invented to solve.

### What it generates

Six deliverables that together form a repeatable quality system:

| Deliverable | What It Does |
| --- | --- |
| QUALITY.md | Quality constitution — defines what "correct" means for this specific project, with coverage targets tied to rationale so future sessions can't argue standards down |
| Functional tests | Spec-traced tests in the project's native language and test framework, derived from what the spec says should happen — not from what the code does |
| RUN_CODE_REVIEW.md | Code review protocol with anti-hallucination guardrails: cite line numbers, grep before claiming something is missing, flag uncertainty as QUESTION not BUG |
| RUN_INTEGRATION_TESTS.md | Integration test protocol with a runnable test matrix — an AI agent can read this file and execute it as an autonomous test runner |
| RUN_SPEC_AUDIT.md | Council of Three protocol: three independent AI models audit code against specs (in testing, 74% of defects were caught by only one model — single-model review misses most problems) |
| AGENTS.md | Bootstrap file so every future AI session inherits the quality system instead of starting from scratch |

### What it doesn't do

This is not a test stub generator, a linter, or a code review bot. It doesn't mechanically produce tests from source code. It doesn't scan for vulnerabilities. It doesn't score your existing test suite. It builds the quality system that tells you what to test, why, and how to know when it's working — the piece that sits between "here are specifications" and "here are tests" that no existing tool generates.

Existing tools fall into four buckets, none of which do what this skill does:

  • Test generators (Diffblue, Qodo, Early.ai) generate tests mechanically from source code, including tests that verify bugs
  • Code review tools (CodeRabbit, Greptile) flag problems in code already written, single-model
  • Spec-driven frameworks (GitHub Spec Kit, Kiro) manage the process but don't generate the testing infrastructure
  • Open-source skills (agentic-bootstrap, Star Chamber) generate personas or do standalone review, not quality infrastructure

### Three things this does that nothing else does

Each maps to a traditional quality practice that got cut for cost:

1. Forensic inversion of defensive patterns (root cause analysis)
Every tool I surveyed treats try/except blocks and null checks as evidence of robustness. This skill inverts that: defensive code is scar tissue from past failures. Every try/except block, every null check, every retry loop points to a failure mode that belongs in the test plan. Instead of proving the code is safe, the skill reads the confessions.
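As a minimal sketch of the inversion (the retry helper and derived test below are hypothetical illustrations, not output from the skill):

```python
import time

# A defensive pattern found during exploration: the broad except and
# retry loop confess that fetch() sometimes fails transiently -- that
# failure mode belongs in the test plan.
def fetch_with_retry(fetch, retries=3, delay=0):
    """Return fetch()'s result, retrying on any exception."""
    for attempt in range(retries):
        try:
            return fetch()
        except Exception:
            if attempt == retries - 1:
                raise
            time.sleep(delay)

# The derived test exercises the confessed failure mode directly,
# instead of treating the retry loop as proof of robustness.
def test_retry_recovers_from_transient_failure():
    calls = {"n": 0}
    def flaky():
        calls["n"] += 1
        if calls["n"] < 3:
            raise ConnectionError("transient")
        return "ok"
    assert fetch_with_retry(flaky, retries=3) == "ok"
    assert calls["n"] == 3  # failed twice, recovered on the third call
```

The point is the direction of reasoning: the defensive code is read as evidence of a past failure, and the test reproduces that failure on purpose.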

2. Coverage theater prevention (test plan review)
Explicitly defines and flags fake tests: existence-only checks that prove something is there without verifying what it is, presence-only assertions, mocked validators that bypass the thing you're testing, single-variant testing. The difference between 95% coverage that catches nothing and tests that actually find bugs.
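For instance, an existence-only check versus an assertion that pins the required value (all names below are invented for illustration):

```python
# Hypothetical helper: sum of qty * price across line items.
def compute_total(items):
    return sum(i["qty"] * i["price"] for i in items)

# Coverage theater: this "passes" as long as the key exists,
# even if the computed total is completely wrong.
def test_total_exists():
    order = {"total": compute_total([{"qty": 2, "price": 3.50}])}
    assert "total" in order  # existence-only: proves nothing about correctness

# A real test pins the value the spec requires.
def test_total_is_correct():
    items = [{"qty": 2, "price": 3.50}, {"qty": 1, "price": 1.00}]
    assert compute_total(items) == 8.00  # verifies the actual requirement
```

Both tests count identically toward a coverage percentage; only the second one can fail when the arithmetic is wrong.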

3. Generated protocols that AI agents execute autonomously (quality planning)
The integration test protocol isn't just documentation. When you tell Copilot to run the tests, it reads the protocol, follows the execution plan step by step, reports progress, and produces a final summary with a ship-or-no-ship recommendation. The documentation becomes the test runner's instructions.
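As a sketch of the idea, a generated RUN_INTEGRATION_TESTS.md might carry a matrix like this (the scenarios, commands, and port below are invented for illustration, not taken from the skill's actual templates):

```markdown
## Test Matrix

| # | Scenario           | Command                               | Expected |
|---|--------------------|---------------------------------------|----------|
| 1 | Health check       | curl -s localhost:8080/health         | HTTP 200 |
| 2 | Reject empty order | curl -s -d '{}' localhost:8080/orders | HTTP 400 |

For each row: run the command, compare the result against Expected,
record PASS/FAIL, then report a ship/no-ship recommendation.
```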

### Testing

Tested against five codebases across four languages. The skill had never seen any of them:

| Project | Language | Tests | Key Result |
| --- | --- | --- | --- |
| Octobatch (batch orchestrator) | Python | Full playbook | All passing |
| BlazorMatchGame (game UI) | C# | 30 functional | Found 2 real bugs: ghost match vulnerability, timer lifecycle leak |
| Spring PetClinic (REST API) | Java | 44 functional | Found telephone validation gap — unreported bug |
| Gson (Google's JSON library) | Java | 53 new + 4,638 existing | All pass, zero regressions |
| Javalin (web framework) | Kotlin/Java | 48 functional + 13 integration groups | 309 total tests, zero failures |

Also submitted to anthropics/skills#659. The skill follows the shared Agent Skills specification and works with both Copilot and Claude Code.

### Background

This skill emerged from my experiments with agentic engineering, which I cover in my Radar series on AI-driven development. The full story covers how it was built, why it works, and the quality engineering theory behind it.


## Type of Contribution

  • New instruction file.
  • New prompt file.
  • New agent file.
  • New plugin.
  • New skill file.
  • New agentic workflow.
  • Update to existing instruction, prompt, agent, plugin, skill, or workflow.
  • Other (please specify):

## Additional Notes

  • Skill validated with npm run skill:validate (247 skills valid, 0 invalid)
  • Built with npm run build successfully
  • No paid services or API keys required — the skill itself is just prompt engineering and reference docs

By submitting this pull request, I confirm that my contribution abides by the Code of Conduct and will be licensed under the MIT License.

Copilot AI review requested due to automatic review settings March 25, 2026 16:49

Copilot AI left a comment


Pull request overview

Adds a new quality-playbook skill under skills/ that guides an agent through exploring a codebase and generating a multi-artifact quality system (constitution, functional tests, review/integration/spec-audit protocols, and bootstrap docs), and registers it in the skills index.

Changes:

  • Introduces skills/quality-playbook/SKILL.md defining the skill’s purpose, phases, and generated artifacts.
  • Adds bundled reference documentation under skills/quality-playbook/references/ to support the skill’s workflows and verification steps.
  • Updates docs/README.skills.md to include the new skill and list its bundled assets.

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| skills/quality-playbook/SKILL.md | Defines the new quality-playbook skill instructions, phases, and outputs. |
| skills/quality-playbook/LICENSE.txt | Bundles license terms for the skill content. |
| skills/quality-playbook/references/constitution.md | Template/guidance for generating QUALITY.md. |
| skills/quality-playbook/references/defensive_patterns.md | Guidance for finding defensive patterns and converting them into scenarios/tests. |
| skills/quality-playbook/references/functional_tests.md | Detailed guide for generating spec-traced functional tests. |
| skills/quality-playbook/references/review_protocols.md | Templates for code review and integration test protocols. |
| skills/quality-playbook/references/schema_mapping.md | Guidance for schema-valid mutation mapping for boundary tests. |
| skills/quality-playbook/references/spec_audit.md | "Council of Three" spec audit protocol template. |
| skills/quality-playbook/references/verification.md | Verification checklist/benchmarks for generated artifacts. |
| docs/README.skills.md | Registers the new skill in the skills catalog and lists bundled assets. |

```java
// Java — RIGHT: tests the requirement using a schema-valid mutation
@Test
void testBadValueNotInOutput() {
    fixture.setField(null); // Optional<String> accepts null
```

Copilot AI Mar 25, 2026


This comment is misleading: Optional<String> should not be null. If the field is optional, the schema-valid mutation should usually be Optional.empty() (or null only if the actual field type is a nullable String). Adjust the example/comment to match the intended type.

Suggested change
```diff
-fixture.setField(null); // Optional<String> accepts null
+fixture.setField(Optional.empty()); // Optional<String> uses Optional.empty() for missing values
```


```python
# Python (pytest)
def test_defensive_pattern_name(self, fixture):
```

Copilot AI Mar 25, 2026


The Python pytest example is shown as a standalone function but includes a self parameter. Either remove self or wrap the snippet in a class Test...: block so it matches valid pytest patterns.

Suggested change
```diff
-def test_defensive_pattern_name(self, fixture):
+def test_defensive_pattern_name(fixture):
```


```python
# Python
def test_config_validation(self, tmp_path):
```

Copilot AI Mar 25, 2026


The Python pytest example is shown as a top-level function but includes a self parameter. In pytest, top-level tests should not take self; either remove self or wrap the example in a class Test...: block (and keep self only in that case).

Suggested change
```diff
-def test_config_validation(self, tmp_path):
+def test_config_validation(tmp_path):
```

```python
# Python (pytest)
@pytest.mark.parametrize("variant", [variant_a, variant_b, variant_c])
def test_feature_works(self, variant):
```

Copilot AI Mar 25, 2026


This pytest parametrization example defines a top-level test function but includes a self parameter. Either remove self or move the example into a test class to avoid generating invalid pytest code.

Suggested change
```diff
-def test_feature_works(self, variant):
+def test_feature_works(variant):
```

Comment on lines +369 to +374
```python
# Python — WRONG: tests the validation mechanism
def test_bad_value_rejected(self, fixture):
    fixture.field = "invalid"  # Schema rejects this!
    try:
        process(fixture)
        fail("Expected validation error")
```

Copilot AI Mar 25, 2026


The Python snippet uses fail("Expected validation error"), but fail is not defined in pytest. Consider using pytest.fail(...) or with pytest.raises(ValidationError): ... so the example is runnable if copied.

Suggested change
```diff
-# Python — WRONG: tests the validation mechanism
-def test_bad_value_rejected(self, fixture):
-    fixture.field = "invalid"  # Schema rejects this!
-    try:
-        process(fixture)
-        fail("Expected validation error")
+import pytest
+
+# Python — WRONG: tests the validation mechanism
+def test_bad_value_rejected(self, fixture):
+    fixture.field = "invalid"  # Schema rejects this!
+    try:
+        process(fixture)
+        pytest.fail("Expected validation error")
```

Comment on lines +26 to +41
```python
# Python — WRONG: tests the validation mechanism, not the requirement
def test_bad_value_rejected(self, fixture):
    fixture.field = "invalid"  # Pydantic rejects this before processing!
    try:
        process(fixture)
        fail("Expected validation error")
    except ValidationError:
        pass  # Tells you nothing about the output

# Python — RIGHT: tests the requirement using a schema-valid mutation
def test_bad_value_not_in_output(self, fixture):
    fixture.field = None  # Schema accepts None for Optional fields
    output = process(fixture)
    assert field_property not in output  # Bad data absent
    assert expected_type in output  # Rest still works
```

Copilot AI Mar 25, 2026


The Python examples are written as top-level pytest tests but include a self parameter and call fail(...) which is not defined. Either remove self and use pytest.fail(...)/pytest.raises(...), or wrap the examples in a class Test...: block if you want to demonstrate method-style tests.

@andrewstellman force-pushed the add-quality-playbook-skill branch from 1a9dae9 to 3157098 on March 25, 2026 at 18:55
@andrewstellman force-pushed the add-quality-playbook-skill branch from 58c300c to ad4a92e on March 25, 2026 at 20:52
@aaronpowell aaronpowell merged commit 50f87bd into github:staged Mar 25, 2026
8 checks passed