Conversation
Pull request overview
Adds a new quality-playbook skill under skills/ that guides an agent through exploring a codebase and generating a multi-artifact quality system (constitution, functional tests, review/integration/spec-audit protocols, and bootstrap docs), and registers it in the skills index.
Changes:
- Introduces `skills/quality-playbook/SKILL.md` defining the skill's purpose, phases, and generated artifacts.
- Adds bundled reference documentation under `skills/quality-playbook/references/` to support the skill's workflows and verification steps.
- Updates `docs/README.skills.md` to include the new skill and list its bundled assets.
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| skills/quality-playbook/SKILL.md | Defines the new quality-playbook skill instructions, phases, and outputs. |
| skills/quality-playbook/LICENSE.txt | Bundles license terms for the skill content. |
| skills/quality-playbook/references/constitution.md | Template/guidance for generating QUALITY.md. |
| skills/quality-playbook/references/defensive_patterns.md | Guidance for finding defensive patterns and converting them into scenarios/tests. |
| skills/quality-playbook/references/functional_tests.md | Detailed guide for generating spec-traced functional tests. |
| skills/quality-playbook/references/review_protocols.md | Templates for code review and integration test protocols. |
| skills/quality-playbook/references/schema_mapping.md | Guidance for schema-valid mutation mapping for boundary tests. |
| skills/quality-playbook/references/spec_audit.md | “Council of Three” spec audit protocol template. |
| skills/quality-playbook/references/verification.md | Verification checklist/benchmarks for generated artifacts. |
| docs/README.skills.md | Registers the new skill in the skills catalog and lists bundled assets. |
```java
// Java — RIGHT: tests the requirement using a schema-valid mutation
@Test
void testBadValueNotInOutput() {
    fixture.setField(null); // Optional<String> accepts null
```
This comment is misleading: Optional<String> should not be null. If the field is optional, the schema-valid mutation should usually be Optional.empty() (or null only if the actual field type is a nullable String). Adjust the example/comment to match the intended type.
Suggested change:
```diff
- fixture.setField(null); // Optional<String> accepts null
+ fixture.setField(Optional.empty()); // Optional<String> uses Optional.empty() for missing values
```
```python
# Python (pytest)
def test_defensive_pattern_name(self, fixture):
```
The Python pytest example is shown as a standalone function but includes a self parameter. Either remove self or wrap the snippet in a class Test...: block so it matches valid pytest patterns.
Suggested change:
```diff
- def test_defensive_pattern_name(self, fixture):
+ def test_defensive_pattern_name(fixture):
```
```python
# Python
def test_config_validation(self, tmp_path):
```
The Python pytest example is shown as a top-level function but includes a self parameter. In pytest, top-level tests should not take self; either remove self or wrap the example in a class Test...: block (and keep self only in that case).
Suggested change:
```diff
- def test_config_validation(self, tmp_path):
+ def test_config_validation(tmp_path):
```
```python
# Python (pytest)
@pytest.mark.parametrize("variant", [variant_a, variant_b, variant_c])
def test_feature_works(self, variant):
```
This pytest parametrization example defines a top-level test function but includes a self parameter. Either remove self or move the example into a test class to avoid generating invalid pytest code.
Suggested change:
```diff
- def test_feature_works(self, variant):
+ def test_feature_works(variant):
```
```python
# Python — WRONG: tests the validation mechanism
def test_bad_value_rejected(self, fixture):
    fixture.field = "invalid"  # Schema rejects this!
    try:
        process(fixture)
        fail("Expected validation error")
```
The Python snippet uses fail("Expected validation error"), but fail is not defined in pytest. Consider using pytest.fail(...) or with pytest.raises(ValidationError): ... so the example is runnable if copied.
Suggested change:
```diff
+ import pytest
  # Python — WRONG: tests the validation mechanism
  def test_bad_value_rejected(self, fixture):
      fixture.field = "invalid"  # Schema rejects this!
      try:
          process(fixture)
-         fail("Expected validation error")
+         pytest.fail("Expected validation error")
```
```python
# Python — WRONG: tests the validation mechanism, not the requirement
def test_bad_value_rejected(self, fixture):
    fixture.field = "invalid"  # Pydantic rejects this before processing!
    try:
        process(fixture)
        fail("Expected validation error")
    except ValidationError:
        pass  # Tells you nothing about the output
```
```python
# Python — RIGHT: tests the requirement using a schema-valid mutation
def test_bad_value_not_in_output(self, fixture):
    fixture.field = None  # Schema accepts None for Optional fields
    output = process(fixture)
    assert field_property not in output  # Bad data absent
    assert expected_type in output  # Rest still works
```
The Python examples are written as top-level pytest tests but include a self parameter and call fail(...) which is not defined. Either remove self and use pytest.fail(...)/pytest.raises(...), or wrap the examples in a class Test...: block if you want to demonstrate method-style tests.
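To make the reviewer's suggestion concrete, here is one way the corrected example could look as a runnable top-level pytest test: no `self`, and `pytest.raises` instead of the undefined `fail(...)`. The `ValidationError` class and `process` function below are toy stand-ins invented for this sketch, not the skill's actual code:

```python
import pytest

class ValidationError(Exception):
    """Stand-in for a schema library's validation error."""

def process(fixture):
    # Toy processor: rejects a sentinel "invalid" value, drops None fields.
    if fixture.get("field") == "invalid":
        raise ValidationError("field is invalid")
    return {k: v for k, v in fixture.items() if v is not None}

def test_bad_value_rejected():
    # Top-level pytest style: no self; pytest.raises replaces bare fail()
    with pytest.raises(ValidationError):
        process({"field": "invalid"})
```

The same shape with `self` would only be valid inside a `class TestSomething:` block, which is the alternative fix the reviewer mentions.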
Force-pushed from 1a9dae9 to 3157098.
Force-pushed from 58c300c to ad4a92e.
Pull Request Checklist

- Ran `npm start` and verified that `README.md` is up to date.
- Targeted the `staged` branch for this pull request.

Quality Playbook skill
A skill that revives traditional quality engineering practices — the kind most teams cut decades ago — and uses AI to make them cheap enough to run on every project.
Most AI testing tools work from source code: they generate test stubs based on what the code does. This skill does something different. It explores a codebase first — reading specs, schemas, defensive patterns, architecture, even developer chat history — to understand what the code is supposed to do, then generates a complete quality infrastructure to verify it. That's the gap between "does this function return the right value?" and "does this system fulfill its purpose?" — and it's the problem quality engineering was invented to solve.
What it generates
Six deliverables that together form a repeatable quality system:
What it doesn't do
This is not a test stub generator, a linter, or a code review bot. It doesn't mechanically produce tests from source code. It doesn't scan for vulnerabilities. It doesn't score your existing test suite. It builds the quality system that tells you what to test, why, and how to know when it's working — the piece that sits between "here are specifications" and "here are tests" that no existing tool generates.
Existing tools fall into four buckets, none of which do what this skill does:
Three things this does that nothing else does
Each maps to a traditional quality practice that got cut for cost:
1. Forensic inversion of defensive patterns (root cause analysis)
Every tool I surveyed treats try/except blocks and null checks as evidence of robustness. This skill inverts that: defensive code is scar tissue from past failures. Every try/except block, every null check, every retry loop points to a failure mode that belongs in the test plan. Instead of proving the code is safe, the skill reads the confessions.
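A minimal sketch of what that inversion looks like in practice (all names here are hypothetical, invented for illustration; the skill's actual heuristics live in `references/defensive_patterns.md`):

```python
# Hypothetical production code: each defense is a clue about a past failure.
def load_profile(fetch, user_id, retries=3):
    for attempt in range(retries):
        try:
            data = fetch(user_id)
            return data or {}          # null check: upstream has returned None before
        except TimeoutError:           # retry loop: the backend has timed out before
            if attempt == retries - 1:
                return {}              # fallback: give up gracefully after last attempt
    return {}

# Tests derived by inverting those defenses: reproduce the failures they imply.
def test_upstream_returns_none():
    assert load_profile(lambda _: None, 42) == {}

def test_backend_times_out_every_try():
    def flaky(_):
        raise TimeoutError
    assert load_profile(flaky, 42) == {}
```

Each assertion exercises a failure mode the original author already met in production, which is exactly the scar tissue the skill mines.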
2. Coverage theater prevention (test plan review)
Explicitly defines and flags fake tests: existence-only checks that prove something is there without verifying what it is, presence-only assertions, mocked validators that bypass the thing you're testing, single-variant testing. The difference between 95% coverage that catches nothing and tests that actually find bugs.
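An illustrative contrast (this toy example is mine, not taken from the skill's references): the first test below is existence-only coverage theater, the second asserts the actual value and would catch a broken template:

```python
def render_greeting(user):
    return {"greeting": f"Hello, {user['name']}!"}

# Coverage theater: passes even if the greeting text is garbage.
def test_greeting_exists():
    assert "greeting" in render_greeting({"name": "Ada"})

# Real test: asserts the value, so a regression in the template actually fails.
def test_greeting_content():
    assert render_greeting({"name": "Ada"})["greeting"] == "Hello, Ada!"
```

Both tests contribute identically to a coverage percentage; only the second finds bugs.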
3. Generated protocols that AI agents execute autonomously (quality planning)
The integration test protocol isn't just documentation. When you tell Copilot to run the tests, it reads the protocol, follows the execution plan step by step, reports progress, and produces a final summary with a ship-or-no-ship recommendation. The documentation becomes the test runner's instructions.
Testing
Tested against five codebases across four languages. The skill had never seen any of them:
Also submitted to anthropics/skills#659. The skill follows the shared Agent Skills specification and works with both Copilot and Claude Code.
Background
This skill emerged from my experiments with agentic engineering, which I cover in my Radar series on AI-driven development. The full story covers how it was built, why it works, and the quality engineering theory behind it.
Type of Contribution
Additional Notes
- Ran `npm run skill:validate` (247 skills valid, 0 invalid)
- Ran `npm run build` successfully

By submitting this pull request, I confirm that my contribution abides by the Code of Conduct and will be licensed under the MIT License.