This skill’s evaluation criteria are derived from the official Anthropic guide *The Complete Guide to Building Skills for Claude*.
`skill-metric` is a utility skill that runs static quality checks on agent skills.
It scores each target skill directory on three dimensions:
- 2.1.1 Format
- 2.1.2 Completeness
- 2.1.3 Writing
Each dimension has a maximum of 8 points, for a total of 24. The tool can emit text reports, JSON, CSV, and (for a single skill) radar charts.
The main implementation script lives at skill-metric/scripts/skill_quality_eval.py.
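The scoring arithmetic can be pictured as a small data model. A minimal sketch (the field names mirror the JSON fields the tool emits, but the class itself is illustrative, not part of the script):

```python
from dataclasses import dataclass

# Illustrative sketch of the three-dimension score model.
# The class is NOT part of skill_quality_eval.py; only the field
# names follow the tool's documented JSON output.
@dataclass
class SkillScore:
    format_score: int        # 2.1.1, 0-8
    completeness_score: int  # 2.1.2, 0-8
    writing_score: int       # 2.1.3, 0-8

    @property
    def total_score(self) -> int:
        # Total is a plain sum of the three dimensions (max 24).
        return self.format_score + self.completeness_score + self.writing_score

score = SkillScore(format_score=8, completeness_score=6, writing_score=7)
print(score.total_score)  # 21 of a possible 24
```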
- Batch quality audit: Run a health check on all skills under `skills/` to detect format or content issues.
- Deep-dive on a single skill: Get detailed Format / Completeness / Writing scores plus per-check explanations.
- Export for analysis: Export scores as CSV or JSON for dashboards, BI tools, or further processing.
- Generate radar charts: For a single skill, create a radar chart over the three dimensions for reports or slide decks.
- Python: Python 3.10+ is recommended.
- Third‑party libraries:
  - Core scoring logic uses only the standard library.
  - Radar chart generation via `--figure` requires `pip install matplotlib`.
Use the script under skill-metric/scripts/skill_quality_eval.py:
```bash
python skill-metric/scripts/skill_quality_eval.py <skill_path> [skill_path ...] [options]
```

| Argument | Description |
|---|---|
| `skill_path` | One or more skill directory paths, or a path to the corresponding `SKILL.md`. Examples: `skills/uniprot-database` or `skills/uniprot-database/SKILL.md`. |

Important: Each path must be either a skill directory or its `SKILL.md` file. Passing the parent `skills/` directory itself will treat it as a single skill named `skills`, which is almost never what you want.
| Option | Description |
|---|---|
| `-q, --quiet` | Print only the total score for each skill. |
| `-j, --json` | Print JSON output (single object for one skill, array for multiple). |
| `--csv [file]` | Emit CSV. Without a path, CSV is written to stdout; with a path, CSV is saved to that file. |
| `--figure [file]` | Generate a radar chart PNG for one skill only. Without a path, saves as `<skill_name>_radar.png`; with a path, saves to that file. Requires matplotlib. |
```bash
# Score a single skill with a full verbose report
python skill-metric/scripts/skill_quality_eval.py skills/uniprot-database

# Score a single skill and print only the total score
python skill-metric/scripts/skill_quality_eval.py skills/uniprot-database -q

# Score a single skill and generate a radar chart (default <skill_name>_radar.png)
python skill-metric/scripts/skill_quality_eval.py skills/uniprot-database --figure

# Score a single skill and save the radar chart to a custom file
python skill-metric/scripts/skill_quality_eval.py skills/uniprot-database --figure report/radar.png

# Batch-score all skills under skills/
python skill-metric/scripts/skill_quality_eval.py skills/*/

# Batch-score and write CSV to a file
python skill-metric/scripts/skill_quality_eval.py skills/*/ --csv skill_scores.csv

# Batch-score and emit CSV to stdout (can be redirected)
python skill-metric/scripts/skill_quality_eval.py skills/*/ --csv > report.csv

# JSON output
python skill-metric/scripts/skill_quality_eval.py skills/uniprot-database -j
python skill-metric/scripts/skill_quality_eval.py skills/*/ -j
```

For full details see skill-metric/references/scoring_criteria.md.
This section summarizes the rubric:
| Dimension | What is checked (summary) | Max |
|---|---|---|
| 2.1.1 Format | `SKILL.md` existence and exact name, directory naming rules, YAML frontmatter, name/description presence and validity, description length and no XML tags; one point deducted per violation. | 8 |
| 2.1.2 Completeness | Presence of license, compatibility, metadata; existence of non‑empty `scripts/`, `references/`, `assets/` dirs; code examples; error‑handling guidance; one point awarded per satisfied item. | 8 |
| 2.1.3 Writing | Clear task boundary and trigger, progressive disclosure (body ≤ 5000 chars), English‑first content, consistency between body references and actual files, non‑placeholder license, version information, etc.; one point awarded per satisfied item. | 8 |
2.1.1 Format (8 pts max; −1 per violation)
- `[skill_name]/SKILL.md` must exist and be named exactly `SKILL.md` (not `skill.md`, `SKILL.MD`, etc.).
- `[skill_name]` must use kebab-case: no spaces, no underscores (e.g. `notion-project-setup` ✓, `NotionProjectSetup` ✗).
- Do not include `README.md` inside the skill directory.
- `SKILL.md` must have YAML frontmatter delimited by `---`.
- Frontmatter must include `name` matching the directory name exactly.
- Frontmatter must include `description` stating (a) what the skill does, (b) when to use it.
- `description` must be under 1024 characters.
- `description` must not contain XML tags (e.g. `<a>`).
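A few of the format rules above can be sketched as standalone checks (illustrative only; `check_format` is not the script's actual code, and the real checker parses the frontmatter more carefully than these regexes):

```python
import re

# Kebab-case: lowercase words separated by single hyphens.
KEBAB_RE = re.compile(r"^[a-z0-9]+(-[a-z0-9]+)*$")

def check_format(skill_name: str, skill_md_text: str) -> list[str]:
    """Sketch of a few format checks; a hypothetical helper, not the tool's code."""
    violations = []
    if not KEBAB_RE.match(skill_name):
        violations.append("directory name is not kebab-case")
    if not skill_md_text.startswith("---"):
        violations.append("missing YAML frontmatter")
    # description: present, under 1024 chars, no XML tags
    m = re.search(r"^description:\s*(.+)$", skill_md_text, re.MULTILINE)
    if not m:
        violations.append("missing description")
    elif len(m.group(1)) >= 1024:
        violations.append("description too long")
    elif re.search(r"<[^>]+>", m.group(1)):
        violations.append("description contains XML tags")
    return violations
```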
2.1.2 Completeness (0 base; +1 per satisfied item)
- Has `license` field?
- Has `compatibility` field (≤500 chars for environment requirements)?
- Has `metadata` field (author, version, etc.)?
- Has `[skill_name]/scripts/` with at least one file?
- Has `[skill_name]/references/` with at least one file?
- Has `[skill_name]/assets/` with at least one file?
- Does the body provide concrete examples (e.g. code blocks or example paragraphs)?
- Does the body describe error/exception handling?
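The three directory checks above might look roughly like this (a sketch; `completeness_points_for_dirs` is a hypothetical helper, not the script's API):

```python
from pathlib import Path

def completeness_points_for_dirs(skill_dir: Path) -> int:
    """Award +1 per non-empty scripts/, references/, assets/ directory.

    Hypothetical sketch of three of the eight completeness checks; the
    other five inspect the frontmatter and body text.
    """
    points = 0
    for sub in ("scripts", "references", "assets"):
        d = skill_dir / sub
        # A directory that exists but is empty earns no point.
        if d.is_dir() and any(d.iterdir()):
            points += 1
    return points
```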
2.1.3 Writing (0 base; +1 per satisfied item)
- Does `description` have a clear task boundary? (e.g. “Analyzes Figma design files and generates developer handoff documentation.” ✓ vs “Helps with projects.” ✗)
- Does `description` have clear trigger phrasing? (e.g. “Use when user uploads .fig files.”)
- Progressive disclosure: `SKILL.md` body ≤ 5000 chars; details in `references/`, runnable code in `scripts/`.
- Is the content primarily in English?
- Reference consistency: every `references/` or `scripts/` path mentioned in the body points to an existing file?
- Reverse consistency: if `references/` or `scripts/` exist, does the body reference at least one file in them?
- Is `license` non-placeholder? (excluding "Unknown", empty, "N/A", etc.)
- Is version information present? (in frontmatter or body, e.g. “Biopython 1.85”)
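Two of the mechanical checks above (progressive disclosure and reference consistency) can be sketched as follows (illustrative shapes, not the script's internals):

```python
def writing_checks(body: str,
                   referenced_paths: list[str],
                   existing_files: set[str]) -> dict[str, bool]:
    """Sketch of two writing checks; a hypothetical helper, not the tool's code."""
    return {
        # Body must stay within the 5000-character progressive-disclosure budget.
        "progressive_disclosure": len(body) <= 5000,
        # Every references/ or scripts/ path mentioned must actually exist.
        "reference_consistency": all(p in existing_files for p in referenced_paths),
    }
```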
Text report (default):

- One section per skill, showing: skill name, path, Format / Completeness / Writing scores, and total score.
- Unless `-q` is used, each individual check is listed with ✓/✗ and an explanatory message, making it easy to see where the skill fails the rubric.
JSON output (`-j`):

- Single skill: A JSON object containing `skill_name`, `skill_dir`, `format_score`, `completeness_score`, `writing_score`, `total_score`, and a `details` object grouping per‑check results under `format`, `completeness`, and `writing`.
- Multiple skills: An array of such objects, convenient for downstream processing and visualization.
CSV output (`--csv`):

- Columns: `skill_name`, `skill_dir` (relative to the current working directory), `format_score`, `completeness_score`, `writing_score`, `total_score`, `error`, plus `format_1`…`format_8`, `completeness_1`…`completeness_8`, `writing_1`…`writing_8`.
- Each per‑check column contains `PASS: <message>` or `FAIL: <message>`.
This makes it easy to filter, aggregate, or pivot on particular checks in tools like Excel or data warehouses.
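For example, the per-check columns can be filtered with the standard library alone (a sketch over a hypothetical, abridged two-row CSV in the shape described above):

```python
import csv
import io

# Hypothetical, abridged CSV in the documented shape (most columns omitted).
csv_text = """skill_name,total_score,format_1
uniprot-database,21,PASS: SKILL.md found
broken-skill,10,FAIL: SKILL.md missing
"""

# Collect skills that fail the first format check.
failing = [
    row["skill_name"]
    for row in csv.DictReader(io.StringIO(csv_text))
    if row["format_1"].startswith("FAIL")
]
print(failing)  # ['broken-skill']
```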
Radar chart (`--figure`):

- Only generated when exactly one skill is evaluated and that skill completes without errors.
- The chart has three axes — Format (2.1.1), Completeness (2.1.2), Writing (2.1.3) — each on a 0–8 scale. The title includes the skill name and total score.
- When multiple skills are passed together with `--figure`, the script prints a message indicating that radar charts are supported for single‑skill evaluation only.
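The chart geometry itself is simple: three evenly spaced polar axes with the score polygon closed back to its first point. A sketch of that layout (the matplotlib call is only indicated in a comment; this is not the script's code):

```python
import math

def radar_points(scores: list[float]) -> list[tuple[float, float]]:
    """Return (angle, radius) pairs for a radar chart, closing the polygon.

    Hypothetical helper illustrating the layout, not the tool's implementation.
    """
    n = len(scores)
    angles = [2 * math.pi * i / n for i in range(n)]
    pts = list(zip(angles, scores))
    return pts + pts[:1]  # repeat the first point to close the shape

pts = radar_points([8, 6, 7])  # Format, Completeness, Writing
# With matplotlib one would then do roughly:
#   ax = plt.subplot(polar=True); ax.plot(*zip(*pts)); ax.set_ylim(0, 8)
```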
When invoking the script from Python, remember that shell glob patterns are not expanded if you pass a list to `subprocess`. Use `glob.glob()` first:
```python
import glob
import json
import subprocess

# Batch-score all skills and write CSV
skill_dirs = sorted(glob.glob("skills/*/"))
subprocess.run(
    ["python", "skill-metric/scripts/skill_quality_eval.py"]
    + skill_dirs
    + ["--csv", "skill_scores.csv"],
    check=True,
)

# Score a single skill and parse JSON output
result = subprocess.run(
    ["python", "skill-metric/scripts/skill_quality_eval.py",
     "skills/uniprot-database", "-j"],
    capture_output=True, text=True, check=True,
)
data = json.loads(result.stdout)
print(data["total_score"])
```

For more Python examples and a precise description of all fields, see skill-metric/references/scoring_criteria.md.
- Valid paths only: Each path must be a skill directory or its `SKILL.md`. Passing the parent `skills/` directory will treat it as a single skill called `skills`.
- Relative `skill_dir` in CSV: The `skill_dir` column in CSV is relative to the current working directory, so running the tool from different locations will change this value.
- `--csv` vs `-j`: If both `--csv` and `-j` are provided, only CSV is emitted (JSON is suppressed).
- Radar chart requirements: `--figure` is supported for one skill at a time and requires `matplotlib`; otherwise, the option may be ignored or an error message will be shown.
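One defensive pattern for the matplotlib requirement is to probe for the library before adding `--figure` (a sketch; the final invocation is left commented out because it assumes the script exists at the documented path):

```python
import importlib.util
import subprocess

# Only request a radar chart when matplotlib is importable in this environment.
args = ["python", "skill-metric/scripts/skill_quality_eval.py",
        "skills/uniprot-database"]
if importlib.util.find_spec("matplotlib") is not None:
    args.append("--figure")
# subprocess.run(args, check=True)  # commented: assumes the script is present
```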