feat(skills): add rules-distill — extract cross-cutting principles from skills into rules #561

shimo4228 wants to merge 6 commits into affaan-m:main from
Conversation
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior in your settings. Use the following commands to manage reviews:
📝 Walkthrough

Adds a new rules-distill command doc and a Rules Distill skill with documentation, plus two new scanner scripts that enumerate rules and skills, extract metadata/frontmatter and file metrics, and emit structured JSON for downstream cross-reading and human review. Also updates docs counts and project structure entries.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant RulesScanner as "scan-rules.sh"
    participant SkillsScanner as "scan-skills.sh"
    participant LLM as "Cross-read LLM"
    participant User as "Reviewer"
    participant Store as "results.json"
    RulesScanner->>SkillsScanner: provide rules index (JSON)
    SkillsScanner->>LLM: send skill metadata and file metrics
    LLM->>LLM: cross-read skills + rules, extract candidate rules & verdicts
    LLM->>User: present candidate summaries and drafts
    User->>LLM: approve/modify/skip candidates
    User->>Store: persist final statuses, metadata, and report
```
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
Greptile Summary

This PR adds the rules-distill skill. Key remaining concerns:
Confidence Score: 3/5
Important Files Changed
Flowchart

```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["/rules-distill invoked"] --> B["Phase 1a: scan-skills.sh\n(global + project SKILL.md)"]
    B --> C["Phase 1b: scan-rules.sh\n(~/.claude/rules *.md)"]
    C --> D["Present inventory to user\nSkills: N | Rules: M files"]
    D --> E["Phase 2: Thematic batching\n(group skills by description)"]
    E --> F["Launch subagent per batch\n(metadata JSON + full rules text)"]
    F --> G["Cross-batch merge\n(deduplicate, re-check 2+ skills)"]
    G --> H["Candidates with verdicts\nAppend / Revise / New Section /\nNew File / Already Covered / Too Specific"]
    H --> I["Phase 3: Summary table\npresented to user"]
    I --> J{User action per candidate}
    J -->|Approve| K["Apply draft to rule file"]
    J -->|Modify| L["⚠ Undefined protocol\n(no interaction spec)"]
    J -->|Skip| M["No change"]
    K --> N["Save results.json\n(kebab-case IDs, UTC timestamp)"]
    L --> N
    M --> N
```
Last reviewed commit: "fix(skills): address..."
Actionable comments posted: 4
🧹 Nitpick comments (2)
skills/rules-distill/SKILL.md (2)
**93-101: Add language identifier to code block.**

The JSON structure at lines 93-101 is missing a language identifier. Adding `json` to the code fence improves syntax highlighting and clarity.

✨ Suggested fix
````diff
 ## Output Format (per candidate)
+```json
 {
   "principle": "1-2 sentences in 'do X' / 'don't do Y' form",
````

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@skills/rules-distill/SKILL.md` around lines 93 - 101, The fenced code block containing the JSON example (keys like "principle", "evidence", "violation_risk", "verdict", "target_rule", "confidence", "draft") in SKILL.md is missing a language identifier; update the opening code fence to "```json" so the block reads as a JSON code block to enable proper syntax highlighting and clarity.
**19-184: Consider grouping phases under a "How It Works" section.**

The three phases (Phase 1, 2, 3) describe how the skill works, but they're not explicitly grouped under a "How It Works" heading. While the current structure is clear, adding an explicit "How It Works" parent section would improve compliance with the skill documentation guidelines.
♻️ Suggested restructure
```diff
+## How It Works
+
+The rules distillation process follows three phases:
+
 ## Phase 1: Inventory (Deterministic Collection)
 ### 1a. Collect skill inventory
```

As per coding guidelines: Skills should be formatted as Markdown with clear sections for When to Use, How It Works, and Examples.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@skills/rules-distill/SKILL.md` around lines 19 - 184, The document SKILL.md is missing a top-level "How It Works" parent heading that groups the existing Phase 1: Inventory, Phase 2: Cross-read, Match & Verdict, and Phase 3: User Review & Execution sections; add a "## How It Works" (or appropriate H2) above the "Phase 1" heading and indent/organize the three phase headings (Phase 1: Inventory, Phase 2: Cross-read, Match & Verdict, Phase 3: User Review & Execution) under that section so the skill follows the required "When to Use / How It Works / Examples" structure while keeping all existing content and headings intact.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@skills/rules-distill/SKILL.md`:
- Line 167: The README currently hardcodes the results path as
`~/.claude/skills/rules-distill/results.json`; replace that hardcoded path with
a configurable/reference placeholder and document how to resolve it (e.g., use
an environment variable or resolved home directory). Update the SKILL.md line
that contains the literal `~/.claude/skills/rules-distill/results.json` to show
a configurable token like `${CLAUDE_HOME}/skills/rules-distill/results.json` or
instruct users to use their home directory (e.g., expand with os.homedir or
$HOME) and add a short note on how the application determines the final path
(env var name or config key).
- Line 35: Replace the hardcoded path in the shell snippet "find ~/.claude/rules
-name '*.md' -not -path '*/_archived/*' | while read f; do" with a configurable
variable (e.g. "${CLAUDE_RULES_DIR:-$HOME/.claude/rules}") so callers can
override it; update the snippet to use that variable in the find invocation and
ensure any README or comments document the new CLAUDE_RULES_DIR env var and its
default.
- Line 26: Replace the hardcoded path "bash
~/.claude/skills/skill-stocktake/scripts/scan.sh" with a configurable reference:
update the invocation in SKILL.md to use either a relative path like
"./scripts/scan.sh" (if the README sits at repo root) or an environment variable
placeholder such as "$SKILL_DIR/scripts/scan.sh" (and document setting
SKILL_DIR), and remove the tilde expansion; ensure the example shows the generic
form and that any CI or startup scripts set the env var if needed.
- Around line 1-191: Add a new "Examples" Markdown section immediately after the
"Design Principles" heading that includes: a complete end-to-end run showing the
command used to start rules-distill and the Phase 1 banner (e.g., "Rules
Distillation — Phase 1: Inventory" with counts), representative sample outputs
for Phase 2 candidate JSON entries (principle, evidence, violation_risk,
verdict, draft) and the Phase 3 "Rules Distillation Report" summary table, plus
a short example of the user approval interaction (e.g., "User: Approve 1, Modify
2") — ensure headings mirror existing ones (Phase 1/Phase 2/Phase 3, "Rules
Distillation Report") and include "See skill: [name]" links in draft examples to
comply with the Design Principles.
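The three path-related comments above share one pattern: an environment variable with a home-relative default. A minimal sketch of that pattern; the variable names `CLAUDE_RULES_DIR` and `SKILL_DIR` are the reviewers' proposals, not established conventions:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Callers can override; otherwise fall back to the documented default.
RULES_DIR="${CLAUDE_RULES_DIR:-$HOME/.claude/rules}"
SKILL_DIR="${SKILL_DIR:-$HOME/.claude/skills/skill-stocktake}"

echo "rules dir:   $RULES_DIR"
echo "scan script: $SKILL_DIR/scripts/scan.sh"
```

Setting `CLAUDE_RULES_DIR=/opt/rules` before invoking the script then redirects the scan without editing the doc snippet.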
---
Nitpick comments:
In `@skills/rules-distill/SKILL.md`:
- Around line 93-101: The fenced code block containing the JSON example (keys
like "principle", "evidence", "violation_risk", "verdict", "target_rule",
"confidence", "draft") in SKILL.md is missing a language identifier; update the
opening code fence to "```json" so the block reads as a JSON code block to
enable proper syntax highlighting and clarity.
- Around line 19-184: The document SKILL.md is missing a top-level "How It
Works" parent heading that groups the existing Phase 1: Inventory, Phase 2:
Cross-read, Match & Verdict, and Phase 3: User Review & Execution sections; add
a "## How It Works" (or appropriate H2) above the "Phase 1" heading and
indent/organize the three phase headings (Phase 1: Inventory, Phase 2:
Cross-read, Match & Verdict, Phase 3: User Review & Execution) under that
section so the skill follows the required "When to Use / How It Works /
Examples" structure while keeping all existing content and headings intact.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 001b4032-f29a-4b10-836c-5f925443dc47
📒 Files selected for processing (2)
- commands/rules-distill.md
- skills/rules-distill/SKILL.md
1 issue found across 2 files
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="skills/rules-distill/SKILL.md">
<violation number="1" location="skills/rules-distill/SKILL.md:100">
P3: The output schema doesn’t define fields for Revise “reason” and “before/after” text, yet the Verdict Reference requires them. This mismatch leaves Revise proposals underspecified for Phase 3 review.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
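One way to close that gap is to extend the per-candidate schema with Revise-specific fields. A hypothetical shape, not something the PR defines:

```json
{
  "principle": "…",
  "verdict": "Revise",
  "target_rule": "…",
  "confidence": 0.8,
  "revise": {
    "reason": "why the existing rule text falls short",
    "before": "current rule text being revised",
    "after": "proposed replacement text"
  }
}
```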
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@skills/rules-distill/SKILL.md`:
- Line 13: Rename the Markdown section heading "When to Activate" to "When to
Use" in SKILL.md so the skill follows the documentation standard; locate the
heading string "When to Activate" and replace it with "When to Use", ensuring
the rest of the Markdown structure (other sections like "How It Works" and
"Examples") remains unchanged.
- Line 41: The SKILL discovery fallback uses the wrong env var name—replace
CLAUDE_RULES_DIR with a correctly named CLAUDE_SKILLS_DIR in the find command so
it searches "${CLAUDE_SKILLS_DIR:-$HOME/.claude/skills}" for SKILL.md and other
.md files; update any references to CLAUDE_RULES_DIR in the SKILL.md discovery
logic to CLAUDE_SKILLS_DIR to keep semantics consistent (look for the find
invocation that mentions SKILL.md and CLAUDE_RULES_DIR).
- Line 21: The Prerequisites line in SKILL.md contains a hardcoded path
(~/.claude/skills/skill-stocktake/scripts/scan.sh); update the Prerequisites
section so it does not reference a fixed filesystem location — either remove the
explicit path or replace it with an environment-variable placeholder (e.g.,
$SKILL_STOCKTAKE_SCRIPT or $CLAUDE_SKILLS_DIR) and describe how to set that
variable or fall back to the find command used in the same sentence; edit the
sentence mentioning "skill-stocktake" and the scan script accordingly to
reference the placeholder or the alternative manual enumeration command.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 8fdb1476-73e8-49e1-b8bc-9da8eeebb0c8
📒 Files selected for processing (1)
skills/rules-distill/SKILL.md
3 issues found across 1 file (changes from recent commits).
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="skills/rules-distill/SKILL.md">
<violation number="1" location="skills/rules-distill/SKILL.md:41">
P2: Fallback inventory command points at the rules directory, matches arbitrary `.md` files, and truncates with `head -100`, so Phase 1 can scan the wrong tree and produce a non‑exhaustive/incorrect skill inventory.</violation>
<violation number="2" location="skills/rules-distill/SKILL.md:70">
P2: Batch-local 2+ filtering is specified without a global cross-batch merge step, which can drop valid cross-cutting principles split across clusters and undermine the stated exhaustive collection goal.</violation>
<violation number="3" location="skills/rules-distill/SKILL.md:107">
P2: Nested triple-backtick fences inside the subagent prompt will terminate the outer fence early; use a longer outer fence (e.g., ````) to keep the prompt intact.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
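For violation 3, the fix is to widen the outer fence so the inner one survives, e.g. a four-backtick opener around a prompt that itself contains three-backtick blocks:

````markdown
Subagent prompt:

```json
{ "task": "cross-read" }
```
````

CommonMark only closes a fence on a backtick run at least as long as the opener, so the inner three-backtick fence no longer terminates the block.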
6 issues found across 3 files (changes from recent commits).
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="skills/rules-distill/SKILL.md">
<violation number="1" location="skills/rules-distill/SKILL.md:60">
P2: Cross-batch promotion is unreachable because per-batch extraction still suppresses singleton-in-batch candidates, leaving nothing for merge to promote.</violation>
</file>
<file name="skills/rules-distill/scripts/scan-skills.sh">
<violation number="1" location="skills/rules-distill/scripts/scan-skills.sh:100">
P2: GNU-specific `date -r "$file"` will fail on macOS/BSD (expects epoch seconds), and `set -e` will abort the scan. Use a portable stat/date fallback for file mtimes.</violation>
<violation number="2" location="skills/rules-distill/scripts/scan-skills.sh:103">
P2: Observation count matching breaks for file paths containing spaces because `uniq -c` output is split on whitespace and `awk` compares only field 2 to the full path, causing `use_7d`/`use_30d` to silently return 0 for such files.</violation>
</file>
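The whitespace pitfall in the first scan-skills.sh finding can be avoided by stripping the leading count from each `uniq -c` line and comparing the remainder to the full path, rather than splitting on whitespace. A sketch with fabricated sample data:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Sample uniq -c output, including a path with spaces.
counts='   3 /home/u/my skills/a.md
   1 /home/u/plain.md'

path="/home/u/my skills/a.md"
# Strip the leading count+space instead of comparing awk field 2,
# so paths containing spaces still match exactly.
n=$(printf '%s\n' "$counts" | awk -v p="$path" '
  { line=$0; sub(/^[[:space:]]*[0-9]+[[:space:]]/, "", line)
    if (line == p) print $1 }')
echo "${n:-0}"
```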
<file name="skills/rules-distill/scripts/scan-rules.sh">
<violation number="1" location="skills/rules-distill/scripts/scan-rules.sh:30">
P2: `sort -z` is a GNU-only option; on macOS/BSD `sort` rejects `-z`, causing the pipeline to fail and the script to exit under `set -euo pipefail`. This makes the script non-portable for common developer environments.</violation>
<violation number="2" location="skills/rules-distill/scripts/scan-rules.sh:43">
P2: The script always rewrites paths as home-relative, which corrupts paths when RULES_DIR is outside $HOME (e.g., "~//tmp/rules/foo.md"), so the emitted JSON path no longer identifies the scanned file.</violation>
<violation number="3" location="skills/rules-distill/scripts/scan-rules.sh:67">
P2: JSON output is assembled with unescaped string interpolation, so headings or filenames containing quotes/backslashes will produce invalid JSON.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
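Taken together, the three scan-rules.sh findings suggest one revised enumeration loop. A sketch under the assumption that filenames contain no newlines (the guarantee the null-terminated pipeline existed to provide); the demo tree is fabricated:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Demo tree; real usage would default RULES_DIR to ~/.claude/rules.
RULES_DIR=$(mktemp -d)
mkdir -p "$RULES_DIR/_archived"
printf '## A\n' > "$RULES_DIR/a.md"
printf '## C\n' > "$RULES_DIR/_archived/c.md"  # ignored: archived

files=()
# Newline-delimited pipeline: plain `sort` exists on GNU and BSD alike.
while IFS= read -r f; do
  files+=("$f")
done < <(find "$RULES_DIR" -name '*.md' -not -path '*/_archived/*' | LC_ALL=C sort)

for file in "${files[@]}"; do
  # Rewrite to ~/ only when the file actually lives under $HOME.
  if [[ "$file" == "$HOME"/* ]]; then
    rel_path="~/${file#"$HOME"/}"
  else
    rel_path="$file"
  fi
  # jq --arg escapes quotes/backslashes raw interpolation would not.
  jq -n --arg path "$rel_path" '{path: $path}'
done
```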
```bash
for i in "${!files[@]}"; do
  file="${files[$i]}"
  rel_path="${file#"$HOME"/}"
  rel_path="~/$rel_path"
```
P2: The script always rewrites paths as home-relative, which corrupts paths when RULES_DIR is outside $HOME (e.g., "~//tmp/rules/foo.md"), so the emitted JSON path no longer identifies the scanned file.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At skills/rules-distill/scripts/scan-rules.sh, line 43:
<comment>The script always rewrites paths as home-relative, which corrupts paths when RULES_DIR is outside $HOME (e.g., "~//tmp/rules/foo.md"), so the emitted JSON path no longer identifies the scanned file.</comment>
<file context>
@@ -0,0 +1,72 @@
+for i in "${!files[@]}"; do
+ file="${files[$i]}"
+ rel_path="${file#"$HOME"/}"
+ rel_path="~/$rel_path"
+
+ # Extract H2 headings (## Title)
</file context>
Actionable comments posted: 5
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@skills/rules-distill/scripts/scan-rules.sh`:
- Around line 13-15: Replace manual string concatenation JSON prints in
scan-rules.sh (e.g., the RULES_DIR existence error block and the rule-emission
logic around functions that use basename and extracted headings) with jq-driven
JSON construction: use jq -n --arg for each string value (RULES_DIR, basename
output, each extracted heading) and --argjson where needed so all strings are
properly escaped, emit the error object via jq -n --arg path "$RULES_DIR"
'{"error":"rules directory not found","path":$path}', and similarly build each
rule object with jq -n --arg name "$basename" --arg heading "$heading"
'{name:$name,heading:$heading,...}' instead of echoing concatenated strings;
apply the same jq approach to the blocks covering lines ~35-68 that produce rule
objects.
- Around line 28-30: The use of sort -z in the find pipeline is not portable
(BSD/macos sort lacks -z) and causes the script to fail under set -euo pipefail;
update the pipeline that builds the files array (the while IFS= read -r -d '' f;
do files+=("$f"); done < <(find "$RULES_DIR" -name '*.md' -not -path
'*/_archived/*' -print0 | sort -z)) to a portable approach: remove sort -z and
instead convert NULs to newlines, sort normally, then read lines into the files
array (e.g., use find ... -print0 | tr '\0' '\n' | sort | while IFS= read -r f;
do files+=("$f"); done) so RULES_DIR scanning and ordering work on macOS and
Linux.
In `@skills/rules-distill/scripts/scan-skills.sh`:
- Around line 87-92: The jq selectors in obs_7d_counts and obs_30d_counts still
look for top-level .tool and .path but the current observation schema nests
skill metadata under .skill and only provides timestamp + skill.id/skill.path;
update both jq invocations that read from $OBSERVATIONS (variables obs_7d_counts
and obs_30d_counts) to filter by timestamp and emit the nested skill path (or
id) instead, e.g. change the filter to 'select(.timestamp >= $c and .skill.path
!= null) | .skill.path' (or use .skill.id where you index by id) so the counts
reflect current observations and still pipe through sort | uniq -c as before.
- Line 119: The scan currently inventories any Markdown under "$dir" by using
find "$dir" -name "*.md", which picks up non-canonical files; change the find
invocation to only match SKILL.md (e.g., find "$dir" -type f -name "SKILL.md")
so the script only counts canonical skill records; update the pipeline that
feeds the while/read loop (the line using done < <(find "$dir" -name "*.md" ...
| sort)) to use the new find pattern and keep the sort/redirect behavior intact.
- Around line 47-52: Create a portable file_mtime_utc() helper to replace the
incorrect use of date -r with a file path: use stat (e.g., stat -c %Y or stat -f
%m) to get the file's epoch mtime, then pass that epoch to the
platform-appropriate date command (date -u -r EPOCH on macOS/BSD or date -u -d
"@EPOCH" on GNU) to produce the UTC ISO timestamp; update callers that
previously used date -u -r "$file" to call file_mtime_utc "$file" and keep the
existing date_ago() helper unchanged.
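A sketch of that helper. The probe order here is GNU-first, a deliberate tweak to the suggestion above: on GNU coreutils `stat -f` means `--file-system` (where `%m` is the mount point), so it can succeed with a non-epoch value; trying `stat -c %Y` first avoids that silent wrong answer:

```shell
#!/usr/bin/env bash
set -euo pipefail

file_mtime_utc() {
  local file="$1" epoch
  # GNU: stat -c %Y; BSD/macOS: stat -f %m. Probe GNU first.
  epoch=$(stat -c %Y "$file" 2>/dev/null || stat -f %m "$file")
  # GNU: date -d "@epoch"; BSD/macOS: date -r epoch.
  date -u -d "@$epoch" +%Y-%m-%dT%H:%M:%SZ 2>/dev/null \
    || date -u -r "$epoch" +%Y-%m-%dT%H:%M:%SZ
}

touch /tmp/mtime-demo.md
file_mtime_utc /tmp/mtime-demo.md
```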
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 5339a74d-f52d-4a0e-a6ab-4db64b15cfee
📒 Files selected for processing (3)
- skills/rules-distill/SKILL.md
- skills/rules-distill/scripts/scan-rules.sh
- skills/rules-distill/scripts/scan-skills.sh
🚧 Files skipped from review as they are similar to previous changes (1)
- skills/rules-distill/SKILL.md
```bash
obs_7d_counts=$(jq -r --arg c "$c7" \
  'select(.tool=="Read" and .timestamp>=$c) | .path' \
  "$OBSERVATIONS" 2>/dev/null | sort | uniq -c)
obs_30d_counts=$(jq -r --arg c "$c30" \
  'select(.tool=="Read" and .timestamp>=$c) | .path' \
  "$OBSERVATIONS" 2>/dev/null | sort | uniq -c)
```
The usage rollups still target the old observation schema.
scripts/lib/skill-improvement/observations.js:51-74 emits timestamp plus skill.id / skill.path; there is no top-level .tool or .path in the current schema. As written, these selectors never match current observations, so every emitted skill ends up with use_7d: 0 and use_30d: 0.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@skills/rules-distill/scripts/scan-skills.sh` around lines 87 - 92, The jq
selectors in obs_7d_counts and obs_30d_counts still look for top-level .tool and
.path but the current observation schema nests skill metadata under .skill and
only provides timestamp + skill.id/skill.path; update both jq invocations that
read from $OBSERVATIONS (variables obs_7d_counts and obs_30d_counts) to filter
by timestamp and emit the nested skill path (or id) instead, e.g. change the
filter to 'select(.timestamp >= $c and .skill.path != null) | .skill.path' (or
use .skill.id where you index by id) so the counts reflect current observations
and still pipe through sort | uniq -c as before.
leegonzales left a comment
Servitor Review — PR #561
Good work here, Shimo. This is a well-designed meta-tool that follows the project's formats and conventions. Let me give you the full picture.
What's right
- Clean, focused PR: One skill, one command, two support scripts. This is how PRs should look.
- Proper formatting: Frontmatter on both files is correct. File naming follows conventions.
- Required sections present: When to Use, How It Works, Example — all accounted for.
- Smart design: The 3-phase workflow (deterministic collection → LLM judgment → user review) is solid. The anti-abstraction safeguard (2+ skills, actionable behavior test, violation risk) is exactly the kind of thoughtfulness that prevents rules from becoming abstract noise.
- Never auto-modifies rules: This is the right call. Always requiring user approval keeps humans in the loop.
- Security: No hardcoded paths or secrets.
- `set -euo pipefail` in scripts. Cleanup traps. Defensive.
Things to address
- `scan-skills.sh` env var naming: The script uses `SKILL_STOCKTAKE_GLOBAL_DIR` and `SKILL_STOCKTAKE_PROJECT_DIR` env var names, but this is the rules-distill skill, not skill-stocktake. Should use `RULES_DISTILL_GLOBAL_DIR` / `RULES_DISTILL_PROJECT_DIR` for clarity. The `scan-rules.sh` already uses `RULES_DISTILL_DIR` — be consistent across both scripts.
- `scan-skills.sh` carries unnecessary scope: Lines 64–100 implement observation counting (`use_7d`, `use_30d`) from `observations.jsonl`. Rules-distill only needs skill names, descriptions, and content for cross-reading — not usage metrics. Consider trimming this to keep the script focused on what rules-distill actually needs.
- Header comment inconsistency: `scan-skills.sh` header says `# scan.sh` — should say `# scan-skills.sh`.
- `scan-rules.sh` JSON construction: Headings are embedded directly into echo'd JSON strings without escaping. A heading containing `"` or `\` would produce invalid JSON. Consider using `jq` for JSON construction (as `scan-skills.sh` already does) or at minimum escape special characters.
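For the fourth item, jq-driven construction might look like the following sketch (the heading value and object shape are illustrative):

```shell
#!/usr/bin/env bash
set -euo pipefail

heading='Quoting "rules" and back\slashes'
file="example.md"
# jq --arg escapes quotes and backslashes that raw echo interpolation
# would pass through, so the output is always valid JSON.
jq -n --arg name "$(basename "$file")" --arg heading "$heading" \
  '{name: $name, heading: $heading}'
```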
CI failures
All 3 CI failures are pre-existing on main — your PR doesn't introduce any new ones:
- Lint: `buildTemplateVariables` unused, `provenance` unused (both in main's code)
- Test: orchestrator slug validation, catalog counts (both known issues since heartbeat 3)
You're clean here.
Verdict
This is a good contribution. Address the 4 items above and it's ready. Hit it. ⭐
— Servitor (Christopher Pike), Keeper of everything-claude-code
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@AGENTS.md`:
- Line 3: The skills count in AGENTS.md is inconsistent: the header block ("This
is a **production-ready AI coding plugin**..." which currently states "109
skills") does not match the Project Structure entry ("skills/ — 102..."); verify
the actual number of skill modules in the repo and update both places in
AGENTS.md so they match (either change the header to 102 or update the Project
Structure entry to 109), ensuring the exact numeric value is used in the header
sentence and the "skills/ — ..." line under Project Structure.
In `@README.md`:
- Line 194: Update the README so the counts are consistent: find the sentence "✨
**That's it!** You now have access to 25 agents, 109 skills, and 58 commands."
and update the corresponding entries in the "Cross-Tool Feature Parity" table
(Claude Code counts and any other cells showing agent/skill/command totals) to
match 25/109/58; ensure any other mentions of those numbers in the document
(table captions, badges, or summary lines) are changed to the same values so the
README is one source of truth.
2 issues found across 2 files (changes from recent commits).
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="README.md">
<violation number="1" location="README.md:1047">
P2: README catalog totals are internally inconsistent: updated Claude Code counts (109 skills/58 commands) conflict with older values still shown in the later cross-tool parity table.</violation>
</file>
<file name="AGENTS.md">
<violation number="1" location="AGENTS.md:3">
P2: AGENTS.md now reports two different skill totals (109 vs 102), leaving the catalog counts internally inconsistent after this PR.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
Analysis Failed
Addressed all 4 items from the Servitor review in b62cbbf:
Note: CI failures (ESLint …)
2 issues found across 2 files (changes from recent commits).
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="skills/rules-distill/scripts/scan-rules.sh">
<violation number="1" location="skills/rules-distill/scripts/scan-rules.sh:36">
P2: Heading extraction now exits the script when a markdown file has no `##` matches due to `set -euo pipefail` + missing `|| true` around `grep`.</violation>
<violation number="2" location="skills/rules-distill/scripts/scan-rules.sh:56">
P2: Rules are reassembled from lexicographically globbed temp files, which reorders entries for 10+ files (`10.json` before `2.json`).</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
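For the second finding, zero-padding the temp-file index makes lexicographic glob order coincide with numeric order. A sketch, assuming per-rule JSON fragments are written to a temp directory:

```shell
#!/usr/bin/env bash
set -euo pipefail

tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT

# Zero-padded names: 0001.json .. 0012.json glob in numeric order.
for i in $(seq 1 12); do
  printf '{"n":%d}' "$i" > "$tmpdir/$(printf '%04d' "$i").json"
done

# The glob now yields 0002.json before 0010.json, so reassembly
# preserves the original scan order.
jq -s 'map(.n)' "$tmpdir"/*.json
```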
Actionable comments posted: 2
♻️ Duplicate comments (4)
skills/rules-distill/scripts/scan-rules.sh (2)
**13-15: ⚠️ Potential issue | 🟠 Major: Use `jq` for the error object serialization.**

Line 14 still builds JSON via string concatenation; paths containing `"` or `\` can emit invalid JSON.

Proposed fix

```diff
-if [[ ! -d "$RULES_DIR" ]]; then
-  echo '{"error":"rules directory not found","path":"'"$RULES_DIR"'"}' >&2
+if [[ ! -d "$RULES_DIR" ]]; then
+  jq -n --arg path "$RULES_DIR" '{"error":"rules directory not found","path":$path}' >&2
   exit 1
 fi
```

To reproduce the breakage:

```bash
#!/bin/bash
set -euo pipefail
RULES_DIR='/tmp/rules-with-"quote'
echo '{"error":"rules directory not found","path":"'"$RULES_DIR"'"}' | jq . >/dev/null
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@skills/rules-distill/scripts/scan-rules.sh` around lines 13 - 15, Replace the manual JSON string construction that uses RULES_DIR with a safe jq-based serialization: when the directory check in scan-rules.sh (the if [[ ! -d "$RULES_DIR" ]] branch) fails, invoke jq -n with --arg to set "error" and "path" fields so paths with quotes/backslashes are escaped correctly, and write the jq output to stderr before exiting; update the echo/exit sequence to use this jq-generated JSON instead of concatenating strings.
**20-22: ⚠️ Potential issue | 🟠 Major: Avoid GNU-only `sort -z` in a cross-platform scanner.**

Line 22 uses `sort -z`, which is not supported by BSD/macOS `sort` and can hard-fail under `set -euo pipefail`.

Portable fix (newline-delimited)

```diff
-while IFS= read -r -d '' f; do
+while IFS= read -r f; do
   files+=("$f")
-done < <(find "$RULES_DIR" -name '*.md' -not -path '*/_archived/*' -print0 | sort -z)
+done < <(find "$RULES_DIR" -name '*.md' -not -path '*/_archived/*' -print | LC_ALL=C sort)
```

Does BSD/macOS `sort` support the `-z` (`--zero-terminated`) option?

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@skills/rules-distill/scripts/scan-rules.sh` around lines 20 - 22, The loop uses GNU-only sort -z which breaks on BSD/macOS; replace the null-terminated pipeline with a newline-delimited one: run find "$RULES_DIR" -name '*.md' -not -path '*/_archived/*' -print | sort and change the read loop to while IFS= read -r f; do files+=("$f"); done < <( ... ); this keeps the use of RULES_DIR and the files array and avoids sort -z while preserving the sorted list of markdown files for the rest of the script.

skills/rules-distill/scripts/scan-skills.sh (2)
**71-71: ⚠️ Potential issue | 🟠 Major: Restrict scan target to canonical `SKILL.md` files.**

Line 71 inventories every Markdown file, which can include non-skill docs and distort skill counts used by rules-distill.
Proposed fix
```diff
- done < <(find "$dir" -name "*.md" -type f 2>/dev/null | sort)
+ done < <(find "$dir" -type f -name "SKILL.md" 2>/dev/null | sort)
```

Based on learnings, the canonical skill file in this repository is `SKILL.md` inside each skill folder.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@skills/rules-distill/scripts/scan-skills.sh` at line 71, The loop currently feeds every Markdown file into the scanner via the find invocation (the line using done < <(find "$dir" -name "*.md" ...)), which pulls non-canonical docs; change the find invocation to only match canonical SKILL.md files (e.g., use -name "SKILL.md" or a path filter matching "*/SKILL.md") so the script scans only each skill's SKILL.md. Update the find call inside scan-skills.sh (the pipeline feeding the while/read loop) to restrict results to SKILL.md and keep the existing sort and error-redirection.
**60-60: ⚠️ Potential issue | 🟠 Major: Make mtime extraction portable across GNU/BSD `date`.**

Line 60 uses `date -u -r "$file"` with a file path, which is not portable across environments.

Proposed fix
```diff
+file_mtime_utc() {
+  local file="$1" epoch
+  epoch=$(stat -f %m "$file" 2>/dev/null || stat -c %Y "$file")
+  date -u -r "$epoch" +%Y-%m-%dT%H:%M:%SZ 2>/dev/null || date -u -d "@$epoch" +%Y-%m-%dT%H:%M:%SZ
+}
+
 ...
-  mtime=$(date -u -r "$file" +%Y-%m-%dT%H:%M:%SZ)
+  mtime=$(file_mtime_utc "$file")
```

On macOS/BSD `date`, does `date -r` accept a file path or epoch seconds?

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@skills/rules-distill/scripts/scan-skills.sh` at line 60, Replace the non-portable mtime extraction with a small portable routine: detect which stat/date flags are available and compute mtime by first getting the file's epoch seconds (try stat -c %Y "$file" for GNU, fallback to stat -f %m "$file" for BSD) into a variable (e.g., secs) and then format it to UTC ISO8601 using date with the appropriate option (use date -u -d "@$secs" +%Y-%m-%dT%H:%M:%SZ for GNU date, fallback to date -u -r "$secs" +%Y-%m-%dT%H:%M:%SZ for BSD); update the code that sets mtime=$(...) to call this logic (you can implement as a helper function like get_mtime and replace the mtime=$(date -u -r "$file" +%Y-%m-%dT%H:%M:%SZ) line).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@skills/rules-distill/scripts/scan-rules.sh`:
- Around line 32-33: The current rel_path rewriting always prefixes "~/" which
yields malformed "~//..." when $file is not under $HOME; change scan-rules.sh to
only rewrite to a home-relative path when the file actually starts with $HOME
(check using the $file value and $HOME), e.g. if $file begins with "$HOME/" then
set rel_path="${file#"$HOME"/}" and rel_path="~/$rel_path", otherwise leave
rel_path equal to the original $file; update the logic around the rel_path
assignment in the script (the lines where rel_path is computed) so non-$HOME
paths are not rewritten.
- Around line 35-36: Replace the failing grep pipeline used to build
headings_json with an awk-based extraction that doesn't exit nonzero when no
matches are found: locate the assignment to headings_json (the line using grep
-E '^## ' "$file" ... | jq -R . | jq -s '.') and change it to invoke awk to emit
the matched title text (stripping the leading "## ") so the rest of the pipeline
(jq -R . | jq -s '.') can run successfully and produce [] for files without
headings; ensure you update the same variable name headings_json and preserve
the jq steps so behavior for files with headings remains identical.
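The awk-based extraction described above can be sketched like this (function name illustrative); unlike `grep -E '^## '`, awk exits 0 when nothing matches, so a pipeline under `set -o pipefail` survives heading-less files:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Sketch: emit H2 titles (without the "## " prefix); exit 0 even on zero matches.
extract_headings() {
  awk '/^## / { print substr($0, 4) }' "$1"
  # The script then pipes this through: jq -R . | jq -s '.'  → [] for empty input
}
```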
---
Duplicate comments:
In `@skills/rules-distill/scripts/scan-rules.sh`:
- Around line 13-15: Replace the manual JSON string construction that uses
RULES_DIR with a safe jq-based serialization: when the directory check in
scan-rules.sh (the if [[ ! -d "$RULES_DIR" ]] branch) fails, invoke jq -n with
--arg to set "error" and "path" fields so paths with quotes/backslashes are
escaped correctly, and write the jq output to stderr before exiting; update the
echo/exit sequence to use this jq-generated JSON instead of concatenating
strings.
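A sketch of the jq-based error emission (the error message text and function name are illustrative):

```shell
#!/usr/bin/env bash
set -euo pipefail

# Sketch: build the error object with `jq -n --arg` so quotes and backslashes
# in the path are escaped correctly, then write the JSON to stderr.
emit_dir_error() {
  jq -n --arg error "rules directory not found" --arg path "$1" \
    '{error: $error, path: $path}' >&2
}
```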
- Around line 20-22: The loop uses GNU-only sort -z which breaks on BSD/macOS;
replace the null-terminated pipeline with a newline-delimited one: run find
"$RULES_DIR" -name '*.md' -not -path '*/_archived/*' -print | sort and change
the read loop to while IFS= read -r f; do files+=("$f"); done < <( ... ); this
keeps the use of RULES_DIR and the files array and avoids sort -z while
preserving the sorted list of markdown files for the rest of the script.
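The newline-delimited replacement can be sketched as follows — safe as long as filenames contain no newlines, which holds for these `*.md` rules (function name illustrative):

```shell
#!/usr/bin/env bash
set -euo pipefail

# Sketch: portable file collection — plain `sort` instead of GNU-only `sort -z`.
collect_rules() {
  local dir="$1" f
  local files=()
  while IFS= read -r f; do
    files+=("$f")
  done < <(find "$dir" -name '*.md' -not -path '*/_archived/*' | sort)
  printf '%s\n' "${files[@]}"
}
```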
In `@skills/rules-distill/scripts/scan-skills.sh`:
- Line 71: The loop currently feeds every Markdown file into the scanner via the
find invocation (the line using done < <(find "$dir" -name "*.md" ...)), which
pulls non-canonical docs; change the find invocation to only match canonical
SKILL.md files (e.g., use -name "SKILL.md" or a path filter matching
"*/SKILL.md") so the script scans only each skill's SKILL.md. Update the find
call inside scan-skills.sh (the pipeline feeding the while/read loop) to
restrict results to SKILL.md and keep the existing sort and error-redirection.
- Line 60: Replace the non-portable mtime extraction with a small portable
routine: detect which stat/date flags are available and compute mtime by first
getting the file's epoch seconds (try stat -c %Y "$file" for GNU, fallback to
stat -f %m "$file" for BSD) into a variable (e.g., secs) and then format it to
UTC ISO8601 using date with the appropriate option (use date -u -d "@$secs"
+%Y-%m-%dT%H:%M:%SZ for GNU date, fallback to date -u -r "$secs"
+%Y-%m-%dT%H:%M:%SZ for BSD); update the code that sets mtime=$(...) to call
this logic (you can implement as a helper function like get_mtime and replace
the mtime=$(date -u -r "$file" +%Y-%m-%dT%H:%M:%SZ) line).
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 3b979900-6adc-4eb8-b03e-d8afc2c49ff7
📒 Files selected for processing (2)
- skills/rules-distill/scripts/scan-rules.sh
- skills/rules-distill/scripts/scan-skills.sh
```shell
rel_path="${file#"$HOME"/}"
rel_path="~/$rel_path"
```
Fix home-relative path rewriting for non-$HOME files.
Lines 32-33 produce malformed values like ~//tmp/... when RULES_DIR is outside $HOME (common in tests via RULES_DISTILL_DIR).
Proposed fix
```diff
-    rel_path="${file#"$HOME"/}"
-    rel_path="~/$rel_path"
+    if [[ "$file" == "$HOME/"* ]]; then
+      rel_path="~/${file#"$HOME"/}"
+    else
+      rel_path="$file"
+    fi
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```shell
if [[ "$file" == "$HOME/"* ]]; then
  rel_path="~/${file#"$HOME"/}"
else
  rel_path="$file"
fi
```
🧰 Tools
🪛 Shellcheck (0.11.0)
[warning] 33-33: Tilde does not expand in quotes. Use $HOME.
(SC2088)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@skills/rules-distill/scripts/scan-rules.sh` around lines 32 - 33, The current
rel_path rewriting always prefixes "~/" which yields malformed "~//..." when
$file is not under $HOME; change scan-rules.sh to only rewrite to a
home-relative path when the file actually starts with $HOME (check using the
$file value and $HOME), e.g. if $file begins with "$HOME/" then set
rel_path="${file#"$HOME"/}" and rel_path="~/$rel_path", otherwise leave rel_path
equal to the original $file; update the logic around the rel_path assignment in
the script (the lines where rel_path is computed) so non-$HOME paths are not
rewritten.
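The guarded rewrite can be exercised in isolation (the helper name is hypothetical, for illustration only):

```shell
#!/usr/bin/env bash
set -euo pipefail

# Sketch: rewrite to a ~/-relative path only when the file really lives
# under $HOME; leave other paths untouched.
home_rel() {
  local file="$1"
  if [[ "$file" == "$HOME/"* ]]; then
    printf '~/%s\n' "${file#"$HOME"/}"
  else
    printf '%s\n' "$file"
  fi
}
```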
Thanks for this contribution! The rules-distill skill and command look well-structured and useful. However, this PR now has merge conflicts (likely from recent merges updating AGENTS.md/README.md counts). Could you rebase on the latest `main`?
Force-pushed from b62cbbf to 981ff3a
|
Analysis Failed
Thanks for the review and sorry for the trouble — I know you're busy. Rebased on latest main and addressed all feedback:

Merge conflict resolution:
CodeRabbit review fixes (981ff3a):
Previous Servitor review fixes (still included):
Thanks for the rebase and addressing the review feedback. However, CI tests are still failing across all platforms (Ubuntu, macOS, Windows). This looks like the component validation or test suite is catching an issue with the new files. Please check the test output — likely the catalog counts or component validation needs updating after your changes. Once tests pass, this is good to merge.
…om skills into rules

Applies the skill-stocktake pattern to rules maintenance:
scan skills → extract shared principles → propose rule changes.

Key design decisions:
- Deterministic collection (scan scripts) + LLM judgment (cross-read & verdict)
- 6 verdict types: Append, Revise, New Section, New File, Already Covered, Too Specific
- Anti-abstraction safeguard: 2+ skills evidence, actionable behavior test, violation risk
- Rules full text passed to LLM (no grep pre-filter) for accurate matching
- Never modifies rules automatically — always requires user approval
Fixes raised by CodeRabbit, Greptile, and cubic:
- Add Prerequisites section documenting skill-stocktake dependency
- Add fallback command when skill-stocktake is not installed
- Fix shell quoting: add IFS= and -r to while-read loops
- Replace hardcoded paths with env var placeholders ($CLAUDE_RULES_DIR, $SKILL_STOCKTAKE_DIR)
- Add json language identifier to code blocks
- Add "How It Works" parent heading for Phase 1/2/3
- Add "Example" section with end-to-end run output
- Add revision.reason/before/after fields to output schema for Revise verdict
- Document timestamp format (date -u +%Y-%m-%dT%H:%M:%SZ)
- Document candidate-id format (kebab-case from principle)
- Use concrete examples in results.json schema
…ripts

Address P1 review feedback:
- Add scan-skills.sh and scan-rules.sh directly in rules-distill/scripts/ (no external dependency on skill-stocktake)
- Remove Prerequisites section (no longer needed)
- Add cross-batch merge step to prevent 2+ skills requirement from being silently broken across batch boundaries
- Fix nested triple-backtick fences (use quadruple backticks)
- Remove head -100 cap (silent truncation)
- Rename "When to Activate" → "When to Use" (ECC standard)
- Remove unnecessary env var placeholders (SKILL.md is a prompt, not a script)
rules-distill added 1 skill + 1 command:
- skills: 108 → 109
- commands: 57 → 58

Updates all count references to pass CI catalog validation.
1. Rename SKILL_STOCKTAKE_* env vars to RULES_DISTILL_* for consistency
2. Remove unnecessary observation counting (use_7d/use_30d) from scan-skills.sh
3. Fix header comment: scan.sh → scan-skills.sh
4. Use jq for JSON construction in scan-rules.sh to properly escape
headings containing special characters (", \)
1. scan-rules.sh: use jq for error JSON output (proper escaping)
2. scan-rules.sh: replace GNU-only sort -z with portable sort (BSD compat)
3. scan-rules.sh: fix pipefail crash on files without H2 headings
4. scan-skills.sh: scan only SKILL.md files (skip learned/*.md and auxiliary docs that lack frontmatter)
5. scan-skills.sh: add portable get_mtime helper (GNU stat/date fallback to BSD stat/date)
Force-pushed from 981ff3a to 1b75306
|
Analysis Failed
```shell
_scan_cleanup() { rm -rf "$_scan_tmpdir"; }
trap _scan_cleanup RETURN
```
RETURN trap does not fire on abnormal function exits
trap _scan_cleanup RETURN only fires when the function returns via a normal return statement or falls off the end. If set -euo pipefail causes an early exit inside scan_dir_to_json (e.g., jq fails on a corrupt SKILL.md), bash does not invoke the RETURN trap and the tmpdir is leaked. There is no script-level EXIT trap in scan-skills.sh to catch this either.
Use trap _scan_cleanup EXIT at the script level (or pass the tmpdir path back and clean it in a script-level EXIT trap) so cleanup is guaranteed regardless of how the function exits:
```shell
# Script-level, after tmpdir is created
trap 'rm -rf "$tmpdir"' EXIT
```

Alternatively, use `trap _scan_cleanup EXIT RETURN` inside the function so both normal and error returns trigger cleanup (though on abnormal process exits only EXIT matters).
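The difference is easy to demonstrate in isolation: an EXIT trap set in a subshell fires whether the guarded command succeeds, fails, or aborts under `set -e` (a sketch with an illustrative helper, not the script's actual structure):

```shell
#!/usr/bin/env bash
set -euo pipefail

# Sketch: run a command with a throwaway tmpdir; the EXIT trap in the
# subshell fires on success, failure, and `set -e` aborts alike — unlike a
# RETURN trap, which bash skips when `set -e` aborts mid-function.
with_tmpdir() {
  (
    tmpdir=$(mktemp -d)
    trap 'rm -rf "$tmpdir"' EXIT
    "$@" "$tmpdir"
  )
}
```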
```shell
extract_field() {
  local file="$1" field="$2"
  awk -v f="$field" '
    BEGIN { fm=0 }
    /^---$/ { fm++; next }
    fm==1 {
      n = length(f) + 2
      if (substr($0, 1, n) == f ": ") {
        val = substr($0, n+1)
        gsub(/^"/, "", val)
        gsub(/"$/, "", val)
        print val
        exit
      }
    }
    fm>=2 { exit }
  ' "$file"
}
```
extract_field silently produces wrong output for multi-line YAML block scalars
When a SKILL.md frontmatter uses a YAML block scalar indicator for its description — which is common for long descriptions:
```yaml
description: |
  Scan skills to extract cross-cutting principles
  and distill them into rules
```

`extract_field` matches the line `description: |` and returns `|` (after quote-stripping) as the description value. The resulting JSON becomes `"description":"|"`, which is technically valid JSON but is semantically wrong. The Phase 2 subagent then sees a one-character description for every such skill, potentially misclassifying or omitting it from analysis.
The comment on line 25 documents the limitation, but there is no warning emitted and no way for the caller to distinguish a real |-valued field from a failed parse.
Consider emitting a warning to stderr when the extracted value is | or >:
```shell
if [[ "$val" == "|" || "$val" == ">" ]]; then
  echo "Warning: multi-line YAML value for field '$field' in $file — skipping" >&2
fi
```
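Folding that check into the extractor might look like this — a sketch of one possible shape, not the shipped function:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Sketch: frontmatter field extraction that warns when the value is a YAML
# block-scalar indicator (| or >), which a line-based parser cannot resolve.
extract_field() {
  local file="$1" field="$2" val
  val=$(awk -v f="$field" '
    /^---$/ { fm++; next }
    fm == 1 && index($0, f ": ") == 1 {
      v = substr($0, length(f) + 3)
      gsub(/^"|"$/, "", v)
      print v; exit
    }
    fm >= 2 { exit }
  ' "$file")
  if [[ "$val" == "|" || "$val" == ">" ]]; then
    echo "Warning: multi-line YAML value for field '$field' in $file" >&2
  fi
  printf '%s\n' "$val"
}
```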
Superseded by rebased PR
* feat(skills): add rules-distill — extract cross-cutting principles from skills into rules
Applies the skill-stocktake pattern to rules maintenance:
scan skills → extract shared principles → propose rule changes.
Key design decisions:
- Deterministic collection (scan scripts) + LLM judgment (cross-read & verdict)
- 6 verdict types: Append, Revise, New Section, New File, Already Covered, Too Specific
- Anti-abstraction safeguard: 2+ skills evidence, actionable behavior test, violation risk
- Rules full text passed to LLM (no grep pre-filter) for accurate matching
- Never modifies rules automatically — always requires user approval
* fix(skills): address review feedback for rules-distill
Fixes raised by CodeRabbit, Greptile, and cubic:
- Add Prerequisites section documenting skill-stocktake dependency
- Add fallback command when skill-stocktake is not installed
- Fix shell quoting: add IFS= and -r to while-read loops
- Replace hardcoded paths with env var placeholders ($CLAUDE_RULES_DIR, $SKILL_STOCKTAKE_DIR)
- Add json language identifier to code blocks
- Add "How It Works" parent heading for Phase 1/2/3
- Add "Example" section with end-to-end run output
- Add revision.reason/before/after fields to output schema for Revise verdict
- Document timestamp format (date -u +%Y-%m-%dT%H:%M:%SZ)
- Document candidate-id format (kebab-case from principle)
- Use concrete examples in results.json schema
* fix(skills): remove skill-stocktake dependency, add self-contained scripts
Address P1 review feedback:
- Add scan-skills.sh and scan-rules.sh directly in rules-distill/scripts/
(no external dependency on skill-stocktake)
- Remove Prerequisites section (no longer needed)
- Add cross-batch merge step to prevent 2+ skills requirement
from being silently broken across batch boundaries
- Fix nested triple-backtick fences (use quadruple backticks)
- Remove head -100 cap (silent truncation)
- Rename "When to Activate" → "When to Use" (ECC standard)
- Remove unnecessary env var placeholders (SKILL.md is a prompt, not a script)
* fix: update skill/command counts in README.md and AGENTS.md
rules-distill added 1 skill + 1 command:
- skills: 108 → 109
- commands: 57 → 58
Updates all count references to pass CI catalog validation.
* fix(skills): address Servitor review feedback for rules-distill
1. Rename SKILL_STOCKTAKE_* env vars to RULES_DISTILL_* for consistency
2. Remove unnecessary observation counting (use_7d/use_30d) from scan-skills.sh
3. Fix header comment: scan.sh → scan-skills.sh
4. Use jq for JSON construction in scan-rules.sh to properly escape
headings containing special characters (", \)
* fix(skills): address CodeRabbit review — portability and scan scope
1. scan-rules.sh: use jq for error JSON output (proper escaping)
2. scan-rules.sh: replace GNU-only sort -z with portable sort (BSD compat)
3. scan-rules.sh: fix pipefail crash on files without H2 headings
4. scan-skills.sh: scan only SKILL.md files (skip learned/*.md and
auxiliary docs that lack frontmatter)
5. scan-skills.sh: add portable get_mtime helper (GNU stat/date
fallback to BSD stat/date)
* fix: sync catalog counts with filesystem (27 agents, 114 skills, 59 commands)
---------
Co-authored-by: Tatsuya Shimomoto <shimo4228@gmail.com>
Summary
Adds `/rules-distill`, a meta-tool that scans installed skills, extracts cross-cutting principles appearing in 2+ skills, and proposes changes to rule files (append, revise, new section, or new file).

This applies the skill-stocktake pattern to rules maintenance — just as skill-stocktake keeps skills healthy, rules-distill keeps rules aligned with the knowledge encoded in skills.
How it works
Key design decisions
Tested locally
Ran full distillation against 56 skills + 22 rule files. Results:
Testing
Checklist
Summary by cubic
Adds `/rules-distill`, a self-contained tool that scans installed skills, finds principles shared across 2+ skills, and proposes rule updates with user approval. Ships built-in scanners (no `skill-stocktake`), adds the `/rules-distill` command, and updates counts to 110 skills and 58 commands.

New Features

- `skills/rules-distill` with a 3-phase workflow and built-in scanners (`scripts/scan-skills.sh`, `scripts/scan-rules.sh`); cross-reads full rules text (no grep prefilter) with six verdicts; never auto-edits.
- `results.json` with UTC timestamps, kebab-case IDs, and Revise fields (reason/before/after); added `commands/rules-distill.md`.

Bug Fixes

- `scan-rules.sh` uses portable `sort`, emits JSON errors via `jq`, handles files without H2 headings, and safely escapes headings.
- `scan-skills.sh` scans only `SKILL.md` and uses a portable `stat`/`date` mtime helper; safer `while` loops with `IFS`/`-r`.
- `RULES_DISTILL_*` env vars; removed silent truncation and unused counters; fixed script headers; updated `README.md` and `AGENTS.md` counts.

Written for commit 1b75306. Summary will update on new commits.
Summary by CodeRabbit
Documentation
New Features