Skip to content

docs(skills): code-review v2 + GitHub endpoint fixes + minor text updates#2528

Merged
ilblackdragon merged 6 commits intostagingfrom
docs/skill-text-updates
Apr 17, 2026
Merged

docs(skills): code-review v2 + GitHub endpoint fixes + minor text updates#2528
ilblackdragon merged 6 commits intostagingfrom
docs/skill-text-updates

Conversation

@serrrfirat
Copy link
Copy Markdown
Collaborator

Summary

  • Upgrade code-review skill to paranoid-architect v2: supports local diffs + GitHub PRs with line-level comments
  • Fix GitHub skill: clarify search endpoints, document response envelope structure, fix search_issues hallucination
  • Minor text updates across 8 other skills (commitment-digest, commitment-triage, decision-capture, delegation-tracker, idea-parking, product-prioritization, security-review, tech-debt-tracker)

Cherry-picked from #2504.

Test plan

  • Verify SKILL.md files parse correctly (valid frontmatter)
  • Spot-check code-review skill v2 instructions against actual GitHub API behavior

🤖 Generated with Claude Code

Rewrite the code-review skill from a 6-bullet checklist into a
paranoid-architect workflow that handles both local diffs and GitHub
PRs end-to-end:

- Two input shapes: local `git diff` or `owner/repo N` /
  `github.com/.../pull/N` URLs.
- Step 1 wraps GitHub fetches in `async def` + `FINAL(await ...)` to
  avoid the closure-capture quirk that kept tripping LLMs (see the
  paired codeact preamble update); reads metadata, diff, and files
  via three sequential awaits instead of `asyncio.gather`.
- Step 2 reads each changed file in full (raw media type, no base64
  module needed) so reviews account for surrounding context.
- Step 3 runs the change through six lenses: correctness, edge cases,
  security (with a real adversarial checklist), test coverage, docs,
  architecture.
- Step 4 renders findings as a severity table and asks which to post.
- Step 5 posts line-level comments via the PR comments endpoint with
  the captured head SHA, falling back to issue comments for
  multi-file findings.

Bumps `requires.skills` to include `github` so the activation pulls
in the GitHub API recipes via the chain-loader.

Adds a live e2e test (`e2e_live_code_review.rs`) plus a recorded
trace fixture (PR #2483) so the workflow is replayable without
hitting GitHub.
LLMs kept inventing a `search_issues` action and looping over
`/repos/{owner}/{repo}/pulls` for "my PRs" queries. Clarify the
GitHub tool surface in three places:

- `tools-src/github/src/lib.rs` and `registry/tools/github.json`:
  enumerate the three real search actions and call out that
  `search_issues_pull_requests` covers both. Add the canonical
  `is:pr author:@me sort:updated-desc` recipe for cross-repo "my PRs".

- `skills/github/SKILL.md`: add an "Authenticated User & Cross-Repo
  Queries" section with copy-paste recipes for `@me`, the search
  endpoints with proper URL encoding, and the response-envelope
  contract (`body` is parsed JSON for application/json, raw `str` for
  diff endpoints — never call `json.loads()` on it, never write
  `.get("body", body)` as a fallback).
@github-actions github-actions bot added scope: docs Documentation size: S 10-49 changed lines risk: low Changes to docs, tests, or low-risk modules contributor: core 20+ merged PRs labels Apr 16, 2026
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request significantly upgrades the code-review skill to version 2.0.0, introducing a "Paranoid Architect" persona and detailed workflows for performing deep analysis on both local changes and GitHub pull requests. The updates include specific instructions for fetching full file contexts, applying six analytical lenses (e.g., security, correctness, architecture), and posting findings back to GitHub. Corresponding documentation updates were made to the github skill to clarify search endpoints and response handling. Furthermore, a new end-to-end test suite and associated trace fixtures have been added to verify the skill's performance. Review feedback suggests improving the robustness of the GitHub API integration by URL-encoding file paths and demonstrating multi-line comment support in the provided examples.

Comment thread skills/code-review/SKILL.md Outdated
```repl
r = await http(
method="GET",
url=f"https://api.github.com/repos/{owner}/{repo}/contents/{path}?ref={head_sha}",
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The file path should be URL-encoded before being interpolated into the URL. While many code paths are safe, files with spaces or special characters will cause the request to fail. Since the github skill documentation (which this skill requires) already mentions encoding for query parameters, it's best to be consistent here as well. Note that you may need to import urllib.parse if the environment allows it.

Suggested change
url=f"https://api.github.com/repos/{owner}/{repo}/contents/{path}?ref={head_sha}",
url=f"https://api.github.com/repos/{owner}/{repo}/contents/{urllib.parse.quote(path)}?ref={head_sha}",

"body": "**High** — `state.store` accessed directly, bypassing dispatch. See `.claude/rules/tools.md`.",
"commit_id": head_sha,
"path": "src/channels/web/handlers/foo.rs",
"line": 142,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The GitHub API supports multi-line comments using start_line and start_side. Since the instructions in Step 3 (line 102) mention capturing a "line range", it would be helpful to include these fields in the example to show the agent how to post findings that span multiple lines.

Suggested change
"line": 142,
"start_line": 140,
"start_side": "RIGHT",
"line": 142,

henrypark133
henrypark133 previously approved these changes Apr 16, 2026
Copy link
Copy Markdown
Collaborator

@henrypark133 henrypark133 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No verified blockers in the changed code.

I checked the GitHub tool schema/description updates, the code-review/github skill guidance, and the new live replay coverage. The substantive change here is tightening the model-facing contract around GitHub search actions and adding caller-level regression coverage so the agent has to fetch and name the requested PR instead of silently substituting another target.

Residual risk: most of this PR is prompt/skill behavior, so future regressions can still come from model drift, but the added replay fixture is the right guardrail for that class of bug.

…arness methods

- Remove `.into_iter()` on `details` in catalog.rs (clippy::useless_conversion)
- Add `with_skills_dir` to `LiveTestHarnessBuilder` for e2e_live_code_review test
- Add `active_skill_names` to `TestRig` extracting from SkillActivated status events

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Collaborator

@zmanian zmanian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary

Upgrades the code-review skill to a paranoid-architect v2 that handles both local diffs and GitHub PRs with line-level comment posting, clarifies the GitHub skill's search endpoints, adds a 313-line live/replay e2e test for the new flow, and ships a 2979-line recorded trace fixture. Scope is broader than "minor text updates" but the pieces are coherent.

Findings

Blocking

None.

Suggestions

  • Merge ordering with PR #2530. tests/support/test_rig.rs:336-348 adds active_skill_names() which destructures StatusUpdate::SkillActivated { skill_names }. PR #2530 adds a feedback field to that variant. Whichever merges second will need a .. pattern added to the match. No runtime impact, but will break CI on the second PR unless the author rebases. Worth a heads-up comment in both PRs.
  • Ambiguous mode detection (skills/code-review/SKILL.md:79): "If the message contains anything shaped like owner/repo followed by a number, treat it as a PR request and use the GitHub path, not git." A user saying "review nearai/ironclaw 2483 locally" would be ambiguous. Consider explicitly calling out that a trailing locally / local keyword should override.
  • tools-src/github/src/lib.rs:2151 schema description is now 500+ chars; JSON schema descriptions are typically shown truncated in tool registries. Keep the canonical copy here and move extended examples to the skills/github/SKILL.md file you already updated.
  • crates/ironclaw_skills/src/catalog.rs:464 .zip(details.into_iter()) -> .zip(details) is a drive-by clippy fix unrelated to docs. Fine, but noting for future splits.
  • tests/e2e_live_code_review.rs:637 hard-codes PR_2483_TITLE. A comment notes "update this constant alongside re-recording the trace fixture" — good, but consider fetching the title from the trace fixture at test time so drift is caught by the replay machinery instead of by a stale assert.

Nits

  • skills/github/SKILL.md:369-371 Common Mistakes now has a line about /search/issues covering both — this is duplicated in the Search section above; one place would suffice.
  • registry/tools/github.json description is now very long (300+ chars). Same caveat as the schema description above.

Verdict

COMMENT — ship it. The skill instructions are thoughtful and the regression test meaningfully guards against the "silent-substitution" failure mode. Coordinate the SkillActivated pattern match with PR #2530's author to avoid a CI break on the second merge.

henrypark133
henrypark133 previously approved these changes Apr 17, 2026
Copy link
Copy Markdown
Collaborator

@henrypark133 henrypark133 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No verified blockers in the updated docs, tool metadata, and test changes. The GitHub search contract is now explicit about search_issues_pull_requests, the skill guidance fixes the earlier ambiguous query details, and the live/replay coverage still exercises the real code-review owner/repo N flow instead of allowing a generic non-skill response to slip through. Residual risk is mostly prompt behavior rather than Rust logic, but the caller-level trace test is the right backstop for this PR.

…ne comments, description trimming (#2528)

- URL-encode file paths in GitHub API content URLs
- Add start_line/start_side to multi-line comment example
- Add 'locally' keyword override for mode detection
- Trim overly long schema descriptions
- Remove duplicated /search/issues note from Common Mistakes
- Fetch PR title from trace fixture instead of hard-coding

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@ilblackdragon
Copy link
Copy Markdown
Member

PR 2528 Review — docs(skills): code-review v2 + GitHub endpoint fixes

Overview

Upgrades the code-review skill to a paranoid-architect v2 that handles both local diffs and GitHub PRs with line-level comment posting, clarifies the github skill's search/user endpoints, tightens the search_issues_pull_requests tool description, lands a new live/replay e2e test for /code-review owner/repo N, and fixes one clippy nit.

Strengths

  • The code-review v2 SKILL.md is well-structured: two clear input shapes, six-lens review rubric, explicit severity table, and concrete repl blocks. The warnings about FINAL() semantics, asyncio closure-capture pitfalls, and not normalizing body with .get(\"body\", body) are specific and rooted in prior failures — exactly what a skill should teach.
  • github/SKILL.md now disambiguates search endpoints and documents the http-tool envelope shape ({status, headers, body}), which is the single biggest source of hallucinated GitHub calls.
  • tools-src/github/src/lib.rs description clarifies that there is no search_issues action; this matches the schema at line 2153.
  • The e2e test is thoughtful: it validates skill activation, target URL, PR title presence, a concrete file citation, and explicitly guards against the "empty shell / unknown fields" regression.

Issues

1. LiveTestHarnessBuilder::with_skills_dir() silently drops its PathBuf argument (likely bug).
tests/support/live_harness.rs:352,549,598 store skills_dir: Option<PathBuf> but then use it only as an is_some() flag to call rig_builder.with_skills(), which just flips enable_skills = true (test_rig.rs:740). The actual skills directory consumed by AppBuilder comes from config.skills.local_dir, which test_rig.rs:849,857,1049 hardcodes to temp_dir.path().join(\"skills\") — an empty dir. So .with_skills_dir(repo_skills_dir()) is equivalent to .with_skills_dir(PathBuf::from(\"/anywhere\")).

Impact: in replay mode the SkillRegistry has no skills loaded, no SkillActivated event fires for code-review, and the test's first assertion (active.iter().any(|s| s == \"code-review\")) should fail. If the fixture was recorded with SKILLS_DIR env var or some ad-hoc override, live mode is hiding this. This is the exact class of silent-argument-loss bug .claude/rules/testing.md calls out ("Test Through the Caller, Not Just the Helper").

Fix: propagate the stored path into TestRigBuilder so it writes config.skills.local_dir (and update the enable_skills block at test_rig.rs:1047 to use that path, not another temp subdir).

2. max_context_tokens doubled (1200 → 2500).
v2 is substantially longer, but 2500 is sizable against SKILLS_MAX_TOKENS. Worth confirming it still fits alongside github + coding in the default budget without starving other skills.

3. Fixture size.
code_review_pr_2483.json is ~100 KB / 2979 lines committed. Acceptable for replay determinism, but subsequent re-records will churn the repo. Consider whether this fixture needs pruning of x-github-* ratelimit/date headers before recording (they don't affect replay).

Minor / Nits

  • skills/code-review/SKILL.md:95 comment says "three serial GETs" — the example actually issues two (meta+diff on the same URL) plus files, which is three, but the rationale sentence about asyncio import ordering is a lot of caveat for what could also be solved by a single helper. Fine as-is.
  • skills/github/SKILL.md:174 example hardcodes nearai/ironclaw — generalize to {owner}/{repo} for consistency with the rest of the doc.
  • catalog.rs:464 clippy cleanup is correct (zip accepts IntoIterator).

Risk Summary

The SKILL.md / description churn is text-only and low risk. The live/replay test as written likely won't catch skill-activation regressions in replay mode due to issue #1 above — worth fixing before relying on it as a guardrail.

@serrrfirat
Copy link
Copy Markdown
Collaborator Author

Review feedback addressed

All comments from @gemini-code-assist and @zmanian have been addressed:

# Reviewer Fix
1 gemini-code-assist URL-encode file paths in GitHub API content URLs (urllib.parse.quote(path, safe=''))
2 gemini-code-assist Added start_line/start_side to multi-line comment example
3 zmanian Added locally/local keyword override for ambiguous mode detection
4 zmanian Trimmed search_issues_pull_requests schema description (350→110 chars)
5 zmanian Removed duplicate /search/issues note from Common Mistakes
6 zmanian Trimmed registry/tools/github.json description (334→151 chars)
7 zmanian PR title now extracted from trace fixture at test time instead of hard-coded

Item 8 (merge ordering with PR #2530 re: SkillActivated pattern) — will handle on rebase when the second PR merges.

Quality gate clean: cargo fmt ✓ · cargo clippy ✓ · cargo test --lib

@serrrfirat serrrfirat requested a review from zmanian April 17, 2026 07:05
LiveTestHarnessBuilder::with_skills_dir() stored a PathBuf but only
used it as an is_some() flag — the actual SkillRegistry always pointed
at an empty temp directory. Now the stored path flows through
TestRigBuilder::with_skills_dir() into config.skills.local_dir and
the SkillRegistry constructor.

Also generalizes the hardcoded nearai/ironclaw repo name in the
github skill's response-handling example to {owner}/{repo}.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@serrrfirat
Copy link
Copy Markdown
Collaborator Author

Addressed @ilblackdragon review feedback

# Issue Fix
1 with_skills_dir() silently drops PathBuf — skills dir hardcoded to empty temp subdir Propagated skills_dir through TestRigBuilder into config.skills.local_dir; live_harness.rs now passes the actual path instead of just flipping a boolean flag
2 skills/github/SKILL.md:174 hardcodes nearai/ironclaw Generalized to {owner}/{repo}/pulls/123
3 max_context_tokens (1200→2500) / fixture size Noted — non-blocking, no code change

Commit: e72b3fc

Quality gate clean: cargo fmt ✓ · cargo clippy ✓ · cargo test --lib

@ilblackdragon ilblackdragon merged commit 27d53f5 into staging Apr 17, 2026
15 checks passed
@ilblackdragon ilblackdragon deleted the docs/skill-text-updates branch April 17, 2026 14:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

contributor: core 20+ merged PRs risk: low Changes to docs, tests, or low-risk modules scope: docs Documentation size: S 10-49 changed lines

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants