Add GitHub Actions workflow to run evaluate-pr-tests via Copilot CLI#34548
Add GitHub Actions workflow to run evaluate-pr-tests via Copilot CLI#34548
Conversation
Adds a workflow that automatically evaluates test quality on PRs using the evaluate-pr-tests skill through the Copilot CLI. Architecture: - 3-job pipeline: gate → evaluate → comment - Same-repo PRs: triggers automatically on test file changes - Fork PRs: gated behind /evaluate-tests comment from write+ user - Token separation: COPILOT_GITHUB_TOKEN for Copilot API, github.token for PR comments - File sandboxing: --add-dir restricts agent to repo checkout only - Tool allowlist: bash, view, grep, glob, lsp (no git push, no network) - Idempotent comments: updates existing bot comment instead of creating duplicates Security model follows dotnet/skills evaluation.yml patterns: - Pinned action SHAs - persist-credentials: false - Minimal per-job permissions - Fork PR permission verification Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
🚀 Dogfood this PR with:
curl -fsSL https://raw.githubusercontent.com/dotnet/maui/main/eng/scripts/get-maui-pr.sh | bash -s -- 34548Or
iex "& { $(irm https://raw.githubusercontent.com/dotnet/maui/main/eng/scripts/get-maui-pr.ps1) } 34548" |
…el if: Adopted the dotnet/skills pattern: comment body is only checked in the GitHub Actions expression engine (job-level if: condition), never passed to any run: block or env: variable. Permission check and eyes reaction are separate steps. All event data in run: blocks passed via env:. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Add workflow file paths to pull_request trigger so the workflow runs on its own PR - Add COPILOT_GITHUB_TOKEN presence check before running Copilot CLI - Write informative message when token is missing (graceful degradation) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The download-artifact action preserves the original directory structure, so /tmp/copilot-output.txt becomes /tmp/evaluation/tmp/copilot-output.txt. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Pass GH_TOKEN (github.token) to Copilot CLI so it can resolve PR metadata via 'gh pr diff' and 'gh api' - Add usage example for dispatching against a specific PR from this branch - Include repo context in prompt so CLI knows which repo to query Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Problem: Copilot CLI checked out the target PR's head SHA, which doesn't have the evaluate-pr-tests skill. Also couldn't call gh API from sandbox. Fix: - Gate job now pre-fetches PR diff, metadata, and file list as artifacts - Evaluate job checks out the workflow branch (which has .github/skills/) - CLI reads pre-fetched PR context from /tmp/pr-context/ (local files) - No GH_TOKEN needed in evaluate job — all API calls happen in gate - Added /tmp/pr-context to --add-dir for CLI sandbox access Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The CLI sandbox couldn't run pwsh or create directories, so it was synthesizing evaluations instead of using the actual analysis script. Fix: - Gate job now checks out PR code at head_sha (fetch-depth: 0 for diff) - Runs Gather-TestContext.ps1 against the PR code with proper base branch - Includes test-context.md in the pr-context artifact - Updated prompt: tells CLI to read pre-generated files, NOT run scripts - Explicit instruction to not attempt running Gather-TestContext.ps1 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
actions/checkout refuses paths outside /home/runner/work/ — use 'pr-code' relative to workspace instead of /tmp/pr-code. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Update the copilot-evaluate-tests workflow to prefer a MAUI_BOT_TOKEN PAT (from the MauiBot account) for posting PR comments and fall back to the built-in github.token. Comments in the file were updated to document the new token purpose and clarify that github.token is used for reactions and commit statuses. The GH_TOKEN env was changed to ${{ secrets.MAUI_BOT_TOKEN || github.token }} to implement the fallback.
- Add copilot-evaluate-tests.md (gh-aw source) - Add copilot-evaluate-tests.lock.yml (compiled) - Delete copilot-evaluate-tests.yml (old CLI workflow) - Delete Post-CopilotComment.ps1 (use add_comment safe-output) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…hub.com/dotnet/maui into feature/copilot-evaluate-tests-workflow # Conflicts: # .github/scripts/Post-CopilotComment.ps1 # .github/workflows/copilot-evaluate-tests.yml
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
🧪 PR Test EvaluationOverall Verdict: ℹ️ No test evaluation needed This PR adds GitHub Actions workflow infrastructure for automated test evaluation and includes minor test utility improvements. Since it contains no functional code changes that would require test coverage, the standard test evaluation criteria do not apply. 📊 Expand Analysis SummaryChanged Files AnalysisThis PR primarily contains: Workflow Infrastructure (83% of changes)
Minor Test Infrastructure Improvements (17% of changes)
Assessment✅ Appropriate scope: This is infrastructure work that doesn't require functional test coverage Warning
|
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
🧪 PR Test EvaluationOverall Verdict: ❌ Tests are insufficient No tests were added for this GitHub workflow infrastructure change. 📊 Expand Full EvaluationPR Test Evaluation ReportPR: #34548 — Add copilot-evaluate-tests workflow Overall Verdict❌ Tests are insufficient This PR adds GitHub workflow infrastructure but includes no tests to validate the workflow functionality. 1. Fix Coverage — ❌No tests were added for this infrastructure change. While this PR adds a GitHub Actions workflow for test evaluation, there are no tests to verify:
2. Edge Cases & Gaps — ❌Covered:
Missing:
3. Test Type Appropriateness — ❌Current: No tests 4. Convention Compliance — ✅N/A - No test files to evaluate for conventions. 5. Flakiness Risk — N/ACannot assess without tests. 6. Duplicate Coverage — ✅ No duplicatesNo existing workflow tests to compare against. 7. Platform Scope — N/AThis is workflow infrastructure, not platform-specific code. 8. Assertion Quality — ❌No assertions present as no tests were added. 9. Fix-Test Alignment — ❌The infrastructure change (workflow addition) has no corresponding tests to validate its functionality. Recommendations
Note: For infrastructure/workflow changes like this, testing may be inherently challenging and manual validation may be sufficient. However, documenting the manual testing performed would improve confidence in the change. Warning
|
Eliminates integrity filtering warning on fork PR comments. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
🧪 PR Test EvaluationOverall Verdict: ✅ Tests are adequate This PR adds GitHub Actions workflow infrastructure and does not contain functional code changes that require test coverage. 📊 Expand Full EvaluationPR Test Evaluation ReportPR: #34548 — Add GitHub Actions workflow to run evaluate-pr-tests via Copilot CLI Overall Verdict✅ Tests are adequate This PR adds GitHub Actions workflow infrastructure ( 1. Fix Coverage — ℹ️ N/AFinding: This PR contains no functional code changes to evaluate for test coverage. Analysis: The changed files are GitHub Actions workflow definitions and documentation:
These are infrastructure/configuration files that enable automated test evaluation on future PRs, not code that needs test coverage itself. Evaluation SummaryNo evaluation criteria apply to this PR because:
Expected behavior: Future PRs that modify functional code will trigger this new workflow to automatically evaluate their test coverage using the RecommendationsNone required — This PR appropriately adds CI infrastructure without functional changes requiring test coverage. For future PRs: The new workflow will automatically evaluate test quality when functional changes are detected. Warning
|
The agent was evaluating the workflow branch instead of the target PR when triggered via workflow_dispatch. Two fixes: 1. Prompt now instructs gh pr checkout <number> before analysis 2. Gather-TestContext.ps1 accepts -PrNumber and uses gh pr diff for correct file listing regardless of local checkout state Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
🧪 PR Test EvaluationOverall Verdict: ℹ️ No functional test evaluation needed This PR adds GitHub Actions workflow infrastructure and contains only cleanup/refactoring changes to test utilities. 📊 Expand Full EvaluationPR Test Evaluation ReportPR: #34548 — Add GitHub Actions workflow to run evaluate-pr-tests via Copilot CLI Overall Verdictℹ️ No functional test evaluation needed This PR adds GitHub Actions workflow infrastructure for automated test evaluation. The test file changes are infrastructure cleanup/refactoring, not new test coverage for bug fixes. Analysis SummaryWhat this PR contains:
Files changed:
Infrastructure Changes AssessmentTest Utility Updates:
Quality: These are appropriate cleanup changes that:
Recommendations
Warning
|
Re-Review #3: PR #34548 — Commits through
|
| Commit | Change |
|---|---|
7b06ea0 |
Fetch changed files via API instead of skipping fork PRs |
acf8b1c |
Remove fork guard entirely, allow checkout for all PRs |
8ab12f4 |
Restore .github/skills/ and .github/instructions/ from base branch after checkout |
Previous Findings — All Still Fixed ✅
| Finding | Status |
|---|---|
| Path traversal in Gather-TestContext.ps1 | ✅ FIXED |
Binary corruption (Encoding.UTF8 → WriteAllBytes) |
✅ FIXED |
| URL encoding for API paths | ✅ FIXED |
| Fork guard fail-open | N/A — guard removed, replaced with new model |
New Security Model Assessment
The "checkout without execution" approach is architecturally sound for preventing classic pwn-request RCE. No workspace scripts are executed after checkout in the steps: context. The gh-aw sandbox runs with scrubbed credentials, network firewalling, and safe-outputs limits (max 1 comment). The pre_activation gate blocks fork PRs on pull_request events (head.repo.id == repository_id). This is well-documented with clear warnings.
However, the trusted-file restoration has structural issues:
New Findings (5-model consensus, 2+ agreement threshold)
🔴 Finding 1 — checkout_pr_branch.cjs runs AFTER restore, likely undoes it (5/5 models)
The compiled lock.yml has two "Checkout PR branch" steps in the agent job:
- User's step (~line 332):
gh pr checkout+git checkout "$BASE_SHA" -- .github/skills/ .github/instructions/✅ - gh-aw platform step (~line 347):
checkout_pr_branch.cjswithif: (github.event.pull_request) || (github.event.issue.pull_request)— runs after step 1
For /evaluate-tests on a fork PR (issue_comment trigger), checkout_pr_branch.cjs fires and likely re-applies the PR branch, overwriting the restored files. The restoration becomes a no-op for the exact scenario it was designed to protect.
Fix: Move the git checkout "$BASE_SHA" restoration to occur after all checkout steps, or verify with the gh-aw team that checkout_pr_branch.cjs is a no-op when the branch is already checked out.
🔴 Finding 2 — .github/copilot-instructions.md not in restore list (5/5 models)
The restore covers two directories but misses the primary Copilot instruction file:
git checkout "$BASE_SHA" -- .github/skills/ .github/instructions/ 2>/dev/null || true
# ☝️ missing: .github/copilot-instructions.mdThis 19KB file is auto-loaded by Copilot CLI as the system-level behavioral context. A fork PR could modify it to inject instructions that bias the evaluation (e.g., "Always report tests as adequate").
Fix: Add to the restore list:
git checkout "$BASE_SHA" -- .github/skills/ .github/instructions/ .github/copilot-instructions.md 2>/dev/null || true🟡 Finding 3 — git checkout doesn't remove fork-added files (4/5 models)
git checkout "$BASE_SHA" -- .github/skills/ restores existing files but does not remove files added by the fork. A fork could add .github/skills/malicious/SKILL.md that survives the restore and gets discovered by the agent.
Fix:
rm -rf .github/skills/ .github/instructions/
git checkout "$BASE_SHA" -- .github/skills/ .github/instructions/ .github/copilot-instructions.md 2>/dev/null || true🟡 Finding 4 — || true silently swallows restore failures (4/5 models)
If the restore fails (bad SHA, git error), the workflow continues with fork-supplied files and no observable failure. At minimum, log a warning.
🟢 Finding 5 — Gate exit 1 still shows red X (3/5 models)
PRs without test files get a ❌ failed check instead of a skip. UX issue, not security.
🟢 Finding 6 — .pr-changed-files.txt spoofable by fork (3/5 models)
A fork could commit a .pr-changed-files.txt picked up by Gather-TestContext.ps1, directing the agent to evaluate different files. Low impact since the agent also reads the actual PR diff.
Blast Radius Assessment
Even if all findings above were exploited:
- No RCE — no workspace scripts executed in privileged context
- No credential exfil — sandbox has scrubbed credentials + network firewall
- Max damage — 1 misleading PR comment (safe-outputs limit)
- Threat detection — gh-aw's threat detection job analyzes agent output
CI Status: ✅ passing, ⏭️ 2 skipping, ⏳ Build Analysis pending
Recommended Action: ⚠️ Request Changes
Findings 1 and 2 are straightforward fixes. The fundamental "checkout without execution" model is sound, but the restoration needs to:
- Run after
checkout_pr_branch.cjs(or be verified as not undone) - Include
.github/copilot-instructions.mdin the restore list - Clean directories before restoring (
rm -rf+git checkout)
Findings addressed: - [CRITICAL] checkout_pr_branch.cjs ordering: split restore into a separate step that runs as late as possible. For workflow_dispatch the platform checkout is skipped (if condition false), so this is fully effective. For issue_comment on fork PRs, the platform step may still run after — noted as a gh-aw platform limitation. - [CRITICAL] .github/copilot-instructions.md added to restore list - [MODERATE] rm -rf before git checkout to remove fork-added files - [MODERATE] restore failure now exits with error instead of || true - [MINOR] gate exit 1 is a UX issue — acceptable for now Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Re-Review #3 Response — Commit
|
| Trigger | Platform checkout runs? | Restore effective? |
|---|---|---|
workflow_dispatch |
❌ Skipped (if: pull_request || issue.pull_request is false) |
✅ Fully effective |
pull_request (same-repo) |
✅ Runs | ✅ N/A — same repo has all files |
pull_request (fork) |
✅ Runs but pre_activation gate blocks fork PRs |
✅ N/A — workflow never reaches agent |
issue_comment (fork PR) |
✅ Runs and may undo restore |
The only vulnerable path is /evaluate-tests comments on fork PRs. This is a gh-aw platform ordering limitation — we cannot insert user steps after platform steps. Options:
- File a gh-aw feature request for post-platform user steps
- Verify with gh-aw team whether
checkout_pr_branch.cjsis a no-op when branch is already checked out - Accept the risk (blast radius = 1 misleading comment on a fork PR explicitly triggered by a maintainer)
🔴 Finding 2 — .github/copilot-instructions.md ✅ FIXED
Added to the git checkout restore list alongside .github/skills/ and .github/instructions/.
🟡 Finding 3 — Fork-added files ✅ FIXED
Added rm -rf .github/skills/ .github/instructions/ before git checkout restore. This removes any fork-added files before restoring the base branch versions.
🟡 Finding 4 — || true swallowing failures ✅ FIXED
Replaced with explicit exit code check:
git checkout "$BASE_SHA" -- ... 2>/dev/null
if [ $? -eq 0 ]; then
echo "✅ Restored agent infrastructure from base branch ($BASE_SHA)"
else
echo "⚠️ Failed to restore agent infrastructure from $BASE_SHA"
exit 1
fi🟢 Finding 5 — Gate exit 1 red X
Acknowledged as UX issue. Keeping exit 1 for now since exit 0 would require a separate mechanism to prevent the agent from running on non-test PRs.
🟢 Finding 6 — .pr-changed-files.txt spoofing
Low impact since the agent also reads the actual PR diff via MCP. No change needed.
Architecture note
BASE_SHA is now saved to $GITHUB_ENV in the checkout step so the restore step can read it. The two steps are:
- Checkout PR branch — saves
BASE_SHA, runsgh pr checkout - Restore agent infrastructure —
rm -rf+git checkout $BASE_SHA -- <paths>, fails hard on error
Reference upstream gh-aw#18481 in the error message and point users to the Actions tab workflow_dispatch trigger as the workaround. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Re-Review #4: PR #34548 — Commits
|
| Finding | Status | Notes |
|---|---|---|
🔴 F1: checkout_pr_branch.cjs ordering |
✅ ADDRESSED | Untracked .restore-backup/ survives git checkout; agent prompt (from trusted base-branch artifact) instructs restore as first action |
🔴 F2: .github/copilot-instructions.md not in restore |
✅ FIXED | Added to both git checkout $BASE_SHA -- list and .restore-backup/ backup |
| 🟡 F3: Fork-added files survive restore | ✅ FIXED | rm -rf .github/skills/ .github/instructions/ now runs before restore |
🟡 F4: || true swallowed failures |
✅ FIXED | Replaced with if [ $? -eq 0 ]; then ... else exit 1; fi |
🟢 F5: Gate exit 1 UX |
Accepted by author — no change | |
| 🟢 F6: Changed files spoofable | ✅ IMPROVED | Gather-TestContext.ps1 -PrNumber uses gh pr diff via API, not local git diff |
The untracked backup design is mechanically correct: git checkout <branch> only affects tracked files, so .restore-backup/ (untracked) survives checkout_pr_branch.cjs. The agent prompt is baked from the base-branch artifact by the activation job — fork PRs cannot alter it.
Remaining Issue: issue_comment Fork Guard is Fail-Open (4/4 consensus)
The new guard for /evaluate-tests comments on fork PRs:
HEAD_REPO_ID=$(gh pr view "$PR_NUMBER" --repo "$GITHUB_REPOSITORY" \
--json headRepository --jq '.headRepository.id // ""')
BASE_REPO_ID=$(gh pr view "$PR_NUMBER" --repo "$GITHUB_REPOSITORY" \
--json baseRepository --jq '.baseRepository.id // ""')
if [ -n "$HEAD_REPO_ID" ] && [ -n "$BASE_REPO_ID" ] && \
[ "$HEAD_REPO_ID" != "$BASE_REPO_ID" ]; then
exit 1
fiWhen gh pr view fails (API error, rate limit, transient network issue), both IDs are "". [ -n "" ] is false, so the outer if never fires — fork PRs are not blocked. This is fail-open, and is a regression from the fail-closed approach in e65c602 ([ -z "$HEAD_OWNER" ] || ...).
One-line fix:
HEAD_REPO_ID=$(gh pr view "$PR_NUMBER" --repo "$GITHUB_REPOSITORY" \
--json headRepository --jq '.headRepository.id' 2>/dev/null) || { echo "❌ API error"; exit 1; }
BASE_REPO_ID=$(gh pr view "$PR_NUMBER" --repo "$GITHUB_REPOSITORY" \
--json baseRepository --jq '.baseRepository.id' 2>/dev/null) || { echo "❌ API error"; exit 1; }
if [ "$HEAD_REPO_ID" != "$BASE_REPO_ID" ]; then
exit 1
fiMitigating factors (why this is 🟡 not 🔴):
- Even if the guard fails open, the untracked backup + agent prompt restore still apply
- Agent runs in a sandboxed container with scrubbed credentials
- Max blast radius remains 1 misleading PR comment (safe-outputs enforced)
pull_request-triggered fork runs are already blocked by the activation job'sif:condition
Minor Note: Agent Restore Is a Soft Control (4/4 consensus)
The final defense is the LLM agent following the prompt instruction: "Before doing ANYTHING else, you MUST restore from .restore-backup/". The prompt is trusted (base-branch artifact, not workspace), but LLM compliance is probabilistic. This is acceptable as defense-in-depth (sandbox + max 1 comment), not as a sole security gate — and it isn't, since fork PR paths have additional blocks before reaching the agent.
Verdict: ⚠️ One small fix, then ✅ Approve
The fail-open guard is a one-line change. Everything else is well-addressed, and the untracked backup architecture is clever and sound. Once the issue_comment fork guard is made fail-closed, this is ready to merge.
Review conducted with 5-model consensus (claude-opus-4.6 ×2, claude-sonnet-4.6, gemini-3-pro-preview, gpt-5.3-codex)
When gh pr view fails (rate limit, network error), both IDs are empty and the guard silently passes. Now uses || exit 1 on each API call and removes the -n guards so any mismatch (including empty) blocks. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The .restore-backup/ mechanism and agent prompt restore were dead code: - Same-repo PRs: files already exist on the PR branch - Fork PRs via workflow_dispatch: platform checkout skipped, restore works - Fork PRs via /evaluate-tests: hard-fail before reaching restore Keep only the simple git checkout restore for the workflow_dispatch path. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…il-closed patterns Incorporates learnings from building copilot-evaluate-tests workflow: - Detailed execution model diagram (activation → user steps → platform steps → agent) - Step ordering constraint (user steps always before platform steps) - Prompt rendering via runtime-import from base branch - Fork PR behavior table by trigger type - issue_comment + fork platform limitation and workaround - Fail-closed fork guard pattern with code examples - Upstream gh-aw issue references (#18481, #18518, #18521) - Expanded limitations and troubleshooting tables Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Move checkout, fork guard, and restore logic into .github/scripts/gh-aw-checkout-pr.sh. This makes the pattern reusable by future gh-aw workflows with a single step: run: bash .github/scripts/gh-aw-checkout-pr.sh The script handles: - Base SHA capture before PR checkout - Fail-closed fork guard for issue_comment triggers - PR branch checkout via gh CLI - Restore of trusted agent infrastructure from base branch Fix set -e + $? bug found by 3-model review (Opus/Sonnet/Codex): git checkout inside if-condition is exempt from errexit, ensuring the diagnostic message prints on restore failure. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
All other scripts in .github/scripts/ are .ps1. Rename gh-aw-checkout-pr.sh to Checkout-GhAwPr.ps1, following PowerShell verb-noun naming convention. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Remove the fork detection guard from Checkout-GhAwPr.ps1. Fork PRs that are rebased on main already have the skill files, so checkout_pr_branch.cjs producing the final workspace state is fine. Make the base-branch restore best-effort (non-fatal) since for issue_comment triggers the platform may overwrite it anyway. Add a pre-flight check to the agent prompt: if SKILL.md is missing after all checkouts, post a helpful comment telling the user to rebase or use workflow_dispatch. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
… debugging - Remove .pr-changed-files.txt support from Gather-TestContext.ps1 (dead code attack surface — no step creates this file, fork could ship one to manipulate analysis) - Update gh-aw instructions: replace stale fork hard-fail documentation with optimistic path description and known limitation - Clean up Checkout-GhAwPr.ps1: remove unused GITHUB_EVENT_NAME/WORKFLOW_NAME references, replace 2>$null with 2>&1 for restore diagnostics - Remove WORKFLOW_NAME env var from workflow (no longer consumed by script) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Re-Review #6 — Round 6 Security Assessment
Previous Findings Status
New Findings (3/4 model consensus)🟡 MODERATE —
|
Address PR review findings (3-model consensus + PureWeen Round 6): 1. Checkout-GhAwPr.ps1: Add fail-closed fork guard for issue_comment triggers using isCrossRepository. Fork PRs are rejected with an actionable notice to use workflow_dispatch instead. API failures also fail closed (exit 1, not 0). 2. Gather-TestContext.ps1: Escape markdown metacharacters (pipes, angle brackets) in Format-FileList and use double-backtick code spans so literal backticks in filenames render correctly. 3. gh-aw-workflows.instructions.md: Update fork handling docs to reflect the fail-closed guard, remove 'Known limitation' section, update trigger table and troubleshooting. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Add ready_for_review and reopened to pull_request types (reopened was lost when overriding defaults) - Gate pull_request activation on draft == false so draft PRs don't waste agent runs - When marked ready-for-review, the workflow now re-triggers - Reorder concurrency group to prefer pull_request.number first Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Note
Are you waiting for the changes in this PR to be merged?
It would be very helpful if you could test the resulting artifacts from this PR and let us know in a comment if this change resolves your issue. Thank you!
Description
Adds a gh-aw (GitHub Agentic Workflows) workflow that automatically evaluates test quality on PRs using the
evaluate-pr-testsskill.What it does
When a PR adds or modifies test files, this workflow:
evaluate-pr-testsskill via Copilot CLI in a sandboxed containerTriggers
pull_requestsrc/**/tests/**)pre_activationgateworkflow_dispatchissue_comment(/evaluate-tests)Security model
steps:checks out PR code but never executes workspace scripts.github/skills/,.github/instructions/,.github/copilot-instructions.mdrestored from base branch after checkoutpull_requestevents blocked for forks viahead.repo.id == repository_idactions/checkout,actions/github-script, etc.Files added/modified
.github/workflows/copilot-evaluate-tests.md— gh-aw workflow source.github/workflows/copilot-evaluate-tests.lock.yml— Compiled workflow (auto-generated bygh aw compile).github/skills/evaluate-pr-tests/scripts/Gather-TestContext.ps1— Test context gathering script (binary-safe file download, path traversal protection).github/instructions/gh-aw-workflows.instructions.md— Copilot instructions for gh-aw developmentKnown Limitations
Fork PR evaluation via
/evaluate-testscomment is not supported in v1. The gh-aw platform inserts acheckout_pr_branch.cjsstep after all user steps, which may overwrite base-branch skill files restored for fork PRs. This is a known gh-aw platform limitation — user steps always run before platform-generated steps, with no way to insert steps after.Workaround: Use
workflow_dispatch(Actions UI → "Run workflow" → enter PR number) to evaluate fork PRs. This trigger bypasses the platform checkout step entirely and works correctly.Related upstream issues:
gh aw initFixes