Agent Workflow Metrics via GitHub Labels #33986
Pull request overview
This PR implements a comprehensive GitHub label-based metrics system for tracking AI agent PR review workflow outcomes. The system uses s/agent-* prefixed labels to track review outcomes, test verification results, and fix comparison results across the automated PR review pipeline.
Changes:
- Introduces new label management module (`Update-AgentLabels.ps1`) with idempotent label operations
- Adds Phase 4 to `Review-PR.ps1` for automatic label application based on phase outcomes
- Refactors agent output from centralized state files to distributed `content.md` files per phase
- Updates all agent instructions and skill documentation to reflect the new phase output artifact structure
- Removes old label management code (`Update-VerificationLabels`) in favor of centralized system
- Adds new CI/Copilot pipeline configuration for automated agent PR reviews
- Cleans up Azure DevOps variable groups and pipeline configuration
Reviewed changes
Copilot reviewed 26 out of 26 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| `.github/scripts/shared/Update-AgentLabels.ps1` | New module implementing label management with parsing, application, and self-bootstrapping |
| `.github/scripts/Review-PR.ps1` | Adds Phase 4 for label application; implements pinned SHA restoration; adds phase output directories |
| `.github/docs/agent-labels.md` | Comprehensive documentation of the label system, architecture, and usage examples |
| `eng/pipelines/ci-copilot.yml` | New Azure DevOps pipeline for running Copilot PR reviewer agent with full environment setup |
| `eng/pipelines/common/variables.yml` | Simplifies variable group structure; removes unused conditional logic |
| `eng/pipelines/common/provision.yml` | Adds `skipCertificates` parameter for CI scenarios |
| `.github/skills/verify-tests-fail-without-fix/scripts/verify-tests-fail.ps1` | Removes old label management; updates output path to new structure |
| `.github/skills/try-fix/SKILL.md` | Updates documentation to remove state file references |
| `.github/skills/learn-from-pr/SKILL.md` | Removes session markdown references |
| `.github/skills/ai-summary-comment/scripts/*.ps1` | Updates all scripts to auto-load from PRAgent phase `content.md` files instead of state files |
| `.github/skills/ai-summary-comment/SKILL.md` | Documents new auto-loading behavior from phase files |
| `.github/skills/ai-summary-comment/NO-EXTERNAL-REFERENCES-RULE.md` | Simplifies by removing state file references |
| `.github/skills/ai-summary-comment/IMPROVEMENTS.md` | Updates terminology from "state file" to "content" |
| `.github/scripts/shared/Start-Emulator.ps1` | Improves iOS simulator selection logic for UI test baseline compatibility |
| `.github/scripts/shared/Build-AndDeploy.ps1` | Adds logic to shut down other booted simulators before deployment |
| `.github/scripts/BuildAndRunHostApp.ps1` | Adds test artifact collection for screenshots and page source |
| `.github/copilot-instructions.md` | Updates agent documentation to reflect new output structure |
| `.github/agents/pr/post-gate.md` | Updates for autonomous execution mode and phase output artifacts |
| `.github/agents/pr/SHARED-RULES.md` | Major update: documents phase output artifacts and agent label system; changes blocking behavior to autonomous |
| `.github/agents/pr/PLAN-TEMPLATE.md` | Updates plan template to reflect new phase output requirements |
| `.github/agents/pr.md` | Removes state file creation steps; updates for phase output artifacts |
| `.github/agents/learn-from-pr.md` | Removes session markdown references |
| `s/agent-fix-win` | 🟢 `#66BB6A` | AI found a better alternative fix than the PR | Fix phase: alternative selected over PR's fix |
| `s/agent-fix-pr-picked` | 🟠 `#FF7043` | AI could not beat the PR fix — PR is the best among all candidates | Fix phase: PR selected as best after comparison |

Gate labels (`gate-passed`/`gate-failed`) are mutually exclusive with each other. Fix labels (`fix-win`/`fix-lose`) are mutually exclusive with each other.
The label name s/agent-fix-pr-picked in the code does not match the documentation which refers to it as s/agent-fix-lose.
In the documentation at line 34, the table mentions "Fix labels (fix-win/fix-lose)" suggesting the label should be called s/agent-fix-lose, but the actual label defined in Update-AgentLabels.ps1 line 35 is s/agent-fix-pr-picked.
Either the code should use s/agent-fix-lose to match the documentation's naming pattern, or the documentation should be updated to consistently use s/agent-fix-pr-picked. The current mismatch could cause confusion when users try to query these labels.
Suggested change:
- Gate labels (`gate-passed`/`gate-failed`) are mutually exclusive with each other. Fix labels (`fix-win`/`fix-lose`) are mutually exclusive with each other.
+ Gate labels (`gate-passed`/`gate-failed`) are mutually exclusive with each other. Fix labels (`fix-win`/`fix-pr-picked`) are mutually exclusive with each other.
# PRs the agent approved
is:pr label:s/agent-approved

# PRs where agent found a better fix
The comment at line 107 says "PRs where agent found a better fix" but queries for s/agent-fix-pr-picked. This is semantically backwards.
According to the label definitions:
- `s/agent-fix-win` = "AI found a better alternative fix than the PR"
- `s/agent-fix-pr-picked` = "AI could not beat the PR fix — PR is the best"
So the query comment should say "PRs where agent could NOT beat the PR fix" or the query should use label:s/agent-fix-win instead.
Suggested change:
- # PRs where agent found a better fix
+ # PRs where agent could NOT beat the PR fix (PR fix was best)
- group: SDL_Settings
- group: AzureDevOps-Artifact-Feeds-Pats
- ${{ if eq(variables['Build.DefinitionName'], 'dotnet-maui') }}:
  - group: Publish-Build-Assets # This variable group contains secrets to publis to BAR
Typo in the comment: "publis" should be "publish". The comment reads "This variable group contains secrets to publis to BAR" but should read "This variable group contains secrets to publish to BAR".
Suggested change:
- - group: Publish-Build-Assets # This variable group contains secrets to publis to BAR
+ - group: Publish-Build-Assets # This variable group contains secrets to publish to BAR
$script:ManualLabels = @{
    's/agent-fix-implemented' = @{ Description = 'PR author implemented the agent suggested fix'; Color = '7B1FA2' }
}
The PR description mentions TWO manual labels (s/agent-fix-implemented and s/agent-suggestions-implemented), but the code only defines ONE manual label (s/agent-fix-implemented).
The PR description states:
| `s/agent-fix-implemented` | 🟣 `#7B1FA2` | PR author implemented the agent's suggested fix | Maintainer applies when PR author adopts agent's recommendation |
| `s/agent-suggestions-implemented` | 🟣 `#7B1FA2` | PR author implemented the agent's code suggestions | Maintainer applies when PR author adopts agent's recommendation |
However, Update-AgentLabels.ps1 only defines s/agent-fix-implemented (line 39), and the documentation only documents s/agent-fix-implemented (line 50). Either add the second manual label to the code, or remove it from the PR description.
if ($reportContent -match '(?i)Final\s+Recommendation:\s*APPROVE|✅\s*Final\s+Recommendation:\s*APPROVE') {
    $result.Outcome = 'approved'
}
elseif ($reportContent -match '(?i)Final\s+Recommendation:\s*REQUEST.CHANGES|⚠️\s*Final\s+Recommendation:\s*REQUEST.CHANGES') {
The regex pattern on line 387 uses `REQUEST.CHANGES` with an unescaped dot (`.`), but the pattern likely intends to match either "REQUEST CHANGES" or "REQUEST_CHANGES".
In regex, . matches any character, so this would also match "REQUESTXCHANGES" or "REQUEST-CHANGES" etc. If the intent is to match a space or underscore, the pattern should be REQUEST[\s_]CHANGES. If the intent is only to match with a space (which seems more likely based on line 384's APPROVE pattern), then it should be REQUEST\s+CHANGES.
Suggested change:
- elseif ($reportContent -match '(?i)Final\s+Recommendation:\s*REQUEST.CHANGES|⚠️\s*Final\s+Recommendation:\s*REQUEST.CHANGES') {
+ elseif ($reportContent -match '(?i)Final\s+Recommendation:\s*REQUEST\s+CHANGES|⚠️\s*Final\s+Recommendation:\s*REQUEST\s+CHANGES') {
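To make the difference between the candidate patterns concrete, here is a small sketch that runs each of them against a few sample strings. The sample inputs are invented for illustration, and the patterns are shortened to the plain (non-emoji) alternative; only the `REQUEST…CHANGES` portion comes from the comment above.

```powershell
# Candidate patterns discussed in the review comment (non-emoji alternative only).
$patterns = [ordered]@{
    'REQUEST.CHANGES'     = 'Final\s+Recommendation:\s*REQUEST.CHANGES'
    'REQUEST\s+CHANGES'   = 'Final\s+Recommendation:\s*REQUEST\s+CHANGES'
    'REQUEST[\s_]CHANGES' = 'Final\s+Recommendation:\s*REQUEST[\s_]CHANGES'
}

# Hypothetical report lines, made up for this illustration.
$samples = @(
    'Final Recommendation: REQUEST CHANGES',
    'Final Recommendation: REQUEST_CHANGES',
    'Final Recommendation: REQUEST-CHANGES',
    'Final Recommendation: REQUESTXCHANGES'
)

$results = foreach ($sample in $samples) {
    foreach ($name in $patterns.Keys) {
        [pscustomobject]@{
            Sample  = $sample
            Pattern = $name
            # -match is case-insensitive by default in PowerShell.
            Matched = $sample -match $patterns[$name]
        }
    }
}

$results | Format-Table -AutoSize
```

In this sketch `REQUEST.CHANGES` matches all four samples, `REQUEST\s+CHANGES` matches only the one with a space, and `REQUEST[\s_]CHANGES` matches the space and underscore variants.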
Gate phase result: 'passed', 'failed', or $null (skipped)

.PARAMETER FixResult
Fix phase result: 'win' (PR best), 'lose' (alternative better), or $null (skipped)
The parameter documentation comment on line 214 is backwards. It says:
'win' (PR best), 'lose' (alternative better)
But based on the actual logic in the function (lines 259-286) and the label descriptions:
- `'win'` → applies `s/agent-fix-win` → "AI found a better alternative fix than the PR" (agent wins, not PR)
- `'lose'` → applies `s/agent-fix-pr-picked` → "AI could not beat the PR fix — PR is the best" (agent loses, not alternative better)
The comment should say: 'win' (agent found better alternative), 'lose' (PR is best) which is already correctly stated on line 219.
Suggested change:
- Fix phase result: 'win' (PR best), 'lose' (alternative better), or $null (skipped)
+ Fix phase result: 'win' (agent found better alternative), 'lose' (PR is best), or $null (skipped)
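The corrected mapping can be written down as a tiny lookup, which makes the direction of "win" and "lose" hard to misread. This is only a sketch: `Get-FixLabelSketch` is a hypothetical name, not a function from `Update-AgentLabels.ps1`, and it uses the label names quoted in this comment.

```powershell
# Hypothetical helper: maps a FixResult value to the label named in the review comment.
function Get-FixLabelSketch {
    param(
        [string]$FixResult
    )

    switch ($FixResult) {
        'win'   { 's/agent-fix-win' }        # agent found a better alternative
        'lose'  { 's/agent-fix-pr-picked' }  # PR fix was the best candidate
        default { $null }                    # phase skipped: no fix label
    }
}

Get-FixLabelSketch -FixResult 'win'   # s/agent-fix-win
Get-FixLabelSketch -FixResult 'lose'  # s/agent-fix-pr-picked
```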
Introduce automated agent labeling for PR reviews: add a new shared labeler script (.github/scripts/shared/Update-AgentLabels.ps1) and wire it into Review-PR.ps1 as Phase 4 (Apply Labels). The labeler parses phase content.md files (gate/try-fix/report) to determine outcome, gate and fix signal labels, ensures labels exist, and applies/removes mutually-exclusive outcome/signal labels plus a tracking label (s/agent-reviewed). Add comprehensive docs (.github/docs/agent-labels.md) and update the PR agent SHARED-RULES.md to describe label meanings and expectations. Operations are idempotent and non-fatal; Review-PR.ps1 attempts a targeted recovery if the helper is missing.
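The parsing step described above can be sketched roughly as follows. This is an illustration under assumptions, not the code from `Update-AgentLabels.ps1`: `Get-PhaseOutcomeSketch` is a hypothetical name, it takes the phase text as strings instead of reading `content.md` files, the regexes are simplified guesses at the markers each phase writes, and falling back to `review-incomplete` when no recommendation is found is assumed.

```powershell
# Hypothetical sketch: derive outcome/gate/fix results from phase content strings.
function Get-PhaseOutcomeSketch {
    param(
        [string]$GateContent,
        [string]$FixContent,
        [string]$ReportContent
    )

    # Assumed default: no clear recommendation means the review is incomplete.
    $result = [ordered]@{ Outcome = 'review-incomplete'; Gate = $null; Fix = $null }

    # Gate phase: a bolded "Result:" marker followed by PASSED or FAILED on the same line.
    if ($GateContent -match '\*\*Result:\*\*.*PASSED') { $result.Gate = 'passed' }
    elseif ($GateContent -match '\*\*Result:\*\*.*FAILED') { $result.Gate = 'failed' }

    # Fix phase: "Candidate ..." means an alternative won, "PR ..." means the PR fix was kept.
    if ($FixContent -match '\*\*Selected Fix:\*\*\s*Candidate') { $result.Fix = 'win' }
    elseif ($FixContent -match '\*\*Selected Fix:\*\*\s*PR\b') { $result.Fix = 'lose' }

    # Report phase: only an explicit recommendation changes the default outcome.
    if ($ReportContent -match 'Final\s+Recommendation:\s*APPROVE') {
        $result.Outcome = 'approved'
    }
    elseif ($ReportContent -match 'Final\s+Recommendation:\s*REQUEST\s+CHANGES') {
        $result.Outcome = 'changes-requested'
    }

    [pscustomobject]$result
}

Get-PhaseOutcomeSketch -GateContent '**Result:** PASSED' `
                       -FixContent '**Selected Fix:** Candidate #2' `
                       -ReportContent 'Final Recommendation: APPROVE'
```

Keeping the parser a pure function of strings makes it easy to test without a PR checkout; the real module reads the files and then applies labels through GitHub.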
🚀 Dogfood this PR with:
curl -fsSL https://raw.githubusercontent.com/dotnet/maui/main/eng/scripts/get-maui-pr.sh | bash -s -- 33986
Or
iex "& { $(irm https://raw.githubusercontent.com/dotnet/maui/main/eng/scripts/get-maui-pr.ps1) } 33986"
This PR contains the following updates:

| Package | Type | Update | Change |
|---|---|---|---|
| [Microsoft.Extensions.Logging.Debug](https://dot.net/) ([source](https://github.com/dotnet/dotnet)) | nuget | patch | `10.0.3` -> `10.0.4` |
| [Microsoft.Maui.Controls](https://github.com/dotnet/maui) | nuget | patch | `10.0.41` -> `10.0.50` |

---

### Release Notes

<details>
<summary>dotnet/maui (Microsoft.Maui.Controls)</summary>

### [`v10.0.50`](https://github.com/dotnet/maui/releases/tag/10.0.50)

[Compare Source](dotnet/maui@10.0.41...10.0.50)

#### What's Changed

.NET MAUI 10.0.50 introduces significant improvements across all platforms with focus on quality, performance, and developer experience. This release includes 78 commits with various improvements, bug fixes, and enhancements.

#### AI

- Enable packing and independent preview versioning for Essentials.AI by [@mattleibow](https://github.com/mattleibow) in dotnet/maui#33976
- Move Essentials.AI preview iteration to eng/Versions.props by [@mattleibow](https://github.com/mattleibow) in dotnet/maui#34025
- \[Feature] Add Microsoft.Maui.Essentials.AI - Apple Intelligence by [@mattleibow](https://github.com/mattleibow) in dotnet/maui#33519

#### Ai Agents

- Copilot agent infrastructure, emulator reliability, and try-fix workflow improvements by [@PureWeen](https://github.com/PureWeen) via [@Copilot](https://github.com/Copilot) in dotnet/maui#33937
- Update PR agent models to claude-sonnet-4.6 and gpt-5.3-codex by [@kubaflo](https://github.com/kubaflo) in dotnet/maui#34109
- ci-copilot: set pipeline run title early using build.updatebuildnumber by [@jfversluis](https://github.com/jfversluis) via [@Copilot](https://github.com/Copilot) in dotnet/maui#34156
- Revamp find-reviewable-pr skill: priorities, defaults, and doc fixes by [@PureWeen](https://github.com/PureWeen) in dotnet/maui#34160
- Add correct CI pipeline names to Copilot instructions by [@jfversluis](https://github.com/jfversluis) in dotnet/maui#34255
- Add resilience to UI tests for frozen/unresponsive apps by [@PureWeen](https://github.com/PureWeen) in dotnet/maui#34023
- Copilot CI: Structured phase outputs, autonomous execution, iOS support, and CI pipeline by [@kubaflo](https://github.com/kubaflo) in dotnet/maui#34040
- Agent Workflow Metrics via GitHub Labels by [@kubaflo](https://github.com/kubaflo) in dotnet/maui#33986

#### Animation

- \[Android] Fixed TransformProperties issue when a wrapper view is present by [@Ahamed-Ali](https://github.com/Ahamed-Ali) in dotnet/maui#29228 <...
Agent Workflow Labels
GitHub labels for tracking outcomes of the AI agent PR review workflow (`Review-PR.ps1`).
All labels use the `s/agent-*` prefix for easy querying on GitHub.
Label Categories
Outcome Labels
Mutually exclusive: exactly one is applied per PR review run.
| Label | Color |
|---|---|
| `s/agent-approved` | `#2E7D32` |
| `s/agent-changes-requested` | `#E65100` |
| `s/agent-review-incomplete` | `#B71C1C` |

When a new outcome label is applied, any previously applied outcome label is automatically removed.
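The replace-on-apply rule for outcome labels can be sketched as a pure function that works out which labels to add and remove. `Get-OutcomeLabelChangeSketch` is a hypothetical helper written for this illustration; it does not call GitHub and is not taken from `Update-AgentLabels.ps1`.

```powershell
# Outcome labels listed in the table above.
$OutcomeLabels = @('s/agent-approved', 's/agent-changes-requested', 's/agent-review-incomplete')

# Hypothetical helper: given the labels currently on a PR and the new outcome label,
# compute what to add and what to remove.
function Get-OutcomeLabelChangeSketch {
    param(
        [string[]]$CurrentLabels = @(),
        [Parameter(Mandatory)][string]$NewOutcomeLabel
    )

    if ($NewOutcomeLabel -notin $OutcomeLabels) {
        throw "Unknown outcome label: $NewOutcomeLabel"
    }

    [pscustomobject]@{
        # Stale outcome labels are any outcome label other than the new one.
        Remove = @($CurrentLabels | Where-Object { $_ -in $OutcomeLabels -and $_ -ne $NewOutcomeLabel })
        # Skip the add when the label is already present, so reruns are idempotent.
        Add    = @(if ($NewOutcomeLabel -notin $CurrentLabels) { $NewOutcomeLabel })
    }
}

Get-OutcomeLabelChangeSketch -CurrentLabels 's/agent-reviewed', 's/agent-approved' `
                             -NewOutcomeLabel 's/agent-changes-requested'
```

In the call above, `Remove` contains `s/agent-approved` and `Add` contains `s/agent-changes-requested`; the unrelated `s/agent-reviewed` label is left alone.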
Signal Labels
Additive: multiple can coexist on a single PR.
| Label | Color |
|---|---|
| `s/agent-gate-passed` | `#4CAF50` |
| `s/agent-gate-failed` | `#FF9800` |
| `s/agent-fix-win` | `#66BB6A` |
| `s/agent-fix-lose` | `#FF7043` |

Gate labels (`gate-passed`/`gate-failed`) are mutually exclusive with each other. Fix labels (`fix-win`/`fix-lose`) are mutually exclusive with each other.
Tracking Label
Always applied on every completed agent run.
| Label | Color |
|---|---|
| `s/agent-reviewed` | `#1565C0` |

Manual Label
Applied by MAUI maintainers, not by automation.
| Label | Color |
|---|---|
| `s/agent-fix-implemented` | `#7B1FA2` |
| `s/agent-suggestions-implemented` | `#7B1FA2` |

How It Works
Architecture
Labels are applied exclusively from `Review-PR.ps1` Phase 4. No other script applies agent labels. This single-source design avoids label conflicts and simplifies debugging.
How Labels Are Parsed
The `Parse-PhaseOutcomes` function in `Update-AgentLabels.ps1` reads `content.md` files from each phase directory:
| File | Pattern | Label |
|---|---|---|
| `gate/content.md` | `**Result:** ✅ PASSED` | `s/agent-gate-passed` |
| `gate/content.md` | `**Result:** ❌ FAILED` | `s/agent-gate-failed` |
| `try-fix/content.md` | `**Selected Fix:** Candidate ...` | `s/agent-fix-win` |
| `try-fix/content.md` | `**Selected Fix:** PR ...` | `s/agent-fix-lose` |
| `report/content.md` | `Final Recommendation: APPROVE` | `s/agent-approved` |
| `report/content.md` | `Final Recommendation: REQUEST CHANGES` | `s/agent-changes-requested` |
| | | `s/agent-review-incomplete` |

Self-Bootstrapping
Labels are created automatically on first use via `Ensure-LabelExists`. No manual setup required. If a label already exists but has a stale description or color, it is updated.
Querying Labels
All labels use the `s/agent-*` prefix, making them easy to filter on GitHub.
Common Queries
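As one way to assemble such queries, the sketch below builds GitHub search strings from label names. `New-AgentLabelQuerySketch` is a hypothetical helper written for this illustration, not part of the PR; it only produces the query text and does not call GitHub.

```powershell
# Hypothetical helper: build a GitHub search query for PRs carrying (or lacking) agent labels.
function New-AgentLabelQuerySketch {
    param(
        [string[]]$WithLabel = @(),
        [string[]]$WithoutLabel = @()
    )

    $terms = @('is:pr')
    # @(...) keeps the result an array even when a parameter is empty.
    $terms += @($WithLabel    | ForEach-Object { "label:$_" })
    $terms += @($WithoutLabel | ForEach-Object { "-label:$_" })
    $terms -join ' '
}

# PRs the agent approved
New-AgentLabelQuerySketch -WithLabel 's/agent-approved'
# → is:pr label:s/agent-approved

# Reviewed PRs where the agent found a better fix
New-AgentLabelQuerySketch -WithLabel 's/agent-reviewed', 's/agent-fix-win'
# → is:pr label:s/agent-reviewed label:s/agent-fix-win

# Reviewed PRs that do not carry the gate-passed label
New-AgentLabelQuerySketch -WithLabel 's/agent-reviewed' -WithoutLabel 's/agent-gate-passed'
# → is:pr label:s/agent-reviewed -label:s/agent-gate-passed
```

The resulting string can be pasted into the GitHub search box, or, assuming the GitHub CLI is installed and authenticated, passed to `gh pr list --search`.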
Metrics You Can Derive
- `is:pr label:s/agent-reviewed`
- `label:s/agent-approved` vs `label:s/agent-changes-requested` counts
- `label:s/agent-gate-passed` vs `label:s/agent-gate-failed` counts
- `label:s/agent-fix-win` vs `label:s/agent-fix-lose` counts
- `label:s/agent-fix-implemented` / `label:s/agent-changes-requested`
- `label:s/agent-review-incomplete` / `label:s/agent-reviewed`

Implementation Details
Files
- `.github/scripts/shared/Update-AgentLabels.ps1`
- `.github/scripts/Review-PR.ps1`: `Apply-AgentLabels` in Phase 4
- `.github/agents/pr/SHARED-RULES.md`

Key Functions
- `Apply-AgentLabels`
- `Parse-PhaseOutcomes`: reads `content.md` files, returns outcome/gate/fix results
- `Update-AgentOutcomeLabel`
- `Update-AgentSignalLabels`
- `Update-AgentReviewedLabel`
- `Ensure-LabelExists`

Design Principles
- `Review-PR.ps1` only: no other scripts touch labels

Migrated From
The following old infrastructure was removed as part of this implementation:
- `Update-VerificationLabels` function in `verify-tests-fail.ps1`: removed (labels now come from `Review-PR.ps1` only)
- `s/ai-reproduction-confirmed` / `s/ai-reproduction-failed` labels: superseded by `s/agent-gate-passed` / `s/agent-gate-failed`