
Agent Workflow Metrics via GitHub Labels#33986

Merged
jfversluis merged 1 commit into dotnet:main from kubaflo:agent-labels on Mar 3, 2026

kubaflo commented Feb 10, 2026

Agent Workflow Labels

GitHub labels for tracking outcomes of the AI agent PR review workflow (Review-PR.ps1).

All labels use the s/agent-* prefix for easy querying on GitHub.


Label Categories

Outcome Labels

Mutually exclusive — exactly one is applied per PR review run.

| Label | Color | Description | Applied When |
|---|---|---|---|
| `s/agent-approved` | 🟢 `#2E7D32` | AI agent recommends approval: PR fix is correct and optimal | Report phase recommends APPROVE |
| `s/agent-changes-requested` | 🟠 `#E65100` | AI agent recommends changes: found a better alternative or issues | Report phase recommends REQUEST CHANGES |
| `s/agent-review-incomplete` | 🔴 `#B71C1C` | AI agent could not complete all phases (blocker, timeout, error) | Agent exits without completing all phases |

When a new outcome label is applied, any previously applied outcome label is automatically removed.
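A minimal sketch of that swap follows. It is written in Python for illustration and is not taken from Update-AgentLabels.ps1. Given the labels already on a PR and the new outcome, it works out which outcome label to add and which ones to remove. The function name `plan_outcome_update` and the outcome keys are hypothetical, and the real helper (Update-AgentOutcomeLabel) may be structured differently.

```python
# Outcome labels from the table above; exactly one should be present on a PR.
OUTCOME_LABELS = {
    "approved": "s/agent-approved",
    "changes-requested": "s/agent-changes-requested",
    "review-incomplete": "s/agent-review-incomplete",
}


def plan_outcome_update(current_labels, outcome):
    """Return (to_add, to_remove) so that only the label for `outcome` remains.

    current_labels: iterable of label names already on the PR (hypothetical input).
    outcome: one of the keys of OUTCOME_LABELS.
    """
    target = OUTCOME_LABELS[outcome]
    current = set(current_labels)
    # Remove every other outcome label that is currently applied.
    to_remove = sorted(
        label for label in OUTCOME_LABELS.values()
        if label != target and label in current
    )
    # Add the target only if it is missing, so re-runs issue no redundant calls.
    to_add = [] if target in current else [target]
    return to_add, to_remove
```

Consider a re-review that flips a PR from "changes requested" to "approved". Calling `plan_outcome_update(["s/agent-changes-requested", "s/agent-reviewed"], "approved")` adds `s/agent-approved`, removes `s/agent-changes-requested`, and leaves the tracking label alone.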

Signal Labels

Additive — multiple can coexist on a single PR.

| Label | Color | Description | Applied When |
|---|---|---|---|
| `s/agent-gate-passed` | 🟢 `#4CAF50` | AI verified tests catch the bug (fail without fix, pass with fix) | Gate phase passes |
| `s/agent-gate-failed` | 🟠 `#FF9800` | AI could not verify tests catch the bug | Gate phase fails |
| `s/agent-fix-win` | 🟢 `#66BB6A` | AI found a better alternative fix than the PR | Fix phase: alternative selected over PR's fix |
| `s/agent-fix-lose` | 🟠 `#FF7043` | AI could not beat the PR fix; PR is the best among all candidates | Fix phase: PR selected as best after comparison |

Gate labels (gate-passed/gate-failed) are mutually exclusive with each other. Fix labels (fix-win/fix-lose) are mutually exclusive with each other.
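The same idea carries over to the signal labels, where exclusivity applies per category. The Python sketch below is an illustration that rests on several assumptions:

- The result values mirror the 'passed'/'failed' and 'win'/'lose' strings quoted in this PR's review thread.
- `plan_signal_update` is a hypothetical name.
- A skipped phase (None) leaves its category untouched, which this description does not state.
- It uses the `s/agent-fix-lose` name from the table above; the review thread notes that the code may call this label `s/agent-fix-pr-picked`.

```python
# Signal label pairs; within each pair at most one label should be present.
GATE_LABELS = {"passed": "s/agent-gate-passed", "failed": "s/agent-gate-failed"}
FIX_LABELS = {"win": "s/agent-fix-win", "lose": "s/agent-fix-lose"}


def plan_signal_update(current_labels, gate_result, fix_result):
    """Return (to_add, to_remove) for the gate and fix signal labels.

    gate_result: 'passed', 'failed', or None if the Gate phase was skipped.
    fix_result: 'win' (agent's alternative chosen), 'lose' (PR fix chosen), or None.
    Assumption: a None result leaves that category as it is.
    """
    current = set(current_labels)
    to_add, to_remove = [], []
    for result, pair in ((gate_result, GATE_LABELS), (fix_result, FIX_LABELS)):
        if result is None:
            continue  # phase skipped: do not guess a signal
        target = pair[result]
        if target not in current:
            to_add.append(target)
        # Drop the counterpart in the same category, if it is present.
        to_remove.extend(
            label for label in pair.values() if label != target and label in current
        )
    return to_add, to_remove
```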

Tracking Label

Always applied on every completed agent run.

| Label | Color | Description | Applied When |
|---|---|---|---|
| `s/agent-reviewed` | 🔵 `#1565C0` | PR was reviewed by AI agent workflow (full 4-phase review) | Every completed agent run |

Manual Label

Applied by MAUI maintainers, not by automation.

| Label | Color | Description | Applied When |
|---|---|---|---|
| `s/agent-fix-implemented` | 🟣 `#7B1FA2` | PR author implemented the agent's suggested fix | Maintainer applies when PR author adopts agent's recommendation |
| `s/agent-suggestions-implemented` | 🟣 `#7B1FA2` | PR author implemented the agent's code suggestions | Maintainer applies when PR author adopts agent's recommendation |

How It Works

Architecture

Review-PR.ps1
├── Phase 1: PR Agent Review (Copilot CLI)
│   ├── Pre-Flight → writes content.md
│   ├── Gate       → writes content.md
│   ├── Fix        → writes content.md
│   └── Report     → writes content.md
├── Phase 2: PR Finalize (optional)
├── Phase 3: Post Comments (optional)
└── Phase 4: Apply Labels  ← labels are applied here
    ├── Parse content.md files
    ├── Determine outcome + signal labels
    ├── Apply via GitHub REST API
    └── Non-fatal: errors warn but don't fail the workflow

Labels are applied exclusively from Review-PR.ps1 Phase 4. No other script applies agent labels. This single-source design avoids label conflicts and simplifies debugging.
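Phase 4's non-fatal contract (errors warn but don't fail the workflow) can be pictured as a small Python wrapper. This is a hypothetical sketch, not code from Review-PR.ps1. `apply_labels_non_fatal` and the `apply_fn` callable are placeholders for the real parse-and-apply steps.

```python
import warnings


def apply_labels_non_fatal(apply_fn, *args):
    """Run a label-application step and turn any failure into a warning.

    apply_fn is a hypothetical stand-in for the Phase 4 work (parse the
    content.md files, decide labels, call the GitHub REST API).
    Returns True if it completed, False if a warning was emitted instead.
    """
    try:
        apply_fn(*args)
        return True
    except Exception as exc:  # deliberately broad: labeling must never fail the run
        warnings.warn(f"Agent labels were not applied: {exc}")
        return False
```

Catching broadly is intentional here: by the time Phase 4 runs, the review itself has already produced its output, so a label failure (for example an HTTP error) should only be reported.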

How Labels Are Parsed

The Parse-PhaseOutcomes function in Update-AgentLabels.ps1 reads content.md files from each phase directory:

| Source File | What's Parsed | Resulting Label |
|---|---|---|
| `gate/content.md` | `**Result:** ✅ PASSED` | `s/agent-gate-passed` |
| `gate/content.md` | `**Result:** ❌ FAILED` | `s/agent-gate-failed` |
| `try-fix/content.md` | `**Selected Fix:** Candidate ...` | `s/agent-fix-win` |
| `try-fix/content.md` | `**Selected Fix:** PR ...` | `s/agent-fix-lose` |
| `report/content.md` | `Final Recommendation: APPROVE` | `s/agent-approved` |
| `report/content.md` | `Final Recommendation: REQUEST CHANGES` | `s/agent-changes-requested` |
| (missing report) | No report file exists | `s/agent-review-incomplete` |
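As a rough illustration of this mapping, here is a Python port of the table's rules. It is not Parse-PhaseOutcomes itself. The directory layout (a root containing gate/, try-fix/ and report/), the exact patterns, and the `parse_phase_outcomes` name are all assumptions. The sketch uses `REQUEST\s+CHANGES` rather than `REQUEST.CHANGES`, following a suggestion made in the review.

```python
import re
from pathlib import Path

# Patterns follow the table above (assumed formats, not copied from the script).
GATE_PASSED = re.compile(r"\*\*Result:\*\*\s*✅\s*PASSED")
GATE_FAILED = re.compile(r"\*\*Result:\*\*\s*❌\s*FAILED")
FIX_CANDIDATE = re.compile(r"\*\*Selected Fix:\*\*\s*Candidate")
FIX_PR = re.compile(r"\*\*Selected Fix:\*\*\s*PR\b")
APPROVE = re.compile(r"Final\s+Recommendation:\s*APPROVE", re.IGNORECASE)
REQUEST_CHANGES = re.compile(r"Final\s+Recommendation:\s*REQUEST\s+CHANGES", re.IGNORECASE)


def parse_phase_outcomes(root):
    """Return {'outcome', 'gate', 'fix'}; a value is None when nothing matched."""
    root = Path(root)

    def read(phase):
        path = root / phase / "content.md"
        return path.read_text(encoding="utf-8") if path.is_file() else None

    result = {"outcome": None, "gate": None, "fix": None}

    gate = read("gate")
    if gate is not None:
        if GATE_PASSED.search(gate):
            result["gate"] = "passed"
        elif GATE_FAILED.search(gate):
            result["gate"] = "failed"

    fix = read("try-fix")
    if fix is not None:
        if FIX_CANDIDATE.search(fix):
            result["fix"] = "win"   # an alternative candidate beat the PR
        elif FIX_PR.search(fix):
            result["fix"] = "lose"  # the PR's own fix was selected

    report = read("report")
    if report is None:
        result["outcome"] = "review-incomplete"  # no report file exists
    elif APPROVE.search(report):
        result["outcome"] = "approved"
    elif REQUEST_CHANGES.search(report):
        result["outcome"] = "changes-requested"
    # A report matching neither pattern leaves "outcome" as None in this sketch.
    return result
```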

Self-Bootstrapping

Labels are created automatically on first use via Ensure-LabelExists, so no manual setup is required. If a label already exists but has a stale description or color, it is updated.
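The create-or-update decision behind self-bootstrapping can be sketched as a pure function over labels that were already fetched from the repository. This is a hypothetical Python illustration, not Ensure-LabelExists. `plan_label_sync` is an invented name, and the HTTP calls are left out. The sketch relies on GitHub's REST API representing a label color as six hex digits without a leading `#`.

```python
def plan_label_sync(existing, name, color, description):
    """Decide what a self-bootstrapping helper should do for one label.

    existing: dict of label name -> {"color": ..., "description": ...} for
              labels already in the repository (fetching them is out of scope).
    Returns ("create", payload), ("update", payload), or ("none", None).
    """
    payload = {"name": name, "color": color.lstrip("#").upper(), "description": description}
    current = existing.get(name)
    if current is None:
        return "create", payload
    # Colors are compared without '#' and case-insensitively.
    same_color = current.get("color", "").lstrip("#").upper() == payload["color"]
    same_desc = (current.get("description") or "") == description
    if same_color and same_desc:
        return "none", None  # already up to date: nothing to send
    return "update", payload  # stale color or description
```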


Querying Labels

All labels use the s/agent-* prefix, making them easy to filter on GitHub.

Common Queries

# PRs the agent approved
is:pr label:s/agent-approved

# PRs where agent found a better fix
is:pr label:s/agent-fix-win

# PRs where agent requested changes AND author implemented its suggested fix
is:pr label:s/agent-changes-requested label:s/agent-fix-implemented

# PRs where tests don't catch the bug
is:pr label:s/agent-gate-failed

# Agent-reviewed PRs that are still open
is:pr is:open label:s/agent-reviewed

# All agent-reviewed PRs (total count)
is:pr label:s/agent-reviewed
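These queries can also be run against GitHub's search API (`GET /search/issues`), whose JSON response includes a `total_count` field. The helper below only builds the URL. `search_url` is a hypothetical name, the `dotnet/maui` default is an assumption based on where this PR lives, and authentication and rate limits are not handled.

```python
from urllib.parse import urlencode

SEARCH_API = "https://api.github.com/search/issues"


def search_url(*qualifiers, repo="dotnet/maui"):
    """Build a search API URL for the given qualifiers, scoped to one repository."""
    # The repo: qualifier keeps counts limited to this repository.
    query = " ".join((f"repo:{repo}",) + qualifiers)
    return f"{SEARCH_API}?{urlencode({'q': query})}"
```

For example, `search_url("is:pr", "is:open", "label:s/agent-reviewed")` corresponds to the "still open" query above.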

Metrics You Can Derive

| Metric | Query or formula |
|---|---|
| Total agent reviews | `is:pr label:s/agent-reviewed` |
| Approval rate | Compare `label:s/agent-approved` vs `label:s/agent-changes-requested` counts |
| Gate pass rate | Compare `label:s/agent-gate-passed` vs `label:s/agent-gate-failed` counts |
| Fix win rate | Compare `label:s/agent-fix-win` vs `label:s/agent-fix-lose` counts |
| Agent adoption rate | Count of `label:s/agent-fix-implemented` divided by count of `label:s/agent-changes-requested` |
| Incomplete review rate | Count of `label:s/agent-review-incomplete` divided by count of `label:s/agent-reviewed` |
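The ratios can be computed from those counts. This Python sketch assumes each "Compare X vs Y" row means X / (X + Y), which the table does not spell out. `derive_metrics` and its count keys are hypothetical, and a zero denominator yields None rather than raising an error.

```python
def rate(numerator, denominator):
    """Return numerator / denominator, or None when there is nothing to divide by."""
    return None if denominator == 0 else numerator / denominator


def derive_metrics(counts):
    """Compute the metrics table's ratios from per-label PR counts.

    counts maps a label name without its 's/agent-' prefix (e.g. 'approved')
    to the number of PRs carrying that label; missing keys count as zero.
    """
    def n(key):
        return counts.get(key, 0)

    return {
        # "Compare X vs Y" rows are read as X / (X + Y) in this sketch.
        "approval_rate": rate(n("approved"), n("approved") + n("changes-requested")),
        "gate_pass_rate": rate(n("gate-passed"), n("gate-passed") + n("gate-failed")),
        "fix_win_rate": rate(n("fix-win"), n("fix-win") + n("fix-lose")),
        # The two ratio rows are taken literally.
        "adoption_rate": rate(n("fix-implemented"), n("changes-requested")),
        "incomplete_rate": rate(n("review-incomplete"), n("reviewed")),
    }
```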

Implementation Details

Files

| File | Purpose |
|---|---|
| `.github/scripts/shared/Update-AgentLabels.ps1` | Label helper module (all label logic) |
| `.github/scripts/Review-PR.ps1` | Orchestrator that calls Apply-AgentLabels in Phase 4 |
| `.github/agents/pr/SHARED-RULES.md` | Documents label system for the PR agent |

Key Functions

| Function | Description |
|---|---|
| `Apply-AgentLabels` | Main entry point; parses phases and applies all labels |
| `Parse-PhaseOutcomes` | Reads content.md files, returns outcome/gate/fix results |
| `Update-AgentOutcomeLabel` | Applies one outcome label, removes conflicting ones |
| `Update-AgentSignalLabels` | Adds/removes gate and fix signal labels |
| `Update-AgentReviewedLabel` | Ensures tracking label is present |
| `Ensure-LabelExists` | Creates or updates a label in the repository |

Design Principles

  • Idempotent: Safe to re-run; labels are checked before being added or removed, and GitHub ignores duplicate adds
  • Non-fatal: Label failures emit warnings but never fail the overall workflow
  • Single source: All labels are applied from Review-PR.ps1 only; no other script touches them
  • Self-bootstrapping: Labels are created on first use via the GitHub API
  • Mutual exclusivity enforced: Outcome labels and same-category signal labels automatically remove their counterparts
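Idempotency can be checked with a small model that also leaves maintainer-applied labels alone. In this hypothetical Python sketch, `reconcile` computes a minimal add/remove plan for a set of agent-managed labels, and `apply_plan` simulates the REST calls. Running `reconcile` again on the result produces an empty plan. The manual `s/agent-fix-implemented` label is never removed, because the sketch assumes it sits outside the managed set.

```python
def reconcile(current, desired, managed):
    """Return (to_add, to_remove) so that, among `managed` labels, exactly
    `desired` are present. Labels outside `managed` are never removed."""
    current, desired, managed = set(current), set(desired), set(managed)
    to_add = sorted(desired - current)                 # skip labels already present
    to_remove = sorted((current & managed) - desired)  # only agent-managed labels
    return to_add, to_remove


def apply_plan(current, to_add, to_remove):
    """Simulate the REST calls: return the label set after applying a plan."""
    return (set(current) - set(to_remove)) | set(to_add)


# Example: an earlier run requested changes; a re-review now approves.
managed = {"s/agent-approved", "s/agent-changes-requested",
           "s/agent-review-incomplete", "s/agent-reviewed"}
desired = {"s/agent-approved", "s/agent-reviewed"}
before = {"s/agent-changes-requested", "s/agent-fix-implemented"}

plan = reconcile(before, desired, managed)
after = apply_plan(before, *plan)
second_plan = reconcile(after, desired, managed)  # re-running changes nothing
```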

Migrated From

The following old infrastructure was removed as part of this implementation:

  • Update-VerificationLabels function in verify-tests-fail.ps1 — removed (labels now come from Review-PR.ps1 only)
  • s/ai-reproduction-confirmed / s/ai-reproduction-failed labels — superseded by s/agent-gate-passed / s/agent-gate-failed

kubaflo changed the title from "Add agent workflow metrics labels (s/agent-* prefix)" to "Agent Workflow Metrics via GitHub Labels" on Feb 10, 2026
kubaflo force-pushed the agent-labels branch 2 times, most recently from de1a7e8 to a530d85 on February 26, 2026 00:24
kubaflo marked this pull request as ready for review February 26, 2026 00:24
Copilot AI review requested due to automatic review settings February 26, 2026 00:24
kubaflo added the area-ai-agents (Copilot CLI agents, agent skills, AI-assisted development) and copilot labels Feb 26, 2026
Copilot AI left a comment

Pull request overview

This PR implements a comprehensive GitHub label-based metrics system for tracking AI agent PR review workflow outcomes. The system uses s/agent-* prefixed labels to track review outcomes, test verification results, and fix comparison results across the automated PR review pipeline.

Changes:

  • Introduces new label management module (Update-AgentLabels.ps1) with idempotent label operations
  • Adds Phase 4 to Review-PR.ps1 for automatic label application based on phase outcomes
  • Refactors agent output from centralized state files to distributed content.md files per phase
  • Updates all agent instructions and skill documentation to reflect the new phase output artifact structure
  • Removes old label management code (Update-VerificationLabels) in favor of centralized system
  • Adds new CI/Copilot pipeline configuration for automated agent PR reviews
  • Cleans up Azure DevOps variable groups and pipeline configuration

Reviewed changes

Copilot reviewed 26 out of 26 changed files in this pull request and generated 6 comments.

| File | Description |
|---|---|
| `.github/scripts/shared/Update-AgentLabels.ps1` | New module implementing label management with parsing, application, and self-bootstrapping |
| `.github/scripts/Review-PR.ps1` | Adds Phase 4 for label application; implements pinned SHA restoration; adds phase output directories |
| `.github/docs/agent-labels.md` | Comprehensive documentation of the label system, architecture, and usage examples |
| `eng/pipelines/ci-copilot.yml` | New Azure DevOps pipeline for running Copilot PR reviewer agent with full environment setup |
| `eng/pipelines/common/variables.yml` | Simplifies variable group structure; removes unused conditional logic |
| `eng/pipelines/common/provision.yml` | Adds skipCertificates parameter for CI scenarios |
| `.github/skills/verify-tests-fail-without-fix/scripts/verify-tests-fail.ps1` | Removes old label management; updates output path to new structure |
| `.github/skills/try-fix/SKILL.md` | Updates documentation to remove state file references |
| `.github/skills/learn-from-pr/SKILL.md` | Removes session markdown references |
| `.github/skills/ai-summary-comment/scripts/*.ps1` | Updates all scripts to auto-load from PRAgent phase content.md files instead of state files |
| `.github/skills/ai-summary-comment/SKILL.md` | Documents new auto-loading behavior from phase files |
| `.github/skills/ai-summary-comment/NO-EXTERNAL-REFERENCES-RULE.md` | Simplifies by removing state file references |
| `.github/skills/ai-summary-comment/IMPROVEMENTS.md` | Updates terminology from "state file" to "content" |
| `.github/scripts/shared/Start-Emulator.ps1` | Improves iOS simulator selection logic for UI test baseline compatibility |
| `.github/scripts/shared/Build-AndDeploy.ps1` | Adds logic to shut down other booted simulators before deployment |
| `.github/scripts/BuildAndRunHostApp.ps1` | Adds test artifact collection for screenshots and page source |
| `.github/copilot-instructions.md` | Updates agent documentation to reflect new output structure |
| `.github/agents/pr/post-gate.md` | Updates for autonomous execution mode and phase output artifacts |
| `.github/agents/pr/SHARED-RULES.md` | Major update: documents phase output artifacts and agent label system; changes blocking behavior to autonomous |
| `.github/agents/pr/PLAN-TEMPLATE.md` | Updates plan template to reflect new phase output requirements |
| `.github/agents/pr.md` | Removes state file creation steps; updates for phase output artifacts |
| `.github/agents/learn-from-pr.md` | Removes session markdown references |

| `s/agent-fix-win` | 🟢 `#66BB6A` | AI found a better alternative fix than the PR | Fix phase: alternative selected over PR's fix |
| `s/agent-fix-pr-picked` | 🟠 `#FF7043` | AI could not beat the PR fix — PR is the best among all candidates | Fix phase: PR selected as best after comparison |

Gate labels (`gate-passed`/`gate-failed`) are mutually exclusive with each other. Fix labels (`fix-win`/`fix-lose`) are mutually exclusive with each other.

Copilot AI Feb 26, 2026

The label name s/agent-fix-pr-picked in the code does not match the documentation, which refers to it as s/agent-fix-lose.

In the documentation at line 34, the table mentions "Fix labels (fix-win/fix-lose)" suggesting the label should be called s/agent-fix-lose, but the actual label defined in Update-AgentLabels.ps1 line 35 is s/agent-fix-pr-picked.

Either the code should use s/agent-fix-lose to match the documentation's naming pattern, or the documentation should be updated to consistently use s/agent-fix-pr-picked. The current mismatch could cause confusion when users try to query these labels.

Suggested change
Gate labels (`gate-passed`/`gate-failed`) are mutually exclusive with each other. Fix labels (`fix-win`/`fix-lose`) are mutually exclusive with each other.
Gate labels (`gate-passed`/`gate-failed`) are mutually exclusive with each other. Fix labels (`fix-win`/`fix-pr-picked`) are mutually exclusive with each other.

# PRs the agent approved
is:pr label:s/agent-approved

# PRs where agent found a better fix

Copilot AI Feb 26, 2026

The comment at line 107 says "PRs where agent found a better fix" but queries for s/agent-fix-pr-picked. This is semantically backwards.

According to the label definitions:

  • s/agent-fix-win = "AI found a better alternative fix than the PR"
  • s/agent-fix-pr-picked = "AI could not beat the PR fix — PR is the best"

So the query comment should say "PRs where agent could NOT beat the PR fix" or the query should use label:s/agent-fix-win instead.

Suggested change
# PRs where agent found a better fix
# PRs where agent could NOT beat the PR fix (PR fix was best)

- group: SDL_Settings
- group: AzureDevOps-Artifact-Feeds-Pats
- ${{ if eq(variables['Build.DefinitionName'], 'dotnet-maui') }}:
- group: Publish-Build-Assets # This variable group contains secrets to publis to BAR

Copilot AI Feb 26, 2026

Typo in the comment: "publis" should be "publish". The comment reads "This variable group contains secrets to publis to BAR" but should read "This variable group contains secrets to publish to BAR".

Suggested change
- group: Publish-Build-Assets # This variable group contains secrets to publis to BAR
- group: Publish-Build-Assets # This variable group contains secrets to publish to BAR

Comment on lines +38 to +40
$script:ManualLabels = @{
's/agent-fix-implemented' = @{ Description = 'PR author implemented the agent suggested fix'; Color = '7B1FA2' }
}

Copilot AI Feb 26, 2026

The PR description mentions TWO manual labels (s/agent-fix-implemented and s/agent-suggestions-implemented), but the code only defines ONE manual label (s/agent-fix-implemented).

The PR description states:

| `s/agent-fix-implemented` | 🟣 `#7B1FA2` | PR author implemented the agent's suggested fix | Maintainer applies when PR author adopts agent's recommendation |
| `s/agent-suggestions-implemented` | 🟣 `#7B1FA2` | PR author implemented the agent's code suggestions | Maintainer applies when PR author adopts agent's recommendation |

However, Update-AgentLabels.ps1 only defines s/agent-fix-implemented (line 39), and the documentation only documents s/agent-fix-implemented (line 50). Either add the second manual label to the code, or remove it from the PR description.

if ($reportContent -match '(?i)Final\s+Recommendation:\s*APPROVE|✅\s*Final\s+Recommendation:\s*APPROVE') {
$result.Outcome = 'approved'
}
elseif ($reportContent -match '(?i)Final\s+Recommendation:\s*REQUEST.CHANGES|⚠️\s*Final\s+Recommendation:\s*REQUEST.CHANGES') {

Copilot AI Feb 26, 2026

The regex pattern on line 387 uses REQUEST.CHANGES with an unescaped dot (.), but the pattern likely intends to match either "REQUEST CHANGES" or "REQUEST_CHANGES".

In regex, . matches any character, so this would also match "REQUESTXCHANGES" or "REQUEST-CHANGES" etc. If the intent is to match a space or underscore, the pattern should be REQUEST[\s_]CHANGES. If the intent is only to match with a space (which seems more likely based on line 384's APPROVE pattern), then it should be REQUEST\s+CHANGES.

Suggested change
elseif ($reportContent -match '(?i)Final\s+Recommendation:\s*REQUEST.CHANGES|⚠️\s*Final\s+Recommendation:\s*REQUEST.CHANGES') {
elseif ($reportContent -match '(?i)Final\s+Recommendation:\s*REQUEST\s+CHANGES|⚠️\s*Final\s+Recommendation:\s*REQUEST\s+CHANGES') {
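The difference between the two patterns is easy to check. Python's `re` module treats `.`, `\s` and the leading `(?i)` flag the same way .NET does for these inputs. This small script is an illustration and not part of the PR. It compares the first alternative of each pattern; the ⚠️-prefixed alternative behaves the same way.

```python
import re

# First alternative of the original and suggested patterns above.
ORIGINAL = r"(?i)Final\s+Recommendation:\s*REQUEST.CHANGES"
SUGGESTED = r"(?i)Final\s+Recommendation:\s*REQUEST\s+CHANGES"


def compare(text):
    """Return (original_matches, suggested_matches) for one report line."""
    return bool(re.search(ORIGINAL, text)), bool(re.search(SUGGESTED, text))


for sample in ("Final Recommendation: REQUEST CHANGES",
               "Final Recommendation: REQUEST_CHANGES",
               "Final Recommendation: REQUEST-CHANGES"):
    print(sample, compare(sample))
```

The space-separated line matches both patterns. The underscore and hyphen variants match only the original pattern; if underscores should be accepted too, `REQUEST[\s_]CHANGES` covers that case.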

Gate phase result: 'passed', 'failed', or $null (skipped)

.PARAMETER FixResult
Fix phase result: 'win' (PR best), 'lose' (alternative better), or $null (skipped)

Copilot AI Feb 26, 2026

The parameter documentation comment on line 214 is backwards. It says:

  • 'win' (PR best), 'lose' (alternative better)

But based on the actual logic in the function (lines 259-286) and the label descriptions:

  • 'win' → applies s/agent-fix-win → "AI found a better alternative fix than the PR" (agent wins, not PR)
  • 'lose' → applies s/agent-fix-pr-picked → "AI could not beat the PR fix — PR is the best" (agent loses, not alternative better)

The comment should say: 'win' (agent found better alternative), 'lose' (PR is best), which is already correctly stated on line 219.

Suggested change
Fix phase result: 'win' (PR best), 'lose' (alternative better), or $null (skipped)
Fix phase result: 'win' (agent found better alternative), 'lose' (PR is best), or $null (skipped)

Introduce automated agent labeling for PR reviews: add a new shared labeler script (.github/scripts/shared/Update-AgentLabels.ps1) and wire it into Review-PR.ps1 as Phase 4 (Apply Labels). The labeler parses phase content.md files (gate/try-fix/report) to determine outcome, gate and fix signal labels, ensures labels exist, and applies/removes mutually-exclusive outcome/signal labels plus a tracking label (s/agent-reviewed). Add comprehensive docs (.github/docs/agent-labels.md) and update the PR agent SHARED-RULES.md to describe label meanings and expectations. Operations are idempotent and non-fatal; Review-PR.ps1 attempts a targeted recovery if the helper is missing.

github-actions bot commented Mar 3, 2026

🚀 Dogfood this PR with:

⚠️ WARNING: Do not do this without first carefully reviewing the code of this PR to satisfy yourself it is safe.

curl -fsSL https://raw.githubusercontent.com/dotnet/maui/main/eng/scripts/get-maui-pr.sh | bash -s -- 33986

Or

  • Run remotely in PowerShell:
iex "& { $(irm https://raw.githubusercontent.com/dotnet/maui/main/eng/scripts/get-maui-pr.ps1) } 33986"

jfversluis merged commit c5fadf8 into dotnet:main Mar 3, 2026
4 of 5 checks passed
evgenygunko pushed a commit to evgenygunko/CopyWordsDA that referenced this pull request Mar 11, 2026
This PR contains the following updates:

| Package | Type | Update | Change |
|---|---|---|---|
| [Microsoft.Extensions.Logging.Debug](https://dot.net/) ([source](https://github.com/dotnet/dotnet)) | nuget | patch | `10.0.3` -> `10.0.4` |
| [Microsoft.Maui.Controls](https://github.com/dotnet/maui) | nuget | patch | `10.0.41` -> `10.0.50` |

---

### Release Notes

<details>
<summary>dotnet/maui (Microsoft.Maui.Controls)</summary>

### [`v10.0.50`](https://github.com/dotnet/maui/releases/tag/10.0.50)

[Compare Source](dotnet/maui@10.0.41...10.0.50)

#### What's Changed

.NET MAUI 10.0.50 introduces significant improvements across all platforms with focus on quality, performance, and developer experience. This release includes 78 commits with various improvements, bug fixes, and enhancements.

#### AI

-   Enable packing and independent preview versioning for Essentials.AI by [@mattleibow](https://github.com/mattleibow) in dotnet/maui#33976

-   Move Essentials.AI preview iteration to eng/Versions.props by [@mattleibow](https://github.com/mattleibow) in dotnet/maui#34025

-   \[Feature] Add Microsoft.Maui.Essentials.AI - Apple Intelligence by [@mattleibow](https://github.com/mattleibow) in dotnet/maui#33519

#### Ai Agents

-   Copilot agent infrastructure, emulator reliability, and try-fix workflow improvements by [@PureWeen](https://github.com/PureWeen) via [@Copilot](https://github.com/Copilot) in dotnet/maui#33937

-   Update PR agent models to claude-sonnet-4.6 and gpt-5.3-codex by [@kubaflo](https://github.com/kubaflo) in dotnet/maui#34109

-   ci-copilot: set pipeline run title early using build.updatebuildnumber by [@jfversluis](https://github.com/jfversluis) via [@Copilot](https://github.com/Copilot) in dotnet/maui#34156

-   Revamp find-reviewable-pr skill: priorities, defaults, and doc fixes by [@PureWeen](https://github.com/PureWeen) in dotnet/maui#34160

-   Add correct CI pipeline names to Copilot instructions by [@jfversluis](https://github.com/jfversluis) in dotnet/maui#34255

-   Add resilience to UI tests for frozen/unresponsive apps by [@PureWeen](https://github.com/PureWeen) in dotnet/maui#34023

-   Copilot CI: Structured phase outputs, autonomous execution, iOS support, and CI pipeline by [@kubaflo](https://github.com/kubaflo) in dotnet/maui#34040

-   Agent Workflow Metrics via GitHub Labels by [@kubaflo](https://github.com/kubaflo) in dotnet/maui#33986

#### Animation

-   \[Android] Fixed TransformProperties issue when a wrapper view is present by [@Ahamed-Ali](https://github.com/Ahamed-Ali) in dotnet/maui#29228

    <...