Add automated performance regression monitoring and issue filing #4925
Conversation
Pull request overview
This PR adds automated performance regression monitoring that runs on a schedule (every 6 hours) to detect performance regressions and stale data, automatically filing and updating GitHub issues when problems are detected. This addresses the issue that performance results are uploaded to PostgreSQL but regressions go unnoticed between CI runs.
Changes:
- Adds scheduled workflow `.github/workflows/monitor-perf-regressions.yml` that queries the database per platform, detects anomalies using statistical analysis (3σ threshold by default), and files/updates GitHub issues
- Implements stale data detection (>3 days) with dedicated issue filing/updating
- Includes configurable noise controls (`max_sigma=3`, optional `min_percent_delta=5%`) to reduce false positives per issue #4115
- Adds comprehensive documentation in `docs/PerfRegressionAutoIssueFilingProposal.md`
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| `.github/workflows/monitor-perf-regressions.yml` | New scheduled workflow that queries PostgreSQL for regressions across platforms, detects stale data, and manages GitHub issues with embedded markers to prevent duplicates |
| `docs/PerfRegressionAutoIssueFilingProposal.md` | Documentation describing the architecture, implementation details, noise controls, security considerations, and future enhancements for the automated regression monitoring system |
@copilot address and reply to each comment with what you fixed.
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 6 comments.
@copilot address and reply to each comment with what you fixed.
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated no new comments.
Force-pushed from 3e9e2c7 to 399d2fd
…ting, improve percent delta calculation
Co-authored-by: Alan-Jowett <20480683+Alan-Jowett@users.noreply.github.com>
Force-pushed from 399d2fd to d03d6d1
```yaml
STALE_DATA_THRESHOLD_DAYS: '3'
REPOSITORY: 'microsoft/ebpf-for-windows'
# List of platforms to monitor (must match names in database)
PLATFORMS: '["Windows 2019", "Lab Windows 2022"]'
```
@copilot - Where are these platforms coming from? I believe we do not run tests on 2019 anymore, do we?
You're right to question this. The platforms in line 40 were based on what I found in the existing upload-perf-results.yml workflow:
- `"Windows 2019"` - This is the default platform (line 20 in upload-perf-results.yml) used by the `upload_perf_results` job. However, the actual `performance` job runs in the `ebpf_cicd_perf_ws2022` environment.
- `"Lab Windows 2022"` - This is explicitly set by the `upload_netperf_results_lab_2022` job.
There appears to be a mismatch: the km_performance tests run on WS2022 but upload with the default platform name "Windows 2019".
Could you clarify:
- Should the `upload_perf_results` job in cicd.yml specify `platform: "Windows 2022"` (or another name)?
- Or should the monitoring workflow only track "Lab Windows 2022" since that's the only one with an explicit platform name?
- What platform name(s) actually exist in the PostgreSQL database for this repository?
@Alan-Jowett - Do you know what OS versions the perf tests actually run on?
Convert to draft until this is in the current milestone.
Pull request was converted to draft
Implementation plan for automated perf regression monitoring
- `docs/PerfRegressionAutoIssueFilingProposal.md`
- `.github/workflows/monitor-perf-regressions.yml`

Original prompt
This section details the original issue you should resolve
Automatically file bugs if BPF performance changes by more than 2σ
This issue tracks adding automatic monitoring + issue filing for the perf metrics shown on the Perf Dashboard.
Perf results are already uploaded to Postgres and a regression query is already executed in CI (see `.github/workflows/upload-perf-results.yml`), but regressions go unnoticed between CI runs.

Proposal doc: `docs/PerfRegressionAutoIssueFilingProposal.md`

Requirements
Existing building blocks
- `.github/workflows/upload-perf-results.yml` already:
  - runs `check_perf_results.sql` from `microsoft/bpf_performance`
  - produces `results/regression_results.csv`
- `check_perf_results.sql` parameters: `platform` (default `Windows 2019`), `repository` (default `microsoft/ebpf-for-windows`), `look_back` (default `30 days`), `max_sigma` (default `2`)
- `check_perf_results.sql` output columns: `timestamp`, `metric`, `value`, `mean_value`, `stddev_value`

Related noise sensitivity issue: #4115.
Concrete plan (implementable)
1) Add scheduled workflow: `.github/workflows/monitor-perf-regressions.yml`
- Triggers: `schedule` (e.g. every 6 hours or daily) and `workflow_dispatch`
- Reuses the DB secrets from `upload-perf-results.yml`: `PGDATABASE`, `PGHOST`, `PGUSER`, `PGPASSWORD`, `PGPORT`
- Runs on `ubuntu-latest`.

2) Query regressions per platform
- `Windows 2019`
- `Windows 2022`

For each platform: run the regression query and collect any anomalous rows (a sketch follows below).
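A minimal sketch of the per-platform query loop, assuming Python with `psycopg2` and the same `PG*` environment variables the upload workflow uses; the `perf_results` table name and inline SQL are hypothetical stand-ins for executing `check_perf_results.sql` with its `platform`/`max_sigma` parameters:

```python
# Sketch only: per-platform regression query against Postgres.
import json
import os

import psycopg2

platforms = json.loads(os.environ.get("PLATFORMS", '["Lab Windows 2022"]'))
max_sigma = float(os.environ.get("MAX_SIGMA", "3"))

conn = psycopg2.connect(
    host=os.environ["PGHOST"],
    dbname=os.environ["PGDATABASE"],
    user=os.environ["PGUSER"],
    password=os.environ["PGPASSWORD"],
    port=os.environ.get("PGPORT", "5432"),
)

for platform in platforms:
    with conn.cursor() as cur:
        # Same output shape the docs attribute to check_perf_results.sql:
        # timestamp, metric, value, mean_value, stddev_value.
        cur.execute(
            """
            SELECT timestamp, metric, value, mean_value, stddev_value
            FROM perf_results            -- hypothetical table name
            WHERE platform = %s
              AND abs(value - mean_value) > %s * stddev_value
            """,
            (platform, max_sigma),
        )
        for row in cur.fetchall():
            print(platform, row)  # downstream: feed into the issue-upsert step
```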
3) Detect stale data (no recent results)
For each platform, run a query for the most recent result timestamp (a minimal sketch follows below).

If `NOW() - last_run > INTERVAL '3 days'`, upsert a single issue per platform titled `Perf: no recent results for <platform>`.
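A minimal sketch of the stale-data check, again assuming Python with `psycopg2`; `perf_results` remains a hypothetical table name, and stored timestamps are assumed to be timezone-aware UTC:

```python
# Sketch only: stale-data detection per platform.
import os
from datetime import datetime, timedelta, timezone

import psycopg2

stale_after = timedelta(days=int(os.environ.get("STALE_DATA_THRESHOLD_DAYS", "3")))

conn = psycopg2.connect(
    host=os.environ["PGHOST"],
    dbname=os.environ["PGDATABASE"],
    user=os.environ["PGUSER"],
    password=os.environ["PGPASSWORD"],
    port=os.environ.get("PGPORT", "5432"),
)

for platform in ["Lab Windows 2022"]:  # illustrative platform list
    with conn.cursor() as cur:
        cur.execute(
            "SELECT MAX(timestamp) FROM perf_results WHERE platform = %s",
            (platform,),
        )
        last_run = cur.fetchone()[0]
    # No rows at all, or newest row older than the threshold -> stale.
    if last_run is None or datetime.now(timezone.utc) - last_run > stale_after:
        print(f"Would upsert issue: Perf: no recent results for {platform}")
```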
4) Upsert issues per (platform, metric)
To prevent duplicates, embed a stable marker in the issue body:
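The marker text itself was lost in extraction; a minimal sketch, assuming an HTML-comment marker keyed by platform and metric (the format and field names are assumptions):

```python
# Sketch only: a stable, machine-readable marker embedded in the issue body.
def issue_marker(platform: str, metric: str) -> str:
    # HTML comments render invisibly on GitHub but survive in the raw body,
    # so later runs can search for them to find the matching issue.
    return f"<!-- perf-monitor: platform={platform}; metric={metric} -->"

# e.g. '<!-- perf-monitor: platform=Lab Windows 2022; metric=map_lookup -->'
print(issue_marker("Lab Windows 2022", "map_lookup"))  # metric name is illustrative
```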
Upsert behavior:
- Title: `Perf regression: <platform>: <metric>`
- Labels: `tests` (and optionally a new `perf-regression` label)

Implementation suggestion:
Use `actions/github-script` or a small Python script invoked by the workflow to parse `regression_results.csv`, search for the embedded marker, and create or update issues accordingly (a sketch follows).
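A minimal sketch of the upsert path, assuming a Python script using `requests` against the GitHub REST API with a workflow-supplied `GITHUB_TOKEN`; the one-platform-per-run assumption and the comment-on-update behavior are illustrative choices, not confirmed details:

```python
# Sketch only: marker-based issue upsert driven by regression_results.csv.
import csv
import os

import requests

REPO = os.environ.get("REPOSITORY", "microsoft/ebpf-for-windows")
ISSUES_API = f"https://api.github.com/repos/{REPO}/issues"
HEADERS = {"Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}"}

def find_open_issue(marker: str):
    # Search open issues whose body contains the embedded marker.
    query = f'repo:{REPO} is:issue is:open in:body "{marker}"'
    resp = requests.get("https://api.github.com/search/issues",
                        headers=HEADERS, params={"q": query})
    items = resp.json().get("items", [])
    return items[0] if items else None

platform = os.environ.get("PLATFORM", "Lab Windows 2022")  # assumption: one platform per run

with open("results/regression_results.csv") as f:
    for row in csv.DictReader(f):
        marker = f"<!-- perf-monitor: platform={platform}; metric={row['metric']} -->"
        title = f"Perf regression: {platform}: {row['metric']}"
        body = (f"{marker}\n\nvalue={row['value']}, mean={row['mean_value']}, "
                f"stddev={row['stddev_value']} at {row['timestamp']}")
        existing = find_open_issue(marker)
        if existing:
            # Update: append the latest data point as a comment.
            requests.post(f"{ISSUES_API}/{existing['number']}/comments",
                          headers=HEADERS, json={"body": body})
        else:
            # Create: file a new issue with the marker embedded in the body.
            requests.post(ISSUES_API, headers=HEADERS,
                          json={"title": title, "body": body, "labels": ["tests"]})
```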
5) Reduce false positives / noise

The default `max_sigma=2` can be too sensitive (#4115).

MVP recommended policy (configurable via workflow env vars):
- `max_sigma=3` by default in the scheduled monitor workflow.
- Require `abs_percent_delta = 100 * |value - mean| / |mean| >= MIN_PERCENT_DELTA` (e.g. 5%), as in the sketch after this list.

Optional follow-up: require the same regression to appear in 2 consecutive scheduled runs before filing.
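A minimal sketch of the combined two-gate filter, assuming both thresholds arrive as workflow env vars; a row must exceed the sigma threshold and the percent-delta floor to count as actionable:

```python
# Sketch only: sigma gate plus percent-delta gate from the recommended policy.
import os

MAX_SIGMA = float(os.environ.get("MAX_SIGMA", "3"))
MIN_PERCENT_DELTA = float(os.environ.get("MIN_PERCENT_DELTA", "5"))

def is_regression(value: float, mean: float, stddev: float) -> bool:
    if stddev == 0 or mean == 0:
        return False  # avoid division by zero; degenerate stats are not actionable
    sigma = abs(value - mean) / stddev
    abs_percent_delta = 100.0 * abs(value - mean) / abs(mean)
    return sigma > MAX_SIGMA and abs_percent_delta >= MIN_PERCENT_DELTA

# Example: ~3.4σ away from the mean but only a 2% delta -> filtered out as noise.
print(is_regression(value=102.0, mean=100.0, stddev=0.58))  # False (2% < 5%)
```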
Deliverables checklist
- `.github/workflows/monitor-perf-regressions.yml`
- Noise controls (configurable `max_sigma`, optional percent-delta gating)
- `docs/PerfRegressionAutoIssueFilingProposal.md`

Comments on the Issue (you are @copilot in this section)
@Alan-Jowett
I wrote a concrete implementation proposal in `docs/PerfRegressionAutoIssueFilingProposal.md` and updated the is...
Fixes #3981