Add automated performance regression monitoring and issue filing #4925
Conversation
Pull request overview
This PR adds automated performance regression monitoring that runs on a schedule (every 6 hours) to detect performance regressions and stale data, automatically filing and updating GitHub issues when problems are detected. This addresses the issue that performance results are uploaded to PostgreSQL but regressions go unnoticed between CI runs.
Changes:
- Adds scheduled workflow `.github/workflows/monitor-perf-regressions.yml` that queries the database per platform, detects anomalies using statistical analysis (3σ threshold by default), and files/updates GitHub issues
- Implements stale data detection (>3 days) with dedicated issue filing/updating
- Includes configurable noise controls (`max_sigma=3`, optional `min_percent_delta=5%`) to reduce false positives per issue #4115
- Adds comprehensive documentation in `docs/PerfRegressionAutoIssueFilingProposal.md`
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| `.github/workflows/monitor-perf-regressions.yml` | New scheduled workflow that queries PostgreSQL for regressions across platforms, detects stale data, and manages GitHub issues with embedded markers to prevent duplicates |
| `docs/PerfRegressionAutoIssueFilingProposal.md` | Documentation describing the architecture, implementation details, noise controls, security considerations, and future enhancements for the automated regression monitoring system |
@copilot address and reply to each comment with what you fixed.
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 6 comments.
@copilot address and reply to each comment with what you fixed.
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated no new comments.
Force-pushed from 3e9e2c7 to 399d2fd
…ting, improve percent delta calculation
Co-authored-by: Alan-Jowett <20480683+Alan-Jowett@users.noreply.github.com>
Force-pushed from 399d2fd to d03d6d1
```yaml
STALE_DATA_THRESHOLD_DAYS: '3'
REPOSITORY: 'microsoft/ebpf-for-windows'
# List of platforms to monitor (must match names in database)
PLATFORMS: '["Windows 2019", "Lab Windows 2022"]'
```
@copilot - Where are these platforms coming from? I believe we do not run tests on 2019 anymore, do we?
You're right to question this. The platforms in line 40 were based on what I found in the existing upload-perf-results.yml workflow:
- `"Windows 2019"` - This is the default platform (line 20 in upload-perf-results.yml) used by the `upload_perf_results` job. However, the actual `performance` job runs in the `ebpf_cicd_perf_ws2022` environment.
- `"Lab Windows 2022"` - This is explicitly set by the `upload_netperf_results_lab_2022` job.
There appears to be a mismatch: the km_performance tests run on WS2022 but upload with the default platform name "Windows 2019".
Could you clarify:
- Should the `upload_perf_results` job in cicd.yml specify `platform: "Windows 2022"` (or another name)?
- Or should the monitoring workflow only track "Lab Windows 2022" since that's the only one with an explicit platform name?
- What platform name(s) actually exist in the PostgreSQL database for this repository?
@Alan-Jowett - Do you know what OS versions the perf tests actually run on?
Convert to draft until this is in the current milestone.
Pull request was converted to draft
Implementation plan for automated perf regression monitoring
- `docs/PerfRegressionAutoIssueFilingProposal.md`
- `.github/workflows/monitor-perf-regressions.yml`

Original prompt
This section details the original issue you should resolve
Automatically file bugs if BPF performance changes by more than 2σ
This issue tracks adding automatic monitoring + issue filing for the perf metrics shown on the Perf Dashboard.
Perf results are already uploaded to Postgres and a regression query is already executed in CI (see `.github/workflows/upload-perf-results.yml`), but regressions go unnoticed between CI runs.

Proposal doc: `docs/PerfRegressionAutoIssueFilingProposal.md`

Requirements
Existing building blocks
- `.github/workflows/upload-perf-results.yml` already:
  - runs `check_perf_results.sql` from `microsoft/bpf_performance`
  - produces `results/regression_results.csv`
- `check_perf_results.sql` parameters: `platform` (default `Windows 2019`), `repository` (default `microsoft/ebpf-for-windows`), `look_back` (default `30 days`), `max_sigma` (default `2`)
- `check_perf_results.sql` output columns: `timestamp`, `metric`, `value`, `mean_value`, `stddev_value`

Related noise sensitivity issue: #4115.
Concrete plan (implementable)
1) Add scheduled workflow: `.github/workflows/monitor-perf-regressions.yml`
- Triggers: `schedule` (e.g. every 6 hours or daily) and `workflow_dispatch`
- Reuses the DB secrets from `upload-perf-results.yml`: `PGDATABASE`, `PGHOST`, `PGUSER`, `PGPASSWORD`, `PGPORT`
- Runs on `ubuntu-latest`.

2) Query regressions per platform
- `Windows 2019`
- `Windows 2022`

For each platform: run the regression query and collect any anomalous rows (a sketch follows below).
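A minimal sketch of the per-platform query loop, assuming Python with `psycopg2` and the same `PG*` environment variables the upload workflow uses; the `perf_results` table name and inline SQL are hypothetical stand-ins for executing `check_perf_results.sql` with its `platform`/`max_sigma` parameters:

```python
# Sketch only: per-platform regression query against Postgres.
import json
import os

import psycopg2

platforms = json.loads(os.environ.get("PLATFORMS", '["Lab Windows 2022"]'))
max_sigma = float(os.environ.get("MAX_SIGMA", "3"))

conn = psycopg2.connect(
    host=os.environ["PGHOST"],
    dbname=os.environ["PGDATABASE"],
    user=os.environ["PGUSER"],
    password=os.environ["PGPASSWORD"],
    port=os.environ.get("PGPORT", "5432"),
)

for platform in platforms:
    with conn.cursor() as cur:
        # Same output shape the docs attribute to check_perf_results.sql:
        # timestamp, metric, value, mean_value, stddev_value.
        cur.execute(
            """
            SELECT timestamp, metric, value, mean_value, stddev_value
            FROM perf_results            -- hypothetical table name
            WHERE platform = %s
              AND abs(value - mean_value) > %s * stddev_value
            """,
            (platform, max_sigma),
        )
        for row in cur.fetchall():
            print(platform, row)  # downstream: feed into the issue-upsert step
```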
3) Detect stale data (no recent results)
For each platform, run a query for the most recent result timestamp (a minimal sketch follows below).

If `NOW() - last_run > INTERVAL '3 days'`, upsert a single issue per platform titled `Perf: no recent results for <platform>`.
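A minimal sketch of the stale-data check, again assuming Python with `psycopg2`; `perf_results` remains a hypothetical table name, and stored timestamps are assumed to be timezone-aware UTC:

```python
# Sketch only: stale-data detection per platform.
import os
from datetime import datetime, timedelta, timezone

import psycopg2

stale_after = timedelta(days=int(os.environ.get("STALE_DATA_THRESHOLD_DAYS", "3")))

conn = psycopg2.connect(
    host=os.environ["PGHOST"],
    dbname=os.environ["PGDATABASE"],
    user=os.environ["PGUSER"],
    password=os.environ["PGPASSWORD"],
    port=os.environ.get("PGPORT", "5432"),
)

for platform in ["Lab Windows 2022"]:  # illustrative platform list
    with conn.cursor() as cur:
        cur.execute(
            "SELECT MAX(timestamp) FROM perf_results WHERE platform = %s",
            (platform,),
        )
        last_run = cur.fetchone()[0]
    # No rows at all, or newest row older than the threshold -> stale.
    if last_run is None or datetime.now(timezone.utc) - last_run > stale_after:
        print(f"Would upsert issue: Perf: no recent results for {platform}")
```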
4) Upsert issues per (platform, metric)
To prevent duplicates, embed a stable marker in the issue body:
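The marker text itself was lost in extraction; a minimal sketch, assuming an HTML-comment marker keyed by platform and metric (the format and field names are assumptions):

```python
# Sketch only: a stable, machine-readable marker embedded in the issue body.
def issue_marker(platform: str, metric: str) -> str:
    # HTML comments render invisibly on GitHub but survive in the raw body,
    # so later runs can search for them to find the matching issue.
    return f"<!-- perf-monitor: platform={platform}; metric={metric} -->"

# e.g. '<!-- perf-monitor: platform=Lab Windows 2022; metric=map_lookup -->'
print(issue_marker("Lab Windows 2022", "map_lookup"))  # metric name is illustrative
```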
Upsert behavior:
- Title: `Perf regression: <platform>: <metric>`
- Labels: `tests` (and optionally a new `perf-regression` label)

Implementation suggestion:
Use `actions/github-script` or a small Python script invoked by the workflow to parse `regression_results.csv`, search for the embedded marker, and create or update issues accordingly (a sketch follows).
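A minimal sketch of the upsert path, assuming a Python script using `requests` against the GitHub REST API with a workflow-supplied `GITHUB_TOKEN`; the one-platform-per-run assumption and the comment-on-update behavior are illustrative choices, not confirmed details:

```python
# Sketch only: marker-based issue upsert driven by regression_results.csv.
import csv
import os

import requests

REPO = os.environ.get("REPOSITORY", "microsoft/ebpf-for-windows")
ISSUES_API = f"https://api.github.com/repos/{REPO}/issues"
HEADERS = {"Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}"}

def find_open_issue(marker: str):
    # Search open issues whose body contains the embedded marker.
    query = f'repo:{REPO} is:issue is:open in:body "{marker}"'
    resp = requests.get("https://api.github.com/search/issues",
                        headers=HEADERS, params={"q": query})
    items = resp.json().get("items", [])
    return items[0] if items else None

platform = os.environ.get("PLATFORM", "Lab Windows 2022")  # assumption: one platform per run

with open("results/regression_results.csv") as f:
    for row in csv.DictReader(f):
        marker = f"<!-- perf-monitor: platform={platform}; metric={row['metric']} -->"
        title = f"Perf regression: {platform}: {row['metric']}"
        body = (f"{marker}\n\nvalue={row['value']}, mean={row['mean_value']}, "
                f"stddev={row['stddev_value']} at {row['timestamp']}")
        existing = find_open_issue(marker)
        if existing:
            # Update: append the latest data point as a comment.
            requests.post(f"{ISSUES_API}/{existing['number']}/comments",
                          headers=HEADERS, json={"body": body})
        else:
            # Create: file a new issue with the marker embedded in the body.
            requests.post(ISSUES_API, headers=HEADERS,
                          json={"title": title, "body": body, "labels": ["tests"]})
```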
5) Reduce false positives / noise

The default `max_sigma=2` can be too sensitive (#4115).

MVP recommended policy (configurable via workflow env vars):
- `max_sigma=3` by default in the scheduled monitor workflow.
- Require `abs_percent_delta = 100 * |value - mean| / |mean| >= MIN_PERCENT_DELTA` (e.g. 5%), as in the sketch after this list.

Optional follow-up: require the same regression to appear in 2 consecutive scheduled runs before filing.
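A minimal sketch of the combined two-gate filter, assuming both thresholds arrive as workflow env vars; a row must exceed the sigma threshold and the percent-delta floor to count as actionable:

```python
# Sketch only: sigma gate plus percent-delta gate from the recommended policy.
import os

MAX_SIGMA = float(os.environ.get("MAX_SIGMA", "3"))
MIN_PERCENT_DELTA = float(os.environ.get("MIN_PERCENT_DELTA", "5"))

def is_regression(value: float, mean: float, stddev: float) -> bool:
    if stddev == 0 or mean == 0:
        return False  # avoid division by zero; degenerate stats are not actionable
    sigma = abs(value - mean) / stddev
    abs_percent_delta = 100.0 * abs(value - mean) / abs(mean)
    return sigma > MAX_SIGMA and abs_percent_delta >= MIN_PERCENT_DELTA

# Example: ~3.4σ away from the mean but only a 2% delta -> filtered out as noise.
print(is_regression(value=102.0, mean=100.0, stddev=0.58))  # False (2% < 5%)
```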
Deliverables checklist
- `.github/workflows/monitor-perf-regressions.yml`
- Noise controls (configurable `max_sigma`, optional percent-delta gating)
- `docs/PerfRegressionAutoIssueFilingProposal.md`

Comments on the Issue (you are @copilot in this section)
@Alan-Jowett
I wrote a concrete implementation proposal in `docs/PerfRegressionAutoIssueFilingProposal.md` and updated the is...
Fixes #3981