
Add automated performance regression monitoring and issue filing #4925

Draft
Copilot wants to merge 4 commits into main from copilot/automate-bug-filing-for-bpf

Conversation

Contributor

Copilot AI commented Jan 21, 2026

Implementation plan for automated perf regression monitoring

  • Analyze existing workflows and understand architecture
  • Create documentation file docs/PerfRegressionAutoIssueFilingProposal.md
  • Create new scheduled workflow .github/workflows/monitor-perf-regressions.yml
    • Add schedule trigger (every 6 hours)
    • Add workflow_dispatch for manual runs
    • Configure Azure login and KeyVault secret fetching
    • Implement per-platform regression query execution
    • Implement stale data detection (>3 days)
    • Implement issue upsert logic with HTML markers
    • Add noise controls (configurable max_sigma, percent-delta gating)
  • Test workflow syntax and structure
  • Address initial code review feedback (timezone handling)
  • Address first round PR review feedback:
    • Add error handling flags to curl command
    • Add version comment to azure/login action
    • Use parameterized queries for stale data check
    • Remove unused search_query variable
    • Replace bare except with specific Exception
    • Replace deprecated datetime.utcnow() with datetime.now(timezone.utc)
  • Address second round PR review feedback:
    • Remove unused re import
    • Fix percent delta calculation to handle negative means correctly
    • Use textwrap.dedent() for clean multi-line string formatting
    • Fix comment formatting for cleaner issue/comment bodies
  • All deliverables complete and validated

Original prompt


This section details the original issue you should resolve

Automatically file bugs if BPF performance changes by more than 2σ
This issue tracks adding automatic monitoring + issue filing for the perf metrics shown on the Perf Dashboard.

Perf results are already uploaded to Postgres and a regression query is already executed in CI (see .github/workflows/upload-perf-results.yml), but:

  • There is no scheduled workflow that runs independently to catch regressions / detect “no recent data”.
  • There is no automation to create/update GitHub issues per regressed metric.

Proposal doc: docs/PerfRegressionAutoIssueFilingProposal.md


Requirements

  • Create a scheduled workflow that executes the regression query.
  • For each metric outside threshold, either file a new issue or update the currently active issue.
  • If the most recent run is more than 3 days old, file a new issue or update an existing issue.

Existing building blocks

  • .github/workflows/upload-perf-results.yml already:

    • Logs into Azure and fetches Postgres connection secrets from KeyVault.
    • Downloads and runs check_perf_results.sql from microsoft/bpf_performance.
    • Produces results/regression_results.csv.
  • check_perf_results.sql parameters:

    • platform (default Windows 2019)
    • repository (default microsoft/ebpf-for-windows)
    • look_back (default 30 days)
    • max_sigma (default 2)
  • check_perf_results.sql output columns:

    • timestamp, metric, value, mean_value, stddev_value

Related noise sensitivity issue: #4115.


Concrete plan (implementable)

1) Add scheduled workflow: .github/workflows/monitor-perf-regressions.yml

  • Trigger:
    • schedule (e.g. every 6 hours or daily)
    • workflow_dispatch
  • Must be independent from perf upload workflows.
  • Use the same Azure KeyVault secrets used by upload-perf-results.yml:
    • PGDATABASE, PGHOST, PGUSER, PGPASSWORD, PGPORT
  • Run on ubuntu-latest.

2) Query regressions per platform

  • Configure a platform list (same platforms used by perf runs), e.g.:
    • Windows 2019
    • Windows 2022

For each platform:

curl -fsSL https://raw.githubusercontent.com/microsoft/bpf_performance/refs/heads/main/scripts/check_perf_results.sql > check_perf_results.sql
psql -f ./check_perf_results.sql \
  -v platform='Windows 2019' \
  -v repository='microsoft/ebpf-for-windows' \
  -v look_back='30 days' \
  -v max_sigma='3' \
  --csv > regression_results.csv
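
A minimal sketch of that per-platform loop in Python, assuming psql is on PATH and reads the exported PG* variables, and that the platform list and repository arrive via PLATFORMS (a JSON list) and REPOSITORY environment variables; MAX_SIGMA as an env var name and the run_regression_query helper are illustrative, not part of the final workflow:

import json
import os
import subprocess

def run_regression_query(platform: str) -> str:
    # Write one CSV per platform so results do not overwrite each other.
    out_csv = f"regression_results_{platform.replace(' ', '_')}.csv"
    with open(out_csv, "w") as out:
        subprocess.run(
            [
                "psql", "-f", "./check_perf_results.sql",
                "-v", f"platform={platform}",
                "-v", f"repository={os.environ.get('REPOSITORY', 'microsoft/ebpf-for-windows')}",
                "-v", "look_back=30 days",
                "-v", f"max_sigma={os.environ.get('MAX_SIGMA', '3')}",
                "--csv",
            ],
            stdout=out,
            check=True,  # Fail the step if the query errors out.
        )
    return out_csv

# PLATFORMS is a JSON list, e.g. '["Windows 2019", "Lab Windows 2022"]'.
for platform in json.loads(os.environ.get("PLATFORMS", "[]")):
    run_regression_query(platform)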

3) Detect stale data (no recent results)

For each platform, run:

SELECT MAX("timestamp") AS last_run
FROM benchmarkresults
WHERE platform = :'platform'
  AND repository = :'repository';

If NOW() - last_run > INTERVAL '3 days', upsert a single issue per platform:

  • Title: Perf: no recent results for <platform>
  • Body: include the last seen timestamp and the dashboard link.
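
A minimal sketch of that stale-data check, assuming the script connects with psycopg2 using the PG* variables fetched from KeyVault (the real workflow may run the same query through psql instead); it uses a parameterized query and a timezone-aware "now", matching the review feedback listed above:

import os
from datetime import datetime, timezone, timedelta

import psycopg2

STALE_THRESHOLD = timedelta(days=int(os.environ.get("STALE_DATA_THRESHOLD_DAYS", "3")))

def last_run_for(platform: str, repository: str):
    # The psql-style PG* variables double as connection keyword arguments here.
    conn = psycopg2.connect(
        host=os.environ["PGHOST"],
        port=os.environ.get("PGPORT", "5432"),
        dbname=os.environ["PGDATABASE"],
        user=os.environ["PGUSER"],
        password=os.environ["PGPASSWORD"],
    )
    try:
        with conn.cursor() as cur:
            # Parameterized query: platform/repository are never interpolated into SQL text.
            cur.execute(
                'SELECT MAX("timestamp") FROM benchmarkresults '
                "WHERE platform = %s AND repository = %s",
                (platform, repository),
            )
            (last_run,) = cur.fetchone()
            return last_run
    finally:
        conn.close()

def is_stale(last_run) -> bool:
    if last_run is None:
        return True  # No rows at all for this platform counts as stale.
    if last_run.tzinfo is None:
        last_run = last_run.replace(tzinfo=timezone.utc)  # Assumes the column stores UTC.
    # Timezone-aware now(), not the deprecated datetime.utcnow().
    return datetime.now(timezone.utc) - last_run > STALE_THRESHOLD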

4) Upsert issues per (platform, metric)

To prevent duplicates, embed a stable HTML comment marker identifying the (platform, metric) pair in the issue body.

Upsert behavior:

  • If an open issue exists with the marker:
    • Add a new comment containing latest timestamp/value/mean/stddev (+ computed z-score and percent delta).
  • Else:
    • Create a new issue:
      • Title: Perf regression: <platform>: <metric>
      • Labels: tests (and optionally a new perf-regression label)
      • Body: dashboard link + stats table + marker.

Implementation suggestion:

  • Use actions/github-script or a small Python script invoked by the workflow to:
    • Parse regression_results.csv.
    • Search open issues for the marker.
    • Create issues/comments.
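
A sketch of that script using the gh CLI, assuming gh is available on the runner with a token in GH_TOKEN; the marker format, issue title, and stats layout shown here are illustrative, not the exact format the workflow will use:

import csv
import subprocess
import sys

def upsert_issue(platform: str, row: dict) -> None:
    # Hypothetical stable marker; one open issue is tracked per (platform, metric).
    marker = f"<!-- perf-regression: {platform} / {row['metric']} -->"
    body = (
        f"{marker}\n\n"
        f"Latest data point for {row['metric']} on {platform}:\n\n"
        f"timestamp: {row['timestamp']}\n"
        f"value: {row['value']}\n"
        f"mean: {row['mean_value']}\n"
        f"stddev: {row['stddev_value']}\n"
    )
    # Look for an open issue that already carries the marker in its body.
    existing = subprocess.run(
        ["gh", "issue", "list", "--state", "open", "--search", marker,
         "--json", "number", "--jq", ".[0].number"],
        capture_output=True, text=True, check=True,
    ).stdout.strip()
    if existing:
        # Active issue exists: append the latest stats as a comment.
        subprocess.run(["gh", "issue", "comment", existing, "--body", body], check=True)
    else:
        title = f"Perf regression: {platform}: {row['metric']}"
        subprocess.run(
            ["gh", "issue", "create", "--title", title, "--body", body, "--label", "tests"],
            check=True,
        )

if __name__ == "__main__":
    platform, csv_path = sys.argv[1], sys.argv[2]
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            upsert_issue(platform, row)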

5) Reduce false positives / noise

The default max_sigma=2 can be too sensitive (#4115).

MVP recommended policy (configurable via workflow env vars):

  • Use max_sigma=3 by default in the scheduled monitor workflow.
  • Add optional minimum effect size gating before filing/updating:
    • abs_percent_delta = 100 * |value-mean| / |mean| >= MIN_PERCENT_DELTA (e.g. 5%).

Optional follow-up: require the same regression to appear in 2 consecutive scheduled runs before filing.
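
A sketch of that gating logic, assuming MAX_SIGMA and MIN_PERCENT_DELTA arrive as workflow environment variables; dividing by abs(mean) keeps the percent delta meaningful when the mean is negative:

import os

MAX_SIGMA = float(os.environ.get("MAX_SIGMA", "3"))
MIN_PERCENT_DELTA = float(os.environ.get("MIN_PERCENT_DELTA", "5"))

def should_file(value: float, mean: float, stddev: float) -> bool:
    if stddev == 0:
        return False  # No variance recorded; a z-score is undefined.
    # z-score: distance of the latest value from the rolling mean, in standard deviations.
    z_score = abs(value - mean) / stddev
    if mean == 0:
        return z_score >= MAX_SIGMA  # Percent delta is undefined; fall back to sigma only.
    # abs(mean) so a negative mean cannot flip the sign of the gate.
    percent_delta = 100.0 * abs(value - mean) / abs(mean)
    return z_score >= MAX_SIGMA and percent_delta >= MIN_PERCENT_DELTA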


Deliverables checklist

  • New scheduled workflow .github/workflows/monitor-perf-regressions.yml
  • Per-platform stale-run detection and issue upsert (last run > 3 days)
  • Regression query execution per platform and CSV parsing
  • Per-(platform, metric) issue upsert using the embedded marker
  • Noise controls (configurable max_sigma, optional percent-delta gating)
  • Documentation: docs/PerfRegressionAutoIssueFilingProposal.md

Comments on the Issue (you are @copilot in this section)

@Alan-Jowett
I wrote a concrete implementation proposal in docs/PerfRegressionAutoIssueFilingProposal.md and updated the is...



Fixes #3981

Copilot AI changed the title [WIP] Add automated bug filing for BPF performance changes Add automated performance regression monitoring and issue filing Jan 21, 2026
Copilot AI requested a review from Alan-Jowett January 21, 2026 19:31
@Alan-Jowett Alan-Jowett requested a review from Copilot January 21, 2026 19:53
Contributor

Copilot AI left a comment


Pull request overview

This PR adds automated performance regression monitoring that runs on a schedule (every 6 hours) to detect performance regressions and stale data, automatically filing and updating GitHub issues when problems are detected. This addresses the issue that performance results are uploaded to PostgreSQL but regressions go unnoticed between CI runs.

Changes:

  • Adds scheduled workflow .github/workflows/monitor-perf-regressions.yml that queries the database per platform, detects anomalies using statistical analysis (3σ threshold by default), and files/updates GitHub issues
  • Implements stale data detection (>3 days) with dedicated issue filing/updating
  • Includes configurable noise controls (max_sigma=3, optional min_percent_delta=5%) to reduce false positives per issue #4115
  • Adds comprehensive documentation in docs/PerfRegressionAutoIssueFilingProposal.md

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 6 comments.

  • .github/workflows/monitor-perf-regressions.yml: New scheduled workflow that queries PostgreSQL for regressions across platforms, detects stale data, and manages GitHub issues with embedded markers to prevent duplicates
  • docs/PerfRegressionAutoIssueFilingProposal.md: Documentation describing the architecture, implementation details, noise controls, security considerations, and future enhancements for the automated regression monitoring system


@Alan-Jowett
Member

@copilot address and reply to each comment with what you fixed.

Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 6 comments.



@Alan-Jowett
Member

@copilot address and reply to each comment with what you fixed

Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated no new comments.



@Alan-Jowett Alan-Jowett force-pushed the copilot/automate-bug-filing-for-bpf branch from 3e9e2c7 to 399d2fd on January 27, 2026 17:15
@Alan-Jowett Alan-Jowett marked this pull request as ready for review January 27, 2026 17:15
Copilot AI and others added 4 commits February 2, 2026 10:47
Co-authored-by: Alan-Jowett <20480683+Alan-Jowett@users.noreply.github.com>
Co-authored-by: Alan-Jowett <20480683+Alan-Jowett@users.noreply.github.com>
Co-authored-by: Alan-Jowett <20480683+Alan-Jowett@users.noreply.github.com>
…ting, improve percent delta calculation

Co-authored-by: Alan-Jowett <20480683+Alan-Jowett@users.noreply.github.com>
@Alan-Jowett Alan-Jowett force-pushed the copilot/automate-bug-filing-for-bpf branch from 399d2fd to d03d6d1 on February 2, 2026 18:47
STALE_DATA_THRESHOLD_DAYS: '3'
REPOSITORY: 'microsoft/ebpf-for-windows'
# List of platforms to monitor (must match names in database)
PLATFORMS: '["Windows 2019", "Lab Windows 2022"]'
Contributor

@matthewige matthewige Feb 2, 2026


@copilot - Where are these platforms coming from? I believe we do not run tests on 2019 anymore, do we?

Contributor


@copilot - Can you review this?

Contributor Author


You're right to question this. The platforms in line 40 were based on what I found in the existing upload-perf-results.yml workflow:

  1. "Windows 2019" - This is the default platform (line 20 in upload-perf-results.yml) used by the upload_perf_results job. However, the actual performance job runs on ebpf_cicd_perf_ws2022 environment.

  2. "Lab Windows 2022" - This is explicitly set by the upload_netperf_results_lab_2022 job.

There appears to be a mismatch: the km_performance tests run on WS2022 but upload with the default platform name "Windows 2019".

Could you clarify:

  • Should the upload_perf_results job in cicd.yml specify platform: "Windows 2022" (or another name)?
  • Or should the monitoring workflow only track "Lab Windows 2022" since that's the only one with an explicit platform name?
  • What platform name(s) actually exist in the PostgreSQL database for this repository?

Contributor


@Alan-Jowett - Do you know what OS versions the perf tests actually run on?

@Alan-Jowett
Member

Convert to draft until this is in the current milestone

@Alan-Jowett Alan-Jowett marked this pull request as draft February 3, 2026 18:19
auto-merge was automatically disabled February 3, 2026 18:19

Pull request was converted to draft

