feat(evals): add overall pass rate row to eval nightly summary table #20905

Merged: gundermanc merged 1 commit into main from gundermanc/trend on Mar 4, 2026

@gundermanc (Member) commented Mar 3, 2026

Summary

Adds a row below the header in the markdown table showing the overall pass rate for each historical and current run in the nightly evals script.

This makes it easier to spot regressions across runs at a glance.

Details

Improves the readability of the aggregate_evals.js script output by quickly surfacing the top-level pass/fail percentage of the current and past test runs.

Related Issues

None.

How to Validate

  1. Pass an artifacts directory to the script, or run it locally if you have gh CLI access to recent runs.
  2. Confirm that the summary table includes a new row: | **Overall Pass Rate** | ... |.
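
The check in step 2 could be automated with a small helper like the one below. This is only a sketch: `findOverallPassRateRow` is a hypothetical name that does not appear in the PR, and it assumes the script's markdown output is available as a string and that the new row starts with the `| **Overall Pass Rate** |` cell quoted above.

```javascript
// Hypothetical helper (not part of aggregate_evals.js): locates the
// "Overall Pass Rate" row in a markdown summary and returns its cells.
function findOverallPassRateRow(markdown) {
  const line = markdown
    .split('\n')
    .find((l) => l.trim().startsWith('| **Overall Pass Rate** |'));
  if (!line) return null; // Row is missing from the summary.

  // Drop the leading and trailing pipes, then split into trimmed cells.
  return line
    .trim()
    .replace(/^\|/, '')
    .replace(/\|$/, '')
    .split('|')
    .map((cell) => cell.trim());
}
```

A `null` result means the row is missing. Otherwise the first cell is the row label and the remaining cells hold one pass rate per run.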

Pre-Merge Checklist

  • Updated relevant documentation and README (if needed)
  • Added/updated tests (if needed)
  • Noted breaking changes (if any)
  • Validated on required platforms/methods:
    • MacOS
      • npm run
      • npx
      • Docker
      • Podman
      • Seatbelt
    • Windows
      • npm run
      • npx
      • Docker
    • Linux
      • npm run
      • npx
      • Docker

@github-actions bot commented Mar 3, 2026

Size Change: -2 B (0%)

Total Size: 25.8 MB

ℹ️ View Unchanged
| Filename | Size | Change |
| --- | --- | --- |
| ./bundle/gemini.js | 25.3 MB | -2 B (0%) |
| ./bundle/node_modules/@google/gemini-cli-devtools/dist/client/main.js | 221 kB | 0 B |
| ./bundle/node_modules/@google/gemini-cli-devtools/dist/src/_client-assets.js | 227 kB | 0 B |
| ./bundle/node_modules/@google/gemini-cli-devtools/dist/src/index.js | 11.5 kB | 0 B |
| ./bundle/node_modules/@google/gemini-cli-devtools/dist/src/types.js | 132 B | 0 B |
| ./bundle/sandbox-macos-permissive-open.sb | 890 B | 0 B |
| ./bundle/sandbox-macos-permissive-proxied.sb | 1.31 kB | 0 B |
| ./bundle/sandbox-macos-restrictive-open.sb | 3.36 kB | 0 B |
| ./bundle/sandbox-macos-restrictive-proxied.sb | 3.56 kB | 0 B |
| ./bundle/sandbox-macos-strict-open.sb | 4.82 kB | 0 B |
| ./bundle/sandbox-macos-strict-proxied.sb | 5.02 kB | 0 B |

compressed-size-action

@gundermanc gundermanc marked this pull request as ready for review March 3, 2026 00:56
@gundermanc gundermanc requested a review from a team as a code owner March 3, 2026 00:56
@gemini-code-assist bot (Contributor) commented

Summary of Changes

Hello, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request enhances the readability of the nightly evaluation summary by introducing an 'Overall Pass Rate' row in the generated markdown table. This addition provides an immediate visual cue for the pass/fail percentage across different test runs, making it significantly easier to identify performance regressions at a glance without delving into individual test results.

Highlights

  • Overall Pass Rate Row: Added a new row to the nightly evaluation summary markdown table, displaying the overall pass rate for each historical and current run.
  • Pass Rate Calculation Refactoring: Introduced a new helper function, getPassRate, to centralize and reuse the logic for calculating pass rates, improving code readability and maintainability.

Changelog

  • scripts/aggregate_evals.js
    • Refactored pass rate calculation into a dedicated getPassRate function.
    • Implemented logic to generate and display a new 'Overall Pass Rate' row in the markdown table.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request adds an 'Overall Pass Rate' row to the markdown table in the nightly evals summary. This is a useful enhancement for quickly assessing performance trends. The implementation refactors the pass rate calculation into a reusable getPassRate function, which is then used to populate the new row for both historical and current runs. The code is clear and effectively implements the desired feature. I found no issues with the changes.
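
The refactor described in this review can be illustrated roughly as follows. This is a minimal sketch under stated assumptions, not the code merged in the PR: only the name `getPassRate` and the `**Overall Pass Rate**` label come from the conversation. The `{ pass, total }` data shape, the `buildOverallRow` helper, the one-decimal formatting, and the `-` placeholder for empty runs are all hypothetical.

```javascript
// Assumed data shape: each run maps test names to { pass, total } counts.
// The real aggregate_evals.js may structure its data differently.

/** Returns the pass rate as a percentage string, or '-' when nothing ran. */
function getPassRate(stats) {
  let pass = 0;
  let total = 0;
  for (const { pass: p, total: t } of Object.values(stats)) {
    pass += p;
    total += t;
  }
  return total === 0 ? '-' : `${((pass / total) * 100).toFixed(1)}%`;
}

/** Builds the summary row from historical runs followed by the current run. */
function buildOverallRow(historicalRuns, currentRun) {
  // The same helper is reused for every column, historical or current.
  const cells = [...historicalRuns, currentRun].map(getPassRate);
  return `| **Overall Pass Rate** | ${cells.join(' | ')} |`;
}
```

Routing every column through one helper keeps the formatting and the empty-run handling identical for historical and current runs.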

@gemini-cli bot added the status/need-issue label Mar 3, 2026
@abhipatel12 (Collaborator) left a comment

LGTM!

@gundermanc gundermanc added this pull request to the merge queue Mar 4, 2026
Merged via the queue into main with commit 5488521 Mar 4, 2026
50 checks passed
@gundermanc gundermanc deleted the gundermanc/trend branch March 4, 2026 19:11
Labels

status/need-issue Pull requests that need to have an associated issue.
