Add slash command for promoting behavioral evals to CI blocking by gundermanc · Pull Request #20575 · google-gemini/gemini-cli

gundermanc · 2026-02-27T16:55:10Z

Summary

Updates docs to indicate that tests must start as non-CI blocking and are to be promoted to CI blocking only via the new slash command. The slash command will utilize the historical record of nightly eval runs to identify and select tests that are highly reliable for promotion to CI blocking.

- Updated docs clarify that every test should start out as USUALLY_PASSES (not CI blocking).
- Added a slash command that automatically reviews the nightly run history and promotes only the tests that have passed 100% of the time (3/3 runs) for all supported models, 7 days in a row (21 consecutive passes per model).
- The process to promote a test is now: check in -> let it stabilize over 7 days -> if needed, use /fix-behavioral-eval to stabilize it -> use the slash command to promote.
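
The promotion rule described above can be sketched roughly as follows. This is a minimal illustration, not the logic of the actual slash command: the `NightlyResult` record shape, the `isEligibleForPromotion` helper, and the model identifiers are all hypothetical, since the nightly history format is not shown in this PR.

```typescript
// Hypothetical shape of one nightly record; the real history format is an assumption.
interface NightlyResult {
  date: string; // ISO date, e.g. "2026-02-20"
  model: string; // hypothetical model identifier
  testName: string;
  passes: number; // passing attempts that night
  attempts: number; // total attempts that night (3 in the PR description)
}

const REQUIRED_NIGHTS = 7;

// A test is eligible only if, for every supported model, each of the most
// recent REQUIRED_NIGHTS nightly records passed all of its attempts.
// Simplification: this looks at the latest records per model and does not
// verify that their dates are strictly consecutive calendar days.
function isEligibleForPromotion(
  testName: string,
  history: NightlyResult[],
  supportedModels: string[],
): boolean {
  if (supportedModels.length === 0) {
    return false; // nothing to verify against, so never promote
  }
  return supportedModels.every((model) => {
    const recent = history
      .filter((r) => r.testName === testName && r.model === model)
      .sort((a, b) => b.date.localeCompare(a.date)) // newest first
      .slice(0, REQUIRED_NIGHTS);
    return (
      recent.length === REQUIRED_NIGHTS &&
      recent.every((r) => r.attempts > 0 && r.passes === r.attempts)
    );
  });
}
```

With 3 attempts per night, 7 clean nights correspond to the 21 consecutive passes per model mentioned in the description. A single failed attempt inside the window, or fewer than 7 recorded nights for any one model, keeps the test at USUALLY_PASSES.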

gemini-code-assist · 2026-02-27T16:55:25Z

Summary of Changes

Hello, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a structured process for managing behavioral evaluations, ensuring that new tests do not immediately block continuous integration. By mandating that all new evaluations start as USUALLY_PASSES and are promoted to ALWAYS_PASSES only after demonstrating consistent reliability via a new slash command, the change aims to enhance CI stability and reduce flakiness. The updated documentation provides clear guidelines for developers on this new workflow.

Highlights

- New Behavioral Evaluation Policy: All new behavioral evaluations must now be created with the USUALLY_PASSES policy, ensuring they do not block CI initially.
- Automated Test Promotion: A new slash command, /promote-behavioral-eval, has been introduced to automate the promotion of stable tests from USUALLY_PASSES to ALWAYS_PASSES.
- Promotion Criteria: The promotion command verifies that tests have achieved a 100% success rate over at least 10 nightly runs across all supported models before updating their policy.
- Documentation Update: The evals/README.md file has been updated to clearly outline the new test promotion process and the usage of the /promote-behavioral-eval command.

Changelog

evals/README.md
- Updated the 'Policies' section to mandate that new behavioral evaluations start as USUALLY_PASSES and link to the new promotion process.
- Added a new 'Test promotion process' section detailing the incubation, monitoring, and promotion steps for evaluations.
- Modified an example evalTest call to use USUALLY_PASSES and included a comment about the promotion process.
- Introduced a new top-level section 'Promoting evaluations' that describes the /promote-behavioral-eval slash command, its automated steps, and usage.
evals/validation_fidelity.eval.ts
- Changed the evalTest policy from ALWAYS_PASSES to USUALLY_PASSES for the 'validation_fidelity' test.
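
A policy change like the one listed for `validation_fidelity` amounts to swapping one policy literal in an `evalTest` call. The sketch below illustrates that edit on a source string. It assumes the policy is passed as the first string argument of `evalTest`, which this page does not confirm, and the `setEvalPolicy` helper is hypothetical rather than part of the repository.

```typescript
type EvalPolicy = 'USUALLY_PASSES' | 'ALWAYS_PASSES';

// Rewrites the policy literal of the first evalTest(...) call in `source`.
// Assumption: the call looks like evalTest('<POLICY>', ...). If no call with
// the `from` policy is found, the source is returned unchanged.
function setEvalPolicy(
  source: string,
  from: EvalPolicy,
  to: EvalPolicy,
): string {
  const pattern = new RegExp(`(evalTest\\(\\s*['"])${from}(['"])`);
  return source.replace(pattern, `$1${to}$2`);
}

// Demotion, as in this PR: ALWAYS_PASSES -> USUALLY_PASSES.
const demoted = setEvalPolicy(
  "evalTest('ALWAYS_PASSES', { name: 'validation_fidelity' });",
  'ALWAYS_PASSES',
  'USUALLY_PASSES',
);
```

Promotion is the same edit in the opposite direction, applied only after the test has met the reliability criteria.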

Ignored Files

Ignored by pattern: .gemini/** (1)
- .gemini/commands/promote-behavioral-eval.toml

¹ Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

gemini-code-assist

Code Review

This pull request introduces a new process for promoting behavioral evaluation tests. My review found a broken link in the documentation that should be fixed to ensure clarity for developers following the new process.

evals/README.md

github-actions · 2026-02-27T16:59:46Z

Size Change: -2 B (0%)

Total Size: 25.7 MB

Filename	Size	Change
`./bundle/gemini.js`	25.2 MB	-2 B (0%)
`./bundle/node_modules/@google/gemini-cli-devtools/dist/client/main.js`	221 kB	0 B
`./bundle/node_modules/@google/gemini-cli-devtools/dist/src/_client-assets.js`	227 kB	0 B
`./bundle/node_modules/@google/gemini-cli-devtools/dist/src/index.js`	11.5 kB	0 B
`./bundle/node_modules/@google/gemini-cli-devtools/dist/src/types.js`	132 B	0 B
`./bundle/sandbox-macos-permissive-open.sb`	890 B	0 B
`./bundle/sandbox-macos-permissive-proxied.sb`	1.31 kB	0 B
`./bundle/sandbox-macos-restrictive-open.sb`	3.36 kB	0 B
`./bundle/sandbox-macos-restrictive-proxied.sb`	3.56 kB	0 B
`./bundle/sandbox-macos-strict-open.sb`	4.82 kB	0 B
`./bundle/sandbox-macos-strict-proxied.sb`	5.02 kB	0 B

gundermanc added 3 commits February 27, 2026 08:35

Demote unreliable test.

67eb222

Add slash command.

a999615

Update docs.

597a7a9

gundermanc requested a review from a team as a code owner February 27, 2026 16:55

gemini-code-assist bot reviewed Feb 27, 2026

evals/README.md (outdated, resolved)

Drop threshold to 7.

86560cf

gemini-cli bot added the status/need-issue label ("Pull requests that need to have an associated issue.") Feb 27, 2026

gundermanc mentioned this pull request Feb 27, 2026

Promote stable tests to CI blocking. #20581

Merged

Apply suggestion from @gemini-code-assist[bot]

b9dd0dc

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

joshualitt approved these changes Feb 27, 2026

gundermanc enabled auto-merge February 27, 2026 17:38

gundermanc added this pull request to the merge queue Feb 27, 2026

gundermanc removed this pull request from the merge queue due to a manual request Feb 27, 2026

SandyTao520 approved these changes Feb 27, 2026

gundermanc added 2 commits February 27, 2026 09:58

Update README.

cf2baa8

Merge remote-tracking branch 'origin/gundermanc/promote' into gunderm…

851b2d3

…anc/promote

gundermanc added this pull request to the merge queue Feb 27, 2026

Merged via the queue into main with commit b2b6092 Feb 27, 2026
27 checks passed

gundermanc deleted the gundermanc/promote branch February 27, 2026 19:22

gemini-code-assist bot mentioned this pull request Feb 27, 2026

Changelog for v0.32.0-preview.0 #20627

Merged

gundermanc merged 7 commits into main from gundermanc/promote
