test(core): remove hardcoded model from TestRig by NTaylorMullen · Pull Request #18710 · google-gemini/gemini-cli

NTaylorMullen · 2026-02-10T05:17:24Z

Summary

This PR removes the hardcoded DEFAULT_GEMINI_MODEL (gemini-2.5-pro) override from TestRig in behavioral evaluations. By omitting the model key from the generated settings.json, the CLI is allowed to fall back to its internal dynamic default (currently auto-gemini-3 which resolves to gemini-3-pro-preview).

Details

Why: Allows evaluations to naturally track the CLI's default model, ensuring that "always passing" evals are tested against the same model a typical user would experience by default.
Impact: Behavioral evals in CI will now run against Gemini 3 instead of being pinned to Gemini 2.5 Pro.
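
As a rough illustration of the mechanism described above, the sketch below builds a settings object with no model entry and serializes it. The function name `buildEvalSettings`, the `exampleOption` key, and the shape of the `model` entry are assumptions made for illustration; none of them is taken from test-rig.ts.

```typescript
// Hypothetical sketch: names and the settings shape are illustrative only,
// not the actual TestRig implementation.
interface EvalSettings {
  model?: { name: string };
  exampleOption?: boolean;
}

function buildEvalSettings(overrides: EvalSettings = {}): EvalSettings {
  // No `model` entry is set here, so a consumer reading the serialized
  // file has to apply its own default unless an override supplies one.
  const base: EvalSettings = { exampleOption: true };
  return { ...base, ...overrides };
}

// The serialized settings contain no "model" key at all.
const unpinnedSettingsJson = JSON.stringify(buildEvalSettings());
```

Leaving the key out, rather than writing a placeholder value, keeps the choice of default entirely on the consumer's side.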

Related Issues

None.

How to Validate

Run an evaluation locally: npm run test:always_passing_evals -- evals/shell-efficiency.eval.ts
Inspect the network logs in evals/logs/*.jsonl.
Verify the user-agent header or request URL reflects the current default model (gemini-3-pro-preview) rather than gemini-2.5-pro.
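
The inspection step can be partly automated. The sketch below is one possible approach, not part of this PR: it assumes nothing about the log schema and simply searches the raw text of a log file for model-like identifiers. The helper name `modelsInText` and the pattern are illustrative.

```typescript
// Hypothetical helper: scans raw log text for identifiers that look like
// Gemini model names (e.g. "gemini-3-pro-preview", "gemini-2.5-pro").
// It does not parse the JSONL records, because their schema is not assumed.
const MODEL_PATTERN = /gemini-\d[\w.-]*/g;

function modelsInText(text: string): string[] {
  const matches = text.match(MODEL_PATTERN) ?? [];
  // De-duplicate and sort so the result is stable across runs.
  return Array.from(new Set<string>(matches)).sort();
}
```

Passing the contents of each file under evals/logs/ to `modelsInText` and checking that gemini-2.5-pro is absent from the result gives a quick check; a manual look at the requests is still the more reliable confirmation.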

Pre-Merge Checklist

Updated relevant documentation and README (if needed)
Added/updated tests (if needed) - Verified via local evaluation run.
Noted breaking changes (if any)
Validated on required platforms/methods:
- MacOS
  - npm run

gemini-code-assist · 2026-02-10T05:17:37Z

Summary of Changes

Hello @NTaylorMullen, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request updates the testing infrastructure to ensure that behavioral evaluations utilize the CLI's dynamic default model rather than a static, hardcoded version. This change allows tests to naturally track the latest default model, providing more relevant and up-to-date validation of the user experience.

Highlights

Model Defaulting: The hardcoded DEFAULT_GEMINI_MODEL (gemini-2.5-pro) override has been removed from TestRig in behavioral evaluations.
Dynamic Model Selection: By omitting the model key from settings.json, the CLI will now fall back to its internal dynamic default model (currently auto-gemini-3, resolving to gemini-3-pro-preview).
Evaluation Impact: Behavioral evaluations in CI will now run against Gemini 3 instead of being pinned to Gemini 2.5 Pro, ensuring tests align with the typical user experience.

Changelog

packages/test-utils/src/test-rig.ts
- Removed the import of DEFAULT_GEMINI_MODEL.
- Eliminated the explicit model configuration block from the TestRig setup.

Activity

No activity has occurred on this pull request yet.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature	Command	Description
Code Review	`/gemini review`	Performs a code review for the current pull request in its current state.
Pull Request Summary	`/gemini summary`	Provides a summary of the current pull request in its current state.
Comment	@gemini-code-assist	Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help	`/gemini help`	Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩

gemini-code-assist

Code Review

This pull request removes the hardcoded model from the TestRig configuration, allowing behavioral evaluations to default to the CLI's standard model, which improves the accuracy of tests reflecting the user experience. No security vulnerabilities were found.

github-actions · 2026-02-10T05:21:07Z

Size Change: -2 B (0%)

Total Size: 23.9 MB

ℹ️ View Unchanged

Filename	Size	Change
`./bundle/gemini.js`	23.9 MB	-2 B (0%)
`./bundle/sandbox-macos-permissive-closed.sb`	1.03 kB	0 B
`./bundle/sandbox-macos-permissive-open.sb`	890 B	0 B
`./bundle/sandbox-macos-permissive-proxied.sb`	1.31 kB	0 B
`./bundle/sandbox-macos-restrictive-closed.sb`	3.29 kB	0 B
`./bundle/sandbox-macos-restrictive-open.sb`	3.36 kB	0 B
`./bundle/sandbox-macos-restrictive-proxied.sb`	3.56 kB	0 B

compressed-size-action

- Removes hardcoded model from TestRig for unpinned evaluations.
- Pins integration tests to gemini-2.5-pro via new GEMINI_TEST_TYPE=integration env var.
- Moves shell efficiency evals to USUALLY_PASSES to prevent PR blocking while tracking Gemini 3.
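
The pinning behavior mentioned in the commit notes above could look roughly like the following. This is a sketch under assumptions: the function name `settingsForTestType` and the settings shape are hypothetical, and only the GEMINI_TEST_TYPE=integration value and the gemini-2.5-pro model name come from the notes.

```typescript
// Hypothetical sketch: function name and settings shape are illustrative,
// not the actual TestRig code.
type PinnableSettings = { model?: { name: string } };

function settingsForTestType(testType: string | undefined): PinnableSettings {
  if (testType === 'integration') {
    // Integration tests stay pinned to a fixed model.
    return { model: { name: 'gemini-2.5-pro' } };
  }
  // Anything else leaves the model unset so the default applies.
  return {};
}
```

A caller would pass the value of the GEMINI_TEST_TYPE environment variable, so that evaluation runs (which do not set it to integration) stay unpinned.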

* Fix newline insertion bug in replace tool (google-gemini#18595)
* fix(evals): update save_memory evals and simplify tool description (google-gemini#18610)
* chore(evals): update validation_fidelity_pre_existing_errors to USUALLY_PASSES (google-gemini#18617)
* fix: shorten tool call IDs and fix duplicate tool name in truncated output filenames (google-gemini#18600)
* feat(cli): implement atomic writes and safety checks for trusted folders (google-gemini#18406)
* Remove relative docs links (google-gemini#18650)
* docs: add legacy snippets convention to GEMINI.md (google-gemini#18597)
* fix(chore): Support linting for cjs (google-gemini#18639)
  Co-authored-by: Gal Zahavi <38544478+galz10@users.noreply.github.com>
* feat: move shell efficiency guidelines to tool description (google-gemini#18614)
* Added "" as default value, since getText() used to expect a string only and thus crashed when undefined... Fixes google-gemini#18076 (google-gemini#18099)
* Allow @-includes outside of workspaces (with permission) (google-gemini#18470)
* chore: make `ask_user` header description more clear (google-gemini#18657)
* bug(core): Fix minor bug in migration logic. (google-gemini#18661)
* Harded code assist converter. (google-gemini#18656)
* refactor(core): model-dependent tool definitions (google-gemini#18563)
* feat: enable plan mode experiment in settings (google-gemini#18636)
* refactor: push isValidPath() into parsePastedPaths() (google-gemini#18664)
* fix(cli): correct 'esc to cancel' position and restore duration display (google-gemini#18534)
* feat(cli): add DevTools integration with gemini-cli-devtools (google-gemini#18648)
* chore: remove unused exports and redundant hook files (google-gemini#18681)
* Fix number of lines being reported in rewind confirmation dialog (google-gemini#18675)
* feat(cli): disable folder trust in headless mode (google-gemini#18407)
* Disallow unsafe type assertions (google-gemini#18688)
* Change event type for release (google-gemini#18693)
* feat: handle multiple dynamic context filenames in system prompt (google-gemini#18598)
* Properly parse at-commands with narrow non-breaking spaces (google-gemini#18677)
* refactor(core): centralize core tool definitions and support model-specific schemas (google-gemini#18662)
* feat(core): Render memory hierarchically in context. (google-gemini#18350)
* feat: Ctrl+O to expand paste placeholder (google-gemini#18103)
* fix(cli): Improve header spacing (google-gemini#18531)
* Feature/quota visibility 16795 (google-gemini#18203)
* docs: remove TOC marker from Plan Mode header (google-gemini#18678)
* Inline thinking bubbles with summary/full modes (google-gemini#18033)
  Co-authored-by: Jacob Richman <jacob314@gmail.com>
* fix(ui): remove redundant newlines in Gemini messages (google-gemini#18538)
* test(cli): fix AppContainer act() warnings and improve waitFor resilience (google-gemini#18676)
* refactor(core): refine Security & System Integrity section in system prompt (google-gemini#18601)
* Fix layout rounding. (google-gemini#18667)
* docs(skills): enhance pr-creator safety and interactivity (google-gemini#18616)
* test(core): remove hardcoded model from TestRig (google-gemini#18710)
* feat(core): optimize sub-agents system prompt intro (google-gemini#18608)
* feat(cli): update approval mode labels and shortcuts per latest UX spec (google-gemini#18698)
* fix(plan): update persistent approval mode setting (google-gemini#18638)
  Co-authored-by: Sandy Tao <sandytao520@icloud.com>
* fix: move toasts location to left side (google-gemini#18705)
* feat(routing): restrict numerical routing to Gemini 3 family (google-gemini#18478)
* fix(ide): fix ide nudge setting (google-gemini#18733)
* fix(core): standardize tool formatting in system prompts (google-gemini#18615)
* chore: consolidate to green in ask user dialog (google-gemini#18734)
* feat: add `extensionsExplore` setting to enable extensions explore UI. (google-gemini#18686)
* feat(cli): defer devtools startup and integrate with F12 (google-gemini#18695)
* ui: update & subdue footer colors and animate progress indicator (google-gemini#18570)
* test: add model-specific snapshots for coreTools (google-gemini#18707)
  Co-authored-by: matt korwel <matt.korwel@gmail.com>
* ci: shard windows tests and fix event listener leaks (google-gemini#18670)
* fix: allow `ask_user` tool in yolo mode (google-gemini#18541)
* feat: redact disabled tools from system prompt (google-gemini#13597) (google-gemini#18613)
* Update Gemini.md to use the curent year on creating new files (google-gemini#18460)
* Code review cleanup for thinking display (google-gemini#18720)
* fix(cli): hide scrollbars when in alternate buffer copy mode (google-gemini#18354)
  Co-authored-by: Jacob Richman <jacob314@gmail.com>
* Fix issues with rip grep (google-gemini#18756)
* fix(cli): fix history navigation regression after prompt autocomplete (google-gemini#18752)
* chore: cleanup unused and add unlisted dependencies in packages/cli (google-gemini#18749)
* Fix issue where Gemini CLI creates tests in a new file (google-gemini#18409)
* feat(telemetry): Ensure experiment IDs are included in OpenTelemetry logs (google-gemini#18747)
* feat(ux): added text wrapping capabilities to markdown tables (google-gemini#18240)
  Co-authored-by: jacob314 <jacob314@gmail.com>
* Revert "fix(mcp): ensure MCP transport is closed to prevent memory leaks" (google-gemini#18771)
* chore(release): bump version to 0.30.0-nightly.20260210.a2174751d (google-gemini#18772)
* chore: cleanup unused and add unlisted dependencies in packages/core (google-gemini#18762)
* chore(core): update activate_skill prompt verbiage to be more direct (google-gemini#18605)
* Add autoconfigure memory usage setting to the dialog (google-gemini#18510)
  Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
* fix(core): prevent race condition in policy persistence (google-gemini#18506)
  Co-authored-by: Allen Hutchison <adh@google.com>
* fix(evals): prevent false positive in hierarchical memory test (google-gemini#18777)
* test(evals): mark all `save_memory` evals as `USUALLY_PASSES` due to unreliability (google-gemini#18786)
* feat(cli): add setting to hide shortcuts hint UI (google-gemini#18562)
* feat(core): formalize 5-phase sequential planning workflow (google-gemini#18759)
* Introduce limits for search results. (google-gemini#18767)

---------

Co-authored-by: Andrew Garrett <andrewgarrett@google.com>
Co-authored-by: N. Taylor Mullen <ntaylormullen@google.com>
Co-authored-by: Sandy Tao <sandytao520@icloud.com>
Co-authored-by: Gal Zahavi <38544478+galz10@users.noreply.github.com>
Co-authored-by: christine betts <chrstn@uw.edu>
Co-authored-by: Aswin Ashok <aswwwin@google.com>
Co-authored-by: Abhijith V Ashok <abhi2349jith@gmail.com>
Co-authored-by: Tommaso Sciortino <sciortino@gmail.com>
Co-authored-by: Jack Wotherspoon <jackwoth@google.com>
Co-authored-by: joshualitt <joshualitt@google.com>
Co-authored-by: Jacob Richman <jacob314@gmail.com>
Co-authored-by: Aishanee Shah <aishaneeshah@gmail.com>
Co-authored-by: Jerop Kipruto <jerop@google.com>
Co-authored-by: Adib234 <30782825+Adib234@users.noreply.github.com>
Co-authored-by: Christian Gunderman <gundermanc@gmail.com>
Co-authored-by: g-samroberts <158088236+g-samroberts@users.noreply.github.com>
Co-authored-by: Spencer <spencertang@google.com>
Co-authored-by: Dmitry Lyalin <dmitry.lyalin@lyalin.com>
Co-authored-by: matt korwel <matt.korwel@gmail.com>
Co-authored-by: Shreya Keshive <shreyakeshive@google.com>
Co-authored-by: Sri Pasumarthi <111310667+sripasg@users.noreply.github.com>
Co-authored-by: Keith Guerin <keithguerin@gmail.com>
Co-authored-by: Sehoon Shon <sshon@google.com>
Co-authored-by: Adam Weidman <65992621+adamfweidman@users.noreply.github.com>
Co-authored-by: Kevin Ramdass <ramdass.kevin@gmail.com>
Co-authored-by: Dev Randalpura <devrandalpura@google.com>
Co-authored-by: gemini-cli-robot <gemini-cli-robot@google.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Brad Dux <959674+braddux@users.noreply.github.com>
Co-authored-by: Allen Hutchison <adh@google.com>
Co-authored-by: Abhijit Balaji <abhijitbalaji@google.com>

NTaylorMullen requested a review from a team as a code owner February 10, 2026 05:17

gundermanc approved these changes Feb 10, 2026

gemini-code-assist bot reviewed Feb 10, 2026

NTaylorMullen enabled auto-merge February 10, 2026 05:32

NTaylorMullen disabled auto-merge February 10, 2026 05:32

gemini-cli bot added the status/need-issue Pull requests that need to have an associated issue. label Feb 10, 2026

NTaylorMullen force-pushed the ntm/remove-hardcoded-eval-model branch from 149d703 to cb16bfa Compare February 10, 2026 06:30

NTaylorMullen enabled auto-merge February 10, 2026 06:33

NTaylorMullen added this pull request to the merge queue Feb 10, 2026

NTaylorMullen removed this pull request from the merge queue due to a manual request Feb 10, 2026

NTaylorMullen force-pushed the ntm/remove-hardcoded-eval-model branch from cb16bfa to 7b908ee Compare February 10, 2026 06:45

NTaylorMullen enabled auto-merge February 10, 2026 06:46

NTaylorMullen force-pushed the ntm/remove-hardcoded-eval-model branch from 7b908ee to 1bec83e Compare February 10, 2026 07:46

NTaylorMullen added this pull request to the merge queue Feb 10, 2026

Merged via the queue into main with commit 67d9b76 Feb 10, 2026
26 checks passed

NTaylorMullen deleted the ntm/remove-hardcoded-eval-model branch February 10, 2026 08:05

This was referenced Feb 18, 2026

Changelog for v0.29.0 #19361

Merged

Changelog for v0.30.0-preview.5 #20107

Merged

kuishou68 pushed a commit to iOfficeAI/aioncli that referenced this pull request Feb 27, 2026

test(core): remove hardcoded model from TestRig (google-gemini#18710)

d3e19ba

liamhelmer pushed a commit to badal-io/gemini-cli that referenced this pull request Mar 12, 2026

test(core): remove hardcoded model from TestRig (google-gemini#18710)

fa372f2

NTaylorMullen merged 1 commit into main from ntm/remove-hardcoded-eval-model
