fix(evals): stale tool log snapshot, missing telemetry wait, and wrong import path by ishaan-arora-1 · Pull Request #23842 · google-gemini/gemini-cli

ishaan-arora-1 · 2026-03-26T00:21:01Z

Three eval files had assertion reliability bugs where tool logs were read at the wrong time or imported from the wrong source.

tracker.eval.ts — readToolLogs() was captured once after waitForToolCall(TRACKER_CREATE_TASK_TOOL_NAME) (line 44). After a second waitForToolCall(TRACKER_UPDATE_TASK_TOOL_NAME) (line 55), the code searched the stale snapshot for the update call (line 63). The update call was not in that snapshot, so updateCall would be undefined and JSON.parse(updateCall!.toolRequest.args) would throw a misleading error instead of a clear assertion failure. Fixed by re-reading tool logs after the second wait.
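The stale-snapshot problem can be reproduced in isolation. The sketch below is illustrative only: FakeRig, record, and findCall are hypothetical stand-ins, not the real eval rig, and it assumes readToolLogs() returns a point-in-time copy of the logs, as the description above implies. Only the names readToolLogs, tracker_create_task, and tracker_update_task come from this PR; the argument shapes are made up.

```typescript
// Hypothetical stand-in for the eval rig. Only the method name
// readToolLogs and the two tool names are taken from the PR.
interface ToolLog {
  toolRequest: { name: string; args: string };
}

class FakeRig {
  private logs: ToolLog[] = [];

  // Simulates telemetry recording a tool call (hypothetical helper).
  record(name: string, args: object): void {
    this.logs.push({ toolRequest: { name, args: JSON.stringify(args) } });
  }

  // Assumption: returns a copy, so an earlier result never sees later calls.
  readToolLogs(): ToolLog[] {
    return [...this.logs];
  }
}

function findCall(logs: ToolLog[], name: string): ToolLog | undefined {
  return logs.find((log) => log.toolRequest.name === name);
}

const rig = new FakeRig();

rig.record('tracker_create_task', { title: 'demo' });
const toolLogs = rig.readToolLogs(); // snapshot taken after the first wait

rig.record('tracker_update_task', { status: 'done' });

// Searching the old snapshot misses the update call.
const staleUpdate = findCall(toolLogs, 'tracker_update_task');

// Re-reading after the second call finds it.
const updatedToolLogs = rig.readToolLogs();
const freshUpdate = findCall(updatedToolLogs, 'tracker_update_task');
```

Under these assumptions staleUpdate is undefined while freshUpdate holds the update call, which is why the fix re-reads the logs after the second wait.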

edit-locations-eval.eval.ts — readToolLogs() was called without a preceding waitForTelemetryReady(), so tool logs could be incomplete when assertions ran. Also removed a leftover console.log('DEBUG: targetFiles', targetFiles) statement.
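A minimal sketch of why the wait matters, assuming tool logs only become readable after an asynchronous flush. FakeTelemetryRig, recordToolCall, the 10 ms delay, and the tool name 'edit' are all hypothetical; only waitForTelemetryReady and readToolLogs are names from this PR, and their real implementations may differ.

```typescript
// Hypothetical rig: recorded calls become visible only after an async flush.
class FakeTelemetryRig {
  private pending: string[] = [];
  private flushed: string[] = [];
  private flushDone: Promise<void> | null = null;

  // Simulates a tool call whose telemetry is written a little later.
  recordToolCall(name: string): void {
    this.pending.push(name);
    this.flushDone = new Promise<void>((resolve) => {
      setTimeout(() => {
        this.flushed.push(...this.pending);
        this.pending = [];
        resolve();
      }, 10);
    });
  }

  // Resolves once the most recent flush has completed.
  async waitForTelemetryReady(): Promise<void> {
    await this.flushDone;
  }

  // Returns only what has been flushed so far.
  readToolLogs(): string[] {
    return [...this.flushed];
  }
}

async function demo(): Promise<{ early: number; afterWait: number }> {
  const rig = new FakeTelemetryRig();
  rig.recordToolCall('edit');

  const early = rig.readToolLogs().length; // read before the flush
  await rig.waitForTelemetryReady();
  const afterWait = rig.readToolLogs().length; // read after the flush

  return { early, afterWait };
}
```

In this model the early read sees no entries and the read after the wait sees the recorded call, mirroring the incomplete-log risk described above.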

save_memory.eval.ts — Imported assertModelHasOutput and checkModelOutputContent from ../integration-tests/test-helper.js instead of ./test-helper.js (which re-exports the same functions via export * from '@google/gemini-cli-test-utils'). Every other eval file uses the local import. This is the same fix applied to hierarchical_memory.eval.ts in #23790.

gemini-code-assist · 2026-03-26T00:21:15Z

Summary of Changes

Hello, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request enhances the reliability and consistency of several evaluation files by resolving issues related to stale tool log snapshots, ensuring proper telemetry readiness before assertions, and standardizing import paths for shared test utilities. These changes aim to prevent misleading test failures and improve the overall robustness of the evaluation suite.

Highlights

tracker.eval.ts reliability: Fixed an issue in tracker.eval.ts where tool logs were read once and then used as a stale snapshot, leading to misleading assertion errors when a second tool call was made. The fix involves re-reading the tool logs to ensure up-to-date data.
edit-locations-eval.eval.ts telemetry and debugging: Addressed a potential race condition in edit-locations-eval.eval.ts by ensuring the test waits for telemetry to be ready before attempting to read tool logs. Additionally, a leftover debug console.log statement was removed.
save_memory.eval.ts import path correction: Corrected an incorrect import path in save_memory.eval.ts for test utility functions (assertModelHasOutput and checkModelOutputContent), aligning it with other evaluation files by using the local test-helper.js.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature	Command	Description
Code Review	`/gemini review`	Performs a code review for the current pull request in its current state.
Pull Request Summary	`/gemini summary`	Provides a summary of the current pull request in its current state.
Comment	`@gemini-code-assist`	Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help	`/gemini help`	Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

¹ Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

gemini-code-assist

Code Review

This pull request improves the reliability and correctness of evaluation tests. Specifically, it adds a waitForTelemetryReady() call in edit-locations-eval.eval.ts to ensure telemetry data is fully processed before reading tool logs, removes a debug console.log statement, corrects import paths in save_memory.eval.ts to use a local test-helper.js for consistency, and re-reads toolLogs in tracker.eval.ts to prevent issues with stale log data. I have no feedback to provide as all review comments were filtered out.

gemini-code-assist · 2026-03-26T00:23:24Z

evals/tracker.eval.ts

      ).toBe(true);

-      const updateCall = toolLogs.find(
+      const updatedToolLogs = rig.readToolLogs();


Re-reading toolLogs into updatedToolLogs before finding the tracker_update_task call is a critical fix. Previously, the code was using a stale snapshot of toolLogs, which could lead to updateCall being undefined and subsequent assertion failures that masked the true issue. This ensures the test uses the most current state of tool activity.
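To illustrate how an undefined updateCall can mask the real problem, here is a hedged sketch that contrasts the non-null-assertion pattern with an explicit check. parseArgsUnsafe, parseArgsChecked, and messageOf are hypothetical helpers written for this example; they are not part of the PR or the test utilities.

```typescript
interface ToolLog {
  toolRequest: { name: string; args: string };
}

// Mirrors the JSON.parse(updateCall!.toolRequest.args) pattern: when the
// call is missing, this fails with a TypeError about reading a property
// of undefined, which says nothing about the missing tool call.
function parseArgsUnsafe(call: ToolLog | undefined): unknown {
  return JSON.parse(call!.toolRequest.args);
}

// Hypothetical alternative: fail with a message that names the missing call.
function parseArgsChecked(call: ToolLog | undefined, toolName: string): unknown {
  if (call === undefined) {
    throw new Error(`expected a ${toolName} call in the tool logs`);
  }
  return JSON.parse(call.toolRequest.args);
}

// Small helper that captures the error message thrown by fn, if any.
function messageOf(fn: () => unknown): string {
  try {
    fn();
    return 'no error';
  } catch (e) {
    return e instanceof Error ? e.message : String(e);
  }
}

const unsafeMessage = messageOf(() => parseArgsUnsafe(undefined));
const checkedMessage = messageOf(() =>
  parseArgsChecked(undefined, 'tracker_update_task'),
);
```

The checked variant reports which tool call was missing, whereas the unsafe variant surfaces only the property-access failure.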

gemini-code-assist · 2026-03-26T00:23:24Z

evals/edit-locations-eval.eval.ts

    prompt: 'Fix the bug in src/math.ts. Do not run the code.',
    timeout: 180000,
    assert: async (rig) => {
+      await rig.waitForTelemetryReady();


The addition of await rig.waitForTelemetryReady(); is a crucial fix. Without this, readToolLogs() could be called before telemetry data is fully processed, leading to incomplete logs and flaky test results. This ensures the test operates on accurate and up-to-date telemetry information.

gemini-code-assist · 2026-03-26T00:23:24Z

evals/edit-locations-eval.eval.ts

        }
      });

-      console.log('DEBUG: targetFiles', targetFiles);


Removing the console.log statement is a good practice. Debugging logs should not be committed to the main branch as they can clutter test output and potentially expose internal state during automated runs, impacting maintainability and clarity.

gemini-code-assist · 2026-03-26T00:23:24Z

evals/save_memory.eval.ts

 import {
+  evalTest,
  assertModelHasOutput,
  checkModelOutputContent,
-} from '../integration-tests/test-helper.js';
+} from './test-helper.js';


Correcting the import path from ../integration-tests/test-helper.js to ./test-helper.js ensures consistency and correctness within the evals directory. Relying on the local re-export is more robust and aligns with the import patterns used by other evaluation files, improving maintainability.

…g import path

tracker.eval.ts: readToolLogs() was called once after waiting for tracker_create_task, but the same snapshot was reused to find tracker_update_task after a second waitForToolCall(). The update call was not in that snapshot. Re-read logs after the second wait.

edit-locations-eval.eval.ts: readToolLogs() was called without waitForTelemetryReady(), risking incomplete logs. Also removed a leftover console.log('DEBUG: ...') statement.

save_memory.eval.ts: Consolidated imports to use ./test-helper.js (which re-exports from @google/gemini-cli-test-utils) instead of reaching into ../integration-tests/test-helper.js.

Fixes google-gemini#23841

gemini-cli · 2026-04-09T03:04:57Z

Hi there! Thank you for your interest in contributing to Gemini CLI.

To ensure we maintain high code quality and focus on our prioritized roadmap, we have updated our contribution policy (see Discussion #17383).

We only guarantee review and consideration of pull requests for issues that are explicitly labeled as 'help wanted'. All other community pull requests are subject to closure after 14 days if they do not align with our current focus areas. For this reason, we strongly recommend that contributors only submit pull requests against issues explicitly labeled as 'help-wanted'.

This pull request is being closed as it has been open for 14 days without a 'help wanted' designation. We encourage you to find and contribute to existing 'help wanted' issues in our backlog! Thank you for your understanding and for being part of our community!

ishaan-arora-1 requested a review from a team as a code owner March 26, 2026 00:21

gemini-code-assist bot reviewed Mar 26, 2026

gemini-cli bot added the area/platform Issues related to Build infra, Release mgmt, Testing, Eval infra, Capacity, Quota mgmt label Mar 26, 2026

github-actions bot mentioned this pull request Mar 26, 2026

📊 AI CLI Tools Community Daily Report 2026-03-26 gsscsd/big_model_radar#96

Open

ishaan-arora-1 force-pushed the fix/eval-assertion-reliability branch from 1fa4a25 to 68b69e2 Compare March 29, 2026 11:14

gemini-cli bot closed this Apr 9, 2026

ishaan-arora-1 wants to merge 1 commit into google-gemini:main from ishaan-arora-1:fix/eval-assertion-reliability
