Skip to content

fix(evals): correct misplaced params in answer-vs-act and tool_output_masking#23788

Closed
ishaan-arora-1 wants to merge 1 commit intogoogle-gemini:mainfrom
ishaan-arora-1:fix/eval-misplaced-params
Closed

fix(evals): correct misplaced params in answer-vs-act and tool_output_masking#23788
ishaan-arora-1 wants to merge 1 commit intogoogle-gemini:mainfrom
ishaan-arora-1:fix/eval-misplaced-params

Conversation

@ishaan-arora-1
Copy link
Copy Markdown
Contributor

Fixes #23787

Found two eval files where config properties were at the wrong nesting level, causing them to be silently ignored.

answer-vs-act.eval.tstimeout was inside params instead of being a top-level EvalCase property. evalTest passes evalCase.timeout to rig.run(), not anything from params, so the 20s timeout was never applied (defaulted to 300s).

tool_output_masking.eval.tssecurity was directly in params instead of under settings. TestRig.setup() passes options.settings to _createSettingsFile, so the folder trust config was never written. (Compare with hierarchical_memory.eval.ts which does this correctly.)

Both slipped through because EvalCase.params had [key: string]: unknown, accepting any key without complaint. Replaced the index signature with the specific keys TestRig.setup() actually reads: settings, state, fakeResponsesPath.

@ishaan-arora-1 ishaan-arora-1 requested a review from a team as a code owner March 25, 2026 13:56
@google-cla
Copy link
Copy Markdown

google-cla bot commented Mar 25, 2026

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

@gemini-code-assist
Copy link
Copy Markdown
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses critical configuration issues in two evaluation files, answer-vs-act.eval.ts and tool_output_masking.eval.ts, where key properties were misplaced, leading to their silent omission during test execution. By correcting the nesting of timeout and security parameters and refining the EvalCase.params type definition, the changes ensure that evaluation configurations are correctly applied and prevent future misconfigurations from going unnoticed, improving the reliability of the evaluation suite.

Highlights

  • Corrected timeout placement in answer-vs-act.eval.ts: The timeout property was moved from being incorrectly nested within params to its proper top-level EvalCase position, ensuring the 20-second timeout is now applied instead of defaulting to 300 seconds.
  • Adjusted security nesting in tool_output_masking.eval.ts: The security configuration, specifically folderTrust, was moved to be correctly nested under params.settings, resolving an issue where folder trust settings were not being applied during test setup.
  • Refined EvalCase.params type definition: The generic [key: string]: unknown index signature was replaced with specific keys (settings, state, fakeResponsesPath) in evals/test-helper.ts to prevent similar misconfigurations from being silently ignored in the future by enforcing stricter type checking.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request refactors the structure of evaluation parameters within the evals directory. Key changes include moving the timeout parameter to the top-level of EvalCase, adding state and fakeResponsesPath to the params object, and nesting security settings under a new settings property within params. A high-severity concern was raised regarding the settings type within EvalCase's params, which is currently too loose (Record<string, unknown>). It is recommended to tighten this type, potentially by importing Settings from packages/cli/src/config/settingsSchema.ts, to enhance type safety and prevent future configuration-related bugs.

@@ -266,7 +266,8 @@ export interface EvalCase {
name: string;
params?: {
settings?: ForbiddenToolSettings & Record<string, unknown>;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The type for settings is ForbiddenToolSettings & Record<string, unknown>, which is very loose. This is the same kind of issue that allowed the bugs this PR is fixing. The Record<string, unknown> allows any property, which can lead to typos or misplaced configuration going unnoticed by the type checker.

To improve type safety and prevent future issues, consider tightening this type. Ideally, it should be based on the canonical settings schema. If it's possible to import the Settings type from packages/cli/src/config/settingsSchema.ts, changing the type to ForbiddenToolSettings & Settings would provide much stronger validation at compile time. This would ensure that the settings objects used in evaluations are structurally correct.

References
  1. The comment suggests tightening a type to ForbiddenToolSettings & Settings based on settingsSchema.ts, which directly ensures the TypeScript type aligns with the canonical settings schema for stronger validation.
  2. The comment advocates for using the canonical settings schema as the source of truth for defining types, which prevents issues like loose Record<string, unknown> types and reinforces the schema's role as the single source of truth for configuration structure.

@ishaan-arora-1
Copy link
Copy Markdown
Contributor Author

ishaan-arora-1 commented Mar 26, 2026

The Settings type lives in packages/cli/src/config/settingsSchema.ts — a package the evals don't depend on (evals/test-helper.ts only imports from @google/gemini-cli-core and @google/gemini-cli-test-utils). Importing it directly would create a cross-package dependency that doesn't exist today.

Additionally, TestRig.setup() in @google/gemini-cli-test-utils itself defines settings as Record<string, unknown>, so our type here is already as strict as the downstream consumer.

The right path is a follow-up that re-exports Settings from @google/gemini-cli-test-utils (which already depends on core), then tightens both TestRig.setup() and EvalCase.params.settings together. Happy to file that issue and pick it up.

…_masking

`answer-vs-act.eval.ts` had `timeout` inside `params` where it's
ignored — moved it to the top-level `EvalCase` property.

`tool_output_masking.eval.ts` had `security` directly in `params`
instead of nested under `settings`, so folder trust config was never
written to the settings file.

Also tightened the `EvalCase.params` type to only accept keys that
`TestRig.setup()` actually reads (`settings`, `state`,
`fakeResponsesPath`), replacing the permissive index signature that
allowed these bugs to compile silently.

Fixes google-gemini#23787
@ishaan-arora-1 ishaan-arora-1 force-pushed the fix/eval-misplaced-params branch from 556f8e9 to 359bc03 Compare March 29, 2026 11:15
@gemini-cli
Copy link
Copy Markdown
Contributor

gemini-cli bot commented Apr 9, 2026

Hi there! Thank you for your interest in contributing to Gemini CLI.

To ensure we maintain high code quality and focus on our prioritized roadmap, we have updated our contribution policy (see Discussion #17383).

We only guarantee review and consideration of pull requests for issues that are explicitly labeled as 'help wanted'. All other community pull requests are subject to closure after 14 days if they do not align with our current focus areas. For this reason, we strongly recommend that contributors only submit pull requests against issues explicitly labeled as 'help-wanted'.

This pull request is being closed as it has been open for 14 days without a 'help wanted' designation. We encourage you to find and contribute to existing 'help wanted' issues in our backlog! Thank you for your understanding and for being part of our community!

@gemini-cli gemini-cli bot closed this Apr 9, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

fix(evals): misplaced params in answer-vs-act and tool_output_masking evals

1 participant