fix(evals): correct misplaced params in answer-vs-act and tool_output_masking#23788
fix(evals): correct misplaced params in answer-vs-act and tool_output_masking#23788ishaan-arora-1 wants to merge 1 commit intogoogle-gemini:mainfrom
Conversation
|
Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). View this failed invocation of the CLA check for more information. For the most up to date status, view the checks section at the bottom of the pull request. |
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request addresses critical configuration issues in two evaluation files, Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request refactors the structure of evaluation parameters within the evals directory. Key changes include moving the timeout parameter to the top-level of EvalCase, adding state and fakeResponsesPath to the params object, and nesting security settings under a new settings property within params. A high-severity concern was raised regarding the settings type within EvalCase's params, which is currently too loose (Record<string, unknown>). It is recommended to tighten this type, potentially by importing Settings from packages/cli/src/config/settingsSchema.ts, to enhance type safety and prevent future configuration-related bugs.
| @@ -266,7 +266,8 @@ export interface EvalCase { | |||
| name: string; | |||
| params?: { | |||
| settings?: ForbiddenToolSettings & Record<string, unknown>; | |||
There was a problem hiding this comment.
The type for settings is ForbiddenToolSettings & Record<string, unknown>, which is very loose. This is the same kind of issue that allowed the bugs this PR is fixing. The Record<string, unknown> allows any property, which can lead to typos or misplaced configuration going unnoticed by the type checker.
To improve type safety and prevent future issues, consider tightening this type. Ideally, it should be based on the canonical settings schema. If it's possible to import the Settings type from packages/cli/src/config/settingsSchema.ts, changing the type to ForbiddenToolSettings & Settings would provide much stronger validation at compile time. This would ensure that the settings objects used in evaluations are structurally correct.
References
- The comment suggests tightening a type to
ForbiddenToolSettings & Settingsbased onsettingsSchema.ts, which directly ensures the TypeScript type aligns with the canonical settings schema for stronger validation. - The comment advocates for using the canonical settings schema as the source of truth for defining types, which prevents issues like loose
Record<string, unknown>types and reinforces the schema's role as the single source of truth for configuration structure.
|
The Additionally, The right path is a follow-up that re-exports |
…_masking `answer-vs-act.eval.ts` had `timeout` inside `params` where it's ignored — moved it to the top-level `EvalCase` property. `tool_output_masking.eval.ts` had `security` directly in `params` instead of nested under `settings`, so folder trust config was never written to the settings file. Also tightened the `EvalCase.params` type to only accept keys that `TestRig.setup()` actually reads (`settings`, `state`, `fakeResponsesPath`), replacing the permissive index signature that allowed these bugs to compile silently. Fixes google-gemini#23787
556f8e9 to
359bc03
Compare
|
Hi there! Thank you for your interest in contributing to Gemini CLI. To ensure we maintain high code quality and focus on our prioritized roadmap, we have updated our contribution policy (see Discussion #17383). We only guarantee review and consideration of pull requests for issues that are explicitly labeled as 'help wanted'. All other community pull requests are subject to closure after 14 days if they do not align with our current focus areas. For this reason, we strongly recommend that contributors only submit pull requests against issues explicitly labeled as 'help-wanted'. This pull request is being closed as it has been open for 14 days without a 'help wanted' designation. We encourage you to find and contribute to existing 'help wanted' issues in our backlog! Thank you for your understanding and for being part of our community! |
Fixes #23787
Found two eval files where config properties were at the wrong nesting level, causing them to be silently ignored.
answer-vs-act.eval.ts—timeoutwas insideparamsinstead of being a top-levelEvalCaseproperty.evalTestpassesevalCase.timeouttorig.run(), not anything fromparams, so the 20s timeout was never applied (defaulted to 300s).tool_output_masking.eval.ts—securitywas directly inparamsinstead of undersettings.TestRig.setup()passesoptions.settingsto_createSettingsFile, so the folder trust config was never written. (Compare withhierarchical_memory.eval.tswhich does this correctly.)Both slipped through because
EvalCase.paramshad[key: string]: unknown, accepting any key without complaint. Replaced the index signature with the specific keysTestRig.setup()actually reads:settings,state,fakeResponsesPath.