feat: add behavioral evals for web tool selection #23415

Open
PewterZz wants to merge 5 commits into google-gemini:main from PewterZz:feat/add-web-tools-eval

@PewterZz PewterZz commented Mar 22, 2026

Summary

Adds four behavioral evals testing the agent's ability to correctly choose between web tools based on the nature of the request -- without being told which tool to use.

Details

| Test | Policy | What it tests |
| --- | --- | --- |
| Current info not in local files | USUALLY_PASSES | Agent chooses google_web_search for version info not available locally |
| Specific URL provided | USUALLY_PASSES | Agent uses web_fetch for an explicit URL, not google_web_search |
| Answer available locally | USUALLY_PASSES | Agent reads package.json rather than searching the web |
| URL mentioned in context but task is local | USUALLY_PASSES | Agent resists fetching a URL mentioned as context when the task only requires local edits |
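
The four rows above can be read as "must call" and "must not call" sets over the agent's tool-call log. The following is a minimal, self-contained sketch of that reading, not the code in this PR: the `ToolCall` and `Expectation` shapes, the `satisfies` helper, and the constant name `WEB_FETCH_TOOL_NAME` are assumptions made for illustration. Only the tool name strings and `WEB_SEARCH_TOOL_NAME` come from the description.

```typescript
// Local stand-ins for constants that the real evals import from
// @google/gemini-cli-core. WEB_FETCH_TOOL_NAME is an assumed name.
const WEB_SEARCH_TOOL_NAME = 'google_web_search';
const WEB_FETCH_TOOL_NAME = 'web_fetch';

// Hypothetical shape of one entry in the agent's tool-call log.
interface ToolCall {
  name: string;
}

// Hypothetical description of what one eval expects to see in the log.
interface Expectation {
  scenario: string;
  mustCall: string[]; // tools that must appear at least once
  mustNotCall: string[]; // tools that must never appear
}

const expectations: Expectation[] = [
  {
    scenario: 'Current info not in local files',
    mustCall: [WEB_SEARCH_TOOL_NAME],
    mustNotCall: [],
  },
  {
    scenario: 'Specific URL provided',
    mustCall: [WEB_FETCH_TOOL_NAME],
    mustNotCall: [WEB_SEARCH_TOOL_NAME],
  },
  {
    scenario: 'Answer available locally',
    mustCall: [],
    mustNotCall: [WEB_SEARCH_TOOL_NAME, WEB_FETCH_TOOL_NAME],
  },
  {
    scenario: 'URL mentioned in context but task is local',
    mustCall: [],
    mustNotCall: [WEB_FETCH_TOOL_NAME],
  },
];

// True when the log contains every required tool and none of the forbidden ones.
function satisfies(expectation: Expectation, log: ToolCall[]): boolean {
  const called = new Set(log.map((call) => call.name));
  return (
    expectation.mustCall.every((tool) => called.has(tool)) &&
    expectation.mustNotCall.every((tool) => !called.has(tool))
  );
}
```

Under this sketch, a log containing both web_fetch and google_web_search would fail the "Specific URL provided" row, because that row forbids the search tool as well as requiring the fetch tool.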

Design note: Prompts do not name the expected tool. Each eval creates a situation where the agent must infer the right tool from context. This tests genuine decision-making rather than instruction-following.
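
One way to keep this property from regressing is a small guard that rejects prompts containing a tool name. The sketch below is illustrative only: `namesExpectedTool` and both example prompts are hypothetical and are not part of this PR.

```typescript
// Tool names taken from the PR description.
const TOOL_NAMES = ['google_web_search', 'web_fetch'];

// Returns true when the prompt text mentions any tool name, which would turn
// the eval into an instruction-following test instead of a selection test.
function namesExpectedTool(
  prompt: string,
  toolNames: string[] = TOOL_NAMES,
): boolean {
  const normalized = prompt.toLowerCase();
  return toolNames.some((tool) => normalized.includes(tool.toLowerCase()));
}

// Hypothetical prompts, written for this example.
const leakyPrompt = 'Use web_fetch to summarize https://example.com/changelog';
const neutralPrompt = 'Summarize https://example.com/changelog';

console.log(namesExpectedTool(leakyPrompt)); // true
console.log(namesExpectedTool(neutralPrompt)); // false
```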

Finding during validation: The correct tool name is google_web_search (defined as WEB_SEARCH_TOOL_NAME in base-declarations.ts). Documentation uses web_search. All assertions import constants from @google/gemini-cli-core rather than using string literals.
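
The name mismatch matters most for negative assertions, where a wrong literal can make a check pass vacuously. The sketch below illustrates this with a local stand-in for the constant. The `ToolCall` shape and the `neverCalled` helper are assumptions for this example; the real evals import the constant from @google/gemini-cli-core instead of redefining it.

```typescript
// Local stand-in for the constant defined in base-declarations.ts.
const WEB_SEARCH_TOOL_NAME = 'google_web_search';

// Hypothetical shape of one entry in the agent's tool-call log.
interface ToolCall {
  name: string;
}

// True when no entry in the log has the given tool name.
function neverCalled(log: ToolCall[], toolName: string): boolean {
  return log.every((call) => call.name !== toolName);
}

// Suppose the agent wrongly searched the web in the "answer available locally" eval.
const log: ToolCall[] = [{ name: 'google_web_search' }];

// With the documented but incorrect literal, the check passes and hides the failure.
console.log(neverCalled(log, 'web_search')); // true

// With the constant, the same check catches the unwanted call.
console.log(neverCalled(log, WEB_SEARCH_TOOL_NAME)); // false
```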

How to Validate

npm run build
RUN_EVALS=1 npx vitest run evals/web-tools.eval.ts --config evals/vitest.config.ts --reporter=verbose

Related Issues

Fixes #23483
Related to #23331

Adds three evals covering the agent's decision about when to use web
tools vs. local file reads:

- google_web_search for current information queries
- web_fetch when given a specific URL
- no web tool calls when the answer exists in local files

All three evals validated against the live Gemini API. Notably,
the correct tool name is google_web_search (as defined in
WEB_SEARCH_TOOL_NAME in base-declarations.ts), not web_search.
@PewterZz PewterZz requested a review from a team as a code owner March 22, 2026 01:40
@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces crucial behavioral evaluations to enhance the agent's ability to correctly select between web tools (google_web_search, web_fetch) and local file access. By adding these tests, the system gains better coverage for scenarios requiring live data, specific URL fetching, or local context, thereby improving the agent's overall tool-use accuracy and reliability.

Highlights

  • New Behavioral Evals: Added three new behavioral evaluation tests to cover the agent's decision-making regarding web tool usage versus local file reads, specifically for google_web_search and web_fetch.
  • Tool Name Discrepancy Addressed: Corrected the tool name for web search from web_search to google_web_search in the new evals, addressing a discrepancy found during validation and documenting it.

@PewterZz
Author

cc @gundermanc: this is part of the pre-proposal work for the GSoC behavioral evals project (#23331). Happy to adjust the prompt wording or assertion logic based on your feedback.

@PewterZz PewterZz changed the title from "feat(evals): add behavioral evals for web tool selection" to "feat: add behavioral evals for web tool selection" Mar 22, 2026
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request adds valuable behavioral evaluations for the web tools (google_web_search and web_fetch), significantly improving test coverage for the agent's tool selection logic. The new tests are well-structured with clear prompts and assertions. The implementation is clean and follows existing patterns.

@gemini-cli gemini-cli bot added the priority/p2 (Important but can be addressed in a future release.), area/agent (Issues related to Core Agent, Tools, Memory, Sub-Agents, Hooks, Agent Quality), and 🔒 maintainer only (⛔ Do not contribute. Internal roadmap item.) labels Mar 22, 2026
@gemini-cli gemini-cli bot added the status/need-issue (Pull requests that need to have an associated issue.) label Mar 22, 2026

Development

Successfully merging this pull request may close these issues.

Add behavioral evals for web tool selection (google_web_search vs web_fetch)