Add behavioral evals for web tool selection (google_web_search vs web_fetch) #23483
Open
Labels
- area/agent: Issues related to Core Agent, Tools, Memory, Sub-Agents, Hooks, Agent Quality
- status/need-triage: Issues that need to be triaged by the triage automation.
Description
Summary
The eval suite has no tests covering how the agent chooses between google_web_search and web_fetch. These are the two web tools available to the agent, but they serve different purposes:
- google_web_search: for open-ended queries where the agent needs to find information
- web_fetch: for fetching a specific URL the user has already provided
Without evals, regressions in this distinction go undetected.
Expected behavior
- When asked for current information with no URL, the agent should use google_web_search
- When given a specific URL, the agent should use web_fetch, not search
- When the answer is in local files, the agent should use neither web tool
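
The three expectations above could be written as table-driven eval cases. The following is only a minimal sketch, not the project's actual eval harness: the `ToolSelectionCase` type, the `cases` table, the example prompts, and the `checkToolSelection` helper are all hypothetical names, and the sketch assumes the harness can supply the list of tool names the agent called during a run.

```typescript
// Hypothetical types and helpers for illustration; not the real eval harness API.
type WebTool = "google_web_search" | "web_fetch";

interface ToolSelectionCase {
  name: string;
  prompt: string;
  // null means neither web tool should be called.
  expected: WebTool | null;
}

// One case per expected behavior listed in this issue. Prompts are made up.
const cases: ToolSelectionCase[] = [
  {
    name: "current information, no URL",
    prompt: "What is the latest stable Node.js release?",
    expected: "google_web_search",
  },
  {
    name: "specific URL provided",
    prompt: "Summarize https://example.com/changelog",
    expected: "web_fetch",
  },
  {
    name: "answer is in local files",
    prompt: "What does the README in this repo say about installation?",
    expected: null,
  },
];

const WEB_TOOLS: readonly string[] = ["google_web_search", "web_fetch"];

// Returns true when the web tools the agent called match the expectation.
// Calls to non-web tools are ignored, since they are outside this eval's scope.
function checkToolSelection(
  c: ToolSelectionCase,
  calledTools: string[],
): boolean {
  const webCalls = calledTools.filter((t) => WEB_TOOLS.includes(t));
  if (c.expected === null) {
    return webCalls.length === 0;
  }
  // At least one web call, and every web call must be the expected tool.
  return webCalls.length > 0 && webCalls.every((t) => t === c.expected);
}
```

In this sketch, a run of the "specific URL provided" case that called both web_fetch and google_web_search would fail, which matches the "web_fetch, not search" expectation. Whether a real eval should be that strict is a design choice left to the implementer.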