fix(beta): persist server-side builtin tool calls in history#2695

Merged
Lancetnik merged 18 commits into ag2ai:main from vvlrff:fix/anthropic-builtin-tool-history on Apr 27, 2026
Conversation

@vvlrff vvlrff commented Apr 16, 2026

Summary

Provider-executed builtin tools (web search, web fetch, code execution, image generation, etc.) used to disappear from agent history. They were dispatched and executed server-side by the provider — but the client silently dropped the resulting blocks (ServerToolUseBlock, executable_code, code_execution_result, ResponseFunctionWebSearch, ImageGenerationCall, …). On a chained reply.ask(), the model had no record of its own prior server-side tool activity and would either hallucinate, repeat the call, or reject the follow-up because the assistant message it received back didn't match what it had emitted.

This PR introduces a uniform mechanism for capturing those server-side tool calls and results as typed events, persisting them through the agent's stream/history, and round-tripping them back into the provider's API on subsequent turns. It covers Anthropic, Google Gemini, and OpenAI Responses providers, and cleans up several related issues uncovered along the way.

Problem

  1. The client loops in all three providers only handled "regular" content blocks (text, thinking, function/tool calls). Server-side blocks were dropped on the floor:
    • Anthropic: ServerToolUseBlock, WebSearchToolResultBlock, WebFetchToolResultBlock, CodeExecutionToolResultBlock, BashCodeExecutionToolResultBlock, TextEditorCodeExecutionToolResultBlock.
    • Gemini: Part.executable_code, Part.code_execution_result, candidate.grounding_metadata (Google Search / URL Context).
    • OpenAI Responses: ResponseFunctionWebSearch, ResponseCodeInterpreterToolCall, ImageGenerationCall, ResponseReasoningItem.
  2. convert_messages() in each provider's mapper had no branches for these events, so even if they had been emitted they wouldn't be serialised back into the API on the next turn.
  3. Anthropic's pause_turn continuation loop appended intermediate responses to a local messages list without sending them through the stream, so history never saw them.
  4. Once BuiltinToolCallEvent (a ToolCallEvent subclass) started being emitted, the executor's _tool_not_found fallback fired for tools that actually ran on the server — producing spurious ToolNotFoundEvent entries in history.
  5. ShellTool on Anthropic was silently broken — Anthropic's bash tool is client-side, but ShellTool.register() was a no-op, so the executor raised ToolNotFoundError and the model hallucinated output.

Solution

Each provider gets a small, isolated events.py module that defines provider-specific subclasses of BuiltinToolCallEvent / BuiltinToolResultEvent. The events:

  • Carry the provider's native object (block, part, item) as a non-repr, non-default Field, so it can be re-serialised verbatim on the next turn (no lossy reconstruction).
  • Expose factory classmethods (from_block, from_item, from_executable_code, from_grounding, from_code_execution_result) that map provider-side names/IDs/inputs into the framework-level BuiltinToolCallEvent / BuiltinToolResultEvent shape.
  • Are emitted by the client through context.send() so they end up in the persistent stream / conversation history just like every other event.
  • Are recognised by each provider's convert_messages() and converted back into the provider-specific message format on subsequent turns.

The result: server-side tool activity becomes a first-class part of the conversation history and survives across reply.ask() chains, multi-turn flows, and persistent stream backends.
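The event shape described above can be sketched as follows. This is a minimal illustration using plain dataclasses in place of the framework's pydantic events; `FakeServerToolUseBlock` stands in for `anthropic.types.ServerToolUseBlock`, and the `_NAME_MAP` contents are illustrative, not the real mapping table.

```python
from dataclasses import dataclass, field
from typing import Any

WEB_SEARCH_TOOL_NAME = "web_search"  # illustrative canonical name


@dataclass
class BuiltinToolCallEvent:
    """Simplified stand-in for the framework-level event."""
    name: str
    call_id: str
    arguments: dict[str, Any]
    # The provider's native object, carried verbatim (and kept out of
    # repr) so it can be re-serialised losslessly on the next turn.
    raw: Any = field(default=None, repr=False)


@dataclass
class FakeServerToolUseBlock:
    """Stand-in for anthropic.types.ServerToolUseBlock."""
    id: str
    name: str
    input: dict[str, Any]


class AnthropicServerToolCallEvent(BuiltinToolCallEvent):
    _NAME_MAP = {"web_search": WEB_SEARCH_TOOL_NAME}

    @classmethod
    def from_block(cls, block: FakeServerToolUseBlock) -> "AnthropicServerToolCallEvent":
        # Map the provider-side name/id/input into the framework shape,
        # keeping the original block for round-tripping.
        return cls(
            name=cls._NAME_MAP.get(block.name, block.name),
            call_id=block.id,
            arguments=block.input,
            raw=block,
        )


event = AnthropicServerToolCallEvent.from_block(
    FakeServerToolUseBlock(id="srvtoolu_1", name="web_search", input={"query": "ag2"})
)
print(event.name, event.call_id)  # web_search srvtoolu_1
```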

Changes by Provider

Anthropic

New: autogen/beta/config/anthropic/events.py

  • AnthropicServerToolCallEvent(BuiltinToolCallEvent) — wraps ServerToolUseBlock. from_block() maps web_search / web_fetch / code_execution / bash_code_execution / text_editor_code_execution block names to canonical AG2 tool names (WEB_SEARCH_TOOL_NAME, WEB_FETCH_TOOL_NAME, CODE_EXECUTION_TOOL_NAME).
  • AnthropicServerToolResultEvent(BuiltinToolResultEvent) — wraps the union of *ToolResultBlock types via AnthropicServerToolResultBlockType. from_block() dispatches by isinstance to the canonical tool name.

Modified: autogen/beta/config/anthropic/anthropic_client.py

  • Imports ServerToolUseBlock and the result block types from anthropic.types.
  • New helper _emit_builtin_tool_events() — converts server-side blocks into typed events.
  • _process_response() and _process_stream() now handle server_tool_use and *_tool_result content blocks for both streaming and non-streaming paths.
  • The non-streaming pause_turn continuation loop now calls _emit_builtin_tool_events() on every intermediate response so the events show up in the stream (the streaming loop already routed through _process_stream()).
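The dispatch performed by the helper reduces to something like the sketch below. Block types and the `send` callback are simplified stand-ins for the real client/context API; only the branching idea is taken from the description above.

```python
from dataclasses import dataclass


@dataclass
class FakeBlock:
    type: str
    name: str = ""


def emit_builtin_tool_events(blocks, send):
    """Convert server-side blocks into (kind, block) events and send them."""
    for block in blocks:
        if block.type == "server_tool_use":
            # real code: context.send(AnthropicServerToolCallEvent.from_block(block))
            send(("call", block))
        elif block.type.endswith("_tool_result"):
            # real code: context.send(AnthropicServerToolResultEvent.from_block(block))
            send(("result", block))
        # text / thinking / client-side tool_use blocks follow existing paths


seen = []
emit_builtin_tool_events(
    [
        FakeBlock("text"),
        FakeBlock("server_tool_use", "web_search"),
        FakeBlock("web_search_tool_result"),
    ],
    seen.append,
)
print([kind for kind, _ in seen])  # ['call', 'result']
```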

Modified: autogen/beta/config/anthropic/mappers.py

  • convert_messages() now recognises AnthropicServerToolCallEvent / AnthropicServerToolResultEvent and re-serialises them via block.model_dump(exclude_none=True, mode="json"). Both attach to the same assistant message, matching Anthropic's expected wire format.
  • ShellToolSchema now raises UnsupportedToolError("shell", "anthropic") — Anthropic's bash is a client-side tool. Users should switch to LocalShellTool, which works with any provider via subprocess.
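The round-trip back into the API can be sketched like this. `dataclasses.asdict` stands in for pydantic's `block.model_dump(exclude_none=True, mode="json")`, and the event/block shapes are illustrative.

```python
from dataclasses import dataclass, asdict, field
from typing import Any


@dataclass
class ServerToolBlock:
    type: str
    id: str
    name: str
    input: dict[str, Any] = field(default_factory=dict)


@dataclass
class ServerToolEvent:
    block: ServerToolBlock  # native block carried verbatim by the event


def events_to_assistant_content(events):
    """Re-serialise captured server-tool blocks onto one assistant message,
    matching the provider's expected wire format."""
    return [asdict(ev.block) for ev in events]


content = events_to_assistant_content(
    [ServerToolEvent(ServerToolBlock("server_tool_use", "srvtoolu_1", "web_search", {"query": "ag2"}))]
)
print(content[0]["type"])  # server_tool_use
```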

Google Gemini

New: autogen/beta/config/gemini/events.py

  • GeminiServerToolCallEvent(BuiltinToolCallEvent) — wraps types.Part and/or types.GroundingMetadata. Two factories:
    • from_executable_code(part) — converts Part.executable_code (code interpreter calls) into a tool call with {"code": ..., "language": ...} arguments.
    • from_grounding(gm, name=...) — converts GroundingMetadata (Google Search / URL Context calls) into a tool call with {"queries": [...]} arguments. Generates a synthetic UUID since Gemini doesn't return an id.
  • GeminiServerToolResultEvent(BuiltinToolResultEvent) — same wrapping, with from_code_execution_result() and from_grounding() factories.
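The grounding factory can be sketched as below. `FakeGroundingMetadata` stubs `types.GroundingMetadata`, the returned dict stands in for the event, and the synthetic-id format is illustrative.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class FakeGroundingMetadata:
    """Stub for google.genai types.GroundingMetadata."""
    web_search_queries: list[str] = field(default_factory=list)


def from_grounding(gm: FakeGroundingMetadata, name: str) -> dict:
    # Gemini returns no id for grounding calls, so synthesise a UUID
    # to link the call event with its result event.
    return {
        "name": name,
        "call_id": str(uuid.uuid4()),
        "arguments": {"queries": list(gm.web_search_queries)},
    }


call = from_grounding(FakeGroundingMetadata(["ag2 release notes"]), name="web_search")
print(call["arguments"])  # {'queries': ['ag2 release notes']}
```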

Modified: autogen/beta/config/gemini/gemini_client.py

  • _process_response() and _process_stream() now walk every candidate's parts and emit:
    • A GeminiServerToolCallEvent for executable_code parts, immediately followed by a matching GeminiServerToolResultEvent when a code_execution_result part appears (linked via pending_code_call_id).
    • A grounding call/result pair when the candidate has grounding_metadata (deferred to the end of the stream so the final, fully-populated metadata is used).

Modified: autogen/beta/config/gemini/mappers.py

  • convert_messages() now recognises GeminiServerToolCallEvent / GeminiServerToolResultEvent. When the wrapped event carries a native types.Part, it's appended back to the previous model-role Content so the conversation history matches Gemini's expected shape.
  • New grounding_tool_name(gm) helper — chooses web_search vs web_fetch based on whether web_search_queries is set.
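The helper's logic reduces to a one-liner; `gm` is stubbed here and the string names follow the description above.

```python
def grounding_tool_name(gm) -> str:
    # populated web_search_queries => Google Search; otherwise URL Context
    return "web_search" if getattr(gm, "web_search_queries", None) else "web_fetch"


class GM:  # stub with queries set
    web_search_queries = ["ag2"]


print(grounding_tool_name(GM()))      # web_search
print(grounding_tool_name(object()))  # web_fetch
```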

OpenAI Responses

Modified: autogen/beta/config/openai/openai_responses_client.py

  • Removed ~135 lines of inline server-tool handling logic. The client now delegates to the typed events introduced in events.py (OpenAIServerToolCallEvent, OpenAIServerToolResultEvent, OpenAIReasoningEvent) via their from_item() factories — both for the non-streaming _process_response() and the streaming _process_stream().
  • ImageGenerationCall.result is still decoded into BinaryResult files alongside the typed event, preserving backwards compatibility for image outputs.

Modified: autogen/beta/config/openai/events.py

  • OpenAIServerToolCallEvent.from_item() now handles ResponseFunctionWebSearch, ResponseCodeInterpreterToolCall, and ImageGenerationCall uniformly.
  • OpenAIServerToolResultEvent.from_item() mirrors the dispatch.
  • OpenAIReasoningEvent carries the original ResponseReasoningItem so it can be replayed verbatim.

Modified: autogen/beta/config/openai/mappers.py

  • events_to_responses_input() recognises OpenAIReasoningEvent and OpenAIServerToolCallEvent and serialises them via message.item.model_dump(exclude_none=True, mode="json") — the Responses API accepts the same dict back as input.

Builtin Tools

All builtin tools that proxy to a server-executed provider tool now register a no-op sub_scope listener in their register() method, scoped to BuiltinToolCallEvent.name == <TOOL_NAME>:

  • code_execution.py, image_generation.py, mcp_server.py, memory.py, skills.py, web_fetch.py, shell.py

This consumes the synthesised builtin-tool-call event so the executor's _tool_not_found fallback no longer fires for tools that ran on the server. The function is module-level (no nested closures inside hot paths) per project conventions.

code_execution.py also adds a CodeExecutionVersions TypeAlias and registers the new code_execution_20260120 Anthropic version alongside the existing code_execution_20250825.

shell.py docstrings are rewritten to reflect that only OpenAI Responses API executes shell server-side; Anthropic users are pointed to LocalShellTool.

@github-actions github-actions Bot added the beta label Apr 16, 2026
@Lancetnik Lancetnik self-assigned this Apr 21, 2026
vvlrff added 4 commits April 26, 2026 15:13
…egrations

- Added `from_block` and `from_grounding` methods to `AnthropicServerToolCallEvent` and `AnthropicServerToolResultEvent` for improved event creation from tool use blocks.
- Introduced `GeminiServerToolCallEvent` and `GeminiServerToolResultEvent` classes to handle tool calls and results in the Gemini integration, including methods for creating events from executable code and grounding metadata.
- Updated `gemini_client.py` to process responses and streams, emitting appropriate tool call and result events.
- Enhanced `mappers.py` to support new event types and ensure proper conversion of messages.
- Removed unused imports and cleaned up event handling in OpenAI integration, streamlining the response processing logic.
- Added comprehensive tests for new event handling in both Anthropic and Gemini configurations, ensuring correct behavior for tool calls and results.
@vvlrff vvlrff marked this pull request as ready for review April 26, 2026 16:33
@vvlrff vvlrff changed the title fix(beta): persist Anthropic server-side builtin tool calls in history fix(beta): persist server-side builtin tool calls in history Apr 26, 2026
@Lancetnik Lancetnik enabled auto-merge April 27, 2026 20:40

codecov Bot commented Apr 27, 2026

Codecov Report

❌ Patch coverage is 72.57384% with 65 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
autogen/beta/config/anthropic/anthropic_client.py 20.83% 16 Missing and 3 partials ⚠️
...ogen/beta/config/openai/openai_responses_client.py 40.00% 13 Missing and 2 partials ⚠️
autogen/beta/config/anthropic/events.py 75.00% 5 Missing and 4 partials ⚠️
autogen/beta/config/openai/events.py 88.57% 2 Missing and 2 partials ⚠️
autogen/beta/tools/builtin/image_generation.py 25.00% 3 Missing ⚠️
autogen/beta/tools/builtin/mcp_server.py 25.00% 3 Missing ⚠️
autogen/beta/tools/builtin/memory.py 25.00% 3 Missing ⚠️
autogen/beta/tools/builtin/shell.py 25.00% 3 Missing ⚠️
autogen/beta/tools/builtin/web_fetch.py 25.00% 3 Missing ⚠️
autogen/beta/config/gemini/gemini_client.py 93.54% 0 Missing and 2 partials ⚠️
... and 1 more
Files with missing lines Coverage Δ
autogen/beta/config/anthropic/mappers.py 86.99% <100.00%> (+1.28%) ⬆️
autogen/beta/config/gemini/events.py 100.00% <100.00%> (ø)
autogen/beta/config/gemini/mappers.py 83.66% <100.00%> (+2.09%) ⬆️
autogen/beta/config/openai/mappers.py 82.81% <100.00%> (+0.95%) ⬆️
autogen/beta/tools/builtin/code_execution.py 88.46% <100.00%> (+0.46%) ⬆️
autogen/beta/tools/builtin/skills.py 96.42% <75.00%> (-3.58%) ⬇️
autogen/beta/config/gemini/gemini_client.py 72.99% <93.54%> (+15.98%) ⬆️
autogen/beta/tools/builtin/image_generation.py 93.18% <25.00%> (-4.38%) ⬇️
autogen/beta/tools/builtin/mcp_server.py 92.85% <25.00%> (-4.58%) ⬇️
autogen/beta/tools/builtin/memory.py 88.00% <25.00%> (-7.46%) ⬇️
... and 6 more

... and 52 files with indirect coverage changes


@Lancetnik Lancetnik added this pull request to the merge queue Apr 27, 2026
Merged via the queue into ag2ai:main with commit 52187d8 Apr 27, 2026
31 of 34 checks passed