fix(beta): persist server-side builtin tool calls in history #2695
Merged

Lancetnik merged 18 commits into ag2ai:main (Apr 27, 2026)
Conversation
Lancetnik reviewed on Apr 17, 2026
Lancetnik requested changes on Apr 19, 2026
…egrations

- Added `from_block` and `from_grounding` methods to `AnthropicServerToolCallEvent` and `AnthropicServerToolResultEvent` for improved event creation from tool use blocks.
- Introduced `GeminiServerToolCallEvent` and `GeminiServerToolResultEvent` classes to handle tool calls and results in the Gemini integration, including methods for creating events from executable code and grounding metadata.
- Updated `gemini_client.py` to process responses and streams, emitting appropriate tool call and result events.
- Enhanced `mappers.py` to support new event types and ensure proper conversion of messages.
- Removed unused imports and cleaned up event handling in OpenAI integration, streamlining the response processing logic.
- Added comprehensive tests for new event handling in both Anthropic and Gemini configurations, ensuring correct behavior for tool calls and results.
Lancetnik approved these changes on Apr 27, 2026
Summary
Provider-executed builtin tools (web search, web fetch, code execution, image generation, etc.) used to disappear from agent history. They were dispatched and executed server-side by the provider, but the client silently dropped the resulting blocks (ServerToolUseBlock, executable_code, code_execution_result, ResponseFunctionWebSearch, ImageGenerationCall, …). On a chained reply.ask(), the model had no record of its own prior server-side tool activity and would either hallucinate, repeat the call, or reject the follow-up because the assistant message it received back didn't match what it had emitted.

This PR introduces a uniform mechanism for capturing those server-side tool calls and results as typed events, persisting them through the agent's stream/history, and round-tripping them back into the provider's API on subsequent turns. It covers the Anthropic, Google Gemini, and OpenAI Responses providers, and cleans up several related issues uncovered along the way.
Problem

Each client silently dropped the provider's server-side tool blocks:

- Anthropic: ServerToolUseBlock, WebSearchToolResultBlock, WebFetchToolResultBlock, CodeExecutionToolResultBlock, BashCodeExecutionToolResultBlock, TextEditorCodeExecutionToolResultBlock.
- Gemini: Part.executable_code, Part.code_execution_result, candidate.grounding_metadata (Google Search / URL context).
- OpenAI Responses: ResponseFunctionWebSearch, ResponseCodeInterpreterToolCall, ImageGenerationCall, ResponseReasoningItem.

Beyond the dropped blocks:

- convert_messages() in each provider's mapper had no branches for these events, so even if they had been emitted they wouldn't have been serialised back into the API on the next turn.
- The pause_turn continuation loop appended intermediate responses to a local messages list without sending them through the stream, so history never saw them.
- Once BuiltinToolCallEvent (a ToolCallEvent subclass) started being emitted, the executor's _tool_not_found fallback fired for tools that actually ran on the server, producing spurious ToolNotFoundEvent entries in history.
- ShellTool on Anthropic was silently broken: Anthropic's bash tool is client-side, but ShellTool.register() was a no-op, so the executor raised ToolNotFoundError and the model hallucinated output.
Solution

Each provider gets a small, isolated events.py module that defines provider-specific subclasses of BuiltinToolCallEvent / BuiltinToolResultEvent. The events:

- carry the raw provider payload (block, part, item) as a non-repr, non-default Field, so it can be re-serialised verbatim on the next turn (no lossy reconstruction);
- provide factory classmethods (from_block, from_item, from_executable_code, from_grounding, from_code_execution_result) that map provider-side names/IDs/inputs into the framework-level BuiltinToolCallEvent / BuiltinToolResultEvent shape;
- are sent through context.send() so they end up in the persistent stream / conversation history just like every other event;
- are recognised by convert_messages() and converted back into the provider-specific message format on subsequent turns.

The result: server-side tool activity becomes a first-class part of the conversation history and survives across reply.ask() chains, multi-turn flows, and persistent stream backends.
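That event shape can be sketched roughly in plain Python. The class and constant names follow the PR description, but the concrete fields, the constant values, and the FakeServerToolUseBlock stub are assumptions for illustration, not the actual AG2 implementation:

```python
import uuid
from dataclasses import dataclass, field
from typing import Any

# Canonical tool-name constants; the string values here are assumed.
WEB_SEARCH_TOOL_NAME = "web_search"
CODE_EXECUTION_TOOL_NAME = "code_execution"


@dataclass
class BuiltinToolCallEvent:
    # Assumed framework-level shape: canonical name, call id, arguments.
    tool_name: str
    call_id: str
    arguments: dict


@dataclass
class FakeServerToolUseBlock:
    # Stand-in for a provider block such as anthropic.types.ServerToolUseBlock.
    id: str
    name: str
    input: dict


@dataclass
class ProviderServerToolCallEvent(BuiltinToolCallEvent):
    # Raw provider payload kept verbatim (excluded from repr) so the mapper
    # can re-serialise it on the next turn with no lossy reconstruction.
    block: Any = field(default=None, repr=False)

    # Map provider block names to canonical AG2 tool names.
    _NAME_MAP = {
        "web_search": WEB_SEARCH_TOOL_NAME,
        "code_execution": CODE_EXECUTION_TOOL_NAME,
        "bash_code_execution": CODE_EXECUTION_TOOL_NAME,
    }

    @classmethod
    def from_block(cls, block):
        return cls(
            tool_name=cls._NAME_MAP.get(block.name, block.name),
            call_id=block.id or str(uuid.uuid4()),
            arguments=dict(block.input),
            block=block,
        )
```

Keeping the raw block out of the repr keeps event logs readable while still letting the mapper replay the exact provider payload later.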
Changes by Provider

Anthropic
New: autogen/beta/config/anthropic/events.py

- AnthropicServerToolCallEvent(BuiltinToolCallEvent) — wraps ServerToolUseBlock. from_block() maps the web_search / web_fetch / code_execution / bash_code_execution / text_editor_code_execution block names to canonical AG2 tool names (WEB_SEARCH_TOOL_NAME, WEB_FETCH_TOOL_NAME, CODE_EXECUTION_TOOL_NAME).
- AnthropicServerToolResultEvent(BuiltinToolResultEvent) — wraps the union of *ToolResultBlock types via AnthropicServerToolResultBlockType; from_block() dispatches by isinstance to the canonical tool name.

Modified: autogen/beta/config/anthropic/anthropic_client.py

- Imports ServerToolUseBlock and the result block types from anthropic.types.
- New _emit_builtin_tool_events() converts server-side blocks into typed events.
- _process_response() and _process_stream() now handle server_tool_use and *_tool_result content blocks on both the streaming and non-streaming paths.
- The pause_turn continuation loop now calls _emit_builtin_tool_events() on every intermediate response so the events show up in the stream (the streaming loop already routed through _process_stream()).
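The continuation fix can be illustrated with a stripped-down, synchronous sketch. FakeResponse and the callables below are stand-ins for the real client internals, which the PR doesn't show in full:

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class FakeResponse:
    # Minimal stand-in for an Anthropic Messages API response.
    stop_reason: str
    content: List[Any]


def continue_paused_turn(
    first: FakeResponse,
    create_next: Callable[[], FakeResponse],
    send_event: Callable[[Any], None],
) -> FakeResponse:
    """Drain pause_turn continuations, emitting builtin tool events for
    every intermediate response. Previously the intermediate responses
    only went into a local messages list, so history never saw them."""
    response = first
    while response.stop_reason == "pause_turn":
        for block in response.content:
            # Stands in for _emit_builtin_tool_events() + context.send().
            send_event(block)
        response = create_next()
    return response
```

The essential point is that emission happens inside the loop, per intermediate response, rather than once on the final response.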
Modified: autogen/beta/config/anthropic/mappers.py

- convert_messages() now recognises AnthropicServerToolCallEvent / AnthropicServerToolResultEvent and re-serialises them via block.model_dump(exclude_none=True, mode="json"). Both attach to the same assistant message, matching Anthropic's expected wire format.
- ShellToolSchema now raises UnsupportedToolError("shell", "anthropic") because Anthropic's bash tool is client-side. Users should switch to LocalShellTool, which works with any provider via subprocess.
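The round-trip itself is small. Here it is sketched with a stub in place of the pydantic block model; only the model_dump(exclude_none=True, mode="json") call comes from the PR text, the rest is illustrative:

```python
from types import SimpleNamespace
from typing import Any, List


class FakeBlock:
    # Stub for an anthropic.types block. Real blocks are pydantic models
    # whose model_dump(exclude_none=True, mode="json") yields the wire dict.
    def __init__(self, **data: Any) -> None:
        self._data = data

    def model_dump(self, exclude_none: bool = False, mode: str = "python") -> dict:
        if exclude_none:
            return {k: v for k, v in self._data.items() if v is not None}
        return dict(self._data)


def assistant_content_from_events(events: List[Any]) -> List[dict]:
    # Re-serialise the stored provider blocks verbatim; all of them attach
    # to the same assistant message, matching Anthropic's wire format.
    return [ev.block.model_dump(exclude_none=True, mode="json") for ev in events]
```

Because the stored block is replayed verbatim, the assistant message sent back to the API matches what the model originally emitted.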
Google Gemini

New: autogen/beta/config/gemini/events.py

- GeminiServerToolCallEvent(BuiltinToolCallEvent) — wraps types.Part and/or types.GroundingMetadata. Two factories:
  - from_executable_code(part) converts Part.executable_code (code interpreter calls) into a tool call with {"code": ..., "language": ...} arguments.
  - from_grounding(gm, name=...) converts GroundingMetadata (Google Search / URL Context calls) into a tool call with {"queries": [...]} arguments, generating a synthetic UUID since Gemini doesn't return an id.
- GeminiServerToolResultEvent(BuiltinToolResultEvent) — same wrapping, with from_code_execution_result() and from_grounding() factories.
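A hedged sketch of the from_grounding factory, using a stub for GroundingMetadata; the field names mirror the PR text but the event shape is assumed:

```python
import uuid
from dataclasses import dataclass, field
from typing import Any, List, Optional


@dataclass
class FakeGroundingMetadata:
    # Stand-in for google.genai's types.GroundingMetadata.
    web_search_queries: Optional[List[str]] = None


@dataclass
class GeminiServerToolCallEventSketch:
    tool_name: str
    call_id: str
    arguments: dict
    grounding: Any = field(default=None, repr=False)

    @classmethod
    def from_grounding(cls, gm: FakeGroundingMetadata, name: str):
        # Gemini returns no call id for grounded searches, so a synthetic
        # UUID links the call event to its result event.
        return cls(
            tool_name=name,
            call_id=str(uuid.uuid4()),
            arguments={"queries": list(gm.web_search_queries or [])},
            grounding=gm,
        )
```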
Modified: autogen/beta/config/gemini/gemini_client.py

- _process_response() and _process_stream() now walk every candidate's parts and emit:
  - a GeminiServerToolCallEvent for executable_code parts, immediately followed by a matching GeminiServerToolResultEvent when a code_execution_result part appears (linked via pending_code_call_id);
  - call/result events for grounding_metadata (deferred to the end of the stream so the final, fully populated metadata is used).
Modified: autogen/beta/config/gemini/mappers.py

- convert_messages() now recognises GeminiServerToolCallEvent / GeminiServerToolResultEvent. When the wrapped event carries a native types.Part, it's appended back to the previous model-role Content so the conversation history matches Gemini's expected shape.
- New grounding_tool_name(gm) helper chooses web_search vs web_fetch based on whether web_search_queries is set.
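The helper's dispatch rule is simple enough to show directly. FakeGM is a stub and the constant values are assumptions:

```python
from dataclasses import dataclass
from typing import List, Optional

WEB_SEARCH_TOOL_NAME = "web_search"  # assumed value
WEB_FETCH_TOOL_NAME = "web_fetch"    # assumed value


@dataclass
class FakeGM:
    # Stub for types.GroundingMetadata.
    web_search_queries: Optional[List[str]] = None


def grounding_tool_name(gm: FakeGM) -> str:
    # Google Search grounding populates web_search_queries; URL Context
    # grounding does not, so its absence signals a web-fetch style call.
    if gm.web_search_queries:
        return WEB_SEARCH_TOOL_NAME
    return WEB_FETCH_TOOL_NAME
```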
OpenAI Responses

Modified: autogen/beta/config/openai/openai_responses_client.py

- Emits the typed events from events.py (OpenAIServerToolCallEvent, OpenAIServerToolResultEvent, OpenAIReasoningEvent) via their from_item() factories, both in the non-streaming _process_response() and the streaming _process_stream().
- ImageGenerationCall.result is still decoded into BinaryResult files alongside the typed event, preserving backwards compatibility for image outputs.
Modified: autogen/beta/config/openai/events.py

- OpenAIServerToolCallEvent.from_item() now handles ResponseFunctionWebSearch, ResponseCodeInterpreterToolCall, and ImageGenerationCall uniformly; OpenAIServerToolResultEvent.from_item() mirrors the dispatch.
- OpenAIReasoningEvent carries the original ResponseReasoningItem so it can be replayed verbatim.
Modified: autogen/beta/config/openai/mappers.py

- events_to_responses_input() recognises OpenAIReasoningEvent and OpenAIServerToolCallEvent and serialises them via message.item.model_dump(exclude_none=True, mode="json") — the Responses API accepts the same dict back as input.
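The dispatch can be sketched with stubs; only the model_dump call shape comes from the PR text, everything else (class and function names suffixed Sketch, the FakeItem stub) is illustrative:

```python
from dataclasses import dataclass
from typing import Any, List


class FakeItem:
    # Stub for an openai.types.responses item; real items are pydantic
    # models and model_dump(exclude_none=True, mode="json") yields the
    # dict the Responses API accepts back as input.
    def __init__(self, **data: Any) -> None:
        self._data = data

    def model_dump(self, exclude_none: bool = False, mode: str = "python") -> dict:
        if exclude_none:
            return {k: v for k, v in self._data.items() if v is not None}
        return dict(self._data)


@dataclass
class OpenAIReasoningEventSketch:
    item: FakeItem


@dataclass
class OpenAIServerToolCallEventSketch:
    item: FakeItem


def events_to_responses_input_sketch(messages: List[Any]) -> List[dict]:
    out: List[dict] = []
    for message in messages:
        if isinstance(message, (OpenAIReasoningEventSketch, OpenAIServerToolCallEventSketch)):
            # Replay the stored item verbatim on the next turn.
            out.append(message.item.model_dump(exclude_none=True, mode="json"))
        # ...other message kinds are handled elsewhere...
    return out
```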
Builtin Tools

All builtin tools that proxy to a server-executed provider tool now register a no-op sub_scope listener in their register() method, scoped to BuiltinToolCallEvent.name == <TOOL_NAME>: code_execution.py, image_generation.py, mcp_server.py, memory.py, skills.py, web_fetch.py, shell.py.

This consumes the synthesised builtin-tool-call event so the executor's _tool_not_found fallback no longer fires for tools that ran on the server. The listener function is module-level (no nested closures inside hot paths), per project conventions.

- code_execution.py also adds a CodeExecutionVersions TypeAlias and registers the new code_execution_20260120 Anthropic version alongside the existing code_execution_20250825.
- shell.py docstrings are rewritten to reflect that only the OpenAI Responses API executes shell server-side; Anthropic users are pointed to LocalShellTool.