Is your feature request related to a problem? Please describe.
Implement Gemini V2 Client (ModelClientV2)
Why Implement Gemini V2 Client?
The current Gemini client (api_type: "google") uses the legacy ModelClient interface, which returns flattened ChatCompletion responses. This limits access to rich multimodal content and advanced Gemini 3 features.
Current Limitations (V1 Client)
- Lost Rich Content: Thinking tokens, multimodal content (images, audio, video), and structured data are flattened or lost
- No Thinking Support: Gemini 3's thinking/reasoning tokens not fully accessible
- Limited Multimodal: Images, audio, and video content not preserved as structured blocks
- No Type Safety: Responses are untyped, making it easy to introduce bugs
- Inconsistent API: Different response formats across OpenAI, Anthropic, Gemini, and Bedrock clients
Benefits of V2 Client
1. Rich Content Preservation
   - All content types (text, images, audio, video, reasoning) preserved as typed ContentBlock objects
   - Thinking/reasoning tokens from Gemini 3 models fully accessible
   - No data loss during response transformation
2. Better Developer Experience

   ```python
   # V1 - manual parsing
   response = client.create(params)
   messages = client.message_retrieval(response)
   content = messages[0] if messages else ""

   # V2 - direct property access
   response = client.create(params)
   text = response.text                            # All text content
   reasoning = response.reasoning                  # Thinking tokens
   tool_calls = response.get_tool_calls()          # Tool calls as objects
   images = response.get_content_by_type("image")  # Image content blocks
   ```
3. Type Safety
   - Pydantic models with automatic validation
   - Typed content blocks (TextContent, ImageContent, ReasoningContent, etc.)
   - IDE autocomplete and type checking support
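   As a rough illustration of what the typed blocks could look like, here is a minimal sketch using dataclasses for brevity; the actual autogen classes are Pydantic models and their fields may differ:

   ```python
   from dataclasses import dataclass
   from typing import List, Literal, Union

   # Hypothetical shapes for the typed content blocks named above;
   # field names and defaults are assumptions, not the real autogen API.

   @dataclass
   class TextContent:
       text: str
       type: Literal["text"] = "text"

   @dataclass
   class ReasoningContent:
       reasoning: str
       type: Literal["reasoning"] = "reasoning"

   @dataclass
   class ImageContent:
       data: str  # e.g. base64-encoded image bytes
       mime_type: str = "image/png"
       type: Literal["image"] = "image"

   ContentBlock = Union[TextContent, ReasoningContent, ImageContent]

   def get_content_by_type(blocks: List[ContentBlock], block_type: str) -> List[ContentBlock]:
       """Filter a response's content blocks by their declared type."""
       return [b for b in blocks if b.type == block_type]
   ```

   With discriminated types like these, a type checker can narrow each block, which is what enables the IDE autocomplete mentioned above.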
4. Forward Compatibility
   - Unknown content types automatically handled via GenericContent
   - No code changes needed when Gemini adds new features
   - Future-proof architecture
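   A sketch of how that fallback could work: any response part whose type the client does not recognise is wrapped in GenericContent instead of being dropped. The names and fields here are assumptions for illustration, not the actual autogen implementation:

   ```python
   from dataclasses import dataclass, field
   from typing import Any, Dict

   KNOWN_TYPES = {"text", "image", "audio", "video", "reasoning"}

   @dataclass
   class GenericContent:
       """Fallback block that preserves an unknown content type verbatim."""
       type: str
       raw: Dict[str, Any] = field(default_factory=dict)

   def parse_part(part: Dict[str, Any]):
       """Map a raw response part to a typed block, or GenericContent if unknown."""
       part_type = part.get("type", "unknown")
       if part_type in KNOWN_TYPES:
           return part  # the real client would build TextContent, ImageContent, etc.
       return GenericContent(type=part_type, raw=part)
   ```

   Because unknown parts survive round-trip as GenericContent, a new Gemini content type degrades gracefully rather than raising or silently disappearing.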
5. Provider-Agnostic Format
   - Unified response format across OpenAI, Anthropic, Gemini, and Bedrock
   - Easier to switch between providers
   - Consistent developer experience
6. Backward Compatible
   - Can use create_v1_compatible() to get V1 format when needed
   - Works seamlessly with existing V1 clients in group chats
   - No breaking changes to existing code
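   The compatibility path essentially flattens typed V2 blocks back into the single string a V1 ChatCompletion-style client would return. A self-contained sketch of that flattening (create_v1_compatible() is named in this issue, but its exact behaviour is assumed here):

   ```python
   from typing import Dict, List

   def flatten_to_v1(blocks: List[Dict[str, str]]) -> str:
       """Join text blocks and drop non-text blocks, mimicking V1 flattening.

       Illustrative stand-in for the V1-compatible path; the real
       create_v1_compatible() in autogen may behave differently.
       """
       return "".join(b["text"] for b in blocks if b.get("type") == "text")
   ```

   Note that this is exactly the lossy step the V2 client avoids: reasoning and multimodal blocks disappear, which is acceptable only when a V1 consumer explicitly asks for the old format.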
Use Cases Enabled
- Thinking/Reasoning: Full access to Gemini 3's thinking tokens and reasoning capabilities
- Multimodal Applications: Proper handling of images, audio, and video content
- Structured Outputs: Better support for Pydantic models and JSON schemas
- Tool Calling: Rich access to tool call information with thought signatures
- Cost Tracking: Built-in cost calculation per response
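The per-response cost calculation amounts to multiplying token counts by per-million-token prices. A minimal sketch (the field names and prices are placeholders, not real Gemini pricing or the actual autogen API):

```python
def response_cost(prompt_tokens: int, completion_tokens: int,
                  price_in_per_m: float, price_out_per_m: float) -> float:
    """Cost in USD from token counts and per-million-token prices."""
    return (prompt_tokens * price_in_per_m
            + completion_tokens * price_out_per_m) / 1_000_000
```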
Migration Path
Migration is simple - just change api_type:

```python
# V1
{"api_type": "google", "model": "gemini-2.5-pro", ...}

# V2 (recommended) - all other configuration parameters remain identical
{"api_type": "gemini_v2", "model": "gemini-2.5-pro", ...}
```

Advanced Features (Gemini 3 Models)
```python
{
    "api_type": "gemini_v2",
    "model": "gemini-3-pro-preview",
    "thinking_level": "High",   # Options: "High", "Medium", "Low", "Minimal"
    "thinking_budget": 10000,   # Token budget (0 = disabled, -1 = automatic)
    "include_thoughts": True,   # Include thinking tokens in response
}
```

Implementation Status
✅ Completed: Gemini V2 client implementation with full ModelClientV2 protocol support
✅ Completed: Comprehensive unit tests
✅ Completed: Integration tests
✅ Completed: Documentation updates
✅ Completed: Backward compatibility layer
Related
- Bedrock V2 Client: Similar implementation pattern
- OpenAI V2 Client: Reference implementation
- ModelClientV2 Protocol: autogen/llm_clients/client_v2.py
Describe the solution you'd like
No response
Additional context
No response