[Feature Request]: Migrate to Gemini_v2 #2353

@priyansh4320

Description

Is your feature request related to a problem? Please describe.

Implement Gemini V2 Client (ModelClientV2)

Why Implement Gemini V2 Client?

The current Gemini client (api_type: "google") uses the legacy ModelClient interface, which returns flattened ChatCompletion responses. This limits access to rich multimodal content and advanced Gemini 3 features.

Current Limitations (V1 Client)

  1. Lost Rich Content: Thinking tokens, multimodal content (images, audio, video), and structured data are flattened or lost
  2. No Thinking Support: Gemini 3's thinking/reasoning tokens not fully accessible
  3. Limited Multimodal: Images, audio, and video content not preserved as structured blocks
  4. No Type Safety: Responses are untyped, making it easy to introduce bugs
  5. Inconsistent API: Different response formats across OpenAI, Anthropic, Gemini, and Bedrock clients

Benefits of V2 Client

  1. Rich Content Preservation

    • All content types (text, images, audio, video, reasoning) preserved as typed ContentBlock objects
    • Thinking/reasoning tokens from Gemini 3 models fully accessible
    • No data loss during response transformation
  2. Better Developer Experience

    V1 - Manual parsing

    response = client.create(params)
    messages = client.message_retrieval(response)
    content = messages[0] if messages else ""

    V2 - Direct property access

    response = client.create(params)
    text = response.text # All text content
    reasoning = response.reasoning # Thinking tokens
    tool_calls = response.get_tool_calls() # Tool calls as objects
    images = response.get_content_by_type("image") # Image content blocks
  3. Type Safety

    • Pydantic models with automatic validation
    • Typed content blocks (TextContent, ImageContent, ReasoningContent, etc.)
    • IDE autocomplete and type checking support
  4. Forward Compatibility

    • Unknown content types automatically handled via GenericContent
    • No code changes needed when Gemini adds new features
    • Future-proof architecture
  5. Provider-Agnostic Format

    • Unified response format across OpenAI, Anthropic, Gemini, and Bedrock
    • Easier to switch between providers
    • Consistent developer experience
  6. Backward Compatible

    • Can use create_v1_compatible() to get V1 format when needed
    • Works seamlessly with existing V1 clients in group chats
    • No breaking changes to existing code
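
As a rough illustration of the typed-block idea in the list above, the sketch below models a response as a list of typed content blocks with a GenericContent fallback. It is not the actual ModelClientV2 code: stdlib dataclasses stand in for the Pydantic models so the example has no dependencies, and SketchResponse, parse_block, KNOWN_BLOCKS, and every field name are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class TextContent:
    text: str
    type: str = "text"


@dataclass
class ReasoningContent:
    reasoning: str
    type: str = "reasoning"


@dataclass
class ImageContent:
    image_url: str
    type: str = "image"


@dataclass
class GenericContent:
    # Fallback for block types this sketch does not know about.
    type: str
    data: dict[str, Any] = field(default_factory=dict)


# Hypothetical registry mapping a block's "type" tag to its class.
KNOWN_BLOCKS = {"text": TextContent, "reasoning": ReasoningContent, "image": ImageContent}


def parse_block(raw: dict[str, Any]):
    """Build a typed block; unknown types are kept whole as GenericContent."""
    payload = dict(raw)  # copy so the caller's dict is not mutated
    block_type = payload.pop("type")
    cls = KNOWN_BLOCKS.get(block_type)
    if cls is None:
        return GenericContent(type=block_type, data=payload)
    return cls(**payload)


@dataclass
class SketchResponse:
    content: list

    @property
    def text(self) -> str:
        # Concatenate all text blocks, skipping every other block type.
        return "".join(b.text for b in self.content if isinstance(b, TextContent))

    @property
    def reasoning(self) -> str:
        # Concatenate all reasoning blocks.
        return "".join(b.reasoning for b in self.content if isinstance(b, ReasoningContent))

    def get_content_by_type(self, block_type: str) -> list:
        return [b for b in self.content if b.type == block_type]


raw_blocks = [
    {"type": "reasoning", "reasoning": "Recall the capital of France."},
    {"type": "text", "text": "Paris."},
    {"type": "hologram", "uri": "example://scene"},  # made-up future block type
]
response = SketchResponse([parse_block(b) for b in raw_blocks])
```

In this sketch the unknown "hologram" block survives parsing as a GenericContent instead of raising or being dropped, which is the forward-compatibility behavior the list describes.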

Use Cases Enabled

  • Thinking/Reasoning: Full access to Gemini 3's thinking tokens and reasoning capabilities
  • Multimodal Applications: Proper handling of images, audio, and video content
  • Structured Outputs: Better support for Pydantic models and JSON schemas
  • Tool Calling: Rich access to tool call information with thought signatures
  • Cost Tracking: Built-in cost calculation per response

Migration Path

To migrate, change api_type and leave the rest of the config as it is:

V1

{"api_type": "google", "model": "gemini-2.5-pro", ...}

V2 (recommended)

{"api_type": "gemini_v2", "model": "gemini-2.5-pro", ...} # All other configuration parameters remain identical

Advanced Features (Gemini 3 Models)

{
    "api_type": "gemini_v2",
    "model": "gemini-3-pro-preview",
    "thinking_level": "High", # Options: "High", "Medium", "Low", "Minimal"
    "thinking_budget": 10000, # Token budget (0 = disabled, -1 = automatic)
    "include_thoughts": True # Include thinking tokens in response
}

Implementation Status

Completed: Gemini V2 client implementation with full ModelClientV2 protocol support
Completed: Comprehensive unit tests
Completed: Integration tests
Completed: Documentation updates
Completed: Backward compatibility layer
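
For projects with many existing configs, the api_type switch and the Gemini 3 options shown under Migration Path could be applied with a small helper. This is a minimal sketch, not part of the client: to_gemini_v2 and check_thinking_options are hypothetical names, and rejecting a thinking_budget below -1 is an assumption, since the issue only defines 0 (disabled) and -1 (automatic).

```python
from copy import deepcopy

# Option values as listed in the issue's Gemini 3 example.
THINKING_LEVELS = {"High", "Medium", "Low", "Minimal"}


def to_gemini_v2(config: dict) -> dict:
    """Return a copy of a config that selects the V2 client.

    Only api_type is changed; every other key is carried over untouched.
    """
    migrated = deepcopy(config)
    if migrated.get("api_type") == "google":
        migrated["api_type"] = "gemini_v2"
    return migrated


def check_thinking_options(config: dict) -> None:
    """Raise ValueError if the thinking options look malformed (hypothetical check)."""
    level = config.get("thinking_level")
    if level is not None and level not in THINKING_LEVELS:
        raise ValueError(f"unknown thinking_level: {level!r}")
    budget = config.get("thinking_budget")
    # Assumption: anything below -1 is invalid (0 = disabled, -1 = automatic).
    if budget is not None and (not isinstance(budget, int) or budget < -1):
        raise ValueError(f"invalid thinking_budget: {budget!r}")


v1_config = {"api_type": "google", "model": "gemini-2.5-pro"}
v2_config = to_gemini_v2(v1_config)
check_thinking_options({**v2_config, "thinking_level": "High", "thinking_budget": 10000})
```

Returning a copy keeps the original V1 config usable, so both clients can be run side by side while a migration is being verified.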

Related

  • Bedrock V2 Client: Similar implementation pattern
  • OpenAI V2 Client: Reference implementation
  • ModelClientV2 Protocol: autogen/llm_clients/client_v2.py

Describe the solution you'd like

No response

Additional context

No response
