fix(gemini): correct streaming finish_reason for tool calls#21577
Gemini returns finishReason="STOP" even when tool calls are present, and sends tool_calls and finishReason in separate streaming chunks. The ModelResponseIterator now tracks tool_calls across chunks and correctly maps finish_reason to "tool_calls" per the OpenAI spec. Fixes BerriAI#21041
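A minimal sketch of the idea (the class and mapping table below are illustrative simplifications, not LiteLLM's actual `ModelResponseIterator`):

```python
# Gemini -> OpenAI finish_reason mapping (illustrative subset).
FINISH_REASON_MAP = {"STOP": "stop", "SAFETY": "content_filter", "MAX_TOKENS": "length"}

class ToolCallAwareIterator:
    """Tracks tool calls across streaming chunks so the finish_reason
    that arrives in a later chunk can be corrected to "tool_calls"."""

    def __init__(self):
        self.has_seen_tool_calls = False

    def map_finish_reason(self, gemini_reason, delta_has_tool_calls):
        if delta_has_tool_calls:
            self.has_seen_tool_calls = True
        mapped = FINISH_REASON_MAP.get(gemini_reason or "")
        # Gemini reports "STOP" even when the turn ended in a tool call;
        # the OpenAI spec requires "tool_calls" in that case.
        if mapped == "stop" and self.has_seen_tool_calls:
            return "tool_calls"
        return mapped
```

The key point is that the flag outlives a single chunk: the tool-call delta and the `finishReason` never appear in the same chunk, so per-chunk mapping alone cannot produce the right value.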
Greptile Summary

This PR fixes a streaming bug where Gemini's `finishReason="STOP"` was passed through unchanged even when the response contained tool calls.

Root cause: Gemini sends tool calls and `finishReason` in separate streaming chunks, so the per-chunk mapping never sees both together.

Solution: `ModelResponseIterator` now tracks tool calls across chunks with a `has_seen_tool_calls` flag and maps the final `finish_reason` to `"tool_calls"` per the OpenAI spec.

Tests: 4 new comprehensive mock-based tests covering all scenarios (tool calls, text-only, multiple tools, content filter). All changes are provider-specific and properly isolated within the Gemini handler.

Confidence Score: 5/5
| Filename | Overview |
|---|---|
| litellm/llms/vertex_ai/gemini/vertex_and_google_ai_studio_gemini.py | Adds has_seen_tool_calls flag and final chunk handling logic to correctly map finish_reason="tool_calls" when tool calls arrive in separate chunks from finishReason |
| tests/test_litellm/llms/vertex_ai/gemini/test_gemini_streaming_tool_call_finish_reason.py | New test file with 4 comprehensive mock-based tests covering tool call streaming, text-only streaming, multiple tool calls, and content filtering scenarios |
Flowchart
```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[Chunk arrives in ModelResponseIterator.chunk_parser] --> B{Process candidates via _process_candidates}
B --> C{Does candidate have 'content' field?}
C -->|Yes| D[Extract tool_calls, text, etc]
C -->|No| E[Skip candidate - returns empty choices]
D --> F{Does delta have tool_calls?}
F -->|Yes| G[Set has_seen_tool_calls = True]
F -->|No| H[Keep flag unchanged]
G --> I[_create_streaming_choice sets finish_reason via _check_finish_reason]
H --> I
E --> J{Is choices empty AND candidates exist?}
J -->|Yes| K{Check has_seen_tool_calls flag}
J -->|No| M[Return response]
K -->|True| L[Create choice with finish_reason='tool_calls']
K -->|False| N[Map finishReason normally via _check_finish_reason]
L --> M
N --> M
```
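The empty-choices fallback at the bottom of the flowchart can be sketched as follows (a simplified stand-in; `parse_final_chunk` and the dict shapes are assumptions, not LiteLLM's real types):

```python
def parse_final_chunk(candidates, has_seen_tool_calls):
    """Synthesize choices for candidates that carry only a finishReason.

    Candidates with a "content" field are handled on the normal path
    (delta / tool_calls extraction) and are skipped here.
    """
    choices = []
    for idx, cand in enumerate(candidates):
        if "content" in cand:
            continue
        reason = cand.get("finishReason")
        if reason is None:
            continue
        if has_seen_tool_calls:
            finish_reason = "tool_calls"    # override Gemini's "STOP"
        else:
            finish_reason = reason.lower()  # e.g. "STOP" -> "stop"
        choices.append({"index": idx, "delta": {}, "finish_reason": finish_reason})
    return choices
```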
Last reviewed commit: 5596545
Merged commit b2c7d2e into BerriAI:litellm_oss_staging_03_03_2026
Follow-up: …or gemini-3.1-flash-lite-preview

Models like gemini-3.1-flash-lite-preview send the final streaming chunk with empty content (`text: ""`) alongside `finishReason: "STOP"`, instead of omitting content entirely. The existing fix (PR BerriAI#21577) only handled chunks without content, so this case was missed. Now, after processing candidates, if tool_calls were seen in earlier chunks and a choice has `finish_reason="stop"`, it is overridden to `"tool_calls"` to match the OpenAI spec. Fixes BerriAI#22900
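That override pass can be sketched as a small post-processing step over the choices already produced for the chunk (the helper name and plain-dict choices are illustrative assumptions):

```python
def override_stop_after_tool_calls(choices, has_seen_tool_calls):
    """If earlier chunks carried tool calls, a final "stop" is wrong:
    per the OpenAI spec the stream must end with "tool_calls"."""
    if not has_seen_tool_calls:
        return choices
    for choice in choices:
        if choice.get("finish_reason") == "stop":
            choice["finish_reason"] = "tool_calls"
    return choices
```

Because this runs after candidate processing, it covers both shapes of final chunk: one with no `content` at all and one with an empty-text `content`.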
Relevant issues
Fixes #21041
Pre-Submission checklist

- I have added testing in the tests/litellm/ directory (adding at least 1 test is a hard requirement - see details)
- My PR passes all unit tests on make test-unit
- I have asked @greptileai for a review and received a Confidence Score of at least 4/5 before requesting a maintainer review

Type
🐛 Bug Fix
Changes
Gemini always returns `finishReason: "STOP"` regardless of whether the response contains tool calls. Per the OpenAI spec, `finish_reason` must be `"tool_calls"` when the model called a tool. LiteLLM already maps this correctly for non-streaming responses (PR #10485), but in streaming mode the `finishReason` arrives in a separate chunk without `content`, which `_process_candidates()` skips entirely, so the mapping never runs.

Fix (in `ModelResponseIterator`, Gemini handler layer):

- Add a `self.has_seen_tool_calls` flag that is set when any streaming choice contains `delta.tool_calls`
- When `_process_candidates` produces empty choices but a candidate has `finishReason`, create a `StreamingChoices` with the correct `finish_reason` (respecting prior tool_calls)

Tests (4 new, all mock-only, no network calls):

- Tool call streaming yields `finish_reason="tool_calls"` ✓
- Text-only streaming yields `finish_reason="stop"` ✓
- Multiple tool calls yield `finish_reason="tool_calls"` ✓
- `finish_reason="content_filter"` preserved ✓
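The first two scenarios can be illustrated with a self-contained, mock-based sketch (the simplified `stream_finish_reason` parser below is an assumption standing in for LiteLLM's iterator, not its real API):

```python
def stream_finish_reason(chunks):
    """Return the final finish_reason for a sequence of mock Gemini chunks."""
    has_seen_tool_calls = False
    finish_reason = None
    for chunk in chunks:
        cand = chunk["candidates"][0]
        parts = cand.get("content", {}).get("parts", [])
        if any("functionCall" in p for p in parts):
            has_seen_tool_calls = True
        raw = cand.get("finishReason")
        if raw == "STOP":
            finish_reason = "tool_calls" if has_seen_tool_calls else "stop"
        elif raw == "SAFETY":
            finish_reason = "content_filter"
    return finish_reason

def test_tool_call_stream():
    # Tool call in one chunk, finishReason in a later content-less chunk.
    chunks = [
        {"candidates": [{"content": {"parts": [{"functionCall": {"name": "get_weather", "args": {}}}]}}]},
        {"candidates": [{"finishReason": "STOP"}]},
    ]
    assert stream_finish_reason(chunks) == "tool_calls"

def test_text_only_stream():
    # Plain text stream: "STOP" must stay mapped to "stop".
    chunks = [
        {"candidates": [{"content": {"parts": [{"text": "hi"}]}}]},
        {"candidates": [{"finishReason": "STOP"}]},
    ]
    assert stream_finish_reason(chunks) == "stop"
```

Because the mock chunks are plain dicts, the tests exercise the cross-chunk tracking without any network calls.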