fix(vertex): streaming finish_reason='stop' instead of 'tool_calls' for gemini-3.1-flash-lite-preview #23895
Models like `gemini-3.1-flash-lite-preview` send the final streaming chunk with empty content (`text: ""`) alongside `finishReason: "STOP"`, instead of omitting content entirely. The existing fix (PR BerriAI#21577) only handled chunks without content, so this case was missed. Now, after processing candidates, if tool calls were seen in earlier chunks and a choice has `finish_reason="stop"`, it is overridden to `"tool_calls"` to match the OpenAI spec. Fixes BerriAI#22900
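The described override can be sketched as follows. The class and attribute names below are simplified stand-ins mirroring the PR description, not the actual litellm source:

```python
# Illustrative sketch of the fix described above -- simplified stand-ins,
# not the real litellm classes.

class StreamingChoice:
    def __init__(self, finish_reason=None, tool_calls=None, content=None):
        self.finish_reason = finish_reason
        self.tool_calls = tool_calls
        self.content = content

class ModelResponseIterator:
    def __init__(self):
        # Set once any earlier chunk carried tool calls.
        self.has_seen_tool_calls = False

    def process_chunk(self, choices):
        if any(c.tool_calls for c in choices):
            self.has_seen_tool_calls = True
        # The new post-processing step: Gemini reports STOP even when the
        # turn actually ended with a tool call, so remap to the OpenAI spec.
        if self.has_seen_tool_calls:
            for choice in choices:
                if choice.finish_reason == "stop":
                    choice.finish_reason = "tool_calls"
        return choices

it = ModelResponseIterator()
# Chunk 1: tool call, no finish reason yet.
it.process_chunk([StreamingChoice(tool_calls=[{"name": "get_weather"}])])
# Chunk 2: empty-content final chunk with finishReason "STOP".
(final,) = it.process_chunk([StreamingChoice(finish_reason="stop", content="")])
print(final.finish_reason)  # -> tool_calls
```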
Greptile Summary: This PR extends the existing Gemini streaming `finish_reason` fix to cover final chunks that carry empty content alongside `finishReason: "STOP"`. Key changes are summarized in the file table below.
Confidence Score: 3/5
| Filename | Overview |
|---|---|
| litellm/llms/vertex_ai/gemini/vertex_and_google_ai_studio_gemini.py | Adds a post-_process_candidates override to remap finish_reason="stop" → "tool_calls" when has_seen_tool_calls is True, covering the case where Gemini sends an empty-content final chunk alongside finishReason="STOP". Logic is correct and well-scoped; minor robustness concern around non-final chunks that happen to carry finish_reason="stop". |
| model_prices_and_context_window.json | Adds gpt-4-0314 model entry, but the entry incorrectly sets supports_tool_choice: true and supports_prompt_caching: true — both capabilities postdate the 0314 snapshot and could cause API errors for users of that model. |
| tests/test_litellm/llms/vertex_ai/gemini/test_gemini_streaming_tool_call_finish_reason.py | Adds a well-structured mock unit test (test_streaming_tool_call_finish_reason_with_empty_content_in_final_chunk) that directly exercises the new code path using chunk_parser without any network calls, consistent with the project's test-isolation requirements. |
Sequence Diagram
```mermaid
sequenceDiagram
    participant Gemini as Gemini API
    participant MRI as ModelResponseIterator
    participant PC as _process_candidates
    participant Client as LiteLLM Client
    Gemini->>MRI: Chunk 1: functionCall parts, no finishReason
    MRI->>PC: _process_candidates(candidates)
    PC-->>MRI: StreamingChoices(tool_calls=..., finish_reason=None)
    Note over MRI: has_seen_tool_calls = True
    MRI-->>Client: finish_reason=None, delta.tool_calls=[...]
    Gemini->>MRI: Chunk 2 (new case): parts=[{text:""}], finishReason=STOP
    MRI->>PC: _process_candidates(candidates)
    Note over PC: "content" key present → choice IS created<br/>finish_reason mapped to "stop"
    PC-->>MRI: StreamingChoices(finish_reason="stop", delta.content="")
    Note over MRI: NEW BLOCK: has_seen_tool_calls=True<br/>AND finish_reason=="stop"<br/>→ override to "tool_calls"
    MRI-->>Client: finish_reason="tool_calls" ✓
    Note over Gemini,Client: Previous fix (pre-PR): Chunk 2 had NO content at all
    Note over Gemini,Client: "content" key absent → _process_candidates skips → choices=[]<br/>→ existing block creates choice with finish_reason="tool_calls"
```
Comments Outside Diff (1)
- model_prices_and_context_window.json, lines 16865-16877: Likely incorrect metadata for `gpt-4-0314`

  The newly added `gpt-4-0314` entry sets `"supports_tool_choice": true` and `"supports_prompt_caching": true`, but both appear wrong for this model snapshot:

  - Function/tool calling was introduced by OpenAI with the `gpt-4-0613` snapshot. The `0314` snapshot predates that feature entirely, so `"supports_tool_choice": true` is incorrect.
  - Prompt caching for OpenAI was introduced for more recent models (e.g. `gpt-4o`), not legacy snapshots like `0314`.

  This looks like a copy-paste from `gpt-4-0613` (or a similar entry) without verifying which capabilities the `0314` snapshot actually exposes. Shipping incorrect metadata will cause litellm to attempt to use those features against a model that doesn't support them, potentially resulting in API errors for users. (Please verify the actual capabilities of `gpt-4-0314` against the OpenAI documentation before merging.)
Last reviewed commit: 0c28b47
```python
if self.has_seen_tool_calls:
    for choice in model_response.choices:
        if choice.finish_reason == "stop":
            choice.finish_reason = "tool_calls"
```
Override applies to every post-tool-call chunk, not just the final one
The new block overrides finish_reason == "stop" → "tool_calls" for every chunk processed after has_seen_tool_calls becomes True, not only for the final chunk.
In practice today this is safe because finish_reason is None on intermediate streaming chunks. However, it is an implicit assumption that could break if Gemini ever sends a non-terminal finishReason: "STOP" (e.g. for safety reasons) after a tool call chunk in the same streaming response. Adding a guard that the choice also has no meaningful delta content would make the intent explicit and more robust:
```python
if self.has_seen_tool_calls:
    for choice in model_response.choices:
        if (
            choice.finish_reason == "stop"
            and getattr(getattr(choice, "delta", None), "content", None) in (None, "")
            and not getattr(getattr(choice, "delta", None), "tool_calls", None)
        ):
            choice.finish_reason = "tool_calls"
```

Merged commit 1f5a67a into BerriAI:litellm_oss_staging_03_17_2026
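The guarded variant suggested by the reviewer can be exercised standalone. The sketch below uses `SimpleNamespace` stand-ins for the choice and delta objects (not litellm's real types) to show that it still rewrites the empty final chunk while leaving a content-bearing "stop" chunk alone:

```python
from types import SimpleNamespace

def remap_finish_reason(choices, has_seen_tool_calls):
    # Guarded version of the override from the review comment: only
    # rewrite "stop" when the delta carries no meaningful payload.
    if has_seen_tool_calls:
        for choice in choices:
            if (
                choice.finish_reason == "stop"
                and getattr(getattr(choice, "delta", None), "content", None) in (None, "")
                and not getattr(getattr(choice, "delta", None), "tool_calls", None)
            ):
                choice.finish_reason = "tool_calls"
    return choices

# Empty-content final chunk after a tool call: should be remapped.
empty_final = SimpleNamespace(
    finish_reason="stop", delta=SimpleNamespace(content="", tool_calls=None)
)
# Hypothetical non-terminal "stop" chunk that still carries text: untouched.
text_final = SimpleNamespace(
    finish_reason="stop", delta=SimpleNamespace(content="All done.", tool_calls=None)
)

remap_finish_reason([empty_final], has_seen_tool_calls=True)
remap_finish_reason([text_final], has_seen_tool_calls=True)
print(empty_final.finish_reason)  # -> tool_calls
print(text_final.finish_reason)   # -> stop
```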
Relevant issues
Fixes #22900
Pre-Submission checklist
- Tests added in the `tests/test_litellm/` directory
- `make test-unit` run locally
- Reviewed by `@greptileai`

Type
🐛 Bug Fix
Changes
Models like `gemini-3.1-flash-lite-preview` send the final streaming chunk with empty content (`parts: [{text: ""}]`) alongside `finishReason: "STOP"`, instead of omitting `content` entirely. The existing fix (#21577) only handled chunks without content, so this case was missed.

After `_process_candidates` runs, if `has_seen_tool_calls` is `True` and any choice has `finish_reason="stop"`, override it to `"tool_calls"`.
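The behavior can be unit-tested without network calls, in the spirit of the mock test this PR adds. `parse_chunk` below is a hypothetical toy stand-in for litellm's real chunk parser (not its actual API), operating on raw Gemini-style chunk dicts shaped like the description above:

```python
# Sketch of a mock unit test; `parse_chunk` is an illustrative stand-in,
# not litellm's real chunk_parser.

def parse_chunk(chunk, state):
    """Toy parser: tracks tool calls and applies the finish_reason override."""
    candidate = chunk["candidates"][0]
    parts = candidate.get("content", {}).get("parts", [])
    if any("functionCall" in p for p in parts):
        state["has_seen_tool_calls"] = True
    finish_reason = candidate.get("finishReason", "").lower() or None
    if finish_reason == "stop" and state["has_seen_tool_calls"]:
        finish_reason = "tool_calls"
    return finish_reason

def test_empty_content_final_chunk():
    state = {"has_seen_tool_calls": False}
    tool_chunk = {
        "candidates": [
            {"content": {"parts": [{"functionCall": {"name": "get_weather", "args": {}}}]}}
        ]
    }
    # New case: final chunk has empty text content plus finishReason STOP.
    final_chunk = {
        "candidates": [
            {"content": {"parts": [{"text": ""}]}, "finishReason": "STOP"}
        ]
    }
    assert parse_chunk(tool_chunk, state) is None
    assert parse_chunk(final_chunk, state) == "tool_calls"

test_empty_content_final_chunk()
print("ok")
```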