
fix(vertex): streaming finish_reason='stop' instead of 'tool_calls' for gemini-3.1-flash-lite-preview #23895

Merged
Chesars merged 1 commit into BerriAI:litellm_oss_staging_03_17_2026 from Chesars:fix/streaming-tool-call-finish-reason-empty-content
Mar 17, 2026

Conversation

Collaborator

@Chesars Chesars commented Mar 17, 2026

Relevant issues

Fixes #22900

Pre-Submission checklist

  • I have Added testing in the tests/test_litellm/ directory
  • My PR passes all unit tests on make test-unit
  • My PR's scope is as isolated as possible, it only solves 1 specific problem
  • I have requested a Greptile review by commenting @greptileai

Type

🐛 Bug Fix

Changes

Models like gemini-3.1-flash-lite-preview send the final streaming chunk with empty content (parts: [{text: ""}]) alongside finishReason: "STOP", instead of omitting content entirely. The existing fix (#21577) only handled chunks without content, so this case was missed.

After _process_candidates runs, if has_seen_tool_calls is True and any choice has finish_reason="stop", override it to "tool_calls".
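The override described above can be sketched as a small standalone function; `StreamingChoice` and `remap_finish_reason` here are simplified stand-ins for litellm's internal types, not the actual implementation:

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class StreamingChoice:
    finish_reason: Optional[str] = None

def remap_finish_reason(choices: List[StreamingChoice], has_seen_tool_calls: bool) -> List[StreamingChoice]:
    # If earlier chunks carried tool calls, a terminal "stop" really
    # means the model stopped to call tools; remap per the OpenAI spec.
    if has_seen_tool_calls:
        for choice in choices:
            if choice.finish_reason == "stop":
                choice.finish_reason = "tool_calls"
    return choices

# Final chunk: empty content plus finishReason "STOP", mapped to "stop".
choices = remap_finish_reason([StreamingChoice(finish_reason="stop")], has_seen_tool_calls=True)
print(choices[0].finish_reason)  # tool_calls
```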

fix(vertex): streaming finish_reason='stop' instead of 'tool_calls' for gemini-3.1-flash-lite-preview

Models like gemini-3.1-flash-lite-preview send the final streaming chunk
with empty content (text:"") alongside finishReason:"STOP", instead of
omitting content entirely. The existing fix (PR BerriAI#21577) only handled
chunks without content, so this case was missed.

Now, after processing candidates, if tool_calls were seen in earlier
chunks and a choice has finish_reason="stop", it is overridden to
"tool_calls" to match the OpenAI spec.

Fixes BerriAI#22900

vercel bot commented Mar 17, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
litellm Ready Ready Preview, Comment Mar 17, 2026 8:51pm


Contributor

greptile-apps bot commented Mar 17, 2026

Greptile Summary

This PR extends the existing Gemini streaming finish_reason fix to handle a second model behaviour: models like gemini-3.1-flash-lite-preview that send the final streaming chunk with empty content (parts: [{text: ""}]) together with finishReason: "STOP", instead of omitting content entirely. It also adds the gpt-4-0314 model to model_prices_and_context_window.json.

Key changes:

  • In ModelResponseIterator.chunk_parser, a new block (lines 3009-3012) checks, after _process_candidates has run, whether has_seen_tool_calls is True and overrides any finish_reason == "stop" to "tool_calls". This sits alongside the existing fix (lines 2981-3002) that handles the completely-content-less final chunk case.
  • A focused mock unit test is added to test_gemini_streaming_tool_call_finish_reason.py that directly exercises the new path via chunk_parser with no network calls.
  • gpt-4-0314 is added to model_prices_and_context_window.json, but the entry carries supports_tool_choice: true and supports_prompt_caching: true — both of which are incorrect for this pre-function-calling legacy snapshot and should be removed before merging.
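The test strategy described above (exercising the override directly, without network calls) can be sketched with an illustrative mini-parser; `parse_final_chunk` is a hypothetical stand-in for `chunk_parser`, not litellm's real method:

```python
def parse_final_chunk(chunk: dict, has_seen_tool_calls: bool) -> str:
    # Illustrative mini-parser: maps Gemini's finishReason to an
    # OpenAI-style finish_reason and applies the PR's override.
    finish = chunk["candidates"][0].get("finishReason")
    finish_reason = "stop" if finish == "STOP" else None
    if has_seen_tool_calls and finish_reason == "stop":
        finish_reason = "tool_calls"
    return finish_reason

# The problematic final chunk: an empty text part plus finishReason STOP.
chunk = {"candidates": [{"content": {"parts": [{"text": ""}]},
                         "finishReason": "STOP"}]}
print(parse_final_chunk(chunk, has_seen_tool_calls=True))  # tool_calls
```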

Confidence Score: 3/5

  • The streaming fix is correct, but the gpt-4-0314 JSON entry ships with incorrect capability flags that could cause API errors.
  • The core streaming logic change is minimal, well-understood, and backed by a new mock test. The risk is the unrelated model_prices_and_context_window.json addition for gpt-4-0314, which incorrectly marks the model as supporting tool choice and prompt caching — features that didn't exist in OpenAI's API when the 0314 snapshot was published. That metadata error lowers confidence.
  • model_prices_and_context_window.json — the gpt-4-0314 entry needs capability flags verified and likely corrected before merge.

Important Files Changed

Filename Overview
litellm/llms/vertex_ai/gemini/vertex_and_google_ai_studio_gemini.py Adds a post-_process_candidates override to remap finish_reason="stop" → "tool_calls" when has_seen_tool_calls is True, covering the case where Gemini sends an empty-content final chunk alongside finishReason="STOP". Logic is correct and well-scoped; minor robustness concern around non-final chunks that happen to carry finish_reason="stop".
model_prices_and_context_window.json Adds gpt-4-0314 model entry, but the entry incorrectly sets supports_tool_choice: true and supports_prompt_caching: true — both capabilities postdate the 0314 snapshot and could cause API errors for users of that model.
tests/test_litellm/llms/vertex_ai/gemini/test_gemini_streaming_tool_call_finish_reason.py Adds a well-structured mock unit test (test_streaming_tool_call_finish_reason_with_empty_content_in_final_chunk) that directly exercises the new code path using chunk_parser without any network calls, consistent with the project's test-isolation requirements.

Sequence Diagram

sequenceDiagram
    participant Gemini as Gemini API
    participant MRI as ModelResponseIterator
    participant PC as _process_candidates
    participant Client as LiteLLM Client

    Gemini->>MRI: Chunk 1: functionCall parts, no finishReason
    MRI->>PC: _process_candidates(candidates)
    PC-->>MRI: StreamingChoices(tool_calls=..., finish_reason=None)
    Note over MRI: has_seen_tool_calls = True
    MRI-->>Client: finish_reason=None, delta.tool_calls=[...]

    Gemini->>MRI: Chunk 2 (new case): parts=[{text:""}], finishReason=STOP
    MRI->>PC: _process_candidates(candidates)
    Note over PC: "content" key present → choice IS created<br/>finish_reason mapped to "stop"
    PC-->>MRI: StreamingChoices(finish_reason="stop", delta.content="")
    Note over MRI: NEW BLOCK: has_seen_tool_calls=True<br/>AND finish_reason=="stop"<br/>→ override to "tool_calls"
    MRI-->>Client: finish_reason="tool_calls" ✓

    Note over Gemini,Client: Previous fix (pre-PR): Chunk 2 had NO content at all
    Note over Gemini,Client: "content" key absent → _process_candidates skips → choices=[]<br/>→ existing block creates choice with finish_reason="tool_calls"

Comments Outside Diff (1)

  1. model_prices_and_context_window.json, line 16865-16877 (link)

    P1 Likely incorrect metadata for gpt-4-0314

    The newly added gpt-4-0314 entry sets "supports_tool_choice": true and "supports_prompt_caching": true, but both appear wrong for this model snapshot:

    • Function/tool calling was introduced by OpenAI with the gpt-4-0613 snapshot. The 0314 snapshot predates that feature entirely, so "supports_tool_choice": true is incorrect.
    • Prompt caching for OpenAI was introduced for more recent models (e.g. gpt-4o), not legacy snapshots like 0314.

    This looks like a copy-paste from gpt-4-0613 (or a similar entry) without verifying which capabilities the 0314 snapshot actually exposes. Shipping incorrect metadata will cause litellm to attempt to use those features against a model that doesn't support them, potentially resulting in API errors for users.

    (Please verify the actual capabilities of gpt-4-0314 against the OpenAI documentation before merging.)
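For reference, a corrected entry with the disputed flags removed might look roughly like the sketch below. The token limits and per-token prices shown are the commonly cited figures for the original gpt-4 8k snapshot, not values verified against this PR or OpenAI's current documentation:

```json
"gpt-4-0314": {
    "max_tokens": 4096,
    "max_input_tokens": 8192,
    "max_output_tokens": 4096,
    "input_cost_per_token": 0.00003,
    "output_cost_per_token": 0.00006,
    "litellm_provider": "openai",
    "mode": "chat",
    "supports_function_calling": false,
    "supports_tool_choice": false,
    "supports_prompt_caching": false
}
```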

Last reviewed commit: 0c28b47

Comment on lines +3009 to +3012
if self.has_seen_tool_calls:
    for choice in model_response.choices:
        if choice.finish_reason == "stop":
            choice.finish_reason = "tool_calls"
Contributor


P2 Override applies to every post-tool-call chunk, not just the final one

The new block overrides finish_reason == "stop" to "tool_calls" for every chunk processed after has_seen_tool_calls becomes True, not only for the final chunk.

In practice today this is safe because finish_reason is None on intermediate streaming chunks. However, it is an implicit assumption that could break if Gemini ever sends a non-terminal finishReason: "STOP" (e.g. for safety reasons) after a tool call chunk in the same streaming response. Adding a guard that the choice also has no meaningful delta content would make the intent explicit and more robust:

if self.has_seen_tool_calls:
    for choice in model_response.choices:
        if (
            choice.finish_reason == "stop"
            and getattr(getattr(choice, "delta", None), "content", None) in (None, "")
            and not getattr(getattr(choice, "delta", None), "tool_calls", None)
        ):
            choice.finish_reason = "tool_calls"
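The guarded variant suggested above can be checked with minimal stand-in objects (`Delta`, `Choice`, and `remap_guarded` are illustrative, not litellm's actual classes): a "stop" chunk carrying real text is left alone, while an empty one is remapped.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Delta:
    content: Optional[str] = None
    tool_calls: Optional[list] = None

@dataclass
class Choice:
    finish_reason: Optional[str] = None
    delta: Delta = field(default_factory=Delta)

def remap_guarded(choices: List[Choice], has_seen_tool_calls: bool) -> List[Choice]:
    # Only flip "stop" -> "tool_calls" when the chunk is effectively empty,
    # so a hypothetical non-terminal "stop" with real content is untouched.
    if has_seen_tool_calls:
        for choice in choices:
            if (
                choice.finish_reason == "stop"
                and choice.delta.content in (None, "")
                and not choice.delta.tool_calls
            ):
                choice.finish_reason = "tool_calls"
    return choices

empty = Choice(finish_reason="stop", delta=Delta(content=""))
textual = Choice(finish_reason="stop", delta=Delta(content="hello"))
remap_guarded([empty, textual], has_seen_tool_calls=True)
print(empty.finish_reason, textual.finish_reason)  # tool_calls stop
```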

Contributor

codspeed-hq bot commented Mar 17, 2026

Merging this PR will not alter performance

✅ 16 untouched benchmarks


Comparing Chesars:fix/streaming-tool-call-finish-reason-empty-content (0c28b47) with litellm_oss_staging_03_17_2026 (b0db75d) [1]

Open in CodSpeed

Footnotes

  1. No successful run was found on litellm_oss_staging_03_17_2026 (278c9ba) during the generation of this report, so c693800 was used instead as the comparison base. There might be some changes unrelated to this pull request in this report.

@Chesars Chesars merged commit 1f5a67a into BerriAI:litellm_oss_staging_03_17_2026 Mar 17, 2026
38 of 39 checks passed
@Chesars Chesars deleted the fix/streaming-tool-call-finish-reason-empty-content branch March 17, 2026 21:49

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: vertex_ai/gemini-3.1-flash-lite-preview returns "finish_reason": "stop" instead of "tool_calls" when using streaming

1 participant