
[Bug]: vertex_ai/gemini-3.1-flash-lite-preview returns "finish_reason": "stop" instead of "tool_calls" when using streaming #22900

@mvrodrig

Check for existing issues

  • I have searched the existing issues and checked that my issue is not a duplicate.

What happened?

Hi @krrishdholakia, @ishaan-jaff, @Chesars!

This is the same issue reported in #21041, #12249, and others: when streaming a request that includes function tools, the chunk that closes the choice reports "finish_reason": "stop" instead of "tool_calls". This breaks agentic workflows that rely on the finish reason to detect a completed tool call.
This time the affected model is:

vertex_ai/gemini-3.1-flash-lite-preview

Steps to Reproduce

  1. Send the following request with curl:
curl --request POST \
  --url http://localhost:4000/v1/chat/completions \
  --header 'Content-Type: application/json' \
  --data '{
    "stream": true,
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "What'\''s the weather like in Lima, Peru today? celsius"
      }
    ],
    "model": "vertex_ai/gemini-3.1-flash-lite-preview",
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "get_weather",
          "description": "Retrieve current weather for a specific location",
          "parameters": {
            "type": "object",
            "properties": {
              "location": {
                "type": "string",
                "description": "City and country, e.g., Lima, Peru"
              },
              "unit": {
                "type": "string",
                "enum": ["celsius", "fahrenheit"],
                "description": "Temperature unit"
              }
            },
            "required": ["location"]
          }
        }
      }
    ],
    "stream_options": {
      "include_usage": true
    }
  }'

Expected behavior
When the model decides to invoke a tool, the streaming chunk that carries the finish reason should return "finish_reason": "tool_calls".

Actual behavior
The chunk that carries the finish reason returns "finish_reason": "stop", even though the model is invoking a tool. After it comes only the usage chunk requested via stream_options.include_usage. As a result, agentic frameworks do not detect the completed tool call and never invoke the function.
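Until this is fixed upstream, a client can guard against the mismatch by tracking whether any tool-call delta was streamed. The following is a minimal sketch of that idea, not LiteLLM's fix: `effective_finish_reason` is a hypothetical helper name, and the tool-call delta in `sample_chunks` is an illustrative chunk in the OpenAI streaming shape (this issue's log only shows the closing chunks).

```python
from typing import Iterable, Optional


def effective_finish_reason(chunks: Iterable[dict]) -> Optional[str]:
    """Return the finish reason a client could act on for choice 0.

    If a tool-call delta was streamed but the reported reason is "stop",
    report "tool_calls" instead (client-side workaround only).
    """
    saw_tool_call = False
    reported = None
    for chunk in chunks:
        for choice in chunk.get("choices", []):
            if choice.get("index", 0) != 0:
                continue
            delta = choice.get("delta") or {}
            if delta.get("tool_calls"):
                saw_tool_call = True
            if choice.get("finish_reason"):
                reported = choice["finish_reason"]
    if reported == "stop" and saw_tool_call:
        return "tool_calls"
    return reported


# Hypothetical stream: the first chunk is illustrative, the last two mirror
# the shape of the closing chunks reported in this issue.
sample_chunks = [
    {"choices": [{"index": 0, "delta": {"tool_calls": [{
        "index": 0,
        "id": "call_0",  # assumed id, for illustration only
        "type": "function",
        "function": {
            "name": "get_weather",
            "arguments": '{"location": "Lima, Peru", "unit": "celsius"}',
        },
    }]}}]},
    {"choices": [{"finish_reason": "stop", "index": 0, "delta": {}}]},
    {"choices": [{"index": 0, "delta": {}}]},
]

# What a framework that trusts the proxy would see.
naive_reason = sample_chunks[1]["choices"][0]["finish_reason"]
print(naive_reason, effective_finish_reason(sample_chunks))
```

A framework that branches only on the reported value takes the "stop" path here and never dispatches `get_weather`; the guard restores the "tool_calls" path without changing streams that contain no tool-call deltas.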

Thanks!

Relevant log output

data: {"id":"wHKpaYD4MrGAitYPuLXfuQw","created":1772712643,"model":"vertex_ai/gemini-3.1-flash-lite-preview","object":"chat.completion.chunk","choices":[{"finish_reason":"stop","index":0,"delta":{}}]}

data: {"id":"wHKpaYD4MrGAitYPuLXfuQw","created":1772712643,"model":"vertex_ai/gemini-3.1-flash-lite-preview","object":"chat.completion.chunk","choices":[{"index":0,"delta":{}}],"usage":{"completion_tokens":142,"prompt_tokens":66,"total_tokens":208,"completion_tokens_details":{"reasoning_tokens":117,"text_tokens":25},"prompt_tokens_details":{"text_tokens":66}}}

data: [DONE]
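To check a captured stream for this symptom without reading the JSON by eye, the `data:` lines can be parsed and their finish reasons collected. This is a small sketch that assumes the stream was saved as text; `finish_reasons` is an illustrative helper name, and `RAW_STREAM` holds the log lines above verbatim.

```python
import json

# Log lines copied from the output above.
RAW_STREAM = r'''data: {"id":"wHKpaYD4MrGAitYPuLXfuQw","created":1772712643,"model":"vertex_ai/gemini-3.1-flash-lite-preview","object":"chat.completion.chunk","choices":[{"finish_reason":"stop","index":0,"delta":{}}]}

data: {"id":"wHKpaYD4MrGAitYPuLXfuQw","created":1772712643,"model":"vertex_ai/gemini-3.1-flash-lite-preview","object":"chat.completion.chunk","choices":[{"index":0,"delta":{}}],"usage":{"completion_tokens":142,"prompt_tokens":66,"total_tokens":208,"completion_tokens_details":{"reasoning_tokens":117,"text_tokens":25},"prompt_tokens_details":{"text_tokens":66}}}

data: [DONE]
'''


def finish_reasons(raw: str) -> list:
    """Collect every non-null finish_reason from server-sent event lines."""
    reasons = []
    for line in raw.splitlines():
        if not line.startswith("data: "):
            continue  # skip blank separator lines
        payload = line[len("data: "):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            reason = choice.get("finish_reason")
            if reason is not None:
                reasons.append(reason)
    return reasons


print(finish_reasons(RAW_STREAM))  # ['stop']
```

For a request where the model invokes a tool, the expected result would be `['tool_calls']`.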

What part of LiteLLM is this about?

Proxy

What LiteLLM version are you on?

stable v1.81.12

