Check for existing issues
What happened?
Hi @krrishdholakia, @ishaan-jaff, @Chesars!
This is the same issue reported in #21041, #12249 and others: when using streaming with function tools, the final chunk ends with "finish_reason": "stop" instead of "tool_calls". This breaks agentic workflows that rely on detecting tool call completions.
This time the affected model is:
vertex_ai/gemini-3.1-flash-lite-preview
Steps to Reproduce
- Test with the following curl:
curl --request POST \
  --url http://localhost:4000/v1/chat/completions \
  --header 'Content-Type: application/json' \
  --data '{
    "stream": true,
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "What'\''s the weather like in Lima, Peru today? celsius"
      }
    ],
    "model": "vertex_ai/gemini-3.1-flash-lite-preview",
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "get_weather",
          "description": "Retrieve current weather for a specific location",
          "parameters": {
            "type": "object",
            "properties": {
              "location": {
                "type": "string",
                "description": "City and country, e.g., Lima, Peru"
              },
              "unit": {
                "type": "string",
                "enum": ["celsius", "fahrenheit"],
                "description": "Temperature unit"
              }
            },
            "required": ["location"]
          }
        }
      }
    ],
    "stream_options": {
      "include_usage": true
    }
  }'
Expected behavior
The final streaming chunk should return "finish_reason": "tool_calls" when the model decides to invoke a tool.
Actual behavior
The final chunk returns "finish_reason": "stop", even though the model is clearly attempting to use tool calls. This prevents agentic frameworks from detecting tool call completions and correctly invoking the functions.
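For anyone hitting this in the meantime, a client-side workaround could look roughly like the sketch below. This is not LiteLLM's fix and not part of its API: `effective_finish_reason` and `sample_chunks` are hypothetical names, and the sketch assumes the tool call fragments arrive in the OpenAI-style `delta.tool_calls` field before the final chunk.

```python
from typing import Any, Dict, Iterable, Optional


def effective_finish_reason(chunks: Iterable[Dict[str, Any]]) -> Optional[str]:
    """Return the finish reason for choice 0.

    If the stream reports "stop" but an earlier delta carried tool call
    fragments, report "tool_calls" instead (hypothetical workaround).
    """
    saw_tool_call = False
    finish_reason = None
    for chunk in chunks:
        for choice in chunk.get("choices", []):
            if choice.get("index", 0) != 0:
                continue
            delta = choice.get("delta") or {}
            if delta.get("tool_calls"):
                saw_tool_call = True
            if choice.get("finish_reason") is not None:
                finish_reason = choice["finish_reason"]
    if finish_reason == "stop" and saw_tool_call:
        return "tool_calls"
    return finish_reason


# Hypothetical chunks for illustration only: the first one is NOT taken from
# the log output, it just mimics an OpenAI-style tool call delta.
sample_chunks = [
    {
        "choices": [
            {
                "index": 0,
                "delta": {
                    "tool_calls": [
                        {
                            "index": 0,
                            "id": "call_0",
                            "type": "function",
                            "function": {
                                "name": "get_weather",
                                "arguments": '{"location": "Lima, Peru"}',
                            },
                        }
                    ]
                },
            }
        ]
    },
    {"choices": [{"index": 0, "delta": {}, "finish_reason": "stop"}]},
]

print(effective_finish_reason(sample_chunks))
```

A plain text stream with no tool call deltas still comes back as "stop", so the override only applies when a tool call was actually streamed.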
Thanks!
Relevant log output
data: {"id":"wHKpaYD4MrGAitYPuLXfuQw","created":1772712643,"model":"vertex_ai/gemini-3.1-flash-lite-preview","object":"chat.completion.chunk","choices":[{"finish_reason":"stop","index":0,"delta":{}}]}
data: {"id":"wHKpaYD4MrGAitYPuLXfuQw","created":1772712643,"model":"vertex_ai/gemini-3.1-flash-lite-preview","object":"chat.completion.chunk","choices":[{"index":0,"delta":{}}],"usage":{"completion_tokens":142,"prompt_tokens":66,"total_tokens":208,"completion_tokens_details":{"reasoning_tokens":117,"text_tokens":25},"prompt_tokens_details":{"text_tokens":66}}}
data: [DONE]
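To check the finish reason without reading the raw stream by eye, the log lines above can be parsed with a few lines of Python. This is only a minimal sketch using the standard library: `iter_sse_chunks` is a hypothetical helper, and it assumes each event is a single `data: <json>` line terminated by `data: [DONE]`.

```python
import json
from typing import Any, Dict, Iterable, Iterator

# The last lines of the stream, copied from the log output in this issue.
LOG = """
data: {"id":"wHKpaYD4MrGAitYPuLXfuQw","created":1772712643,"model":"vertex_ai/gemini-3.1-flash-lite-preview","object":"chat.completion.chunk","choices":[{"finish_reason":"stop","index":0,"delta":{}}]}
data: {"id":"wHKpaYD4MrGAitYPuLXfuQw","created":1772712643,"model":"vertex_ai/gemini-3.1-flash-lite-preview","object":"chat.completion.chunk","choices":[{"index":0,"delta":{}}],"usage":{"completion_tokens":142,"prompt_tokens":66,"total_tokens":208,"completion_tokens_details":{"reasoning_tokens":117,"text_tokens":25},"prompt_tokens_details":{"text_tokens":66}}}
data: [DONE]
"""


def iter_sse_chunks(lines: Iterable[str]) -> Iterator[Dict[str, Any]]:
    """Yield the decoded JSON payload of each `data:` line until [DONE]."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank lines and anything that is not a data event
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        yield json.loads(payload)


chunks = list(iter_sse_chunks(LOG.splitlines()))

# Collect every non-null finish_reason reported across all choices.
finish_reasons = [
    choice["finish_reason"]
    for chunk in chunks
    for choice in chunk.get("choices", [])
    if choice.get("finish_reason")
]

# The usage block arrives in a separate chunk because include_usage is set.
usage = next((c["usage"] for c in chunks if "usage" in c), None)

print(finish_reasons)
print(usage["total_tokens"] if usage else None)
```

Run against the log above, the only finish reason reported is "stop", which is the behavior described in this issue.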
What part of LiteLLM is this about?
Proxy
What LiteLLM version are you on?
stable v1.81.12