
fix: preserve role='assistant' in Azure streaming with include_usage#24354

Merged
Chesars merged 1 commit into BerriAI:litellm_staging_03_22_2026 from Chesars:fix/azure-streaming-role-include-usage
Mar 22, 2026

Conversation

Chesars (Collaborator) commented Mar 22, 2026

Relevant issues

Fixes #24221

Pre-Submission checklist

  • I have added testing in the tests/test_litellm/ directory. Adding at least 1 test is a hard requirement (see details).
  • My PR passes all unit tests on make test-unit
  • My PR's scope is as isolated as possible, it only solves 1 specific problem
  • I have requested a Greptile review by commenting @greptileai and received a Confidence Score of at least 4/5 before requesting a maintainer review

Type

🐛 Bug Fix

Changes

When stream_options.include_usage=True, Azure sends an initial chunk with choices=[] (prompt_filter_results) before the first content chunk. LiteLLM inflated this empty-choices chunk with a default StreamingChoices, which:

  1. Consumed the sent_first_chunk flag
  2. Caused strip_role_from_delta to strip the role from the real first chunk
  3. Let is_chunk_non_empty discard the real first chunk (role='assistant', content='') as "empty"

Net result: no chunk ever contained role='assistant'.

Fix (4 files):

  • streaming_handler.py - chunk_creator: Set model_response.choices = [] for chunks without choices, forwarding them faithfully instead of inflating with a default StreamingChoices
  • streaming_handler.py - is_chunk_non_empty: Treat chunks with role in delta as non-empty (the first chunk with role='assistant' and content='' is a valid OpenAI chunk)
  • streaming_handler.py - __next__/__anext__: Guard choices[0] access and only mark sent_first_chunk for chunks with real choices
  • main.py + streaming_chunk_builder_utils.py: Guard choices[0] access in stream_chunk_builder for chunks with choices=[]
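The core of the fix can be sketched as follows. This is a minimal illustration with simplified stand-in types; the real LiteLLM classes and function signatures are richer, and the names here are only meant to mirror the ones discussed above.

```python
# Sketch of the fix, using hypothetical simplified types
# (the actual LiteLLM implementation differs in detail).
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Delta:
    role: Optional[str] = None
    content: Optional[str] = None

@dataclass
class StreamingChoices:
    delta: Delta = field(default_factory=Delta)

@dataclass
class ModelResponse:
    choices: List[StreamingChoices] = field(default_factory=list)

def chunk_creator(raw_choices: list) -> ModelResponse:
    response = ModelResponse()
    if not raw_choices:
        # Before the fix: a default StreamingChoices was appended here,
        # "inflating" Azure's empty prompt_filter_results chunk.
        # After the fix: forward the empty-choices chunk faithfully.
        response.choices = []
        return response
    response.choices = [StreamingChoices(Delta(**c)) for c in raw_choices]
    return response

def is_chunk_non_empty(response: ModelResponse, sent_first_chunk: bool) -> bool:
    if not response.choices:
        return False
    delta = response.choices[0].delta
    # New clause: the first chunk carrying role='assistant' with content=''
    # is a valid OpenAI chunk and must not be discarded as empty.
    if delta.role is not None and not sent_first_chunk:
        return True
    return bool(delta.content)
```

With this shape, the empty prompt_filter_results chunk passes through with choices=[] (so sent_first_chunk stays False), and the first real chunk is kept because its delta carries a role even though its content is empty.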

When Azure sends stream_options.include_usage=True, it emits an initial
chunk with choices=[] (prompt_filter_results) before the first content
chunk. Previously, LiteLLM inflated this empty-choices chunk with a
default StreamingChoices, which consumed the sent_first_chunk flag and
caused strip_role_from_delta to strip role from the real first chunk.
Additionally, the first real chunk with role='assistant' and content=''
was discarded by is_chunk_non_empty as "empty".

This fix:
- Forwards chunks with choices=[] faithfully (no inflated default)
- Only marks sent_first_chunk for chunks with real choices
- Treats chunks with role in delta as non-empty
- Guards choices[0] access in __next__/__anext__ and stream_chunk_builder

Fixes BerriAI#24221
vercel Bot commented Mar 22, 2026

The latest updates on your projects.

litellm: Ready (Preview), updated Mar 22, 2026 1:15pm (UTC)


codspeed-hq Bot (Contributor) commented Mar 22, 2026

Merging this PR will not alter performance

✅ 16 untouched benchmarks


Comparing Chesars:fix/azure-streaming-role-include-usage (ce16db1) with main (c89496f)

Open in CodSpeed

greptile-apps Bot (Contributor) commented Mar 22, 2026

Greptile Summary

This PR fixes a role-stripping bug in Azure streaming when stream_options.include_usage=True (issue #24221). Azure sends an initial choices=[] chunk (prompt_filter_results) before any content. LiteLLM was inflating this empty-choices chunk with a default StreamingChoices, which consumed the sent_first_chunk flag and caused strip_role_from_delta to strip role='assistant' from the real first content chunk.

Key changes:

  • streaming_handler.py – explicitly sets model_response.choices = [] before the early return for no-choices chunks, preventing inflation; guards choices[0] access in __next__ / __anext__; ties sent_first_chunk marking to chunks that actually carry choices; adds a new is_chunk_non_empty clause so a chunk with role in its delta is never discarded as empty.
  • streaming_chunk_builder_utils.py / main.py – both now scan forward to find the first chunk with non-empty choices before accessing choices[0], avoiding IndexError when the very first chunk has choices=[].
  • A new mock-only regression test covers both sync and async paths end-to-end.

Minor concerns:

  • In build_base_response, if all chunks have empty choices (degenerate case), the next() fallback is chunk (i.e., self.first_chunk, which also has choices=[]), and the immediately following choices[0] access would still raise IndexError. A None default with an explicit guard would be more defensive.
  • In main.py, when first_chunk_with_choices is None (all empty choices), the code silently skips TextChoices routing and falls through to build_base_response, which can then hit the same issue. An early return None when no chunk has choices would make this path explicit.
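The defensive pattern the review recommends could look like the sketch below. The chunk types here are hypothetical simplified stand-ins for illustration; the point is the None default on next() plus an explicit guard, instead of falling back to a chunk that may itself have choices=[].

```python
# Sketch of the reviewer-suggested defensive scan (illustrative types,
# not the merged LiteLLM code).
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Delta:
    role: Optional[str] = None

@dataclass
class Choice:
    delta: Delta = field(default_factory=Delta)

@dataclass
class Chunk:
    choices: List[Choice] = field(default_factory=list)

def first_chunk_with_choices(chunks: List[Chunk]) -> Optional[Chunk]:
    # Scan forward for the first chunk that actually carries choices;
    # fall back to None rather than to a chunk that may itself be empty.
    return next((c for c in chunks if c.choices), None)

def extract_role(chunks: List[Chunk]) -> Optional[str]:
    chunk = first_chunk_with_choices(chunks)
    if chunk is None:
        # Explicit guard: every chunk had choices=[], so there is no
        # choices[0] to read; returning None avoids the IndexError.
        return None
    return chunk.choices[0].delta.role
```

Returning None early makes the degenerate all-empty-choices case an explicit, debuggable path instead of a latent IndexError further down the pipeline.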

Confidence Score: 4/5

  • Safe to merge — the fix is well-targeted and the regression test is solid; two minor defensive-coding gaps remain.
  • The core fix is correct: inflated empty-choices chunks are the root cause, and all four touch-points in the streaming pipeline are updated consistently. The mock-only test validates both sync and async flows and directly exercises the reported failure. Score is 4 rather than 5 because build_base_response still has an unsafe fallback that could raise IndexError in an all-empty-choices edge case, and stream_chunk_builder in main.py silently falls through when no chunk has choices rather than returning None early.
  • litellm/litellm_core_utils/streaming_chunk_builder_utils.py (unsafe chunk fallback in build_base_response) and litellm/main.py (silent fall-through when all chunks have empty choices).

Important Files Changed

  • litellm/litellm_core_utils/streaming_handler.py - Four targeted changes: (1) model_response.choices = [] is set before the early return so empty-choices chunks are not inflated with a default StreamingChoices; (2) a new is_chunk_non_empty clause treats a delta with a non-None role as non-empty when sent_first_chunk is False; (3) choices[0] access in __next__ and __anext__ is guarded behind response.choices truth checks; (4) sent_first_chunk is only marked for chunks that carry real choices. Changes are logically consistent and correct for the described bug.
  • litellm/litellm_core_utils/streaming_chunk_builder_utils.py - Adds a next() scan over all chunks to find the first one with non-empty choices before extracting the role. Correct for the Azure case, but the fallback to chunk (first_chunk, which itself has choices=[]) means a degenerate all-empty-choices stream would still raise IndexError on line 125.
  • litellm/main.py - Guards the TextChoices routing check in stream_chunk_builder so it is only reached when at least one chunk has non-empty choices. If all chunks have empty choices, None is returned from next() and the isinstance check is skipped, silently falling through to build_base_response.
  • tests/test_litellm/litellm_core_utils/test_streaming_handler.py - Adds a well-structured regression test using ModelResponseListIterator (no network calls). Covers both sync and async iteration, verifies that the prompt-filter empty-choices chunk is forwarded faithfully, and asserts that at least one chunk contains role='assistant' in its delta. Satisfies the mock-only test requirement.
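A regression check along these lines can be sketched over raw chunk dicts. This is only an illustration of what the test asserts; the actual test in the PR drives LiteLLM's streaming handler via ModelResponseListIterator, and the chunk shapes below are simplified.

```python
# Illustrative mock stream mirroring the Azure include_usage sequence
# (simplified dict chunks; the real test uses LiteLLM response objects).
mock_stream = [
    {"choices": []},  # prompt_filter_results chunk, forwarded faithfully
    {"choices": [{"delta": {"role": "assistant", "content": ""}}]},
    {"choices": [{"delta": {"content": "Hello!"}}]},
    {"choices": [], "usage": {"total_tokens": 12}},  # include_usage chunk
]

def roles_seen(chunks):
    # Collect every non-None role emitted anywhere in the stream;
    # the bug made this list come back empty.
    return [
        c["choices"][0]["delta"]["role"]
        for c in chunks
        if c["choices"] and c["choices"][0]["delta"].get("role") is not None
    ]
```

The essential assertion is that at least one chunk in the stream still carries role='assistant' after passing through the handler.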

Sequence Diagram

sequenceDiagram
    participant Azure
    participant chunk_creator
    participant is_chunk_non_empty
    participant __next__

    Azure->>chunk_creator: Chunk 1 (choices=[], prompt_filter_results)
    Note over chunk_creator: stream_options.include_usage=True<br/>model_response.choices = [] ← NEW
    chunk_creator-->>__next__: model_response (choices=[])
    Note over __next__: sent_first_chunk stays False<br/>(guarded by response.choices) ← NEW

    Azure->>chunk_creator: Chunk 2 (role='assistant', content='')
    chunk_creator->>is_chunk_non_empty: Check if non-empty
    Note over is_chunk_non_empty: NEW: role is not None AND not sent_first_chunk → True
    is_chunk_non_empty-->>chunk_creator: True (non-empty)
    chunk_creator->>chunk_creator: strip_role_from_delta → sent_first_chunk=True
    chunk_creator-->>__next__: model_response (role='assistant', content='')

    Azure->>chunk_creator: Chunk 3 (content='Hello!')
    chunk_creator-->>__next__: model_response (content='Hello!')

    Azure->>chunk_creator: Chunk 5 (choices=[], usage)
    chunk_creator-->>__next__: model_response (choices=[], usage=...)

Comments Outside Diff (1)

  1. litellm/main.py, line 7373-7381 (link)

    P2 None fallback silently skips TextChoices detection

    When first_chunk_with_choices is None (all chunks have empty choices), the TextChoices routing check is silently skipped and execution falls through to processor.build_base_response(chunks). build_base_response will itself hit the same issue (no chunk with choices) and potentially raise. Adding a guard or an early return makes the failure explicit and easier to debug.

Reviews (1): Last reviewed commit: "fix: preserve role='assistant' in Azure ..."

A comment thread was opened on litellm/litellm_core_utils/streaming_chunk_builder_utils.py.
@Chesars Chesars changed the base branch from main to litellm_staging_03_22_2026 March 22, 2026 14:15
@Chesars Chesars merged commit 2132db4 into BerriAI:litellm_staging_03_22_2026 Mar 22, 2026
38 of 39 checks passed
Chesars added a commit that referenced this pull request Apr 16, 2026
Resolved conflicts:
- streaming_handler.py: combined role check (PR #24354, Azure streaming)
  with reasoning_items check (new in main) — both are independent OR
  conditions in is_chunk_non_empty()
- CI/CD: accepted main's versions throughout
  - Redis tests migrated to CircleCI (PR #25354): removed enable-redis
    from GH Actions workflows
  - E2E UI tests restructured (PR #25365): simplified CircleCI job
  - Coverage via Codecov added to all GH Actions unit test workflows
  - Deleted test-litellm-matrix.yml and test-proxy-e2e-azure-batches.yml
    (removed in main)
Chesars added a commit that referenced this pull request Apr 16, 2026
- streaming_iterator.py: adopted main's more defensive version of the
  tool-arg queueing check (.get() instead of [], isinstance guard) —
  same logic, same behavior, lower crash surface
- model_prices_and_context_window.json + backup: combined staging's
  search_context_cost_per_query fields (PR #24372) with main's new
  supports_service_tier field — both are independent additions to the
  same Gemini model entries
- test_streaming_handler.py: kept Azure streaming regression test
  (PR #24354) and added main's two new Gemini legacy vertex
  finish_reason normalization tests
- test_gemini_batch_embeddings.py: kept staging's unsupported-params
  filtering tests (PR #24370) and added main's index/order test


Development

Successfully merging this pull request may close these issues.

[Bug]: LiteLLM proxy doesn't include "role" for /chat/completions stream=true with Azure OpenAI and stream_options.include_usage=true
