fix: preserve role='assistant' in Azure streaming with include_usage (#24354)
Conversation
When Azure sends `stream_options.include_usage=True`, it emits an initial chunk with `choices=[]` (`prompt_filter_results`) before the first content chunk. Previously, LiteLLM inflated this empty-choices chunk with a default `StreamingChoices`, which consumed the `sent_first_chunk` flag and caused `strip_role_from_delta` to strip `role` from the real first chunk. Additionally, the first real chunk with `role='assistant'` and `content=''` was discarded by `is_chunk_non_empty` as "empty".

This fix:
- Forwards chunks with `choices=[]` faithfully (no inflated default)
- Only marks `sent_first_chunk` for chunks with real choices
- Treats chunks with `role` in the delta as non-empty
- Guards `choices[0]` access in `__next__`/`__anext__` and `stream_chunk_builder`

Fixes BerriAI#24221
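For reference, the chunk sequence at issue looks roughly like this (chunk shapes follow the OpenAI streaming schema; the concrete values are illustrative, and `first_role` is a hypothetical helper showing where the role should come from):

```python
# Illustrative Azure stream with stream_options={"include_usage": True}.
azure_chunks = [
    # 1) prompt_filter_results chunk: choices is EMPTY and must be
    #    forwarded as-is, not inflated with a default StreamingChoices
    {"choices": [], "prompt_filter_results": [{"prompt_index": 0}]},
    # 2) first real chunk: role is set, content is '' -- previously
    #    discarded by is_chunk_non_empty as "empty"
    {"choices": [{"index": 0, "delta": {"role": "assistant", "content": ""}}]},
    # 3) ordinary content chunk
    {"choices": [{"index": 0, "delta": {"content": "Hello!"}}]},
    # 4) usage chunk: choices is empty again, usage is populated
    {"choices": [], "usage": {"prompt_tokens": 5, "completion_tokens": 2}},
]

def first_role(chunks):
    """Role should come from the first chunk that actually has choices."""
    for c in chunks:
        if c["choices"]:
            return c["choices"][0]["delta"].get("role")
    return None

print(first_role(azure_chunks))  # -> assistant
```

The empty-choices chunk in position 1 is exactly the one that previously consumed `sent_first_chunk` and caused the role to be stripped from chunk 2.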
Greptile Summary: This PR fixes a role-stripping bug in Azure streaming when `stream_options.include_usage=True` causes an initial empty-choices chunk.

Confidence Score: 4/5
| Filename | Overview |
|---|---|
| litellm/litellm_core_utils/streaming_handler.py | Four targeted changes: (1) model_response.choices = [] set before early return so empty-choices chunks are not inflated with a default StreamingChoices; (2) new is_chunk_non_empty clause treats a delta with a non-None role as non-empty when sent_first_chunk is False; (3) choices[0] access in __next__ and __anext__ guarded behind response.choices truth checks; (4) sent_first_chunk only marked for chunks that carry real choices. Changes are logically consistent and correct for the described bug. |
| litellm/litellm_core_utils/streaming_chunk_builder_utils.py | Adds next() scan over all chunks to find the first one with non-empty choices before extracting role. Correct for the Azure case, but the fallback to chunk (first_chunk, which itself has choices=[]) means a degenerate all-empty-choices stream would still raise IndexError on line 125. |
| litellm/main.py | Guards the TextChoices routing check in stream_chunk_builder so it is only reached when at least one chunk has non-empty choices. If all chunks have empty choices, None is returned from next() and the isinstance check is skipped, which silently falls through to build_base_response. |
| tests/test_litellm/litellm_core_utils/test_streaming_handler.py | Adds a well-structured regression test using ModelResponseListIterator (no network calls). Covers both sync and async iteration, verifies that the prompt-filter empty-choices chunk is forwarded faithfully, and asserts that at least one chunk contains role='assistant' in its delta. Satisfies the mock-only test requirement. |
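The scan-and-fallback pattern described for `streaming_chunk_builder_utils.py` can be sketched as follows (a hypothetical standalone helper over plain dicts, not the actual LiteLLM implementation). It also reproduces the degenerate case the review flags: when every chunk has `choices=[]`, the fallback to the first chunk still indexes `choices[0]` and raises `IndexError`:

```python
def extract_role(chunks):
    # Find the first chunk with non-empty choices; fall back to the
    # first chunk, which may itself have choices=[] -- in that
    # degenerate case the choices[0] access below raises IndexError.
    first_with_choices = next(
        (c for c in chunks if c.get("choices")), chunks[0]
    )
    return first_with_choices["choices"][0]["delta"].get("role")
```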
Sequence Diagram
```mermaid
sequenceDiagram
    participant Azure
    participant chunk_creator
    participant is_chunk_non_empty
    participant __next__
    Azure->>chunk_creator: Chunk 1 (choices=[], prompt_filter_results)
    Note over chunk_creator: stream_options.include_usage=True<br/>model_response.choices = [] ← NEW
    chunk_creator-->>__next__: model_response (choices=[])
    Note over __next__: sent_first_chunk stays False<br/>(guarded by response.choices) ← NEW
    Azure->>chunk_creator: Chunk 2 (role='assistant', content='')
    chunk_creator->>is_chunk_non_empty: Check if non-empty
    Note over is_chunk_non_empty: NEW: role is not None AND not sent_first_chunk → True
    is_chunk_non_empty-->>chunk_creator: True (non-empty)
    chunk_creator->>chunk_creator: strip_role_from_delta → sent_first_chunk=True
    chunk_creator-->>__next__: model_response (role='assistant', content='')
    Azure->>chunk_creator: Chunk 3 (content='Hello!')
    chunk_creator-->>__next__: model_response (content='Hello!')
    Azure->>chunk_creator: Chunk 5 (choices=[], usage)
    chunk_creator-->>__next__: model_response (choices=[], usage=...)
```
Comments Outside Diff (1)
- `litellm/main.py`, lines 7373-7381: fallback silently skips TextChoices detection. When `first_chunk_with_choices` is `None` (all chunks have empty `choices`), the TextChoices routing check is silently skipped and execution falls through to `processor.build_base_response(chunks)`. `build_base_response` will itself hit the same issue (no chunk with choices) and potentially raise. Adding a guard or an early return makes the failure explicit and easier to debug.
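One way to make that failure explicit, along the lines the review suggests (a sketch only; the function name and error message are hypothetical, not LiteLLM's actual code):

```python
def select_first_chunk_with_choices(chunks):
    # Explicit guard for the all-empty-choices case: fail loudly here
    # instead of letting build_base_response hit an IndexError later.
    first_chunk_with_choices = next(
        (c for c in chunks if c.get("choices")), None
    )
    if first_chunk_with_choices is None:
        raise ValueError(
            "stream_chunk_builder: no chunk contained a non-empty "
            "'choices' list; cannot determine response type"
        )
    return first_chunk_with_choices
```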
Reviews (1): Last reviewed commit: "fix: preserve role='assistant' in Azure ..."
Merged 2132db4 into BerriAI:litellm_staging_03_22_2026
Resolved conflicts:
- `streaming_handler.py`: combined the role check (PR #24354, Azure streaming) with the reasoning_items check (new in main); both are independent OR conditions in `is_chunk_non_empty()`
- CI/CD: accepted main's versions throughout. Redis tests migrated to CircleCI (PR #25354): removed enable-redis from GH Actions workflows. E2E UI tests restructured (PR #25365): simplified CircleCI job. Coverage via Codecov added to all GH Actions unit test workflows. Deleted test-litellm-matrix.yml and test-proxy-e2e-azure-batches.yml (removed in main)
- `streaming_iterator.py`: adopted main's more defensive version of the tool-arg queueing check (`.get()` instead of `[]`, plus an `isinstance` guard); same logic, same behavior, lower crash surface
- `model_prices_and_context_window.json` + backup: combined staging's `search_context_cost_per_query` fields (PR #24372) with main's new `supports_service_tier` field; both are independent additions to the same Gemini model entries
- `test_streaming_handler.py`: kept the Azure streaming regression test (PR #24354) and added main's two new Gemini legacy vertex finish_reason normalization tests
- `test_gemini_batch_embeddings.py`: kept staging's unsupported-params filtering tests (PR #24370) and added main's index/order test
Relevant issues
Fixes #24221
Pre-Submission checklist
- I have added at least 1 test in the `tests/test_litellm/` directory (adding at least 1 test is a hard requirement; see details)
- My PR passes all unit tests on `make test-unit`
- I have asked for a review from `@greptileai` and received a Confidence Score of at least 4/5 before requesting a maintainer review
Type
🐛 Bug Fix
Changes
When `stream_options.include_usage=True`, Azure sends an initial chunk with `choices=[]` (`prompt_filter_results`) before the first content chunk. LiteLLM inflated this empty-choices chunk with a default `StreamingChoices`, which consumed the `sent_first_chunk` flag and caused `strip_role_from_delta` to strip `role` from the real first chunk. The first real chunk with `role='assistant'` and `content=''` was also discarded by `is_chunk_non_empty` as "empty". Net result: no chunk ever contained `role='assistant'`.

Fix (4 files):
- `streaming_handler.py` - `chunk_creator`: set `model_response.choices = []` for chunks without choices, forwarding them faithfully instead of inflating with a default `StreamingChoices`
- `streaming_handler.py` - `is_chunk_non_empty`: treat chunks with `role` in the delta as non-empty (the first chunk with `role='assistant'` and `content=''` is a valid OpenAI chunk)
- `streaming_handler.py` - `__next__`/`__anext__`: guard `choices[0]` access and only mark `sent_first_chunk` for chunks with real choices
- `main.py` + `streaming_chunk_builder_utils.py`: guard `choices[0]` access in `stream_chunk_builder` for chunks with `choices=[]`
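The new `is_chunk_non_empty` clause can be illustrated with a simplified standalone sketch (operating on plain-dict deltas; the real method works on LiteLLM response objects and checks more fields):

```python
def is_chunk_non_empty(delta, sent_first_chunk):
    # A delta with actual content is always non-empty.
    if delta.get("content"):
        return True
    # NEW clause: a delta whose only payload is role='assistant'
    # (content='') is a valid first OpenAI chunk, so treat it as
    # non-empty before the first chunk has been sent.
    if delta.get("role") is not None and not sent_first_chunk:
        return True
    return False
```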