
Add file content streaming support for OpenAI and related utilities#25450

Merged
ishaan-berri merged 13 commits into BerriAI:litellm_harish_april11 from harish876:oom-file-fix-openai
Apr 11, 2026

Conversation

@harish876
Contributor

@harish876 harish876 commented Apr 9, 2026

  • Introduced file_content_streaming functions in litellm/files/main.py to handle asynchronous and synchronous file content streaming.
  • Added FileContentStreamingResponse class in litellm/files/streaming.py to manage streaming responses with logging capabilities.
  • Updated OpenAI API integration in litellm/llms/openai/openai.py to support new streaming methods.
  • Enhanced file content retrieval in litellm/proxy/openai_files_endpoints/files_endpoints.py to route requests for streaming.
  • Added unit tests for the new streaming functionality in tests/test_litellm/llms/openai/test_openai_file_content_streaming.py and tests/test_litellm/proxy/openai_files_endpoint/test_files_endpoint.py.
  • Refactored type hints and imports for better clarity and organization across modified files.

Relevant issues

Pre-Submission checklist

Please complete all items before asking a LiteLLM maintainer to review your PR

  • I have added testing in the tests/test_litellm/ directory. Adding at least 1 test is a hard requirement - see details
  • My PR passes all unit tests on make test-unit
  • My PR's scope is as isolated as possible, it only solves 1 specific problem
  • I have requested a Greptile review by commenting @greptileai and received a Confidence Score of at least 4/5 before requesting a maintainer review

Delays in PR merge?

If you're seeing a delay in your PR being merged, ping the LiteLLM Team on Slack (#pr-review).

CI (LiteLLM team)

CI status guideline:

  • 50-55 passing tests: main is stable with minor issues.
  • 45-49 passing tests: acceptable, but needs attention.
  • <= 40 passing tests: unstable; be careful with your merges and assess the risk.
  • Branch creation CI run
    Link:

  • CI run for the last commit
    Link:

  • Merge / cherry-pick CI run
    Links:

Type

🆕 New Feature
🐛 Bug Fix
🧹 Refactoring
📖 Documentation
🚄 Infrastructure
✅ Test

Summary

This PR enables streaming responses for OpenAI file content requests.

Issue

The v1 files content path buffers the full payload in memory before returning it. Under load, that causes elevated RSS and can contribute to OOM behavior when many large file requests run concurrently.
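To make the memory difference concrete, here is a minimal sketch (hypothetical helper names, not code from this PR) contrasting the two delivery strategies: a buffered read keeps the whole payload resident until the response is sent, while a chunked generator holds only one chunk at a time, so peak memory per request stays proportional to the chunk size rather than the file size.

```python
from typing import Iterator


def read_buffered(path: str) -> bytes:
    # Entire payload is materialized in memory at once.
    with open(path, "rb") as f:
        return f.read()


def read_streaming(path: str, chunk_size: int = 64 * 1024) -> Iterator[bytes]:
    # Only `chunk_size` bytes are resident at any moment.
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            yield chunk
```

Under 1000 concurrent 65 MB downloads, the buffered variant needs roughly 65 GB of headroom in the worst case, while the streaming variant needs on the order of the chunk size times the concurrency.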

What was implemented

  • Added a prototype files content endpoint that returns a streaming response for OpenAI only.
  • Kept the implementation OpenAI-only for now so we can validate the approach before generalizing it.
  • Added test cases to assert that only OpenAI is routed through the streaming logic.

Load Test Memory Results (1000 concurrent requests, 65 MB payload)

| Metric | Before Fix | After Fix | Delta |
| --- | --- | --- | --- |
| Average Memory Usage | 3.707 GiB (92.67%) | 2.56 GiB (64.0%) | ↓ 1.147 GiB |
| Peak Memory Usage | 3.893 GiB (97.32%) | 2.621 GiB (65.53%) | ↓ 1.272 GiB |
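The load-test harness itself is not part of this PR; as a sketch of how per-process peak-memory numbers like those above can be sampled with only the standard library (`ru_maxrss` is reported in KiB on Linux and in bytes on macOS):

```python
import resource
import sys


def peak_rss_mib() -> float:
    """Return this process's peak resident set size, in MiB."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    if sys.platform == "darwin":
        peak /= 1024  # macOS reports bytes; normalize to KiB first
    return peak / 1024
```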

Next Steps

The current file content retrieval path goes through afile_content. Whether the response object can be made an async iterable still needs to be evaluated. The current solution is a prototype that provides memory savings when a streaming-based approach is used.

Why this helps

Streaming avoids holding the full file payload in memory per request, which materially lowers peak RSS under concurrent load and reduces the risk of OOM events.

- Introduced `afile_content_streaming` and `file_content_streaming` functions in `litellm/files/main.py` to handle asynchronous and synchronous file content streaming.
- Added `FileContentStreamingResponse` class in `litellm/files/streaming.py` to manage streaming responses with logging capabilities.
- Updated OpenAI API integration in `litellm/llms/openai/openai.py` to support new streaming methods.
- Enhanced file content retrieval in `litellm/proxy/openai_files_endpoints/files_endpoints.py` to route requests for streaming.
- Added unit tests for the new streaming functionality in `tests/test_litellm/llms/openai/test_openai_file_content_streaming.py` and `tests/test_litellm/proxy/openai_files_endpoint/test_files_endpoint.py`.
- Refactored type hints and imports for better clarity and organization across modified files.
@vercel

vercel bot commented Apr 9, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| litellm | Ready | Preview, Comment | Apr 11, 2026 6:58pm |

Request Review

@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

@codspeed-hq
Contributor

codspeed-hq bot commented Apr 9, 2026

Merging this PR will not alter performance

✅ 16 untouched benchmarks


Comparing harish876:oom-file-fix-openai (69eb345) with main (eabb6a3)

Open in CodSpeed

@codecov

codecov bot commented Apr 9, 2026

Codecov Report

❌ Patch coverage is 73.64341% with 68 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| litellm/files/streaming.py | 59.25% | 55 Missing ⚠️ |
| litellm/llms/openai/openai.py | 82.50% | 7 Missing ⚠️ |
| litellm/files/main.py | 86.20% | 4 Missing ⚠️ |
| ..._files_endpoints/file_content_streaming_handler.py | 95.23% | 2 Missing ⚠️ |


@greptile-apps
Contributor

greptile-apps bot commented Apr 9, 2026

Greptile Summary

This PR adds file content streaming support to reduce peak memory usage (from ~3.9 GiB to ~2.6 GiB at 1,000 concurrent requests with a 65 MB payload) by avoiding full in-memory buffering. Key additions include FileContentStreamingResponse for SDK-level logging callbacks, FileContentStreamingHandler in the proxy layer for routing and proxy-level logging, and corresponding unit/integration tests.

The previously-flagged blocking concerns from earlier rounds (wrong provider in routing, dead client variable, FastAPI import outside proxy, streaming gate bypassing non-OpenAI providers) have all been addressed in this revision.

Confidence Score: 5/5

Safe to merge — all prior blocking concerns are resolved; remaining findings are style-level only.

The previously-flagged P1 issues (wrong provider forwarded on routing, dead client variable, FastAPI import outside proxy, streaming gate bypassing non-OpenAI providers) have all been addressed. The three new findings are P2: inline imports that violate CLAUDE.md style, a fragile dict-spread ordering that could in theory override stream=True (but cannot in practice for this endpoint), and an undocumented expansion of the streaming gate to hosted_vllm. None of these block merge.

litellm/proxy/openai_files_endpoints/file_content_streaming_handler.py — inline imports and dict-spread ordering.

Important Files Changed

| Filename | Overview |
| --- | --- |
| litellm/proxy/openai_files_endpoints/file_content_streaming_handler.py | New handler class for streaming routing; two inline imports inside static methods violate CLAUDE.md style guide, and the `**data` spread ordering in the afile_content call could silently override stream=True. |
| litellm/files/streaming.py | New FileContentStreamingResponse wrapper correctly handles sync/async iteration, aclose under cancellation via anyio.CancelScope, and SDK-level success/failure logging. |
| litellm/files/main.py | Adds file_content_streaming helper; client and litellm_params_dict are correctly forwarded. Streaming is gated on _should_sdk_support_streaming which matches OPENAI_COMPATIBLE_BATCH_AND_FILES_PROVIDERS. |
| litellm/llms/openai/openai.py | Adds afile_content_streaming and file_content_streaming methods; context-manager pattern correctly propagates exceptions to `__aexit__`/`__exit__`. |
| litellm/proxy/openai_files_endpoints/files_endpoints.py | Routing logic correctly resolves provider before the streaming gate check, so non-OpenAI routed providers fall through to the buffered path. |

Sequence Diagram

```mermaid
sequenceDiagram
    participant Client
    participant Proxy as files_endpoints.py
    participant Handler as FileContentStreamingHandler
    participant LiteLLM as litellm.afile_content
    participant OpenAI as OpenAIFilesAPI
    participant OAISDK as OpenAI SDK (streaming)

    Client->>Proxy: GET /v1/files/{file_id}/content
    Proxy->>Proxy: resolve custom_llm_provider
    Proxy->>Handler: resolve_streaming_request_params()
    Handler-->>Proxy: resolved_provider, file_id, data
    Proxy->>Handler: should_stream_file_content(resolved_provider)
    alt provider in OPENAI_COMPATIBLE_BATCH_AND_FILES_PROVIDERS
        Handler-->>Proxy: True
        Proxy->>Handler: get_streaming_file_content_response()
        Handler->>LiteLLM: afile_content(stream=True, ...)
        LiteLLM->>OpenAI: file_content_streaming(_is_async=True)
        OpenAI->>OAISDK: files.with_streaming_response.content()
        OAISDK-->>OpenAI: streaming context manager
        OpenAI-->>LiteLLM: FileContentStreamingResult(AsyncIterator)
        LiteLLM-->>Handler: FileContentStreamingResult (wrapped in FileContentStreamingResponse)
        Handler-->>Proxy: StreamingResponse
        Proxy-->>Client: HTTP 200 chunked/octet-stream
        loop Each chunk
            OAISDK-->>Client: bytes chunk
        end
        Handler->>Handler: _log_success_async() on StopAsyncIteration
        Handler->>Handler: proxy_logging_obj.update_request_status(success)
    else provider NOT in supported set
        Handler-->>Proxy: False
        Proxy->>LiteLLM: afile_content(stream=False, ...)
        LiteLLM-->>Proxy: HttpxBinaryResponseContent (buffered)
        Proxy-->>Client: HTTP 200 full body
    end
```

Reviews (12): Last reviewed commit: "Refactor file content streaming handling..." | Re-trigger Greptile

Comment on lines +66 to +74
```python
def _should_stream_file_content(
    *,
    custom_llm_provider: str,
    is_base64_unified_file_id: Any,
) -> bool:
    return (
        custom_llm_provider == "openai"
        and bool(is_base64_unified_file_id) is False
    )
```
Contributor


P1 Unconditional opt-out change breaks existing OpenAI users

Every proxy request where custom_llm_provider == "openai" is now silently rerouted to the streaming path — there is no feature flag or user-controlled opt-in. Callers that expected a buffered HttpxBinaryResponseContent (with Content-Length, synchronous .content, etc.) will receive a StreamingResponse after this change. This is a backwards-incompatible behavioral change for all current OpenAI file-content users.

Per the project style guide, new behavior that changes existing responses should be gated behind a flag (e.g., litellm.use_streaming_file_content = False by default), so existing users are not broken.

Suggested change

```python
# before
def _should_stream_file_content(
    *,
    custom_llm_provider: str,
    is_base64_unified_file_id: Any,
) -> bool:
    return (
        custom_llm_provider == "openai"
        and bool(is_base64_unified_file_id) is False
    )

# after
def _should_stream_file_content(
    *,
    custom_llm_provider: str,
    is_base64_unified_file_id: Any,
) -> bool:
    import litellm as _litellm
    return (
        custom_llm_provider == "openai"
        and bool(is_base64_unified_file_id) is False
        and getattr(_litellm, "use_streaming_file_content", False) is True
    )
```

Rule Used: What: avoid backwards-incompatible changes without... (source)

Contributor Author


I don’t think this is accurately characterized as a breaking change for existing OpenAI users.

For the externally supported /v1/files/{file_id}/content contract, the observable behavior remains the same for the validated client path:

  • HTTP 200
  • application/octet-stream
  • content-length preserved
  • content-disposition preserved
  • x-request-id present
  • identical response bytes

We explicitly validated this against the OpenAI SDK client.files.content(file_id=...) flow and confirmed LiteLLM and OpenAI return the same effective response contract. In that path, callers still receive a binary file payload and can access .content exactly as before.

The StreamingResponse vs HttpxBinaryResponseContent distinction is an internal server-side implementation detail, not a client-visible API contract, as long as the wire response and SDK-observable behavior remain equivalent. This change is specifically intended to avoid buffering large files fully in memory while preserving the existing external behavior.

So I agree this changes the internal delivery mechanism, but I don’t agree that it “breaks all current OpenAI file-content users.” The current evidence shows compatibility is preserved for the primary user path we care about.

If there is concern about unvalidated edge cases outside the tested OpenAI SDK flow, I’m happy to narrow the claim and add more coverage, but based on the current contract-level validation this does not appear to require a feature flag for backward compatibility.

Contributor


Fair point on the SDK validation. If response.content returns identical bytes and headers in the tested path, the client-visible contract is preserved.

However, streaming and buffered responses differ in failure modes:

  1. Mid-stream errors: With buffered responses, errors are caught before any bytes are sent (HTTP 500). With streaming, the client may receive HTTP 200 + partial bytes, then the stream aborts. Does the OpenAI SDK handle this gracefully?

  2. Client disconnects: If a client disconnects mid-download, the buffered path would log completion status based on the full response. The streaming path in _stream_file_content_with_logging only logs success after full iteration — if the client disconnects, does the iterator stop cleanly without logging success?

  3. Content-Length vs Transfer-Encoding: You're preserving content-length in the headers, but FastAPI StreamingResponse typically uses chunked encoding. Does the SDK rely on content-length for any validation?

Suggested validation to address these without requiring a feature flag:

  1. Test the error path: mock afile_content_streaming to raise an exception after yielding one chunk. Verify the SDK sees a stream error, not HTTP 500.
  2. Test client disconnect: verify _stream_file_content_with_logging closes cleanly and doesn't log success if the iteration is incomplete.
  3. Verify chunked encoding doesn't break SDK clients that expect content-length.

If these scenarios pass, I'd agree this preserves compatibility. The load test results are compelling, and if the failure modes are equivalent, the streaming approach is sound.
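Point 2 above can be checked in isolation. This is a self-contained sketch (a hypothetical wrapper, not LiteLLM's actual `FileContentStreamingResponse`) demonstrating that a success log entry is written only when the stream is fully consumed; an early close, which is how a client disconnect surfaces, takes the other branch:

```python
import asyncio
from typing import AsyncIterator, List


async def stream_with_logging(
    source: AsyncIterator[bytes], log: List[str]
) -> AsyncIterator[bytes]:
    """Yield chunks from `source`, logging success only on full iteration."""
    completed = False
    try:
        async for chunk in source:
            yield chunk
        completed = True
    finally:
        # Runs on normal exhaustion AND on early close (client disconnect).
        log.append("success" if completed else "aborted")


async def demo():
    async def chunks() -> AsyncIterator[bytes]:
        for b in (b"a", b"b", b"c"):
            yield b

    full_log: List[str] = []
    async for _ in stream_with_logging(chunks(), full_log):
        pass  # consume everything -> "success"

    partial_log: List[str] = []
    partial = stream_with_logging(chunks(), partial_log)
    async for _ in partial:
        break  # simulate a client disconnect after one chunk
    await partial.aclose()  # closing early -> "aborted", never "success"
    return full_log, partial_log
```

Running `asyncio.run(demo())` returns `(["success"], ["aborted"])`, showing that incomplete iteration does not produce a success log.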


Comment on lines +66 to +74
```python
def _should_stream_file_content(
    *,
    custom_llm_provider: str,
    is_base64_unified_file_id: Any,
) -> bool:
    return (
        custom_llm_provider == "openai"
        and bool(is_base64_unified_file_id) is False
    )
```
Contributor


P2 Provider-specific logic belongs in litellm/llms/

_should_stream_file_content hardcodes custom_llm_provider == "openai" inside the proxy layer. The project rule is that provider-specific decisions live under litellm/llms/, not in proxy endpoints. Consider exposing a capability flag from the provider config (e.g., a supports_file_content_streaming property) and checking that here instead.

Rule Used: What: Avoid writing provider-specific code outside... (source)


Contributor Author


I don’t think this is the same kind of provider-specific logic the style rule is meant to prevent.

_should_stream_file_content() is not implementing provider behavior or request/response transformation logic. It is deciding proxy routing policy: whether this proxy endpoint should serve the file-content response via the buffered path or the streaming path. That decision belongs naturally in the proxy layer because it is about endpoint response strategy, not provider semantics.

The provider-specific implementation still lives in litellm/llms/openai/:

  • the OpenAI file-content streaming call is implemented in litellm/llms/openai/openai.py
  • the iterator/headers returned by the provider are built there
  • the proxy is only deciding whether to invoke that streaming path for this endpoint

So the hardcoded custom_llm_provider == "openai" here is closer to "this proxy optimization/reroute is currently enabled only for OpenAI" than to "the proxy is implementing OpenAI protocol logic".
If we later expand this to multiple providers, a capability flag could make sense. But for a targeted incremental rollout, a small hardcoded reroute policy in the proxy is reasonable and keeps the scope explicit. I’d view this as endpoint-level orchestration, not misplaced provider logic.

Contributor


@harish876 i guess an action item here is to now do this fix for all other providers we support for file content right ? Then we can remove this condition

- Removed unused imports and streamlined type hints in `litellm/utils.py` and `litellm/files/main.py`.
- Moved `FileContentStreamingResult` to a new `litellm/files/types.py` for better organization.
- Updated `FileContentStreamingResponse` in `litellm/files/streaming.py` to include asynchronous close methods and improved logging capabilities.
- Enhanced tests to ensure proper closure of streaming iterators in `tests/test_litellm/llms/openai/test_openai_file_content_streaming.py` and `tests/test_litellm/proxy/openai_files_endpoint/test_files_endpoint.py`.
```python
elif hasattr(stream_to_close, "close"):
    result = cast(Iterator[bytes], stream_to_close).close()  # type: ignore[attr-defined]
    if result is not None:
        await result
```
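Generalized into a self-contained helper (an assumed shape mirroring the fragment above, not the PR's exact code), this is one way to close a stream that may expose an async `aclose()`, a sync `close()`, or a `close()` that returns an awaitable:

```python
import asyncio
import inspect
from typing import Any


async def close_stream(stream_to_close: Any) -> None:
    """Best-effort close for sync or async byte streams."""
    aclose = getattr(stream_to_close, "aclose", None)
    if aclose is not None:
        await aclose()
        return
    close = getattr(stream_to_close, "close", None)
    if close is not None:
        result = close()
        # Some HTTP clients return an awaitable from close(); await it if so.
        if inspect.isawaitable(result):
            await result
```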
@ishaan-berri
Contributor

@greptile review again

@harish876
Contributor Author

harish876 commented Apr 10, 2026

OpenAI File Content Backward Compatibility Note

PR: BerriAI/litellm#25450

Summary

This document captures a direct compatibility check for the OpenAI file content path introduced in PR #25450.

The goal of this check is to verify that, for an OpenAI SDK caller using client.files.content(file_id=...), the LiteLLM response remains compatible with the existing OpenAI behavior at the observable response-contract level.

Specifically, the script verifies that LiteLLM and OpenAI both return:

  • HTTP 200
  • content-type: application/octet-stream
  • the same content-length
  • the same content-disposition
  • an x-request-id
  • identical response bytes

This is a strong compatibility signal for the tested SDK flow because the consumer-visible payload and key headers match across both implementations.

Validation Script

File: openai_file_client.py

```python
import asyncio
import os

from dotenv import load_dotenv
from openai import AsyncOpenAI


load_dotenv()


litellm_client = AsyncOpenAI(
    api_key=os.getenv("LITELLM_API_KEY"),
    base_url="http://34.95.44.152:4000/v1",
)

openai_client = AsyncOpenAI(
    api_key=os.getenv("OPENAI_API_KEY"),
    base_url="https://api.openai.com/v1",
)

file_id = "file-2qexFzUBybCR2BWndU3twx"


async def fetch_file_content(client, label):
    content = await client.files.content(file_id=file_id)
    response = content.response

    print(f"[{label}] Status", response.status_code)

    return response


def assert_file_response(response, label):
    content_type = response.headers.get("content-type")
    content_length = response.headers.get("content-length")
    content_disposition = response.headers.get("content-disposition", "")
    request_id = response.headers.get("x-request-id")

    assert response.status_code == 200, f"{label}: expected status 200, got {response.status_code}"
    assert content_type == "application/octet-stream", (
        f"{label}: unexpected content-type {content_type}"
    )
    assert content_length is not None, f"{label}: missing content-length header"
    assert int(content_length) == len(response.content), (
        f"{label}: content-length header {content_length} != body length {len(response.content)}"
    )
    assert 'filename="dataset.jsonl"' in content_disposition, (
        f"{label}: unexpected content-disposition {content_disposition}"
    )
    assert request_id, f"{label}: missing x-request-id header"


async def main():
    litellm_response, openai_response = await asyncio.gather(
        fetch_file_content(litellm_client, "LiteLLM"),
        fetch_file_content(openai_client, "OpenAI"),
    )

    assert_file_response(litellm_response, "LiteLLM")
    assert_file_response(openai_response, "OpenAI")

    assert (
        litellm_response.headers.get("content-type") == openai_response.headers.get("content-type")
    ), "content-type mismatch between LiteLLM and OpenAI"
    assert (
        litellm_response.headers.get("content-length") == openai_response.headers.get("content-length")
    ), "content-length mismatch between LiteLLM and OpenAI"
    assert (
        litellm_response.headers.get("content-disposition") == openai_response.headers.get("content-disposition")
    ), "content-disposition mismatch between LiteLLM and OpenAI"
    assert litellm_response.content == openai_response.content, "response body mismatch"

    print("All assertions passed.")


if __name__ == "__main__":
    asyncio.run(main())
```

Command

```shell
python3 openai_file_client.py
```

Output

```
[OpenAI] Status 200
[LiteLLM] Status 200
All assertions passed.
```

Non-Mock Header Parity Check

This check was performed against real endpoints using the OpenAI Python SDK, not mocks.

The purpose of this comparison is to show that the new LiteLLM streaming implementation preserves the response contract that an OpenAI SDK caller observes from files.content(...).

The two responses were compared at the header and payload level. The compatibility-relevant result is:

  • both responses returned HTTP 200
  • both responses returned content-type: application/octet-stream
  • both responses returned the same content-length: 68156820
  • both responses returned the same content-disposition: attachment; filename="dataset.jsonl"
  • both responses included an x-request-id
  • both responses returned identical body bytes

Filtered headers from the new LiteLLM streaming path:

```json
{
  "content-type": "application/octet-stream",
  "content-length": "68156820",
  "content-disposition": "attachment; filename=\"dataset.jsonl\"",
  "x-request-id": "req_5f622a75ec644d70bbd5469d2c008abf",
  "openai-version": "2020-10-01",
  "openai-project": "proj_F0P5EBggl8kfWzGtPQWRPchP",
  "x-litellm-version": "1.83.4",
  "x-litellm-key-spend": "0.0"
}
```

Filtered headers from the OpenAI baseline response:

```json
{
  "content-type": "application/octet-stream",
  "content-length": "68156820",
  "content-disposition": "attachment; filename=\"dataset.jsonl\"",
  "x-request-id": "req_18ded58aec1f4ed9a69256d82e3586d2",
  "openai-version": "2020-10-01",
  "openai-project": "proj_F0P5EBggl8kfWzGtPQWRPchP"
}
```

Some headers are expected to differ across requests, such as date, cf-ray, set-cookie, and openai-processing-ms. Those are request-specific or infrastructure-specific and are not part of the compatibility contract being validated here.
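The filtering logic described here can be expressed as a small helper (a sketch; the header list is the contract set named in this note, not an exhaustive one):

```python
# Compare only the contract-relevant headers; request-specific headers such as
# `date`, `cf-ray`, and the `x-request-id` value are expected to differ.
CONTRACT_HEADERS = ("content-type", "content-length", "content-disposition")


def headers_match(a: dict, b: dict) -> bool:
    return all(a.get(h) == b.get(h) for h in CONTRACT_HEADERS)
```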

Why This Supports Backward Compatibility

For the tested OpenAI SDK path, the behavior is backward compatible in the ways that matter to the caller:

  • the request still succeeds with status 200
  • the caller still receives a binary file payload
  • the file metadata exposed through headers remains present
  • content-length is preserved
  • the returned bytes are identical to OpenAI

In other words, from the perspective of a client consuming files.content(...), the observable contract is preserved for this scenario.

Scope Of The Claim

This validation demonstrates backward compatibility for the tested OpenAI SDK consumer path. It does not, by itself, prove compatibility for every possible raw HTTP caller or every internal implementation detail. What it does prove is that the end-to-end response contract for this SDK usage remains equivalent across LiteLLM and OpenAI for the validated file.

That is the key argument for PR #25450: the implementation changes the delivery mechanism internally, but preserves the externally observed behavior for the tested OpenAI file content workflow.

@harish876
Contributor Author

@greptile review again

Contributor

@ishaan-berri ishaan-berri left a comment


Nit - minor change requested

Comment on lines +66 to +74
```python
def _should_stream_file_content(
    *,
    custom_llm_provider: str,
    is_base64_unified_file_id: Any,
) -> bool:
    return (
        custom_llm_provider == "openai"
        and bool(is_base64_unified_file_id) is False
    )
```
Contributor


@harish876 i guess an action item here is to now do this fix for all other providers we support for file content right ? Then we can remove this condition

Contributor

@ishaan-berri ishaan-berri left a comment


```python
@client
def file_content_streaming(
```
Contributor


this feels like a lot of duplicate code. Why can't we just add a stream=True/False on def file_content ?

That way you don't need this new function

Contributor Author


Counterpoint here. I think keeping file_content_streaming() separate is the cleaner choice because this is not just a stream=True transport toggle on the existing API. The streaming path returns a different shape, carries headers alongside an iterator, and has iterator-specific logging and cleanup behavior like aclose() on disconnect. Keeping it separate preserves the existing file_content() contract, makes the rollout to other providers incremental, and keeps the streaming-specific behavior isolated and easier to test. The original function code can be removed once we migrate all paths to a streaming one.

 - Static Methods for Streaming Handler Function

 - Remove the afile_content_streaming wrapper function. Enabled with a stream boolean in afile_content

 - Cleaned up test cases after refactor
… routing

- Updated `FileContentStreamingHandler` to utilize `custom_llm_provider` from credentials for routing.
- Added error handling for missing `custom_llm_provider` in credentials.
- Introduced new tests to validate streaming behavior with routed providers and non-OpenAI providers.
- Cleaned up imports and ensured proper type casting for improved clarity.
…provider routing

- Added validation to ensure credentials include a custom LLM provider before routing.
- Cleaned up type casting for better readability.
- Introduced a new test to verify behavior when a non-OpenAI provider is used, ensuring proper handling of streaming responses.
- Updated imports to include necessary modules for testing.
@harish876
Contributor Author

@greptile review again

- Changed the import path for `upload_file_to_storage_backend` in test files to reflect the new module structure.
- Ensured consistency in mocking for storage backend service tests.
```python
from litellm.files.types import FileContentStreamingResult

if TYPE_CHECKING:
    from litellm.proxy._types import UserAPIKeyAuth
```
Comment on lines +160 to +162
```python
from litellm.proxy.openai_files_endpoints.storage_backend_service import (
    StorageBackendFileService,
)
```
Comment on lines +735 to +737
```python
from litellm.proxy.openai_files_endpoints.file_content_streaming_handler import (
    FileContentStreamingHandler,
)
```
Comment on lines +15 to +23
```python
def should_stream_file_content(
    *,
    custom_llm_provider: str,
    is_base64_unified_file_id: Any,
) -> bool:
    return (
        custom_llm_provider == "openai"
        and bool(is_base64_unified_file_id) is False
    )
```
Contributor


P1 Streaming gate passes even when model-routing resolves to a non-OpenAI provider

should_stream_file_content checks only the request-level custom_llm_provider ("openai" by default), but when should_route=True the effective provider comes from credentials["custom_llm_provider"] which can be "azure", "vertex_ai", or "bedrock". file_content_streaming only handles OPENAI_COMPATIBLE_BATCH_AND_FILES_PROVIDERS = {"openai", "hosted_vllm"}, so the call in get_streaming_file_content_response raises BadRequestError for any model-routing target outside that set.

Concrete failure: a user creates a file via the proxy with a model that routes to Azure → the file ID becomes model-encoded → on retrieval, should_route=True, credentials["custom_llm_provider"] = "azure", streaming is entered, and afile_content(custom_llm_provider="azure", stream=True) raises BadRequestError.

Simplest fix: pass the resolved effective provider into the gate check so streaming is only entered when the routed provider is actually supported:

```python
@staticmethod
def should_stream_file_content(
    *,
    custom_llm_provider: str,
    is_base64_unified_file_id: Any,
    effective_custom_llm_provider: Optional[str] = None,
) -> bool:
    from litellm.types.utils import OPENAI_COMPATIBLE_BATCH_AND_FILES_PROVIDERS

    resolved = effective_custom_llm_provider or custom_llm_provider
    return (
        resolved in OPENAI_COMPATIBLE_BATCH_AND_FILES_PROVIDERS
        and bool(is_base64_unified_file_id) is False
    )
```

@harish876 harish876 requested a review from ishaan-berri April 11, 2026 00:25
@ishaan-berri ishaan-berri left a comment


requested changes

)

response = await litellm.afile_content(
Contributor

I think your architecture here is wrong.

You should always call litellm.afile_content()

Do not add a new branch as you did above.

Then in litellm.afile_content(), add a stream=True/False param and handle it accordingly based on that param.
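A minimal sketch of the single-entrypoint shape the reviewer describes — one function with a stream flag rather than a separate streaming branch. Names and signature are illustrative, not the actual litellm API:

```python
import asyncio
from typing import AsyncIterator, Union


async def afile_content(
    file_id: str,
    custom_llm_provider: str = "openai",
    stream: bool = False,
) -> Union[bytes, AsyncIterator[bytes]]:
    """Hypothetical unified entrypoint: one function, one stream flag."""
    if stream:
        # Streaming path: yield raw chunks instead of buffering the whole
        # file in memory (the OOM concern this PR addresses).
        async def _chunks() -> AsyncIterator[bytes]:
            for part in (b"chunk-1", b"chunk-2"):  # stand-in for HTTP chunks
                yield part

        return _chunks()
    # Non-streaming path: return the fully buffered content.
    return b"chunk-1chunk-2"
```

Callers then pick the behavior with the flag (`await afile_content(fid, stream=True)`), and no caller-side branching on a separate helper is needed.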

Contributor Author

I agree. The caveat here is that FileContentStreamingHandler.get_streaming_file_content_response calls acontent_file with stream set to True. The same async wrapper function is used; this static method just keeps the streaming logic within a single function, similar to file_content.

Contributor Author

My idea here is to replace the async wrapper, which was redundant with afile_content, with a stream boolean.

Contributor Author

will revisit this again

Contributor

Only one action item, make sure that this is done @harish876

check_file_id_encoding=True,
)

from litellm.proxy.openai_files_endpoints.file_content_streaming_handler import (
Contributor

we should not have this section here. Users expect the file to go through, which your code skips today.

if should_route:
    # Use model-based routing with credentials from config
    prepare_data_with_credentials(

- Introduced a new method in `FileContentStreamingHandler` to resolve streaming request parameters, enhancing the routing logic based on credentials.
- Updated the `should_stream_file_content` method to check against supported providers.
- Cleaned up type hints and imports across multiple files for better organization and clarity.
- Added comprehensive tests to validate the new routing behavior and ensure original data integrity during streaming requests.
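The parameter-resolution step described in the first bullet can be sketched as a pure function: when model-based routing applies, the effective provider and file ID come from the routed credentials, otherwise the request-level values are used. Function and key names here are hypothetical, not the actual handler API:

```python
from typing import Any, Dict, Optional, Tuple


def resolve_streaming_request_params(
    *,
    custom_llm_provider: str,
    file_id: str,
    should_route: bool,
    credentials: Optional[Dict[str, Any]] = None,
) -> Tuple[str, str]:
    """Hypothetical sketch: pick the effective provider/file id for streaming."""
    if should_route and credentials:
        # Model-based routing: the routed credentials win over the
        # request-level defaults.
        return (
            credentials.get("custom_llm_provider", custom_llm_provider),
            credentials.get("file_id", file_id),
        )
    # No routing: keep the request-level values unchanged.
    return custom_llm_provider, file_id
```

Keeping this resolution in one place makes it straightforward to test that routed requests stream against the right provider while unrouted requests keep their original data intact.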
):
    verbose_proxy_logger.debug(
        "Using streaming file content helper for custom_llm_provider=%s, original_file_id=%s, file_id=%s, model_used=%s",
        resolved_custom_llm_provider,
        original_file_id,
        resolved_file_id,
        model_used,
    )
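Passing the values as separate arguments to debug() (rather than pre-formatting the string) defers the %-interpolation until the DEBUG level is actually enabled. A small self-contained illustration of that pattern with the stdlib logging module:

```python
import io
import logging

# Build an isolated logger that writes into a string buffer so the
# formatted output can be inspected.
logger = logging.getLogger("streaming_demo")
logger.setLevel(logging.DEBUG)
buf = io.StringIO()
handler = logging.StreamHandler(buf)
handler.setFormatter(logging.Formatter("%(message)s"))
logger.addHandler(handler)

# %s placeholders are only interpolated if a handler accepts the record,
# so disabled debug logging costs almost nothing.
logger.debug(
    "Using streaming file content helper for custom_llm_provider=%s, file_id=%s",
    "openai",
    "file-abc123",
)
print(buf.getvalue().strip())
```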
from litellm.litellm_core_utils.litellm_logging import (
    Logging as LiteLLMLoggingObj,
)
from litellm.types.utils import StandardLoggingHiddenParams, StandardLoggingPayload

import litellm
from litellm.files.types import FileContentProvider, FileContentStreamingResult
from litellm.types.utils import OPENAI_COMPATIBLE_BATCH_AND_FILES_PROVIDERS

if TYPE_CHECKING:
    from litellm.proxy._types import UserAPIKeyAuth
    from litellm.proxy.utils import ProxyLogging
Comment on lines +36 to +38
from litellm.proxy.openai_files_endpoints.common_utils import (
    prepare_data_with_credentials,
)
Comment on lines +106 to +108
from litellm.proxy.common_request_processing import (
    ProxyBaseLLMRequestProcessing,
)
Contributor Author

This is outdated. The error was resolved.

Comment thread: litellm/files/main.py
]
FileDeleteProvider = Literal["openai", "azure", "gemini", "manus", "anthropic"]
FileListProvider = Literal["openai", "azure", "manus", "anthropic"]
FileContentProvider = Literal[
Contributor

why are we deleting this?

Contributor Author

This has been moved to types.py to prevent cyclic imports, as the helper class needs to use this type as well.
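The pattern being described is a standard cycle-breaker: shared Literal types go into a leaf module that imports nothing from the modules that use it. A sketch of that layout (module and member names are illustrative; the real FileContentProvider lives in litellm/files/types.py):

```python
# --- litellm/files/types.py (leaf module: imports nothing from main.py) ---
from typing import Literal, get_args

# Assumed members for illustration only.
FileContentProvider = Literal["openai", "azure", "hosted_vllm"]


# --- both files/main.py and the streaming handler can import this ---
def is_supported_content_provider(provider: str) -> bool:
    # get_args() extracts the Literal members, so validation stays in
    # sync with the type annotation without a duplicate constant.
    return provider in get_args(FileContentProvider)
```

Because the types module has no imports back into main.py or the handler, either side can depend on it without creating an import cycle.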

@ishaan-berri ishaan-berri changed the base branch from main to litellm_harish_april11 April 11, 2026 19:24
@ishaan-berri ishaan-berri merged commit c70a3c7 into BerriAI:litellm_harish_april11 Apr 11, 2026
49 of 51 checks passed