
Add file content streaming support for OpenAI and related utilities#25450

Merged
ishaan-berri merged 13 commits into BerriAI:litellm_harish_april11 from harish876:oom-file-fix-openai
Apr 11, 2026

Conversation

@harish876
Contributor

@harish876 harish876 commented Apr 9, 2026

  • Introduced file_content_streaming functions in litellm/files/main.py to handle asynchronous and synchronous file content streaming.
  • Added FileContentStreamingResponse class in litellm/files/streaming.py to manage streaming responses with logging capabilities.
  • Updated OpenAI API integration in litellm/llms/openai/openai.py to support new streaming methods.
  • Enhanced file content retrieval in litellm/proxy/openai_files_endpoints/files_endpoints.py to route requests for streaming.
  • Added unit tests for the new streaming functionality in tests/test_litellm/llms/openai/test_openai_file_content_streaming.py and tests/test_litellm/proxy/openai_files_endpoint/test_files_endpoint.py.
  • Refactored type hints and imports for better clarity and organization across modified files.

Relevant issues

Pre-Submission checklist

Please complete all items before asking a LiteLLM maintainer to review your PR

  • I have added testing in the tests/test_litellm/ directory. Adding at least 1 test is a hard requirement - see details
  • My PR passes all unit tests on make test-unit
  • My PR's scope is as isolated as possible, it only solves 1 specific problem
  • I have requested a Greptile review by commenting @greptileai and received a Confidence Score of at least 4/5 before requesting a maintainer review

Delays in PR merge?

If you're seeing a delay in your PR being merged, ping the LiteLLM Team on Slack (#pr-review).

CI (LiteLLM team)

CI status guideline:

  • 50-55 passing tests: main is stable with minor issues.
  • 45-49 passing tests: acceptable, but needs attention.
  • <= 40 passing tests: unstable; be careful with your merges and assess the risk.
  • Branch creation CI run
    Link:

  • CI run for the last commit
    Link:

  • Merge / cherry-pick CI run
    Links:

Type

🆕 New Feature
🐛 Bug Fix
🧹 Refactoring
📖 Documentation
🚄 Infrastructure
✅ Test

Summary

This PR enables streaming responses for OpenAI file content requests.

Issue

The v1 files content path buffers the full payload in memory before returning it. Under load, that causes elevated RSS and can contribute to OOM behavior when many large file requests run concurrently.
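To make the memory difference concrete, here is a minimal sketch (hypothetical helper names, not code from this PR) contrasting the two delivery strategies: a buffered read keeps the whole payload resident until the response is sent, while a chunked generator holds only one chunk at a time, so peak memory per request stays proportional to the chunk size rather than the file size.

```python
from typing import Iterator


def read_buffered(path: str) -> bytes:
    # Entire payload is materialized in memory at once.
    with open(path, "rb") as f:
        return f.read()


def read_streaming(path: str, chunk_size: int = 64 * 1024) -> Iterator[bytes]:
    # Only `chunk_size` bytes are resident at any moment.
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            yield chunk
```

Under 1000 concurrent 65 MB downloads, the buffered variant needs roughly 65 GB of headroom in the worst case, while the streaming variant needs on the order of the chunk size times the concurrency.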

What was implemented

  • Added a prototype files content endpoint that returns a streaming response for OpenAI only.
  • Kept the implementation OpenAI-only for now so we can validate the approach before generalizing it.
  • Added test cases to assert that only OpenAI is routed through the streaming logic.

Load Test Memory Results (1000 concurrent requests, 65 MB payload)

| Metric | Before Fix | After Fix | Delta |
| --- | --- | --- | --- |
| Average Memory Usage | 3.707 GiB (92.67%) | 2.56 GiB (64.0%) | ↓ 1.147 GiB |
| Peak Memory Usage | 3.893 GiB (97.32%) | 2.621 GiB (65.53%) | ↓ 1.272 GiB |
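The load-test harness itself is not part of this PR; as a sketch of how per-process peak-memory numbers like those above can be sampled with only the standard library (`ru_maxrss` is reported in KiB on Linux and in bytes on macOS):

```python
import resource
import sys


def peak_rss_mib() -> float:
    """Return this process's peak resident set size, in MiB."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    if sys.platform == "darwin":
        peak /= 1024  # macOS reports bytes; normalize to KiB first
    return peak / 1024
```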

Next Steps

The current file content retrieval path goes through afile_content. Whether the response object can be made an async iterable still needs to be evaluated. The current solution is a prototype that provides memory savings when a streaming-based approach is used.

Why this helps

Streaming avoids holding the full file payload in memory per request, which materially lowers peak RSS under concurrent load and reduces the risk of OOM events.

- Introduced `afile_content_streaming` and `file_content_streaming` functions in `litellm/files/main.py` to handle asynchronous and synchronous file content streaming.
- Added `FileContentStreamingResponse` class in `litellm/files/streaming.py` to manage streaming responses with logging capabilities.
- Updated OpenAI API integration in `litellm/llms/openai/openai.py` to support new streaming methods.
- Enhanced file content retrieval in `litellm/proxy/openai_files_endpoints/files_endpoints.py` to route requests for streaming.
- Added unit tests for the new streaming functionality in `tests/test_litellm/llms/openai/test_openai_file_content_streaming.py` and `tests/test_litellm/proxy/openai_files_endpoint/test_files_endpoint.py`.
- Refactored type hints and imports for better clarity and organization across modified files.
@vercel

vercel bot commented Apr 9, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| litellm | Ready | Preview, Comment | Apr 11, 2026 6:58pm |

Request Review

@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

@codspeed-hq
Contributor

codspeed-hq bot commented Apr 9, 2026

Merging this PR will not alter performance

✅ 16 untouched benchmarks


Comparing harish876:oom-file-fix-openai (69eb345) with main (eabb6a3)

Open in CodSpeed

@codecov

codecov bot commented Apr 9, 2026

Codecov Report

❌ Patch coverage is 73.64341% with 68 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| litellm/files/streaming.py | 59.25% | 55 Missing ⚠️ |
| litellm/llms/openai/openai.py | 82.50% | 7 Missing ⚠️ |
| litellm/files/main.py | 86.20% | 4 Missing ⚠️ |
| ..._files_endpoints/file_content_streaming_handler.py | 95.23% | 2 Missing ⚠️ |


@greptile-apps
Contributor

greptile-apps bot commented Apr 9, 2026

Greptile Summary

This PR adds file content streaming support to reduce peak memory usage (from ~3.9 GiB to ~2.6 GiB at 1,000 concurrent requests with a 65 MB payload) by avoiding full in-memory buffering. Key additions include FileContentStreamingResponse for SDK-level logging callbacks, FileContentStreamingHandler in the proxy layer for routing and proxy-level logging, and corresponding unit/integration tests.

The previously-flagged blocking concerns from earlier rounds (wrong provider in routing, dead client variable, FastAPI import outside proxy, streaming gate bypassing non-OpenAI providers) have all been addressed in this revision.

Confidence Score: 5/5

Safe to merge — all prior blocking concerns are resolved; remaining findings are style-level only.

The previously-flagged P1 issues (wrong provider forwarded on routing, dead client variable, FastAPI import outside proxy, streaming gate bypassing non-OpenAI providers) have all been addressed. The three new findings are P2: inline imports that violate CLAUDE.md style, a fragile dict-spread ordering that could in theory override stream=True (but cannot in practice for this endpoint), and an undocumented expansion of the streaming gate to hosted_vllm. None of these block merge.

litellm/proxy/openai_files_endpoints/file_content_streaming_handler.py — inline imports and dict-spread ordering.

Important Files Changed

| Filename | Overview |
| --- | --- |
| litellm/proxy/openai_files_endpoints/file_content_streaming_handler.py | New handler class for streaming routing; two inline imports inside static methods violate CLAUDE.md style guide, and the `**data` spread ordering in the afile_content call could silently override stream=True. |
| litellm/files/streaming.py | New FileContentStreamingResponse wrapper correctly handles sync/async iteration, aclose under cancellation via anyio.CancelScope, and SDK-level success/failure logging. |
| litellm/files/main.py | Adds file_content_streaming helper; client and litellm_params_dict are correctly forwarded. Streaming is gated on _should_sdk_support_streaming which matches OPENAI_COMPATIBLE_BATCH_AND_FILES_PROVIDERS. |
| litellm/llms/openai/openai.py | Adds afile_content_streaming and file_content_streaming methods; context-manager pattern correctly propagates exceptions to `__aexit__`/`__exit__`. |
| litellm/proxy/openai_files_endpoints/files_endpoints.py | Routing logic correctly resolves provider before the streaming gate check, so non-OpenAI routed providers fall through to the buffered path. |

Sequence Diagram

```mermaid
sequenceDiagram
    participant Client
    participant Proxy as files_endpoints.py
    participant Handler as FileContentStreamingHandler
    participant LiteLLM as litellm.afile_content
    participant OpenAI as OpenAIFilesAPI
    participant OAISDK as OpenAI SDK (streaming)

    Client->>Proxy: GET /v1/files/{file_id}/content
    Proxy->>Proxy: resolve custom_llm_provider
    Proxy->>Handler: resolve_streaming_request_params()
    Handler-->>Proxy: resolved_provider, file_id, data
    Proxy->>Handler: should_stream_file_content(resolved_provider)
    alt provider in OPENAI_COMPATIBLE_BATCH_AND_FILES_PROVIDERS
        Handler-->>Proxy: True
        Proxy->>Handler: get_streaming_file_content_response()
        Handler->>LiteLLM: afile_content(stream=True, ...)
        LiteLLM->>OpenAI: file_content_streaming(_is_async=True)
        OpenAI->>OAISDK: files.with_streaming_response.content()
        OAISDK-->>OpenAI: streaming context manager
        OpenAI-->>LiteLLM: FileContentStreamingResult(AsyncIterator)
        LiteLLM-->>Handler: FileContentStreamingResult (wrapped in FileContentStreamingResponse)
        Handler-->>Proxy: StreamingResponse
        Proxy-->>Client: HTTP 200 chunked/octet-stream
        loop Each chunk
            OAISDK-->>Client: bytes chunk
        end
        Handler->>Handler: _log_success_async() on StopAsyncIteration
        Handler->>Handler: proxy_logging_obj.update_request_status(success)
    else provider NOT in supported set
        Handler-->>Proxy: False
        Proxy->>LiteLLM: afile_content(stream=False, ...)
        LiteLLM-->>Proxy: HttpxBinaryResponseContent (buffered)
        Proxy-->>Client: HTTP 200 full body
    end
```

Reviews (12): Last reviewed commit: "Refactor file content streaming handling..." | Re-trigger Greptile

Comment on lines +66 to +74
```python
def _should_stream_file_content(
    *,
    custom_llm_provider: str,
    is_base64_unified_file_id: Any,
) -> bool:
    return (
        custom_llm_provider == "openai"
        and bool(is_base64_unified_file_id) is False
    )
```
Contributor


P1 Unconditional opt-out change breaks existing OpenAI users

Every proxy request where custom_llm_provider == "openai" is now silently rerouted to the streaming path — there is no feature flag or user-controlled opt-in. Callers that expected a buffered HttpxBinaryResponseContent (with Content-Length, synchronous .content, etc.) will receive a StreamingResponse after this change. This is a backwards-incompatible behavioral change for all current OpenAI file-content users.

Per the project style guide, new behavior that changes existing responses should be gated behind a flag (e.g., litellm.use_streaming_file_content = False by default), so existing users are not broken.

Suggested change

```python
# before
def _should_stream_file_content(
    *,
    custom_llm_provider: str,
    is_base64_unified_file_id: Any,
) -> bool:
    return (
        custom_llm_provider == "openai"
        and bool(is_base64_unified_file_id) is False
    )

# after
def _should_stream_file_content(
    *,
    custom_llm_provider: str,
    is_base64_unified_file_id: Any,
) -> bool:
    import litellm as _litellm
    return (
        custom_llm_provider == "openai"
        and bool(is_base64_unified_file_id) is False
        and getattr(_litellm, "use_streaming_file_content", False) is True
    )
```

Rule Used: What: avoid backwards-incompatible changes without... (source)

Contributor Author


I don’t think this is accurately characterized as a breaking change for existing OpenAI users.

For the externally supported /v1/files/{file_id}/content contract, the observable behavior remains the same for the validated client path:

  • HTTP 200
  • application/octet-stream
  • content-length preserved
  • content-disposition preserved
  • x-request-id present
  • identical response bytes

We explicitly validated this against the OpenAI SDK client.files.content(file_id=...) flow and confirmed LiteLLM and OpenAI return the same effective response contract. In that path, callers still receive a binary file payload and can access .content exactly as before.

The StreamingResponse vs HttpxBinaryResponseContent distinction is an internal server-side implementation detail, not a client-visible API contract, as long as the wire response and SDK-observable behavior remain equivalent. This change is specifically intended to avoid buffering large files fully in memory while preserving the existing external behavior.

So I agree this changes the internal delivery mechanism, but I don’t agree that it “breaks all current OpenAI file-content users.” The current evidence shows compatibility is preserved for the primary user path we care about.

If there is concern about unvalidated edge cases outside the tested OpenAI SDK flow, I’m happy to narrow the claim and add more coverage, but based on the current contract-level validation this does not appear to require a feature flag for backward compatibility.

Contributor


Fair point on the SDK validation. If response.content returns identical bytes and headers in the tested path, the client-visible contract is preserved.

However, streaming and buffered responses differ in failure modes:

  1. Mid-stream errors: With buffered responses, errors are caught before any bytes are sent (HTTP 500). With streaming, the client may receive HTTP 200 + partial bytes, then the stream aborts. Does the OpenAI SDK handle this gracefully?

  2. Client disconnects: If a client disconnects mid-download, the buffered path would log completion status based on the full response. The streaming path in _stream_file_content_with_logging only logs success after full iteration — if the client disconnects, does the iterator stop cleanly without logging success?

  3. Content-Length vs Transfer-Encoding: You're preserving content-length in the headers, but FastAPI StreamingResponse typically uses chunked encoding. Does the SDK rely on content-length for any validation?

Suggested validation to address these without requiring a feature flag:

  1. Test the error path: mock afile_content_streaming to raise an exception after yielding one chunk. Verify the SDK sees a stream error, not HTTP 500.
  2. Test client disconnect: verify _stream_file_content_with_logging closes cleanly and doesn't log success if the iteration is incomplete.
  3. Verify chunked encoding doesn't break SDK clients that expect content-length.

If these scenarios pass, I'd agree this preserves compatibility. The load test results are compelling, and if the failure modes are equivalent, the streaming approach is sound.
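Point 2 above can be checked in isolation. This is a self-contained sketch (a hypothetical wrapper, not LiteLLM's actual `FileContentStreamingResponse`) demonstrating that a success log entry is written only when the stream is fully consumed; an early close, which is how a client disconnect surfaces, takes the other branch:

```python
import asyncio
from typing import AsyncIterator, List


async def stream_with_logging(
    source: AsyncIterator[bytes], log: List[str]
) -> AsyncIterator[bytes]:
    """Yield chunks from `source`, logging success only on full iteration."""
    completed = False
    try:
        async for chunk in source:
            yield chunk
        completed = True
    finally:
        # Runs on normal exhaustion AND on early close (client disconnect).
        log.append("success" if completed else "aborted")


async def demo():
    async def chunks() -> AsyncIterator[bytes]:
        for b in (b"a", b"b", b"c"):
            yield b

    full_log: List[str] = []
    async for _ in stream_with_logging(chunks(), full_log):
        pass  # consume everything -> "success"

    partial_log: List[str] = []
    partial = stream_with_logging(chunks(), partial_log)
    async for _ in partial:
        break  # simulate a client disconnect after one chunk
    await partial.aclose()  # closing early -> "aborted", never "success"
    return full_log, partial_log
```

Running `asyncio.run(demo())` returns `(["success"], ["aborted"])`, showing that incomplete iteration does not produce a success log.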


Comment on lines +66 to +74
```python
def _should_stream_file_content(
    *,
    custom_llm_provider: str,
    is_base64_unified_file_id: Any,
) -> bool:
    return (
        custom_llm_provider == "openai"
        and bool(is_base64_unified_file_id) is False
    )
```
Contributor


P2 Provider-specific logic belongs in litellm/llms/

_should_stream_file_content hardcodes custom_llm_provider == "openai" inside the proxy layer. The project rule is that provider-specific decisions live under litellm/llms/, not in proxy endpoints. Consider exposing a capability flag from the provider config (e.g., a supports_file_content_streaming property) and checking that here instead.

Rule Used: What: Avoid writing provider-specific code outside... (source)


Contributor Author


I don’t think this is the same kind of provider-specific logic the style rule is meant to prevent.

_should_stream_file_content() is not implementing provider behavior or request/response transformation logic. It is deciding proxy routing policy: whether this proxy endpoint should serve the file-content response via the buffered path or the streaming path. That decision belongs naturally in the proxy layer because it is about endpoint response strategy, not provider semantics.

The provider-specific implementation still lives in litellm/llms/openai/:

  • the OpenAI file-content streaming call is implemented in litellm/llms/openai/openai.py
  • the iterator/headers returned by the provider are built there
  • the proxy is only deciding whether to invoke that streaming path for this endpoint

So the hardcoded custom_llm_provider == "openai" here is closer to "this proxy optimization/reroute is currently enabled only for OpenAI" than to "the proxy is implementing OpenAI protocol logic".
If we later expand this to multiple providers, a capability flag could make sense. But for a targeted incremental rollout, a small hardcoded reroute policy in the proxy is reasonable and keeps the scope explicit. I’d view this as endpoint-level orchestration, not misplaced provider logic.

Contributor


@harish876 i guess an action item here is to now do this fix for all other providers we support for file content right ? Then we can remove this condition

- Removed unused imports and streamlined type hints in `litellm/utils.py` and `litellm/files/main.py`.
- Moved `FileContentStreamingResult` to a new `litellm/files/types.py` for better organization.
- Updated `FileContentStreamingResponse` in `litellm/files/streaming.py` to include asynchronous close methods and improved logging capabilities.
- Enhanced tests to ensure proper closure of streaming iterators in `tests/test_litellm/llms/openai/test_openai_file_content_streaming.py` and `tests/test_litellm/proxy/openai_files_endpoint/test_files_endpoint.py`.
```python
elif hasattr(stream_to_close, "close"):
    result = cast(Iterator[bytes], stream_to_close).close()  # type: ignore[attr-defined]
    if result is not None:
        await result
```
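Generalized into a self-contained helper (an assumed shape mirroring the fragment above, not the PR's exact code), this is one way to close a stream that may expose an async `aclose()`, a sync `close()`, or a `close()` that returns an awaitable:

```python
import asyncio
import inspect
from typing import Any


async def close_stream(stream_to_close: Any) -> None:
    """Best-effort close for sync or async byte streams."""
    aclose = getattr(stream_to_close, "aclose", None)
    if aclose is not None:
        await aclose()
        return
    close = getattr(stream_to_close, "close", None)
    if close is not None:
        result = close()
        # Some HTTP clients return an awaitable from close(); await it if so.
        if inspect.isawaitable(result):
            await result
```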
@ishaan-berri
Contributor

@greptile review again

@harish876
Contributor Author

harish876 commented Apr 10, 2026

OpenAI File Content Backward Compatibility Note

PR: BerriAI/litellm#25450

Summary

This document captures a direct compatibility check for the OpenAI file content path introduced in PR #25450.

The goal of this check is to verify that, for an OpenAI SDK caller using client.files.content(file_id=...), the LiteLLM response remains compatible with the existing OpenAI behavior at the observable response-contract level.

Specifically, the script verifies that LiteLLM and OpenAI both return:

  • HTTP 200
  • content-type: application/octet-stream
  • the same content-length
  • the same content-disposition
  • an x-request-id
  • identical response bytes

This is a strong compatibility signal for the tested SDK flow because the consumer-visible payload and key headers match across both implementations.

Validation Script

File: openai_file_client.py

```python
import asyncio
import os

from dotenv import load_dotenv
from openai import AsyncOpenAI


load_dotenv()


litellm_client = AsyncOpenAI(
    api_key=os.getenv("LITELLM_API_KEY"),
    base_url="http://34.95.44.152:4000/v1",
)

openai_client = AsyncOpenAI(
    api_key=os.getenv("OPENAI_API_KEY"),
    base_url="https://api.openai.com/v1",
)

file_id = "file-2qexFzUBybCR2BWndU3twx"


async def fetch_file_content(client, label):
    content = await client.files.content(file_id=file_id)
    response = content.response

    print(f"[{label}] Status", response.status_code)

    return response


def assert_file_response(response, label):
    content_type = response.headers.get("content-type")
    content_length = response.headers.get("content-length")
    content_disposition = response.headers.get("content-disposition", "")
    request_id = response.headers.get("x-request-id")

    assert response.status_code == 200, f"{label}: expected status 200, got {response.status_code}"
    assert content_type == "application/octet-stream", (
        f"{label}: unexpected content-type {content_type}"
    )
    assert content_length is not None, f"{label}: missing content-length header"
    assert int(content_length) == len(response.content), (
        f"{label}: content-length header {content_length} != body length {len(response.content)}"
    )
    assert 'filename="dataset.jsonl"' in content_disposition, (
        f"{label}: unexpected content-disposition {content_disposition}"
    )
    assert request_id, f"{label}: missing x-request-id header"


async def main():
    litellm_response, openai_response = await asyncio.gather(
        fetch_file_content(litellm_client, "LiteLLM"),
        fetch_file_content(openai_client, "OpenAI"),
    )

    assert_file_response(litellm_response, "LiteLLM")
    assert_file_response(openai_response, "OpenAI")

    assert (
        litellm_response.headers.get("content-type") == openai_response.headers.get("content-type")
    ), "content-type mismatch between LiteLLM and OpenAI"
    assert (
        litellm_response.headers.get("content-length") == openai_response.headers.get("content-length")
    ), "content-length mismatch between LiteLLM and OpenAI"
    assert (
        litellm_response.headers.get("content-disposition") == openai_response.headers.get("content-disposition")
    ), "content-disposition mismatch between LiteLLM and OpenAI"
    assert litellm_response.content == openai_response.content, "response body mismatch"

    print("All assertions passed.")


if __name__ == "__main__":
    asyncio.run(main())
```

Command

```shell
python3 openai_file_client.py
```

Output

```
[OpenAI] Status 200
[LiteLLM] Status 200
All assertions passed.
```

Non-Mock Header Parity Check

This check was performed against real endpoints using the OpenAI Python SDK, not mocks.

The purpose of this comparison is to show that the new LiteLLM streaming implementation preserves the response contract that an OpenAI SDK caller observes from files.content(...).

The two responses were compared at the header and payload level. The compatibility-relevant result is:

  • both responses returned HTTP 200
  • both responses returned content-type: application/octet-stream
  • both responses returned the same content-length: 68156820
  • both responses returned the same content-disposition: attachment; filename="dataset.jsonl"
  • both responses included an x-request-id
  • both responses returned identical body bytes

Filtered headers from the new LiteLLM streaming path:

```json
{
  "content-type": "application/octet-stream",
  "content-length": "68156820",
  "content-disposition": "attachment; filename=\"dataset.jsonl\"",
  "x-request-id": "req_5f622a75ec644d70bbd5469d2c008abf",
  "openai-version": "2020-10-01",
  "openai-project": "proj_F0P5EBggl8kfWzGtPQWRPchP",
  "x-litellm-version": "1.83.4",
  "x-litellm-key-spend": "0.0"
}
```

Filtered headers from the OpenAI baseline response:

```json
{
  "content-type": "application/octet-stream",
  "content-length": "68156820",
  "content-disposition": "attachment; filename=\"dataset.jsonl\"",
  "x-request-id": "req_18ded58aec1f4ed9a69256d82e3586d2",
  "openai-version": "2020-10-01",
  "openai-project": "proj_F0P5EBggl8kfWzGtPQWRPchP"
}
```

Some headers are expected to differ across requests, such as date, cf-ray, set-cookie, and openai-processing-ms. Those are request-specific or infrastructure-specific and are not part of the compatibility contract being validated here.
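The filtering logic described here can be expressed as a small helper (a sketch; the header list is the contract set named in this note, not an exhaustive one):

```python
# Compare only the contract-relevant headers; request-specific headers such as
# `date`, `cf-ray`, and the `x-request-id` value are expected to differ.
CONTRACT_HEADERS = ("content-type", "content-length", "content-disposition")


def headers_match(a: dict, b: dict) -> bool:
    return all(a.get(h) == b.get(h) for h in CONTRACT_HEADERS)
```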

Why This Supports Backward Compatibility

For the tested OpenAI SDK path, the behavior is backward compatible in the ways that matter to the caller:

  • the request still succeeds with status 200
  • the caller still receives a binary file payload
  • the file metadata exposed through headers remains present
  • content-length is preserved
  • the returned bytes are identical to OpenAI

In other words, from the perspective of a client consuming files.content(...), the observable contract is preserved for this scenario.

Scope Of The Claim

This validation demonstrates backward compatibility for the tested OpenAI SDK consumer path. It does not, by itself, prove compatibility for every possible raw HTTP caller or every internal implementation detail. What it does prove is that the end-to-end response contract for this SDK usage remains equivalent across LiteLLM and OpenAI for the validated file.

That is the key argument for PR #25450: the implementation changes the delivery mechanism internally, but preserves the externally observed behavior for the tested OpenAI file content workflow.

@harish876
Contributor Author

@greptile review again

Contributor

@ishaan-berri ishaan-berri left a comment


Nit - minor change requested

Comment on lines +66 to +74
```python
def _should_stream_file_content(
    *,
    custom_llm_provider: str,
    is_base64_unified_file_id: Any,
) -> bool:
    return (
        custom_llm_provider == "openai"
        and bool(is_base64_unified_file_id) is False
    )
```
Contributor


@harish876 i guess an action item here is to now do this fix for all other providers we support for file content right ? Then we can remove this condition

Contributor

@ishaan-berri ishaan-berri left a comment


```python
@client
def file_content_streaming(
```
Contributor


this feels like a lot of duplicate code. Why can't we just add a stream=True/False on def file_content ?

That way you don't need this new function

Contributor Author


Counterpoint here. I think keeping file_content_streaming() separate is the cleaner choice because this is not just a stream=True transport toggle on the existing API. The streaming path returns a different shape, carries headers alongside an iterator, and has iterator-specific logging and cleanup behavior like aclose() on disconnect. Keeping it separate preserves the existing file_content() contract, makes the rollout to other providers incremental, and keeps the streaming-specific behavior isolated and easier to test. The original function code can be removed once we migrate all paths to a streaming one.

 - Static Methods for Streaming Handler Function

 - Remove the afile_content_streaming wrapper function. Enabled with a stream boolean in afile_content

 - Cleaned up test cases after refactor
… routing

- Updated `FileContentStreamingHandler` to utilize `custom_llm_provider` from credentials for routing.
- Added error handling for missing `custom_llm_provider` in credentials.
- Introduced new tests to validate streaming behavior with routed providers and non-OpenAI providers.
- Cleaned up imports and ensured proper type casting for improved clarity.
…provider routing

- Added validation to ensure credentials include a custom LLM provider before routing.
- Cleaned up type casting for better readability.
- Introduced a new test to verify behavior when a non-OpenAI provider is used, ensuring proper handling of streaming responses.
- Updated imports to include necessary modules for testing.
@harish876
Contributor Author

@greptile review again

- Changed the import path for `upload_file_to_storage_backend` in test files to reflect the new module structure.
- Ensured consistency in mocking for storage backend service tests.
```python
from litellm.files.types import FileContentStreamingResult

if TYPE_CHECKING:
    from litellm.proxy._types import UserAPIKeyAuth
```
Comment on lines +160 to +162
```python
from litellm.proxy.openai_files_endpoints.storage_backend_service import (
    StorageBackendFileService,
)
```
Comment on lines +735 to +737
```python
from litellm.proxy.openai_files_endpoints.file_content_streaming_handler import (
    FileContentStreamingHandler,
)
```
Comment on lines +15 to +23
```python
def should_stream_file_content(
    *,
    custom_llm_provider: str,
    is_base64_unified_file_id: Any,
) -> bool:
    return (
        custom_llm_provider == "openai"
        and bool(is_base64_unified_file_id) is False
    )
```
Contributor


P1 Streaming gate passes even when model-routing resolves to a non-OpenAI provider

should_stream_file_content checks only the request-level custom_llm_provider ("openai" by default), but when should_route=True the effective provider comes from credentials["custom_llm_provider"] which can be "azure", "vertex_ai", or "bedrock". file_content_streaming only handles OPENAI_COMPATIBLE_BATCH_AND_FILES_PROVIDERS = {"openai", "hosted_vllm"}, so the call in get_streaming_file_content_response raises BadRequestError for any model-routing target outside that set.

Concrete failure: a user creates a file via the proxy with a model that routes to Azure → the file ID becomes model-encoded → on retrieval, should_route=True, credentials["custom_llm_provider"] = "azure", streaming is entered, and afile_content(custom_llm_provider="azure", stream=True) raises BadRequestError.

Simplest fix: pass the resolved effective provider into the gate check so streaming is only entered when the routed provider is actually supported:

```python
@staticmethod
def should_stream_file_content(
    *,
    custom_llm_provider: str,
    is_base64_unified_file_id: Any,
    effective_custom_llm_provider: Optional[str] = None,
) -> bool:
    from litellm.types.utils import OPENAI_COMPATIBLE_BATCH_AND_FILES_PROVIDERS

    resolved = effective_custom_llm_provider or custom_llm_provider
    return (
        resolved in OPENAI_COMPATIBLE_BATCH_AND_FILES_PROVIDERS
        and bool(is_base64_unified_file_id) is False
    )
```

@harish876 harish876 requested a review from ishaan-berri April 11, 2026 00:25
@ishaan-berri ishaan-berri left a comment


requested changes

)

response = await litellm.afile_content(
Contributor

I think your architecture here is wrong.

You should always call litellm.afile_content()

Do not add a new branch as you did above.

Then in litellm.afile_content(), add a stream=True/False param and handle it accordingly based on that param.
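A minimal sketch of the single-entrypoint shape the reviewer describes — one function with a stream flag rather than a separate streaming branch. Names and signature are illustrative, not the actual litellm API:

```python
import asyncio
from typing import AsyncIterator, Union


async def afile_content(
    file_id: str,
    custom_llm_provider: str = "openai",
    stream: bool = False,
) -> Union[bytes, AsyncIterator[bytes]]:
    """Hypothetical unified entrypoint: one function, one stream flag."""
    if stream:
        # Streaming path: yield raw chunks instead of buffering the whole
        # file in memory (the OOM concern this PR addresses).
        async def _chunks() -> AsyncIterator[bytes]:
            for part in (b"chunk-1", b"chunk-2"):  # stand-in for HTTP chunks
                yield part

        return _chunks()
    # Non-streaming path: return the fully buffered content.
    return b"chunk-1chunk-2"
```

Callers then pick the behavior with the flag (`await afile_content(fid, stream=True)`), and no caller-side branching on a separate helper is needed.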

Contributor Author

I agree. The caveat here is that FileContentStreamingHandler.get_streaming_file_content_response calls acontent_file with stream set to True. The same async wrapper function is used; this static method just keeps the streaming logic within a single function, similar to file_content.

Contributor Author

My idea here is to replace the async wrapper, which was redundant with afile_content, with a stream boolean.

Contributor Author

will revisit this again

Contributor

Only one action item, make sure that this is done @harish876

check_file_id_encoding=True,
)

from litellm.proxy.openai_files_endpoints.file_content_streaming_handler import (
Contributor

we should not have this section here. Users expect the file to go through, which your code skips today.

if should_route:
    # Use model-based routing with credentials from config
    prepare_data_with_credentials(

- Introduced a new method in `FileContentStreamingHandler` to resolve streaming request parameters, enhancing the routing logic based on credentials.
- Updated the `should_stream_file_content` method to check against supported providers.
- Cleaned up type hints and imports across multiple files for better organization and clarity.
- Added comprehensive tests to validate the new routing behavior and ensure original data integrity during streaming requests.
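The parameter-resolution step described in the first bullet can be sketched as a pure function: when model-based routing applies, the effective provider and file ID come from the routed credentials, otherwise the request-level values are used. Function and key names here are hypothetical, not the actual handler API:

```python
from typing import Any, Dict, Optional, Tuple


def resolve_streaming_request_params(
    *,
    custom_llm_provider: str,
    file_id: str,
    should_route: bool,
    credentials: Optional[Dict[str, Any]] = None,
) -> Tuple[str, str]:
    """Hypothetical sketch: pick the effective provider/file id for streaming."""
    if should_route and credentials:
        # Model-based routing: the routed credentials win over the
        # request-level defaults.
        return (
            credentials.get("custom_llm_provider", custom_llm_provider),
            credentials.get("file_id", file_id),
        )
    # No routing: keep the request-level values unchanged.
    return custom_llm_provider, file_id
```

Keeping this resolution in one place makes it straightforward to test that routed requests stream against the right provider while unrouted requests keep their original data intact.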
):
    verbose_proxy_logger.debug(
        "Using streaming file content helper for custom_llm_provider=%s, original_file_id=%s, file_id=%s, model_used=%s",
        resolved_custom_llm_provider,
        original_file_id,
        resolved_file_id,
        model_used,
    )
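Passing the values as separate arguments to debug() (rather than pre-formatting the string) defers the %-interpolation until the DEBUG level is actually enabled. A small self-contained illustration of that pattern with the stdlib logging module:

```python
import io
import logging

# Build an isolated logger that writes into a string buffer so the
# formatted output can be inspected.
logger = logging.getLogger("streaming_demo")
logger.setLevel(logging.DEBUG)
buf = io.StringIO()
handler = logging.StreamHandler(buf)
handler.setFormatter(logging.Formatter("%(message)s"))
logger.addHandler(handler)

# %s placeholders are only interpolated if a handler accepts the record,
# so disabled debug logging costs almost nothing.
logger.debug(
    "Using streaming file content helper for custom_llm_provider=%s, file_id=%s",
    "openai",
    "file-abc123",
)
print(buf.getvalue().strip())
```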
from litellm.litellm_core_utils.litellm_logging import (
    Logging as LiteLLMLoggingObj,
)
from litellm.types.utils import StandardLoggingHiddenParams, StandardLoggingPayload

import litellm
from litellm.files.types import FileContentProvider, FileContentStreamingResult
from litellm.types.utils import OPENAI_COMPATIBLE_BATCH_AND_FILES_PROVIDERS

if TYPE_CHECKING:
    from litellm.proxy._types import UserAPIKeyAuth
    from litellm.proxy.utils import ProxyLogging
Comment on lines +36 to +38
from litellm.proxy.openai_files_endpoints.common_utils import (
    prepare_data_with_credentials,
)
Comment on lines +106 to +108
from litellm.proxy.common_request_processing import (
    ProxyBaseLLMRequestProcessing,
)
Contributor Author

This is outdated. The error was resolved.

Comment thread: litellm/files/main.py
]
FileDeleteProvider = Literal["openai", "azure", "gemini", "manus", "anthropic"]
FileListProvider = Literal["openai", "azure", "manus", "anthropic"]
FileContentProvider = Literal[
Contributor

why are we deleting this?

Contributor Author

This has been moved to types.py to prevent cyclic imports, as the helper class needs to use this type as well.
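The pattern being described is a standard cycle-breaker: shared Literal types go into a leaf module that imports nothing from the modules that use it. A sketch of that layout (module and member names are illustrative; the real FileContentProvider lives in litellm/files/types.py):

```python
# --- litellm/files/types.py (leaf module: imports nothing from main.py) ---
from typing import Literal, get_args

# Assumed members for illustration only.
FileContentProvider = Literal["openai", "azure", "hosted_vllm"]


# --- both files/main.py and the streaming handler can import this ---
def is_supported_content_provider(provider: str) -> bool:
    # get_args() extracts the Literal members, so validation stays in
    # sync with the type annotation without a duplicate constant.
    return provider in get_args(FileContentProvider)
```

Because the types module has no imports back into main.py or the handler, either side can depend on it without creating an import cycle.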

@ishaan-berri ishaan-berri changed the base branch from main to litellm_harish_april11 April 11, 2026 19:24
@ishaan-berri ishaan-berri merged commit c70a3c7 into BerriAI:litellm_harish_april11 Apr 11, 2026
49 of 51 checks passed