[Fix] Align image URL fetch with validated HTTP client in Bedrock and token counter paths#26272
Conversation
Greptile SummaryThis PR routes three image-URL fetch sites ( The two P2 findings (post-read buffering in Confidence Score: 5/5Safe to merge — all remaining findings are P2 improvement suggestions, not blockers. The SSRF-protection change is correct and consistent with the existing safe_get/async_safe_get pattern. The two open findings (post-read buffering in token_counter.py and missing size limits in Bedrock paths) are incremental hardening items that don't introduce regressions. All P2 — score stays at 5. No files require special attention before merging.
|
| Filename | Overview |
|---|---|
| litellm/litellm_core_utils/prompt_templates/factory.py | Routes _load_image_from_url and both BedrockImageProcessor fetch methods through safe_get/async_safe_get for SSRF protection; no size bounds added to Bedrock paths. |
| litellm/litellm_core_utils/token_counter.py | Adds Content-Length pre-check and post-read size guard to get_image_dimensions; post-read check still buffers the full response before the limit fires. |
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[User-supplied image URL] --> B{safe_get / async_safe_get}
B --> C[validate_url: DNS resolve + block-list check]
C -->|blocked IP| D[SSRFError raised]
C -->|allowed| E[client.get with follow_redirects=False]
E -->|redirect| F{is_redirect?}
F -->|yes, hop < 10| C
F -->|too many hops| G[SSRFError: Too many redirects]
F -->|no| H[Final response]
H --> I1[factory.py BedrockImageProcessor - response.content buffered, no size limit]
H --> I2[factory.py _load_image_from_url - response.content buffered, no size limit]
H --> I3[token_counter.py get_image_dimensions - Content-Length pre-check, response.read then len check]
I3 -->|exceeds limit| J[ValueError raised]
I3 -->|within limit| K[Parse image dimensions]
Comments Outside Diff (1)
-
litellm/litellm_core_utils/prompt_templates/factory.py, line 3565-3586 (link)Bedrock image fetch paths lack a download size bound
_post_call_image_processingaccessesresponse.content(fully buffered) without any size limit. Bothget_image_details_asyncandget_image_detailsnow have SSRF protection, but a server can still serve an arbitrarily large response that gets loaded entirely into memory before the call returns. Theimage_handling.pypattern (Content-Length pre-check +iter_bytesstreaming) would be the natural guard to add here as well.
Reviews (1): Last reviewed commit: "fix: align image URL fetch with validate..." | Re-trigger Greptile
| img_data = response.read() | ||
| if len(img_data) > max_bytes: | ||
| raise ValueError("Image response exceeds size limit") |
There was a problem hiding this comment.
Post-read size check still buffers entire response into memory
response.read() fully buffers the response body before the len(img_data) > max_bytes check fires. If the server omits Content-Length (or sends a deliberately low value), a very large image can exhaust heap before the guard triggers. The existing image_handling.py avoids this with iter_bytes(chunk_size=8192) and an in-flight byte counter. Aligning with that approach here would give a hard memory cap rather than just a post-hoc check.
Summary
BedrockImageProcessor.get_image_detailsandget_image_details_asyncinfactory.pywere fetching image URLs via rawclient.get()calls withfollow_redirects=True. These are now routed throughsafe_get/async_safe_get, which validates the destination before connecting and handles redirects with per-hop validation._load_image_from_urlinfactory.pyhad the same pattern and is updated to usesafe_get.get_image_dimensionsintoken_counter.pyusedsafe_get(address validation already present) but then calledresponse.read()without a size bound. Added a Content-Length pre-check and a post-read size check using the existingMAX_IMAGE_URL_DOWNLOAD_SIZE_MBconstant.The
safe_get/async_safe_gethelpers andvalidate_urlalready existed inlitellm_core_utils/url_utils.pyand were already used byimage_handling.pyandbase_ingestion.py; these changes bring the remaining image-fetch sites in line with that pattern.Testing
:4000.pytest tests/test_litellm/litellm_core_utils/test_token_counter.py tests/test_litellm/litellm_core_utils/prompt_templates/test_litellm_core_utils_prompt_templates_factory.py tests/test_litellm/llms/bedrock/chat/test_converse_transformation.py— 232 passed, 1 skipped.Type
🐛 Bug Fix
✅ Test