fix(proxy): add URL validation for user-supplied URLs #25906
yuneng-berri merged 15 commits into BerriAI:litellm_yj_apr17
…lied URLs
Add validate_url() utility that resolves DNS once, validates all IPs against private network ranges, and rewrites the URL to connect to the validated IP directly. Prevents DNS rebinding by pinning to the resolved IP. Disable follow_redirects to prevent redirect-based SSRF bypasses.
Applied to all user-supplied URL entry points:
- Image URL fetching in chat completions
- Token counter image dimension fetching
- RAG file ingestion
- MCP OpenAPI spec loading
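The resolve-once-and-pin idea in this commit can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from url_utils.py: `validate_and_pin`, `resolve_ips`, and the injectable `resolver` parameter are hypothetical names, and the address check here is simply `ipaddress.is_global`.

```python
import ipaddress
import socket
from urllib.parse import urlsplit, urlunsplit


class SSRFError(ValueError):
    """Raised when a URL must not be fetched (name borrowed from the PR)."""


def resolve_ips(host, port, resolver=socket.getaddrinfo):
    """Resolve the host exactly once and return the distinct IP strings."""
    try:
        infos = resolver(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror as exc:
        raise SSRFError(f"DNS resolution failed for {host!r}") from exc
    return list(dict.fromkeys(info[4][0] for info in infos))


def validate_and_pin(url, resolver=socket.getaddrinfo):
    """Return (url_to_connect_to, original_hostname) or raise SSRFError."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        raise SSRFError(f"scheme or host not allowed: {url!r}")
    port = parts.port or (443 if parts.scheme == "https" else 80)
    ips = resolve_ips(parts.hostname, port, resolver)
    if not ips:
        raise SSRFError(f"no addresses for {parts.hostname!r}")
    # Every resolved address must pass; one private record is enough to reject.
    for ip in ips:
        try:
            ok = ipaddress.ip_address(ip).is_global
        except ValueError:
            ok = False  # unparseable address: treat as blocked
        if not ok:
            raise SSRFError(f"blocked address {ip!r} for {parts.hostname!r}")
    if parts.scheme == "https":
        # Assumption: certificate verification is on, so TLS binds the hostname.
        return url, parts.hostname
    # Plain HTTP: connect to the validated IP so a second lookup cannot differ.
    pinned = f"[{ips[0]}]" if ":" in ips[0] else ips[0]
    netloc = pinned if parts.port is None else f"{pinned}:{parts.port}"
    pinned_url = urlunsplit(
        (parts.scheme, netloc, parts.path, parts.query, parts.fragment)
    )
    return pinned_url, parts.hostname
```

A caller would send the request to the returned URL and set the Host header from the returned hostname, so the server still sees the original name.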
Add safe_get() and async_safe_get() helpers that validate each redirect hop before following. For HTTPS, rely on TLS certificate binding instead of URL rewriting. Simplify call sites to use the new helpers.
…ests Check URL scheme before calling safe_get in token counter to avoid unnecessary DNS resolution on base64-encoded image data. Add 14 unit tests for validate_url covering blocked networks, scheme validation, URL rewriting, and DNS failure handling.
…sabled _is_blocked_ip now returns True (blocked) for unparseable addresses instead of False (allowed). HTTPS URLs are rewritten to validated IPs when ssl_verify is disabled, closing the DNS rebinding window that exists without TLS certificate binding.
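The fail-closed behaviour described in this commit can be illustrated with a short sketch. The function below is a hypothetical stand-in for `_is_blocked_ip`, and the range checks are simplified to three stdlib properties rather than the PR's actual list.

```python
import ipaddress


def is_blocked_ip(addr: str) -> bool:
    """Return True when addr must not be contacted."""
    try:
        ip = ipaddress.ip_address(addr)
    except ValueError:
        # Fail closed: an address that cannot be parsed cannot be vetted,
        # so it is treated as blocked rather than allowed.
        return True
    return ip.is_private or ip.is_loopback or ip.is_link_local
```

Returning False in the `except` branch would let any string the parser rejects bypass the check, which is the gap this commit closes.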
Read Location header directly instead of response.next_request (which is None when follow_redirects=False). Resolve relative redirect URLs with httpx.URL.join(). Remove unused imports.
Pass follow_redirects through in HTTPHandler.get() — previously the parameter was accepted but never forwarded to the underlying httpx client, making sync redirect protection ineffective. Include port in Host header when non-default (e.g. example.com:8080). Fix redirect loop to read Location header directly instead of response.next_request (which is None when follow_redirects=False).
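The per-hop redirect handling described in these commits can be sketched as a loop that reads the Location header itself. This is an illustration only: `fetch` and `validate` are hypothetical callables standing in for the HTTP client (with redirects disabled) and validate_url, and `urllib.parse.urljoin` is used here in place of the `httpx.URL.join()` call the commit mentions.

```python
from urllib.parse import urljoin

REDIRECT_STATUSES = {301, 302, 303, 307, 308}
MAX_REDIRECTS = 10  # the PR description caps redirects at 10


class SSRFError(ValueError):
    """Raised when a hop is rejected or the redirect cap is exceeded."""


def get_with_validated_redirects(url, fetch, validate, max_redirects=MAX_REDIRECTS):
    """Follow redirects manually, validating every hop before requesting it.

    fetch(url) must return (status, headers, body) without following
    redirects; headers is a dict with lowercase keys.
    validate(url) must raise SSRFError for a forbidden target.
    """
    for _ in range(max_redirects + 1):
        validate(url)  # runs for the first URL and for every redirect target
        status, headers, body = fetch(url)
        location = headers.get("location")
        if status not in REDIRECT_STATUSES or not location:
            return status, body
        # Resolve relative Location values against the URL just requested.
        url = urljoin(url, location)
    raise SSRFError("too many redirects")
```

Because validation runs at the top of each iteration, a public host that redirects to an internal address is rejected before any request reaches the internal target.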
SDK core modules (image_handling, token_counter) should not import from litellm.proxy. Move url_utils.py to litellm_core_utils/ so bare SDK installs without proxy dependencies still work.
Greptile Summary
This PR adds SSRF protection to all user-supplied URL entry points (image fetching, token counting, RAG ingestion, MCP OpenAPI spec loading) via a new …
Confidence Score: 5/5
Safe to merge: SSRF protection is correctly implemented with no active bugs in current callers.
All previously raised concerns (inline httpx import, Azure Wire Server gap, no opt-out for internal hosts) have been addressed. Remaining findings are P2: the async retry loop silently swallows SSRFError without logging (the sync path does log), and the **kwargs pass-through creates a latent TypeError risk for future callers using HTTPHandler. Neither affects correctness today.
litellm/litellm_core_utils/prompt_templates/image_handling.py (silent SSRFError in async retry loop)
| Filename | Overview |
|---|---|
| litellm/litellm_core_utils/url_utils.py | New SSRF utility: DNS resolve-and-rewrite for HTTP, TLS binding for HTTPS, RFC 6890 is_global check plus Azure Wire Server exception, per-hop redirect validation, admin allowlist and master switch. |
| litellm/litellm_core_utils/prompt_templates/image_handling.py | Replaces bare client.get() with safe_get/async_safe_get; async path silently swallows SSRFError across all 3 retries with no log entry, while sync path does log it. |
| litellm/litellm_core_utils/token_counter.py | get_image_dimensions now guards URL fetch behind startswith check and uses safe_get; SSRFError is silently swallowed and falls through to base64 decode (same behaviour as prior network failures). |
| litellm/rag/ingestion/base_ingestion.py | Single-line swap of client.get() to async_safe_get(); SSRFError propagates to caller unchanged, which is acceptable. |
| litellm/proxy/_experimental/mcp_server/openapi_to_mcp_generator.py | Single-line swap to async_safe_get() for HTTP spec loading; SSRF validation applied before spec fetch. |
| tests/test_litellm/litellm_core_utils/test_url_utils.py | Comprehensive mocked unit tests covering blocked IPs, DNS failure, allowlist, redirect hostname preservation, and master switch — no real network calls. |
| tests/test_litellm/litellm_core_utils/test_image_handling.py | Uses autouse fixture to bypass SSRF in image handling tests; existing size-limit and streaming tests preserved with no weakened assertions. |
| tests/mcp_tests/test_openapi_spec_path_url.py | Mocked async handler tests for URL and local-file spec loading; correctly monkeypatches async_safe_get to bypass DNS in test environment. |
| litellm/__init__.py | Adds user_url_validation (bool, default True) and user_url_allowed_hosts (List[str], default []) globals for admin opt-out. |
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[User-supplied URL] --> B{user_url_validation\nenabled?}
B -- No --> C[client.get with follow_redirects=True]
B -- Yes --> D[validate_url]
D --> E{Scheme\nhttp/https?}
E -- No --> F[SSRFError: scheme not allowed]
E -- Yes --> G[socket.getaddrinfo]
G --> H{DNS\nresolution OK?}
H -- No --> I[SSRFError: DNS failed]
H -- Yes --> J{Host in\nallowlist?}
J -- No --> K{All IPs\nglobal + not\ncloud-fabric?}
K -- No --> L[SSRFError: blocked address]
K -- Yes --> M{HTTPS +\nssl_verify=True?}
J -- Yes --> M
M -- Yes --> N[Return original URL\nTLS pins hostname]
M -- No --> O[Rewrite URL to\nvalidated IP]
N --> P[client.get\nfollow_redirects=False]
O --> P
P --> Q{Response\nis redirect?}
Q -- No --> R[Return response]
Q -- Yes --> S{Redirect\ncount < 10?}
S -- No --> T[SSRFError: too many redirects]
S -- Yes --> U[Extract Location header\nresolve vs original hostname]
U --> D
Reviews (5): Last reviewed commit: "style: use 'is not None' for port check ..."
Greptile P1: six tests in test_url_utils.py performed real DNS lookups to example.com, violating the tests/test_litellm/ mock-only rule and risking offline CI failures. Add mock_dns_public and mock_dns_failure fixtures that monkeypatch socket.getaddrinfo on the url_utils module.
Greptile P2: move 'import httpx' from inside _extract_redirect_url to module-level imports per CLAUDE.md style guide.
Two litellm-level flags wired through litellm_settings YAML:
- user_url_validation (bool, default True): master switch. When False, safe_get/async_safe_get bypass validation and call client.get directly.
- user_url_allowed_hosts (List[str], default []): per-host allowlist. Entries are 'host' (matches any port) or 'host:port' (port-specific). Matched hosts skip the blocked-networks check but still resolve DNS and still rewrite HTTP to the validated IP, preserving rebinding protection within the permitted name.
Also fix an existing Host header bug: IPv6 literals (e.g. 2001:db8::1) were emitted unbracketed, producing ambiguous values like '2001:db8::1:8080', contrary to RFC 7230 section 5.4. Bracket them consistently in _format_host_header.
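The allowlist matching and Host header formatting described in this commit might look roughly like the sketch below. Both functions are hypothetical illustrations: the names, signatures, and case-insensitive comparison are assumptions, and IPv6 allowlist entries are not handled.

```python
from typing import List, Optional


def host_is_allowed(host: str, port: int, allowed_hosts: List[str]) -> bool:
    """Match 'host' (any port) or 'host:port' (that port only) entries."""
    host = host.lower()
    for entry in allowed_hosts:
        entry = entry.lower()
        if entry == host or entry == f"{host}:{port}":
            return True
    return False


def format_host_header(host: str, port: Optional[int], scheme: str) -> str:
    """Build a Host header value, bracketing IPv6 literals."""
    default_port = 443 if scheme == "https" else 80
    if ":" in host and not host.startswith("["):
        host = f"[{host}]"  # an unbracketed IPv6 literal is ambiguous once a port follows
    if port is None or port == default_port:
        return host
    return f"{host}:{port}"
```

With bracketing, `2001:db8::1` on port 8080 becomes `[2001:db8::1]:8080`, which separates the address from the port unambiguously.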
…icast and Azure Wire Server
Replace the hand-maintained _BLOCKED_NETWORKS CIDR list with a
default-deny check based on ipaddress.is_global (RFC 6890 semantics,
implemented by Python's stdlib). Also reject multicast explicitly —
is_global returns True for public multicast allocations, which are
not legitimate HTTP targets.
Only globally-routable cloud-fabric IPs need explicit exceptions; the
canonical list contains one entry today: Azure Wire Server
(168.63.129.16), an in-fabric service reachable from any Azure VM.
Coverage delta picked up automatically via is_global:
- Alibaba Cloud metadata (100.100.100.200, CGNAT)
- Legacy Oracle metadata (192.0.0.192, IETF Protocol Assignments)
- IPv4 documentation ranges (192.0.2.0/24, 198.51.100.0/24, 203.0.113.0/24)
- IPv4 reserved/future-use (240.0.0.0/4) and broadcast
- IPv6 documentation (2001:db8::/32)
Also fix two issues Greptile flagged:
- HTTP relative-redirect hops lost the original hostname because
_extract_redirect_url joined the Location against the rewritten
(IP-based) URL. Join against the pre-rewrite URL so the next hop's
Host header keeps the original hostname.
- Two unit tests performed real socket.getaddrinfo('localhost')
calls. Monkeypatch them.
Add coverage tests for every cloud-metadata IP from the canonical
SSRF dictionary (AWS/GCP/Azure/Alibaba/Oracle/DO/OpenStack) plus the
new multicast/reserved/documentation/broadcast ranges, and a
regression test for redirect-hostname preservation.
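The default-deny check described in this commit can be sketched with the stdlib `ipaddress` module. This is an illustration under assumptions, not the PR's code: `is_blocked` and `CLOUD_FABRIC_IPS` are hypothetical names, and the exact result of `is_global` for some special ranges depends on the Python version in use.

```python
import ipaddress
from typing import Union

IPAddress = Union[ipaddress.IPv4Address, ipaddress.IPv6Address]

# Globally routable addresses that still reach in-fabric cloud services.
# The commit message names a single entry: Azure Wire Server.
CLOUD_FABRIC_IPS = frozenset({ipaddress.ip_address("168.63.129.16")})


def is_blocked(ip: IPAddress) -> bool:
    """Default-deny: allow only global, non-multicast, non-fabric addresses."""
    if ip in CLOUD_FABRIC_IPS:
        return True
    if ip.is_multicast:
        # is_global can be True for public multicast allocations,
        # so multicast is rejected explicitly.
        return True
    return not ip.is_global
```

Compared with a hand-maintained CIDR list, this shape means any range the stdlib classifies as non-global is rejected without a new list entry.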
Merged commit 7c66edb into BerriAI:litellm_yj_apr17
Relevant issues
Adds SSRF protection for user-supplied URLs across multiple endpoints.
Reopens #25837 (original base litellm_yj_apr15 was deleted); rebased onto litellm_internal_staging. No content changes.
Pre-Submission checklist
- tests/test_litellm/ directory, Adding at least 1 test is a hard requirement - see details
- make test-unit
- @greptileai and received a Confidence Score of at least 4/5 before requesting a maintainer review
Type
🐛 Bug Fix
Changes
1. New shared utility: litellm/litellm_core_utils/url_utils.py
- validate_url(url): resolves DNS, validates all IPs against private network ranges (RFC1918, link-local, loopback, IMDS, carrier-grade NAT). For HTTP URLs, rewrites the URL to the validated IP to prevent DNS rebinding. For HTTPS, relies on TLS certificate binding.
- safe_get(client, url) / async_safe_get(client, url): fetch with SSRF protection on every redirect hop. Each redirect target is validated before the request is made. Caps at 10 redirects.
2. Applied to user-supplied URL entry points
- image_handling.py: convert_url_to_base64 and async_convert_url_to_base64 now use safe_get/async_safe_get
- token_counter.py: image dimension fetching uses safe_get
- base_ingestion.py: file URL fetching uses async_safe_get
- openapi_to_mcp_generator.py: spec URL fetching uses async_safe_get
Protection against three SSRF vectors