[Fix] Enforce format constraints on provider URL parameters#26287
[Fix] Enforce format constraints on provider URL parameters#26287yuneng-berri merged 1 commit intolitellm_yj_apr22from
Conversation
Brings the Snowflake, S3 Vectors, Vertex AI, and Bedrock URL construction paths in line with the existing pattern of validating interpolated values before use.
Greptile SummaryThis PR adds regex format validation to five URL-construction sites across Snowflake, S3 Vectors, Vertex AI, and Bedrock integrations to prevent URL injection via caller-supplied parameters. The approach is sound, but one regex is too restrictive.
Confidence Score: 4/5Not safe to merge as-is due to a backwards-incompatible breaking change in Snowflake account ID validation. The Snowflake regex rejects the documented legacy account identifier format (dots), which is a P1 backwards-incompatible breakage for existing users. The other four validation sites are correct. Fixing the Snowflake regex (e.g., to litellm/llms/snowflake/utils.py — the account_id regex must be widened to allow dots before merging.
|
| Filename | Overview |
|---|---|
| litellm/llms/snowflake/utils.py | Adds regex validation for account_id, but ^[a-zA-Z0-9_-]+$ rejects the Snowflake legacy locator format (dots), breaking existing callers. |
| litellm/llms/s3_vectors/vector_stores/transformation.py | Adds ^[a-z][a-z0-9-]*$ validation for aws_region_name; regex is correct for all AWS region names. |
| litellm/llms/vertex_ai/common_utils.py | Adds regex validation for vertex_location; regex is correct for known region names but None input silently produces "https://None-aiplatform.googleapis.com". |
| litellm/proxy/pass_through_endpoints/llm_passthrough_endpoints.py | Pass-through copy of get_vertex_base_url gets same validation; same None passthrough issue as common_utils. |
| litellm/llms/bedrock/batches/transformation.py | Validates ARN region segment with ^[a-z][a-z0-9-]*$ before URL construction; regex is correct for all AWS regions. |
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[Caller supplies parameter] --> B{Provider}
B -->|Snowflake| C["validate account_id\n^[a-zA-Z0-9_-]+$\n⚠️ dots not allowed"]
B -->|S3 Vectors| D["validate aws_region_name\n^[a-z][a-z0-9-]*$\n✅ correct"]
B -->|Vertex AI| E["vertex_location == 'global'?"]
B -->|Bedrock ARN| F["validate region from ARN\n^[a-z][a-z0-9-]*$\n✅ correct"]
C -->|invalid| X1[ValueError]
C -->|valid| Y1["https://account_id.snowflakecomputing.com/api/v2"]
D -->|invalid| X2[ValueError]
D -->|valid| Y2["https://s3vectors.region.api.aws"]
E -->|yes| Y3["https://aiplatform.googleapis.com"]
E -->|no & None| Z["⚠️ skips validation\nhttps://None-aiplatform.googleapis.com"]
E -->|no & not None| G["validate ^[a-z][a-z0-9-]*$"]
G -->|invalid| X3[ValueError]
G -->|valid| Y4["https://location-aiplatform.googleapis.com"]
F -->|invalid| X4[ValueError]
F -->|valid| Y5["https://bedrock.region.amazonaws.com/..."]
Reviews (1): Last reviewed commit: "fix: enforce format constraints on provi..." | Re-trigger Greptile
| if not re.match(r"^[a-zA-Z0-9_-]+$", account_id): | ||
| raise ValueError("Invalid account_id format") |
There was a problem hiding this comment.
Regex rejects valid Snowflake account identifiers containing dots
Snowflake supports a legacy account identifier format <account_locator>.<region_id>.<cloud> (e.g., xy12345.us-east-1.aws) that is interpolated directly into the URL as https://{account_id}.snowflakecomputing.com/api/v2. The regex ^[a-zA-Z0-9_-]+$ forbids dots, so any existing caller using the locator+region format will start getting ValueError after this change — a silent backwards-incompatible breakage (see rule on avoiding breaking changes without feature flags). The regex needs to permit dots, or the validation should be scoped to the new-format account names only.
Rule Used: What: avoid backwards-incompatible changes without... (source)
| if vertex_location is not None and not re.match( | ||
| r"^[a-z][a-z0-9-]*$", vertex_location | ||
| ): | ||
| raise ValueError("Invalid vertex_location format") | ||
| return f"https://{vertex_location}-aiplatform.googleapis.com" |
There was a problem hiding this comment.
None location silently produces an invalid URL
When vertex_location is None, the is not None guard skips the regex check and the function returns "https://None-aiplatform.googleapis.com" — a syntactically valid URL that will simply 404 or resolve to an unintended host. The same issue exists in the pass-through copy (llm_passthrough_endpoints.py). While this bug pre-dates the PR, the guard clause makes it explicit and adds a clear hook to fix it. Consider raising ValueError when vertex_location is None instead of silently passing through.
Codecov Report❌ Patch coverage is 📢 Thoughts on this report? Let us know! |
Summary
Provider integrations (Snowflake, S3 Vectors, Vertex AI, Bedrock) construct outbound request URLs by interpolating caller-supplied parameters such as
account_id,aws_region_name, andvertex_locationdirectly into URL strings. These parameters had no format constraints, so a caller could supply values containing URL metacharacters that alter the constructed hostname.This PR adds regex format validation at each URL construction point before the parameter is interpolated. Invalid values raise
ValueErrorimmediately.Changes
litellm/llms/snowflake/utils.py— validateaccount_idmatches^[a-zA-Z0-9_-]+$before building the Snowflake endpoint URLlitellm/llms/s3_vectors/vector_stores/transformation.py— validateaws_region_namematches^[a-z][a-z0-9-]*$before building the S3 Vectors endpoint URLlitellm/llms/vertex_ai/common_utils.py— validatevertex_locationmatches^[a-z][a-z0-9-]*$(or equals"global") inget_vertex_base_urllitellm/proxy/pass_through_endpoints/llm_passthrough_endpoints.py— same validation in the localget_vertex_base_urlcopy used by the pass-through handlerlitellm/llms/bedrock/batches/transformation.py— validate the region segment extracted from a Bedrock ARN matches^[a-z][a-z0-9-]*$before building the Bedrock endpoint URLTesting
attacker.com#,attacker.com?x=,attacker.com/evil, etc.) now raiseValueErrorat construction time for all three integrations.myorg-account123,us-east-1,us-central1,global) continue to produce correct URLs.pytest tests/test_litellm/llms/snowflake/ tests/test_litellm/llms/s3_vectors/ tests/test_litellm/llms/vertex_ai/test_vertex_ai_common_utils.py— 66 passed, 1 skippedpytest tests/local_testing/test_auth_utils.py— 39 passedType
🐛 Bug Fix