fix(proxy): model_max_budget silently broken for routed models #25549
Conversation
The `model_max_budget` limiter tracks spend in one code path (`async_log_success_event`) and enforces budget limits in another (`is_key_within_model_budget` via `user_api_key_auth`). These two paths used different model name formats to build cache keys:

- Tracking used `standard_logging_payload["model"]`, the deployment-level model name (e.g. `vertex_ai/claude-opus-4-6@default`)
- Enforcement used `request_data["model"]`, the model group alias (e.g. `claude-opus-4-6`)

Because the cache keys never matched, the enforcement path always read `None` for current spend, silently allowing all requests through even after the budget was exceeded. This affected any provider that decorates model names with provider prefixes or version suffixes (Vertex AI, Bedrock, etc.).

Fix: use `model_group` (the user-facing alias) from `StandardLoggingPayload` for spend tracking, falling back to `model` when `model_group` is `None`. This aligns the tracking cache key with the enforcement cache key.

Fixes the same root cause reported in BerriAI#15223 and BerriAI#10052.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
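The mismatch can be demonstrated with a small sketch (illustrative only: `spend_cache_key` and the dict-backed cache are stand-ins for the limiter's real key construction and `DualCache`, with the key layout taken from the cache keys quoted in this PR):

```python
# Illustrative sketch of the pre-fix bug: tracking and enforcement
# build their spend cache keys from different model name formats,
# so enforcement never finds the tracked spend.
def spend_cache_key(key_hash: str, model: str, window_seconds: int = 86400) -> str:
    # Assumed key layout, matching the cache keys quoted in this PR.
    return f"virtual_key_spend:{key_hash}:{model}:{window_seconds}"

cache = {}  # stand-in for DualCache

# Tracking path (before the fix): deployment-level model name
cache[spend_cache_key("hash", "vertex_ai/claude-opus-4-6@default")] = 0.95

# Enforcement path: model group alias -> a different key -> cache miss
current_spend = cache.get(spend_cache_key("hash", "claude-opus-4-6"))
print(current_spend)  # None -> budget check passes regardless of spend
```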
Greptile Summary

This PR fixes a longstanding bug where `model_max_budget` spend tracking and enforcement built their cache keys from different model name formats, so budgets were never enforced for routed models. The fix is a single-line change: prefer `model_group` over `model` when building the spend-tracking cache key.

Confidence Score: 5/5

Safe to merge — targeted one-line fix with no risk to existing working deployments. The change is minimal and surgical: a single expression change in `async_log_success_event` with a correct fallback for non-router usage. All three new tests are mock-based, don't touch existing assertions, and exercise the new `model_group` path as well as the fallback. No existing tests were modified. No security, data-loss, or backwards-compatibility concerns; deployments where `model_group == model` (e.g. direct OpenAI usage) are unaffected by the or-chain logic. No files require special attention.
| Filename | Overview |
|---|---|
| litellm/proxy/hooks/model_max_budget_limiter.py | One-line fix in async_log_success_event: use model_group (user-facing alias) over model (deployment name) for spend tracking cache key, with correct fallback for non-router usage. |
| tests/proxy_unit_tests/test_unit_test_max_model_budget_limiter.py | Three new unit tests added covering model_group path, None-fallback path, and end-user model_group path; all mock-based with no real network calls; existing tests are unchanged. |
Sequence Diagram
```mermaid
sequenceDiagram
    participant Client
    participant Proxy as Proxy (user_api_key_auth)
    participant Limiter as VirtualKeyModelMaxBudgetLimiter
    participant Cache as DualCache
    Note over Client,Cache: Before fix — cache key mismatch
    Client->>Proxy: POST /chat/completions model=claude-opus-4-6
    Proxy->>Limiter: is_key_within_model_budget(model="claude-opus-4-6")
    Limiter->>Cache: GET virtual_key_spend:{hash}:claude-opus-4-6:86400
    Cache-->>Limiter: None (miss — always)
    Limiter-->>Proxy: within budget (false positive)
    Proxy-->>Client: 200 OK
    Proxy->>Limiter: async_log_success_event
    Limiter->>Cache: SET virtual_key_spend:{hash}:vertex_ai/claude-opus-4-6@default:86400
    Note over Client,Cache: After fix — cache keys aligned
    Client->>Proxy: POST /chat/completions model=claude-opus-4-6
    Proxy->>Limiter: is_key_within_model_budget(model="claude-opus-4-6")
    Limiter->>Cache: GET virtual_key_spend:{hash}:claude-opus-4-6:86400
    Cache-->>Limiter: 0.95 (tracked spend)
    Limiter-->>Proxy: within budget (correct)
    Proxy-->>Client: 200 OK
    Proxy->>Limiter: async_log_success_event (model_group="claude-opus-4-6")
    Limiter->>Cache: SET virtual_key_spend:{hash}:claude-opus-4-6:86400
```
Reviews (1): Last reviewed commit: "fix(proxy): use model_group for model_ma..."
Codecov Report: ❌ Patch coverage is
Merged 17e145a into BerriAI:litellm_oss_staging_04_11_2026
Summary
- `model_max_budget` spend tracking and enforcement use different model name formats, causing budgets to never be enforced for providers that decorate model names (Vertex AI, Bedrock, etc.)
- Fix: use `model_group` (the user-facing alias) from `StandardLoggingPayload` for spend tracking cache keys, aligning them with the enforcement path

Root cause
The `_PROXY_VirtualKeyModelMaxBudgetLimiter` has two code paths that must use the same cache key for budget enforcement to work:

- Tracking (`async_log_success_event`): called after a successful LLM call, increments spend in `DualCache` using a cache key derived from the model name
- Enforcement (`is_key_within_model_budget` via `user_api_key_auth`): called before each request, reads current spend from `DualCache` using a cache key derived from the model name

The problem: these paths derive the model name from different sources:
- Tracking: `standard_logging_payload["model"]`, the deployment-level model name (e.g. `vertex_ai/claude-opus-4-6@default`)
- Enforcement: `request_data["model"]`, the model group alias (e.g. `claude-opus-4-6`)

The existing fallback (`_get_model_without_custom_llm_provider`) strips the provider prefix (`vertex_ai/`) but not version suffixes (`@default`), so the stripped value `claude-opus-4-6@default` still doesn't match `claude-opus-4-6`.

Result: spend is tracked under cache key `virtual_key_spend:{hash}:vertex_ai/claude-opus-4-6@default:86400` but enforcement reads from `virtual_key_spend:{hash}:claude-opus-4-6:86400`, always returning `None` and silently allowing all requests through regardless of budget.

Reproduction steps
Configure a proxy with a Vertex AI model deployment:
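A minimal config for this step might look like the following (a sketch assuming litellm's standard `model_list` layout; the project and location values are placeholders):

```yaml
model_list:
  - model_name: claude-opus-4-6                 # user-facing model group alias
    litellm_params:
      model: vertex_ai/claude-opus-4-6@default  # deployment-level model name
      vertex_project: my-gcp-project            # placeholder
      vertex_location: us-east5                 # placeholder
```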
Create a virtual key with `model_max_budget` set for the model group.

Send requests exceeding the $1.00 budget: all succeed, because spend tracking and enforcement use mismatched cache keys.
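The virtual key above might be created with a `/key/generate` payload along these lines (a sketch; the `budget_limit` and `time_period` field names are assumptions based on litellm's per-model budget config, and `1d` corresponds to the `86400` window in the cache keys):

```json
{
  "models": ["claude-opus-4-6"],
  "model_max_budget": {
    "claude-opus-4-6": {"budget_limit": 1.00, "time_period": "1d"}
  }
}
```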
Confirm via `/key/info` that `model_spend` remains `{}` (empty) despite recorded spend in `/spend/logs`.

Fix
In `async_log_success_event`, use `standard_logging_payload["model_group"]` (the user-facing model alias, e.g. `claude-opus-4-6`) instead of `standard_logging_payload["model"]` (the deployment name). Fall back to `model` when `model_group` is `None` (non-proxy/non-router usage). This aligns the tracking cache key with the enforcement cache key so budgets are actually enforced.
Affected providers
Any provider where the deployment model name differs from the model group alias:
- `vertex_ai/claude-opus-4-6@default` vs `claude-opus-4-6`
- `bedrock/anthropic.claude-v2` vs `claude-v2`
- `azure/gpt-4` vs `gpt-4`

Related issues

- BerriAI#15223
- BerriAI#10052
Test plan
- `test_async_log_success_event_uses_model_group_for_cache_key`: verifies spend is tracked under the model_group name when present
- `test_async_log_success_event_falls_back_to_model_when_no_model_group`: verifies fallback to the `model` field when `model_group` is `None`
- `test_async_log_success_event_end_user_uses_model_group`: verifies end-user budget tracking also uses `model_group`
- Ran `make test-unit` on `test_unit_test_max_model_budget_limiter.py`

Generated with Claude Code