
fix(proxy): model_max_budget silently broken for routed models#25549

Merged
krrish-berri-2 merged 1 commit into BerriAI:litellm_oss_staging_04_11_2026 from dkindlund:fix/model-max-budget-name-mismatch
Apr 12, 2026

Conversation

@dkindlund
Contributor

Summary

  • model_max_budget spend tracking and enforcement use different model name formats, causing budgets to never be enforced for providers that decorate model names (Vertex AI, Bedrock, etc.)
  • Fix: use model_group (the user-facing alias) from StandardLoggingPayload for spend tracking cache keys, aligning them with the enforcement path

Root cause

The _PROXY_VirtualKeyModelMaxBudgetLimiter has two code paths that must use the same cache key for budget enforcement to work:

  1. Spend tracking (async_log_success_event): called after a successful LLM call, increments spend in DualCache using a cache key derived from the model name
  2. Budget enforcement (is_key_within_model_budget via user_api_key_auth): called before each request, reads current spend from DualCache using a cache key derived from the model name

The problem: these paths derive the model name from different sources:

Path | Source | Example value
Tracking | standard_logging_payload["model"] | vertex_ai/claude-opus-4-6@default
Enforcement | request_data["model"] | claude-opus-4-6

The existing fallback (_get_model_without_custom_llm_provider) strips the provider prefix (vertex_ai/) but not version suffixes (@default), so the stripped value claude-opus-4-6@default still doesn't match claude-opus-4-6.

Result: spend is tracked under cache key virtual_key_spend:{hash}:vertex_ai/claude-opus-4-6@default:86400 but enforcement reads from virtual_key_spend:{hash}:claude-opus-4-6:86400 — always returning None, silently allowing all requests through regardless of budget.
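The mismatch can be sketched in a few lines. The cache-key format is taken from the description above; the helper names here are illustrative, not litellm's actual internals:

```python
# Sketch of the mismatched cache keys. The key format comes from the PR
# description; these helper names are hypothetical, not litellm's.
def spend_key(token_hash: str, model: str, window_seconds: int = 86400) -> str:
    return f"virtual_key_spend:{token_hash}:{model}:{window_seconds}"

def strip_provider_prefix(model: str) -> str:
    # Approximates the _get_model_without_custom_llm_provider fallback:
    # drops the "vertex_ai/" prefix but leaves "@default" intact.
    return model.split("/", 1)[-1]

tracking_key = spend_key("abc123", "vertex_ai/claude-opus-4-6@default")
enforcement_key = spend_key("abc123", "claude-opus-4-6")
fallback_key = spend_key(
    "abc123", strip_provider_prefix("vertex_ai/claude-opus-4-6@default")
)

# Neither the raw tracking key nor the prefix-stripped variant matches the
# enforcement key, so the enforcement-side read always misses.
assert tracking_key != enforcement_key
assert fallback_key != enforcement_key
```

Since the enforcement read never finds the tracked spend, the limiter treats every request as within budget.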

Reproduction steps

  1. Configure a proxy with a Vertex AI model deployment:

    model_list:
      - model_name: claude-opus-4-6
        litellm_params:
          model: vertex_ai/claude-opus-4-6@default
          vertex_project: my-project
          vertex_location: us-east4
  2. Create a virtual key with model_max_budget:

    curl -X POST http://localhost:4000/key/generate \
      -H "Authorization: Bearer sk-master" \
      -d '{
        "model_max_budget": {
          "claude-opus-4-6": {"budget_limit": "1.00", "time_period": "1d"}
        }
      }'
  3. Send requests exceeding the $1.00 budget — all succeed because spend tracking and enforcement use mismatched cache keys

  4. Confirm via /key/info that model_spend remains {} (empty) despite recorded spend in /spend/logs

Fix

In async_log_success_event, use standard_logging_payload["model_group"] (the user-facing model alias, e.g. claude-opus-4-6) instead of standard_logging_payload["model"] (the deployment name). The hook falls back to model when model_group is None (non-proxy/non-router usage).

This aligns the tracking cache key with the enforcement cache key so budgets are actually enforced.
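A minimal sketch of the change, assuming StandardLoggingPayload behaves like a dict with model and model_group fields as described above (the actual litellm code may differ in shape):

```python
from typing import Optional

# Sketch of the fix: prefer the user-facing alias over the deployment name
# when building the spend-tracking cache key. Field names follow the PR
# description; this is a stand-in, not the actual litellm implementation.
def tracking_model_name(standard_logging_payload: dict) -> str:
    model_group: Optional[str] = standard_logging_payload.get("model_group")
    # Fall back to the deployment-level name for non-proxy/non-router
    # usage, where model_group is None.
    return model_group or standard_logging_payload["model"]
```

With this, a payload of {"model": "vertex_ai/claude-opus-4-6@default", "model_group": "claude-opus-4-6"} yields claude-opus-4-6, the same name the enforcement path looks up.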

Affected providers

Any provider where the deployment model name differs from the model group alias:

  • Vertex AI: vertex_ai/claude-opus-4-6@default vs claude-opus-4-6
  • Bedrock: bedrock/anthropic.claude-v2 vs claude-v2
  • Azure: azure/gpt-4 vs gpt-4
  • Any custom model name with provider prefix or version suffix

Related issues

Fixes the same root cause reported in BerriAI#15223 and BerriAI#10052.
Test plan

  • New test: test_async_log_success_event_uses_model_group_for_cache_key — verifies spend is tracked under model_group name when present
  • New test: test_async_log_success_event_falls_back_to_model_when_no_model_group — verifies fallback to model field when model_group is None
  • New test: test_async_log_success_event_end_user_uses_model_group — verifies end-user budget tracking also uses model_group
  • All 11 existing + new tests pass: make test-unit on test_unit_test_max_model_budget_limiter.py
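The first two cases above can be mirrored in a standalone sketch. The real tests drive _PROXY_VirtualKeyModelMaxBudgetLimiter with mocks; the helper here is a hypothetical stand-in for the fixed expression:

```python
# Stand-in for the fixed expression inside async_log_success_event; the
# actual tests exercise the limiter class with mocked cache and payloads.
def tracking_model_name(payload: dict) -> str:
    return payload.get("model_group") or payload["model"]

def test_uses_model_group_for_cache_key():
    payload = {"model": "vertex_ai/claude-opus-4-6@default",
               "model_group": "claude-opus-4-6"}
    assert tracking_model_name(payload) == "claude-opus-4-6"

def test_falls_back_to_model_when_no_model_group():
    payload = {"model": "gpt-4", "model_group": None}
    assert tracking_model_name(payload) == "gpt-4"

test_uses_model_group_for_cache_key()
test_falls_back_to_model_when_no_model_group()
```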

Generated with Claude Code

… key

The model_max_budget limiter tracks spend in one code path
(async_log_success_event) and enforces budget limits in another
(is_key_within_model_budget via user_api_key_auth). These two paths
used different model name formats to build cache keys:

- Tracking used standard_logging_payload["model"], which is the
  deployment-level model name (e.g. "vertex_ai/claude-opus-4-6@default")
- Enforcement used request_data["model"], which is the model group
  alias (e.g. "claude-opus-4-6")

Because the cache keys never matched, the enforcement path always read
None for current spend, silently allowing all requests through even
after the budget was exceeded. This affected any provider that decorates
model names with provider prefixes or version suffixes (Vertex AI,
Bedrock, etc.).

Fix: use model_group (the user-facing alias) from StandardLoggingPayload
for spend tracking, falling back to model when model_group is None.
This aligns the tracking cache key with the enforcement cache key.

Fixes the same root cause reported in BerriAI#15223 and BerriAI#10052.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@vercel

vercel bot commented Apr 11, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project | Deployment | Actions | Updated (UTC)
litellm | Ready | Preview, Comment | Apr 11, 2026 3:30am


@codspeed-hq
Contributor

codspeed-hq bot commented Apr 11, 2026

Merging this PR will not alter performance

✅ 16 untouched benchmarks


Comparing dkindlund:fix/model-max-budget-name-mismatch (1c599ca) with main (4e12d3c)


@greptile-apps
Contributor

greptile-apps bot commented Apr 11, 2026

Greptile Summary

This PR fixes a longstanding bug where model_max_budget enforcement was silently bypassed for any provider that decorates model names (Vertex AI, Bedrock, Azure). The root cause was a cache-key mismatch: spend tracking used standard_logging_payload["model"] (e.g. vertex_ai/claude-opus-4-6@default) while budget enforcement used request_data["model"] (e.g. claude-opus-4-6), so _get_virtual_key_spend_for_model always returned None and every request passed through.

The fix is a single-line change: prefer model_group (the user-facing alias populated by the router) over model when building the tracking cache key, with a safe fallback to model for non-router/non-proxy usage where model_group is None.

Confidence Score: 5/5

Safe to merge — targeted one-line fix with no risk to existing working deployments.

The change is minimal and surgical: a single expression change in async_log_success_event with a correct fallback for non-router usage. All three new tests are mock-based, don't touch existing assertions, and exercise the new model_group path as well as the fallback. No existing tests were modified. No security, data-loss, or backwards-compatibility concerns; deployments where model_group == model (e.g. direct OpenAI usage) are unaffected by the or-chain logic.

No files require special attention.

Important Files Changed

Filename | Overview
litellm/proxy/hooks/model_max_budget_limiter.py | One-line fix in async_log_success_event: use model_group (user-facing alias) over model (deployment name) for the spend-tracking cache key, with correct fallback for non-router usage.
tests/proxy_unit_tests/test_unit_test_max_model_budget_limiter.py | Three new unit tests covering the model_group path, the None-fallback path, and the end-user model_group path; all mock-based with no real network calls; existing tests are unchanged.

Sequence Diagram

sequenceDiagram
    participant Client
    participant Proxy as Proxy (user_api_key_auth)
    participant Limiter as VirtualKeyModelMaxBudgetLimiter
    participant Cache as DualCache

    Note over Client,Cache: Before fix — cache key mismatch

    Client->>Proxy: POST /chat/completions model=claude-opus-4-6
    Proxy->>Limiter: is_key_within_model_budget(model="claude-opus-4-6")
    Limiter->>Cache: GET virtual_key_spend:{hash}:claude-opus-4-6:86400
    Cache-->>Limiter: None (miss — always)
    Limiter-->>Proxy: within budget (false positive)
    Proxy-->>Client: 200 OK
    Proxy->>Limiter: async_log_success_event
    Limiter->>Cache: SET virtual_key_spend:{hash}:vertex_ai/claude-opus-4-6@default:86400

    Note over Client,Cache: After fix — cache keys aligned

    Client->>Proxy: POST /chat/completions model=claude-opus-4-6
    Proxy->>Limiter: is_key_within_model_budget(model="claude-opus-4-6")
    Limiter->>Cache: GET virtual_key_spend:{hash}:claude-opus-4-6:86400
    Cache-->>Limiter: 0.95 (tracked spend)
    Limiter-->>Proxy: within budget (correct)
    Proxy-->>Client: 200 OK
    Proxy->>Limiter: async_log_success_event (model_group="claude-opus-4-6")
    Limiter->>Cache: SET virtual_key_spend:{hash}:claude-opus-4-6:86400


@codecov

codecov bot commented Apr 11, 2026

Codecov Report

❌ Patch coverage is 0% with 1 line in your changes missing coverage. Please review.

Files with missing lines | Patch % | Lines
litellm/proxy/hooks/model_max_budget_limiter.py | 0.00% | 1 Missing ⚠️


@krrish-berri-2 krrish-berri-2 changed the base branch from main to litellm_oss_staging_04_11_2026 April 12, 2026 02:37
@krrish-berri-2 krrish-berri-2 merged commit 17e145a into BerriAI:litellm_oss_staging_04_11_2026 Apr 12, 2026
48 of 51 checks passed