
fix(proxy): model_max_budget silently broken for routed models#25549

Merged
krrish-berri-2 merged 1 commit into BerriAI:litellm_oss_staging_04_11_2026 from dkindlund:fix/model-max-budget-name-mismatch
Apr 12, 2026

Conversation

@dkindlund
Contributor

Summary

  • model_max_budget spend tracking and enforcement use different model name formats, causing budgets to never be enforced for providers that decorate model names (Vertex AI, Bedrock, etc.)
  • Fix: use model_group (the user-facing alias) from StandardLoggingPayload for spend tracking cache keys, aligning them with the enforcement path

Root cause

The _PROXY_VirtualKeyModelMaxBudgetLimiter has two code paths that must use the same cache key for budget enforcement to work:

  1. Spend tracking (async_log_success_event): called after a successful LLM call, increments spend in DualCache using a cache key derived from the model name
  2. Budget enforcement (is_key_within_model_budget via user_api_key_auth): called before each request, reads current spend from DualCache using a cache key derived from the model name

The problem: these paths derive the model name from different sources:

Path | Source | Example value
Tracking | standard_logging_payload["model"] | vertex_ai/claude-opus-4-6@default
Enforcement | request_data["model"] | claude-opus-4-6

The existing fallback (_get_model_without_custom_llm_provider) strips the provider prefix (vertex_ai/) but not version suffixes (@default), so the stripped value claude-opus-4-6@default still doesn't match claude-opus-4-6.

Result: spend is tracked under cache key virtual_key_spend:{hash}:vertex_ai/claude-opus-4-6@default:86400 but enforcement reads from virtual_key_spend:{hash}:claude-opus-4-6:86400 — always returning None, silently allowing all requests through regardless of budget.
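The mismatch can be sketched in a few lines. The cache-key format is taken from the description above; the helper names here are illustrative, not litellm's actual internals:

```python
# Sketch of the mismatched cache keys. The key format comes from the PR
# description; these helper names are hypothetical, not litellm's.
def spend_key(token_hash: str, model: str, window_seconds: int = 86400) -> str:
    return f"virtual_key_spend:{token_hash}:{model}:{window_seconds}"

def strip_provider_prefix(model: str) -> str:
    # Approximates the _get_model_without_custom_llm_provider fallback:
    # drops the "vertex_ai/" prefix but leaves "@default" intact.
    return model.split("/", 1)[-1]

tracking_key = spend_key("abc123", "vertex_ai/claude-opus-4-6@default")
enforcement_key = spend_key("abc123", "claude-opus-4-6")
fallback_key = spend_key(
    "abc123", strip_provider_prefix("vertex_ai/claude-opus-4-6@default")
)

# Neither the raw tracking key nor the prefix-stripped variant matches the
# enforcement key, so the enforcement-side read always misses.
assert tracking_key != enforcement_key
assert fallback_key != enforcement_key
```

Since the enforcement read never finds the tracked spend, the limiter treats every request as within budget.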

Reproduction steps

  1. Configure a proxy with a Vertex AI model deployment:

    model_list:
      - model_name: claude-opus-4-6
        litellm_params:
          model: vertex_ai/claude-opus-4-6@default
          vertex_project: my-project
          vertex_location: us-east4
  2. Create a virtual key with model_max_budget:

    curl -X POST http://localhost:4000/key/generate \
      -H "Authorization: Bearer sk-master" \
      -d '{
        "model_max_budget": {
          "claude-opus-4-6": {"budget_limit": "1.00", "time_period": "1d"}
        }
      }'
  3. Send requests exceeding the $1.00 budget — all succeed because spend tracking and enforcement use mismatched cache keys

  4. Confirm via /key/info that model_spend remains {} (empty) despite recorded spend in /spend/logs

Fix

In async_log_success_event, use standard_logging_payload["model_group"] (the user-facing model alias, e.g. claude-opus-4-6) instead of standard_logging_payload["model"] (the deployment name). The hook falls back to model when model_group is None (non-proxy/non-router usage).

This aligns the tracking cache key with the enforcement cache key so budgets are actually enforced.
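A minimal sketch of the change, assuming StandardLoggingPayload behaves like a dict with model and model_group fields as described above (the actual litellm code may differ in shape):

```python
from typing import Optional

# Sketch of the fix: prefer the user-facing alias over the deployment name
# when building the spend-tracking cache key. Field names follow the PR
# description; this is a stand-in, not the actual litellm implementation.
def tracking_model_name(standard_logging_payload: dict) -> str:
    model_group: Optional[str] = standard_logging_payload.get("model_group")
    # Fall back to the deployment-level name for non-proxy/non-router
    # usage, where model_group is None.
    return model_group or standard_logging_payload["model"]
```

With this, a payload of {"model": "vertex_ai/claude-opus-4-6@default", "model_group": "claude-opus-4-6"} yields claude-opus-4-6, the same name the enforcement path looks up.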

Affected providers

Any provider where the deployment model name differs from the model group alias:

  • Vertex AI: vertex_ai/claude-opus-4-6@default vs claude-opus-4-6
  • Bedrock: bedrock/anthropic.claude-v2 vs claude-v2
  • Azure: azure/gpt-4 vs gpt-4
  • Any custom model name with provider prefix or version suffix

Related issues

Fixes the same root cause reported in BerriAI#15223 and BerriAI#10052.
Test plan

  • New test: test_async_log_success_event_uses_model_group_for_cache_key — verifies spend is tracked under model_group name when present
  • New test: test_async_log_success_event_falls_back_to_model_when_no_model_group — verifies fallback to model field when model_group is None
  • New test: test_async_log_success_event_end_user_uses_model_group — verifies end-user budget tracking also uses model_group
  • All 11 existing + new tests pass: make test-unit on test_unit_test_max_model_budget_limiter.py
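The first two cases above can be mirrored in a standalone sketch. The real tests drive _PROXY_VirtualKeyModelMaxBudgetLimiter with mocks; the helper here is a hypothetical stand-in for the fixed expression:

```python
# Stand-in for the fixed expression inside async_log_success_event; the
# actual tests exercise the limiter class with mocked cache and payloads.
def tracking_model_name(payload: dict) -> str:
    return payload.get("model_group") or payload["model"]

def test_uses_model_group_for_cache_key():
    payload = {"model": "vertex_ai/claude-opus-4-6@default",
               "model_group": "claude-opus-4-6"}
    assert tracking_model_name(payload) == "claude-opus-4-6"

def test_falls_back_to_model_when_no_model_group():
    payload = {"model": "gpt-4", "model_group": None}
    assert tracking_model_name(payload) == "gpt-4"

test_uses_model_group_for_cache_key()
test_falls_back_to_model_when_no_model_group()
```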

Generated with Claude Code

… key

The model_max_budget limiter tracks spend in one code path
(async_log_success_event) and enforces budget limits in another
(is_key_within_model_budget via user_api_key_auth). These two paths
used different model name formats to build cache keys:

- Tracking used standard_logging_payload["model"], which is the
  deployment-level model name (e.g. "vertex_ai/claude-opus-4-6@default")
- Enforcement used request_data["model"], which is the model group
  alias (e.g. "claude-opus-4-6")

Because the cache keys never matched, the enforcement path always read
None for current spend, silently allowing all requests through even
after the budget was exceeded. This affected any provider that decorates
model names with provider prefixes or version suffixes (Vertex AI,
Bedrock, etc.).

Fix: use model_group (the user-facing alias) from StandardLoggingPayload
for spend tracking, falling back to model when model_group is None.
This aligns the tracking cache key with the enforcement cache key.

Fixes the same root cause reported in BerriAI#15223 and BerriAI#10052.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@vercel

vercel bot commented Apr 11, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project | Deployment | Actions | Updated (UTC)
litellm | Ready | Preview, Comment | Apr 11, 2026 3:30am


@codspeed-hq
Contributor

codspeed-hq bot commented Apr 11, 2026

Merging this PR will not alter performance

✅ 16 untouched benchmarks


Comparing dkindlund:fix/model-max-budget-name-mismatch (1c599ca) with main (4e12d3c)


@greptile-apps
Contributor

greptile-apps bot commented Apr 11, 2026

Greptile Summary

This PR fixes a longstanding bug where model_max_budget enforcement was silently bypassed for any provider that decorates model names (Vertex AI, Bedrock, Azure). The root cause was a cache-key mismatch: spend tracking used standard_logging_payload["model"] (e.g. vertex_ai/claude-opus-4-6@default) while budget enforcement used request_data["model"] (e.g. claude-opus-4-6), so _get_virtual_key_spend_for_model always returned None and every request passed through.

The fix is a single-line change: prefer model_group (the user-facing alias populated by the router) over model when building the tracking cache key, with a safe fallback to model for non-router/non-proxy usage where model_group is None.

Confidence Score: 5/5

Safe to merge — targeted one-line fix with no risk to existing working deployments.

The change is minimal and surgical: a single expression change in async_log_success_event with a correct fallback for non-router usage. All three new tests are mock-based, don't touch existing assertions, and exercise the new model_group path as well as the fallback. No existing tests were modified. No security, data-loss, or backwards-compatibility concerns; deployments where model_group == model (e.g. direct OpenAI usage) are unaffected by the or-chain logic.

No files require special attention.

Important Files Changed

Filename | Overview
litellm/proxy/hooks/model_max_budget_limiter.py | One-line fix in async_log_success_event: use model_group (user-facing alias) over model (deployment name) for the spend-tracking cache key, with correct fallback for non-router usage.
tests/proxy_unit_tests/test_unit_test_max_model_budget_limiter.py | Three new unit tests covering the model_group path, the None-fallback path, and the end-user model_group path; all mock-based with no real network calls; existing tests are unchanged.

Sequence Diagram

sequenceDiagram
    participant Client
    participant Proxy as Proxy (user_api_key_auth)
    participant Limiter as VirtualKeyModelMaxBudgetLimiter
    participant Cache as DualCache

    Note over Client,Cache: Before fix — cache key mismatch

    Client->>Proxy: POST /chat/completions model=claude-opus-4-6
    Proxy->>Limiter: is_key_within_model_budget(model="claude-opus-4-6")
    Limiter->>Cache: GET virtual_key_spend:{hash}:claude-opus-4-6:86400
    Cache-->>Limiter: None (miss — always)
    Limiter-->>Proxy: within budget (false positive)
    Proxy-->>Client: 200 OK
    Proxy->>Limiter: async_log_success_event
    Limiter->>Cache: SET virtual_key_spend:{hash}:vertex_ai/claude-opus-4-6@default:86400

    Note over Client,Cache: After fix — cache keys aligned

    Client->>Proxy: POST /chat/completions model=claude-opus-4-6
    Proxy->>Limiter: is_key_within_model_budget(model="claude-opus-4-6")
    Limiter->>Cache: GET virtual_key_spend:{hash}:claude-opus-4-6:86400
    Cache-->>Limiter: 0.95 (tracked spend)
    Limiter-->>Proxy: within budget (correct)
    Proxy-->>Client: 200 OK
    Proxy->>Limiter: async_log_success_event (model_group="claude-opus-4-6")
    Limiter->>Cache: SET virtual_key_spend:{hash}:claude-opus-4-6:86400


@codecov

codecov bot commented Apr 11, 2026

Codecov Report

❌ Patch coverage is 0% with 1 line in your changes missing coverage. Please review.

Files with missing lines | Patch % | Lines
litellm/proxy/hooks/model_max_budget_limiter.py | 0.00% | 1 Missing ⚠️


@krrish-berri-2 krrish-berri-2 changed the base branch from main to litellm_oss_staging_04_11_2026 April 12, 2026 02:37
@krrish-berri-2 krrish-berri-2 merged commit 17e145a into BerriAI:litellm_oss_staging_04_11_2026 Apr 12, 2026
48 of 51 checks passed