fix(proxy): fix virtual key projected-spend soft budget alerts #25838
…d alerts The projected-spend alert in _update_key_cache read from existing_spend_obj.litellm_budget_table["soft_budget"], but the nested dict is never populated for virtual keys (the combined_view SQL maps budget fields to flat top-level attributes instead). This made the check dead code — it silently short-circuited on every request, and when unblocked, crashed update_cache with a Pydantic ValidationError because _get_projected_spend_over_limit returns a date object but CallInfo.projected_exceeded_date expects str. Fixes: read from the flat existing_spend_obj.soft_budget field that IS populated, and stringify projected_exceeded_date. Also marks team soft budget email alerts as enterprise in docs. Closes #20324
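The type mismatch behind the crash can be illustrated with a minimal sketch. `StrictCallInfo` below is a hypothetical stdlib stand-in for the Pydantic `CallInfo` model (it only mimics the rule that `projected_exceeded_date` must be `str` or `None`), and the date value is made up for illustration; none of this is the actual litellm code.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class StrictCallInfo:
    """Hypothetical stand-in for CallInfo; not the real litellm model."""

    projected_exceeded_date: Optional[str] = None

    def __post_init__(self) -> None:
        # Mimic strict validation of an Optional[str] field.
        value = self.projected_exceeded_date
        if value is not None and not isinstance(value, str):
            raise TypeError(
                f"projected_exceeded_date must be str, got {type(value).__name__}"
            )


# Illustrative value; the real helper derives the date from spend data.
projected_date = date(2026, 1, 31)

try:
    StrictCallInfo(projected_exceeded_date=projected_date)  # before the fix
    raised = False
except TypeError:
    raised = True

# After the fix: str() on a date yields its ISO 8601 form.
info = StrictCallInfo(projected_exceeded_date=str(projected_date))
```

In this sketch the raw `date` is rejected while the stringified value is accepted, which mirrors the `str()` conversion described in the fix.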
Greptile Summary
This PR fixes two bugs in the virtual key projected-spend soft budget alerting path.

Confidence Score: 5/5
Safe to merge: both changes are targeted bug fixes with no backward-incompatible impact. Both fixes are straightforward and correct: reading from the populated flat field instead of the empty nested dict, and converting a date to string for a string-typed field. The only findings are P2: a pre-existing unimplemented cooldown placeholder (now more observable since the alert path is unblocked) and missing unit tests. Neither blocks merge. No files require special attention.

| Filename | Overview |
|---|---|
| litellm/proxy/proxy_server.py | Fixes the virtual key projected-spend soft budget check: reads soft_budget from the flat LiteLLM_VerificationTokenView field (which is populated) instead of the never-populated litellm_budget_table dict, and stringifies the date return value from _get_projected_spend_over_limit to satisfy CallInfo.projected_exceeded_date: Optional[str]. |
| docs/my-website/docs/proxy/ui_team_soft_budget_alerts.md | Adds an enterprise callout at the top of the team soft budget alerts doc page, marking the feature as requiring an enterprise license. |
Sequence Diagram
sequenceDiagram
participant Req as Incoming Request
participant UC as update_cache()
participant KC as _update_key_cache()
participant Cache as user_api_key_cache
participant PS as _is_projected_spend_over_limit()
participant GP as _get_projected_spend_over_limit()
participant Alert as proxy_logging_obj.budget_alerts()
Req->>UC: response_cost
UC->>KC: token, response_cost
KC->>Cache: async_get_cache(hashed_token)
Cache-->>KC: LiteLLM_VerificationTokenView
KC->>KC: new_spend = existing_spend + response_cost
KC->>PS: current_spend=new_spend, soft_budget_limit=obj.soft_budget
Note over PS: BEFORE: read from litellm_budget_table dict (never populated) AFTER: read from flat soft_budget field
PS-->>KC: True / False
alt projected spend over limit
KC->>GP: current_spend=new_spend, soft_budget_limit=obj.soft_budget
GP-->>KC: (projected_spend, date_object)
KC->>KC: projected_exceeded_date = str(date_object)
Note over KC: BEFORE: passed raw date → Pydantic ValidationError AFTER: str() converts to ISO string
KC->>Alert: CallInfo(projected_exceeded_date=str, ...)
Alert-->>Req: email alert dispatched
end
KC->>Cache: update spend in cache
Comments Outside Diff (1)
litellm/proxy/proxy_server.py, line 1904: Cooldown is never actually set
The `# set cooldown on alert` comment at line 1904 is a placeholder with no implementation: `soft_budget_cooldown` is never flipped to `True` anywhere in the codebase. This means every request after the threshold is crossed will re-trigger the projected-spend alert rather than rate-limiting notifications. Now that the alert path is unblocked by this fix, this pre-existing gap becomes observable in production. Consider setting `existing_spend_obj.soft_budget_cooldown = True` (and updating the cache) after firing the alert to suppress repeated notifications within the same cache TTL window.
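The suggested cooldown could look roughly like the sketch below. `KeyView`, `cache`, `sent_alerts`, and `alert_once` are hypothetical stand-ins for the cached key object, `user_api_key_cache`, the email alert sink, and the alert branch of `_update_key_cache`; this is an assumption about the shape of a fix, not the proxy's actual code.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class KeyView:
    """Hypothetical stand-in for the cached key object."""

    token: str
    soft_budget_cooldown: bool = False


cache: Dict[str, KeyView] = {}  # stand-in for user_api_key_cache
sent_alerts: List[str] = []  # stand-in for the email alert sink


def alert_once(key: KeyView, over_limit: bool) -> None:
    """Fire the alert only when no cooldown is active, then set the cooldown."""
    if over_limit and not key.soft_budget_cooldown:
        sent_alerts.append(key.token)
        key.soft_budget_cooldown = True  # suppress repeat alerts
        cache[key.token] = key  # persist the flag with the cache entry


key = KeyView(token="hashed-token")
for _ in range(3):  # three requests past the threshold
    alert_once(key, over_limit=True)
```

Only the first of the three calls records an alert. The in-memory dict here has no TTL; with a real TTL cache the flag would expire together with the entry, which gives the "same cache TTL window" behaviour the comment describes.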
Reviews (1): Last reviewed commit: "fix(proxy): use flat soft_budget field f..."
  projected_spend, projected_exceeded_date = _get_projected_spend_over_limit(
      current_spend=new_spend,
-     soft_budget_limit=existing_spend_obj.litellm_budget_table.get(
-         "soft_budget", None
-     ),
+     soft_budget_limit=existing_spend_obj.soft_budget,
  )  # type: ignore
No unit test for the corrected code path
The CLAUDE.md template asks for at least one test in tests/litellm/. The existing tests in test_proxy_utils.py cover _get_projected_spend_over_limit in isolation but not the _update_key_cache path that reads soft_budget from the flat field. A minimal test with a mocked LiteLLM_VerificationTokenView (with soft_budget set and litellm_budget_table=None) would guard against the regression re-appearing.
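A real regression test would exercise `_update_key_cache` with a mocked `LiteLLM_VerificationTokenView`. Since those internals are not shown here, the sketch below only illustrates the shape of such a test: `read_soft_budget` and `read_soft_budget_nested` are hypothetical helpers that mirror the fixed and the old read, and `SimpleNamespace` stands in for the view object.

```python
from types import SimpleNamespace
from typing import Any, Optional


def read_soft_budget(key_obj: Any) -> Optional[float]:
    """Hypothetical helper mirroring the fixed read: use the flat field."""
    return getattr(key_obj, "soft_budget", None)


def read_soft_budget_nested(key_obj: Any) -> Optional[float]:
    """Hypothetical helper mirroring the old read via the nested dict."""
    table = getattr(key_obj, "litellm_budget_table", None)
    if table is None:
        return None  # no budget found, so the projected-spend check is skipped
    return table.get("soft_budget")


def test_soft_budget_read_from_flat_field() -> None:
    # Same shape as the suggested mock: flat field set, nested dict absent.
    key_obj = SimpleNamespace(soft_budget=0.0001, litellm_budget_table=None)
    assert read_soft_budget(key_obj) == 0.0001
    assert read_soft_budget_nested(key_obj) is None


test_soft_budget_read_from_flat_field()
```

The second assertion documents the original failure mode: with `litellm_budget_table=None`, the nested read finds no soft budget even though one is configured.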
Merged 2dd060b into litellm_internal_staging
Summary
- `_update_key_cache` (`proxy_server.py`) read from `existing_spend_obj.litellm_budget_table["soft_budget"]`, a nested dict that is never populated for virtual keys. The combined_view SQL maps budget fields to flat top-level attributes (`soft_budget`, `max_budget`, etc.) but never constructs the dict. This made the projected-spend check dead code that silently short-circuited on every request.
- When unblocked, `update_cache` crashed with a Pydantic `ValidationError` because `_get_projected_spend_over_limit` returns a `date` object but `CallInfo.projected_exceeded_date` expects `str`.

Fix
- Read from `existing_spend_obj.soft_budget` (the flat field that IS populated via the combined_view SQL mapping) instead of `existing_spend_obj.litellm_budget_table["soft_budget"]`.
- Stringify `projected_exceeded_date` before passing it to `CallInfo`.

Screenshots
before

after


Test plan
- … `soft_budget=0.0001` assigned to a user with email
- … `update_cache` crashed and spend stopped accumulating)
- … `ValidationError` in logs

Closes #20324