Conversation
…ATE (#25227)

* Add STALE_OBJECT_CLEANUP_BATCH_SIZE constant

  Configurable batch limit (default 1000) for stale managed object cleanup, preventing unbounded UPDATE queries from hitting 300K+ rows at once.

* Batch-limit stale managed object cleanup with single bounded SQL query

  Two fixes to _cleanup_stale_managed_objects:

  1. Replace unbounded update_many with a single execute_raw using a subquery LIMIT, capping each poll cycle to STALE_OBJECT_CLEANUP_BATCH_SIZE rows. Zero rows are loaded into Python memory — everything stays in Postgres. Uses the same PostgreSQL raw-SQL pattern as spend_log_cleanup.py (the proxy requires PostgreSQL per schema.prisma).
  2. Extract _expire_stale_rows as a separate method for testability.

  Keeps the file_purpose='response' filter to avoid incorrectly expiring long-running batch or fine-tune jobs that legitimately exceed the staleness cutoff.
Greptile Summary

This PR adds three related features: (1) per-entity multi-window budget tracking (… Several issues were flagged in the prior review round and remain open (merge conflict markers in the test file, budget_limits missing from the Prisma schema).
Confidence Score: 4/5

Not ready to merge — open P0/P1 findings from prior rounds (merge conflict breaks the test file; budget_limits missing from the Prisma schema breaks all multi-budget-window runtime paths) plus a new P1 unbounded-query/N+1 write pattern in reset_budget_windows.

A score of 4 is appropriate: there is one confirmed new P1 finding (unbounded find_many + N+1 updates in reset_budget_windows), and prior-round P0/P1 items (merge conflict markers in the test file, budget_limits schema drift, raw SQL) are still open per file inspection. Multiple P1s keep the score at 4 rather than 5.

Files to watch: litellm/proxy/common_utils/reset_budget_job.py (new N+1 / unbounded query), tests/test_litellm/llms/bedrock/test_bedrock_common_utils.py (merge conflict), litellm/proxy/schema.prisma + schema.prisma + litellm-proxy-extras/litellm_proxy_extras/schema.prisma (budget_limits column missing), enterprise/litellm_enterprise/proxy/common_utils/check_responses_cost.py (raw SQL), litellm/proxy/proxy_server.py (tools/system dropped from count_tokens)
| Filename | Overview |
|---|---|
| litellm/proxy/common_utils/reset_budget_job.py | Added reset_budget_windows() with unbounded find_many (no take/skip) and N+1 individual update calls inside a for-loop, violating CLAUDE.md DB rules for large result sets and batch writes |
| enterprise/litellm_enterprise/proxy/common_utils/check_responses_cost.py | Extracted _expire_stale_rows using raw SQL execute_raw with batch LIMIT; raw SQL pattern flagged in prior review as violating CLAUDE.md |
| litellm/proxy/_types.py | Added BudgetLimitEntry, budget_limits on GenerateRequestBase/TeamBase/UpdateTeamRequest/LiteLLM_VerificationToken, allowed_models on LiteLLM_BudgetTable; all syntactically valid in current HEAD |
| litellm/proxy/auth/auth_checks.py | Added _check_team_member_model_access, _team_multi_budget_check, _virtual_key_multi_budget_check; all correctly follow the existing get_team_membership caching pattern |
| litellm/proxy/proxy_server.py | Added per-window spend counter increments for budget_limits keys/teams; tools/system params dropped from count_tokens call (flagged in prior review) |
| tests/test_litellm/llms/bedrock/test_bedrock_common_utils.py | Unresolved git merge conflict markers at lines 37-43 prevent file from parsing as valid Python — all tests in module fail at collection time (flagged in prior review) |
| litellm/constants.py | Added STALE_OBJECT_CLEANUP_BATCH_SIZE constant with env-var override, max(1,...) guard, and sensible default of 1000; clean addition |
| litellm-proxy-extras/litellm_proxy_extras/schema.prisma | Added allowed_models to LiteLLM_BudgetTable and default_team_member_models to LiteLLM_TeamTable; budget_limits still absent from both LiteLLM_TeamTable and LiteLLM_VerificationToken (flagged in prior review) |
Flowchart

```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Incoming Request] --> B[user_api_key_auth]
    B --> C[common_checks]
    C --> D{team_object present?}
    D -- Yes --> E[_team_max_budget_check]
    D -- Yes --> F[_team_multi_budget_check]
    F --> F1[get_current_spend\nspend:team:ID:window:DURATION]
    F1 --> F2{spend >= max_budget?}
    F2 -- Yes --> ERR[BudgetExceededError]
    D -- Yes, with user_id --> G[_check_team_member_model_access]
    G --> G1[get_team_membership\ncache-first DB lookup]
    G1 --> G2{allowed_models non-empty?}
    G2 -- Yes --> G3[_can_object_call_model]
    G3 -- denied --> ERR
    C --> H{valid_token present?}
    H -- Yes --> I[_virtual_key_multi_budget_check]
    I --> I1[get_current_spend\nspend:key:TOKEN:window:DURATION]
    I1 --> I2{spend >= max_budget?}
    I2 -- Yes --> ERR
    C --> J[Request proceeds]
    K[Response complete] --> L[increment_spend_counters]
    L --> M[increment key spend counter]
    L --> N[increment per-window key counters\nspend:key:TOKEN:window:DURATION]
    L --> O[increment team spend counter]
    L --> P[increment per-window team counters\nspend:team:ID:window:DURATION]
    Q[ResetBudgetJob poll] --> R[reset_budget_windows]
    R --> R1[find_many keys with budget_limits\n⚠️ no take/skip limit]
    R1 --> R2[for each expired window\nreset Redis counter\nupdate DB one-by-one\n⚠️ N+1 writes]
    R --> R3[find_many teams with budget_limits\n⚠️ no take/skip limit]
    R3 --> R4[for each expired window\nreset Redis counter\nupdate DB one-by-one\n⚠️ N+1 writes]
```
Reviews (6): Last reviewed commit: "fix(tests): update upsert tests to refle..."
```python
        return await self.prisma_client.db.execute_raw(
            """
            UPDATE "LiteLLM_ManagedObjectTable"
            SET "status" = 'stale_expired'
            WHERE "id" IN (
                SELECT "id" FROM "LiteLLM_ManagedObjectTable"
                WHERE "file_purpose" = 'response'
                  AND "status" NOT IN ('completed', 'complete', 'failed', 'expired', 'cancelled', 'stale_expired')
                  AND "created_at" < $1::timestamptz
                ORDER BY "created_at" ASC
                LIMIT $2
            )
            """,
            cutoff,
            batch_size,
        )
```
The CLAUDE.md rule says: "Do not write raw SQL for proxy DB operations. Use Prisma model methods instead of execute_raw / query_raw". A Prisma-native implementation avoids hand-written SQL, keeps the code testable with simple mocks, and removes schema-drift risk — while still bounding the batch:
```python
async def _expire_stale_rows(self, cutoff: datetime, batch_size: int) -> int:
    stale = await self.prisma_client.db.litellm_managedobjecttable.find_many(
        where={
            "file_purpose": "response",
            "status": {"not_in": ["completed", "complete", "failed", "expired", "cancelled", "stale_expired"]},
            "created_at": {"lt": cutoff},
        },
        order={"created_at": "asc"},
        take=batch_size,
        select={"id": True},
    )
    if not stale:
        return 0
    await self.prisma_client.db.litellm_managedobjecttable.update_many(
        where={"id": {"in": [r.id for r in stale]}},
        data={"status": "stale_expired"},
    )
    return len(stale)
```

This is two DB round-trips instead of one, but it stays in the ORM layer and matches how the rest of the proxy interacts with the DB. Note: spend_log_cleanup.py also uses execute_raw as a precedent for DELETE … WHERE … IN (SELECT … LIMIT n) — this is a style nudge rather than a hard blocker, but worth aligning with the stated convention.
Context Used: CLAUDE.md (source)
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
```python
    async def _expire_stale_rows(
        self, cutoff: datetime, batch_size: int
    ) -> int:
        """Execute the bounded UPDATE that marks stale rows as 'stale_expired'.

        Isolated so it can be swapped / mocked in tests without touching the
        orchestration logic in ``_cleanup_stale_managed_objects``.

        Uses PostgreSQL syntax (``$1::timestamptz``, ``LIMIT``, double-quoted
        identifiers) which is the only dialect the proxy supports — every
        ``schema.prisma`` in the repo sets ``provider = "postgresql"``.
        Same pattern as ``spend_log_cleanup.py``.
        """
        return await self.prisma_client.db.execute_raw(
            """
            UPDATE "LiteLLM_ManagedObjectTable"
            SET "status" = 'stale_expired'
            WHERE "id" IN (
                SELECT "id" FROM "LiteLLM_ManagedObjectTable"
                WHERE "file_purpose" = 'response'
                  AND "status" NOT IN ('completed', 'complete', 'failed', 'expired', 'cancelled', 'stale_expired')
                  AND "created_at" < $1::timestamptz
                ORDER BY "created_at" ASC
                LIMIT $2
            )
            """,
            cutoff,
            batch_size,
        )
```
No tests added for the new method
The PR's pre-submission checklist shows the test checkbox unchecked, and CLAUDE.md states: "Adding at least 1 test is a hard requirement". The _expire_stale_rows docstring explicitly calls out that it was "Isolated so it can be swapped / mocked in tests without touching the orchestration logic" — but no tests were added.
At minimum, a unit test in tests/test_litellm/ that mocks prisma_client.db.execute_raw should verify:
- The batch cap (STALE_OBJECT_CLEANUP_BATCH_SIZE) is passed correctly.
- The affected-row count is returned and triggers the warning log.
- Zero rows → no warning emitted.
Without tests the batch-size guard and the refactored flow cannot be automatically regressed against.
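A minimal sketch of such a test, using a stand-in class since the real owner of _expire_stale_rows lives in check_responses_cost.py. The class name, the constant value, and the simplified method body here are assumptions for illustration.

```python
import asyncio
from datetime import datetime, timezone
from unittest.mock import AsyncMock, MagicMock

STALE_OBJECT_CLEANUP_BATCH_SIZE = 1000  # stand-in for litellm.constants


class StaleObjectCleaner:
    """Hypothetical stand-in for the class that owns _expire_stale_rows."""

    def __init__(self, prisma_client):
        self.prisma_client = prisma_client

    async def _expire_stale_rows(self, cutoff, batch_size):
        # Same shape as the PR's method: execute_raw returns the affected-row count.
        return await self.prisma_client.db.execute_raw(
            "UPDATE ... LIMIT $2", cutoff, batch_size
        )


def test_batch_cap_is_forwarded():
    prisma = MagicMock()
    prisma.db.execute_raw = AsyncMock(return_value=42)
    cleaner = StaleObjectCleaner(prisma)
    cutoff = datetime(2026, 1, 1, tzinfo=timezone.utc)
    rows = asyncio.run(
        cleaner._expire_stale_rows(cutoff, STALE_OBJECT_CLEANUP_BATCH_SIZE)
    )
    # The affected-row count comes straight back from execute_raw ...
    assert rows == 42
    # ... and the batch cap reaches the query as the final positional argument.
    _, _, passed_batch = prisma.db.execute_raw.call_args.args
    assert passed_batch == STALE_OBJECT_CLEANUP_BATCH_SIZE
```

The zero-rows / warning-log cases would follow the same mocking pattern with `return_value=0` and a patched logger.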
…nt (#21352)

* return actual status code - /count_tokens endpoint
* Apply suggestion from @greptile-apps[bot]
* fix greptile suggestion
* rollback file
* add test case

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Co-authored-by: ishaan-berri <155045088+ishaan-berri@users.noreply.github.com>
```python
        try:
            result = await provider_counter.count_tokens(
                model_to_use=model_to_use or "",
                messages=messages,  # type: ignore
                contents=contents,
                deployment=deployment,
                request_model=request.model,
            )
```
tools and system parameters silently dropped from token counting call
The refactor to add the httpx.HTTPStatusError handler also accidentally removed tools=tools and system=system from the count_tokens call. Both variables are captured from the request at lines 9064-9065 and are actively used by the Anthropic and Gemini token counters to include tool definitions and system prompts in the server-side token count.
With this change:
- Requests that include tools will receive an underestimated token count (tool definitions can be hundreds/thousands of tokens)
- Requests that include a system prompt will similarly receive an undercount
The BaseTokenCounter.count_tokens abstract signature explicitly accepts both params (tools: Optional[List[Dict[str, Any]]] = None, system: Optional[Any] = None), and AnthropicTokenCounter passes them straight through to anthropic_count_tokens_handler.handle_count_tokens_request.
Suggested change:

```diff
             try:
                 result = await provider_counter.count_tokens(
                     model_to_use=model_to_use or "",
                     messages=messages,  # type: ignore
                     contents=contents,
                     deployment=deployment,
                     request_model=request.model,
+                    tools=tools,
+                    system=system,
                 )
```
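Once fixed, a small regression test could pin the forwarded parameters. This is a sketch: the wrapper below mirrors only the shape of the call site in proxy_server.py (the real call has more arguments), and the mock values are invented.

```python
import asyncio
from unittest.mock import AsyncMock


async def call_counter(provider_counter, tools, system):
    # Mirrors the fixed call site from the suggestion above (other args elided).
    return await provider_counter.count_tokens(
        model_to_use="claude-3-haiku",
        messages=[{"role": "user", "content": "hi"}],
        tools=tools,
        system=system,
    )


def test_tools_and_system_forwarded():
    counter = AsyncMock()
    counter.count_tokens = AsyncMock(return_value={"input_tokens": 10})
    tools = [{"name": "get_weather"}]
    asyncio.run(call_counter(counter, tools=tools, system="be brief"))
    # Both parameters must survive the refactor and reach the provider counter.
    kwargs = counter.count_tokens.call_args.kwargs
    assert kwargs["tools"] == tools
    assert kwargs["system"] == "be brief"
```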
…#24950)

* fix(bedrock): strip [1m]/[200k] context window suffixes before cost lookup
* test(bedrock): add test for [1m] context window suffix stripping in cost lookup
* schema: add allowed_models to BudgetTable, default_team_member_models to TeamTable
* migration: add allowed_models and default_team_member_models columns
* types: add allowed_models to TeamMemberAddRequest, TeamMemberUpdateRequest, UpdateTeamRequest
* utils: add allowed_models param to add_new_member, persist to budget table
* common_utils: add allowed_models to _upsert_budget_and_membership
* team endpoints: seed allowed_models on member_add, persist on member_update and team/update
* auth: enforce per-member allowed_models at request time
* networking: add allowed_models to Member type and teamMemberUpdateCall
* TeamMemberTab: add Model Scope column showing per-member allowed_models
* EditMembership: add Allowed Models multi-select field
* TeamInfo: add default_team_member_models field in Settings tab
* chore: sync schema.prisma copies from root
* fix(team_member_update): update existing budget in-place instead of creating new one

  When a member already has a budget_id, patch only the fields the caller provided rather than always creating a fresh budget record. The old code ignored existing_budget_id entirely, so updating only allowed_models silently dropped the stored max_budget / tpm_limit / rpm_limit values.

* fix(auth): pass llm_router to _check_team_member_model_access

  Without the router, _can_object_call_model cannot resolve wildcard model names (e.g. openai/*) or access-group names in allowed_models, causing legitimate requests to be denied. Thread the existing llm_router from _run_common_checks through to the new member-scope check.

* feat(ui): add Team Member Settings accordion to Create Team modal

  Groups default_team_member_models, member budget/key duration, and tpm/rpm defaults into a single collapsible section. The model picker is filtered to only show the models selected for the team, and the copy distinguishes it from the team-level Models field.

* feat(ui): consolidate Team Member Settings into accordion in edit team form

  Moves default_team_member_models + per-member budget/key/tpm/rpm fields into a collapsible "Team Member Settings" panel. Keeps the top-level form focused on team-wide settings (team models, team budget, tpm/rpm).

* fix(ui): use tremor Accordion for Team Member Settings in edit team form
* fix(ui): move Team Member Settings accordion above budget fields in Create Team
* chore: fixes

Co-authored-by: Ishaan Jaffer <ishaanjaffer0324@gmail.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Yuneng Jiang <yuneng@berri.ai>
```typescript
  } | null;
  created_at: string;
  access_group_ids?: string[];
<<<<<<< worktree-rustling-wishing-kite
  default_team_member_models?: string[];
=======
  access_group_models?: string[];
  access_group_mcp_server_ids?: string[];
  access_group_agent_ids?: string[];
>>>>>>> main
```
```python
<<<<<<< worktree-rustling-wishing-kite
    assert base_model == "anthropic.claude-3-5-sonnet-20240620-v1:0"
=======
    assert base_model == "anthropic.claude-haiku-4-5-20251001-v1:0"
>>>>>>> main
```
Unresolved git merge conflict markers
Lines 37–43 contain live conflict markers (<<<<<<< worktree-rustling-wishing-kite, =======, >>>>>>> main). Python cannot parse this file, so every test in test_bedrock_common_utils.py fails with a SyntaxError at import time.
The main-branch version is correct — the model under test is bedrock/us-gov.anthropic.claude-haiku-4-5-20251001-v1:0, so stripping the us-gov. cross-region prefix should yield anthropic.claude-haiku-4-5-20251001-v1:0. The worktree assertion (anthropic.claude-3-5-sonnet-20240620-v1:0) refers to a completely different model and would be wrong.
Resolve by keeping only the correct assertion:
```diff
-<<<<<<< worktree-rustling-wishing-kite
-    assert base_model == "anthropic.claude-3-5-sonnet-20240620-v1:0"
-=======
-    assert base_model == "anthropic.claude-haiku-4-5-20251001-v1:0"
->>>>>>> main
+    assert base_model == "anthropic.claude-haiku-4-5-20251001-v1:0"
```
#25109)

* feat: multiple concurrent budget windows per API key and team (#24883)

  * feat(proxy): add BudgetLimitEntry type and wire budget_limits into key/team models
  * feat(schema): add budget_limits Json column to VerificationToken and TeamTable
  * feat(migrations): add migration for budget_limits column on keys and teams
  * feat(keys): initialize budget_limits windows with reset_at on key create/update
  * feat(teams): initialize budget_limits windows with reset_at on team create/update
  * feat(auth): add _virtual_key_multi_budget_check and _team_multi_budget_check
  * feat(auth): call multi-budget checks from common_checks for keys and teams
  * feat(proxy): increment per-window Redis spend counters after each request
  * feat(budget): reset individual budget windows on schedule via reset_budget_job
  * feat(ui): add hourly option to BudgetDurationDropdown
  * feat(ui): add budget_limits field to KeyResponse type
  * feat(ui): add Budget Windows editor to key edit view
  * feat(ui): add Budget Windows editor to create key form
  * fix(proxy): strip budget_limits=None before Prisma upsert to fix login 500

    Prisma rejects nullable JSON fields (Json? without @default) when passed as Python None — it needs the field omitted entirely so the DB stores NULL via the column's nullable constraint. This was breaking /v2/login because the UI session key creation path hit the upsert with budget_limits=None.

  * ui(key-edit): use antd InputNumber+Button for budget windows, add reset hints
  * ui(create-key): use antd InputNumber+Button for budget windows, add reset hints
  * docs(users): add multiple budget windows section with API + dashboard walkthrough
  * fix: BudgetExceededError returns HTTP 429 instead of 400
    - Add status_code=429 to BudgetExceededError class
    - auth_exception_handler hardcoded code=400 → code=429
  * fix: no-op else branch in multi-budget auth checks causes KeyError
    - BudgetLimitEntry objects must be coerced via model_dump() not left as-is
    - Move _virtual_key_multi_budget_check into common_checks (was asymmetric with _team_multi_budget_check which already lived there)
  * fix: len() on JSON string returns char count not window count
    Guard with isinstance check + json.loads() before iterating per-window Redis counters in increment_spend_counters
  * fix: silent except:pass hides Redis reset failures in reset_budget_windows
    Log Redis counter reset failures as warnings so they are observable
  * test: add unit tests for multi-budget window enforcement
    5 tests covering: no budget_limits passes, under budget passes, over hourly window raises 429, over monthly window raises 429, BudgetLimitEntry objects coerced without KeyError
  * fix: key per-window counters stable across reorders (duration key, not index)
  * fix: team+key per-window spend increments use duration key, not index
  * fix: budget window reset uses duration key; log failures instead of swallowing
  * refactor: extract BudgetWindowsEditor to shared component
  * refactor: key_edit_view imports BudgetWindowsEditor from shared component
  * refactor: create_key_button imports BudgetWindowsEditor from shared component

* fix(reset_budget_job): extract _reset_expired_window helper to fix PLR0915 too many statements

* feat(skills): Skills Registry & Hub — register skills, browse in AI Hub, public skill hub (#25118)

  * feat(skills): add domain and namespace fields to plugin types
  * feat(skills): store and return domain/namespace inside manifest_json
  * feat(skills): add /public/skill_hub endpoint for unauthenticated access
  * feat(skills): whitelist /public/skill_hub from auth requirements
  * feat(skills): add domain, namespace to Plugin and RegisterPluginRequest types
  * feat(skills): smart URL parser — paste github URL, auto-detect source type and name
  * feat(skills): replace enable toggle with Public badge, make rows clickable
  * feat(skills): add skill detail view with Overview and How to Use tabs
  * feat(skills): add MakeSkillPublicForm modal for publishing skills to the hub
  * feat(skills): rename panel to Skills, wire in skill detail view on row click
  * feat(skills): add skill hub table columns — name, description, domain, source, status
  * feat(skills): add SkillHubDashboard with stats row, domain dropdown filter, and table
  * feat(skills): add Skill Hub tab to AI Hub with Select Skills to Make Public button
  * feat(skills): move Skills to top-level nav item directly under MCP Servers
  * feat(skills): add skillHubPublicCall and NEXT_PUBLIC_BASE_URL support
  * feat(skills): add Skill Hub tab to public AI Hub page
  * feat(skills): add skills page routing in main app router
  * feat(skills): add /skills page route
  * chore: update package-lock after npm install
  * docs(skills): add Skills Gateway doc page with mermaid architecture diagram
  * docs(skills): add Skills Gateway to sidebar under Agent & MCP Gateway
  * docs(skills): add loom walkthrough video to Skills Gateway doc
  * chore: fixes

Co-authored-by: Ishaan Jaffer <ishaanjaffer0324@gmail.com>
Co-authored-by: Yuneng Jiang <yuneng@berri.ai>
```python
    budget_limits: Optional[List[BudgetLimitEntry]] = (
        None  # multiple concurrent budget windows
    default_team_member_models: Optional[List[str]] = (
        None  # default allowed_models seeded onto new team members
    )
```
Syntax Error: missing closing ) on budget_limits field
The budget_limits field opens a parenthesis on line 1764 that is never closed before default_team_member_models begins on line 1766. The only ) on line 1768 is intended for default_team_member_models, so Python's implicit line continuation keeps the budget_limits expression open and sees an annotation statement (default_team_member_models: ...) inside a value expression — which raises SyntaxError: invalid syntax at import time. This makes the entire proxy unlaunchable.
```diff
     budget_limits: Optional[List[BudgetLimitEntry]] = (
         None  # multiple concurrent budget windows
+    )
     default_team_member_models: Optional[List[str]] = (
         None  # default allowed_models seeded onto new team members
     )
```
```diff
 import { useQueryClient } from "@tanstack/react-query";
 import { Accordion, AccordionBody, AccordionHeader, Button, Col, Grid, Text, TextInput, Title } from "@tremor/react";
-import { Button as Button2, Form, Input, Modal, Radio, Select, Switch, Tag, Tooltip } from "antd";
+import { Button as Button2, Form, Input, InputNumber, Modal, Radio, Select, Switch, Tag, Tooltip } from "antd";
```

```diff
 import { InfoCircleOutlined } from "@ant-design/icons";
 import { TextInput, Button as TremorButton } from "@tremor/react";
-import { Form, Input, Select, Switch, Tooltip } from "antd";
+import { Button as AntButton, Form, Input, InputNumber, Select, Switch, Tooltip } from "antd";
```
```python
        For keys and teams with budget_limits, reset any individual windows where
        reset_at <= now. Only the expired windows are reset; other windows are untouched.
        """
        from litellm.proxy.proxy_server import spend_counter_cache

        # --- Keys ---
        try:
            all_keys = await self.prisma_client.db.litellm_verificationtoken.find_many(
                where={"budget_limits": {"not": None}}  # type: ignore[arg-type]
            )
            for key in all_keys:
                raw = key.budget_limits  # type: ignore[attr-defined]
                if not raw:
                    continue
                windows: list = raw if isinstance(raw, list) else json.loads(raw)
                changed = False
                for window in windows:
                    counter_key = f"spend:key:{key.token}:window:{window['budget_duration']}"
                    if await ResetBudgetJob._reset_expired_window(
                        window, counter_key, spend_counter_cache, now
                    ):
                        changed = True
                if changed:
                    await self.prisma_client.db.litellm_verificationtoken.update(
                        where={"token": key.token},
                        data={"budget_limits": json.dumps(windows)},  # type: ignore[arg-type]
                    )
        except Exception as e:
            verbose_proxy_logger.exception(
                "Failed to reset budget windows for keys: %s", e
            )

        # --- Teams ---
        try:
            all_teams = await self.prisma_client.db.litellm_teamtable.find_many(
                where={"budget_limits": {"not": None}}  # type: ignore[arg-type]
            )
            for team in all_teams:
                raw = team.budget_limits  # type: ignore[attr-defined]
                if not raw:
                    continue
                windows = raw if isinstance(raw, list) else json.loads(raw)
                changed = False
                for window in windows:
                    counter_key = f"spend:team:{team.team_id}:window:{window['budget_duration']}"
                    if await ResetBudgetJob._reset_expired_window(
                        window, counter_key, spend_counter_cache, now
                    ):
                        changed = True
                if changed:
                    await self.prisma_client.db.litellm_teamtable.update(
                        where={"team_id": team.team_id},
                        data={"budget_limits": json.dumps(windows)},  # type: ignore[arg-type]
```
budget_limits absent from all Prisma schema files — feature broken at runtime
The migration 20260401000000_add_budget_limits/migration.sql correctly adds budget_limits JSONB to LiteLLM_VerificationToken and LiteLLM_TeamTable in the database. However, none of the three Prisma schema files (schema.prisma, litellm/proxy/schema.prisma, litellm-proxy-extras/litellm_proxy_extras/schema.prisma) declare this field for those two models — confirmed by reading the current file, neither model block contains budget_limits.
Because Prisma generates its client from the schema (not from live DB introspection), every budget_limits reference in reset_budget_windows() will fail at runtime:
- find_many(where={"budget_limits": {"not": None}}) — Prisma rejects unknown filter fields with a validation error
- key.budget_limits / team.budget_limits attribute accesses — returned objects never carry this field (always None or AttributeError)
- update(data={"budget_limits": json.dumps(windows)}) — Prisma rejects unknown write fields
All of these failures are silently swallowed by the surrounding except Exception blocks, so budget windows will never reset and per-window spend counters will accumulate indefinitely. The # type: ignore[arg-type] / # type: ignore[attr-defined] annotations throughout confirm the author is aware the field is absent from the Prisma schema.
Fix: Add budget_limits Json? to LiteLLM_VerificationToken and LiteLLM_TeamTable in all three schema.prisma files and regenerate the Prisma client. Per CLAUDE.md, schema changes must be kept in sync across all copies.
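The schema addition would look roughly like this in each of the three schema.prisma copies (a sketch: surrounding fields are elided, and the comment text is illustrative):

```prisma
model LiteLLM_VerificationToken {
  // ...existing fields...
  budget_limits Json?  // per-window budget entries; matches the JSONB migration
}

model LiteLLM_TeamTable {
  // ...existing fields...
  budget_limits Json?
}
```

After editing all three copies, regenerating the Prisma client makes the field visible to find_many filters, attribute access, and update writes.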
…or for existing budget_id
…ATE (#25227)
Configurable batch limit (default 1000) for stale managed object cleanup, preventing unbounded UPDATE queries from hitting 300K+ rows at once.
Two fixes to _cleanup_stale_managed_objects:
1. Replace unbounded update_many with a single execute_raw using a subquery LIMIT, capping each poll cycle to STALE_OBJECT_CLEANUP_BATCH_SIZE rows. Zero rows are loaded into Python memory — everything stays in Postgres. Uses the same PostgreSQL raw-SQL pattern as spend_log_cleanup.py (the proxy requires PostgreSQL per schema.prisma).
2. Extract _expire_stale_rows as a separate method for testability.
Keeps the file_purpose='response' filter to avoid incorrectly expiring long-running batch or fine-tune jobs that legitimately exceed the staleness cutoff.
Relevant issues
Pre-Submission checklist
Please complete all items before asking a LiteLLM maintainer to review your PR
- [ ] Add testing in the tests/test_litellm/ directory — Adding at least 1 test is a hard requirement (see details)
- [ ] Pass all unit tests on make test-unit
- [ ] Get a review from @greptileai and receive a Confidence Score of at least 4/5 before requesting a maintainer review
If you're seeing a delay in your PR being merged, ping the LiteLLM Team on Slack (#pr-review).
CI (LiteLLM team)
Branch creation CI run
Link:
CI run for the last commit
Link:
Merge / cherry-pick CI run
Links:
Type
🆕 New Feature
🐛 Bug Fix
🧹 Refactoring
📖 Documentation
🚄 Infrastructure
✅ Test
Changes