[April 6th] - Ishaan#25238

Closed
ishaan-berri wants to merge 8 commits into main from litellm_ishaan_april6

Conversation

@ishaan-berri
Contributor

…ATE (#25227)

  • Add STALE_OBJECT_CLEANUP_BATCH_SIZE constant

Configurable batch limit (default 1000) for stale managed object cleanup, preventing unbounded UPDATE queries from hitting 300K+ rows at once.

  • Batch-limit stale managed object cleanup with single bounded SQL query

Two fixes to _cleanup_stale_managed_objects:

  1. Replace unbounded update_many with a single execute_raw using a subquery LIMIT, capping each poll cycle to STALE_OBJECT_CLEANUP_BATCH_SIZE rows. Zero rows loaded into Python memory — everything stays in Postgres. Uses the same PostgreSQL raw-SQL pattern as spend_log_cleanup.py (the proxy requires PostgreSQL per schema.prisma).

  2. Extract _expire_stale_rows as a separate method for testability.

Keeps the file_purpose='response' filter to avoid incorrectly expiring long-running batch or fine-tune jobs that legitimately exceed the staleness cutoff.
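
The batch-size constant described above could be sketched as follows. This is an illustrative reconstruction (the actual definition lives in litellm/constants.py):

```python
import os

# Sketch of the batch-size constant: env-var override with a max(1, ...) guard
# so a misconfigured value of 0 or less can never disable the bound.
STALE_OBJECT_CLEANUP_BATCH_SIZE = max(
    1, int(os.getenv("STALE_OBJECT_CLEANUP_BATCH_SIZE", "1000"))
)
```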

Relevant issues

Pre-Submission checklist

Please complete all items before asking a LiteLLM maintainer to review your PR

  • I have added testing in the tests/test_litellm/ directory. Adding at least 1 test is a hard requirement (see details).
  • My PR passes all unit tests on make test-unit
  • My PR's scope is as isolated as possible, it only solves 1 specific problem
  • I have requested a Greptile review by commenting @greptileai and received a Confidence Score of at least 4/5 before requesting a maintainer review

Delays in PR merge?

If you're seeing a delay in your PR being merged, ping the LiteLLM Team on Slack (#pr-review).

CI (LiteLLM team)

CI status guideline:

  • 50-55 passing tests: main is stable with minor issues.
  • 45-49 passing tests: acceptable, but needs attention.
  • <= 40 passing tests: unstable; be careful with your merges and assess the risk.
  • Branch creation CI run
    Link:

  • CI run for the last commit
    Link:

  • Merge / cherry-pick CI run
    Links:

Type

🆕 New Feature
🐛 Bug Fix
🧹 Refactoring
📖 Documentation
🚄 Infrastructure
✅ Test

Changes

@vercel

vercel Bot commented Apr 6, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project: litellm | Deployment: Ready | Actions: Preview, Comment | Updated (UTC): Apr 7, 2026 0:27am


@CLAassistant

CLAassistant commented Apr 6, 2026

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
1 out of 3 committers have signed the CLA.

✅ otaviofbrito
❌ ishaan-berri
❌ github-actions[bot]
You have signed the CLA already but the status is still pending? Let us recheck it.

@codspeed-hq
Contributor

codspeed-hq Bot commented Apr 6, 2026

Merging this PR will not alter performance

✅ 16 untouched benchmarks


Comparing litellm_ishaan_april6 (1c238b6) with main (d132b1b)

Open in CodSpeed

@greptile-apps
Contributor

greptile-apps Bot commented Apr 6, 2026

Greptile Summary

This PR adds three related features: (1) per-entity multi-window budget tracking (budget_limits) for both keys and teams, (2) per-team-member model scoping (allowed_models on the budget table, default_team_member_models on the team table), and (3) a batched stale managed-object cleanup for the responses API.

Several issues were flagged in the prior review round and remain open (merge conflict markers in test_bedrock_common_utils.py, budget_limits absent from all three schema.prisma files, raw SQL in _expire_stale_rows, tools/system dropped from the count_tokens call). This round surfaces one new finding:

  • reset_budget_windows (P1): The new reset_budget_job.py method loads all keys and teams with budget_limits in a single unbounded find_many (no take/skip) and issues one update() per dirty record inside the loop. Both patterns violate CLAUDE.md's DB rules (bound large result sets with cursor pagination; batch writes rather than per-row calls).
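
A CLAUDE.md-compliant shape for that read path is cursor pagination with a hard page bound. The sketch below is illustrative, not the PR's code: `fetch_page(cursor, limit)` stands in for a Prisma-style `find_many(take=limit, cursor=...)` call, and a real fix would also batch the per-page writes instead of updating row by row.

```python
from typing import Any, Callable, List, Optional

def paginate(
    fetch_page: Callable[[Optional[Any], int], List[Any]],
    page_size: int,
) -> List[Any]:
    """Drain a large result set in bounded pages instead of one unbounded query."""
    rows: List[Any] = []
    cursor = None
    while True:
        page = fetch_page(cursor, page_size)
        if not page:
            break
        rows.extend(page)
        cursor = page[-1]          # resume after the last row of this page
        if len(page) < page_size:  # a short page means the set is drained
            break
    return rows
```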

Confidence Score: 4/5

Not ready to merge — open P0/P1 findings from prior rounds (merge conflict breaks test file, budget_limits missing from Prisma schema breaks all multi-budget-window runtime paths) plus a new P1 unbounded-query/N+1 write pattern in reset_budget_windows.

Score of 4 is appropriate: there is one confirmed new P1 finding (unbounded find_many + N+1 updates in reset_budget_windows), and prior-round P1/P0 items (merge conflict markers in test file, budget_limits schema drift, raw SQL) are still open per file inspection. Multiple P1s keep the score at 4 rather than 5.

  • litellm/proxy/common_utils/reset_budget_job.py (new N+1 / unbounded query)
  • tests/test_litellm/llms/bedrock/test_bedrock_common_utils.py (merge conflict)
  • litellm/proxy/schema.prisma, schema.prisma, and litellm-proxy-extras/litellm_proxy_extras/schema.prisma (budget_limits column missing)
  • enterprise/litellm_enterprise/proxy/common_utils/check_responses_cost.py (raw SQL)
  • litellm/proxy/proxy_server.py (tools/system dropped from count_tokens)

Important Files Changed

  • litellm/proxy/common_utils/reset_budget_job.py: Added reset_budget_windows() with unbounded find_many (no take/skip) and N+1 individual update calls inside a for-loop, violating CLAUDE.md DB rules for large result sets and batch writes
  • enterprise/litellm_enterprise/proxy/common_utils/check_responses_cost.py: Extracted _expire_stale_rows using raw SQL execute_raw with batch LIMIT; raw SQL pattern flagged in prior review as violating CLAUDE.md
  • litellm/proxy/_types.py: Added BudgetLimitEntry, budget_limits on GenerateRequestBase/TeamBase/UpdateTeamRequest/LiteLLM_VerificationToken, allowed_models on LiteLLM_BudgetTable; all syntactically valid in current HEAD
  • litellm/proxy/auth/auth_checks.py: Added _check_team_member_model_access, _team_multi_budget_check, _virtual_key_multi_budget_check; all correctly follow the existing get_team_membership caching pattern
  • litellm/proxy/proxy_server.py: Added per-window spend counter increments for budget_limits keys/teams; tools/system params dropped from count_tokens call (flagged in prior review)
  • tests/test_litellm/llms/bedrock/test_bedrock_common_utils.py: Unresolved git merge conflict markers at lines 37-43 prevent file from parsing as valid Python — all tests in module fail at collection time (flagged in prior review)
  • litellm/constants.py: Added STALE_OBJECT_CLEANUP_BATCH_SIZE constant with env-var override, max(1,...) guard, and sensible default of 1000; clean addition
  • litellm-proxy-extras/litellm_proxy_extras/schema.prisma: Added allowed_models to LiteLLM_BudgetTable and default_team_member_models to LiteLLM_TeamTable; budget_limits still absent from both LiteLLM_TeamTable and LiteLLM_VerificationToken (flagged in prior review)

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Incoming Request] --> B[user_api_key_auth]
    B --> C[common_checks]
    C --> D{team_object present?}
    D -- Yes --> E[_team_max_budget_check]
    D -- Yes --> F[_team_multi_budget_check]
    F --> F1[get_current_spend\nspend:team:ID:window:DURATION]
    F1 --> F2{spend >= max_budget?}
    F2 -- Yes --> ERR[BudgetExceededError]
    D -- Yes, with user_id --> G[_check_team_member_model_access]
    G --> G1[get_team_membership\ncache-first DB lookup]
    G1 --> G2{allowed_models non-empty?}
    G2 -- Yes --> G3[_can_object_call_model]
    G3 -- denied --> ERR
    C --> H{valid_token present?}
    H -- Yes --> I[_virtual_key_multi_budget_check]
    I --> I1[get_current_spend\nspend:key:TOKEN:window:DURATION]
    I1 --> I2{spend >= max_budget?}
    I2 -- Yes --> ERR
    C --> J[Request proceeds]

    K[Response complete] --> L[increment_spend_counters]
    L --> M[increment key spend counter]
    L --> N[increment per-window key counters\nspend:key:TOKEN:window:DURATION]
    L --> O[increment team spend counter]
    L --> P[increment per-window team counters\nspend:team:ID:window:DURATION]

    Q[ResetBudgetJob poll] --> R[reset_budget_windows]
    R --> R1[find_many keys with budget_limits\n⚠️ no take/skip limit]
    R1 --> R2[for each expired window\nreset Redis counter\nupdate DB one-by-one\n⚠️ N+1 writes]
    R --> R3[find_many teams with budget_limits\n⚠️ no take/skip limit]
    R3 --> R4[for each expired window\nreset Redis counter\nupdate DB one-by-one\n⚠️ N+1 writes]

Reviews (6): Last reviewed commit: "fix(tests): update upsert tests to refle..."

Comment on lines +49 to +64
return await self.prisma_client.db.execute_raw(
    """
    UPDATE "LiteLLM_ManagedObjectTable"
    SET "status" = 'stale_expired'
    WHERE "id" IN (
        SELECT "id" FROM "LiteLLM_ManagedObjectTable"
        WHERE "file_purpose" = 'response'
          AND "status" NOT IN ('completed', 'complete', 'failed', 'expired', 'cancelled', 'stale_expired')
          AND "created_at" < $1::timestamptz
        ORDER BY "created_at" ASC
        LIMIT $2
    )
    """,
    cutoff,
    batch_size,
)

P2 Raw SQL bypasses ORM layer

The CLAUDE.md rule says: "Do not write raw SQL for proxy DB operations. Use Prisma model methods instead of execute_raw / query_raw". A Prisma-native implementation avoids hand-written SQL, keeps the code testable with simple mocks, and removes schema-drift risk — while still bounding the batch:

async def _expire_stale_rows(self, cutoff: datetime, batch_size: int) -> int:
    stale = await self.prisma_client.db.litellm_managedobjecttable.find_many(
        where={
            "file_purpose": "response",
            "status": {"not_in": ["completed", "complete", "failed", "expired", "cancelled", "stale_expired"]},
            "created_at": {"lt": cutoff},
        },
        order={"created_at": "asc"},
        take=batch_size,
        select={"id": True},
    )
    if not stale:
        return 0
    await self.prisma_client.db.litellm_managedobjecttable.update_many(
        where={"id": {"in": [r.id for r in stale]}},
        data={"status": "stale_expired"},
    )
    return len(stale)

This is two DB round-trips instead of one, but it stays in the ORM layer and matches how the rest of the proxy interacts with the DB. Note: spend_log_cleanup.py also uses execute_raw as a precedent for DELETE … WHERE … IN (SELECT … LIMIT n) — this is a style nudge rather than a hard blocker, but worth aligning with the stated convention.

Context Used: CLAUDE.md (source)

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Comment on lines +36 to +64
async def _expire_stale_rows(
    self, cutoff: datetime, batch_size: int
) -> int:
    """Execute the bounded UPDATE that marks stale rows as 'stale_expired'.

    Isolated so it can be swapped / mocked in tests without touching the
    orchestration logic in ``_cleanup_stale_managed_objects``.

    Uses PostgreSQL syntax (``$1::timestamptz``, ``LIMIT``, double-quoted
    identifiers) which is the only dialect the proxy supports — every
    ``schema.prisma`` in the repo sets ``provider = "postgresql"``.
    Same pattern as ``spend_log_cleanup.py``.
    """
    return await self.prisma_client.db.execute_raw(
        """
        UPDATE "LiteLLM_ManagedObjectTable"
        SET "status" = 'stale_expired'
        WHERE "id" IN (
            SELECT "id" FROM "LiteLLM_ManagedObjectTable"
            WHERE "file_purpose" = 'response'
              AND "status" NOT IN ('completed', 'complete', 'failed', 'expired', 'cancelled', 'stale_expired')
              AND "created_at" < $1::timestamptz
            ORDER BY "created_at" ASC
            LIMIT $2
        )
        """,
        cutoff,
        batch_size,
    )

P2 No tests added for the new method

The PR's pre-submission checklist shows the test checkbox unchecked, and CLAUDE.md states: "Adding at least 1 test is a hard requirement". The _expire_stale_rows docstring explicitly calls out that it was "Isolated so it can be swapped / mocked in tests without touching the orchestration logic" — but no tests were added.

At minimum, a unit test in tests/test_litellm/ that mocks prisma_client.db.execute_raw should verify:

  1. The batch cap (STALE_OBJECT_CLEANUP_BATCH_SIZE) is passed correctly.
  2. The affected-row count is returned and triggers the warning log.
  3. Zero rows → no warning emitted.

Without tests the batch-size guard and the refactored flow cannot be automatically regressed against.
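
The minimal test the review asks for could be sketched like this. The class name here is a hypothetical stand-in with the same `_expire_stale_rows` body; a real test would import the actual class from check_responses_cost.py:

```python
import asyncio
from datetime import datetime, timezone
from unittest.mock import AsyncMock

class FakeCleanup:
    """Hypothetical stand-in mirroring the PR's _expire_stale_rows body."""
    def __init__(self, prisma_client):
        self.prisma_client = prisma_client

    async def _expire_stale_rows(self, cutoff, batch_size):
        # one bounded raw UPDATE; the driver returns the affected-row count
        return await self.prisma_client.db.execute_raw("...", cutoff, batch_size)

async def run_test():
    prisma = AsyncMock()
    prisma.db.execute_raw = AsyncMock(return_value=7)  # DB reports 7 rows expired
    job = FakeCleanup(prisma)
    cutoff = datetime(2026, 4, 1, tzinfo=timezone.utc)
    n = await job._expire_stale_rows(cutoff, batch_size=1000)
    # the batch cap must be forwarded as the LIMIT bind parameter
    assert prisma.db.execute_raw.await_args.args[-1] == 1000
    return n
```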

…nt (#21352)

* return actual status code - /count_tokens endpoint

* Apply suggestion from @greptile-apps[bot]

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* fix greptile suggestion

* rollback file

* add test case

---------

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Co-authored-by: ishaan-berri <155045088+ishaan-berri@users.noreply.github.com>
Comment on lines +9120 to +9127
try:
    result = await provider_counter.count_tokens(
        model_to_use=model_to_use or "",
        messages=messages,  # type: ignore
        contents=contents,
        deployment=deployment,
        request_model=request.model,
    )

P1 tools and system parameters silently dropped from token counting call

The refactor to add the httpx.HTTPStatusError handler also accidentally removed tools=tools and system=system from the count_tokens call. Both variables are captured from the request at lines 9064-9065 and are actively used by the Anthropic and Gemini token counters to include tool definitions and system prompts in the server-side token count.

With this change:

  • Requests that include tools will receive an underestimated token count (tool definitions can be hundreds/thousands of tokens)
  • Requests that include a system prompt will similarly receive an undercount

The BaseTokenCounter.count_tokens abstract signature explicitly accepts both params (tools: Optional[List[Dict[str, Any]]] = None, system: Optional[Any] = None), and AnthropicTokenCounter passes them straight through to anthropic_count_tokens_handler.handle_count_tokens_request.

Suggested change

Before:
try:
    result = await provider_counter.count_tokens(
        model_to_use=model_to_use or "",
        messages=messages,  # type: ignore
        contents=contents,
        deployment=deployment,
        request_model=request.model,
    )

After:
try:
    result = await provider_counter.count_tokens(
        model_to_use=model_to_use or "",
        messages=messages,  # type: ignore
        contents=contents,
        deployment=deployment,
        request_model=request.model,
        tools=tools,
        system=system,
    )

…#24950)

* fix(bedrock): strip [1m]/[200k] context window suffixes before cost lookup

* test(bedrock): add test for [1m] context window suffix stripping in cost lookup

* schema: add allowed_models to BudgetTable, default_team_member_models to TeamTable

* migration: add allowed_models and default_team_member_models columns

* types: add allowed_models to TeamMemberAddRequest, TeamMemberUpdateRequest, UpdateTeamRequest

* utils: add allowed_models param to add_new_member, persist to budget table

* common_utils: add allowed_models to _upsert_budget_and_membership

* team endpoints: seed allowed_models on member_add, persist on member_update and team/update

* auth: enforce per-member allowed_models at request time

* networking: add allowed_models to Member type and teamMemberUpdateCall

* TeamMemberTab: add Model Scope column showing per-member allowed_models

* EditMembership: add Allowed Models multi-select field

* TeamInfo: add default_team_member_models field in Settings tab

* chore: sync schema.prisma copies from root

* fix(team_member_update): update existing budget in-place instead of creating new one

When a member already has a budget_id, patch only the fields the caller
provided rather than always creating a fresh budget record.  The old
code ignored existing_budget_id entirely, so updating only allowed_models
silently dropped the stored max_budget / tpm_limit / rpm_limit values.

* fix(auth): pass llm_router to _check_team_member_model_access

Without the router, _can_object_call_model cannot resolve wildcard model
names (e.g. openai/*) or access-group names in allowed_models, causing
legitimate requests to be denied.  Thread the existing llm_router from
_run_common_checks through to the new member-scope check.

* feat(ui): add Team Member Settings accordion to Create Team modal

Groups default_team_member_models, member budget/key duration, and
tpm/rpm defaults into a single collapsible section. The model picker
is filtered to only show the models selected for the team, and the
copy distinguishes it from the team-level Models field.

* feat(ui): consolidate Team Member Settings into accordion in edit team form

Moves default_team_member_models + per-member budget/key/tpm/rpm fields
into a collapsible "Team Member Settings" panel. Keeps the top-level
form focused on team-wide settings (team models, team budget, tpm/rpm).

* fix(ui): use tremor Accordion for Team Member Settings in edit team form

* fix(ui): move Team Member Settings accordion above budget fields in Create Team

* chore: fixes

---------

Co-authored-by: Ishaan Jaffer <ishaanjaffer0324@gmail.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Yuneng Jiang <yuneng@berri.ai>
Comment on lines +37 to +43
<<<<<<< worktree-rustling-wishing-kite
assert base_model == "anthropic.claude-3-5-sonnet-20240620-v1:0"

=======
assert base_model == "anthropic.claude-haiku-4-5-20251001-v1:0"

>>>>>>> main

P0 Unresolved git merge conflict markers

Lines 37–43 contain live conflict markers (<<<<<<< worktree-rustling-wishing-kite, =======, >>>>>>> main). Python cannot parse this file, so every test in test_bedrock_common_utils.py fails with a SyntaxError at import time.

The main-branch version is correct — the model under test is bedrock/us-gov.anthropic.claude-haiku-4-5-20251001-v1:0, so stripping the us-gov. cross-region prefix should yield anthropic.claude-haiku-4-5-20251001-v1:0. The worktree assertion (anthropic.claude-3-5-sonnet-20240620-v1:0) refers to a completely different model and would be wrong.

Resolve by keeping only the correct assertion:

Suggested change

Before:
<<<<<<< worktree-rustling-wishing-kite
assert base_model == "anthropic.claude-3-5-sonnet-20240620-v1:0"
=======
assert base_model == "anthropic.claude-haiku-4-5-20251001-v1:0"
>>>>>>> main

After:
assert base_model == "anthropic.claude-haiku-4-5-20251001-v1:0"

#25109)

* feat: multiple concurrent budget windows per API key and team (#24883)

* feat(proxy): add BudgetLimitEntry type and wire budget_limits into key/team models

* feat(schema): add budget_limits Json column to VerificationToken and TeamTable

* feat(migrations): add migration for budget_limits column on keys and teams

* feat(keys): initialize budget_limits windows with reset_at on key create/update

* feat(teams): initialize budget_limits windows with reset_at on team create/update

* feat(auth): add _virtual_key_multi_budget_check and _team_multi_budget_check

* feat(auth): call multi-budget checks from common_checks for keys and teams

* feat(proxy): increment per-window Redis spend counters after each request

* feat(budget): reset individual budget windows on schedule via reset_budget_job

* feat(ui): add hourly option to BudgetDurationDropdown

* feat(ui): add budget_limits field to KeyResponse type

* feat(ui): add Budget Windows editor to key edit view

* feat(ui): add Budget Windows editor to create key form

* fix(proxy): strip budget_limits=None before Prisma upsert to fix login 500

Prisma rejects nullable JSON fields (Json? without @default) when passed as
Python None — it needs the field omitted entirely so the DB stores NULL via
the column's nullable constraint. This was breaking /v2/login because the UI
session key creation path hit the upsert with budget_limits=None.
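
The workaround in this commit amounts to dropping nullable-Json fields that are None before handing the payload to Prisma. A minimal sketch, with an illustrative helper name (not the PR's actual code):

```python
# Json? columns must be omitted entirely rather than passed as Python None,
# so strip such keys from the write payload before the upsert.
def strip_none_json_fields(data: dict, json_fields=("budget_limits",)) -> dict:
    return {k: v for k, v in data.items() if v is not None or k not in json_fields}
```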

* ui(key-edit): use antd InputNumber+Button for budget windows, add reset hints

* ui(create-key): use antd InputNumber+Button for budget windows, add reset hints

* docs(users): add multiple budget windows section with API + dashboard walkthrough

* fix: BudgetExceededError returns HTTP 429 instead of 400

- Add status_code=429 to BudgetExceededError class
- auth_exception_handler hardcoded code=400 → code=429

* fix: no-op else branch in multi-budget auth checks causes KeyError

- BudgetLimitEntry objects must be coerced via model_dump() not left as-is
- Move _virtual_key_multi_budget_check into common_checks (was asymmetric
  with _team_multi_budget_check which already lived there)

* fix: len() on JSON string returns char count not window count

Guard with isinstance check + json.loads() before iterating per-window
Redis counters in increment_spend_counters
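
The guard this commit describes can be sketched as a small coercion helper (an illustrative reconstruction, not the PR's code): budget_limits may come back from the DB as an already-parsed list or as a raw JSON string, and len() on the string counts characters, not windows.

```python
import json

def coerce_windows(raw):
    # Accept a parsed list, a JSON string, or None/empty; always return a list.
    if not raw:
        return []
    return raw if isinstance(raw, list) else json.loads(raw)
```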

* fix: silent except:pass hides Redis reset failures in reset_budget_windows

Log Redis counter reset failures as warnings so they are observable

* test: add unit tests for multi-budget window enforcement

5 tests covering: no budget_limits passes, under budget passes,
over hourly window raises 429, over monthly window raises 429,
BudgetLimitEntry objects coerced without KeyError

* fix: key per-window counters stable across reorders (duration key, not index)

* fix: team+key per-window spend increments use duration key, not index

* fix: budget window reset uses duration key; log failures instead of swallowing

* refactor: extract BudgetWindowsEditor to shared component

* refactor: key_edit_view imports BudgetWindowsEditor from shared component

* refactor: create_key_button imports BudgetWindowsEditor from shared component

---------

Co-authored-by: Ishaan Jaffer <ishaanjaffer0324@gmail.com>

* fix(reset_budget_job): extract _reset_expired_window helper to fix PLR0915 too many statements

* feat(skills): Skills Registry & Hub — register skills, browse in AI Hub, public skill hub (#25118)

* feat(skills): add domain and namespace fields to plugin types

* feat(skills): store and return domain/namespace inside manifest_json

* feat(skills): add /public/skill_hub endpoint for unauthenticated access

* feat(skills): whitelist /public/skill_hub from auth requirements

* feat(skills): add domain, namespace to Plugin and RegisterPluginRequest types

* feat(skills): smart URL parser — paste github URL, auto-detect source type and name

* feat(skills): replace enable toggle with Public badge, make rows clickable

* feat(skills): add skill detail view with Overview and How to Use tabs

* feat(skills): add MakeSkillPublicForm modal for publishing skills to the hub

* feat(skills): rename panel to Skills, wire in skill detail view on row click

* feat(skills): add skill hub table columns — name, description, domain, source, status

* feat(skills): add SkillHubDashboard with stats row, domain dropdown filter, and table

* feat(skills): add Skill Hub tab to AI Hub with Select Skills to Make Public button

* feat(skills): move Skills to top-level nav item directly under MCP Servers

* feat(skills): add skillHubPublicCall and NEXT_PUBLIC_BASE_URL support

* feat(skills): add Skill Hub tab to public AI Hub page

* feat(skills): add skills page routing in main app router

* feat(skills): add /skills page route

* chore: update package-lock after npm install

* docs(skills): add Skills Gateway doc page with mermaid architecture diagram

* docs(skills): add Skills Gateway to sidebar under Agent & MCP Gateway

* docs(skills): add loom walkthrough video to Skills Gateway doc

* chore: fixes

---------

Co-authored-by: Ishaan Jaffer <ishaanjaffer0324@gmail.com>
Co-authored-by: Yuneng Jiang <yuneng@berri.ai>
@ishaan-berri ishaan-berri had a problem deploying to integration-redis-postgres April 6, 2026 21:02 — with GitHub Actions Failure
@ishaan-berri ishaan-berri had a problem deploying to integration-postgres April 6, 2026 21:02 — with GitHub Actions Failure
Comment thread litellm/proxy/_types.py
Comment on lines +1764 to +1768
budget_limits: Optional[List[BudgetLimitEntry]] = (
None # multiple concurrent budget windows
default_team_member_models: Optional[List[str]] = (
None # default allowed_models seeded onto new team members
)

P0 Syntax Error: missing closing ) on budget_limits field

The budget_limits field opens a parenthesis on line 1764 that is never closed before default_team_member_models begins on line 1766. The only ) on line 1768 is intended for default_team_member_models, so Python's implicit line continuation keeps the budget_limits expression open and sees an annotation statement (default_team_member_models: ...) inside a value expression — which raises SyntaxError: invalid syntax at import time. This makes the entire proxy unlaunchable.

Suggested change

Before:
budget_limits: Optional[List[BudgetLimitEntry]] = (
    None  # multiple concurrent budget windows
default_team_member_models: Optional[List[str]] = (
    None  # default allowed_models seeded onto new team members
)

After:
budget_limits: Optional[List[BudgetLimitEntry]] = (
    None  # multiple concurrent budget windows
)
default_team_member_models: Optional[List[str]] = (
    None  # default allowed_models seeded onto new team members
)

For keys and teams with budget_limits, reset any individual windows where
reset_at <= now. Only the expired windows are reset; other windows are untouched.
"""
from litellm.proxy.proxy_server import spend_counter_cache
Comment on lines +599 to +644
    all_keys = await self.prisma_client.db.litellm_verificationtoken.find_many(
        where={"budget_limits": {"not": None}}  # type: ignore[arg-type]
    )
    for key in all_keys:
        raw = key.budget_limits  # type: ignore[attr-defined]
        if not raw:
            continue
        windows: list = raw if isinstance(raw, list) else json.loads(raw)
        changed = False
        for window in windows:
            counter_key = f"spend:key:{key.token}:window:{window['budget_duration']}"
            if await ResetBudgetJob._reset_expired_window(
                window, counter_key, spend_counter_cache, now
            ):
                changed = True
        if changed:
            await self.prisma_client.db.litellm_verificationtoken.update(
                where={"token": key.token},
                data={"budget_limits": json.dumps(windows)},  # type: ignore[arg-type]
            )
except Exception as e:
    verbose_proxy_logger.exception(
        "Failed to reset budget windows for keys: %s", e
    )

# --- Teams ---
try:
    all_teams = await self.prisma_client.db.litellm_teamtable.find_many(
        where={"budget_limits": {"not": None}}  # type: ignore[arg-type]
    )
    for team in all_teams:
        raw = team.budget_limits  # type: ignore[attr-defined]
        if not raw:
            continue
        windows = raw if isinstance(raw, list) else json.loads(raw)
        changed = False
        for window in windows:
            counter_key = f"spend:team:{team.team_id}:window:{window['budget_duration']}"
            if await ResetBudgetJob._reset_expired_window(
                window, counter_key, spend_counter_cache, now
            ):
                changed = True
        if changed:
            await self.prisma_client.db.litellm_teamtable.update(
                where={"team_id": team.team_id},
                data={"budget_limits": json.dumps(windows)},  # type: ignore[arg-type]
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 budget_limits absent from all Prisma schema files — feature broken at runtime

The migration 20260401000000_add_budget_limits/migration.sql correctly adds budget_limits JSONB to LiteLLM_VerificationToken and LiteLLM_TeamTable in the database. However, none of the three Prisma schema files (schema.prisma, litellm/proxy/schema.prisma, litellm-proxy-extras/litellm_proxy_extras/schema.prisma) declare this field for those two models — confirmed by reading the current file, neither model block contains budget_limits.

Because Prisma generates its client from the schema (not from live DB introspection), every budget_limits reference in reset_budget_windows() will fail at runtime:

  • find_many(where={"budget_limits": {"not": None}}) — Prisma rejects unknown filter fields with a validation error
  • key.budget_limits / team.budget_limits attribute accesses — returned objects never carry this field (always None or AttributeError)
  • update(data={"budget_limits": json.dumps(windows)}) — Prisma rejects unknown write fields

All of these failures are silently swallowed by the surrounding except Exception blocks, so budget windows will never reset and per-window spend counters will accumulate indefinitely. The # type: ignore[arg-type] / # type: ignore[attr-defined] annotations throughout confirm the author is aware the field is absent from the Prisma schema.

Fix: Add budget_limits Json? to LiteLLM_VerificationToken and LiteLLM_TeamTable in all three schema.prisma files and regenerate the Prisma client. Per CLAUDE.md, schema changes must be kept in sync across all copies.
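
The schema-side fix is a one-field addition per model, sketched below. Field placement is illustrative; it must be applied identically in all three schema.prisma copies and the Prisma client regenerated:

```prisma
model LiteLLM_VerificationToken {
  // ...existing fields unchanged...
  budget_limits Json?
}

model LiteLLM_TeamTable {
  // ...existing fields unchanged...
  budget_limits Json?
}
```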

