Litellm ishaan april6#25256
…ATE (#25227) * Add STALE_OBJECT_CLEANUP_BATCH_SIZE constant Configurable batch limit (default 1000) for stale managed object cleanup, preventing unbounded UPDATE queries from hitting 300K+ rows at once. * Batch-limit stale managed object cleanup with single bounded SQL query Two fixes to _cleanup_stale_managed_objects: 1. Replace unbounded update_many with a single execute_raw using a subquery LIMIT, capping each poll cycle to STALE_OBJECT_CLEANUP_BATCH_SIZE rows. Zero rows loaded into Python memory — everything stays in Postgres. Uses the same PostgreSQL raw-SQL pattern as spend_log_cleanup.py (the proxy requires PostgreSQL per schema.prisma). 2. Extract _expire_stale_rows as a separate method for testability. Keeps the file_purpose='response' filter to avoid incorrectly expiring long-running batch or fine-tune jobs that legitimately exceed the staleness cutoff.
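The bounded-update pattern described in this commit message can be sketched as follows. This is a minimal illustration, not the PR's code: it uses SQLite from the standard library as a stand-in for PostgreSQL, and the table name `managed_objects`, its columns, and the status values are hypothetical. Only the idea (one UPDATE whose target rows come from a subquery with LIMIT, filtered on `file_purpose = 'response'`) is taken from the text above.

```python
import sqlite3

# Stand-in for STALE_OBJECT_CLEANUP_BATCH_SIZE (the commit's default is 1000).
BATCH_SIZE = 2


def expire_stale_rows(conn: sqlite3.Connection, cutoff: str, batch_size: int) -> int:
    """Expire at most `batch_size` stale 'response' rows in a single statement.

    No rows are loaded into Python; the subquery picks the ids inside the database.
    Table and column names are hypothetical.
    """
    cur = conn.execute(
        """
        UPDATE managed_objects
        SET status = 'expired'
        WHERE id IN (
            SELECT id FROM managed_objects
            WHERE file_purpose = 'response'
              AND status = 'in_progress'
              AND created_at < ?
            ORDER BY created_at
            LIMIT ?
        )
        """,
        (cutoff, batch_size),
    )
    conn.commit()
    return cur.rowcount


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE managed_objects ("
    "id INTEGER PRIMARY KEY, file_purpose TEXT, status TEXT, created_at TEXT)"
)
conn.executemany(
    "INSERT INTO managed_objects (file_purpose, status, created_at) VALUES (?, ?, ?)",
    [
        ("response", "in_progress", "2026-01-01"),
        ("response", "in_progress", "2026-01-02"),
        ("response", "in_progress", "2026-01-03"),
        ("batch", "in_progress", "2026-01-01"),  # long-running job, must be left alone
    ],
)

first_cycle = expire_stale_rows(conn, "2026-02-01", BATCH_SIZE)
second_cycle = expire_stale_rows(conn, "2026-02-01", BATCH_SIZE)
```

Each poll cycle touches at most `BATCH_SIZE` rows, so a large backlog is drained over several cycles instead of in one unbounded UPDATE.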
…nt (#21352) * return actual status code - /count_tokens endpoint * Apply suggestion from @greptile-apps[bot] Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> * fix greptile suggestion * rollback file * add test case --------- Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> Co-authored-by: ishaan-berri <155045088+ishaan-berri@users.noreply.github.com>
…#24950) * fix(bedrock): strip [1m]/[200k] context window suffixes before cost lookup * test(bedrock): add test for [1m] context window suffix stripping in cost lookup * schema: add allowed_models to BudgetTable, default_team_member_models to TeamTable * migration: add allowed_models and default_team_member_models columns * types: add allowed_models to TeamMemberAddRequest, TeamMemberUpdateRequest, UpdateTeamRequest * utils: add allowed_models param to add_new_member, persist to budget table * common_utils: add allowed_models to _upsert_budget_and_membership * team endpoints: seed allowed_models on member_add, persist on member_update and team/update * auth: enforce per-member allowed_models at request time * networking: add allowed_models to Member type and teamMemberUpdateCall * TeamMemberTab: add Model Scope column showing per-member allowed_models * EditMembership: add Allowed Models multi-select field * TeamInfo: add default_team_member_models field in Settings tab * chore: sync schema.prisma copies from root * fix(team_member_update): update existing budget in-place instead of creating new one When a member already has a budget_id, patch only the fields the caller provided rather than always creating a fresh budget record. The old code ignored existing_budget_id entirely, so updating only allowed_models silently dropped the stored max_budget / tpm_limit / rpm_limit values. * fix(auth): pass llm_router to _check_team_member_model_access Without the router, _can_object_call_model cannot resolve wildcard model names (e.g. openai/*) or access-group names in allowed_models, causing legitimate requests to be denied. Thread the existing llm_router from _run_common_checks through to the new member-scope check. * feat(ui): add Team Member Settings accordion to Create Team modal Groups default_team_member_models, member budget/key duration, and tpm/rpm defaults into a single collapsible section. 
The model picker is filtered to only show the models selected for the team, and the copy distinguishes it from the team-level Models field. * feat(ui): consolidate Team Member Settings into accordion in edit team form Moves default_team_member_models + per-member budget/key/tpm/rpm fields into a collapsible "Team Member Settings" panel. Keeps the top-level form focused on team-wide settings (team models, team budget, tpm/rpm). * fix(ui): use tremor Accordion for Team Member Settings in edit team form * fix(ui): move Team Member Settings accordion above budget fields in Create Team * chore: fixes --------- Co-authored-by: Ishaan Jaffer <ishaanjaffer0324@gmail.com> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Yuneng Jiang <yuneng@berri.ai>
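The in-place budget update described in the `team_member_update` fix above can be illustrated with a small sketch. The helper names `build_budget_patch` and `apply_patch` are hypothetical and do not come from the PR; the field names are the ones mentioned in the commit message. The sketch assumes that `None` means "not provided by the caller".

```python
from typing import Any, Dict, List, Optional


def build_budget_patch(
    max_budget: Optional[float] = None,
    tpm_limit: Optional[int] = None,
    rpm_limit: Optional[int] = None,
    allowed_models: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Collect only the fields the caller actually provided (hypothetical helper)."""
    candidates = {
        "max_budget": max_budget,
        "tpm_limit": tpm_limit,
        "rpm_limit": rpm_limit,
        "allowed_models": allowed_models,
    }
    return {name: value for name, value in candidates.items() if value is not None}


def apply_patch(existing: Dict[str, Any], patch: Dict[str, Any]) -> Dict[str, Any]:
    """Overlay the patch on the stored record; untouched fields keep their values."""
    return {**existing, **patch}


stored_budget = {"max_budget": 10.0, "tpm_limit": 100, "rpm_limit": 5, "allowed_models": []}
patch = build_budget_patch(allowed_models=["gpt-4o"])
updated_budget = apply_patch(stored_budget, patch)
```

Updating only `allowed_models` leaves `max_budget`, `tpm_limit`, and `rpm_limit` as stored, which is the behavior the fix restores. A real implementation would also need a way to clear a field explicitly, which this sketch does not cover.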
#25109) * feat: multiple concurrent budget windows per API key and team (#24883) * feat(proxy): add BudgetLimitEntry type and wire budget_limits into key/team models * feat(schema): add budget_limits Json column to VerificationToken and TeamTable * feat(migrations): add migration for budget_limits column on keys and teams * feat(keys): initialize budget_limits windows with reset_at on key create/update * feat(teams): initialize budget_limits windows with reset_at on team create/update * feat(auth): add _virtual_key_multi_budget_check and _team_multi_budget_check * feat(auth): call multi-budget checks from common_checks for keys and teams * feat(proxy): increment per-window Redis spend counters after each request * feat(budget): reset individual budget windows on schedule via reset_budget_job * feat(ui): add hourly option to BudgetDurationDropdown * feat(ui): add budget_limits field to KeyResponse type * feat(ui): add Budget Windows editor to key edit view * feat(ui): add Budget Windows editor to create key form * fix(proxy): strip budget_limits=None before Prisma upsert to fix login 500 Prisma rejects nullable JSON fields (Json? without @default) when passed as Python None — it needs the field omitted entirely so the DB stores NULL via the column's nullable constraint. This was breaking /v2/login because the UI session key creation path hit the upsert with budget_limits=None. 
* ui(key-edit): use antd InputNumber+Button for budget windows, add reset hints * ui(create-key): use antd InputNumber+Button for budget windows, add reset hints * docs(users): add multiple budget windows section with API + dashboard walkthrough * fix: BudgetExceededError returns HTTP 429 instead of 400 - Add status_code=429 to BudgetExceededError class - auth_exception_handler hardcoded code=400 → code=429 * fix: no-op else branch in multi-budget auth checks causes KeyError - BudgetLimitEntry objects must be coerced via model_dump() not left as-is - Move _virtual_key_multi_budget_check into common_checks (was asymmetric with _team_multi_budget_check which already lived there) * fix: len() on JSON string returns char count not window count Guard with isinstance check + json.loads() before iterating per-window Redis counters in increment_spend_counters * fix: silent except:pass hides Redis reset failures in reset_budget_windows Log Redis counter reset failures as warnings so they are observable * test: add unit tests for multi-budget window enforcement 5 tests covering: no budget_limits passes, under budget passes, over hourly window raises 429, over monthly window raises 429, BudgetLimitEntry objects coerced without KeyError * fix: key per-window counters stable across reorders (duration key, not index) * fix: team+key per-window spend increments use duration key, not index * fix: budget window reset uses duration key; log failures instead of swallowing * refactor: extract BudgetWindowsEditor to shared component * refactor: key_edit_view imports BudgetWindowsEditor from shared component * refactor: create_key_button imports BudgetWindowsEditor from shared component --------- Co-authored-by: Ishaan Jaffer <ishaanjaffer0324@gmail.com> * fix(reset_budget_job): extract _reset_expired_window helper to fix PLR0915 too many statements * feat(skills): Skills Registry & Hub — register skills, browse in AI Hub, public skill hub (#25118) * feat(skills): add domain and 
namespace fields to plugin types * feat(skills): store and return domain/namespace inside manifest_json * feat(skills): add /public/skill_hub endpoint for unauthenticated access * feat(skills): whitelist /public/skill_hub from auth requirements * feat(skills): add domain, namespace to Plugin and RegisterPluginRequest types * feat(skills): smart URL parser — paste github URL, auto-detect source type and name * feat(skills): replace enable toggle with Public badge, make rows clickable * feat(skills): add skill detail view with Overview and How to Use tabs * feat(skills): add MakeSkillPublicForm modal for publishing skills to the hub * feat(skills): rename panel to Skills, wire in skill detail view on row click * feat(skills): add skill hub table columns — name, description, domain, source, status * feat(skills): add SkillHubDashboard with stats row, domain dropdown filter, and table * feat(skills): add Skill Hub tab to AI Hub with Select Skills to Make Public button * feat(skills): move Skills to top-level nav item directly under MCP Servers * feat(skills): add skillHubPublicCall and NEXT_PUBLIC_BASE_URL support * feat(skills): add Skill Hub tab to public AI Hub page * feat(skills): add skills page routing in main app router * feat(skills): add /skills page route * chore: update package-lock after npm install * docs(skills): add Skills Gateway doc page with mermaid architecture diagram * docs(skills): add Skills Gateway to sidebar under Agent & MCP Gateway * docs(skills): add loom walkthrough video to Skills Gateway doc * chore: fixes --------- Co-authored-by: Ishaan Jaffer <ishaanjaffer0324@gmail.com> Co-authored-by: Yuneng Jiang <yuneng@berri.ai>
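Two of the budget-window fixes listed in the commit message above (the `len()` on a JSON string, and counters keyed by duration rather than list index) can be sketched together. The field name `budget_duration` and the counter key layout are assumptions made for illustration; the PR's actual names may differ.

```python
import json
from typing import Any, Dict, List, Optional, Union

BudgetLimits = Union[str, List[Dict[str, Any]]]


def normalize_budget_limits(raw: Optional[BudgetLimits]) -> List[Dict[str, Any]]:
    """Return budget windows as a list, decoding a JSON string if needed."""
    if raw is None:
        return []
    if isinstance(raw, str):
        # Without this guard, len(raw) would count characters, not windows.
        raw = json.loads(raw)
    return list(raw)


def window_counter_keys(prefix: str, entity_id: str, raw: Optional[BudgetLimits]) -> List[str]:
    """Build one counter key per window, keyed by duration (hypothetical layout)."""
    return [
        f"{prefix}:{entity_id}:window:{window['budget_duration']}"
        for window in normalize_budget_limits(raw)
    ]


raw_json = (
    '[{"budget_duration": "1h", "max_budget": 5},'
    ' {"budget_duration": "30d", "max_budget": 100}]'
)
windows = normalize_budget_limits(raw_json)
keys = window_counter_keys("spend:key", "abc", raw_json)
keys_reordered = window_counter_keys("spend:key", "abc", list(reversed(windows)))
```

Because each key embeds the duration, reordering the windows in an update maps every window to the same counter as before; an index-based key would silently swap the hourly and monthly counters.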
…or for existing budget_id
    For keys and teams with budget_limits, reset any individual windows where
    reset_at <= now. Only the expired windows are reset; other windows are untouched.
    """
    from litellm.proxy.proxy_server import spend_counter_cache
Greptile Summary
This PR bundles four features: multi-window concurrent budget enforcement for keys and teams, per-team-member model scoping with a …
Confidence Score: 4/5
Safe to merge after fixing the … Two P1 docstring errors (same root cause, different endpoints) mean anyone trying …
| Filename | Overview |
|---|---|
| litellm/proxy/auth/auth_checks.py | Adds multi-window budget enforcement and per-member model scope checks to the critical auth path; both use the existing get_team_membership cache so no net new DB queries for warm caches. |
| litellm/proxy/common_utils/reset_budget_job.py | New reset_budget_windows method issues N+1 DB writes for changed keys/teams; timezone-stripping in _reset_expired_window can cause stale windows for non-UTC deployments (flagged in prior review thread). |
| litellm/proxy/management_endpoints/key_management_endpoints.py | Adds budget_limits to key generation/update; docstring examples use wrong field names (budget_limit/time_period) that will cause 422 errors for API callers; window reset timestamps are not preserved across updates. |
| litellm/proxy/management_endpoints/team_endpoints.py | Adds budget_limits and default_team_member_models to team create/update; docstring examples have wrong field names; _set_budget_reset_at resets all window timers on every update rather than preserving existing ones. |
| litellm/proxy/management_endpoints/common_utils.py | Correctly refactors _upsert_budget_and_membership to update existing budgets in-place rather than always creating new records; adds allowed_models field support. |
| litellm/proxy/proxy_server.py | Adds per-window spend counter increments for multi-budget keys and teams using cache lookups; extracts _try_provider_token_count helper that correctly propagates HTTP status codes from provider token-counting APIs. |
| litellm/proxy/_types.py | Adds BudgetLimitEntry, budget_limits to key/team types, allowed_models to LiteLLM_BudgetTable, and default_team_member_models to TeamBase; closes unclosed paren in TeamBase.budget_limits. |
| tests/test_litellm/proxy/auth/test_multi_budget_windows.py | New unit tests for multi-window budget enforcement; all mock-based, no real network calls; covers under-budget, over-first-window, over-second-window, and Pydantic object coercion cases. |
| tests/test_litellm/proxy/common_utils/test_upsert_budget_membership.py | Tests updated to reflect the new in-place update behavior for existing budget IDs; assertions correctly flip from create to update calls. |
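The table notes that window reset timestamps are not preserved across key and team updates. One possible way to address that, shown here only as a sketch and not as what the PR does, is to merge incoming windows with the stored ones by duration and reuse an existing `reset_at` when present. The field names `budget_duration` and `reset_at`, and the helper `merge_budget_windows`, are assumptions for illustration.

```python
from typing import Any, Callable, Dict, List


def merge_budget_windows(
    existing: List[Dict[str, Any]],
    incoming: List[Dict[str, Any]],
    new_reset_at: Callable[[str], str],
) -> List[Dict[str, Any]]:
    """Keep the stored reset_at for windows that already exist (hypothetical helper).

    `new_reset_at` computes a fresh reset timestamp for a duration and is only
    called for windows that were not stored before.
    """
    existing_by_duration = {w["budget_duration"]: w for w in existing}
    merged = []
    for window in incoming:
        prior = existing_by_duration.get(window["budget_duration"])
        if prior is not None and prior.get("reset_at"):
            reset_at = prior["reset_at"]
        else:
            reset_at = new_reset_at(window["budget_duration"])
        merged.append({**window, "reset_at": reset_at})
    return merged


stored = [{"budget_duration": "1h", "max_budget": 5, "reset_at": "2026-04-07T11:00:00+00:00"}]
update = [
    {"budget_duration": "1h", "max_budget": 8},     # existing window, new limit
    {"budget_duration": "30d", "max_budget": 100},  # newly added window
]
merged = merge_budget_windows(stored, update, lambda duration: "FRESH:" + duration)
```

With this approach, raising the hourly limit does not restart the hourly timer, while the newly added monthly window gets a fresh reset time.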
Sequence Diagram
sequenceDiagram
participant Client
participant common_checks
participant auth_checks
participant Cache as UserKeyCache
participant DB as PrismaDB
participant SpendCounter
Client->>common_checks: request(model, team, virtualkey)
common_checks->>auth_checks: _check_team_member_model_access
auth_checks->>Cache: get_team_membership(user_id, team_id)
alt cache miss
Cache->>DB: find_unique litellm_teammembership
DB-->>Cache: membership + allowed_models
end
Cache-->>auth_checks: membership
auth_checks-->>common_checks: 401 if model not allowed
common_checks->>auth_checks: _team_multi_budget_check
auth_checks->>SpendCounter: get_current_spend(team:window)
auth_checks-->>common_checks: 429 if window exceeded
common_checks->>auth_checks: _virtual_key_multi_budget_check
auth_checks->>SpendCounter: get_current_spend(key:window)
auth_checks-->>common_checks: 429 if window exceeded
common_checks-->>Client: allowed
Note over SpendCounter: Post-response spend tracking
Client->>SpendCounter: increment_spend_counters(cost)
SpendCounter->>SpendCounter: incr spend:key (global)
SpendCounter->>SpendCounter: incr spend:key:window (per BudgetLimitEntry)
SpendCounter->>SpendCounter: incr spend:team (global)
SpendCounter->>SpendCounter: incr spend:team:window (per BudgetLimitEntry)
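The per-window budget check in the diagram can be sketched as below. The exception class, the function name, the `budget_duration` and `max_budget` field names, and the `>=` comparison are all assumptions for illustration; only the overall flow (look up current spend per window, reject with HTTP 429 when a window is exceeded) comes from the diagram and commit messages.

```python
from typing import Any, Callable, Dict, List


class BudgetExceeded(Exception):
    """Illustrative stand-in for the proxy's budget error; carries HTTP 429."""

    status_code = 429

    def __init__(self, duration: str, spend: float, max_budget: float) -> None:
        super().__init__(
            f"Budget window '{duration}' exceeded: spend={spend}, max_budget={max_budget}"
        )
        self.duration = duration


def check_budget_windows(
    windows: List[Dict[str, Any]],
    get_current_spend: Callable[[str], float],
) -> None:
    """Raise BudgetExceeded for the first window whose spend reached its limit."""
    for window in windows:
        duration = window["budget_duration"]
        spend = get_current_spend(duration)
        if spend >= window["max_budget"]:
            raise BudgetExceeded(duration, spend, window["max_budget"])


windows = [
    {"budget_duration": "1h", "max_budget": 5.0},
    {"budget_duration": "30d", "max_budget": 100.0},
]
spend_by_window = {"1h": 1.0, "30d": 120.0}

try:
    check_budget_windows(windows, lambda duration: spend_by_window[duration])
    rejected_window = None
    rejected_status = None
except BudgetExceeded as exc:
    rejected_window = exc.duration
    rejected_status = exc.status_code
```

Here the hourly window is under budget but the monthly window is not, so the request is rejected on the monthly window even though the shorter window would have allowed it.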
Reviews (5): Last reviewed commit: "style: run black formatter on files from..."
    now: datetime,
) -> bool:
    """Reset a single budget window if expired. Returns True if the window was reset."""
    from litellm.proxy.common_utils.timezone_utils import get_budget_reset_time

    reset_at_str = window.get("reset_at")
    if not reset_at_str:
        return False
    reset_at = datetime.fromisoformat(
        reset_at_str.replace("Z", "+00:00")
    ).replace(tzinfo=None)
    if reset_at > now:
        return False
    spend_counter_cache.in_memory_cache.set_cache(key=counter_key, value=0.0)
    if spend_counter_cache.redis_cache is not None:
Timezone stripping causes incorrect window reset timing for non-UTC deployments
_reset_expired_window strips timezone info from the stored reset_at string with .replace(tzinfo=None), then compares the result against datetime.utcnow() (also naive). When litellm_settings.timezone is configured to a non-UTC zone (e.g. America/New_York, which is UTC−4 in April), get_budget_reset_time returns a timezone-aware datetime whose .isoformat() looks like 2026-04-07T05:00:00-04:00, i.e. 09:00:00 UTC. Stripping the offset leaves the naive value 2026-04-07T05:00:00, so the check reset_at > now already evaluates to False at 05:00 UTC and the window is reset four hours before it actually expires. For zones east of UTC the error runs the other way: the stripped value is later than the true UTC instant, so the reset is delayed by the size of the offset.
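The discrepancy can be reproduced in isolation with the standard library. This is a minimal sketch: the stored string uses an arbitrary fixed UTC−5 offset chosen only for illustration, and the variable names are not from the PR.

```python
from datetime import datetime, timezone

# Illustrative stored value: 05:00 local time at a fixed UTC-5 offset, i.e. 10:00 UTC.
stored_reset_at = "2026-04-07T05:00:00-05:00"

aware_reset_at = datetime.fromisoformat(stored_reset_at)
naive_reset_at = aware_reset_at.replace(tzinfo=None)  # what the stripped version compares

# 06:00 UTC: four hours BEFORE the window really expires.
now_aware = datetime(2026, 4, 7, 6, 0, tzinfo=timezone.utc)
now_naive = now_aware.replace(tzinfo=None)  # same wall-clock value as a naive UTC "now"

# The reviewed code returns early when `reset_at > now`, otherwise it resets the window.
naive_would_reset = not (naive_reset_at > now_naive)
aware_would_reset = not (aware_reset_at > now_aware)
```

With the offset stripped, the window is treated as expired at 06:00 UTC; with both sides timezone-aware, it is correctly treated as still open until 10:00 UTC.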
A safer approach keeps both sides timezone-aware:
from datetime import timezone

now = datetime.now(timezone.utc)
reset_at_str = window.get("reset_at")
if not reset_at_str:
    return False
reset_at = datetime.fromisoformat(reset_at_str.replace("Z", "+00:00"))
# reset_at is timezone-aware; compare directly without stripping
if reset_at > now:
    return False

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
…ort, function or class' Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
…ort, function or class' Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
…s from deleted team record
…_VerificationToken
…_VerificationToken
…_VerificationToken
…migration recovery
…ails on pooler URL When DIRECT_URL is not set and DATABASE_URL is a Neon pooler URL, prisma migrate diff fails (pooler doesn't support extended query protocol for schema introspection). Previously _resolve_all_migrations returned early without applying any migrations, leaving the budget_limits column missing and causing test_auth_callback_new_user to fail. Now falls back to running each migration SQL file via prisma db execute --file, which works with pooler URLs and is safe to re-run due to IF NOT EXISTS guards.
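The fallback described above (running each migration SQL file through `prisma db execute --file`) could be assembled roughly as follows. This sketch only builds the command lines and does not run them. The function name, the `migrations/<name>/migration.sql` layout, and passing `--schema` are assumptions for illustration and may not match what the PR does.

```python
from pathlib import Path
from typing import List


def build_db_execute_commands(migrations_dir: Path, schema_path: Path) -> List[List[str]]:
    """Return one `prisma db execute` command per migration.sql, in name order.

    Assumes timestamp-prefixed migration directory names, so sorting by path
    yields chronological order.
    """
    commands = []
    for sql_file in sorted(migrations_dir.glob("*/migration.sql")):
        commands.append(
            [
                "prisma",
                "db",
                "execute",
                "--file",
                str(sql_file),
                "--schema",
                str(schema_path),
            ]
        )
    return commands
```

Each command could then be passed to `subprocess.run(cmd, check=True)`. Re-running the whole list is only safe if every migration is idempotent, which the commit message attributes to its `IF NOT EXISTS` guards.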
page_utils.test.ts enforces that every menuGroups entry has a matching description and vice versa. The left nav uses 'skills' but page_metadata.ts still had 'claude-code-plugins', causing two test failures.
…ending When prisma migrate deploy reports 'No pending migrations to apply' the DB already matches schema — running _resolve_all_migrations (migrate diff + prisma db execute) adds 25+ seconds unnecessarily, causing the proxy to miss the 90-second startup timeout in test_litellm_proxy_server_config_no_general_settings.
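The short-circuit described above amounts to inspecting the output of `prisma migrate deploy` before deciding whether to run the slower recovery path. A minimal sketch, with a hypothetical function name and the marker string quoted from the commit message:

```python
NO_PENDING_MARKER = "No pending migrations to apply"


def needs_migration_recovery(deploy_output: str) -> bool:
    """Return False when deploy output says the database already matches the schema."""
    return NO_PENDING_MARKER not in deploy_output
```

Skipping the recovery step in that case avoids the extra 25+ seconds mentioned above, which matters when startup is bounded by a 90-second timeout.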