
Litellm ishaan april6#25256

Merged
ishaan-berri merged 51 commits intolitellm_internal_stagingfrom
litellm_ishaan_april6
Apr 17, 2026


@ishaan-berri
Contributor

Relevant issues

Changes

Pre-Submission checklist

  • Added testing in tests/test_litellm/
  • PR passes all unit tests on make test-unit

CI (LiteLLM team)

  • Branch creation CI run
    Link:

  • CI run for the last commit
    Link:

  • Merge / cherry-pick CI run
    Links:

ishaan-berri and others added 8 commits April 6, 2026 13:26
…ATE (#25227)

* Add STALE_OBJECT_CLEANUP_BATCH_SIZE constant

Configurable batch limit (default 1000) for stale managed object cleanup,
preventing unbounded UPDATE queries from hitting 300K+ rows at once.

* Batch-limit stale managed object cleanup with single bounded SQL query

Two fixes to _cleanup_stale_managed_objects:

1. Replace unbounded update_many with a single execute_raw using a
   subquery LIMIT, capping each poll cycle to STALE_OBJECT_CLEANUP_BATCH_SIZE
   rows. Zero rows loaded into Python memory — everything stays in Postgres.
   Uses the same PostgreSQL raw-SQL pattern as spend_log_cleanup.py
   (the proxy requires PostgreSQL per schema.prisma).

2. Extract _expire_stale_rows as a separate method for testability.

Keeps the file_purpose='response' filter to avoid incorrectly expiring
long-running batch or fine-tune jobs that legitimately exceed the
staleness cutoff.
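The bounded-UPDATE pattern described above can be sketched with Python's built-in sqlite3 module standing in for PostgreSQL. The table name managed_objects and its columns (status, file_purpose, created_at) are hypothetical, not the actual LiteLLM schema, and batch_size plays the role of STALE_OBJECT_CLEANUP_BATCH_SIZE. The point is that the subquery LIMIT caps each call and no rows are loaded into Python.

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical schema; the real proxy runs raw SQL against PostgreSQL via Prisma.
SCHEMA = """
CREATE TABLE managed_objects (
    id INTEGER PRIMARY KEY,
    status TEXT NOT NULL,
    file_purpose TEXT NOT NULL,
    created_at TEXT NOT NULL
)
"""

# The LIMIT inside the subquery bounds how many rows one UPDATE can touch.
# The file_purpose filter leaves long-running batch / fine-tune jobs alone.
EXPIRE_SQL = """
UPDATE managed_objects
SET status = 'expired'
WHERE id IN (
    SELECT id FROM managed_objects
    WHERE status = 'in_progress'
      AND file_purpose = 'response'
      AND created_at < ?
    LIMIT ?
)
"""


def expire_stale_rows(conn: sqlite3.Connection, cutoff: datetime, batch_size: int) -> int:
    """Expire at most batch_size stale 'response' rows; return how many were updated."""
    cur = conn.execute(EXPIRE_SQL, (cutoff.isoformat(), batch_size))
    conn.commit()
    return cur.rowcount
```

Each poll cycle calls expire_stale_rows once, so a backlog of 300K rows drains over several cycles instead of in one unbounded statement.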
…nt (#21352)

* return actual status code - /count_tokens endpoint

* Apply suggestion from @greptile-apps[bot]

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* fix greptile suggestion

* rollback file

* add test case

---------

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Co-authored-by: ishaan-berri <155045088+ishaan-berri@users.noreply.github.com>
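A minimal sketch of the idea behind returning the actual status code: when a provider's token-counting call fails, surface the provider's HTTP status instead of a blanket 500. ProviderHTTPError and status_from_exception are hypothetical names for illustration, not LiteLLM's actual helpers.

```python
class ProviderHTTPError(Exception):
    """Hypothetical error raised when a provider token-count API returns a non-2xx status."""

    def __init__(self, status_code: int, message: str) -> None:
        super().__init__(message)
        self.status_code = status_code


def status_from_exception(exc: Exception, default: int = 500) -> int:
    """Return the provider's HTTP status if the exception carries one, else default."""
    status = getattr(exc, "status_code", None)
    # Only trust integer codes in the HTTP error range; anything else becomes a 500.
    if isinstance(status, int) and 400 <= status <= 599:
        return status
    return default
```

A handler can then build its error response with status_from_exception(exc), so a 429 or 404 from the provider reaches the client unchanged.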
…#24950)

* fix(bedrock): strip [1m]/[200k] context window suffixes before cost lookup

* test(bedrock): add test for [1m] context window suffix stripping in cost lookup

* schema: add allowed_models to BudgetTable, default_team_member_models to TeamTable

* migration: add allowed_models and default_team_member_models columns

* types: add allowed_models to TeamMemberAddRequest, TeamMemberUpdateRequest, UpdateTeamRequest

* utils: add allowed_models param to add_new_member, persist to budget table

* common_utils: add allowed_models to _upsert_budget_and_membership

* team endpoints: seed allowed_models on member_add, persist on member_update and team/update

* auth: enforce per-member allowed_models at request time

* networking: add allowed_models to Member type and teamMemberUpdateCall

* TeamMemberTab: add Model Scope column showing per-member allowed_models

* EditMembership: add Allowed Models multi-select field

* TeamInfo: add default_team_member_models field in Settings tab

* chore: sync schema.prisma copies from root

* fix(team_member_update): update existing budget in-place instead of creating new one

When a member already has a budget_id, patch only the fields the caller
provided rather than always creating a fresh budget record.  The old
code ignored existing_budget_id entirely, so updating only allowed_models
silently dropped the stored max_budget / tpm_limit / rpm_limit values.

* fix(auth): pass llm_router to _check_team_member_model_access

Without the router, _can_object_call_model cannot resolve wildcard model
names (e.g. openai/*) or access-group names in allowed_models, causing
legitimate requests to be denied.  Thread the existing llm_router from
_run_common_checks through to the new member-scope check.

* feat(ui): add Team Member Settings accordion to Create Team modal

Groups default_team_member_models, member budget/key duration, and
tpm/rpm defaults into a single collapsible section. The model picker
is filtered to only show the models selected for the team, and the
copy distinguishes it from the team-level Models field.

* feat(ui): consolidate Team Member Settings into accordion in edit team form

Moves default_team_member_models + per-member budget/key/tpm/rpm fields
into a collapsible "Team Member Settings" panel. Keeps the top-level
form focused on team-wide settings (team models, team budget, tpm/rpm).

* fix(ui): use tremor Accordion for Team Member Settings in edit team form

* fix(ui): move Team Member Settings accordion above budget fields in Create Team

* chore: fixes

---------

Co-authored-by: Ishaan Jaffer <ishaanjaffer0324@gmail.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Yuneng Jiang <yuneng@berri.ai>
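The in-place budget update fix can be illustrated with plain dictionaries standing in for budget rows: only fields the caller actually provided overwrite stored values. The function name patch_budget and the dict representation are assumptions for illustration; the real code issues a Prisma update against the existing budget_id.

```python
from typing import Any, Dict, Optional


def patch_budget(existing: Optional[Dict[str, Any]], updates: Dict[str, Any]) -> Dict[str, Any]:
    """Merge caller-provided fields into an existing budget record.

    None is treated as "not provided", so stored max_budget, tpm_limit and
    rpm_limit survive an update that only touches allowed_models.
    """
    provided = {k: v for k, v in updates.items() if v is not None}
    if existing is None:
        # No budget yet: the provided fields become a new record.
        return dict(provided)
    merged = dict(existing)  # copy so the stored record is not mutated
    merged.update(provided)
    return merged
```

One trade-off of treating None as "not provided" is that a caller cannot clear a field through this path; that would need an explicit sentinel.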
#25109)

* feat: multiple concurrent budget windows per API key and team (#24883)

* feat(proxy): add BudgetLimitEntry type and wire budget_limits into key/team models

* feat(schema): add budget_limits Json column to VerificationToken and TeamTable

* feat(migrations): add migration for budget_limits column on keys and teams

* feat(keys): initialize budget_limits windows with reset_at on key create/update

* feat(teams): initialize budget_limits windows with reset_at on team create/update

* feat(auth): add _virtual_key_multi_budget_check and _team_multi_budget_check

* feat(auth): call multi-budget checks from common_checks for keys and teams

* feat(proxy): increment per-window Redis spend counters after each request

* feat(budget): reset individual budget windows on schedule via reset_budget_job

* feat(ui): add hourly option to BudgetDurationDropdown

* feat(ui): add budget_limits field to KeyResponse type

* feat(ui): add Budget Windows editor to key edit view

* feat(ui): add Budget Windows editor to create key form

* fix(proxy): strip budget_limits=None before Prisma upsert to fix login 500

Prisma rejects nullable JSON fields (Json? without @default) when passed as
Python None — it needs the field omitted entirely so the DB stores NULL via
the column's nullable constraint. This was breaking /v2/login because the UI
session key creation path hit the upsert with budget_limits=None.

* ui(key-edit): use antd InputNumber+Button for budget windows, add reset hints

* ui(create-key): use antd InputNumber+Button for budget windows, add reset hints

* docs(users): add multiple budget windows section with API + dashboard walkthrough

* fix: BudgetExceededError returns HTTP 429 instead of 400

- Add status_code=429 to BudgetExceededError class
- auth_exception_handler hardcoded code=400 → code=429

* fix: no-op else branch in multi-budget auth checks causes KeyError

- BudgetLimitEntry objects must be coerced via model_dump() not left as-is
- Move _virtual_key_multi_budget_check into common_checks (was asymmetric
  with _team_multi_budget_check which already lived there)

* fix: len() on JSON string returns char count not window count

Guard with isinstance check + json.loads() before iterating per-window
Redis counters in increment_spend_counters

* fix: silent except:pass hides Redis reset failures in reset_budget_windows

Log Redis counter reset failures as warnings so they are observable

* test: add unit tests for multi-budget window enforcement

5 tests covering: no budget_limits passes, under budget passes,
over hourly window raises 429, over monthly window raises 429,
BudgetLimitEntry objects coerced without KeyError

* fix: key per-window counters stable across reorders (duration key, not index)

* fix: team+key per-window spend increments use duration key, not index

* fix: budget window reset uses duration key; log failures instead of swallowing

* refactor: extract BudgetWindowsEditor to shared component

* refactor: key_edit_view imports BudgetWindowsEditor from shared component

* refactor: create_key_button imports BudgetWindowsEditor from shared component

---------

Co-authored-by: Ishaan Jaffer <ishaanjaffer0324@gmail.com>

* fix(reset_budget_job): extract _reset_expired_window helper to fix PLR0915 too many statements

* feat(skills): Skills Registry & Hub — register skills, browse in AI Hub, public skill hub (#25118)

* feat(skills): add domain and namespace fields to plugin types

* feat(skills): store and return domain/namespace inside manifest_json

* feat(skills): add /public/skill_hub endpoint for unauthenticated access

* feat(skills): whitelist /public/skill_hub from auth requirements

* feat(skills): add domain, namespace to Plugin and RegisterPluginRequest types

* feat(skills): smart URL parser — paste github URL, auto-detect source type and name

* feat(skills): replace enable toggle with Public badge, make rows clickable

* feat(skills): add skill detail view with Overview and How to Use tabs

* feat(skills): add MakeSkillPublicForm modal for publishing skills to the hub

* feat(skills): rename panel to Skills, wire in skill detail view on row click

* feat(skills): add skill hub table columns — name, description, domain, source, status

* feat(skills): add SkillHubDashboard with stats row, domain dropdown filter, and table

* feat(skills): add Skill Hub tab to AI Hub with Select Skills to Make Public button

* feat(skills): move Skills to top-level nav item directly under MCP Servers

* feat(skills): add skillHubPublicCall and NEXT_PUBLIC_BASE_URL support

* feat(skills): add Skill Hub tab to public AI Hub page

* feat(skills): add skills page routing in main app router

* feat(skills): add /skills page route

* chore: update package-lock after npm install

* docs(skills): add Skills Gateway doc page with mermaid architecture diagram

* docs(skills): add Skills Gateway to sidebar under Agent & MCP Gateway

* docs(skills): add loom walkthrough video to Skills Gateway doc

* chore: fixes

---------

Co-authored-by: Ishaan Jaffer <ishaanjaffer0324@gmail.com>
Co-authored-by: Yuneng Jiang <yuneng@berri.ai>
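The multi-window fixes listed in the #24883 commit above (coercing BudgetLimitEntry objects via model_dump(), parsing budget_limits when it arrives as a JSON string, and keying counters by duration rather than list index) can be sketched in plain Python. A dict stands in for the Redis spend counters, the counter key format spend:{scope}:{duration} and the class/function names are hypothetical, and max_budget / budget_duration are the BudgetLimitEntry field names.

```python
import json
from typing import Any, Dict, List


class BudgetWindowExceeded(Exception):
    """Hypothetical stand-in for BudgetExceededError, which now maps to HTTP 429."""

    status_code = 429


def normalize_budget_limits(raw: Any) -> List[Dict[str, Any]]:
    """Return budget windows as a list of dicts.

    Handles None, a JSON string (len() on it would count characters, not
    windows) and objects exposing model_dump(), such as Pydantic models.
    """
    if raw is None:
        return []
    if isinstance(raw, str):
        raw = json.loads(raw)
    windows = []
    for entry in raw:
        if hasattr(entry, "model_dump"):
            entry = entry.model_dump()
        windows.append(dict(entry))
    return windows


def counter_key(scope: str, duration: str) -> str:
    # Keyed by duration, not list position, so reordering windows keeps counters stable.
    return f"spend:{scope}:{duration}"


def check_windows(counters: Dict[str, float], scope: str, raw_limits: Any) -> None:
    """Raise if spend in any window has reached its max_budget (>= is an assumption)."""
    for window in normalize_budget_limits(raw_limits):
        spend = counters.get(counter_key(scope, window["budget_duration"]), 0.0)
        if spend >= window["max_budget"]:
            raise BudgetWindowExceeded(
                f"{scope}: {window['budget_duration']} window spend {spend} >= {window['max_budget']}"
            )


def increment_windows(counters: Dict[str, float], scope: str, raw_limits: Any, cost: float) -> None:
    """Add a request's cost to every configured window counter for this scope."""
    for window in normalize_budget_limits(raw_limits):
        key = counter_key(scope, window["budget_duration"])
        counters[key] = counters.get(key, 0.0) + cost
```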
@vercel

vercel Bot commented Apr 7, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project | Deployment | Actions | Updated (UTC)
litellm | Ready | Preview, Comment | Apr 17, 2026 11:01pm

@codspeed-hq
Contributor

codspeed-hq Bot commented Apr 7, 2026

Merging this PR will not alter performance

✅ 16 untouched benchmarks


Comparing litellm_ishaan_april6 (e5824b1) with main (5cb9b08)

Comment thread ui/litellm-dashboard/src/components/organisms/create_key_button.tsx Fixed
Comment thread ui/litellm-dashboard/src/components/team/TeamInfo.tsx Fixed
Comment thread ui/litellm-dashboard/src/components/team/TeamInfo.tsx Fixed
Comment thread ui/litellm-dashboard/src/components/team/TeamInfo.tsx Fixed
Comment thread ui/litellm-dashboard/src/components/templates/key_edit_view.tsx Fixed
For keys and teams with budget_limits, reset any individual windows where
reset_at <= now. Only the expired windows are reset; other windows are untouched.
"""
from litellm.proxy.proxy_server import spend_counter_cache
@greptile-apps
Contributor

greptile-apps Bot commented Apr 7, 2026

Greptile Summary

This PR bundles four features: multi-window concurrent budget enforcement for keys and teams, per-team-member model scoping with a default_team_member_models team default, a fix to the /v1/messages/count_tokens status-code propagation, and a batch-limit guard on the stale-object cleanup job. The schema migrations are additive and safe, the new auth checks reuse the existing get_team_membership cache correctly, and the _upsert_budget_and_membership refactor to in-place updates is a genuine bug fix.

  • P1 – Wrong field names in budget_limits docstring examples: four endpoint docstrings document budget_limit/time_period but the BudgetLimitEntry model requires max_budget/budget_duration; callers following the docs receive a 422 immediately.
  • P2 – Window reset_at always re-anchored on update: _set_budget_reset_at and prepare_key_update_data call get_budget_reset_time() for every window on every update, so editing a key or team mid-period silently extends existing windows instead of preserving the original start time.
  • P2 – N+1 writes in reset_budget_windows: the keys and teams loops each issue one update DB call per changed record; these could be issued concurrently with asyncio.gather.
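For API callers, a request body using the field names BudgetLimitEntry requires (max_budget and budget_duration, not budget_limit/time_period) might look like the following sketch. The "/key/generate-style" framing, the local validator and the duration strings "1h" and "30d" are illustrative assumptions; the code only builds and checks JSON locally and does not call the proxy.

```python
import json

REQUIRED_FIELDS = {"max_budget", "budget_duration"}  # per the BudgetLimitEntry model


def build_key_generate_body(windows):
    """Build a /key/generate-style JSON body, rejecting entries missing required fields."""
    for w in windows:
        missing = REQUIRED_FIELDS - w.keys()
        if missing:
            # Entries written with the documented budget_limit/time_period names land here.
            raise ValueError(f"budget_limits entry missing {sorted(missing)}: {w}")
    return json.dumps({"budget_limits": windows})


body = build_key_generate_body([
    {"max_budget": 10.0, "budget_duration": "1h"},
    {"max_budget": 100.0, "budget_duration": "30d"},
])
```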

Confidence Score: 4/5

Safe to merge after fixing the budget_limits docstring field names — wrong examples cause 422 for all API callers trying the feature.

Two P1 docstring errors (same root cause, different endpoints) mean anyone trying budget_limits via the documented API will get a 422 until they independently discover the correct field names. All other findings are P2 (N+1 background writes, window-timer extension on update). Core auth logic, migrations, and the new Pydantic types look correct.
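One way to address the window-timer P2 is to carry over reset_at for any window whose budget_duration already existed and compute a fresh reset time only for new windows. This is a sketch under assumptions: merge_window_reset_times is a hypothetical name, and next_reset is a caller-supplied function standing in for get_budget_reset_time, whose real signature is not shown here.

```python
from typing import Callable, Dict, List


def merge_window_reset_times(
    existing: List[Dict],
    incoming: List[Dict],
    next_reset: Callable[[str], str],
) -> List[Dict]:
    """Return incoming windows, reusing reset_at from existing windows with the same duration."""
    previous = {w["budget_duration"]: w.get("reset_at") for w in existing}
    merged = []
    for window in incoming:
        window = dict(window)  # do not mutate the caller's request data
        kept = previous.get(window["budget_duration"])
        # Keep the original period boundary; only brand-new windows start a new timer.
        window["reset_at"] = kept if kept else next_reset(window["budget_duration"])
        merged.append(window)
    return merged
```

Windows dropped from the incoming list simply disappear, and changing max_budget on an existing window leaves its timer untouched.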

litellm/proxy/management_endpoints/key_management_endpoints.py (lines 1223, 2157) and litellm/proxy/management_endpoints/team_endpoints.py (line 832) for the docstring fixes; litellm/proxy/common_utils/reset_budget_job.py for the timezone and N+1 write issues.
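The N+1 writes in reset_budget_windows could be issued concurrently with asyncio.gather, as the review suggests. In this sketch update_record is a hypothetical async callable standing in for a Prisma update; the semaphore is an added assumption that keeps a large reset batch from exhausting the database connection pool. Each write is still a separate round trip, only no longer serialized.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, List


async def write_all(
    changed: List[Dict[str, Any]],
    update_record: Callable[[Dict[str, Any]], Awaitable[None]],
    max_concurrency: int = 10,
) -> None:
    """Run one update per changed record concurrently, capped by a semaphore."""
    sem = asyncio.Semaphore(max_concurrency)

    async def _one(record: Dict[str, Any]) -> None:
        async with sem:
            await update_record(record)

    await asyncio.gather(*(_one(r) for r in changed))
```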

Important Files Changed

Filename | Overview
litellm/proxy/auth/auth_checks.py | Adds multi-window budget enforcement and per-member model scope checks to the critical auth path; both use the existing get_team_membership cache so no net new DB queries for warm caches.
litellm/proxy/common_utils/reset_budget_job.py | New reset_budget_windows method issues N+1 DB writes for changed keys/teams; timezone-stripping in _reset_expired_window can cause stale windows for non-UTC deployments (flagged in prior review thread).
litellm/proxy/management_endpoints/key_management_endpoints.py | Adds budget_limits to key generation/update; docstring examples use wrong field names (budget_limit/time_period) that will cause 422 errors for API callers; window reset timestamps are not preserved across updates.
litellm/proxy/management_endpoints/team_endpoints.py | Adds budget_limits and default_team_member_models to team create/update; docstring examples have wrong field names; _set_budget_reset_at resets all window timers on every update rather than preserving existing ones.
litellm/proxy/management_endpoints/common_utils.py | Correctly refactors _upsert_budget_and_membership to update existing budgets in-place rather than always creating new records; adds allowed_models field support.
litellm/proxy/proxy_server.py | Adds per-window spend counter increments for multi-budget keys and teams using cache lookups; extracts _try_provider_token_count helper that correctly propagates HTTP status codes from provider token-counting APIs.
litellm/proxy/_types.py | Adds BudgetLimitEntry, budget_limits to key/team types, allowed_models to LiteLLM_BudgetTable, and default_team_member_models to TeamBase; closes unclosed paren in TeamBase.budget_limits.
tests/test_litellm/proxy/auth/test_multi_budget_windows.py | New unit tests for multi-window budget enforcement; all mock-based, no real network calls; covers under-budget, over-first-window, over-second-window, and Pydantic object coercion cases.
tests/test_litellm/proxy/common_utils/test_upsert_budget_membership.py | Tests updated to reflect the new in-place update behavior for existing budget IDs; assertions correctly flip from create to update calls.

Sequence Diagram

sequenceDiagram
    participant Client
    participant common_checks
    participant auth_checks
    participant Cache as UserKeyCache
    participant DB as PrismaDB
    participant SpendCounter

    Client->>common_checks: request(model, team, virtualkey)
    common_checks->>auth_checks: _check_team_member_model_access
    auth_checks->>Cache: get_team_membership(user_id, team_id)
    alt cache miss
        Cache->>DB: find_unique litellm_teammembership
        DB-->>Cache: membership + allowed_models
    end
    Cache-->>auth_checks: membership
    auth_checks-->>common_checks: 401 if model not allowed

    common_checks->>auth_checks: _team_multi_budget_check
    auth_checks->>SpendCounter: get_current_spend(team:window)
    auth_checks-->>common_checks: 429 if window exceeded

    common_checks->>auth_checks: _virtual_key_multi_budget_check
    auth_checks->>SpendCounter: get_current_spend(key:window)
    auth_checks-->>common_checks: 429 if window exceeded

    common_checks-->>Client: allowed

    Note over SpendCounter: Post-response spend tracking
    Client->>SpendCounter: increment_spend_counters(cost)
    SpendCounter->>SpendCounter: incr spend:key (global)
    SpendCounter->>SpendCounter: incr spend:key:window (per BudgetLimitEntry)
    SpendCounter->>SpendCounter: incr spend:team (global)
    SpendCounter->>SpendCounter: incr spend:team:window (per BudgetLimitEntry)
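The per-member scope check in the diagram (_check_team_member_model_access returning 401) can be approximated as below. fnmatch-style wildcard matching is a stand-in chosen for illustration; the actual check delegates to _can_object_call_model with llm_router so it can also resolve access-group names, which this sketch does not model. Treating an empty allowed_models as "no member-level restriction" is likewise an assumption, as are the names used.

```python
from fnmatch import fnmatchcase
from typing import List, Optional


class ModelNotAllowed(Exception):
    """Hypothetical error; the proxy maps this case to HTTP 401."""

    status_code = 401


def check_member_model_access(model: str, allowed_models: Optional[List[str]]) -> None:
    """Raise unless model matches one of the member's allowed_models entries."""
    if not allowed_models:
        # Assumption: no per-member scope means fall back to team-level rules.
        return
    for pattern in allowed_models:
        # Exact match first, so names containing "[" are not read as glob classes;
        # then wildcard match, e.g. "openai/*" matches "openai/gpt-4o".
        if model == pattern or fnmatchcase(model, pattern):
            return
    raise ModelNotAllowed(f"model {model!r} not in member allowed_models")
```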

Reviews (5): Last reviewed commit: "style: run black formatter on files from..."

Comment thread litellm/proxy/proxy_server.py Outdated
Comment on lines +560 to +574
    now: datetime,
) -> bool:
    """Reset a single budget window if expired. Returns True if the window was reset."""
    from litellm.proxy.common_utils.timezone_utils import get_budget_reset_time

    reset_at_str = window.get("reset_at")
    if not reset_at_str:
        return False
    reset_at = datetime.fromisoformat(
        reset_at_str.replace("Z", "+00:00")
    ).replace(tzinfo=None)
    if reset_at > now:
        return False
    spend_counter_cache.in_memory_cache.set_cache(key=counter_key, value=0.0)
    if spend_counter_cache.redis_cache is not None:

P2 Timezone stripping causes incorrect window reset timing for non-UTC deployments

_reset_expired_window strips timezone info from the stored reset_at string with .replace(tzinfo=None), then compares it against datetime.utcnow() (also naive). When litellm_settings.timezone is configured to a non-UTC zone (e.g. America/New_York, UTC−5), get_budget_reset_time returns a TZ-aware datetime whose .isoformat() looks like 2026-04-07T05:00:00-05:00, which is 10:00 UTC. After stripping the offset, the naive value 2026-04-07T05:00:00 is compared against naive UTC now, so reset_at > now already evaluates to False once UTC reaches 05:00, and the window is reset five hours before it actually expires. For zones east of UTC the error flips direction: the window stays unreset past its real expiry by the size of the offset.

A safer approach keeps both sides timezone-aware:

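from datetime import datetime, timezone

now = datetime.now(timezone.utc)
reset_at_str = window.get("reset_at")
if not reset_at_str:
    return False
reset_at = datetime.fromisoformat(reset_at_str.replace("Z", "+00:00"))
if reset_at.tzinfo is None:
    # Assumption: a stored value without an offset is UTC (naive vs aware comparison raises TypeError)
    reset_at = reset_at.replace(tzinfo=timezone.utc)
# reset_at is timezone-aware; compare directly without stripping
if reset_at > now:
    return False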
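A small check of both comparisons using the review's example value. The stored string and the chosen moment are illustrative: at 06:00 UTC a window whose true reset time is 10:00 UTC has not expired, but the offset-stripping comparison already treats it as expired. The helper names are hypothetical.

```python
from datetime import datetime, timezone

RESET_AT = "2026-04-07T05:00:00-05:00"  # 10:00 UTC, as stored for a UTC-5 deployment


def expired_naive(reset_at_str: str, now_utc_naive: datetime) -> bool:
    """Original approach: strip the offset, then compare against naive UTC now."""
    reset_at = datetime.fromisoformat(reset_at_str.replace("Z", "+00:00")).replace(tzinfo=None)
    return reset_at <= now_utc_naive


def expired_aware(reset_at_str: str, now_utc: datetime) -> bool:
    """Suggested approach: keep both sides timezone-aware."""
    reset_at = datetime.fromisoformat(reset_at_str.replace("Z", "+00:00"))
    return reset_at <= now_utc


now = datetime(2026, 4, 7, 6, 0, tzinfo=timezone.utc)
print(expired_naive(RESET_AT, now.replace(tzinfo=None)))  # True, though the real reset is four hours away
print(expired_aware(RESET_AT, now))  # False: the window is still open
```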

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
@CLAassistant

CLAassistant commented Apr 7, 2026

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
2 out of 4 committers have signed the CLA.

✅ otaviofbrito
✅ yuneng-berri
❌ github-actions[bot]
❌ ishaan-berri

ishaan-berri and others added 4 commits April 7, 2026 09:32
Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
…ort, function or class'

Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
…ort, function or class'

Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
@ishaan-berri ishaan-berri temporarily deployed to integration-postgres April 7, 2026 16:33 — with GitHub Actions Inactive
@ishaan-berri ishaan-berri temporarily deployed to integration-redis-postgres April 7, 2026 16:33 — with GitHub Actions Inactive
@ishaan-berri ishaan-berri temporarily deployed to integration-postgres April 17, 2026 20:13 — with GitHub Actions Inactive
…ails on pooler URL

When DIRECT_URL is not set and DATABASE_URL is a Neon pooler URL, prisma migrate diff
fails (pooler doesn't support extended query protocol for schema introspection). Previously
_resolve_all_migrations returned early without applying any migrations, leaving the
budget_limits column missing and causing test_auth_callback_new_user to fail.

Now falls back to running each migration SQL file via prisma db execute --file, which
works with pooler URLs and is safe to re-run due to IF NOT EXISTS guards.
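The fallback can be sketched as a loop over migration directories that shells out to prisma db execute --file for each migration.sql, oldest first. The directory layout (<timestamp>_<name>/migration.sql, Prisma's default), the --schema argument and the function name are assumptions; the run parameter exists only so the sketch can be exercised without the Prisma CLI.

```python
import subprocess
from pathlib import Path
from typing import Callable, List, Sequence


def apply_migrations_via_db_execute(
    migrations_dir: Path,
    schema_path: Path,
    run: Callable[[Sequence[str]], object] = lambda cmd: subprocess.run(cmd, check=True),
) -> List[Path]:
    """Run every migration.sql with `prisma db execute --file`, in timestamp order.

    Assumes each file is idempotent (e.g. IF NOT EXISTS guards), so re-running
    already-applied migrations is harmless.
    """
    applied = []
    # Directory names start with a timestamp, so lexicographic order is chronological.
    for sql_file in sorted(migrations_dir.glob("*/migration.sql")):
        run(["prisma", "db", "execute", "--file", str(sql_file), "--schema", str(schema_path)])
        applied.append(sql_file)
    return applied
```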
page_utils.test.ts enforces that every menuGroups entry has a matching
description and vice versa. The left nav uses 'skills' but page_metadata.ts
still had 'claude-code-plugins', causing two test failures.
…ending

When prisma migrate deploy reports 'No pending migrations to apply' the DB
already matches schema — running _resolve_all_migrations (migrate diff +
prisma db execute) adds 25+ seconds unnecessarily, causing the proxy to
miss the 90-second startup timeout in test_litellm_proxy_server_config_no_general_settings.
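The short-circuit amounts to inspecting the prisma migrate deploy output before deciding whether to run the slower reconciliation step. In this sketch the output string is passed in and resolve_all is a hypothetical callable standing in for _resolve_all_migrations. Matching on the message text is what the commit describes, though it ties the check to Prisma's exact wording.

```python
from typing import Callable


def after_migrate_deploy(deploy_output: str, resolve_all: Callable[[], None]) -> bool:
    """Run the slow reconciliation only when migrate deploy reported pending work.

    Returns True if resolve_all was invoked.
    """
    if "No pending migrations to apply" in deploy_output:
        # DB already matches the schema: skip migrate diff + db execute (25+ seconds).
        return False
    resolve_all()
    return True
```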

6 participants