Litellm oss staging 04 11 2026 #25589

Merged
Sameerlite merged 11 commits into main from litellm_oss_staging_04_11_2026
Apr 14, 2026

Conversation

@krrish-berri-2
Contributor

Relevant issues

Pre-Submission checklist

Please complete all items before asking a LiteLLM maintainer to review your PR

  • I have added testing in the tests/test_litellm/ directory. Adding at least 1 test is a hard requirement - see details
  • My PR passes all unit tests on make test-unit
  • My PR's scope is as isolated as possible, it only solves 1 specific problem
  • I have requested a Greptile review by commenting @greptileai and received a Confidence Score of at least 4/5 before requesting a maintainer review

Delays in PR merge?

If you're seeing a delay in your PR being merged, ping the LiteLLM Team on Slack (#pr-review).

CI (LiteLLM team)

CI status guideline:

  • 50-55 passing tests: main is stable with minor issues.
  • 45-49 passing tests: acceptable but needs attention
  • <= 40 passing tests: unstable; be careful with your merges and assess the risk.
  • Branch creation CI run
    Link:

  • CI run for the last commit
    Link:

  • Merge / cherry-pick CI run
    Links:

Screenshots / Proof of Fix

Type

🆕 New Feature
🐛 Bug Fix
🧹 Refactoring
📖 Documentation
🚄 Infrastructure
✅ Test

Changes

dkindlund and others added 5 commits April 11, 2026 19:36
…update and key rotation (#25552)

Two code paths in key_management_endpoints.py call hash_token()
unconditionally when invalidating the user_api_key_cache after a key
update.  When the caller passes a pre-hashed token ID (not an sk-
prefixed key), hash_token() double-hashes it, producing a cache key
that does not match the actual cached entry.  Cache invalidation
silently fails.

This is compounded by update_cache(), which writes the stale cached key
object back with a fresh 60s TTL after every successful request,
preventing natural TTL expiry.  The stale entry (with outdated fields
like max_budget=None) persists indefinitely under load.

PR #24969 fixed this in update_key_fn but missed two other call sites:
- _process_single_key_update (bulk update path)
- _execute_virtual_key_regeneration (key rotation path)

Fix: replace hash_token() with _hash_token_if_needed() in both
locations, matching the pattern already used elsewhere in the file.

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
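
The double-hashing failure described in the commit message above can be reproduced in isolation. The following is a minimal sketch, not the code from key_management_endpoints.py: the two helpers are simplified stand-ins (assuming SHA-256 hashing and the "sk-" prefix check mentioned in the message), and the cache is a plain dict with a hypothetical key value.

```python
import hashlib


def hash_token(token: str) -> str:
    """Stand-in helper: unconditionally SHA-256 hash whatever is passed in."""
    return hashlib.sha256(token.encode()).hexdigest()


def hash_token_if_needed(token: str) -> str:
    """Stand-in helper: hash only raw 'sk-' keys; treat anything else as already hashed."""
    return hash_token(token) if token.startswith("sk-") else token


raw_key = "sk-example-1234"      # hypothetical raw key
token_id = hash_token(raw_key)   # pre-hashed token ID, as stored in the cache

# Buggy path: the caller passes the token ID and it is hashed a second time,
# so the pop() targets a key that was never in the cache.
cache = {token_id: {"max_budget": None}}
cache.pop(hash_token(token_id), None)
stale_entry_survives = token_id in cache

# Fixed path: the token ID is recognised as already hashed and used as-is.
cache = {token_id: {"max_budget": None}}
cache.pop(hash_token_if_needed(token_id), None)
entry_evicted = token_id not in cache

print(stale_entry_survives, entry_evicted)
```

Both flags come out true: the unconditional hash leaves the stale entry in place, while the conditional variant evicts it for raw keys and pre-hashed IDs alike.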
… key (#25549)

The model_max_budget limiter tracks spend in one code path
(async_log_success_event) and enforces budget limits in another
(is_key_within_model_budget via user_api_key_auth). These two paths
used different model name formats to build cache keys:

- Tracking used standard_logging_payload["model"], which is the
  deployment-level model name (e.g. "vertex_ai/claude-opus-4-6@default")
- Enforcement used request_data["model"], which is the model group
  alias (e.g. "claude-opus-4-6")

Because the cache keys never matched, the enforcement path always read
None for current spend, silently allowing all requests through even
after the budget was exceeded. This affected any provider that decorates
model names with provider prefixes or version suffixes (Vertex AI,
Bedrock, etc.).

Fix: use model_group (the user-facing alias) from StandardLoggingPayload
for spend tracking, falling back to model when model_group is None.
This aligns the tracking cache key with the enforcement cache key.

Fixes the same root cause reported in #15223 and #10052.

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
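
The cache-key mismatch described above can be illustrated with a small in-memory model. This is a sketch under assumptions: the key layout in `spend_key`, the `use_model_group` switch, and the dict-based cache are hypothetical and only mirror the behaviour the commit message describes.

```python
from typing import Dict, Optional

spend_cache: Dict[str, float] = {}


def spend_key(key_hash: str, model: str) -> str:
    """Hypothetical cache-key layout; the real format may differ."""
    return f"model_spend:{key_hash}:{model}"


def track_spend(key_hash: str, payload: dict, cost: float, use_model_group: bool) -> None:
    """Tracking path: pick the model name, then add the cost to the cache."""
    model = payload["model"]
    if use_model_group and payload.get("model_group") is not None:
        model = payload["model_group"]  # user-facing alias, falls back to model
    key = spend_key(key_hash, model)
    spend_cache[key] = spend_cache.get(key, 0.0) + cost


def current_spend(key_hash: str, request_data: dict) -> Optional[float]:
    """Enforcement path: always looks up by the model group alias in the request."""
    return spend_cache.get(spend_key(key_hash, request_data["model"]))


payload = {"model": "vertex_ai/claude-opus-4-6@default", "model_group": "claude-opus-4-6"}
request_data = {"model": "claude-opus-4-6"}

track_spend("abc", payload, 5.0, use_model_group=False)  # old behaviour
spend_before_fix = current_spend("abc", request_data)     # None: keys never match

track_spend("abc", payload, 5.0, use_model_group=True)   # fixed behaviour
spend_after_fix = current_spend("abc", request_data)      # 5.0: keys now align
```

With the deployment-level name, enforcement reads `None` and lets every request through; with the alias, the recorded spend becomes visible to the budget check.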
…r schedule (#25440)

Budget table entries (team members, end-users) used duration_in_seconds()
for a sliding-window reset, while keys/users/teams used calendar-aligned
get_budget_reset_time(). This made "30d" and "1mo" mean different things
depending on entity type. Now both paths use get_budget_reset_time() for
consistent calendar-aligned resets (e.g. "30d" → 1st of next month).

Fixes #25432

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
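
To make the difference between the two schedules concrete, here is a minimal sketch. It does not reproduce duration_in_seconds() or get_budget_reset_time(); both functions below are hypothetical and cover only the "30d" → 1st of next month example given in the commit message.

```python
from datetime import datetime, timedelta, timezone


def sliding_window_reset(now: datetime, days: int) -> datetime:
    """Old budget-table behaviour (sketch): reset exactly N days from now."""
    return now + timedelta(days=days)


def first_of_next_month(now: datetime) -> datetime:
    """Calendar-aligned behaviour (sketch): reset at midnight on the 1st of next month."""
    year = now.year + 1 if now.month == 12 else now.year
    month = 1 if now.month == 12 else now.month + 1
    return datetime(year, month, 1, tzinfo=now.tzinfo)


created = datetime(2026, 4, 11, 19, 36, tzinfo=timezone.utc)
sliding = sliding_window_reset(created, 30)  # 2026-05-11 19:36 UTC
aligned = first_of_next_month(created)       # 2026-05-01 00:00 UTC
```

The two results differ by more than ten days for the same "30d" setting, which is the inconsistency between entity types that the commit removes.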
@vercel

vercel Bot commented Apr 12, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| litellm | Ready | Preview, Comment | Apr 14, 2026 3:37pm |

@codspeed-hq
Contributor

codspeed-hq Bot commented Apr 12, 2026

Merging this PR will not alter performance

✅ 16 untouched benchmarks


Comparing litellm_oss_staging_04_11_2026 (a0e61a9) with main (e64d98f)

@greptile-apps
Contributor

greptile-apps Bot commented Apr 12, 2026

Greptile Summary

This staging PR bundles several bug fixes: aligning model_max_budget spend-tracking cache keys to use model_group (the user-facing alias), fixing silent cache-invalidation failures caused by double-hashing token IDs in bulk-update and key-rotation endpoints, resetting spend counters for budget-tier-linked keys and team members when a budget period expires, and adding model_max_budget validation to the /budget/new and /budget/update endpoints. New model pricing entries (baseten, wandb) and accompanying unit/integration tests are included.

Confidence Score: 5/5

  • Safe to merge; all remaining findings are P2 style and test-coverage suggestions that do not block production correctness.
  • All three issues are P2: an inline import that should be module-level, a missing mock method that silently skips (but does not break) a test, and a missing assertion on a spend > 0 filter. None affect runtime behaviour or data integrity.
  • tests/test_litellm/proxy/common_utils/test_reset_budget_job.py — add find_many stub to MockLiteLLMTeamMembership so the new team-member cache-reset path is exercised.

Important Files Changed

| Filename | Overview |
| --- | --- |
| litellm/proxy/common_utils/reset_budget_job.py | Adds reset_budget_for_keys_linked_to_budgets (resets spend on keys tied to a budget tier with no own reset schedule) and reset_budget_for_team_members (resets team-member spend counters in Redis/in-memory + DB). Also extends _reset_budget_common to flush Redis spend counters for keys and teams on budget reset. Logic is sound; imports from proxy_server are inline due to the circular-import constraint. |
| litellm/proxy/hooks/model_max_budget_limiter.py | Uses model_group (user-facing alias) instead of the deployment-level model field as the spend-tracking cache key, aligning the log path with the enforcement path in is_key_within_model_budget. No issues found. |
| litellm/proxy/management_endpoints/budget_management_endpoints.py | Adds validate_model_max_budget calls to /budget/new and /budget/update to enforce the enterprise license check and schema validation on model_max_budget. Minor: the import is placed inline inside the handler functions instead of at the top of the file, violating the project style guide. |
| tests/test_litellm/proxy/common_utils/test_reset_budget_job.py | Comprehensive new test suite for ResetBudgetJob covering keys, users, teams, end-users, budget-table resets, and the new reset_budget_for_keys_linked_to_budgets helper. Minor gap: MockLiteLLMTeamMembership is missing find_many, silently skipping the new team-member cache-reset path; and the spend: {gt: 0} filter is not verified. |
| tests/test_litellm/proxy/management_endpoints/test_key_management_endpoints.py | Adds tests for cache-invalidation correctness in _process_single_key_update and _execute_virtual_key_regeneration, verifying that pre-hashed tokens are not double-hashed. Tests are isolated and use appropriate mocking. |
| tests/test_litellm/test_cost_calculator.py | Adds pricing-verification tests for baseten and wandb model entries, a dynamic-routing cost-fallback test, and an image token cost test with and without input_cost_per_image_token. All tests are self-contained mock-based unit tests. |
| tests/test_budget_management.py | Integration test verifying that /budget/new with a budget_duration sets budget_reset_at to the next calendar-aligned reset time. Lives in the tests/ root (integration tier) and makes real HTTP calls, which is appropriate for this location. |
| litellm/proxy/management_endpoints/key_management_endpoints.py | Replaces unconditional hash_token() calls with _hash_token_if_needed() in the bulk-update and key-regeneration paths to prevent double-hashing when the input is already a pre-hashed token ID. No functional regressions found. |

Sequence Diagram

sequenceDiagram
    participant BudgetResetJob
    participant DB as PrismaClient / DB
    participant Cache as Redis / InMemoryCache

    BudgetResetJob->>DB: "find budgets where reset_at <= now"
    DB-->>BudgetResetJob: budgets_to_reset[]

    BudgetResetJob->>BudgetResetJob: _reset_budget_reset_at_date() for each budget
    BudgetResetJob->>DB: update_many budgets (new reset_at)

    BudgetResetJob->>DB: find team memberships by budget_id
    DB-->>BudgetResetJob: memberships[]
    loop each membership
        BudgetResetJob->>Cache: "set spend:team_member:{uid}:{tid} = 0"
    end
    BudgetResetJob->>DB: "update_many litellm_teammembership spend=0"

    BudgetResetJob->>DB: "update_many litellm_verificationtoken<br/>(budget_id IN ids, budget_duration IS NULL, spend > 0) spend=0"

    BudgetResetJob->>DB: find end-users by budget_id
    DB-->>BudgetResetJob: endusers[]
    BudgetResetJob->>BudgetResetJob: "_reset_budget_for_enduser() spend=0"
    BudgetResetJob->>DB: "update_many endusers spend=0"
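
The key-reset step in the diagram (budget_id IN ids, budget_duration IS NULL, spend > 0) can be sketched against plain dicts. This is an in-memory illustration only: `keys_to_reset` and the sample rows are hypothetical, and the real job issues an update_many against the database instead.

```python
from typing import Iterable, List


def keys_to_reset(keys: List[dict], reset_budget_ids: Iterable[str]) -> List[dict]:
    """Select keys tied to a just-reset budget that have no reset schedule of their own."""
    ids = set(reset_budget_ids)
    return [
        k for k in keys
        if k.get("budget_id") in ids          # linked to a budget that just reset
        and k.get("budget_duration") is None  # key has no own reset schedule
        and k.get("spend", 0) > 0             # skip keys with nothing to reset
    ]


keys = [
    {"token": "a", "budget_id": "b1", "budget_duration": None, "spend": 12.5},
    {"token": "b", "budget_id": "b1", "budget_duration": "30d", "spend": 3.0},  # own schedule
    {"token": "c", "budget_id": "b1", "budget_duration": None, "spend": 0.0},   # nothing spent
    {"token": "d", "budget_id": "b2", "budget_duration": None, "spend": 7.0},   # other budget
]

selected = keys_to_reset(keys, ["b1"])
for key in selected:
    key["spend"] = 0.0  # stand-in for the DB update

reset_tokens = [k["token"] for k in selected]
```

Only key "a" qualifies here; keys with their own budget_duration are left to their own reset path.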

Reviews (4): Last reviewed commit: "Fix code qa"

Comment on lines +8857 to +8875

    class DictLikeResult:
        def __init__(self, data):
            self._data = data
        def __iter__(self):
            return iter(self._data.items())
    mock_prisma_client.db.litellm_verificationtoken.update = AsyncMock(
        return_value=DictLikeResult({"token": "new-hashed-token", "key_name": "sk-...ab12", "user_id": "user-1"})
    )
    mock_prisma_client.db.litellm_verificationtoken.create = AsyncMock(
        return_value=None
    )
    mock_prisma_client.jsonify_object = MagicMock(side_effect=lambda data: data)

    mock_user_api_key_cache = MagicMock()
    mock_proxy_logging_obj = MagicMock()

    user_api_key_dict = UserAPIKeyAuth(
        user_role=LitellmUserRoles.PROXY_ADMIN,
        api_key="sk-admin",

P2 DictLikeResult workaround is fragile

The inner DictLikeResult class works today because _execute_virtual_key_regeneration calls dict(updated_token), and __iter__ yields (key, value) pairs. However, if the function is later updated to call any Prisma model method (e.g., updated_token.model_dump(), updated_token.token, attribute access), the test would raise an AttributeError without a clear failure message.

A more resilient alternative is using a MagicMock with __iter__ set, or using a real LiteLLM_VerificationToken instance as the mock return value. This keeps the test aligned with the actual Prisma return type shape.
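
One way to get a stand-in that supports both dict() conversion and attribute access is a small dataclass. This is a sketch of a third option, not the reviewer's exact suggestion and not the real LiteLLM_VerificationToken: `FakeVerificationToken` is a hypothetical class carrying only the three fields used in the test above.

```python
from dataclasses import asdict, dataclass


@dataclass
class FakeVerificationToken:
    """Hypothetical test double: iterable as (key, value) pairs and attribute-accessible."""
    token: str
    key_name: str
    user_id: str

    def __iter__(self):
        # dict(obj) consumes these pairs, as the DictLikeResult workaround does.
        return iter(asdict(self).items())


updated_token = FakeVerificationToken(
    token="new-hashed-token", key_name="sk-...ab12", user_id="user-1"
)

as_dict = dict(updated_token)       # works like the current workaround
token_attr = updated_token.token    # also works if the code switches to attribute access
```

Unlike the bare DictLikeResult, this double keeps passing if the function under test later reads fields by attribute, though it still would not cover Prisma-specific methods such as model_dump().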

…ng-encoding-format

Revert "fix(embedding): omit null encoding_format for openai requests"
@gitguardian

gitguardian Bot commented Apr 14, 2026

⚠️ GitGuardian has uncovered 1 secret following the scan of your pull request.

Please consider investigating the findings and remediating the incidents. Failure to do so may lead to compromising the associated services or software components.

🔎 Detected hardcoded secret in your pull request
| GitGuardian id | GitGuardian status | Secret | Commit | Filename |
| --- | --- | --- | --- | --- |
| 29203053 | Triggered | Generic Password | ee40da5 | .circleci/config.yml |
🛠 Guidelines to remediate hardcoded secrets
  1. Understand the implications of revoking this secret by investigating where it is used in your code.
  2. Replace and store your secret safely. Learn here the best practices.
  3. Revoke and rotate this secret.
  4. If possible, rewrite git history. Rewriting git history is not a trivial act. You might completely break other contributing developers' workflow and you risk accidentally deleting legitimate data.

To avoid such incidents in the future consider


🦉 GitGuardian detects secrets in your source code to help developers and security teams secure the modern development process. You are seeing this because you or someone else with access to this repository has authorized GitGuardian to scan your pull request.

@Sameerlite Sameerlite temporarily deployed to integration-postgres April 14, 2026 15:25 — with GitHub Actions Inactive
@Sameerlite Sameerlite merged commit b8f7d61 into main Apr 14, 2026
100 of 108 checks passed
@Sameerlite Sameerlite deleted the litellm_oss_staging_04_11_2026 branch April 14, 2026 18:04
6 participants