litellm oss staging 04/15/2026#25831
Conversation
Greptile Summary

This is a staging PR bundling five targeted bug fixes: (1) stripping `@version` suffixes from Vertex AI partner-model names before count-tokens calls, (2) propagating Ollama `done_reason='length'` as OpenAI `finish_reason='length'`, (3) preventing `_apply_openai_param_overrides` from forwarding unset params as `None`, (4) falling back to `user_api_key_alias` for the Noma v2 `application_id`, and (5) syncing the in-memory `credential_list` after a credential PATCH.

Confidence Score: 5/5. Safe to merge: all five fixes are well-scoped, isolated, and covered by mock-only unit tests. All findings are P2 or below. The primary fix (Vertex AI suffix stripping) is simple, correct, and thoroughly tested. The remaining changes are equally targeted, with no regression risk identified. No files require special attention.
| Filename | Overview |
|---|---|
| `litellm/llms/vertex_ai/vertex_ai_partner_models/count_tokens/handler.py` | Adds a `_strip_version_suffix` static method and applies it to both `model` and `request_data["model"]` before the count-tokens API call; uses a dict-spread immutable copy per the style guide. |
| `tests/test_litellm/llms/vertex_ai/vertex_ai_partner_models/count_tokens/test_count_tokens_location.py` | Adds a `TestCountTokensVersionSuffixStripping` class with four mock-only tests covering `@default`, `@date`, no-suffix, and end-to-end stripping of `request_data["model"]`. |
| `litellm/llms/ollama/chat/transformation.py` | Propagates Ollama `done_reason='length'` as OpenAI `finish_reason='length'` in both the streaming and non-streaming paths via `map_finish_reason`. |
| `litellm/utils.py` | Fixes `_apply_openai_param_overrides` to forward only the params the caller actually sent, preventing `None` values from reaching provider SDKs as unknown kwargs. |
| `litellm/proxy/guardrails/guardrail_hooks/noma/noma_v2.py` | Adds `user_api_key_alias` as the final fallback for `application_id` when neither the dynamic params nor the configured value is available. |
| `litellm/proxy/credential_endpoints/endpoints.py` | The PATCH handler now syncs the in-memory `credential_list` after a DB update, handling a rename by removing the old entry before upserting the new one. |
| `litellm/llms/bedrock/chat/converse_transformation.py` | No logic changes; new unit tests cover the existing `_transform_usage` method for both base and reasoning-content token accounting. |
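The credential-PATCH row above describes an in-memory sync after the DB update. A minimal sketch of that rename-aware upsert logic, with hypothetical data shapes and helper name (the proxy's real credential objects and handler differ):

```python
def sync_credential_list(credential_list: list, old_name: str, updated: dict) -> list:
    """Illustrative in-memory sync for a credential PATCH: drop the entry
    under the old name (handles renames), then upsert the updated one."""
    new_name = updated["credential_name"]
    # Remove both the old-name entry (rename case) and any stale entry
    # under the new name, then append the freshly updated credential.
    kept = [c for c in credential_list if c["credential_name"] not in (old_name, new_name)]
    return kept + [updated]
```

Without the old-name removal, a rename would leave both the stale and the new entry in memory until restart.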
Sequence Diagram

```mermaid
sequenceDiagram
    participant C as Caller
    participant H as handle_count_tokens_request
    participant S as _strip_version_suffix
    participant API as Vertex AI count-tokens API
    C->>H: model="claude-sonnet-4-6@default"<br/>request_data={"model":"claude-sonnet-4-6@default", ...}
    H->>S: _strip_version_suffix("claude-sonnet-4-6@default")
    S-->>H: "claude-sonnet-4-6"
    H->>H: model = "claude-sonnet-4-6"
    H->>S: _strip_version_suffix(request_data["model"])
    S-->>H: "claude-sonnet-4-6"
    H->>H: request_data = {**request_data, "model": "claude-sonnet-4-6"}
    H->>API: POST /count-tokens {"model": "claude-sonnet-4-6", "messages": [...]}
    API-->>H: {"input_tokens": 42}
    H-->>C: {"input_tokens": 42, "tokenizer_used": "vertex_ai_partner_models"}
```
Reviews (6): last reviewed commit "Fix code qa"
…uests (#25800)

The Vertex AI count-tokens endpoint rejects model names that include version suffixes (`@default`, `@20251001`, etc.) with:

"claude-sonnet-4-6@default is not supported for token counting"

The same model without the suffix ("claude-sonnet-4-6") works correctly. Strip the `@suffix` from both the `model` parameter and `request_data["model"]` in `handle_count_tokens_request` before sending to the API.

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…tokens truncation (#25824)

* fix(ollama): propagate done_reason='length' as finish_reason for max_tokens truncation

  Ollama returns `done_reason='length'` when a response is cut off by `num_predict` (the max_tokens limit). Previously, non-streaming responses hardcoded `finish_reason='stop'`, and streaming used `chunk.get('done_reason', 'stop')`, which also defaulted to 'stop' when done_reason was absent. This meant callers (e.g. the Anthropic pass-through adapter, which maps OpenAI 'length' -> Anthropic 'max_tokens') could never detect truncation, making stop_reason always appear as 'end_turn' even for cut-off responses.

  Fix: read `done_reason` from the response JSON in the non-streaming path and use `chunk.get('done_reason') or 'stop'` in the streaming path, so Ollama's actual done_reason passes through to the caller unchanged.

  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Update test_ollama_chat_transformation.py

* Update litellm/llms/ollama/chat/transformation.py

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
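The streaming-path change above comes down to one expression: `chunk.get('done_reason') or 'stop'` instead of `chunk.get('done_reason', 'stop')`, so a missing or null `done_reason` still falls back to 'stop' while a real value passes through. A hedged sketch, with `map_finish_reason` as a simplified stand-in for LiteLLM's helper:

```python
def map_finish_reason(reason: str) -> str:
    # Stand-in for litellm's mapper: Ollama's 'length' already matches
    # OpenAI's 'length'; everything else collapses to 'stop' here.
    return "length" if reason == "length" else "stop"


def finish_reason_from_chunk(chunk: dict) -> str:
    # Old code used chunk.get('done_reason', 'stop'), which returns None
    # when the key is present with a null value; 'or' covers that case too,
    # while a real done_reason ('length') now reaches the caller.
    return map_finish_reason(chunk.get("done_reason") or "stop")
```

With this, a chunk carrying `done_reason='length'` maps to `finish_reason='length'`, which downstream adapters can translate (e.g. to Anthropic's 'max_tokens').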
…oard (#25795)

Noma v1 resolved application_id from user_api_key_alias when no explicit value was set (PR #16832). Noma v2 (PR #21400) was rewritten from scratch, and this fallback was not ported, causing all requests from shared LiteLLM instances to appear as a single generic "litellm" application in the Noma dashboard, breaking per-user traceability.

Fix: after checking dynamic_params and self.application_id, fall back to `user_api_key_alias` from `litellm_metadata` or `metadata`. This matches the pattern used by `PromptSecurityGuardrail._resolve_key_alias_from_request_data()` and restores the v1 behavior, where each API key gets its own application entry in the Noma dashboard.

Fixes #25794

Co-authored-by: Brendan Smith-Elion <brendan.smith-elion@arcadia.io>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
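The fallback chain described above can be sketched as follows. The field names (`litellm_metadata`, `metadata`, `user_api_key_alias`) follow the PR description; the function name and surrounding plumbing are hypothetical, not the merged `noma_v2.py` code:

```python
def resolve_application_id(dynamic_params: dict, configured_id, request_data: dict):
    """Illustrative application_id resolution order for Noma v2:
    per-request dynamic params -> configured value -> per-key alias -> default."""
    if dynamic_params.get("application_id"):
        return dynamic_params["application_id"]
    if configured_id:
        return configured_id
    # New fallback: the API key's alias, so each key appears as its own
    # application in the Noma dashboard instead of one generic entry.
    for meta_key in ("litellm_metadata", "metadata"):
        alias = (request_data.get(meta_key) or {}).get("user_api_key_alias")
        if alias:
            return alias
    return "litellm"  # generic default when nothing else is available
```

The alias lookup deliberately checks both metadata locations, mirroring how other guardrails read request metadata.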
…ne (#25777)

* feat(proxy): add NO_OPENAPI env var to disable /openapi.json endpoint (#25696)
  - Fixes #25538
  - test(proxy): add tests for _get_openapi_url
  - Co-authored-by: Progressive-engg <lov.kumari55@gmail.com>

* feat(prometheus): add api_provider label to spend metric (#25693)

  Add `api_provider` to `litellm_spend_metric` labels so users can build Grafana dashboards that break down spend by cloud provider (e.g. bedrock, anthropic, openai, azure, vertex_ai). The `api_provider` label already exists in UserAPIKeyLabelValues and is populated from `standard_logging_payload["custom_llm_provider"]`, but was not included in the spend metric's label list.

  Address review feedback: add api_provider to `litellm_requests_metric` too (same call-site as the spend metric, keeping the label sets in sync), and add test_api_provider_in_spend_and_requests_metrics following the existing pattern in test_prometheus_labels.py.

* fix: ensure `litellm_metadata` is attached to `pre_call` guardrail to align with `post_call` guardrail (#25641)
  - refactor: remove unused BaseTranslation._ensure_litellm_metadata
  - refactor: module level imports for ensure_litellm_metadata and CodeQL
  - fix: update based off of Codex comment
  - revert: undo usage of `_guardrail_litellm_metadata`

* feat: add pricing entry for openrouter/google/gemini-3.1-flash-lite-preview (#25610)

* fix(bedrock): skip synthetic tool injection for json_object with no schema (#25740)

  When response_format={"type": "json_object"} is sent without a JSON schema, _create_json_tool_call_for_response_format builds a tool with an empty schema (properties: {}). The model follows the empty schema and returns {} instead of the actual JSON the caller asked for.

  This patch:
  - Skips synthetic json_tool_call injection when no schema is provided. The model already returns JSON when the prompt asks for it.
  - Fixes finish_reason: after _filter_json_mode_tools strips all synthetic tool calls, finish_reason stays "tool_calls" instead of "stop". Callers (like the OpenAI SDK) misinterpret this as a pending tool invocation.

  json_schema requests with an explicit schema are unchanged.

  Co-authored-by: Claude <noreply@anthropic.com>

* fix(utils): allowed_openai_params must not forward unset params as None

  `_apply_openai_param_overrides` iterated `allowed_openai_params` and unconditionally wrote `optional_params[param] = non_default_params.pop(param, None)` for each entry. If the caller listed a param name but did not actually send that param in the request, the pop returned `None`, and `None` was still written to `optional_params`. The openai SDK then rejected it as a top-level kwarg:

  AsyncCompletions.create() got an unexpected keyword argument 'enable_thinking'

  Reproducer (from #25697):
  allowed_openai_params = ["chat_template_kwargs", "enable_thinking"]
  body = {"chat_template_kwargs": {"enable_thinking": False}}

  Here `enable_thinking` is only present nested inside `chat_template_kwargs`, so the helper should forward `chat_template_kwargs` and leave `enable_thinking` alone. Instead it wrote `optional_params["enable_thinking"] = None`.

  Fix: only forward a param if it was actually present in `non_default_params`. Behavior is unchanged for the happy path (a sent param is still forwarded), and the explicit `None` leakage is gone. Adds a regression test exercising the helper in isolation, so the test does not depend on any provider-specific `map_openai_params` plumbing.

  Fixes #25697

Co-authored-by: lovek629 <59618812+lovek629@users.noreply.github.com>
Co-authored-by: Progressive-engg <lov.kumari55@gmail.com>
Co-authored-by: Ori Kotek <ori.k@codium.ai>
Co-authored-by: Alexander Grattan <51346343+agrattan0820@users.noreply.github.com>
Co-authored-by: Mohana Siddhartha Chivukula <103447836+iamsiddhu3007@users.noreply.github.com>
Co-authored-by: Amiram Mizne <amiramm@users.noreply.github.com>
Co-authored-by: Claude <noreply@anthropic.com>
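The `allowed_openai_params` fix above is a one-line guard. A minimal sketch of the corrected behavior using the reproducer from #25697 (this mirrors the description, not the literal `litellm/utils.py` code):

```python
def apply_openai_param_overrides(
    optional_params: dict, non_default_params: dict, allowed_openai_params: list
) -> dict:
    """Forward only the allowed params the caller actually sent."""
    for param in allowed_openai_params:
        # Old behavior: optional_params[param] = non_default_params.pop(param, None),
        # which wrote an explicit None for allowed-but-unsent params.
        if param in non_default_params:
            optional_params[param] = non_default_params.pop(param)
    return optional_params
```

With `allowed_openai_params = ["chat_template_kwargs", "enable_thinking"]` and a body containing only `chat_template_kwargs`, the helper now forwards `chat_template_kwargs` and leaves `enable_thinking` out entirely, so no `None` kwarg ever reaches the provider SDK.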
Force-pushed: 2138a47 to d9a8a8a
…uests (#25800)
The Vertex AI count-tokens endpoint rejects model names that include version suffixes (@default, @20251001, etc.) with: "claude-sonnet-4-6@default is not supported for token counting"
The same model without the suffix ("claude-sonnet-4-6") works correctly.
Strip @suffix from both the model parameter and request_data["model"] in handle_count_tokens_request before sending to the API.
Relevant issues
Pre-Submission checklist
Please complete all items before asking a LiteLLM maintainer to review your PR
- [ ] I have added at least one test in the tests/test_litellm/ directory (adding at least 1 test is a hard requirement - see details)
- [ ] I have run make test-unit
- [ ] I have tagged @greptileai and received a Confidence Score of at least 4/5 before requesting a maintainer review

Delays in PR merge?
If you're seeing a delay in your PR being merged, ping the LiteLLM Team on Slack (#pr-review).
CI (LiteLLM team)
Branch creation CI run
Link:
CI run for the last commit
Link:
Merge / cherry-pick CI run
Links:
Screenshots / Proof of Fix
Type
🆕 New Feature
🐛 Bug Fix
🧹 Refactoring
📖 Documentation
🚄 Infrastructure
✅ Test
Changes