
litellm oss staging 04/15/2026#25831

Merged
Sameerlite merged 7 commits into litellm_internal_staging from litellm_oss_staging_04_15_2026_p1
Apr 16, 2026
Conversation

@krrish-berri-2
Contributor

…uests (#25800)

The Vertex AI count-tokens endpoint rejects model names that include version suffixes (@default, @20251001, etc.) with: "claude-sonnet-4-6@default is not supported for token counting"

The same model without the suffix ("claude-sonnet-4-6") works correctly.

Strip @suffix from both the model parameter and request_data["model"] in handle_count_tokens_request before sending to the API.

Relevant issues

Pre-Submission checklist

Please complete all items before asking a LiteLLM maintainer to review your PR

  • I have added testing in the tests/test_litellm/ directory (adding at least 1 test is a hard requirement; see details)
  • My PR passes all unit tests on make test-unit
  • My PR's scope is as isolated as possible, it only solves 1 specific problem
  • I have requested a Greptile review by commenting @greptileai and received a Confidence Score of at least 4/5 before requesting a maintainer review

Delays in PR merge?

If you're seeing a delay in your PR being merged, ping the LiteLLM Team on Slack (#pr-review).

CI (LiteLLM team)

CI status guideline:

  • 50-55 passing tests: main is stable, with minor issues.
  • 45-49 passing tests: acceptable, but needs attention.
  • <= 40 passing tests: unstable; be careful with your merges and assess the risk.
  • Branch creation CI run
    Link:

  • CI run for the last commit
    Link:

  • Merge / cherry-pick CI run
    Links:

Screenshots / Proof of Fix

Type

🆕 New Feature
🐛 Bug Fix
🧹 Refactoring
📖 Documentation
🚄 Infrastructure
✅ Test

Changes

@vercel

vercel Bot commented Apr 16, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project: litellm. Deployment: Ready. Actions: Preview, Comment. Updated (UTC): Apr 16, 2026 2:01pm


@greptile-apps
Contributor

greptile-apps Bot commented Apr 16, 2026

Greptile Summary

This is a staging PR bundling five targeted bug fixes: (1) stripping @version suffixes from Vertex AI partner-model count-tokens requests, (2) propagating Ollama done_reason='length' as finish_reason, (3) fixing allowed_openai_params forwarding None for unset params, (4) adding a user_api_key_alias fallback for Noma v2 application_id, and (5) syncing the in-memory credential list after a PATCH update. Every fix is accompanied by targeted unit tests.

Confidence Score: 5/5

Safe to merge — all five fixes are well-scoped, isolated, and covered by mock-only unit tests.

All findings are P2 or below. The primary fix (Vertex AI suffix stripping) is simple, correct, and thoroughly tested. Remaining changes are equally targeted with no regression risk identified.

No files require special attention.

Important Files Changed

  • litellm/llms/vertex_ai/vertex_ai_partner_models/count_tokens/handler.py: Adds a _strip_version_suffix static method and applies it to both model and request_data["model"] before the count-tokens API call; uses a dict-spread immutable copy per the style guide.
  • tests/test_litellm/llms/vertex_ai/vertex_ai_partner_models/count_tokens/test_count_tokens_location.py: Adds a TestCountTokensVersionSuffixStripping class with four mock-only tests covering @default, @date, no-suffix, and end-to-end stripping of request_data["model"].
  • litellm/llms/ollama/chat/transformation.py: Propagates Ollama done_reason='length' as OpenAI finish_reason='length' in both streaming and non-streaming paths via map_finish_reason.
  • litellm/utils.py: Fixes _apply_openai_param_overrides to forward only params the caller actually sent, preventing None values from reaching provider SDKs as unknown kwargs.
  • litellm/proxy/guardrails/guardrail_hooks/noma/noma_v2.py: Adds user_api_key_alias as the final fallback for application_id when neither dynamic params nor the configured value is available.
  • litellm/proxy/credential_endpoints/endpoints.py: The PATCH handler now syncs the in-memory credential_list after the DB update, handling a rename by removing the old entry before upserting the new one.
  • litellm/llms/bedrock/chat/converse_transformation.py: No logic changes; new unit tests added for the existing _transform_usage method covering both base and reasoning-content token accounting.
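The credential-list sync described for endpoints.py above can be sketched roughly as follows (function and field names are illustrative, not the actual endpoint code):

```python
def sync_credential_list(credential_list: list, old_name: str, updated: dict) -> list:
    """After a PATCH persists an update to the DB, mirror it into the
    in-memory list. A rename is handled by removing the entry stored
    under the old name before upserting the updated one."""
    credential_list[:] = [
        c for c in credential_list if c["credential_name"] != old_name
    ]
    credential_list.append(updated)
    return credential_list
```

Without this sync, reads served from the in-memory list would keep returning the pre-PATCH credential until a restart.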

Sequence Diagram

sequenceDiagram
    participant C as Caller
    participant H as handle_count_tokens_request
    participant S as _strip_version_suffix
    participant API as Vertex AI count-tokens API

    C->>H: model="claude-sonnet-4-6@default"<br/>request_data={"model":"claude-sonnet-4-6@default", ...}
    H->>S: _strip_version_suffix("claude-sonnet-4-6@default")
    S-->>H: "claude-sonnet-4-6"
    H->>H: model = "claude-sonnet-4-6"
    H->>S: _strip_version_suffix(request_data["model"])
    S-->>H: "claude-sonnet-4-6"
    H->>H: request_data = {**request_data, "model": "claude-sonnet-4-6"}
    H->>API: POST /count-tokens {"model": "claude-sonnet-4-6", "messages": [...]}
    API-->>H: {"input_tokens": 42}
    H-->>C: {"input_tokens": 42, "tokenizer_used": "vertex_ai_partner_models"}

Reviews (6): Last reviewed commit: "Fix code qa"

@krrish-berri-2 krrish-berri-2 temporarily deployed to integration-postgres April 16, 2026 04:54 — with GitHub Actions Inactive
@krrish-berri-2 krrish-berri-2 temporarily deployed to integration-postgres April 16, 2026 04:58 — with GitHub Actions Inactive
@krrish-berri-2 krrish-berri-2 temporarily deployed to integration-postgres April 16, 2026 04:59 — with GitHub Actions Inactive
@krrish-berri-2 krrish-berri-2 temporarily deployed to integration-postgres April 16, 2026 05:04 — with GitHub Actions Inactive
@CLAassistant

CLAassistant commented Apr 16, 2026

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
5 out of 6 committers have signed the CLA.

✅ Sameerlite
✅ xr843
✅ dkindlund
✅ jarsever
✅ bse-ai
❌ kuun993
You have signed the CLA already but the status is still pending? Let us recheck it.

dkindlund and others added 2 commits April 16, 2026 19:03
…uests (#25800)

The Vertex AI count-tokens endpoint rejects model names that include
version suffixes (@default, @20251001, etc.) with:
"claude-sonnet-4-6@default is not supported for token counting"

The same model without the suffix ("claude-sonnet-4-6") works correctly.

Strip @suffix from both the model parameter and request_data["model"]
in handle_count_tokens_request before sending to the API.

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…tokens truncation (#25824)

* fix(ollama): propagate done_reason='length' as finish_reason for max_tokens truncation

Ollama returns done_reason='length' when a response is cut off by num_predict
(the max_tokens limit). Previously, non-streaming responses hardcoded
finish_reason='stop', and streaming used chunk.get('done_reason', 'stop')
which also defaulted to 'stop' when done_reason was absent.

This meant callers (e.g. the Anthropic pass-through adapter, which maps
OpenAI 'length' -> Anthropic 'max_tokens') could never detect truncation,
making stop_reason always appear as 'end_turn' even for cut-off responses.

Fix: read done_reason from the response JSON in the non-streaming path and
use `chunk.get('done_reason') or 'stop'` in the streaming path, so Ollama's
actual done_reason passes through to the caller unchanged.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Update test_ollama_chat_transformation.py

* Update litellm/llms/ollama/chat/transformation.py

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
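The streaming-path change in the commit above boils down to one expression. A minimal sketch (the function name is illustrative; the real code runs the value through map_finish_reason and lives inside the chunk-parsing logic):

```python
def finish_reason_from_chunk(chunk: dict) -> str:
    """Derive finish_reason from an Ollama streaming chunk.

    `chunk.get("done_reason", "stop")` only falls back when the key is
    absent; `or "stop"` also treats an explicit None / empty string as
    "stop", while a real value such as "length" (num_predict truncation)
    passes through to the caller unchanged."""
    return chunk.get("done_reason") or "stop"
```

This is what lets downstream adapters (e.g. the Anthropic pass-through, which maps OpenAI 'length' to Anthropic 'max_tokens') finally observe truncation.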
bse-ai and others added 3 commits April 16, 2026 19:04
…oard (#25795)

Noma v1 resolved application_id from user_api_key_alias when no explicit
value was set (PR #16832). Noma v2 (PR #21400) was rewritten from scratch
and this fallback was not ported, causing all requests from shared LiteLLM
instances to appear as a single generic "litellm" application in the Noma
dashboard — breaking per-user traceability.

Fix: after checking dynamic_params and self.application_id, fall back to
user_api_key_alias from litellm_metadata or metadata. This matches the
pattern used by PromptSecurityGuardrail._resolve_key_alias_from_request_data()
and restores the v1 behavior where each API key gets its own application
entry in the Noma dashboard.

Fixes #25794

Co-authored-by: Brendan Smith-Elion <brendan.smith-elion@arcadia.io>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
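The fallback chain described in the commit above can be sketched like this (field names follow the commit message; the real Noma v2 hook reads these off richer request objects):

```python
def resolve_application_id(dynamic_params: dict, configured_id, request_data: dict):
    """Resolve the Noma application_id with the v1-compatible fallback:
    dynamic params first, then the configured value, then the
    user_api_key_alias carried in request metadata."""
    if dynamic_params.get("application_id"):
        return dynamic_params["application_id"]
    if configured_id:
        return configured_id
    metadata = request_data.get("litellm_metadata") or request_data.get("metadata") or {}
    return metadata.get("user_api_key_alias")
```

With the final fallback in place, each API key shows up as its own application in the Noma dashboard instead of one generic "litellm" entry.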
…ne (#25777)

* feat(proxy): add NO_OPENAPI env var to disable /openapi.json endpoint (#25696)

* feat(proxy): add NO_OPENAPI env var to disable /openapi.json endpoint - Fixes #25538

* test(proxy): add tests for _get_openapi_url

---------

Co-authored-by: Progressive-engg <lov.kumari55@gmail.com>

* feat(prometheus): add api_provider label to spend metric (#25693)

* feat(prometheus): add api_provider label to spend metric

Add `api_provider` to `litellm_spend_metric` labels so users can
build Grafana dashboards that break down spend by cloud provider
(e.g. bedrock, anthropic, openai, azure, vertex_ai).

The `api_provider` label already exists in UserAPIKeyLabelValues and
is populated from `standard_logging_payload["custom_llm_provider"]`,
but was not included in the spend metric's label list.

* add api_provider to requests metric + add test

Address review feedback:
- Add api_provider to litellm_requests_metric too (same call-site as
  spend metric, keeps label sets in sync)
- Add test_api_provider_in_spend_and_requests_metrics following the
  existing pattern in test_prometheus_labels.py

* fix: ensure `litellm_metadata` is attached to `pre_call` guardrail to align with `post_call` guardrail (#25641)

* fix: ensure `litellm_metadata` is attached to pre_call to align with post_call

* refactor: remove unused BaseTranslation._ensure_litellm_metadata

* refactor: module level imports for ensure_litellm_metadata and CodeQL

* fix: update based off of Codex comment

* revert: undo usage of `_guardrail_litellm_metadata`

* feat: add pricing entry for openrouter/google/gemini-3.1-flash-lite-preview (#25610)

* fix(bedrock): skip synthetic tool injection for json_object with no schema (#25740)

When response_format={"type": "json_object"} is sent without a JSON
schema, _create_json_tool_call_for_response_format builds a tool with an
empty schema (properties: {}). The model follows the empty schema and
returns {} instead of the actual JSON the caller asked for.

This patch:
- Skips synthetic json_tool_call injection when no schema is provided.
  The model already returns JSON when the prompt asks for it.
- Fixes finish_reason: after _filter_json_mode_tools strips all
  synthetic tool calls, finish_reason stays "tool_calls" instead of
  "stop". Callers (like the OpenAI SDK) misinterpret this as a pending
  tool invocation.

json_schema requests with an explicit schema are unchanged.

Co-authored-by: Claude <noreply@anthropic.com>
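The injection decision described in the commit above reduces to "only inject when a schema actually exists". A hedged sketch (the helper name and response_format layout are illustrative; the real converse_transformation logic is more involved):

```python
def should_inject_json_tool(response_format) -> bool:
    """Decide whether to inject a synthetic json_tool_call.

    A bare {"type": "json_object"} carries no schema, so injecting a
    tool with empty properties just makes the model return {}. Only
    json_schema requests with an explicit schema get the tool."""
    if not response_format:
        return False
    if response_format.get("type") == "json_object":
        return False  # no schema to enforce; the prompt already asks for JSON
    schema = (response_format.get("json_schema") or {}).get("schema")
    return response_format.get("type") == "json_schema" and schema is not None
```

The finish_reason half of the fix is separate: once _filter_json_mode_tools strips all synthetic tool calls, finish_reason must be rewritten from "tool_calls" to "stop" so SDK callers don't wait for a tool invocation.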

* fix(utils): allowed_openai_params must not forward unset params as None

`_apply_openai_param_overrides` iterated `allowed_openai_params` and
unconditionally wrote `optional_params[param] = non_default_params.pop(param, None)`
for each entry. If the caller listed a param name but did not actually
send that param in the request, the pop returned `None` and `None` was
still written to `optional_params`. The openai SDK then rejected it as
a top-level kwarg:

    AsyncCompletions.create() got an unexpected keyword argument 'enable_thinking'

Reproducer (from #25697):

    allowed_openai_params = ["chat_template_kwargs", "enable_thinking"]
    body = {"chat_template_kwargs": {"enable_thinking": False}}

Here `enable_thinking` is only present nested inside
`chat_template_kwargs`, so the helper should forward
`chat_template_kwargs` and leave `enable_thinking` alone. Instead it
wrote `optional_params["enable_thinking"] = None`.

Fix: only forward a param if it was actually present in
`non_default_params`. Behavior is unchanged for the happy path (param
sent → still forwarded), and the explicit `None` leakage is gone.

Adds a regression test exercising the helper in isolation so the test
does not depend on any provider-specific `map_openai_params` plumbing.

Fixes #25697
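The one-line fix above can be shown in isolation. A simplified sketch of the helper (the real _apply_openai_param_overrides in litellm/utils.py takes more arguments):

```python
def apply_openai_param_overrides(
    optional_params: dict, non_default_params: dict, allowed_openai_params: list
) -> dict:
    """Forward allow-listed params from the request into optional_params.

    Before the fix this did
        optional_params[param] = non_default_params.pop(param, None)
    unconditionally, writing an explicit None for any allow-listed param
    the caller never sent. Now a param is forwarded only if present."""
    for param in allowed_openai_params:
        if param in non_default_params:
            optional_params[param] = non_default_params.pop(param)
    return optional_params
```

With the #25697 reproducer, only chat_template_kwargs is forwarded; enable_thinking (present only nested inside it) is left alone instead of leaking as a top-level None kwarg.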

---------

Co-authored-by: lovek629 <59618812+lovek629@users.noreply.github.com>
Co-authored-by: Progressive-engg <lov.kumari55@gmail.com>
Co-authored-by: Ori Kotek <ori.k@codium.ai>
Co-authored-by: Alexander Grattan <51346343+agrattan0820@users.noreply.github.com>
Co-authored-by: Mohana Siddhartha Chivukula <103447836+iamsiddhu3007@users.noreply.github.com>
Co-authored-by: Amiram Mizne <amiramm@users.noreply.github.com>
Co-authored-by: Claude <noreply@anthropic.com>
@Sameerlite Sameerlite merged commit 26937a2 into litellm_internal_staging Apr 16, 2026
96 of 98 checks passed
@Sameerlite Sameerlite deleted the litellm_oss_staging_04_15_2026_p1 branch April 16, 2026 14:23