
Litellm oss staging 04 08 2026 #25397

Merged
Sameerlite merged 8 commits into main from litellm_oss_staging_04_08_2026
Apr 13, 2026

Conversation

@krrish-berri-2 (Contributor)

Relevant issues

Pre-Submission checklist

Please complete all items before asking a LiteLLM maintainer to review your PR

  • I have added testing in the tests/test_litellm/ directory; adding at least 1 test is a hard requirement (see details)
  • My PR passes all unit tests on make test-unit
  • My PR's scope is as isolated as possible; it only solves 1 specific problem
  • I have requested a Greptile review by commenting @greptileai and received a Confidence Score of at least 4/5 before requesting a maintainer review

Delays in PR merge?

If you're seeing a delay in your PR being merged, ping the LiteLLM Team on Slack (#pr-review).

CI (LiteLLM team)

CI status guideline:

  • 50-55 passing tests: main is stable with minor issues.
  • 45-49 passing tests: acceptable, but needs attention.
  • <= 40 passing tests: unstable; be careful with your merges and assess the risk.
  • Branch creation CI run
    Link:

  • CI run for the last commit
    Link:

  • Merge / cherry-pick CI run
    Links:

Type

🆕 New Feature
🐛 Bug Fix
🧹 Refactoring
📖 Documentation
🚄 Infrastructure
✅ Test

Changes

michelligabriele and others added 5 commits April 8, 2026 21:22
#25337)

* fix(vertex_ai): normalize Gemini finish_reason enum through map_finish_reason in streaming handler

In the legacy vertex_ai SDK streaming path, the raw Gemini finish_reason enum name (e.g. "STOP", "MAX_TOKENS") was stored directly in self.received_finish_reason without being mapped to OpenAI-compatible values. The finish_reason_handler then compared against lowercase "stop", so the case mismatch prevented the tool_call override from ever firing. This fix applies map_finish_reason() so all Gemini enum names are normalized before storage. Refactored finish-reason handling to use the map_finish_reason function.

* refactor: use module-level map_finish_reason import; drop redundant inline import

map_finish_reason is already imported at module scope (line 49) via `from .core_helpers import map_finish_reason, process_response_headers`, so the inline import added in the previous commit was redundant. Addressed Greptile review feedback by removing the unnecessary import of map_finish_reason from core_helpers.

* test: add unit tests for Gemini legacy vertex finish_reason normalisation

Added tests to ensure finish_reason normalization for Gemini legacy vertex tool calls and stop reasons.
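The normalization described above can be sketched as a simple lookup from Gemini enum names to OpenAI-style values. This is an illustrative stand-in mirroring common map_finish_reason behavior, not LiteLLM's actual implementation; the mapping table and helper name are assumptions.

```python
# Illustrative mapping from raw Gemini finish_reason enum names to
# OpenAI-compatible values (a sketch, not litellm's real table).
GEMINI_TO_OPENAI_FINISH_REASON = {
    "STOP": "stop",
    "MAX_TOKENS": "length",
    "SAFETY": "content_filter",
    "RECITATION": "content_filter",
}


def normalize_finish_reason(raw: str) -> str:
    """Return an OpenAI-style finish_reason for a raw Gemini enum name."""
    # Unknown names fall back to lowercase so downstream string
    # comparisons (e.g. against "stop") behave consistently.
    return GEMINI_TO_OPENAI_FINISH_REASON.get(raw, raw.lower())
```

With normalization applied before storage, the downstream comparison against lowercase "stop" matches, so the tool_call override in finish_reason_handler can fire.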
* fix: remove leading space from license public_key.pem

PEM must begin with -----BEGIN; a leading ASCII space breaks
cryptography.load_pem_public_key on older cryptography (e.g. 41.x),
causing OpenSSL no start line / deserialize errors.

Made-with: Cursor

* test: assert license public_key.pem loads as valid PEM

Regression guard for leading whitespace before -----BEGIN, which breaks
load_pem_public_key on older cryptography (e.g. 41.x).

Made-with: Cursor
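The failure mode fixed above can be shown with a minimal pre-load check: RFC 7468 requires PEM text to begin directly with the "-----BEGIN" encapsulation boundary, so a leading ASCII space makes strict loaders (such as older cryptography releases) fail with a "no start line" error. The helper below is a hypothetical check, not part of LiteLLM.

```python
# Sketch: a PEM buffer must start directly with the BEGIN boundary;
# any leading whitespace breaks strict parsers.
def starts_with_pem_boundary(data: bytes) -> bool:
    """True if the buffer begins directly with a PEM BEGIN boundary."""
    return data.startswith(b"-----BEGIN")


good = b"-----BEGIN PUBLIC KEY-----\nMFkw...\n-----END PUBLIC KEY-----\n"
bad = b" " + good  # the leading space this commit removes
```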
…25331)

DashScope inherits OpenAIGPTConfig, which strips cache_control from
messages and tools by default. Override remove_cache_control_flag_from_messages_and_tools()
to preserve cache_control, following the same pattern used by ZAI, MiniMax, and Databricks.

Verified through 10-round multi-turn conversation tests:
- Explicit caching works correctly: cached_tokens grows each round from R4 onwards,
  with cache_creation_tokens reported on first cache build.
- Implicit caching is not affected: models that rely on implicit prefix-matching caching
  produce identical cached_tokens with and without this change, confirmed by comparing
  results against both the reverted codebase and direct API calls bypassing litellm.
- No errors or regressions observed on any model, including those that do not support
  explicit caching — the DashScope API silently ignores unrecognized cache_control fields.

Fixes #25330

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
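The override pattern this commit describes can be sketched as follows. The class and base class here are simplified stand-ins (not the real litellm classes); only the method name mirrors the PR description.

```python
from typing import Any, Dict, List


class BaseOpenAILikeConfig:
    """Stand-in for an OpenAIGPTConfig-style base that strips cache_control."""

    def remove_cache_control_flag_from_messages_and_tools(
        self, messages: List[Dict[str, Any]]
    ) -> List[Dict[str, Any]]:
        # Default behavior: drop cache_control from every message.
        return [
            {k: v for k, v in m.items() if k != "cache_control"} for m in messages
        ]


class DashScopeLikeConfig(BaseOpenAILikeConfig):
    """Sketch of the DashScope override: preserve cache_control as-is."""

    def remove_cache_control_flag_from_messages_and_tools(
        self, messages: List[Dict[str, Any]]
    ) -> List[Dict[str, Any]]:
        # DashScope supports explicit caching, so keep cache_control intact
        # (the same pattern the PR attributes to ZAI, MiniMax, and Databricks).
        return messages
```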
…ai/gpt-oss-120b (#25263)

* fix: expose reasoning effort fields in get_model_info and add together_ai/gpt-oss-120b

- litellm/utils.py: pass supports_none_reasoning_effort and
  supports_xhigh_reasoning_effort through _get_model_info_helper so
  get_model_info() returns them (previously silently dropped). Fixes #25096.

- model_prices_and_context_window.json: add together_ai/openai/gpt-oss-120b
  with supports_reasoning: true so reasoning_effort is accepted for this
  model without requiring drop_params. Fixes #25132.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: consolidate duplicate together_ai/openai/gpt-oss-120b entry and sync backup file

* fix: link commit to GitHub account for CLA verification

---------

Co-authored-by: Austin Varga <austin@knowmi.ai>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
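Once the two capability flags are exposed through get_model_info(), a caller could gate reasoning_effort values on them roughly as below. This is a hedged sketch: the field names come from the commit message, but the helper and the gating logic are illustrative assumptions, not litellm code.

```python
# Hypothetical dict shaped like a get_model_info() result after this fix.
info = {
    "supports_reasoning": True,
    "supports_none_reasoning_effort": False,
    "supports_xhigh_reasoning_effort": False,
}


def accepts_reasoning_effort(model_info: dict, effort: str) -> bool:
    """Sketch: decide whether a reasoning_effort value is accepted."""
    if not model_info.get("supports_reasoning", False):
        return False
    if effort == "none":
        return model_info.get("supports_none_reasoning_effort", False)
    if effort == "xhigh":
        return model_info.get("supports_xhigh_reasoning_effort", False)
    # Standard efforts (low/medium/high) only require supports_reasoning.
    return True
```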
@vercel

vercel bot commented Apr 9, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project: litellm | Status: Ready | Actions: Preview, Comment | Updated (UTC): Apr 13, 2026 3:19am

@CLAassistant

CLAassistant commented Apr 9, 2026

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
6 out of 8 committers have signed the CLA.

✅ michelligabriele
✅ avarga1
✅ silencedoctor
✅ milan-berri
✅ Sameerlite
✅ acebot712
❌ abhyudayareddy
❌ krrish-berri-2
You have signed the CLA already but the status is still pending? Let us recheck it.

@codspeed-hq (Contributor)

codspeed-hq bot commented Apr 9, 2026

Merging this PR will not alter performance

✅ 16 untouched benchmarks


Comparing litellm_oss_staging_04_08_2026 (fa605d8) with main (5544803)

Open in CodSpeed

@greptile-apps (Contributor)

greptile-apps bot commented Apr 9, 2026

Greptile Summary

This PR adds a new PromptGuard guardrail integration (prompt injection detection, PII redaction, topic filtering) along with a Vertex AI streaming fix (normalising proto finish_reason names through map_finish_reason()), a DashScope cache_control preservation override, a public_key.pem leading-whitespace fix, and two new model-info capability flags (supports_none_reasoning_effort, supports_xhigh_reasoning_effort).

  • Redact silent pass-through (promptguard.py lines 186–196): when PromptGuard returns decision: "redact" but omits redacted_messages, the original content is returned to the LLM with no log and no error — the intended redaction is silently skipped regardless of block_on_error.

Confidence Score: 4/5

Safe to merge after addressing the redact silent pass-through bug in PromptGuard

One P1 bug: when PromptGuard returns decision='redact' without a redacted_messages payload, the original (potentially PII-containing) content is forwarded to the LLM unmodified and no warning is emitted. All other changes — streaming fix, DashScope override, PEM fix, model data updates — are correct and well-tested.
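The pass-through can be reproduced with a minimal stand-in for the reviewed control flow (all names here are hypothetical; this mirrors the logic, not the real guardrail class):

```python
# Minimal reproduction of the P1 finding: a 'redact' decision without a
# redacted_messages payload falls through and returns inputs unchanged.
def apply_redact_decision(inputs: dict, result: dict) -> dict:
    """Apply a PromptGuard-style 'redact' decision to guardrail inputs."""
    if result.get("decision") == "redact":
        redacted = result.get("redacted_messages")
        if redacted:
            inputs["structured_messages"] = redacted
        # Bug: when redacted_messages is absent, execution falls through
        # here and the original inputs are returned with no warning.
    return inputs
```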

litellm/proxy/guardrails/guardrail_hooks/promptguard/promptguard.py (redact decision silent pass-through)

Important Files Changed

Filename Overview
litellm/proxy/guardrails/guardrail_hooks/promptguard/promptguard.py New PromptGuard guardrail integration; silent pass-through bug when redact decision carries no redacted_messages
litellm/proxy/guardrails/guardrail_hooks/promptguard/__init__.py Standard guardrail initializer and registry wiring for PromptGuard; looks correct
litellm/litellm_core_utils/streaming_handler.py Bug fix: wraps raw Vertex AI proto enum finish_reason name through map_finish_reason() so tool_calls override works correctly
litellm/llms/dashscope/chat/transformation.py Overrides remove_cache_control_flag_from_messages_and_tools to preserve cache_control for DashScope provider
litellm/proxy/auth/public_key.pem Fixes leading whitespace on -----BEGIN PUBLIC KEY----- header that caused PEM load failure
litellm/types/proxy/guardrails/guardrail_hooks/promptguard.py Defines Pydantic config model for PromptGuard with api_key, api_base, and block_on_error fields
tests/test_litellm/proxy/guardrails/guardrail_hooks/test_promptguard.py Comprehensive mock-only test suite for PromptGuard guardrail; no real network calls

Sequence Diagram

sequenceDiagram
    participant Proxy
    participant PromptGuardGuardrail
    participant PromptGuardAPI
    participant LLM

    Proxy->>PromptGuardGuardrail: apply_guardrail(inputs, "request")
    PromptGuardGuardrail->>PromptGuardAPI: POST /api/v1/guard
    alt API error
        PromptGuardAPI-->>PromptGuardGuardrail: Exception
        alt block_on_error=True
            PromptGuardGuardrail-->>Proxy: GuardrailRaisedException
        else block_on_error=False
            PromptGuardGuardrail-->>Proxy: original inputs
        end
    else decision=allow
        PromptGuardAPI-->>PromptGuardGuardrail: decision:allow
        PromptGuardGuardrail-->>Proxy: inputs unchanged
        Proxy->>LLM: original messages
    else decision=block
        PromptGuardAPI-->>PromptGuardGuardrail: decision:block
        PromptGuardGuardrail-->>Proxy: GuardrailRaisedException
    else decision=redact with redacted_messages
        PromptGuardAPI-->>PromptGuardGuardrail: decision:redact + redacted_messages
        PromptGuardGuardrail-->>Proxy: inputs with redacted content
        Proxy->>LLM: redacted messages
    else decision=redact WITHOUT redacted_messages
        PromptGuardAPI-->>PromptGuardGuardrail: decision:redact only
        Note over PromptGuardGuardrail: Bug: original inputs returned silently
        PromptGuardGuardrail-->>Proxy: original inputs unredacted
        Proxy->>LLM: unredacted messages
    end


Comment on lines +7 to 10
from litellm.types.llms.openai import ChatCompletionToolParam

from litellm.secret_managers.main import get_secret_str
from litellm.types.llms.openai import AllMessageValues

P2 Duplicate import from the same module

ChatCompletionToolParam and AllMessageValues are both imported from litellm.types.llms.openai in two separate statements. Consolidate them into one import per the project's style conventions.

Suggested change
  from litellm.secret_managers.main import get_secret_str
- from litellm.types.llms.openai import ChatCompletionToolParam
- from litellm.types.llms.openai import AllMessageValues
+ from litellm.types.llms.openai import AllMessageValues, ChatCompletionToolParam

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

* Add PromptGuard guardrail integration

Add PromptGuard as a first-class guardrail vendor in LiteLLM's proxy,
supporting prompt injection detection, PII redaction, topic filtering,
entity blocklists, and hallucination detection via PromptGuard's
/api/v1/guard API endpoint.

Backend:
- Add PROMPTGUARD to SupportedGuardrailIntegrations enum
- Implement PromptGuardGuardrail (CustomGuardrail subclass) with
  apply_guardrail handling allow/block/redact decisions
- Add Pydantic config model with api_key, api_base, ui_friendly_name
- Auto-discovered via guardrail_hooks/promptguard/__init__.py registries

Frontend:
- Add PromptGuard partner card to Guardrail Garden with eval scores
- Add preset configuration for quick setup
- Add logo to guardrailLogoMap

Tests:
- 30 unit tests covering configuration, allow/block/redact actions,
  request payload construction, error handling, config model, and
  registry wiring

* Fix redact path and init ordering per review feedback

- P1: Update structured_messages (not just texts) when PromptGuard
  returns a redact decision, so PII redaction is effective for the
  primary LLM message path
- P2: Validate credentials before allocating the HTTPX client so
  resources aren't acquired if PromptGuardMissingCredentials is raised
- Add tests for structured_messages redaction and texts-only redaction

* Harden PromptGuard integration: fail-open, event hooks, images, docs

- Add block_on_error config (default fail-closed, configurable fail-open)
- Declare supported_event_hooks (pre_call, post_call) like other vendors
- Forward images from GenericGuardrailAPIInputs to PromptGuard API
- Wrap API call in try/except for resilient error handling
- Add comprehensive documentation page with config examples
- Register docs page in sidebar alongside other guardrail providers
- Expand test suite from 32 to 40 tests covering new functionality

* Fix dict[str, Any] -> Dict[str, Any] for Python 3.8 compat

* Address remaining Greptile feedback: timeout, redact guard

- Add explicit 10s timeout to async_handler.post() to prevent
  indefinite hangs when PromptGuard API is unresponsive
- Guard redact path: only update inputs["texts"] when the key
  was originally present, avoiding phantom key injection
- Add test: redact with structured_messages only does not create
  texts key (41 tests total)

* Fix CI lint: black formatting, add PromptGuardConfigModel to LitellmParams

- Reformat promptguard.py to match CI black version (parenthesization)
- Add PromptGuardConfigModel as base class of LitellmParams for proper
  Pydantic schema validation, consistent with all other guardrail vendors
- Use litellm_params.block_on_error directly (now a typed field)

* Address Greptile review: redact path, null decision, error context

- P1: Filter _extract_texts_from_messages to user-role messages only,
  preventing system/assistant content from being injected into texts
- P1: Strengthen test_redact_updates_structured_messages assertion from
  weak `in` check to strict equality, catching the injection bug
- P2: Use `result.get("decision") or "allow"` to handle explicit null
  decision values (not just absent keys)
- P2: Wrap bare exception re-raise in GuardrailRaisedException so the
  caller knows which guardrail failed (block_on_error=True path)
- P2: Add static Promptguard entry in guardrail_provider_map so the
  preset works before populateGuardrailProviderMap is called
- Add test for explicit null decision treated as allow

* Fix black formatting: collapse f-string in error message
@Sameerlite Sameerlite temporarily deployed to integration-postgres April 13, 2026 03:13 — with GitHub Actions Inactive
Comment on lines +186 to +198
if decision == "redact":
    redacted = result.get("redacted_messages")
    if redacted:
        if structured_messages:
            inputs["structured_messages"] = redacted
        if "texts" in inputs:
            extracted = self._extract_texts_from_messages(
                redacted,
            )
            if extracted:
                inputs["texts"] = extracted

return inputs

P1 Silent redaction bypass when redacted_messages is absent

When PromptGuard returns decision: "redact" but redacted_messages is null or missing, the if redacted: guard silently falls through and the original unredacted inputs are returned unchanged — the PII/sensitive content passes to the LLM with no log warning and no error. This undermines the security guarantee of the redact path regardless of block_on_error.

Suggested change

Current code:

if decision == "redact":
    redacted = result.get("redacted_messages")
    if redacted:
        if structured_messages:
            inputs["structured_messages"] = redacted
        if "texts" in inputs:
            extracted = self._extract_texts_from_messages(
                redacted,
            )
            if extracted:
                inputs["texts"] = extracted
return inputs

Suggested:

if decision == "redact":
    redacted = result.get("redacted_messages")
    if redacted:
        if structured_messages:
            inputs["structured_messages"] = redacted
        if "texts" in inputs:
            extracted = self._extract_texts_from_messages(
                redacted,
            )
            if extracted:
                inputs["texts"] = extracted
    else:
        verbose_proxy_logger.warning(
            "PromptGuard returned decision='redact' but no "
            "redacted_messages; original content will be used."
        )

@Sameerlite Sameerlite merged commit 5e80e07 into main Apr 13, 2026
103 of 108 checks passed
@Sameerlite Sameerlite deleted the litellm_oss_staging_04_08_2026 branch April 13, 2026 03:42

9 participants