#25337)
* fix(vertex_ai): normalize Gemini finish_reason enum through map_finish_reason in streaming handler
  In the legacy vertex_ai SDK streaming path, the raw Gemini finish_reason enum name (e.g. "STOP", "MAX_TOKENS") was stored directly into self.received_finish_reason without being mapped to OpenAI-compatible values. The finish_reason_handler then compared against lowercase "stop", so the case mismatch prevented the tool_call override from ever firing. This fix applies map_finish_reason() so all Gemini enum names are normalized before storage.
* refactor: use module-level map_finish_reason import; drop redundant inline import
  map_finish_reason is already imported at module scope (line 49) via `from .core_helpers import map_finish_reason, process_response_headers`; the inline import added in the previous commit was redundant. Addressed Greptile review feedback.
* test: add unit tests for Gemini legacy vertex finish_reason normalisation
  Added tests to ensure finish_reason normalization for Gemini legacy vertex tool calls and stop reasons.
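The case-mismatch bug described above can be illustrated with a minimal sketch. The mapping table and function below are simplified stand-ins, not litellm's actual `map_finish_reason` implementation:

```python
# Simplified sketch: normalize raw Gemini finish_reason enum names to
# OpenAI-compatible values before storing them, so later lowercase
# comparisons (e.g. against "stop") match as intended.
GEMINI_TO_OPENAI = {
    "STOP": "stop",
    "MAX_TOKENS": "length",
    "SAFETY": "content_filter",
    "RECITATION": "content_filter",
}

def map_finish_reason(raw: str) -> str:
    """Map a provider finish_reason to an OpenAI-compatible value."""
    return GEMINI_TO_OPENAI.get(raw, raw.lower())

# Before the fix, the raw enum name "STOP" was stored directly, so a
# comparison against lowercase "stop" never matched. After the fix:
received_finish_reason = map_finish_reason("STOP")  # "stop"
```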
* fix: remove leading space from license public_key.pem
  PEM must begin with -----BEGIN; a leading ASCII space breaks cryptography.load_pem_public_key on older cryptography (e.g. 41.x), causing OpenSSL "no start line" / deserialize errors.
  Made-with: Cursor
* test: assert license public_key.pem loads as valid PEM
  Regression guard for leading whitespace before -----BEGIN, which breaks load_pem_public_key on older cryptography (e.g. 41.x).
  Made-with: Cursor
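A regression guard of the kind described can be sketched without the cryptography dependency; the check below only captures the well-formedness property that strict PEM parsers require (the real test presumably calls load_pem_public_key on the actual file):

```python
# Sketch: a strict PEM parser requires the file to start with the
# "-----BEGIN" header; any leading whitespace breaks older parsers
# with "no start line" errors.
def is_wellformed_pem(pem_text: str) -> bool:
    """True if the PEM body starts exactly at the BEGIN header."""
    return pem_text.startswith("-----BEGIN")

good = "-----BEGIN PUBLIC KEY-----\nMIIB...\n-----END PUBLIC KEY-----\n"
bad = " " + good  # leading ASCII space, as in the bug being fixed

assert is_wellformed_pem(good)
assert not is_wellformed_pem(bad)
```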
…25331) DashScope inherits OpenAIGPTConfig, which strips cache_control from messages and tools by default. Override remove_cache_control_flag_from_messages_and_tools() to preserve cache_control, following the same pattern used by ZAI, MiniMax, and Databricks.
Verified through 10-round multi-turn conversation tests:
- Explicit caching works correctly: cached_tokens grows each round from R4 onwards, with cache_creation_tokens reported on first cache build.
- Implicit caching is not affected: models that rely on implicit prefix-matching caching produce identical cached_tokens with and without this change, confirmed by comparing results against both the reverted codebase and direct API calls bypassing litellm.
- No errors or regressions observed on any model, including those that do not support explicit caching — the DashScope API silently ignores unrecognized cache_control fields.
Fixes #25330
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
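The override pattern described above can be sketched as follows. The class and method names follow the PR description; the real base class lives in litellm and its default behavior is approximated here:

```python
# Sketch of the inheritance pattern: the base config strips cache_control
# before sending messages upstream; the DashScope subclass overrides the
# hook to keep cache_control intact, since DashScope supports explicit caching.
class OpenAIGPTConfig:
    def remove_cache_control_flag_from_messages_and_tools(self, messages, tools=None):
        # Default: drop cache_control from every message.
        for m in messages:
            m.pop("cache_control", None)
        return messages, tools

class DashScopeChatConfig(OpenAIGPTConfig):
    def remove_cache_control_flag_from_messages_and_tools(self, messages, tools=None):
        # DashScope understands explicit caching, so pass messages through.
        return messages, tools

msgs = [{"role": "user", "content": "hi", "cache_control": {"type": "ephemeral"}}]
kept, _ = DashScopeChatConfig().remove_cache_control_flag_from_messages_and_tools(msgs)
assert "cache_control" in kept[0]
```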
…ai/gpt-oss-120b (#25263)
* fix: expose reasoning effort fields in get_model_info and add together_ai/gpt-oss-120b
  - litellm/utils.py: pass supports_none_reasoning_effort and supports_xhigh_reasoning_effort through _get_model_info_helper so get_model_info() returns them (previously silently dropped). Fixes #25096.
  - model_prices_and_context_window.json: add together_ai/openai/gpt-oss-120b with supports_reasoning: true so reasoning_effort is accepted for this model without requiring drop_params. Fixes #25132.
* fix: consolidate duplicate together_ai/openai/gpt-oss-120b entry and sync backup file
* fix: link commit to GitHub account for CLA verification
Co-authored-by: Austin Varga <austin@knowmi.ai>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Greptile Summary
This PR adds a new PromptGuard guardrail integration (prompt injection detection, PII redaction, topic filtering) along with a Vertex AI streaming fix (normalising proto enum finish_reason values), a DashScope cache_control override, a PEM whitespace fix, and model metadata updates.
Confidence Score: 4/5
Safe to merge after addressing the redact silent pass-through bug in PromptGuard. One P1 bug: when PromptGuard returns decision='redact' without a redacted_messages payload, the original (potentially PII-containing) content is forwarded to the LLM unmodified and no warning is emitted. All other changes — streaming fix, DashScope override, PEM fix, model data updates — are correct and well-tested.
Affected file: litellm/proxy/guardrails/guardrail_hooks/promptguard/promptguard.py (redact decision silent pass-through)
| Filename | Overview |
|---|---|
| litellm/proxy/guardrails/guardrail_hooks/promptguard/promptguard.py | New PromptGuard guardrail integration; silent pass-through bug when redact decision carries no redacted_messages |
| litellm/proxy/guardrails/guardrail_hooks/promptguard/`__init__.py` | Standard guardrail initializer and registry wiring for PromptGuard; looks correct |
| litellm/litellm_core_utils/streaming_handler.py | Bug fix: wraps raw Vertex AI proto enum finish_reason name through map_finish_reason() so tool_calls override works correctly |
| litellm/llms/dashscope/chat/transformation.py | Overrides remove_cache_control_flag_from_messages_and_tools to preserve cache_control for DashScope provider |
| litellm/proxy/auth/public_key.pem | Fixes leading whitespace on -----BEGIN PUBLIC KEY----- header that caused PEM load failure |
| litellm/types/proxy/guardrails/guardrail_hooks/promptguard.py | Defines Pydantic config model for PromptGuard with api_key, api_base, and block_on_error fields |
| tests/test_litellm/proxy/guardrails/guardrail_hooks/test_promptguard.py | Comprehensive mock-only test suite for PromptGuard guardrail; no real network calls |
Sequence Diagram
sequenceDiagram
    participant Proxy
    participant PromptGuardGuardrail
    participant PromptGuardAPI
    participant LLM
    Proxy->>PromptGuardGuardrail: apply_guardrail(inputs, "request")
    PromptGuardGuardrail->>PromptGuardAPI: POST /api/v1/guard
    alt API error
        PromptGuardAPI-->>PromptGuardGuardrail: Exception
        alt block_on_error=True
            PromptGuardGuardrail-->>Proxy: GuardrailRaisedException
        else block_on_error=False
            PromptGuardGuardrail-->>Proxy: original inputs
        end
    else decision=allow
        PromptGuardAPI-->>PromptGuardGuardrail: decision:allow
        PromptGuardGuardrail-->>Proxy: inputs unchanged
        Proxy->>LLM: original messages
    else decision=block
        PromptGuardAPI-->>PromptGuardGuardrail: decision:block
        PromptGuardGuardrail-->>Proxy: GuardrailRaisedException
    else decision=redact with redacted_messages
        PromptGuardAPI-->>PromptGuardGuardrail: decision:redact + redacted_messages
        PromptGuardGuardrail-->>Proxy: inputs with redacted content
        Proxy->>LLM: redacted messages
    else decision=redact WITHOUT redacted_messages
        PromptGuardAPI-->>PromptGuardGuardrail: decision:redact only
        Note over PromptGuardGuardrail: Bug: original inputs returned silently
        PromptGuardGuardrail-->>Proxy: original inputs unredacted
        Proxy->>LLM: unredacted messages
    end
Reviews (4): Last reviewed commit: "Merge pull request #25616 from BerriAI/m..."
    from litellm.types.llms.openai import ChatCompletionToolParam
    from litellm.secret_managers.main import get_secret_str
    from litellm.types.llms.openai import AllMessageValues
Duplicate import from the same module
ChatCompletionToolParam and AllMessageValues are both imported from litellm.types.llms.openai in two separate statements. Consolidate them into one import per the project's style conventions.
Suggested change:

    - from litellm.types.llms.openai import ChatCompletionToolParam
      from litellm.secret_managers.main import get_secret_str
    - from litellm.types.llms.openai import AllMessageValues
    + from litellm.types.llms.openai import AllMessageValues, ChatCompletionToolParam
* Add PromptGuard guardrail integration
Add PromptGuard as a first-class guardrail vendor in LiteLLM's proxy,
supporting prompt injection detection, PII redaction, topic filtering,
entity blocklists, and hallucination detection via PromptGuard's
/api/v1/guard API endpoint.
Backend:
- Add PROMPTGUARD to SupportedGuardrailIntegrations enum
- Implement PromptGuardGuardrail (CustomGuardrail subclass) with
apply_guardrail handling allow/block/redact decisions
- Add Pydantic config model with api_key, api_base, ui_friendly_name
- Auto-discovered via guardrail_hooks/promptguard/__init__.py registries
Frontend:
- Add PromptGuard partner card to Guardrail Garden with eval scores
- Add preset configuration for quick setup
- Add logo to guardrailLogoMap
Tests:
- 30 unit tests covering configuration, allow/block/redact actions,
request payload construction, error handling, config model, and
registry wiring
* Fix redact path and init ordering per review feedback
- P1: Update structured_messages (not just texts) when PromptGuard
returns a redact decision, so PII redaction is effective for the
primary LLM message path
- P2: Validate credentials before allocating the HTTPX client so
resources aren't acquired if PromptGuardMissingCredentials is raised
- Add tests for structured_messages redaction and texts-only redaction
* Harden PromptGuard integration: fail-open, event hooks, images, docs
- Add block_on_error config (default fail-closed, configurable fail-open)
- Declare supported_event_hooks (pre_call, post_call) like other vendors
- Forward images from GenericGuardrailAPIInputs to PromptGuard API
- Wrap API call in try/except for resilient error handling
- Add comprehensive documentation page with config examples
- Register docs page in sidebar alongside other guardrail providers
- Expand test suite from 32 to 40 tests covering new functionality
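The fail-open vs fail-closed behavior controlled by block_on_error can be sketched as below. GuardrailRaisedException and the API call are simplified stand-ins for litellm's actual types:

```python
# Sketch of the block_on_error semantics described above:
# - block_on_error=True (default, fail-closed): an API failure raises a
#   guardrail exception so the request never reaches the LLM unguarded.
# - block_on_error=False (fail-open): the original inputs pass through.
class GuardrailRaisedException(Exception):
    pass

def apply_guardrail(inputs, call_api, block_on_error=True):
    try:
        result = call_api(inputs)
    except Exception as e:
        if block_on_error:
            # Fail-closed: tell the caller which guardrail failed.
            raise GuardrailRaisedException(f"promptguard: {e}")
        return inputs  # fail-open: forward original inputs unchanged
    return result

def failing_api(_):
    raise RuntimeError("timeout")

# Fail-open path returns the inputs untouched:
assert apply_guardrail({"texts": ["hi"]}, failing_api, block_on_error=False) == {"texts": ["hi"]}
```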
* Fix dict[str, Any] -> Dict[str, Any] for Python 3.8 compat
* Address remaining Greptile feedback: timeout, redact guard
- Add explicit 10s timeout to async_handler.post() to prevent
indefinite hangs when PromptGuard API is unresponsive
- Guard redact path: only update inputs["texts"] when the key
was originally present, avoiding phantom key injection
- Add test: redact with structured_messages only does not create
texts key (41 tests total)
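The effect of the explicit timeout can be illustrated without a real HTTP client. The PR passes a 10-second timeout to the client's post(); here asyncio.wait_for stands in for that behavior against a deliberately hung fake API:

```python
# Sketch: bounding an unresponsive guardrail API call with a timeout so a
# hung upstream cannot stall the request indefinitely. The hung_api_call
# coroutine is a fake; the real code sets timeout on the HTTP post call.
import asyncio

async def hung_api_call():
    await asyncio.sleep(3600)  # simulates an unresponsive PromptGuard API

async def guarded_call(timeout: float):
    try:
        return await asyncio.wait_for(hung_api_call(), timeout=timeout)
    except asyncio.TimeoutError:
        return "timed out"

result = asyncio.run(guarded_call(timeout=0.01))
```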
* Fix CI lint: black formatting, add PromptGuardConfigModel to LitellmParams
- Reformat promptguard.py to match CI black version (parenthesization)
- Add PromptGuardConfigModel as base class of LitellmParams for proper
Pydantic schema validation, consistent with all other guardrail vendors
- Use litellm_params.block_on_error directly (now a typed field)
* Address Greptile review: redact path, null decision, error context
- P1: Filter _extract_texts_from_messages to user-role messages only,
preventing system/assistant content from being injected into texts
- P1: Strengthen test_redact_updates_structured_messages assertion from
weak `in` check to strict equality, catching the injection bug
- P2: Use `result.get("decision") or "allow"` to handle explicit null
decision values (not just absent keys)
- P2: Wrap bare exception re-raise in GuardrailRaisedException so the
caller knows which guardrail failed (block_on_error=True path)
- P2: Add static Promptguard entry in guardrail_provider_map so the
preset works before populateGuardrailProviderMap is called
- Add test for explicit null decision treated as allow
* Fix black formatting: collapse f-string in error message
merge main
    if decision == "redact":
        redacted = result.get("redacted_messages")
        if redacted:
            if structured_messages:
                inputs["structured_messages"] = redacted
            if "texts" in inputs:
                extracted = self._extract_texts_from_messages(
                    redacted,
                )
                if extracted:
                    inputs["texts"] = extracted
    return inputs
Silent redaction bypass when redacted_messages is absent
When PromptGuard returns decision: "redact" but redacted_messages is null or missing, the if redacted: guard silently falls through and the original unredacted inputs are returned unchanged — the PII/sensitive content passes to the LLM with no log warning and no error. This undermines the security guarantee of the redact path regardless of block_on_error.
Suggested change (add an explicit warning on the fall-through):

    if decision == "redact":
        redacted = result.get("redacted_messages")
        if redacted:
            if structured_messages:
                inputs["structured_messages"] = redacted
            if "texts" in inputs:
                extracted = self._extract_texts_from_messages(
                    redacted,
                )
                if extracted:
                    inputs["texts"] = extracted
        else:
            verbose_proxy_logger.warning(
                "PromptGuard returned decision='redact' but no "
                "redacted_messages; original content will be used."
            )
    return inputs
Relevant issues
Pre-Submission checklist
Please complete all items before asking a LiteLLM maintainer to review your PR
- I have added testing in the tests/test_litellm/ directory (adding at least 1 test is a hard requirement - see details)
- My PR passes all unit tests on make test-unit
- My PR was reviewed by @greptileai and received a Confidence Score of at least 4/5 before requesting a maintainer review

Delays in PR merge?
If you're seeing a delay in your PR being merged, ping the LiteLLM Team on Slack (#pr-review).
CI (LiteLLM team)
Branch creation CI run
Link:
CI run for the last commit
Link:
Merge / cherry-pick CI run
Links:
Type
🆕 New Feature
🐛 Bug Fix
🧹 Refactoring
📖 Documentation
🚄 Infrastructure
✅ Test
Changes