Add PromptGuard guardrail integration #24268

krrish-berri-2 merged 8 commits into BerriAI:litellm_oss_staging_04_08_2026
Conversation
@greptileai review
Greptile Summary

This PR integrates PromptGuard as a first-class guardrail vendor in LiteLLM's proxy. All issues surfaced in the prior review round (credentials-before-client ordering, missing timeout, exception wrapping, null-decision handling, redact-path key guarding) have been resolved. Key changes:
Confidence Score: 4/5

Safe to merge: all prior P0/P1 issues have been resolved and the implementation follows existing guardrail patterns. The integration is well structured, with proper error handling, timeouts, and credential validation. All mocked tests pass, CI checks are green per the description, and the UI wiring is correct. The score is 4 rather than 5 for two minor reasons: the block_on_error field is now exposed on the shared LitellmParams model for all guardrail types (a naming-convention inconsistency with the existing fail_on_error field for Model Armor), and the logo lookup for PromptGuard falls back to a blank name/logo on first render before populateGuardrailProviders is called. Both are consistent with existing codebase patterns. No files require special attention; all key implementation files have been reviewed and look correct.
| Filename | Overview |
|---|---|
| litellm/proxy/guardrails/guardrail_hooks/promptguard/promptguard.py | Core guardrail implementation: credentials validated first, timeout set, Dict[str,Any] typed, error wrapped in GuardrailRaisedException, null-decision handled via or, and redact path correctly guards the texts key injection — all prior review issues resolved. |
| litellm/types/guardrails.py | PROMPTGUARD enum added to SupportedGuardrailIntegrations; PromptGuardConfigModel correctly mixed into LitellmParams via multiple inheritance. |
| tests/test_litellm/proxy/guardrails/guardrail_hooks/test_promptguard.py | 40 unit tests across 8 classes; all HTTP calls mocked (no real network calls); strict equality assertions used for redact paths; null-decision case covered. |
| ui/litellm-dashboard/src/components/guardrails/guardrail_info_helpers.tsx | Static Promptguard entry added to guardrail_provider_map; PromptGuard key in guardrailLogoMap matches ui_friendly_name() return value — logo resolves correctly after populateGuardrailProviders is called. |
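For reviewers unfamiliar with LiteLLM guardrail wiring, the following is a hedged sketch of how a guardrail like this is typically enabled in the proxy's YAML config, following the common LiteLLM guardrail config shape. The guardrail name and `api_base` URL are placeholders; the exact accepted fields are whatever `PromptGuardConfigModel` defines in this PR:

```yaml
guardrails:
  - guardrail_name: "promptguard-guard"      # arbitrary name chosen for this example
    litellm_params:
      guardrail: promptguard                 # the vendor key added by this PR
      mode: [pre_call, post_call]            # the declared supported_event_hooks
      api_key: os.environ/PROMPTGUARD_API_KEY
      api_base: https://promptguard.example.com   # placeholder URL
      block_on_error: true                   # fail-closed default per the PR description
```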
Sequence Diagram
```mermaid
sequenceDiagram
    participant Client
    participant LiteLLMProxy
    participant PromptGuardGuardrail
    participant PromptGuardAPI
    participant LLMProvider
    Client->>LiteLLMProxy: POST /v1/chat/completions
    LiteLLMProxy->>PromptGuardGuardrail: apply_guardrail(inputs, "request")
    PromptGuardGuardrail->>PromptGuardAPI: POST /api/v1/guard {messages, direction:"input"}
    alt decision: allow
        PromptGuardAPI-->>PromptGuardGuardrail: {decision:"allow"}
        PromptGuardGuardrail-->>LiteLLMProxy: inputs unchanged
        LiteLLMProxy->>LLMProvider: forward request
        LLMProvider-->>LiteLLMProxy: LLM response
        LiteLLMProxy->>PromptGuardGuardrail: apply_guardrail(outputs, "response")
        PromptGuardGuardrail->>PromptGuardAPI: POST /api/v1/guard {messages, direction:"output"}
        PromptGuardAPI-->>PromptGuardGuardrail: {decision:"allow"}
        PromptGuardGuardrail-->>LiteLLMProxy: outputs unchanged
        LiteLLMProxy-->>Client: 200 OK
    else decision: block
        PromptGuardAPI-->>PromptGuardGuardrail: {decision:"block", threat_type, confidence}
        PromptGuardGuardrail-->>LiteLLMProxy: raise GuardrailRaisedException
        LiteLLMProxy-->>Client: 400 Blocked
    else decision: redact
        PromptGuardAPI-->>PromptGuardGuardrail: {decision:"redact", redacted_messages}
        PromptGuardGuardrail-->>LiteLLMProxy: inputs with redacted texts/structured_messages
        LiteLLMProxy->>LLMProvider: forward redacted request
        LLMProvider-->>LiteLLMProxy: LLM response
        LiteLLMProxy-->>Client: 200 OK (PII redacted)
    else API error + block_on_error=true
        PromptGuardAPI-->>PromptGuardGuardrail: timeout / 5xx
        PromptGuardGuardrail-->>LiteLLMProxy: raise GuardrailRaisedException (fail-closed)
        LiteLLMProxy-->>Client: 400 Error
    end
```
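The decision branches in the diagram above can be sketched in plain Python. This is a hedged illustration, not the actual LiteLLM implementation: `GuardrailError` stands in for `GuardrailRaisedException`, and the response shapes follow the PR description.

```python
from typing import Any, Dict


class GuardrailError(Exception):
    """Stand-in for litellm's GuardrailRaisedException."""


def apply_decision(inputs: Dict[str, Any], result: Dict[str, Any]) -> Dict[str, Any]:
    # Explicit null decisions are treated as "allow" (mirrors the PR's
    # `result.get("decision") or "allow"` handling).
    decision = result.get("decision") or "allow"
    if decision == "block":
        raise GuardrailError(
            f"blocked: {result.get('threat_type')} "
            f"(confidence={result.get('confidence')})"
        )
    if decision == "redact":
        redacted = result.get("redacted_messages")
        if redacted:
            return {**inputs, "structured_messages": redacted}
    return inputs  # "allow" passes through unchanged


msgs = [{"role": "user", "content": "My SSN is 123-45-6789"}]
clean = [{"role": "user", "content": "My SSN is *********"}]

allowed = apply_decision({"structured_messages": msgs}, {"decision": None})
redacted = apply_decision(
    {"structured_messages": msgs},
    {"decision": "redact", "redacted_messages": clean},
)
print(allowed["structured_messages"] is msgs)         # True
print(redacted["structured_messages"][0]["content"])  # My SSN is *********
```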
Reviews (8). Last reviewed commit: "Fix black formatting: collapse f-string ..."
All @greptileai review comments have been addressed. @ishaan-jaff, would appreciate your review when you get a chance. This adds PromptGuard as a first-class guardrail provider (backend hook, UI garden card, docs page, 41 tests). Happy to address any further feedback.
Add PromptGuard as a first-class guardrail vendor in LiteLLM's proxy, supporting prompt injection detection, PII redaction, topic filtering, entity blocklists, and hallucination detection via PromptGuard's /api/v1/guard API endpoint.

Backend:
- Add PROMPTGUARD to SupportedGuardrailIntegrations enum
- Implement PromptGuardGuardrail (CustomGuardrail subclass) with apply_guardrail handling allow/block/redact decisions
- Add Pydantic config model with api_key, api_base, ui_friendly_name
- Auto-discovered via guardrail_hooks/promptguard/__init__.py registries

Frontend:
- Add PromptGuard partner card to Guardrail Garden with eval scores
- Add preset configuration for quick setup
- Add logo to guardrailLogoMap

Tests:
- 30 unit tests covering configuration, allow/block/redact actions, request payload construction, error handling, config model, and registry wiring
- P1: Update structured_messages (not just texts) when PromptGuard returns a redact decision, so PII redaction is effective for the primary LLM message path
- P2: Validate credentials before allocating the HTTPX client so resources aren't acquired if PromptGuardMissingCredentials is raised
- Add tests for structured_messages redaction and texts-only redaction
- Add block_on_error config (default fail-closed, configurable fail-open)
- Declare supported_event_hooks (pre_call, post_call) like other vendors
- Forward images from GenericGuardrailAPIInputs to PromptGuard API
- Wrap API call in try/except for resilient error handling
- Add comprehensive documentation page with config examples
- Register docs page in sidebar alongside other guardrail providers
- Expand test suite from 32 to 40 tests covering new functionality
- Add explicit 10s timeout to async_handler.post() to prevent indefinite hangs when PromptGuard API is unresponsive
- Guard redact path: only update inputs["texts"] when the key was originally present, avoiding phantom key injection
- Add test: redact with structured_messages only does not create texts key (41 tests total)
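The "phantom key" guard in the commit above can be illustrated with a toy example. The dict shapes here are assumptions based on the PR description, not the real LiteLLM input type:

```python
# A caller that passed only structured_messages, never a `texts` key.
inputs_structured_only = {"structured_messages": [{"role": "user", "content": "hi"}]}
redacted_texts = ["hi"]

# Unconditional write-back injects a `texts` key the caller never passed:
bad = dict(inputs_structured_only)
bad["texts"] = redacted_texts

# Guarded write-back (the fix) leaves the input's shape unchanged:
good = dict(inputs_structured_only)
if "texts" in good:
    good["texts"] = redacted_texts

print("texts" in bad)   # True
print("texts" in good)  # False
```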
…arams

- Reformat promptguard.py to match CI black version (parenthesization)
- Add PromptGuardConfigModel as base class of LitellmParams for proper Pydantic schema validation, consistent with all other guardrail vendors
- Use litellm_params.block_on_error directly (now a typed field)
Force-pushed 85c39e5 to 20a5750
```python
if decision == "redact":
    redacted = result.get("redacted_messages")
    if redacted:
        if structured_messages:
            inputs["structured_messages"] = redacted
        if "texts" in inputs:
            extracted = self._extract_texts_from_messages(
                redacted,
            )
            if extracted:
                inputs["texts"] = extracted
```
_extract_texts_from_messages injects non-user content into texts when both inputs co-exist
When inputs contains both structured_messages and texts, the PromptGuard API is sent the structured_messages (which may include system + user messages). The redacted_messages in the response will therefore also contain all roles. _extract_texts_from_messages extracts text from every message regardless of role, so the result will include system message content that was never in the original texts.
Example: original texts = ["My SSN is 123-45-6789"] (1 user-only item) + structured_messages = [system_msg, user_msg]. After redaction, _extract_texts_from_messages(redacted) returns ["Be helpful.", "My SSN is *********"] (2 items — system content injected). The downstream caller's inputs["texts"] grows from 1 item to 2 items with injected system content.
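The over-extraction described above can be reproduced with a small standalone sketch. `extract_texts_all_roles` is a hypothetical stand-in mirroring what `_extract_texts_from_messages` does: pull text from every message regardless of role.

```python
def extract_texts_all_roles(messages):
    # Mirrors the buggy behavior: extract text from every message,
    # regardless of role.
    out = []
    for msg in messages:
        content = msg.get("content")
        if isinstance(content, str):
            out.append(content)
        elif isinstance(content, list):
            out.extend(
                item["text"]
                for item in content
                if isinstance(item, dict) and item.get("type") == "text"
            )
    return out


redacted = [
    {"role": "system", "content": "Be helpful."},
    {"role": "user", "content": "My SSN is *********"},
]

all_roles = extract_texts_all_roles(redacted)  # buggy: system content leaks in
user_only = extract_texts_all_roles(
    [m for m in redacted if m.get("role") == "user"]  # role-filtered fix
)
print(all_roles)  # ['Be helpful.', 'My SSN is *********']
print(user_only)  # ['My SSN is *********']
```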
The fix is to only extract texts from user messages:
```python
if "texts" in inputs:
    extracted = [
        t
        for msg in redacted
        if msg.get("role") == "user"
        for t in (
            [msg["content"]]
            if isinstance(msg.get("content"), str)
            else [
                item["text"]
                for item in msg.get("content", [])
                if isinstance(item, dict) and item.get("type") == "text"
            ]
        )
    ]
    if extracted:
        inputs["texts"] = extracted
```

```python
except Exception as exc:
    verbose_proxy_logger.error("PromptGuard API error: %s", str(exc))
    if self.block_on_error:
        raise
return inputs
```
Re-raising bare exception loses PromptGuard context
When block_on_error=True the raw httpx.HTTPStatusError or httpx.ConnectError is re-raised with raise. The caller receives a low-level transport exception rather than something that clearly identifies the guardrail as the failure source. Consider wrapping before re-raising:
```python
except Exception as exc:
    verbose_proxy_logger.error("PromptGuard API error: %s", str(exc))
    if self.block_on_error:
        raise GuardrailRaisedException(
            guardrail_name=self.guardrail_name,
            message=f"PromptGuard API unreachable (block_on_error=True): {exc}",
        ) from exc
return inputs
```

- P1: Filter _extract_texts_from_messages to user-role messages only, preventing system/assistant content from being injected into texts
- P1: Strengthen test_redact_updates_structured_messages assertion from weak `in` check to strict equality, catching the injection bug
- P2: Use `result.get("decision") or "allow"` to handle explicit null decision values (not just absent keys)
- P2: Wrap bare exception re-raise in GuardrailRaisedException so the caller knows which guardrail failed (block_on_error=True path)
- P2: Add static Promptguard entry in guardrail_provider_map so the preset works before populateGuardrailProviderMap is called
- Add test for explicit null decision treated as allow
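The `or "allow"` point in the P2 item above comes down to how dict defaults work: `dict.get(key, default)` only applies the default when the key is absent, not when its value is an explicit `None`. A minimal illustration:

```python
result = {"decision": None}  # API returned an explicit null decision

with_default = result.get("decision", "allow")  # default NOT applied: key exists
with_or = result.get("decision") or "allow"     # falsy None falls through to "allow"

print(with_default)  # None
print(with_or)       # allow
```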
The only failing check is
All PromptGuard-specific checks pass:
Also addressed all Greptile review comments in
All Greptile review comments from the latest round addressed in ceb5b5d:
Updated PR body to follow the repository's PR template format.

CI status clarification: all 5 failing checks (lint, proxy-infra, Analyze (actions), Analyze (javascript-typescript), zizmor) are pre-existing failures on main.

@ishaan-jaff, would appreciate your review when you get a chance.
Merged c688d9d into BerriAI:litellm_oss_staging_04_08_2026
Relevant issues
Fixes #24272
Pre-Submission checklist
Please complete all items before asking a LiteLLM maintainer to review your PR
- I have added testing in the `tests/test_litellm/` directory (adding at least 1 test is a hard requirement; see details)
- `make test-unit` passes
- I have reviewed with `@greptileai` and received a Confidence Score of at least 4/5 before requesting a maintainer review

Delays in PR merge?
If you're seeing a delay in your PR being merged, ping the LiteLLM Team on Slack (#pr-review).
CI (LiteLLM team)
Branch creation CI run
Link: https://github.com/BerriAI/litellm/actions/runs/23793434199
CI run for the last commit
Link: https://github.com/BerriAI/litellm/actions/runs/23793434199
Merge / cherry-pick CI run
Links: (pending merge)
CI summary: 53 pass / 5 fail. All 5 failing checks (`lint`, `proxy-infra`, `Analyze (actions)`, `Analyze (javascript-typescript)`, `zizmor`) are pre-existing failures on `main`; they fail on every recent main branch run. The PromptGuard-specific guardrails test (`test (proxy-guardrails)`) passes. `build-ui` passes. CLA signed.

Type
🆕 New Feature
📖 Documentation
✅ Test
Changes
Summary
Add PromptGuard as a first-class guardrail vendor in LiteLLM's proxy, appearing alongside existing partners in the Guardrail Garden UI.
PromptGuard is an AI security gateway that provides:
- Prompt injection detection
- PII redaction
- Topic filtering
- Entity blocklists
- Hallucination detection
Architecture
What's included
Backend (Python):
- `PROMPTGUARD` added to `SupportedGuardrailIntegrations` enum
- `PromptGuardGuardrail`: `CustomGuardrail` subclass implementing `apply_guardrail` via `POST /api/v1/guard`
  - `decision: "allow"` → pass through unchanged
  - `decision: "block"` → raise `GuardrailRaisedException` with threat details
  - `decision: "redact"` → return modified inputs with redacted content (updates both `texts` and `structured_messages`)
- `block_on_error` (fail-closed by default, fail-open optional)
- `supported_event_hooks` declaration (`pre_call`, `post_call`)
- Forwards `GenericGuardrailAPIInputs.images`
- Pydantic config model with `api_key`, `api_base`, `block_on_error`, `ui_friendly_name()`
- Auto-discovered via `guardrail_hooks/promptguard/__init__.py` registries (zero manual wiring)

Frontend (TypeScript):
- PromptGuard partner card in the Guardrail Garden with eval scores
- Preset configuration for quick setup
- Logo added to `guardrailLogoMap`

Documentation:
- `docs/proxy/guardrails/promptguard.md`, registered in the docs sidebar

Tests:
- 41 unit tests; all HTTP calls mocked (no real network calls)
Files changed
- `litellm/types/guardrails.py`
- `litellm/types/proxy/guardrails/guardrail_hooks/promptguard.py`
- `litellm/proxy/guardrails/guardrail_hooks/promptguard/promptguard.py`
- `litellm/proxy/guardrails/guardrail_hooks/promptguard/__init__.py`
- `tests/test_litellm/proxy/guardrails/guardrail_hooks/test_promptguard.py`
- `ui/litellm-dashboard/public/assets/logos/promptguard.svg`
- `ui/litellm-dashboard/src/components/guardrails/guardrail_garden_data.ts`
- `ui/litellm-dashboard/src/components/guardrails/guardrail_garden_configs.ts`
- `ui/litellm-dashboard/src/components/guardrails/guardrail_info_helpers.tsx`
- `docs/my-website/docs/proxy/guardrails/promptguard.md`
- `docs/my-website/sidebars.js`

Test plan
- `poetry run black --check` passes on all new/modified PromptGuard Python files
- `poetry run ruff check` passes on all new/modified Python files
- `poetry run mypy --ignore-missing-imports` passes (0 issues)
- `check-circular-imports` passes
- `check-import-safety` passes (`from litellm import *` succeeds)
- All PromptGuard unit tests pass (`pytest tests/test_litellm/proxy/guardrails/guardrail_hooks/test_promptguard.py`)
- `npm run build` succeeds (UI compiles with no errors)
- `npm run test` passes (373 test files, 3626 tests)

Greptile review
All review comments addressed across 3 review rounds:
- `_extract_texts_from_messages` filters to `role == "user"` only, preventing system content injection into `texts`
- Redact-path assertions strengthened into strict `==` equality
- `result.get("decision") or "allow"` handles explicit null decision
- Bare exception re-raise wrapped in `GuardrailRaisedException` for guardrail context
- Static `Promptguard` entry in `guardrail_provider_map` for pre-API-load fallback
- Explicit 10s timeout on `async_handler.post()`
- Redact path only updates `inputs["texts"]` when the key was originally present

Confidence score: 4/5
AI-assisted PR (Claude). All tests verified locally. Author understands the code.