Add PromptGuard guardrail integration #24268

krrish-berri-2 merged 8 commits into BerriAI:litellm_oss_staging_04_08_2026
Conversation
@greptileai review
Greptile Summary

This PR integrates PromptGuard as a first-class guardrail vendor in LiteLLM's proxy. All issues surfaced in the prior review round (credentials-before-client ordering, missing timeout, exception wrapping, null-decision handling, redact-path key guarding) have been resolved. Key changes:
Confidence Score: 4/5

Safe to merge: all prior P0/P1 issues have been resolved and the implementation follows existing guardrail patterns. The integration is well structured, with proper error handling, timeouts, and credential validation. All mocked tests pass, CI checks are green per the description, and the UI wiring is correct. The score is 4 rather than 5 for two minor reasons: the block_on_error field is now exposed on the shared LitellmParams model for all guardrail types (a naming-convention inconsistency with the existing fail_on_error field for Model Armor), and the logo lookup for PromptGuard falls back to a blank name/logo on first render before populateGuardrailProviders is called. Both are consistent with existing codebase patterns. No files require special attention; all key implementation files have been reviewed and look correct.
| Filename | Overview |
|---|---|
| litellm/proxy/guardrails/guardrail_hooks/promptguard/promptguard.py | Core guardrail implementation: credentials validated first, timeout set, Dict[str,Any] typed, error wrapped in GuardrailRaisedException, null-decision handled via or, and redact path correctly guards the texts key injection — all prior review issues resolved. |
| litellm/types/guardrails.py | PROMPTGUARD enum added to SupportedGuardrailIntegrations; PromptGuardConfigModel correctly mixed into LitellmParams via multiple inheritance. |
| tests/test_litellm/proxy/guardrails/guardrail_hooks/test_promptguard.py | 40 unit tests across 8 classes; all HTTP calls mocked (no real network calls); strict equality assertions used for redact paths; null-decision case covered. |
| ui/litellm-dashboard/src/components/guardrails/guardrail_info_helpers.tsx | Static Promptguard entry added to guardrail_provider_map; PromptGuard key in guardrailLogoMap matches ui_friendly_name() return value — logo resolves correctly after populateGuardrailProviders is called. |
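For reviewers unfamiliar with LiteLLM guardrail wiring, the following is a hedged sketch of how a guardrail like this is typically enabled in the proxy's YAML config, following the common LiteLLM guardrail config shape. The guardrail name and `api_base` URL are placeholders; the exact accepted fields are whatever `PromptGuardConfigModel` defines in this PR:

```yaml
guardrails:
  - guardrail_name: "promptguard-guard"      # arbitrary name chosen for this example
    litellm_params:
      guardrail: promptguard                 # the vendor key added by this PR
      mode: [pre_call, post_call]            # the declared supported_event_hooks
      api_key: os.environ/PROMPTGUARD_API_KEY
      api_base: https://promptguard.example.com   # placeholder URL
      block_on_error: true                   # fail-closed default per the PR description
```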
Sequence Diagram
```mermaid
sequenceDiagram
    participant Client
    participant LiteLLMProxy
    participant PromptGuardGuardrail
    participant PromptGuardAPI
    participant LLMProvider
    Client->>LiteLLMProxy: POST /v1/chat/completions
    LiteLLMProxy->>PromptGuardGuardrail: apply_guardrail(inputs, "request")
    PromptGuardGuardrail->>PromptGuardAPI: POST /api/v1/guard {messages, direction:"input"}
    alt decision: allow
        PromptGuardAPI-->>PromptGuardGuardrail: {decision:"allow"}
        PromptGuardGuardrail-->>LiteLLMProxy: inputs unchanged
        LiteLLMProxy->>LLMProvider: forward request
        LLMProvider-->>LiteLLMProxy: LLM response
        LiteLLMProxy->>PromptGuardGuardrail: apply_guardrail(outputs, "response")
        PromptGuardGuardrail->>PromptGuardAPI: POST /api/v1/guard {messages, direction:"output"}
        PromptGuardAPI-->>PromptGuardGuardrail: {decision:"allow"}
        PromptGuardGuardrail-->>LiteLLMProxy: outputs unchanged
        LiteLLMProxy-->>Client: 200 OK
    else decision: block
        PromptGuardAPI-->>PromptGuardGuardrail: {decision:"block", threat_type, confidence}
        PromptGuardGuardrail-->>LiteLLMProxy: raise GuardrailRaisedException
        LiteLLMProxy-->>Client: 400 Blocked
    else decision: redact
        PromptGuardAPI-->>PromptGuardGuardrail: {decision:"redact", redacted_messages}
        PromptGuardGuardrail-->>LiteLLMProxy: inputs with redacted texts/structured_messages
        LiteLLMProxy->>LLMProvider: forward redacted request
        LLMProvider-->>LiteLLMProxy: LLM response
        LiteLLMProxy-->>Client: 200 OK (PII redacted)
    else API error + block_on_error=true
        PromptGuardAPI-->>PromptGuardGuardrail: timeout / 5xx
        PromptGuardGuardrail-->>LiteLLMProxy: raise GuardrailRaisedException (fail-closed)
        LiteLLMProxy-->>Client: 400 Error
    end
```
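The decision branches in the diagram above can be sketched in plain Python. This is a hedged illustration, not the actual LiteLLM implementation: `GuardrailError` stands in for `GuardrailRaisedException`, and the response shapes follow the PR description.

```python
from typing import Any, Dict


class GuardrailError(Exception):
    """Stand-in for litellm's GuardrailRaisedException."""


def apply_decision(inputs: Dict[str, Any], result: Dict[str, Any]) -> Dict[str, Any]:
    # Explicit null decisions are treated as "allow" (mirrors the PR's
    # `result.get("decision") or "allow"` handling).
    decision = result.get("decision") or "allow"
    if decision == "block":
        raise GuardrailError(
            f"blocked: {result.get('threat_type')} "
            f"(confidence={result.get('confidence')})"
        )
    if decision == "redact":
        redacted = result.get("redacted_messages")
        if redacted:
            return {**inputs, "structured_messages": redacted}
    return inputs  # "allow" passes through unchanged


msgs = [{"role": "user", "content": "My SSN is 123-45-6789"}]
clean = [{"role": "user", "content": "My SSN is *********"}]

allowed = apply_decision({"structured_messages": msgs}, {"decision": None})
redacted = apply_decision(
    {"structured_messages": msgs},
    {"decision": "redact", "redacted_messages": clean},
)
print(allowed["structured_messages"] is msgs)         # True
print(redacted["structured_messages"][0]["content"])  # My SSN is *********
```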
Reviews (8). Last reviewed commit: "Fix black formatting: collapse f-string ..."
All @greptileai review comments have been addressed. @ishaan-jaff, would appreciate your review when you get a chance. This adds PromptGuard as a first-class guardrail provider (backend hook, UI garden card, docs page, 41 tests). Happy to address any further feedback.
Add PromptGuard as a first-class guardrail vendor in LiteLLM's proxy, supporting prompt injection detection, PII redaction, topic filtering, entity blocklists, and hallucination detection via PromptGuard's /api/v1/guard API endpoint.

Backend:
- Add PROMPTGUARD to SupportedGuardrailIntegrations enum
- Implement PromptGuardGuardrail (CustomGuardrail subclass) with apply_guardrail handling allow/block/redact decisions
- Add Pydantic config model with api_key, api_base, ui_friendly_name
- Auto-discovered via guardrail_hooks/promptguard/__init__.py registries

Frontend:
- Add PromptGuard partner card to Guardrail Garden with eval scores
- Add preset configuration for quick setup
- Add logo to guardrailLogoMap

Tests:
- 30 unit tests covering configuration, allow/block/redact actions, request payload construction, error handling, config model, and registry wiring
- P1: Update structured_messages (not just texts) when PromptGuard returns a redact decision, so PII redaction is effective for the primary LLM message path
- P2: Validate credentials before allocating the HTTPX client so resources aren't acquired if PromptGuardMissingCredentials is raised
- Add tests for structured_messages redaction and texts-only redaction
- Add block_on_error config (default fail-closed, configurable fail-open)
- Declare supported_event_hooks (pre_call, post_call) like other vendors
- Forward images from GenericGuardrailAPIInputs to PromptGuard API
- Wrap API call in try/except for resilient error handling
- Add comprehensive documentation page with config examples
- Register docs page in sidebar alongside other guardrail providers
- Expand test suite from 32 to 40 tests covering new functionality
- Add explicit 10s timeout to async_handler.post() to prevent indefinite hangs when PromptGuard API is unresponsive
- Guard redact path: only update inputs["texts"] when the key was originally present, avoiding phantom key injection
- Add test: redact with structured_messages only does not create texts key (41 tests total)
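The "phantom key" guard in the commit above can be illustrated with a toy example. The dict shapes here are assumptions based on the PR description, not the real LiteLLM input type:

```python
# A caller that passed only structured_messages, never a `texts` key.
inputs_structured_only = {"structured_messages": [{"role": "user", "content": "hi"}]}
redacted_texts = ["hi"]

# Unconditional write-back injects a `texts` key the caller never passed:
bad = dict(inputs_structured_only)
bad["texts"] = redacted_texts

# Guarded write-back (the fix) leaves the input's shape unchanged:
good = dict(inputs_structured_only)
if "texts" in good:
    good["texts"] = redacted_texts

print("texts" in bad)   # True
print("texts" in good)  # False
```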
…arams

- Reformat promptguard.py to match CI black version (parenthesization)
- Add PromptGuardConfigModel as base class of LitellmParams for proper Pydantic schema validation, consistent with all other guardrail vendors
- Use litellm_params.block_on_error directly (now a typed field)
Force-pushed 85c39e5 to 20a5750
```python
if decision == "redact":
    redacted = result.get("redacted_messages")
    if redacted:
        if structured_messages:
            inputs["structured_messages"] = redacted
        if "texts" in inputs:
            extracted = self._extract_texts_from_messages(
                redacted,
            )
            if extracted:
                inputs["texts"] = extracted
```
_extract_texts_from_messages injects non-user content into texts when both inputs co-exist
When inputs contains both structured_messages and texts, the PromptGuard API is sent the structured_messages (which may include system + user messages). The redacted_messages in the response will therefore also contain all roles. _extract_texts_from_messages extracts text from every message regardless of role, so the result will include system message content that was never in the original texts.
Example: original texts = ["My SSN is 123-45-6789"] (1 user-only item) + structured_messages = [system_msg, user_msg]. After redaction, _extract_texts_from_messages(redacted) returns ["Be helpful.", "My SSN is *********"] (2 items — system content injected). The downstream caller's inputs["texts"] grows from 1 item to 2 items with injected system content.
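The over-extraction described above can be reproduced with a small standalone sketch. `extract_texts_all_roles` is a hypothetical stand-in mirroring what `_extract_texts_from_messages` does: pull text from every message regardless of role.

```python
def extract_texts_all_roles(messages):
    # Mirrors the buggy behavior: extract text from every message,
    # regardless of role.
    out = []
    for msg in messages:
        content = msg.get("content")
        if isinstance(content, str):
            out.append(content)
        elif isinstance(content, list):
            out.extend(
                item["text"]
                for item in content
                if isinstance(item, dict) and item.get("type") == "text"
            )
    return out


redacted = [
    {"role": "system", "content": "Be helpful."},
    {"role": "user", "content": "My SSN is *********"},
]

all_roles = extract_texts_all_roles(redacted)  # buggy: system content leaks in
user_only = extract_texts_all_roles(
    [m for m in redacted if m.get("role") == "user"]  # role-filtered fix
)
print(all_roles)  # ['Be helpful.', 'My SSN is *********']
print(user_only)  # ['My SSN is *********']
```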
The fix is to only extract texts from user messages:
```python
if "texts" in inputs:
    extracted = [
        t
        for msg in redacted
        if msg.get("role") == "user"
        for t in (
            [msg["content"]]
            if isinstance(msg.get("content"), str)
            else [
                item["text"]
                for item in msg.get("content", [])
                if isinstance(item, dict) and item.get("type") == "text"
            ]
        )
    ]
    if extracted:
        inputs["texts"] = extracted
```

```python
except Exception as exc:
    verbose_proxy_logger.error("PromptGuard API error: %s", str(exc))
    if self.block_on_error:
        raise
return inputs
```
Re-raising bare exception loses PromptGuard context
When block_on_error=True the raw httpx.HTTPStatusError or httpx.ConnectError is re-raised with raise. The caller receives a low-level transport exception rather than something that clearly identifies the guardrail as the failure source. Consider wrapping before re-raising:
```python
except Exception as exc:
    verbose_proxy_logger.error("PromptGuard API error: %s", str(exc))
    if self.block_on_error:
        raise GuardrailRaisedException(
            guardrail_name=self.guardrail_name,
            message=f"PromptGuard API unreachable (block_on_error=True): {exc}",
        ) from exc
return inputs
```

- P1: Filter _extract_texts_from_messages to user-role messages only, preventing system/assistant content from being injected into texts
- P1: Strengthen test_redact_updates_structured_messages assertion from weak `in` check to strict equality, catching the injection bug
- P2: Use `result.get("decision") or "allow"` to handle explicit null decision values (not just absent keys)
- P2: Wrap bare exception re-raise in GuardrailRaisedException so the caller knows which guardrail failed (block_on_error=True path)
- P2: Add static Promptguard entry in guardrail_provider_map so the preset works before populateGuardrailProviderMap is called
- Add test for explicit null decision treated as allow
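The `or "allow"` point in the P2 item above comes down to how dict defaults work: `dict.get(key, default)` only applies the default when the key is absent, not when its value is an explicit `None`. A minimal illustration:

```python
result = {"decision": None}  # API returned an explicit null decision

with_default = result.get("decision", "allow")  # default NOT applied: key exists
with_or = result.get("decision") or "allow"     # falsy None falls through to "allow"

print(with_default)  # None
print(with_or)       # allow
```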
The only failing check is
All PromptGuard-specific checks pass:
Also addressed all Greptile review comments in
All Greptile review comments from the latest round addressed in ceb5b5d:
Updated PR body to follow the repository's PR template format.

CI status clarification: all 5 failing checks (lint, proxy-infra, Analyze (actions), Analyze (javascript-typescript), zizmor) are pre-existing failures on main.

@ishaan-jaff, would appreciate your review when you get a chance.
Merged c688d9d into BerriAI:litellm_oss_staging_04_08_2026
Relevant issues
Fixes #24272
Pre-Submission checklist
Please complete all items before asking a LiteLLM maintainer to review your PR
- I have added testing in the `tests/test_litellm/` directory (adding at least 1 test is a hard requirement; see details)
- `make test-unit` passes
- I have reviewed with `@greptileai` and received a Confidence Score of at least 4/5 before requesting a maintainer review

Delays in PR merge?
If you're seeing a delay in your PR being merged, ping the LiteLLM Team on Slack (#pr-review).
CI (LiteLLM team)
Branch creation CI run
Link: https://github.com/BerriAI/litellm/actions/runs/23793434199
CI run for the last commit
Link: https://github.com/BerriAI/litellm/actions/runs/23793434199
Merge / cherry-pick CI run
Links: (pending merge)
CI summary: 53 pass / 5 fail. All 5 failing checks (`lint`, `proxy-infra`, `Analyze (actions)`, `Analyze (javascript-typescript)`, `zizmor`) are pre-existing failures on `main`; they fail on every recent main branch run. The PromptGuard-specific guardrails test (`test (proxy-guardrails)`) passes. `build-ui` passes. CLA signed.

Type
🆕 New Feature
📖 Documentation
✅ Test
Changes
Summary
Add PromptGuard as a first-class guardrail vendor in LiteLLM's proxy, appearing alongside existing partners in the Guardrail Garden UI.
PromptGuard is an AI security gateway that provides:
- Prompt injection detection
- PII redaction
- Topic filtering
- Entity blocklists
- Hallucination detection
Architecture
What's included
Backend (Python):
- `PROMPTGUARD` added to `SupportedGuardrailIntegrations` enum
- `PromptGuardGuardrail`: `CustomGuardrail` subclass implementing `apply_guardrail` via `POST /api/v1/guard`
  - `decision: "allow"` → pass through unchanged
  - `decision: "block"` → raise `GuardrailRaisedException` with threat details
  - `decision: "redact"` → return modified inputs with redacted content (updates both `texts` and `structured_messages`)
- `block_on_error` (fail-closed by default, fail-open optional)
- `supported_event_hooks` declaration (`pre_call`, `post_call`)
- Forwards `GenericGuardrailAPIInputs.images`
- Pydantic config model with `api_key`, `api_base`, `block_on_error`, `ui_friendly_name()`
- Auto-discovered via `guardrail_hooks/promptguard/__init__.py` registries (zero manual wiring)

Frontend (TypeScript):
- PromptGuard partner card in the Guardrail Garden with eval scores
- Preset configuration for quick setup
- Logo added to `guardrailLogoMap`

Documentation:
- `docs/proxy/guardrails/promptguard.md`, registered in the docs sidebar

Tests:
- 41 unit tests; all HTTP calls mocked (no real network calls)
Files changed
- `litellm/types/guardrails.py`
- `litellm/types/proxy/guardrails/guardrail_hooks/promptguard.py`
- `litellm/proxy/guardrails/guardrail_hooks/promptguard/promptguard.py`
- `litellm/proxy/guardrails/guardrail_hooks/promptguard/__init__.py`
- `tests/test_litellm/proxy/guardrails/guardrail_hooks/test_promptguard.py`
- `ui/litellm-dashboard/public/assets/logos/promptguard.svg`
- `ui/litellm-dashboard/src/components/guardrails/guardrail_garden_data.ts`
- `ui/litellm-dashboard/src/components/guardrails/guardrail_garden_configs.ts`
- `ui/litellm-dashboard/src/components/guardrails/guardrail_info_helpers.tsx`
- `docs/my-website/docs/proxy/guardrails/promptguard.md`
- `docs/my-website/sidebars.js`

Test plan
- `poetry run black --check` passes on all new/modified PromptGuard Python files
- `poetry run ruff check` passes on all new/modified Python files
- `poetry run mypy --ignore-missing-imports` passes (0 issues)
- `check-circular-imports` passes
- `check-import-safety` passes (`from litellm import *` succeeds)
- All PromptGuard unit tests pass (`pytest tests/test_litellm/proxy/guardrails/guardrail_hooks/test_promptguard.py`)
- `npm run build` succeeds (UI compiles with no errors)
- `npm run test` passes (373 test files, 3626 tests)

Greptile review
All review comments addressed across 3 review rounds:
- `_extract_texts_from_messages` filters to `role == "user"` only, preventing system content injection into `texts`
- Redact-path assertions strengthened into strict `==` equality
- `result.get("decision") or "allow"` handles explicit null decision
- Bare exception re-raise wrapped in `GuardrailRaisedException` for guardrail context
- Static `Promptguard` entry in `guardrail_provider_map` for pre-API-load fallback
- Explicit 10s timeout on `async_handler.post()`
- Redact path only updates `inputs["texts"]` when the key was originally present

Confidence score: 4/5
AI-assisted PR (Claude). All tests verified locally. Author understands the code.