
Add PromptGuard guardrail integration#24268

Merged
krrish-berri-2 merged 8 commits into BerriAI:litellm_oss_staging_04_08_2026 from acebot712:add-promptguard-guardrail
Apr 9, 2026

Conversation


@acebot712 acebot712 commented Mar 21, 2026

Relevant issues

Fixes #24272

Pre-Submission checklist

Please complete all items before asking a LiteLLM maintainer to review your PR

  • I have Added testing in the tests/test_litellm/ directory, Adding at least 1 test is a hard requirement - see details
  • My PR passes all unit tests on make test-unit
  • My PR's scope is as isolated as possible, it only solves 1 specific problem
  • I have requested a Greptile review by commenting @greptileai and received a Confidence Score of at least 4/5 before requesting a maintainer review

Delays in PR merge?

If you're seeing a delay in your PR being merged, ping the LiteLLM Team on Slack (#pr-review).

CI (LiteLLM team)

CI status guideline:

  • 50-55 passing tests: main is stable with minor issues.
  • 45-49 passing tests: acceptable but needs attention.
  • 44 or fewer passing tests: unstable; be careful with your merges and assess the risk.

CI summary: 53 pass / 5 fail. All 5 failing checks (lint, proxy-infra, Analyze (actions), Analyze (javascript-typescript), zizmor) are pre-existing failures on main — they fail on every recent main branch run. The PromptGuard-specific guardrails test (test (proxy-guardrails)) passes. build-ui passes. CLA signed.

Type

🆕 New Feature
📖 Documentation
✅ Test

Changes

Summary

Add PromptGuard as a first-class guardrail vendor in LiteLLM's proxy, appearing alongside existing partners in the Guardrail Garden UI.

PromptGuard is an AI security gateway that provides:

  • Prompt injection detection with 94.9% F1 score (100% precision, 90.4% recall on 5,384 test cases)
  • PII detection & redaction with configurable entity types
  • Topic filtering and entity blocklists
  • Hallucination detection
  • Self-hostable with drop-in proxy integration

Architecture

App → LiteLLM Proxy → PromptGuard API (/api/v1/guard) → decision: allow/block/redact
                    → LLM Provider (if allowed/redacted)
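
As a concrete sketch of how this flow could be enabled, a hypothetical proxy config is shown below. The field names (api_key, api_base, block_on_error) come from the PR's config model and the mode values mirror the declared pre_call/post_call hooks, but the exact keys, guardrail name, and endpoint URL are illustrative, not taken from the PR's docs.

```yaml
# Hypothetical LiteLLM proxy config sketch -- field names taken from the
# PR's config model (api_key, api_base, block_on_error); values illustrative.
guardrails:
  - guardrail_name: "promptguard-pre"
    litellm_params:
      guardrail: promptguard          # matches the PROMPTGUARD enum entry
      mode: "pre_call"                # post_call is also declared as supported
      api_base: "https://promptguard.example.internal"  # self-hosted endpoint
      api_key: os.environ/PROMPTGUARD_API_KEY
      block_on_error: true            # fail-closed (the PR's default)
```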

What's included

Backend (Python):

  • PROMPTGUARD added to SupportedGuardrailIntegrations enum
  • PromptGuardGuardrailCustomGuardrail subclass implementing apply_guardrail via POST /api/v1/guard
    • decision: "allow" → pass through unchanged
    • decision: "block" → raise GuardrailRaisedException with threat details
    • decision: "redact" → return modified inputs with redacted content (updates both texts and structured_messages)
  • Configurable block_on_error (fail-closed by default, fail-open optional)
  • Explicit supported_event_hooks declaration (pre_call, post_call)
  • Image passthrough via GenericGuardrailAPIInputs.images
  • Pydantic config model with api_key, api_base, block_on_error, ui_friendly_name()
  • Auto-discovered via guardrail_hooks/promptguard/__init__.py registries (zero manual wiring)
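
The allow/block/redact handling described above can be sketched as follows. This is a simplified illustration, not the PR's actual code: GuardrailRaisedException and the inputs dict shape are minimal stand-ins for LiteLLM's types, and handle_decision is a hypothetical helper.

```python
# Minimal sketch of the allow/block/redact decision handling described in
# this PR. All names here are simplified stand-ins, not LiteLLM's real types.

class GuardrailRaisedException(Exception):
    def __init__(self, guardrail_name: str, message: str):
        super().__init__(f"[{guardrail_name}] {message}")


def handle_decision(inputs: dict, result: dict, guardrail_name: str = "promptguard") -> dict:
    # Treat an explicit null decision the same as a missing one (fail open to "allow").
    decision = result.get("decision") or "allow"

    if decision == "block":
        raise GuardrailRaisedException(
            guardrail_name=guardrail_name,
            message=f"Blocked: {result.get('threat_type', 'unknown threat')}",
        )

    if decision == "redact":
        redacted = result.get("redacted_messages")
        if redacted:
            if inputs.get("structured_messages"):
                inputs["structured_messages"] = redacted
            # Only rebuild texts when the key already existed, and only from
            # user-role messages, so system content is never injected.
            if "texts" in inputs:
                extracted = [
                    m["content"] for m in redacted
                    if m.get("role") == "user" and isinstance(m.get("content"), str)
                ]
                if extracted:
                    inputs["texts"] = extracted

    # "allow" (or any unrecognized decision) passes inputs through unchanged.
    return inputs
```

Note how the redact branch encodes two of the review fixes discussed later in this thread: user-role-only extraction and guarding against phantom `texts` key injection.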

Frontend (TypeScript):

  • Partner card in Guardrail Garden with eval scores
  • Preset configuration for quick setup
  • Logo in guardrailLogoMap

Documentation:

  • Full docs page at docs/proxy/guardrails/promptguard.md
  • Added to sidebar navigation

Tests:

  • 40 unit tests across 8 test classes covering configuration, allow/block/redact decisions, fail-open resilience, image passthrough, request payload construction, error handling, config model, and registry wiring
  • All tests use mocked HTTP responses (no real API calls)

Files changed

File Type
litellm/types/guardrails.py Modified — enum entry
litellm/types/proxy/guardrails/guardrail_hooks/promptguard.py New — config model
litellm/proxy/guardrails/guardrail_hooks/promptguard/promptguard.py New — guardrail hook
litellm/proxy/guardrails/guardrail_hooks/promptguard/__init__.py New — registry
tests/test_litellm/proxy/guardrails/guardrail_hooks/test_promptguard.py New — 40 tests
ui/litellm-dashboard/public/assets/logos/promptguard.svg New — logo
ui/litellm-dashboard/src/components/guardrails/guardrail_garden_data.ts Modified — partner card
ui/litellm-dashboard/src/components/guardrails/guardrail_garden_configs.ts Modified — preset
ui/litellm-dashboard/src/components/guardrails/guardrail_info_helpers.tsx Modified — logo map
docs/my-website/docs/proxy/guardrails/promptguard.md New — documentation
docs/my-website/sidebars.js Modified — sidebar entry

Test plan

  • poetry run black --check passes on all new/modified PromptGuard Python files
  • poetry run ruff check passes on all new/modified Python files
  • poetry run mypy --ignore-missing-imports passes (0 issues)
  • check-circular-imports passes
  • check-import-safety passes (from litellm import * succeeds)
  • 40/40 unit tests pass (pytest tests/test_litellm/proxy/guardrails/guardrail_hooks/test_promptguard.py)
  • npm run build succeeds (UI compiles with no errors)
  • npm run test passes (373 test files, 3626 tests)
  • CLA signed

Greptile review

All review comments addressed across 3 review rounds:

  • P1: _extract_texts_from_messages filters to role == "user" only — prevents system content injection into texts
  • P1: Strengthened test assertion from in to strict == equality
  • P2: result.get("decision") or "allow" handles explicit null decision
  • P2: Wrapped bare exception re-raise in GuardrailRaisedException for guardrail context
  • P2: Added static Promptguard entry in guardrail_provider_map for pre-API-load fallback
  • P2: Added explicit 10s timeout to async_handler.post()
  • P2: Guard redact path: only update inputs["texts"] when key was originally present
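
The null-decision fix in the list above hinges on a `dict.get` subtlety worth spelling out: the default only applies when the key is absent, not when its value is an explicit None (a JSON null):

```python
# dict.get's default applies only when the key is missing,
# not when the stored value is None (an explicit JSON null).
result = {"decision": None}  # API returned "decision": null

with_default = result.get("decision", "allow")  # None -- default NOT applied
with_or = result.get("decision") or "allow"     # "allow"

print(with_default, with_or)
```

The trade-off: `or` also maps any falsy value (such as an empty string) to "allow", which is acceptable here since valid decisions are non-empty strings.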

Confidence score: 4/5


AI-assisted PR (Claude). All tests verified locally. Author understands the code.


vercel bot commented Mar 21, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project: litellm | Deployment: Ready | Actions: Preview, Comment | Updated (UTC): Mar 31, 2026 10:49am


@acebot712 (Contributor, Author) commented:

@greptileai review


codspeed-hq bot commented Mar 21, 2026

Merging this PR will not alter performance

✅ 16 untouched benchmarks


Comparing acebot712:add-promptguard-guardrail (109e0a6) with main (08be1e5)

Open in CodSpeed


greptile-apps bot commented Mar 21, 2026

Greptile Summary

This PR integrates PromptGuard as a first-class guardrail vendor in LiteLLM's proxy, implementing the allow/block/redact decision flow via POST /api/v1/guard, alongside a Guardrail Garden UI card with eval metrics, a Pydantic config model, auto-discovery registries, and 40 mocked unit tests.

All issues surfaced in the prior review round (credentials-before-client ordering, Dict[str,Any] Python 3.8 compatibility, explicit HTTP timeout, texts key injection on structured_messages-only inputs, null-decision handling via or, and error wrapping in GuardrailRaisedException) have been addressed. The code follows established patterns in the codebase.

Key changes:

  • PROMPTGUARD enum entry added to SupportedGuardrailIntegrations
  • PromptGuardGuardrail (CustomGuardrail subclass) with validated credential order, 10 s timeout, and fail-closed/fail-open error handling
  • _extract_texts_from_messages correctly filters to user-role only, preventing system-message injection into inputs["texts"]
  • Static Promptguard: "promptguard" entry in guardrail_provider_map and PromptGuard key in guardrailLogoMap ensure logo resolves correctly once populateGuardrailProviders has been called
  • All tests are fully mocked — no real network calls are made

Confidence Score: 4/5

Safe to merge — all prior P0/P1 issues have been resolved and the implementation follows existing guardrail patterns.

The integration is well-structured with proper error handling, timeouts, and credential validation. All mocked tests pass, CI checks are green per the description, and the UI wiring is correct. Score is 4 rather than 5 because the block_on_error field is now exposed on the shared LitellmParams model for all guardrail types (a minor naming-convention inconsistency with the existing fail_on_error field for Model Armor), and the logo lookup for PromptGuard falls back to a blank name/logo on first render before populateGuardrailProviders is called — though both are consistent with the existing codebase patterns.

No files require special attention — all key implementation files have been reviewed and look correct.

Important Files Changed

Filename Overview
litellm/proxy/guardrails/guardrail_hooks/promptguard/promptguard.py Core guardrail implementation: credentials validated first, timeout set, Dict[str,Any] typed, error wrapped in GuardrailRaisedException, null-decision handled via or, and redact path correctly guards the texts key injection — all prior review issues resolved.
litellm/types/guardrails.py PROMPTGUARD enum added to SupportedGuardrailIntegrations; PromptGuardConfigModel correctly mixed into LitellmParams via multiple inheritance.
tests/test_litellm/proxy/guardrails/guardrail_hooks/test_promptguard.py 40 unit tests across 8 classes; all HTTP calls mocked (no real network calls); strict equality assertions used for redact paths; null-decision case covered.
ui/litellm-dashboard/src/components/guardrails/guardrail_info_helpers.tsx Static Promptguard entry added to guardrail_provider_map; PromptGuard key in guardrailLogoMap matches ui_friendly_name() return value — logo resolves correctly after populateGuardrailProviders is called.

Sequence Diagram

sequenceDiagram
    participant Client
    participant LiteLLMProxy
    participant PromptGuardGuardrail
    participant PromptGuardAPI
    participant LLMProvider

    Client->>LiteLLMProxy: POST /v1/chat/completions
    LiteLLMProxy->>PromptGuardGuardrail: apply_guardrail(inputs, "request")
    PromptGuardGuardrail->>PromptGuardAPI: POST /api/v1/guard {messages, direction:"input"}
    alt decision: allow
        PromptGuardAPI-->>PromptGuardGuardrail: {decision:"allow"}
        PromptGuardGuardrail-->>LiteLLMProxy: inputs unchanged
        LiteLLMProxy->>LLMProvider: forward request
        LLMProvider-->>LiteLLMProxy: LLM response
        LiteLLMProxy->>PromptGuardGuardrail: apply_guardrail(outputs, "response")
        PromptGuardGuardrail->>PromptGuardAPI: POST /api/v1/guard {messages, direction:"output"}
        PromptGuardAPI-->>PromptGuardGuardrail: {decision:"allow"}
        PromptGuardGuardrail-->>LiteLLMProxy: outputs unchanged
        LiteLLMProxy-->>Client: 200 OK
    else decision: block
        PromptGuardAPI-->>PromptGuardGuardrail: {decision:"block", threat_type, confidence}
        PromptGuardGuardrail-->>LiteLLMProxy: raise GuardrailRaisedException
        LiteLLMProxy-->>Client: 400 Blocked
    else decision: redact
        PromptGuardAPI-->>PromptGuardGuardrail: {decision:"redact", redacted_messages}
        PromptGuardGuardrail-->>LiteLLMProxy: inputs with redacted texts/structured_messages
        LiteLLMProxy->>LLMProvider: forward redacted request
        LLMProvider-->>LiteLLMProxy: LLM response
        LiteLLMProxy-->>Client: 200 OK (PII redacted)
    else API error + block_on_error=true
        PromptGuardAPI-->>PromptGuardGuardrail: timeout / 5xx
        PromptGuardGuardrail-->>LiteLLMProxy: raise GuardrailRaisedException (fail-closed)
        LiteLLMProxy-->>Client: 400 Error
    end
Loading

Reviews (8): Last reviewed commit: "Fix black formatting: collapse f-string ..."

Comment thread litellm/proxy/guardrails/guardrail_hooks/promptguard/promptguard.py Outdated
Comment thread litellm/proxy/guardrails/guardrail_hooks/promptguard/promptguard.py Outdated
Comment thread litellm/proxy/guardrails/guardrail_hooks/promptguard/promptguard.py Outdated
Comment thread litellm/proxy/guardrails/guardrail_hooks/promptguard/promptguard.py Outdated
Comment thread litellm/proxy/guardrails/guardrail_hooks/promptguard/promptguard.py

acebot712 commented Mar 21, 2026

All @greptileai review comments have been addressed:

@ishaan-jaff - would appreciate your review when you get a chance. This adds PromptGuard as a first-class guardrail provider (backend hook, UI garden card, docs page, 41 tests). Happy to address any further feedback.


gitguardian bot commented Mar 31, 2026

✅ There are no secrets present in this pull request anymore.

If these secrets were true positives and are still valid, we highly recommend revoking them. While these secrets were previously flagged, we no longer have a reference to the specific commits where they were detected. Once a secret has been leaked into a git repository, you should consider it compromised, even if it was deleted immediately. Find more information about risks here.



Comment thread litellm/proxy/guardrails/guardrail_hooks/promptguard/promptguard.py Outdated
Add PromptGuard as a first-class guardrail vendor in LiteLLM's proxy,
supporting prompt injection detection, PII redaction, topic filtering,
entity blocklists, and hallucination detection via PromptGuard's
/api/v1/guard API endpoint.

Backend:
- Add PROMPTGUARD to SupportedGuardrailIntegrations enum
- Implement PromptGuardGuardrail (CustomGuardrail subclass) with
  apply_guardrail handling allow/block/redact decisions
- Add Pydantic config model with api_key, api_base, ui_friendly_name
- Auto-discovered via guardrail_hooks/promptguard/__init__.py registries

Frontend:
- Add PromptGuard partner card to Guardrail Garden with eval scores
- Add preset configuration for quick setup
- Add logo to guardrailLogoMap

Tests:
- 30 unit tests covering configuration, allow/block/redact actions,
  request payload construction, error handling, config model, and
  registry wiring
- P1: Update structured_messages (not just texts) when PromptGuard
  returns a redact decision, so PII redaction is effective for the
  primary LLM message path
- P2: Validate credentials before allocating the HTTPX client so
  resources aren't acquired if PromptGuardMissingCredentials is raised
- Add tests for structured_messages redaction and texts-only redaction
- Add block_on_error config (default fail-closed, configurable fail-open)
- Declare supported_event_hooks (pre_call, post_call) like other vendors
- Forward images from GenericGuardrailAPIInputs to PromptGuard API
- Wrap API call in try/except for resilient error handling
- Add comprehensive documentation page with config examples
- Register docs page in sidebar alongside other guardrail providers
- Expand test suite from 32 to 40 tests covering new functionality
- Add explicit 10s timeout to async_handler.post() to prevent
  indefinite hangs when PromptGuard API is unresponsive
- Guard redact path: only update inputs["texts"] when the key
  was originally present, avoiding phantom key injection
- Add test: redact with structured_messages only does not create
  texts key (41 tests total)
…arams

- Reformat promptguard.py to match CI black version (parenthesization)
- Add PromptGuardConfigModel as base class of LitellmParams for proper
  Pydantic schema validation, consistent with all other guardrail vendors
- Use litellm_params.block_on_error directly (now a typed field)
Comment on lines +183 to +193
if decision == "redact":
    redacted = result.get("redacted_messages")
    if redacted:
        if structured_messages:
            inputs["structured_messages"] = redacted
        if "texts" in inputs:
            extracted = self._extract_texts_from_messages(
                redacted,
            )
            if extracted:
                inputs["texts"] = extracted

P1 _extract_texts_from_messages injects non-user content into texts when both inputs co-exist

When inputs contains both structured_messages and texts, the PromptGuard API is sent the structured_messages (which may include system + user messages). The redacted_messages in the response will therefore also contain all roles. _extract_texts_from_messages extracts text from every message regardless of role, so the result will include system message content that was never in the original texts.

Example: original texts = ["My SSN is 123-45-6789"] (1 user-only item) + structured_messages = [system_msg, user_msg]. After redaction, _extract_texts_from_messages(redacted) returns ["Be helpful.", "My SSN is *********"] (2 items — system content injected). The downstream caller's inputs["texts"] grows from 1 item to 2 items with injected system content.

The fix is to only extract texts from user messages:

if "texts" in inputs:
    extracted = [
        t for msg in redacted
        if msg.get("role") == "user"
        for t in (
            [msg["content"]] if isinstance(msg.get("content"), str)
            else [item["text"] for item in msg.get("content", []) if isinstance(item, dict) and item.get("type") == "text"]
        )
    ]
    if extracted:
        inputs["texts"] = extracted

Comment thread tests/test_litellm/proxy/guardrails/guardrail_hooks/test_promptguard.py Outdated
Comment on lines +155 to +159
except Exception as exc:
verbose_proxy_logger.error("PromptGuard API error: %s", str(exc))
if self.block_on_error:
raise
return inputs

P2 Re-raising bare exception loses PromptGuard context

When block_on_error=True the raw httpx.HTTPStatusError or httpx.ConnectError is re-raised with raise. The caller receives a low-level transport exception rather than something that clearly identifies the guardrail as the failure source. Consider wrapping before re-raising:

except Exception as exc:
    verbose_proxy_logger.error("PromptGuard API error: %s", str(exc))
    if self.block_on_error:
        raise GuardrailRaisedException(
            guardrail_name=self.guardrail_name,
            message=f"PromptGuard API unreachable (block_on_error=True): {exc}",
        ) from exc
    return inputs

- P1: Filter _extract_texts_from_messages to user-role messages only,
  preventing system/assistant content from being injected into texts
- P1: Strengthen test_redact_updates_structured_messages assertion from
  weak `in` check to strict equality, catching the injection bug
- P2: Use `result.get("decision") or "allow"` to handle explicit null
  decision values (not just absent keys)
- P2: Wrap bare exception re-raise in GuardrailRaisedException so the
  caller knows which guardrail failed (block_on_error=True path)
- P2: Add static Promptguard entry in guardrail_provider_map so the
  preset works before populateGuardrailProviderMap is called
- Add test for explicit null decision treated as allow
@acebot712 (Contributor, Author) commented:

The only failing check is proxy-infra / Run tests — specifically test_check_migration_out_of_sync. This is a pre-existing upstream issue unrelated to this PR:

  • The test fails on every recent main branch run (92f9ad83, 13c37f35, 57e37f55 — all failure)
  • It also fails on every other open PR checked today (15/15 runs across all branches show failure)

All PromptGuard-specific checks pass:

  • test (proxy-guardrails): ✅
  • All 30+ other checks: ✅

Also addressed all Greptile review comments in ceb5b5d:

  • P1: _extract_texts_from_messages now filters to role == "user" only — prevents system content injection into texts
  • P1: Strengthened test assertion from in to strict == equality
  • P2: result.get("decision") or "allow" handles explicit null decision
  • P2: Wrapped bare exception re-raise in GuardrailRaisedException for guardrail context
  • P2: Added static Promptguard entry in guardrail_provider_map for pre-API-load fallback
  • Added test for explicit null decision treated as allow

@acebot712 (Contributor, Author) commented:

All Greptile review comments from the latest round addressed in ceb5b5d:

  • P1 _extract_texts_from_messages injects non-user content: Fixed — filter to role == "user" messages only, so system/assistant content is never injected into texts.
  • P1 Weak in assertion: Fixed — changed to strict == equality so the test catches any injected system message content.
  • P2 dict.get() default not applied for null decision: Fixed — changed to result.get("decision") or "allow" plus new test_null_decision_treated_as_allow test.
  • P2 Re-raising bare exception loses context: Fixed — wrapped in GuardrailRaisedException with from exc chaining, updated both error-handling tests.
  • P2 Preset provider relies on dynamic map: Fixed — added static Promptguard: "promptguard" entry in guardrail_provider_map.
  • Black formatting: Fixed — reformatted to pass black --check.

@acebot712 (Contributor, Author) commented:

Updated PR body to follow the repository's PR template format.

CI status clarification: All 5 failing checks (lint, proxy-infra, Analyze (actions), Analyze (javascript-typescript), zizmor) are pre-existing failures on main — the LiteLLM Linting workflow has failed on every recent main commit. All PromptGuard-specific files pass black --check. The guardrails test suite (test (proxy-guardrails)) passes. 53/58 checks green.

@ishaan-jaff — would appreciate your review when you get a chance.

@krrish-berri-2 krrish-berri-2 changed the base branch from main to litellm_oss_staging_04_08_2026 April 9, 2026 15:12
@krrish-berri-2 krrish-berri-2 merged commit c688d9d into BerriAI:litellm_oss_staging_04_08_2026 Apr 9, 2026
53 of 59 checks passed
@acebot712 acebot712 deleted the add-promptguard-guardrail branch April 9, 2026 20:18


Development

Successfully merging this pull request may close these issues.

[Feature]: Add PromptGuard as a first-class guardrail provider
