Agent breaker#1628

Open
eliyacohen-hub wants to merge 3 commits into NVIDIA:main from eliyacohen-hub:agent_breaker

Conversation

@eliyacohen-hub

Agent Breaker: Multi-turn red-team probe for agentic LLM applications

Adds a new probe (agent_breaker.AgentBreaker) that performs automated security testing of agentic LLM applications — systems that use tools (e.g. code execution, database queries, file access, API calls).

A red team model analyzes each tool for vulnerabilities, generates targeted exploits, attacks the agent in multi-turn conversations (learning from failures), and verifies attack success.

Key features:

  • Auto-discovery — if no tools are defined in config, the probe queries the target agent to discover its tools automatically
  • Parallel tool attacks — configurable max_parallel_tools (default: sequential)
  • Adaptive attacks — each attempt analyzes previous prompts/responses to improve exploits
  • Early stopping — stops attacking a tool immediately upon success

OWASP LLM Top 10: LLM01 (Prompt Injection), LLM07 (Insecure Plugin Design), LLM08 (Excessive Agency)
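
The adaptive, early-stopping attack loop described in the feature list can be sketched as follows. This is an illustrative sketch only; the names (attack_tool, next_exploit, is_success, etc.) are hypothetical and do not reflect the probe's actual API:

```python
# Sketch of the multi-turn adaptive attack loop with early stopping.
# All class and method names here are hypothetical, for illustration.
from dataclasses import dataclass, field


@dataclass
class AttackState:
    # (prompt, response) pairs from earlier turns, fed back to the
    # red team model so each new exploit can learn from failures
    history: list = field(default_factory=list)


def attack_tool(tool_name, red_team, target, verifier, max_turns=5):
    """Attack one tool, refining each exploit from prior failures."""
    state = AttackState()
    for turn in range(max_turns):
        # Red team model sees all earlier prompt/response pairs and adapts
        prompt = red_team.next_exploit(tool_name, state.history)
        response = target.send(prompt)
        state.history.append((prompt, response))
        # Early stopping: abandon the tool as soon as one exploit lands
        if verifier.is_success(tool_name, prompt, response):
            return {"tool": tool_name, "success": True, "turns": turn + 1}
    return {"tool": tool_name, "success": False, "turns": max_turns}
```

Per-tool runs like this are independent, which is what makes the configurable max_parallel_tools parallelism straightforward.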

Verification

  • Create a scan config YAML pointing to your target agent REST endpoint
  • python -m garak --config scan_config.yaml
  • python -m pytest tests/probes/test_agent_breaker.py tests/detectors/test_detectors_agent_breaker.py -v
  • Verify auto-discovery works when agent.yaml has no tools defined
  • Verify parallel and sequential tool attacks both work correctly
  • Verify results display: agent_breaker.AgentBreakerResult: FAIL ok on X/Y
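
For the first step, a scan config might look roughly like the sketch below. The field names follow garak's REST generator options but are approximate and should be checked against the current garak documentation; the URI and response field are placeholders:

```yaml
# Illustrative scan_config.yaml sketch (field names approximate;
# verify against garak's REST generator documentation)
plugins:
  probe_spec: agent_breaker
  model_type: rest
  generators:
    rest:
      RestGenerator:
        uri: https://example.com/agent/chat   # placeholder target endpoint
        method: post
        headers:
          Content-Type: application/json
        response_json: true
        response_json_field: reply            # placeholder field name
```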

Environment notes

  • Requires a red team model served via the NVIDIA Inference API; this can be swapped for any other LLM endpoint
  • Requires a target agent exposed as a REST endpoint (or any garak generator)
  • No specific hardware requirements (all inference is remote API calls)

eliyac-cyber and others added 3 commits February 24, 2026 10:49
Add a new generator wrapping the NVIDIA Inference API
(OpenAI-compatible endpoint at inference-api.nvidia.com).
Used by the AgentBreaker probe as the default red team model.

Co-authored-by: Cursor <cursoragent@cursor.com>
Multi-turn red-team probe that systematically attacks tool-using LLM
agents to identify security vulnerabilities. Key features:

- Analyzes each tool for exploitable vulnerabilities using a red team model
- Auto-discovers agent purpose and tools if not configured
- Configurable parallelism for concurrent tool attacks
- Per-tool exploit verification with confidence scoring
- Custom detector (AgentBreakerResult) for garak evaluator integration

Includes 41 tests (33 probe + 8 detector) covering config loading,
auto-discovery, attack orchestration, tool ordering, early stopping,
postprocess flag propagation, and verification parsing.
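
The verification-parsing step mentioned above could be sketched as follows, assuming (hypothetically — the actual verdict format is not shown in this PR) that the verifier model answers with a line like "VERDICT: success confidence=0.85":

```python
# Hypothetical sketch of verification parsing with confidence scoring.
# The "VERDICT: ... confidence=..." format is an assumption for
# illustration, not the probe's actual verifier output format.
import re

VERDICT_RE = re.compile(
    r"VERDICT:\s*(success|failure)\s+confidence=([01](?:\.\d+)?)",
    re.IGNORECASE,
)


def parse_verdict(text, threshold=0.7):
    """Return (attack_succeeded, confidence) parsed from verifier output."""
    m = VERDICT_RE.search(text)
    if not m:
        # Unparseable verifier output counts as "no confirmed exploit"
        return False, 0.0
    verdict, confidence = m.group(1).lower(), float(m.group(2))
    return verdict == "success" and confidence >= threshold, confidence
```

Treating unparseable output as a non-exploit keeps the detector conservative: a flaky verifier inflates false negatives rather than false positives.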
@github-actions
Contributor

DCO Assistant Lite bot:
Thank you for your submission, we really appreciate it. Like many open-source projects, we ask that you sign our Developer Certificate of Origin before we can accept your contribution. You can sign the DCO by posting a Pull Request comment in the format below.


I have read the DCO Document and I hereby sign the DCO


You can retrigger this bot by commenting recheck in this Pull Request

