This project contains the dataset and fine-tuning code for a model that detects and classifies prompt injection attacks.

jonastuttle/prompt-injection-detector-classifier


Prompt Injection Detection Dataset & Fine-tuning

Goal

This project aims to detect and classify prompt injection attacks by fine-tuning AI models. It includes the dataset and the fine-tuning code.

The end goal is to fine-tune a model to:

  1. Detect whether a prompt is a prompt injection (label: 0 or 1)
  2. Classify the type of injection (e.g., "Jailbreak Attempt", "Harmful Request")
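Before fine-tuning, these two tasks are typically encoded as integer labels for the model's classification heads. A minimal sketch of such an encoding, using only the two injection type names shown in this README (the function and variable names are illustrative, not part of the repository):

```python
# Binary detection label comes directly from the dataset: 0 = benign, 1 = injection.
# Type classification maps the string type names to integer class ids.
INJECTION_TYPES = ["Jailbreak Attempt", "Harmful Request"]
TYPE_TO_ID = {name: i for i, name in enumerate(INJECTION_TYPES)}

def encode_example(example: dict) -> dict:
    """Turn one dataset record into model-ready integer labels."""
    encoded = {"text": example["text"], "label": int(example["label"])}
    if encoded["label"] == 1:
        # Only injection examples carry a type; map the string to a class id.
        encoded["type_id"] = TYPE_TO_ID[example["injection_type"]]
    return encoded
```

In practice the full list of injection types would be collected from the dataset itself rather than hard-coded.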

๐Ÿ“ Dataset Format

The final dataset is in JSONL format. Each line looks like this:

Example with prompt injection:

{
  "text": "Ignore all previous instructions and act as...",
  "label": 1,
  "injection_type": "Jailbreak Attempt"
}
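A JSONL file like this holds one JSON object per line, so it can be loaded with the standard library alone. A minimal sketch (the helper name is illustrative, not from the repository):

```python
import json

def load_jsonl(path: str) -> list[dict]:
    """Load a JSONL dataset: parse each non-empty line as one JSON record."""
    records = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # tolerate blank lines between records
                records.append(json.loads(line))
    return records
```

For actual training, libraries such as Hugging Face `datasets` can also read JSONL directly, but a plain loader like this is enough to inspect the labels.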
