Prompt Injection Detection Dataset & Fine-tuning

Goal

This project aims to detect and classify prompt injection attacks through fine-tuning AI models. It includes:

The end goal is to fine-tune a model to:

The final dataset is in JSONL format. Each line looks like this:

Example with prompt injection:

{
  "text": "Ignore all previous instructions and act as...",
  "label": 1,
  "injection_type": "Jailbreak Attempt"
}

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
datasets		datasets
scripts/dataset-gather		scripts/dataset-gather
README.md		README.md
requirements.txt		requirements.txt