Introduction to SATA

SATA-MLM

SATA-ELP

Before You Run

1. Create Environment for the Experiments

Create a new environment and activate it:

# create a new environment
conda env create -f environment-llm-safety.yml
# activate the environment
source activate llm-safety 
# alternatively, use the following command to activate the environment
conda activate llm-safety

Or, to install only the required packages without creating a new environment:

# install the packages without their dependencies
pip install --no-deps -r requirements.txt
# alternatively, install the packages together with their dependencies
pip install -r requirements.txt

2. Prepare API Keys or Service Endpoints for the Experiments

Make sure to set up your API keys or service endpoints in utility/model.py before running experiments.

  • OpenAI API key
  • Claude API key
  • API key for cloud services: DeepInfra or SiliconFlow
  • Azure OpenAI endpoint and API key (optional)

We leave the API keys and service endpoints as placeholders in the code. You need to replace them with your own API keys or service endpoint. The placeholders look like PLACEHOLDER_FOR_YOUR_API_KEY or PLACEHOLDER_FOR_YOUR_AZURE_OPENAI_ENDPOINT.
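If you prefer not to hard-code keys, one option is to read them from environment variables inside utility/model.py. The helper below is a hypothetical sketch, not part of this repository; it assumes you export the key (e.g. OPENAI_API_KEY) before running:

```python
import os

# Hypothetical helper, not part of this repository: read an API key from an
# environment variable and fail loudly if it is still unset, instead of
# hard-coding the key in utility/model.py.
def load_api_key(env_var: str, placeholder: str = "PLACEHOLDER_FOR_YOUR_API_KEY") -> str:
    key = os.environ.get(env_var, placeholder)
    if key == placeholder:
        raise RuntimeError(f"Set {env_var} before running experiments.")
    return key
```

For example, after `export OPENAI_API_KEY=...`, `load_api_key("OPENAI_API_KEY")` returns the key.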

Run SATA

Run Jailbreak Inference and Evaluation

You can run a single experiment with the following command (you can change --victim_model_name, --ps, or any other argument; for more details, please refer to main.py and utility/argsparse.py):

# SATA-ELP
python src/main.py --input_dataset advbench-custom --victim_model_name gpt-3.5-turbo --judge_model_name gpt-4o --ps swq-mask-sw --mode 7
# SATA-MLM
python src/main.py --input_dataset advbench-custom --victim_model_name gpt-3.5-turbo --judge_model_name gpt-4o --ps wiki-text-infilling-sw --mode 7

Or, to quickly reproduce our experimental results, run a batch of experiments with our prepared scripts:

# SATA-ELP
bash scripts/victim-gpt-3.5-turbo.sh
# SATA-MLM
bash scripts/victim-gpt-3.5-turbo-TextInfilling.sh

You can find many scripts in the scripts directory.

Run Jailbreak Inference Only

  • Change the bash command argument from --mode 7 to --mode 4
# SATA-ELP
python src/main.py --input_dataset advbench-custom --victim_model_name gpt-3.5-turbo --judge_model_name gpt-4o --ps swq-mask-sw --mode 4
# SATA-MLM
python src/main.py --input_dataset advbench-custom --victim_model_name gpt-3.5-turbo --judge_model_name gpt-4o --ps wiki-text-infilling-sw --mode 4

Run Jailbreak Evaluation Only

  • For GPT evaluation, change the bash command argument from --mode 7 to --mode 1
  • For sub-string evaluation, change the bash command argument from --mode 7 to --mode 2
  • For both GPT and sub-string evaluation, change the bash command argument from --mode 7 to --mode 3
  • To forcibly re-evaluate with GPT, change the bash command argument to --mode 8
# remember to add `--exp_id` argument
python src/main.py --input_dataset advbench-custom --victim_model_name gpt-3.5-turbo --judge_model_name gpt-4o --ps swq-mask-sw --mode 3 --exp_id PASTE_YOUR_expID

Run SATA with Jailbreak Defense

Add the --defense argument, e.g. --defense ppl or --defense rpo. For example, you can run a single inference experiment:

# SATA-ELP
python src/main.py --input_dataset advbench-custom --victim_model_name gpt-3.5-turbo --judge_model_name gpt-4o --ps swq-mask-sw --defense ppl --mode 7
# SATA-MLM
python src/main.py --input_dataset advbench-custom --victim_model_name gpt-3.5-turbo --judge_model_name gpt-4o --ps wiki-text-infilling-sw --defense ppl --mode 7

Or, to quickly reproduce our experimental results with defense, run a batch of experiments with our prepared scripts:

# SATA-ELP
bash scripts/defense-ppl/victim-llama3-70b.sh
# SATA-MLM
bash scripts/defense-ppl/victim-llama3-70b-TextInfilling.sh

Ensemble the Results of SATA (Optional)

We provide Jupyter notebooks for ensembling the results of SATA across different --ps settings. You can find the notebooks in the src directory. This step is optional.
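Conceptually, ensembling counts a prompt as jailbroken if any of the --ps variants succeeded on it. The function below is a minimal sketch under that assumption (the actual notebooks in src may aggregate differently); it assumes each per-strategy result is a dict mapping prompt IDs to success flags:

```python
# Hypothetical sketch: a prompt counts as successful in the ensemble if any
# --ps variant succeeded on it. The repo's notebooks may aggregate differently.
def ensemble_results(results_by_ps: dict[str, dict[str, bool]]) -> dict[str, bool]:
    ensembled: dict[str, bool] = {}
    for ps_results in results_by_ps.values():
        for prompt_id, success in ps_results.items():
            ensembled[prompt_id] = ensembled.get(prompt_id, False) or success
    return ensembled
```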

Citation

If you find our work useful in your research, please consider citing our paper:

@inproceedings{dong-etal-2025-sata,
    title = "{SATA}: A Paradigm for {LLM} Jailbreak via Simple Assistive Task Linkage",
    author = "Dong, Xiaoning  and
      Hu, Wenbo  and
      Xu, Wei  and
      He, Tianxing",
    booktitle = "Findings of the Association for Computational Linguistics: ACL 2025",
    month = jul,
    year = "2025",
    address = "Vienna, Austria",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.findings-acl.100/",
    doi = "10.18653/v1/2025.findings-acl.100",
    pages = "1952--1987",
    ISBN = "979-8-89176-256-5",
}

Acknowledgement

This repository is based on the ArtPrompt repository. We thank the authors for their great work.

About

Repository of the SATA paradigm for LLM jailbreak.
