Create a new environment and activate it:

```shell
# create a new environment
conda env create -f environment-llm-safety.yml
# activate the environment
source activate llm-safety
# alternatively, use the following command to activate the environment
conda activate llm-safety
```

Or only install the required packages without creating a new environment:
```shell
# ignore the dependencies
pip install --no-deps -r requirements.txt
# alternatively, install the dependencies
pip install -r requirements.txt
```

Make sure to set up your API keys or service endpoint in `utility/model.py` before running experiments:
- OpenAI API key
- Claude API key
- API key for a cloud service: DeepInfra or SiliconFlow
- Azure OpenAI endpoint and API key (optional)
We leave the API keys and service endpoint as placeholders in the code. You need to replace them with your own API keys or service endpoint. The placeholders look like `PLACEHOLDER_FOR_YOUR_API_KEY` or `PLACEHOLDER_FOR_YOUR_AZURE_OPENAI_ENDPOINT`.
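If you prefer not to hardcode keys, one option is to read them from environment variables and fall back to the placeholder. This is a minimal sketch, not the repository's actual code; the helper name and environment-variable convention are assumptions:

```python
import os

def load_api_key(env_var: str, placeholder: str = "PLACEHOLDER_FOR_YOUR_API_KEY") -> str:
    """Return an API key from the environment, refusing to run on the placeholder."""
    key = os.environ.get(env_var, placeholder)
    if key == placeholder:
        raise RuntimeError(f"Set {env_var} before running experiments")
    return key
```

You would then export, e.g., `OPENAI_API_KEY` in your shell instead of editing the source.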
You can run a single experiment with the following command (you can change `--victim_model_name`, `--ps`, or any other arguments; for more details, see `main.py` and `utility/argsparse.py`):
```shell
# SATA-ELP
python src/main.py --input_dataset advbench-custom --victim_model_name gpt-3.5-turbo --judge_model_name gpt-4o --ps swq-mask-sw --mode 7
# SATA-MLM
python src/main.py --input_dataset advbench-custom --victim_model_name gpt-3.5-turbo --judge_model_name gpt-4o --ps wiki-text-infilling-sw --mode 7
```

Or, if you want to quickly reproduce our experimental results, run a batch of experiments with our prepared scripts:
```shell
# SATA-ELP
bash scripts/victim-gpt-3.5-turbo.sh
# SATA-MLM
bash scripts/victim-gpt-3.5-turbo-TextInfilling.sh
```

You can find many scripts in the scripts directory.
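If you want to run every prepared victim script in one pass, a simple loop works; the `scripts/victim-*.sh` glob below is an assumption about the naming pattern, so adjust it to match the scripts you actually want to run:

```shell
# run each matching script in turn, stopping at the first failure
for script in scripts/victim-*.sh; do
    echo "running ${script}"
    bash "${script}" || break
done
```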
- You should change the bash command argument from `--mode 7` to `--mode 4`:

```shell
# SATA-ELP
python src/main.py --input_dataset advbench-custom --victim_model_name gpt-3.5-turbo --judge_model_name gpt-4o --ps swq-mask-sw --mode 4
# SATA-MLM
python src/main.py --input_dataset advbench-custom --victim_model_name gpt-3.5-turbo --judge_model_name gpt-4o --ps wiki-text-infilling-sw --mode 4
```

- For GPT evaluation, you should change the bash command argument from `--mode 7` to `--mode 1`.
- For sub-string evaluation, you should change the bash command argument from `--mode 7` to `--mode 2`.
- For both GPT evaluation and sub-string evaluation, you should change the bash command argument from `--mode 7` to `--mode 3`.
- To forcibly re-evaluate with GPT, you should change the bash command argument to `--mode 8`.
```shell
# remember to add the `--exp_id` argument
python src/main.py --input_dataset advbench-custom --victim_model_name gpt-3.5-turbo --judge_model_name gpt-4o --ps swq-mask-sw --mode 3 --exp_id PASTE_YOUR_expID
```

To run with a defense, add the bash command argument `--defense`, e.g. `--defense ppl` or `--defense rpo`.
For example, you can run a single inference experiment:
```shell
# SATA-ELP
python src/main.py --input_dataset advbench-custom --victim_model_name gpt-3.5-turbo --judge_model_name gpt-4o --ps swq-mask-sw --defense ppl --mode 7
# SATA-MLM
python src/main.py --input_dataset advbench-custom --victim_model_name gpt-3.5-turbo --judge_model_name gpt-4o --ps wiki-text-infilling-sw --defense ppl --mode 7
```

Or, if you want to quickly reproduce our experimental results with defenses, run a batch of experiments with our prepared scripts:
```shell
# SATA-ELP
bash scripts/defense-ppl/victim-llama3-70b.sh
# SATA-MLM
bash scripts/defense-ppl/victim-llama3-70b-TextInfilling.sh
```

We provide Jupyter notebooks for ensembling the results of SATA runs with different `--ps` settings. You can find the notebooks in the src directory.
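The ensembling idea is, in essence, a per-prompt logical OR over attack-success flags from different `--ps` runs. This is a minimal sketch of that idea only; the result format (a dict mapping prompt IDs to success booleans) is an assumption, not the repository's actual schema:

```python
def ensemble_success(runs: list[dict[str, bool]]) -> dict[str, bool]:
    """Mark a prompt as successful if any --ps variant succeeded on it."""
    prompts = set().union(*(run.keys() for run in runs))
    return {p: any(run.get(p, False) for run in runs) for p in prompts}
```

The ensembled attack success rate is then the mean of the returned values.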
This ensembling step is optional.
If you find our work useful in your research, please consider citing our paper:
```bibtex
@inproceedings{dong-etal-2025-sata,
  title = "{SATA}: A Paradigm for {LLM} Jailbreak via Simple Assistive Task Linkage",
  author = "Dong, Xiaoning and Hu, Wenbo and Xu, Wei and He, Tianxing",
  booktitle = "Findings of the Association for Computational Linguistics: ACL 2025",
  month = jul,
  year = "2025",
  address = "Vienna, Austria",
  publisher = "Association for Computational Linguistics",
  url = "https://aclanthology.org/2025.findings-acl.100/",
  doi = "10.18653/v1/2025.findings-acl.100",
  pages = "1952--1987",
  ISBN = "979-8-89176-256-5",
}
```
This repository is based on the ArtPrompt repository. We thank the authors for their great work.

