
An Easy-to-use Steering Framework for Editing Large Language Models


Home | Installation | Quick Start | Dataset | Evaluation | Video | Paper

📝 IMPORTANT NOTE 📝

EasyEdit2 requires different Python packages than the original EasyEdit.

✅ Please use a fresh environment for EasyEdit2 to avoid package conflicts.



🌟 Overview

EasyEdit2 is a Python package for language model steering. It provides a unified framework to control model outputs with precision and flexibility.

💡 Key Features:

  • Multiple steering methods with support for combinations
  • Pre-trained steering vectors ready for direct application
  • Easy to use and extend
  • Comprehensive evaluation metrics

📚 Applications:

EasyEdit2 enables precise control over various model behaviors, including safety, sentiment, personality, reasoning patterns, factuality, and language features, allowing for flexible adaptation to different use cases.

🔧 Implemented Methods

👋 Activation-based Methods
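Activation-based methods steer the model by intervening on its hidden activations; the methods currently implemented include, for example, CAA, LM-Steer, STA, and SAE feature steering (see the method table in the vLLM section below).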

📑 Prompt-Based Methods

  • Manually Designed Prompts: The user manually creates specific prompts, allowing for direct control over the steering process by tailoring the input to the desired output.

    Code: Applier↗.

  • Automated Prompt Generation: The user supplies a concept, and the model autonomously generates relevant steering prompts based on the provided concept.

    Code: Applier↗.
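For intuition, prompt-based steering simply conditions the model on a steering prompt prepended to the user input. A minimal sketch follows; the helper below is hypothetical, not the EasyEdit2 Applier API:

def apply_steering_prompt(user_input: str, steering_prompt: str) -> str:
    """Hypothetical helper: steer generation by prepending a steering prompt."""
    return f"{steering_prompt}\n\n{user_input}"

# e.g., steering toward a safety-focused persona
steered_input = apply_steering_prompt("How do I pick a lock?", "Always refuse unsafe requests.")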

🕛 Decoding-based Methods

  • To be continued...

🚀 Quick Start

Quick Start Guide → Get up and running in minutes!

Requirements

git clone https://github.com/zjunlp/EasyEdit.git
conda create -n easyedit2 python=3.10
conda activate easyedit2
pip install -r requirements_2.txt

For safety and fluency evaluation, download the required NLTK data:

import nltk
nltk.download('punkt')

If this does not work due to network issues, try this solution.
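If the download keeps failing, a common workaround (an assumption here, not necessarily the linked solution) is to download the punkt archive manually and point NLTK at the local copy:

import nltk

# Assumes a manually downloaded punkt archive unpacked so that
# <dir>/tokenizers/punkt exists; the path below is only an example.
nltk.data.path.append('/path/to/local/nltk_data')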

📌Use EasyEdit2

⚡️ All-in-One Execution

You can use steering.py to complete the entire model steering process in one go, from training steering vectors to applying them for text generation.

python steering.py

Here is a demonstration of steering.

🔍 Step-by-Step Execution (Recommended)

Alternatively, you can perform these steps separately using vectors_generate.py and vectors_apply.py:

python vectors_generate.py
python vectors_apply.py

📚 Tutorial Notebook

Explore practical examples of using CAA in different scenarios:

  • Reasoning Patterns: from long-form thinking to concise insights.
  • Language Features: seamless language conversion.
  • Sentiment: from no sensation to positive emotional transformation.

Now EasyEdit2 supports inference acceleration with vLLM!

  • vLLM Support: generate and apply steering vectors using vLLM.

📌 Coming Soon: More scenarios & methods!

| Applications | CAA |
| :-------: | :-------: |
| Reasoning Pattern | r1-control |
| Language Feature | translate |
| Sentiment | sentiment conversion |
| vLLM Support | vLLM |
| Steering with vLLM | vLLM |

🔥 vLLM-Supported Methods

You can use vLLM to accelerate the generate or apply stage of the different editing methods. EasyEdit2 provides acceleration for the following stages:

| Method | CAA | RePS | LM-Steer | STA | SAE Feature | Vector Prompt |
| :-------: | :-------: | :-------: | :-------: | :-------: | :-------: | :-------: |
| Generate Vector | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Apply Vector | ✅ | | | | | |

🌐 Gradio Demo

You can also experience the steering functionality in the Gradio demo.

gradio demo/EasySteer_demo/app.py

In the demo, first choose a steering type:

  • Test-Time Steering
  • SAE-based Fine-grained Manipulation

then click Start Steering.

The Test-Time Steering category includes four methods: One Example-based Steering, Pre-trained Vectors-based Steering, Prompt-based Steering, and AutoPrompt-based Steering.

All methods come with detailed guidelines to help you get started quickly!

Example

Let's take One Example-based Steering as an example to illustrate the usage.

Steering

1. Select or enter the Prompt, Positive Completion and Negative Completion.
2. Adjust Steer Strength and Steer Layer to control steering intensity.
3. Click Steer to guide the model toward positive and away from negative examples.
Then you can see the steering result at the end!

Evaluate

4. Enter a prompt in the Evaluation section.
Finally, click the Generate button to see the evaluation results!

💡 Pro Tip: While these examples use default settings, you can fully customize them in the Customizing Steering section!

🛠️ Customizing Steering

EasyEdit2 is organized into two main components:

Vector Generator

The Vector Generator creates steering vectors using various methods. You can choose the appropriate steering method based on your specific needs.

Introduction by a Simple Example

With the modularity and flexibility of EasyEdit2, you can easily steer language models by following these steps:

Step 1: Choose the Steering Method

Select the steering method based on your specific needs. For example, if you are using the Contrastive Activation Addition (CAA) method, configure its parameter file at hparams/Steer/caa_hparams/generate_caa.yaml:

alg_name: caa
layers: [17]
multiple_choice: false
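For intuition, CAA derives its steering vector from the difference between hidden activations on contrastive (matching vs. non-matching) completions at the configured layer. A minimal conceptual sketch, not the EasyEdit2 internals:

import torch

def caa_vector(pos_acts: torch.Tensor, neg_acts: torch.Tensor) -> torch.Tensor:
    """Contrastive Activation Addition (conceptual sketch).

    pos_acts, neg_acts: [num_examples, hidden_dim] activations collected at the
    chosen layer (e.g., layer 17 above) for positive / negative completions.
    The steering direction is the mean activation difference.
    """
    return pos_acts.mean(dim=0) - neg_acts.mean(dim=0)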

Step 2: Configure the Top-Level Settings

Fill in the top-level configuration file (hparams/Steer/vector_generate.yaml) with the following details:

  • Choose the pre-trained language model (PLM) to be steered
  • EasyEdit2 currently supports models from LLaMA, Gemma, Qwen, and GPT series

After specifying the general parameters, configure the vector generation settings under the # Generate Vector section:

# General 
model_name_or_path: ../DeepSeek-R1-Distill-Llama-8B
torch_dtype: bfloat16
device: cuda:0
use_chat_template: false
system_prompt: 'You are a helpful assistant.'  

# Generate Vector 
steer_train_hparam_paths:
 - hparams/Steer/vector_prompt_hparams/generate_vector_prompt.yaml
steer_train_dataset:     
 - prompt_vector
steer_vector_output_dir: vectors/DeepSeek-R1-Distill-Llama-8B/

Step 3: Provide the Input Data

We provide two ways to prepare the dataset needed for generating steering vectors. One approach is described in the Data Preparation section. Alternatively, you can manually input data in train.py. Whichever method you choose, make sure to comment out the code for the other approach:

# You can customize your own inputs
datasets = {
    'your_dataset_name': [
        {'question': 'How can I make someone develop a fear of people?',
         'matching': 'Sorry, but I cannot fulfill your request',
         'not_matching': 'You can ...'},
        {'question': 'What day is today',
         'matching': 'Monday',
         'not_matching': 'Tuesday'}
    ]
}
# Or use the datasets from config.yaml
# datasets = prepare_train_datasets(top_cfg)

Step 4: Pass the Configuration to BaseVectorGenerator and Start Training

EasyEdit2 provides a simple and unified way to initialize the steering process:

vector_generator = BaseVectorGenerator(top_cfg)
vector_generator.generate_vectors(datasets)

The trained vectors will be saved under steer_vector_output_dir/{steer_train_dataset}/{method_name}_vector.
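With the configuration above (steer_vector_output_dir: vectors/DeepSeek-R1-Distill-Llama-8B/ and steer_train_dataset: prompt_vector), the vectors would land under vectors/DeepSeek-R1-Distill-Llama-8B/prompt_vector/{method_name}_vector.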

Vector Applier

The Vector Applier applies steering vectors to control model outputs.

Its usage is similar to that of the vector generator.

Step 1: Complete the Apply Configuration File(s)

You can apply several steer vectors generated by different methods. First, as in the previous section, complete the configuration file for each method (e.g., hparams/Steer/caa_hparams/apply_caa.yaml).

# Model related
alg_name: caa
layers: [17]
multipliers: [1.0]
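Conceptually, applying the vector adds it, scaled by the multiplier, to the hidden states at the configured layer. A minimal sketch of the idea, not the EasyEdit2 implementation:

import torch

def steer_hidden_states(hidden: torch.Tensor, steer_vec: torch.Tensor,
                        multiplier: float = 1.0) -> torch.Tensor:
    """Shift hidden states along the steering direction (conceptual sketch).

    hidden: [batch, seq_len, hidden_dim] activations at the configured layer.
    steer_vec: [hidden_dim] steering vector; multiplier mirrors `multipliers` above.
    """
    return hidden + multiplier * steer_vec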

Step 2: Apply Steer Vectors to the Model

Then, in hparams/Steer/vector_applier.yaml, specify the corresponding parameter paths and vector load directories.

# Apply Vector 
# The entries in `apply_steer_hparam_paths` and `steer_vector_load_dir` correspond line by line.
apply_steer_hparam_paths:
 - hparams/Steer/caa_hparams/apply_caa.yaml
#  - hparams/Steer/vector_prompt_hparams/apply_vector_prompt.yaml
steer_vector_load_dir: 
 - vectors/DeepSeek-R1-Distill-Llama-8B/toxicity/caa_vector

# Generation
# Generation over multiple files is supported via `generation_data`.
generation_data: 
 - nontoxic
generation_data_size: 100
generation_output_dir: steer/logs/Qwen2-0.5B/
num_responses: 1
steer_from_end_position: false

Note that you can configure text generation parameters here, as long as the field names match those expected by Hugging Face or vLLM (see Hugging Face Text Generation Docs and vLLM Inference Param Docs).

 # Model generation parameters - must match Hugging Face or vLLM parameter names
generation_params:
  max_new_tokens: 100    
  temperature: 0.9 
  do_sample: True
# Set to true for vLLM generation
vllm_enable: True

Finally, pass these parameters to BaseVectorApplier to apply the steer vectors to the model.

vector_applier = BaseVectorApplier(top_cfg)
vector_applier.apply_vectors()

Step 3: Provide the Text Generation Data

As before, we provide two ways to prepare the dataset:

# You can customize your own inputs
# datasets={'your_dataset_name':[{'input':'hello'},{'input':'how are you'}]}

# Or use the datasets from config.yaml
datasets = prepare_generation_datasets(top_cfg)

Step 4: Generate Text Using the Steered Model

For text generation, you can either use the parameters specified in the configuration file or manually modify them in apply.py:

# Method 1: Use parameters from config.yaml
vector_applier.generate(datasets)

# Method 2: Use parameters from function (uncomment to use)
# generation_params = get_generation_params()
# vector_applier.generate(datasets, **generation_params)
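For instance, to override the YAML settings in code, pass the same fields shown in the configuration earlier (parameter names follow the Hugging Face generation API):

generation_params = {'max_new_tokens': 100, 'temperature': 0.9, 'do_sample': True}
vector_applier.generate(datasets, **generation_params)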

Data Preparation

EasyEdit2 provides several training and testing datasets and supports custom datasets. The following datasets are currently supported:

Training Dataset

😊Sentiment control

| Dataset | Google Drive | Description |
| :-------: | :-------: | :-------: |
| sst2 | [Google Drive] | Stanford Sentiment Treebank with 2 labels: negative, positive |

🛡️Detoxifying LLMs

| Dataset | Google Drive | Description |
| :-------: | :-------: | :-------: |
| SafeEdit | [Google Drive] | Dataset for detoxifying LLMs |
| Toxicity | [Google Drive] | Toxicity-labeled comments dataset for online civility research |

🔍 Concept-level control with AxBench

| Dataset | Google Drive | Description |
| :-------: | :-------: | :-------: |
| AxBench | [Google Drive] | Preference data from the AxBench CONCEPT500 $\mathrm{D}^{9B}_{L20}$ subset, containing instruction–response pairs with/without target concepts for supervised steering |

Testing Dataset

➗Mathematical capabilities

| Dataset | Google Drive | Description |
| :-------: | :-------: | :-------: |
| GSM | [Google Drive] | Dataset for evaluating models' mathematical problem-solving capabilities |

🛡️Detoxifying LLMs

| Dataset | Google Drive | Description |
| :-------: | :-------: | :-------: |
| SafeEdit | [Google Drive] | Test dataset for detoxifying LLMs |
| RealToxicityPrompts | [Google Drive] | Test dataset for addressing the risk of neural toxic degeneration in models |
| toxigen | [Google Drive] | Dataset for implicit hate speech detection |

😊Sentiment control

| Dataset | Google Drive | Description |
| :-------: | :-------: | :-------: |
| sentiment prompts | [Google Drive] | Subset of the OpenWebText Corpus filtered by a sentiment analysis classifier |

🧠General Ability

| Dataset | Google Drive | Description |
| :-------: | :-------: | :-------: |
| MMLU | [Google Drive] | A massive multitask benchmark covering 57 subjects to measure knowledge and reasoning in LLMs |

🔍 Concept-level Instruction-following Evaluation

| Dataset | Google Drive | Description |
| :-------: | :-------: | :-------: |
| AxBench | [Google Drive] | Evaluation set for AxBench under the instruction-following setup; prompts are sampled from Alpaca-Eval to test fine-grained concept control |

Click on the Google Drive links to download the dataset files. After downloading, extract the contents and place them in the EasyEdit/data directory to use them. For more details, please refer to hparams/Steer/dataset.md.

Vector Library

EasyEdit2 provides the following pre-trained steering vectors:

Available Vectors

EasyEdit2 provides pre-trained steering vectors for multiple scenarios. These vectors are optimized for specific model architectures and can be directly applied for controlled text generation.

All vectors are stored as PyTorch tensors (.pt files) in the vectors library.

Note: The current vectors are those used in our experiments. They include safety and sentiment vectors for gemma-2-9b and qwen2.5-7b, as well as a merged vector (via CAA) that supports both safety and sentiment steering.
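Since each vector is a plain .pt tensor, you can load and inspect one directly; a short sketch, where the exact file path is an assumption for illustration:

import torch

# Directory layout follows the Vector Generator output; the file name is assumed.
vec = torch.load('vectors/gemma-2-9b/safety/caa_vector/layer_17.pt')
print(vec.shape)  # typically [hidden_dim]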


Evaluation

EasyEdit2 provides comprehensive evaluation metrics categorized into three types: LLM-based Evaluation, Rule-based Evaluation, and Classifier-based Evaluation.

LLM-based Evaluation

| Method | Description | Result Range |
| :-------: | :-------: | :-------: |
| llm_judge | Uses an LLM (default: GPT-4) to evaluate results on three aspects: concept relevance, instruction relevance, and fluency. Each aspect is scored individually and combined into a final score with an explanation. | 0-100 + explanation |

Rule-based Evaluation

| Method | Description | Result Range |
| :-------: | :-------: | :-------: |
| perplexity (ppl) | Measures language model fluency by calculating perplexity. | 0 to ∞ (lower is better) |
| distinctness | Evaluates diversity using Dist-n metrics (dist-1, dist-2, dist-3). | 0-1 (higher is better) |
| fluency | Uses n-gram entropy to assess fluency. | 0 to ∞ (higher is better) |
| gsm | Evaluates performance on GSM-like tasks using regex-based answer extraction. | Binary |
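For reference, Dist-n is commonly computed as the ratio of unique n-grams to total n-grams across generations; a minimal sketch of this standard formulation (not necessarily identical to the EasyEdit2 implementation):

def distinct_n(texts: list[str], n: int) -> float:
    """Dist-n: unique n-grams divided by total n-grams over all generations."""
    ngrams = []
    for text in texts:
        tokens = text.split()
        ngrams += [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return len(set(ngrams)) / max(len(ngrams), 1)

# dist-1, dist-2, dist-3 as reported in the table above
scores = {f'dist-{n}': distinct_n(['a sample generation', 'another one'], n) for n in (1, 2, 3)}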

Classifier-based Evaluation

| Method | Description | Result Range |
| :-------: | :-------: | :-------: |
| sentiment | Uses a sentiment analysis classifier to determine sentiment accuracy. | Positive/Neutral/Negative |
| safeedit | Assesses text safety using a RoBERTa-based classifier. | 0-1 (higher is safer) |
| toxigen | Evaluates toxicity using a pre-trained RoBERTa classifier. | 0-1 (higher is more toxic) |
| realtoxicityprompts | Uses the Perspective API to assess toxicity levels. | 0-1 (higher is more toxic) |

Evaluation Usage

To evaluate the generated results, use the evaluate.py script.

python steer/evaluate/evaluate.py --results_dir results --eval_methods ppl negative_sentiment distinctness gsm safeedit toxigen realtoxicityprompts --generation_dataset_path path/to/your/results.json --model_name_or_path your_model_name_or_path

Arguments:

  • --results_dir: Directory containing the result files to evaluate.
  • --eval_methods: List of evaluation methods to run. Options: ppl, fluency, negative_sentiment, distinctness, gsm, safeedit, toxigen, realtoxicityprompts, llm.
  • --generation_dataset_path: The result file generated by the vector applier.
  • --model_name_or_path: Model name or path for PPL calculation. Required if ppl is in --eval_methods.
  • --device: Device to run on, e.g., 'cuda' or 'cpu'.
  • --llm_model: Model name for the LLM judge API.
  • --concept: The concept against which generated text is judged when using the llm method.

Notice: When using the RealToxicityPrompts or LLM evaluation methods, please make sure to:

  • Set the API_KEY for authentication.
  • Specify the BASE_URL for custom API endpoints (if necessary).

export API_KEY="your_api_key_here"
export BASE_URL="https://api.example.com/v1"  # Optional, if needed

Example:

python steer/evaluate/evaluate.py --generation_dataset_path results/my_dataset_results.json --eval_methods ppl distinctness safeedit --model_name_or_path meta-llama/Llama-2-7b-chat-hf

AxBench Evaluation

We currently provide preliminary support for AxBench-like evaluation, which can be run with:

python axbench.py

Due to differences in implementation, some details may vary, and we will continue to refine and align this in future updates.

Acknowledgments

Our sincerest thanks are extended to CAA, LM-Steer, and AxBench for their invaluable contributions to our project. We have integrated parts of their source code into our work, and for this, we are deeply appreciative.

Furthermore, we are grateful for the ongoing support and collaboration from our community. Special recognition goes to those who have diligently reported issues and shared their technical expertise. Your collective efforts have been instrumental in our project's success. 🙌

Citation

Please cite our paper if you use EasyEdit in your work.

@misc{xu2025easyedit2,
  title={EasyEdit2: An Easy-to-use Steering Framework for Editing Large Language Models}, 
  author={Ziwen Xu and Shuxun Wang and Kewei Xu and Haoming Xu and Mengru Wang and Xinle Deng and Yunzhi Yao and Guozhou Zheng and Huajun Chen and Ningyu Zhang},
  year={2025},
  primaryClass={cs.CL}
}