Contrastive Activation Steering for Amortized Learning (CASAL)

🏰 CASAL is a method for training large language models with activation steering. The accompanying paper applies it to hallucination reduction.

✨ Features

Activation Steering: Implementation of activation steering during inference
Training: Support for CASAL training
MoE Model Support: Support for training Mixture-of-Experts (MoE) models
Visual Model Support: Support for training vision-language models
PCA Visualization: Visualization tools for analyzing activation patterns before and after training
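
As an illustration of the PCA visualization feature, the sketch below projects a matrix of activations onto its top two principal components with NumPy. It is a minimal sketch under stated assumptions, not the code in ACTIVATION_PCA/: the function name `pca_project` and the synthetic activation arrays are hypothetical stand-ins for activations collected from a real model.

```python
import numpy as np


def pca_project(activations: np.ndarray, n_components: int = 2) -> np.ndarray:
    """Project (n_samples, hidden_dim) activations onto the top principal components."""
    # Center each feature so the SVD captures variance rather than the mean.
    centered = activations - activations.mean(axis=0, keepdims=True)
    # Rows of vt are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


rng = np.random.default_rng(0)
# Synthetic stand-ins for activations from two prompt groups (hypothetical data).
group_a = rng.normal(loc=0.0, scale=1.0, size=(50, 16))
group_b = rng.normal(loc=3.0, scale=1.0, size=(50, 16))

projected = pca_project(np.vstack([group_a, group_b]))
print(projected.shape)
```

Running the same projection on activations collected before and after training, then plotting the two components, is one way to compare how the groups separate.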

📁 Project Structure

CASAL/
├── CONFIG/         # Configuration files for different training methods
├── DATA/           # Data loading and preprocessing utilities
├── CASAL/          # Core CASAL implementation
├── EVAL/           # Evaluation scripts and utilities
├── MODEL/          # Model utilities that support various model families
├── UTILS/          # General utility functions
├── ACTIVATION_PCA/ # PCA analysis tools for activation visualization
└── ANALYSIS/       # Analysis and plotting tools

⚙️ 1. Setup

Conda Environment

Make sure you have conda installed on your system.

conda create --name casal python=3.11.9
conda activate casal

📥 Installation

Clone the code base:

git clone https://github.com/facebookresearch/CASAL.git

Install dependencies:
For AWS users:

pip install -r requirements.txt

🔑 Huggingface and WANDB API Key

Use your own Hugging Face and Weights & Biases (wandb) tokens.
Create a .env file in the root folder and include the following keys:

# Hugging Face API Token
HF_TOKEN=

# Weights & Biases API Token
WANDB_API_KEY=
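
Before launching a run, it can help to confirm that both keys are actually filled in. The sketch below parses text in the .env format shown above using only the standard library and reports which keys are empty. The helper names `parse_env_text` and `missing_keys` are hypothetical, the token value is a fake placeholder, and how the repository itself loads the .env file is not shown here.

```python
def parse_env_text(text: str) -> dict:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values: dict, required=("HF_TOKEN", "WANDB_API_KEY")) -> list:
    """Return the required keys that are absent or empty."""
    return [key for key in required if not values.get(key)]


# Example .env contents with a fake placeholder token (not a real credential).
example = (
    "# Hugging Face API Token\n"
    "HF_TOKEN=hf_example\n"
    "\n"
    "# Weights & Biases API Token\n"
    "WANDB_API_KEY=\n"
)
values = parse_env_text(example)
print(missing_keys(values))  # ['WANDB_API_KEY']
```

To check a real file, pass its contents instead, for example `parse_env_text(open(".env").read())`.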

🚀 2. Quick Start

🎯 Launch Activation Steering

To get started, run inference-time activation steering:

python run_casal_steering.py
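
For readers new to the idea, inference-time activation steering is commonly implemented by building a steering vector from two contrastive sets of activations and adding it to the hidden states. The sketch below shows that generic difference-of-means recipe with NumPy. It is an assumption about the general technique, not a description of what run_casal_steering.py does: the function names, the `strength` parameter, and the synthetic arrays are all hypothetical.

```python
import numpy as np


def steering_vector(positive_acts: np.ndarray, negative_acts: np.ndarray) -> np.ndarray:
    """Difference of mean activations between two contrastive prompt sets."""
    return positive_acts.mean(axis=0) - negative_acts.mean(axis=0)


def apply_steering(hidden: np.ndarray, vector: np.ndarray, strength: float = 1.0) -> np.ndarray:
    """Add the scaled steering vector to every hidden state (broadcast over rows)."""
    return hidden + strength * vector


rng = np.random.default_rng(0)
hidden_dim = 8
# Synthetic stand-ins for layer activations on two contrastive prompt sets.
positive = rng.normal(1.0, 0.1, size=(32, hidden_dim))
negative = rng.normal(-1.0, 0.1, size=(32, hidden_dim))

vec = steering_vector(positive, negative)
hidden = np.zeros((4, hidden_dim))  # hypothetical hidden states for 4 tokens
steered = apply_steering(hidden, vec, strength=0.5)
print(steered.shape)
```

In a real model the addition would typically happen inside a forward hook on a chosen layer, and the steering strength is a hyperparameter to tune.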

🏰 3. CASAL Training

🚀 Launch Contrastive Activation Steering for Amortized Learning (CASAL)

python run_casal_training.py

4. Eval

Baseline Eval

python run_baseline_eval.py

Post-CASAL Eval

python run_post_casal_eval.py

📚 Citation

If you use CASAL in your research, please cite:

@inproceedings{
yang2026hallucination,
title={Hallucination Reduction with CASAL: Contrastive Activation Steering for Amortized Learning},
author={Wannan Yang and Xinchi Qiu and Lei Yu and Yuchen Zhang and Aobo Yang and Narine Kokhlikyan and Nicola Cancedda and Diego Garcia-Olano},
booktitle={The Fourteenth International Conference on Learning Representations},
year={2026},
url={https://openreview.net/forum?id=YM3RcI3q0E}
}

License

CASAL is MIT licensed, as found in the LICENSE file.
