
This code base implements the core algorithm of the paper Hallucination Reduction with CASAL: Contrastive Activation Steering for Amortized Learning.

facebookresearch/CASAL

Contrastive Activation Steering for Amortized Learning (CASAL)

🏰 CASAL is a method for training large language models with activation steering.

✨ Features

  • Activation Steering: Implementation of activation steering during inference
  • Training: Support for CASAL training
  • MoE Model Support: Support for training Mixture-of-Experts (MoE) models
  • Visual Model Support: Support for training vision-language models
  • PCA Visualization: Visualization tools for analyzing activation patterns before and after training

📁 Project Structure

CASAL/
├── CONFIG/         # Configuration files for different training methods
├── DATA/           # Data loading and preprocessing utilities
├── CASAL/          # Core CASAL implementation
├── EVAL/           # Evaluation scripts and utilities
├── MODEL/          # Model utilities that support various model families 
├── UTILS/          # General utility functions
├── ACTIVATION_PCA/ # PCA analysis tools for activation visualization
└── ANALYSIS/       # Analysis and plotting tools

⚙️ 1. Setup

Conda Environment

Make sure you have conda installed on your system.

conda create --name casal python=3.11.9
conda activate casal

📥 Installation

  1. Clone the code base:
git clone https://github.com/facebookresearch/CASAL.git
  2. Install dependencies:

  • For AWS users:

pip install -r requirements.txt

🔑 Huggingface and WANDB API Key

  • Use your own Hugging Face and Weights & Biases (wandb) API tokens.
  • Create a .env file in the root folder and include the following keys:
# Hugging Face API Token
HF_TOKEN=

# Weights & Biases API Token
WANDB_API_KEY=
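
Before launching any script, it can help to confirm that both keys are actually filled in. The following is a minimal sketch of such a check using only the Python standard library. The helper names `load_env_file` and `missing_keys` are hypothetical and are not part of this repository, and the repository's own scripts may read the .env file differently.

```python
import os
from pathlib import Path


def load_env_file(path=".env"):
    """Parse KEY=VALUE lines from a .env file and return them as a dict.

    Hypothetical helper for illustration only; not part of the CASAL code base.
    Non-empty values are also exported to os.environ unless already set there.
    """
    parsed = {}
    env_path = Path(path)
    if not env_path.exists():
        return parsed
    for raw in env_path.read_text().splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and anything that is not KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        value = value.strip().strip('"').strip("'")
        parsed[key] = value
        if value:
            os.environ.setdefault(key, value)
    return parsed


def missing_keys(env, required=("HF_TOKEN", "WANDB_API_KEY")):
    """Return the required keys that are absent or left empty."""
    return [key for key in required if not env.get(key)]
```

Calling `missing_keys(load_env_file())` from the root folder returns an empty list once both tokens are set, and otherwise returns the names of the keys that still need a value.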

🚀 2. Quick Start

🎯 Launch Activation Steering

To get started, run inference-time activation steering:

python run_casal_steering.py
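
For readers new to the idea, here is a toy sketch of what contrastive activation steering generally means: a steering direction is taken as the difference between the mean activations of two contrasting sets of examples, and that direction is added to a hidden state at inference time. This is a generic difference-of-means illustration in plain Python, not the logic of run_casal_steering.py; the function names and the `alpha` scaling parameter are assumptions made for the example.

```python
def mean_vector(rows):
    """Element-wise mean of a list of equal-length activation vectors."""
    n = len(rows)
    return [sum(column) / n for column in zip(*rows)]


def steering_vector(positive_acts, negative_acts):
    """Difference of mean activations between two contrasting example sets."""
    pos_mean = mean_vector(positive_acts)
    neg_mean = mean_vector(negative_acts)
    return [p - q for p, q in zip(pos_mean, neg_mean)]


def steer(hidden, direction, alpha=1.0):
    """Shift a hidden state along the steering direction, scaled by alpha."""
    return [h + alpha * d for h, d in zip(hidden, direction)]


# Toy 2-dimensional activations (illustrative numbers, not real model data).
positive = [[1.0, 2.0], [3.0, 4.0]]
negative = [[0.0, 0.0], [2.0, 2.0]]
direction = steering_vector(positive, negative)
steered = steer([0.0, 0.0], direction, alpha=2.0)
```

In a real model the vectors would be residual-stream activations collected at a chosen layer, and the shift would be applied inside the forward pass rather than on Python lists.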

🏰 3. CASAL Training

🚀 Launch Contrastive Activation Steering for Amortized Learning (CASAL)

python run_casal_post_training.py

4. Eval

Baseline Eval

python run_baseline_eval.py

Post-CASAL Eval

python run_post_casal_eval.py

📚 Citation

If you use CASAL in your research, please cite:

@inproceedings{
yang2026hallucination,
title={Hallucination Reduction with CASAL: Contrastive Activation Steering for Amortized Learning},
author={Wannan Yang and Xinchi Qiu and Lei Yu and Yuchen Zhang and Aobo Yang and Narine Kokhlikyan and Nicola Cancedda and Diego Garcia-Olano},
booktitle={The Fourteenth International Conference on Learning Representations},
year={2026},
url={https://openreview.net/forum?id=YM3RcI3q0E}
}

License

CASAL is MIT licensed, as found in the LICENSE file.
