Skip to content

KingdalfGoodman/LoTA-QAF

Repository files navigation

LoTA-QAF: Lossless Ternary Adaptation for Quantization-Aware Fine-Tuning

LoTA-QAF: Accepted to NeurIPS'25 as a poster. 😁

arxiv.org/abs/2505.18724

This repository contains the code for LoTA-QAF, a novel fine-tuning method for quantized Large Language Models (LLMs). It enables the lossless merging of ternary adaptation weights and the adjustment of all quantized weights. LoTA-QAF combines:

  • Custom-designed Ternary Adaptation (TA) that aligns ternary weights with the quantization grid to adjust quantized weights.
  • A TA-based mechanism for the lossless merging of adaptation weights.
  • Ternary Signed Gradient Descent (t-SignSGD) for updating TA weights.

1. File Structure

  • Core Logic:

    • LoTA_QAF_main.py: The main script for training LoTA-QAF and performing evaluations (using lm-eval for MMLU and evalGSV.py for GSM8K, SQL, and ViGGO).
  • LoTA Components (located in the LoTA/ directory):

    • LoTA/layer.py: Contains CustomLoraLinear, where Ternary Adaptation is implemented, used for training.
    • LoTA/adapter.py: Provides the LTA (Lossless Ternary Adaptation) classes for loading trained Ternary Adaptation during inference and evaluation.
    • LoTA/lota_merge.py: Includes the logic for merging Ternary Adaptation weights into the quantized model weights.
  • Optimizer:

    • t_signSGD.py: Implementation of the Ternary Signed Gradient Descent (t-SignSGD) optimizer used for training Ternary Adaptation.
  • Utility Modules:

    • data_print_save.py: A collection of utility functions for preparing datasets (e.g., Alpaca, GSM8K, SQL, ViGGO), printing configurations, and saving experimental results, etc.
    • evalGSV.py: A custom evaluation script designed for Task-Specific such as GSM8K, SQL, and ViGGO.
    • gptq_quantize.py: A script used for quantizing models using the GPTQModel library, preparing them for QAF.

2. Quick Start

Hardware Requirements:

  • CUDA Version: 12.2 (Recommended).

Software Dependencies:

The LoTA-QAF implementation is built upon specific versions of key libraries:

  • peft==0.15.1
  • gptqmodel==2.1.1.dev0

It is recommended to install packages using a virtual environment.

pip install -r requirements.txt
# For detailed versioning of all dependencies, please refer to the environment.yml file.

Basic Usage

The main script LoTA_QAF_main.py operates in two modes: Training (mode 1) and Evaluation (mode 2).

Common Base Parameters (baseConfig):

  • --mode: 1 for training, 2 for evaluation.
  • --pretrained: Path to the base pre-trained model (e.g., /your_path/models/llama_3.1_8B_Instruct).
  • --quantized_model_dir: Path to the quantized model directory (e.g., /your_path/quant_models/8B_instruct/int4_64_asym).

Mode 1:

For training, you'll primarily use trainingConfig arguments alongside baseConfig.

python LoTA_QAF_main.py \
    --mode 1 \
    --pretrained "/your_path/models/llama_3.1_8B_Instruct" \
    --quantized_model_dir "/your_path/quant_models/8B_instruct/int4_64_asym" \
    --lota_qaf True \

    --training_data_name "alpaca" \
    --adapter_path "your_path/adapter_output" \

    --interval_point 48 \   # Omega            for LoTA-QAF
    --filter_ratio 0.95 \   # Sigma_t          for LoTA-QAF, here 0.95 is discard 0.95 and select top 0.05. 
    --min_grad 0.999 \      # Effective range 0.95-0.999 in 0-80% of epochs. [Refer in "Baselines and Hyper-parameters" of the paper. The naming is not ideal and has not been updated yet.]
    --filter_upper 0.9999   # 0.999-0.9999     in 20-100% epoch

    --max_steps 300 \
    --save_number 5 \
    --train_batch_size 64 \
    --gradient_accumulation_steps 1 \

Mode 2:

For evaluation, you'll use evalConfig arguments. Parameters like pretrained, quantized_model_dir, w_bits, group_size, lora_r, lora_alpha, and lota_qaf are often automatically inferred from the --load_adapter path if an adapter is being evaluated.

# Example 1: Evaluate a GPTQ model with a LoTA-QAF adapter on MMLU
python LoTA_QAF_main.py \
    --mode 2 \
    --load_adapter "/path/to/your/trained_lota_adapter/8B_int4_LoTA_48_0.950_0.999_alpaca_..." \
    --tasks "mmlu" \
    --num_fewshot 5 \
    --eval_batch_size 16 \
    --output_path "./eval_results" \
    # --auto_gptq "gptq" # Default for loading adapter with GPTQ model

# Example 2: Evaluate a GPTQ model with a LoTA-QAF adapter on a task-specific dataset (e.g., gsm8k)
python LoTA_QAF_main.py \
    --mode 2 \
    --load_adapter "/path/to/your/trained_lota_adapter/8B_int4_LoTA_48_0.950_0.999_gsm8k" \
    --output_path "./eval_results_gsv" \
    --ft_dataset_name "gsm8k" \
    --eval_batch_size 64 \
    --auto_gptq "gptq"
  • Key Evaluation Parameters (evalConfig):
    • --load_adapter: Path to the trained adapter to load. Set to "none" to evaluate the base model without an adapter. Many parameters like lota_qaf, w_bits, group_size, pretrained model path, and quantized_model_dir will be auto-configured based on this path.
    • --auto_gptq: Use "gptq" to load a GPTQ quantized model (with or without adapter). Use "none" to load a 16-bit model (typically for evaluating a base 16-bit model without an adapter).
    • --tasks: List of tasks for lm-eval (e.g., "mmlu").
    • --ft_dataset_name: For task-specific evaluation using evalGSV.py (e.g., "gsm8k", "sql", "viggo"). If not "none", this evaluation type is chosen over lm-eval.
    • --num_fewshot: Number of few-shot examples for lm-eval.
    • --eval_batch_size: Batch size for evaluation.
    • --output_path: Directory to save evaluation results.

Automatic Parameter Configuration: The script includes logic to automatically determine several parameters, especially in evaluation mode (mode 2) when --load_adapter is specified. This includes:

  • w_bits, group_size from quantized_model_dir (training) or load_adapter path (evaluation).
  • lora_r, lora_alpha based on model size.
  • lota_qaf and load_ada_interval (Omega) based on the load_adapter path structure.
  • pretrained model path and quantized_model_dir based on model size and quantization bits inferred from the load_adapter path. Note: You will need to update the placeholder /your_path/ in the script (base_args.pretrained = f"/your_path/models/{pre}" and base_args.quantized_model_dir = f"/your_path/quant_models/{model_size}_instruct/int{base_args.w_bits}_{base_args.group_size}_asym") to your actual model paths for this auto-configuration to work correctly.

3. License and Version

  • Version: 2025.05.15
  • License: MIT License

Copyright (c) 2025 [KingdalfGoodman]

This project is licensed under the MIT License - see the LICENSE file for details

About

[NeurIPS'25 Poster] LoTA-QAF: Lossless Ternary Adaptation for Quantization-Aware Fine-Tuning

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages