# Finetuning LLAMA Using QLoRA (experimental support)

This example demonstrates how to finetune a llama2-7b model using IPEX-LLM 4-bit optimizations on Intel CPUs.

## Distributed Training Guide

1. Single node with single socket: simple example or alpaca example
2. Single node with multiple sockets
3. Multiple nodes with multiple sockets

## Example: Finetune llama2-7b using QLoRA

This example is ported from bnb-4bit-training.

### 1. Install

```bash
conda create -n llm python=3.11
conda activate llm
pip install --pre --upgrade ipex-llm[all] --extra-index-url https://download.pytorch.org/whl/cpu
pip install transformers==4.36.0
pip install peft==0.10.0
pip install datasets
pip install bitsandbytes scipy
```
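
Optionally, you can sanity-check the environment with a short Python snippet. This is an informal check, not part of the example scripts; `ipex_llm` is assumed to be the importable module name of the `ipex-llm` pip package:

```python
# Informal environment check (not part of the example scripts).
import transformers
import peft
import datasets
import ipex_llm  # importable module of the `ipex-llm` pip package

print("transformers:", transformers.__version__)  # expected: 4.36.0
print("peft:", peft.__version__)                  # expected: 0.10.0
print("datasets:", datasets.__version__)
```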

### 2. Finetune model

If machine memory is not enough, you can try setting `use_gradient_checkpointing=True` in the finetuning script. While gradient checkpointing may improve memory efficiency, it slows training by approximately 20%. We recommend a `micro_batch_size` of 8 for better performance when using 48 cores in this example; you can refer to this guide for more details. Remember to source `ipex-llm-init` before you start finetuning, which can accelerate the job. A rough sketch of how these options fit together is shown after the run command below.

```bash
source ipex-llm-init -t
python ./qlora_finetuning_cpu.py --repo-id-or-model-path REPO_ID_OR_MODEL_PATH --dataset DATASET
```
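
For orientation, the core of the finetuning script roughly follows the QLoRA recipe sketched below. This is a simplified illustration only: the import paths (such as `ipex_llm.transformers.qlora`), keyword arguments, model path, and hyperparameters are assumptions, and `qlora_finetuning_cpu.py` in this folder is the authoritative implementation. It shows where `use_gradient_checkpointing` and `micro_batch_size` come in:

```python
# Simplified sketch of a CPU QLoRA setup; argument names and import paths are
# assumptions for illustration. See qlora_finetuning_cpu.py for the real code.
import torch
from transformers import AutoTokenizer, TrainingArguments
from peft import LoraConfig
from ipex_llm.transformers import AutoModelForCausalLM
from ipex_llm.transformers.qlora import get_peft_model, prepare_model_for_kbit_training

model_path = "meta-llama/Llama-2-7b-hf"  # placeholder for REPO_ID_OR_MODEL_PATH

# Load the base model with IPEX-LLM 4-bit (NF4) weights for CPU finetuning.
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    load_in_low_bit="nf4",
    optimize_model=False,
    torch_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained(model_path)

# Set use_gradient_checkpointing=True to reduce memory use (~20% slower).
model = prepare_model_for_kbit_training(model, use_gradient_checkpointing=False)

# Attach LoRA adapters to the attention projections.
lora_config = LoraConfig(
    r=8,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj"],
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)

# micro_batch_size maps to per_device_train_batch_size; with a global batch
# size of 32, the remaining factor becomes gradient_accumulation_steps.
micro_batch_size = 8
training_args = TrainingArguments(
    per_device_train_batch_size=micro_batch_size,
    gradient_accumulation_steps=32 // micro_batch_size,
    learning_rate=2e-4,
    max_steps=200,
    output_dir="outputs",
)
```

With a model prepared this way, a standard `transformers.Trainer` drives the training loop; the learning rate (2e-4) and the 200 steps match the sample output below.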

#### Sample Output

```
{'loss': 2.0251, 'learning_rate': 0.0002, 'epoch': 0.02}
{'loss': 1.2389, 'learning_rate': 0.00017777777777777779, 'epoch': 0.03}
{'loss': 1.032, 'learning_rate': 0.00015555555555555556, 'epoch': 0.05}
{'loss': 0.9141, 'learning_rate': 0.00013333333333333334, 'epoch': 0.06}
{'loss': 0.8505, 'learning_rate': 0.00011111111111111112, 'epoch': 0.08}
{'loss': 0.8713, 'learning_rate': 8.888888888888889e-05, 'epoch': 0.09}
{'loss': 0.8635, 'learning_rate': 6.666666666666667e-05, 'epoch': 0.11}
{'loss': 0.8853, 'learning_rate': 4.4444444444444447e-05, 'epoch': 0.12}
{'loss': 0.859, 'learning_rate': 2.2222222222222223e-05, 'epoch': 0.14}
{'loss': 0.8608, 'learning_rate': 0.0, 'epoch': 0.15}
{'train_runtime': xxxx, 'train_samples_per_second': xxxx, 'train_steps_per_second': xxxx, 'train_loss': 1.0400420665740966, 'epoch': 0.15}
100%|███████████████████████████████████████████████████████████████████████████████████| 200/200 [07:16<00:00,  2.18s/it]
TrainOutput(global_step=200, training_loss=1.0400420665740966, metrics={'train_runtime': xxxx, 'train_samples_per_second': xxxx, 'train_steps_per_second': xxxx, 'train_loss': 1.0400420665740966, 'epoch': 0.15})
```

### 3. Merge the adapter into the original model

Use `export_merged_model.py` to merge the adapter into the base model:

```bash
python ./export_merged_model.py --repo-id-or-model-path REPO_ID_OR_MODEL_PATH --adapter_path ./outputs/checkpoint-200 --output_path ./outputs/checkpoint-200-merged
```

Then you can use `./outputs/checkpoint-200-merged` as a normal Hugging Face Transformers model for inference.
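
Conceptually, the merge step folds the trained LoRA adapter weights back into the base model with PEFT and saves a standalone checkpoint. The sketch below uses the standard PEFT API and is a minimal illustration, not the exact contents of `export_merged_model.py`; the model path is a placeholder:

```python
# Minimal sketch of merging a LoRA adapter into the base model with PEFT;
# export_merged_model.py is the authoritative version for this example.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_model_path = "meta-llama/Llama-2-7b-hf"  # placeholder for REPO_ID_OR_MODEL_PATH
adapter_path = "./outputs/checkpoint-200"
output_path = "./outputs/checkpoint-200-merged"

# Load the base model, then apply the trained adapter on top of it.
base_model = AutoModelForCausalLM.from_pretrained(base_model_path, torch_dtype=torch.bfloat16)
model = PeftModel.from_pretrained(base_model, adapter_path)

# Fold the LoRA weights into the base weights and save a standalone model.
merged_model = model.merge_and_unload()
merged_model.save_pretrained(output_path)

# Save the tokenizer alongside it so the folder loads like a normal checkpoint.
tokenizer = AutoTokenizer.from_pretrained(base_model_path)
tokenizer.save_pretrained(output_path)
```

The merged folder can then be loaded like any other checkpoint, e.g. `AutoModelForCausalLM.from_pretrained("./outputs/checkpoint-200-merged")`.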