
πŸš€ PassionSR: Low-Bit Quantized Super-Resolution

LiBo Zhu, Jianze Li, Haotong Qin, Wenbo Li, Yulun Zhang, Yong Guo and Xiaokang Yang
"PassionSR: Post-Training Quantization with Adaptive Scale in One-Step Diffusion based Image Super-Resolution", CVPR 2025



πŸ”₯ News

  • πŸŽ‰ [2025-06-09] Release code.
  • 🚩 [2025-03-10] The 2/4-bit version QArtSR is released.
  • πŸ† [2025-02-27] Congratulations, PassionSR has been accepted to CVPR 2025.
  • [2024-11-25] Create repository.

⭐⭐⭐ If PassionSR is helpful to your projects, please help star this repo. Thanks!


πŸ“˜ Abstract

Diffusion-based image super-resolution (SR) models have shown superior performance, but at the cost of multiple denoising steps. Even when the number of denoising steps is reduced to one, they still incur high computational and storage costs, which makes deployment on hardware devices difficult. To address these issues, we propose PassionSR, a novel post-training quantization approach with adaptive scale for one-step diffusion (OSD) image SR. First, we simplify the OSD model to its two core components, UNet and Variational Autoencoder (VAE), by removing the CLIPEncoder. Second, we propose a Learnable Boundary Quantizer (LBQ) and a Learnable Equivalent Transformation (LET) to optimize the quantization process and manipulate activation distributions for better quantization. Finally, we design a Distributed Quantization Calibration (DQC) strategy that stabilizes the training of the quantized parameters for rapid convergence. Comprehensive experiments demonstrate that PassionSR at 8-bit and 6-bit obtains visual results comparable to the full-precision model. Moreover, PassionSR achieves significant advantages over recent leading low-bit quantization methods for image SR.
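To make the LBQ idea above concrete, here is a minimal sketch of fake quantization with learnable clip boundaries. This is not the repo's implementation; the function name and details are illustrative assumptions only.

```python
# Illustrative sketch of fake quantization with learnable clip boundaries,
# in the spirit of the Learnable Boundary Quantizer (LBQ) described above.
# NOT the repo's code; names and details are assumptions for illustration.
def fake_quantize(x: float, lower: float, upper: float, bits: int = 8) -> float:
    """Clamp x to [lower, upper], quantize to 2**bits levels, dequantize."""
    levels = 2 ** bits - 1
    scale = (upper - lower) / levels
    clamped = min(max(x, lower), upper)
    q = round((clamped - lower) / scale)   # integer code in [0, levels]
    return q * scale + lower               # dequantized value

# During calibration, `lower` and `upper` would be trained (e.g. by gradient
# descent on a reconstruction loss) rather than fixed.
print(fake_quantize(0.5, 0.0, 1.0, bits=8))
```

The quantization error of any in-range value is bounded by half the step size, which is what the learnable boundaries try to minimize over the calibration data.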


πŸ“ Structure Overview

Structure overview figure: visual comparison of HR, LR, OSEDiff (32-bit), EfficientDM (8-bit), and PassionSR (8-bit).

βš™οΈ Installation

To set up the environment, clone the repository and create a new Conda environment using the provided dependencies.

git clone https://github.com/libozhu03/PassionSR.git
cd PassionSR
conda create -n passionsr python=3.10
conda activate passionsr
conda install pytorch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 pytorch-cuda=11.8 -c pytorch -c nvidia
pip install -r requirements.txt -i https://mirrors.aliyun.com/pypi/simple

Tested with:

  • Python 3.10
  • PyTorch 2.0.1
  • CUDA 11.8

πŸ“₯ Download Pretrained Models and Datasets

We provide pretrained weights for PassionSR under different settings.

| Model | Information | Link |
| --- | --- | --- |
| PassionSR | Calibrated model weights under different settings | Google Drive |
| SD2.1 | Official Stable Diffusion 2.1 model weights | Hugging Face |

Place PassionSR's weights in ./weights and SD2.1 in ./hf-models.

The training and testing sets can be downloaded as follows:

| Training Set | Testing Set | Visual Results |
| --- | --- | --- |
| 500 training images [Google Drive] | RealSR + DRealSR + DIV2K_val [Google Drive] | Google Drive |

Download the training and testing datasets and place them in the corresponding folders under ./data.


πŸ“ˆ Training

Run the command below to perform Post-Training Quantization (PTQ) using your desired configuration file. The script loads pretrained Stable Diffusion and OSEDiff weights, and applies quantization to selected components (e.g., UNet and/or VAE).

# Train the W8A8 models of Table 2 in the main paper. 
CUDA_VISIBLE_DEVICES="0" python ptq_quantize_single.py --config_file scripts/PTQ/config/UV/saw_sep/saw_U_W8A8_V_W8A8.yaml

# Train the W6A6 models of Table 2 in the main paper. 
CUDA_VISIBLE_DEVICES="0" python ptq_quantize_single.py --config_file scripts/PTQ/config/UV/saw_sep/saw_U_W6A6_V_W6A6.yaml

# Train the W8A8 models of Table 1 in the supplementary material. 
CUDA_VISIBLE_DEVICES="0" python ptq_quantize_single.py --config_file scripts/PTQ/config/U/saw_sep/saw_W8A8.yaml

# Train the W6A6 models of Table 1 in the supplementary material.
CUDA_VISIBLE_DEVICES="0" python ptq_quantize_single.py --config_file scripts/PTQ/config/U/saw_sep/saw_W6A6.yaml
πŸ”§ Training Configuration Example: The example YAML config demonstrates typical usage and can be adapted for different settings.
# device setting
device: "cuda:0"

cali_img_path: "data/cali_dataset" # path of calibration dataset

basic_config: # basic config for OSEDiff inference process
  seed: 42
  precision: "autocast"  # "full", "autocast"
  upscale: 4
  process_size: 512
  scale: 9.0
  lora_weights_path: preset/models/osediff.ckpt # OSEDiff ckpt path
  pretrained_model_name_or_path: hf-models/sd21 # stable diffusion path
  config: hf-models/ldm_Config/stable-diffusion/intel/v2-inference-v-fp32.yaml
  ckpt: hf-models/sd21/v2-1_512-ema-pruned.ckpt # stable diffusion ckpt path
  context_embedding_path: preset/models/empty_context_embedding.pt # empty text embedding path
  align_method: "nofix"  # 'wavelet', 'adain', 'nofix'
  merge_lora: True # merge lora into weight

quantize_config:
  quantize: True # quantize or not
  only_Unet: True # only quantize Unet or quantize both Unet and Vae
  Unet: # quantize setting for U-net
    quantype: PTQ # don't change
    method: saw_sep # name of method
    only_weight: False # weight-only quantization or not
    weight_quant_bits: 8
    weight_sym: False # symmetric weight quantization or not
    weight_sign: False # signed weight quantization or not
    act_quant_bits: 8
    act_sign: False # signed activation quantization or not
    act_sym: False # symmetric activation quantization or not
    split: True # half split for activation
    layer_type: 2Dquant # two quantizer types (2Dquant and normal_quant)
    s_alpha: 0.3 # scale factor initialization exponent
  Vae:
    quantype: PTQ
    method: saw
    only_weight: False
    weight_quant_bits: 8
    weight_sym: False
    weight_sign: False
    act_quant_bits: 8
    act_sign: False
    act_sym: False
    split: True
    layer_type: 2Dquant
  output_modelpath: results/quantize/saw_sep/UV/W8A8 # output path
  # calibration settings
  cali_batch_size: 4
  cali_learning_rate: 1e-5
  cali_epochs: 2
  loss_function: mse
  scheduler:
    milestones: [1]
    gamma: 0.1
  save_interval: 2
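As a worked example of how the bit-width fields above translate into integer quantization ranges, here is a small sketch. The exact semantics of the `sym`/`sign` flags in the repo are an assumption here; this illustrates the standard convention:

```python
# Illustrative mapping from config fields (weight_quant_bits, weight_sym,
# weight_sign) to integer quantization ranges. The flag semantics are an
# assumption; the repo's quantizers may interpret them differently.
def quant_range(bits: int, sym: bool, sign: bool):
    """Return (qmin, qmax) for a given bit-width under standard conventions."""
    if sign:
        qmax = 2 ** (bits - 1) - 1
        qmin = -qmax if sym else -(2 ** (bits - 1))
        return qmin, qmax
    return 0, 2 ** bits - 1  # unsigned range

print(quant_range(8, sym=False, sign=False))  # -> (0, 255), as in the W8A8 config
print(quant_range(6, sym=False, sign=False))  # -> (0, 63), as in the W6A6 config
```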

πŸ§ͺ Inference

Use the following commands to run inference with quantized models. The pipeline supports various datasets (e.g., DIV2K_val, RealSR, DRealSR) and includes options for tiling and LoRA merging.

# Reproduce the W8A8 results of Table 2 in the main paper.
CUDA_VISIBLE_DEVICES="0" python inference_single.py --config scripts/inference/config/saw_sep/UV/saw_U_W8A8_V_W8A8.yaml

# Reproduce the W6A6 results of Table 2 in the main paper.
CUDA_VISIBLE_DEVICES="0" python inference_single.py --config scripts/inference/config/saw_sep/UV/saw_U_W6A6_V_W6A6.yaml

# Reproduce the W8A8 results of Table 1 in the supplementary material.
CUDA_VISIBLE_DEVICES="0" python inference_single.py --config scripts/inference/config/saw_sep/U/saw_W8A8.yaml

# Reproduce the W6A6 results of Table 1 in the supplementary material.
CUDA_VISIBLE_DEVICES="0" python inference_single.py --config scripts/inference/config/saw_sep/U/saw_W6A6.yaml
πŸ”§ Inference Configuration Example: The example YAML config demonstrates typical usage and can be adapted for different settings.
# device setting
device: cuda:0
out_dir: results/quantize/saw_sep/U/W8A8 # output path

# dataset to inference, set detailed dataset path in preset/data_construct.py
dataset: DIV2K_val # ["DIV2K_val", "RealSR", "DRealSR"] 

basic_config:
  seed: 42
  precision: "autocast" # ["full", "autocast"]
  process_size: 512
  config: hf-models/ldm_Config/stable-diffusion/intel/v2-inference-v-fp32.yaml
  ckpt: hf-models/sd21/v2-1_512-ema-pruned.ckpt
  lora_weights_path: preset/models/osediff.ckpt
  pretrained_model_name_or_path: hf-models/sd21
  context_embedding_path: preset/models/empty_context_embedding.pt
  upscale: 4
  align_method: adain # ['wavelet', 'adain', 'nofix']
  merge_lora: True

# scale: 9.0

# tile setting
tile_config:
  vae_decoder_tiled_size: 224 
  vae_encoder_tiled_size: 1024
  latent_tiled_size: 64 
  latent_tiled_overlap: 32

# quantize config
quantize_config:
  quantize: True
  only_Unet: True
  Unet: # keep same with quantize config
    quant_ckpt: weights/U_W8A8/PTQ/unet_ckpt_merge_saw_sep.pth # Unet quantize ckpt path
    quantype: PTQ
    method: saw
    only_weight: False
    weight_quant_bits: 8
    weight_sym: False
    weight_sign: False
    act_quant_bits: 8
    act_sign: False
    act_sym: False
    split: True
    layer_type: 2Dquant
    s_alpha: 0.3
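As a worked example of the tile settings above, the sketch below counts how many overlapping tiles cover a latent, assuming the tiler steps with stride = tile_size - overlap (an assumption; the repo's tiling logic may differ):

```python
import math

def num_tiles(extent: int, tile: int, overlap: int) -> int:
    """How many overlapping tiles cover `extent` along one axis, assuming
    stride = tile - overlap. Illustrative only; not the repo's tiler."""
    stride = tile - overlap
    if extent <= tile:
        return 1
    return math.ceil((extent - tile) / stride) + 1

# A 128x128 latent with latent_tiled_size=64 and latent_tiled_overlap=32:
print(num_tiles(128, 64, 32))  # -> 3 tiles per axis
```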

πŸ“¦ Measure

Evaluate model performance by comparing super-resolution outputs against high-resolution ground truth images:

CUDA_VISIBLE_DEVICES="0" python measure.py -i YOUR_IMAGE_PATH -r HR_IMAGE_PATH

This script computes the image quality metrics presented in the paper to assess the effectiveness of quantized inference.
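For intuition, one standard metric of this kind is PSNR. The function below is a pure-Python stand-in for illustration, not the implementation in measure.py:

```python
import math

def psnr(img_a, img_b, max_val: float = 255.0) -> float:
    """PSNR between two equally sized images given as flat pixel lists.
    Illustrative stand-in for the metrics computed by measure.py."""
    mse = sum((a - b) ** 2 for a, b in zip(img_a, img_b)) / len(img_a)
    if mse == 0:
        return float("inf")  # identical images
    return 10 * math.log10(max_val ** 2 / mse)

print(round(psnr([100, 120, 140], [101, 119, 141]), 2))  # β‰ˆ 48.13 dB
```

Higher PSNR means the super-resolved output is closer to the high-resolution ground truth.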


πŸ”Ž Results

PassionSR significantly outperforms previous methods under the W8A8 and W6A6 settings.

Detailed results can be downloaded at Google Drive.

πŸ“Š Quantitative comparisons in Table 2 of the main paper

πŸ–Ό Visual comparison in Figure 6 of the main paper


πŸ“ Acknowledgements

We would like to thank the developers and maintainers of Stable Diffusion, Diffusers, and OSEDiff for their open-source contributions, which have greatly facilitated our research and development.

This project is supported in part by the Shanghai Jiao Tong University Artificial Intelligence Institute.

We also thank our collaborators and contributors for their valuable feedback and technical discussions.

πŸ“Œ Citation

@inproceedings{zhu2025passionsr,
  title={{PassionSR}: Post-Training Quantization with Adaptive Scale in One-Step Diffusion based Image Super-Resolution},
  author={Zhu, Libo and Li, Jianze and Qin, Haotong and Li, Wenbo and Zhang, Yulun and Guo, Yong and Yang, Xiaokang},
  booktitle={CVPR},
  year={2025}
}
