LiBo Zhu, Jianze Li, Haotong Qin, Wenbo Li, Yulun Zhang, Yong Guo and Xiaokang Yang
"PassionSR: Post-Training Quantization with Adaptive Scale in One-Step Diffusion based Image Super-Resolution", CVPR 2025
- News
- Abstract
- Structure Overview
- Installation
- Download Pretrained Models and Datasets
- Training
- Inference
- Measure
- Results
- Acknowledgements
- Citation
- [2025-06-09] Release code.
- [2025-03-10] The 2/4-bit version QArtSR is released.
- [2025-02-27] Congratulations, PassionSR has been accepted to CVPR 2025.
- [2024-11-25] Create repository.
Diffusion-based image super-resolution (SR) models have shown superior performance at the cost of multiple denoising steps. However, even when denoising is reduced to a single step, these models still incur high computational and storage costs, making them difficult to deploy on hardware devices. To address these issues, we propose PassionSR, a novel post-training quantization approach with adaptive scale for one-step diffusion (OSD) image SR. First, we simplify the OSD model to two core components, UNet and Variational Autoencoder (VAE), by removing the CLIPEncoder. Second, we propose a Learnable Boundary Quantizer (LBQ) and a Learnable Equivalent Transformation (LET) to optimize the quantization process and manipulate activation distributions for better quantization. Finally, we design a Distributed Quantization Calibration (DQC) strategy that stabilizes the training of quantized parameters for rapid convergence. Comprehensive experiments demonstrate that PassionSR at 8-bit and 6-bit obtains visual results comparable to the full-precision model. Moreover, PassionSR achieves significant advantages over recent leading low-bit quantization methods for image SR.
| HR | LR | OSEDiff(32-bit) | EfficientDM(8-bit) | PassionSR(8-bit) |
|---|---|---|---|---|
| ![]() | ![]() | ![]() | ![]() | ![]() |
| ![]() | ![]() | ![]() | ![]() | ![]() |
To set up the environment, clone the repository and create a new Conda environment using the provided dependencies.
git clone https://github.com/libozhu03/PassionSR.git
cd PassionSR
conda create -n passionsr python=3.10
conda activate passionsr
conda install pytorch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 pytorch-cuda=11.8 -c pytorch -c nvidia
pip install -r requirements.txt -i https://mirrors.aliyun.com/pypi/simple

Tested with:
- Python 3.10
- PyTorch 2.0.1
- CUDA 11.8
We provide pretrained weights for PassionSR under different settings.
| Model | Information | Link |
|---|---|---|
| PassionSR | The calibrated model weights under different settings | Google Drive |
| SD2.1 | Official model weights of stable diffusion 2.1 | Huggingface |
Place PassionSR's weights in ./weights and SD2.1 in ./hf-models.
Used training and testing sets can be downloaded as follows:
| Training Set | Testing Set | Visual results |
|---|---|---|
| 500 training images [Google Drive] | RealSR + DRealSR + DIV2K_val [Google Drive] | Google Drive |
Download training and testing datasets and put them into the corresponding folders of ./data.
Run the command below to perform Post-Training Quantization (PTQ) using your desired configuration file. The script loads pretrained Stable Diffusion and OSEDiff weights, and applies quantization to selected components (e.g., UNet and/or VAE).
# Train the W8A8 models of Table 2 in the main paper.
CUDA_VISIBLE_DEVICES="0" python ptq_quantize_single.py --config_file scripts/PTQ/config/UV/saw_sep/saw_U_W8A8_V_W8A8.yaml
# Train the W6A6 models of Table 2 in the main paper.
CUDA_VISIBLE_DEVICES="0" python ptq_quantize_single.py --config_file scripts/PTQ/config/UV/saw_sep/saw_U_W6A6_V_W6A6.yaml
# Train the W8A8 models of Table 1 in the supplementary material.
CUDA_VISIBLE_DEVICES="0" python ptq_quantize_single.py --config_file scripts/PTQ/config/U/saw_sep/saw_W8A8.yaml
# Train the W6A6 models of Table 1 in the supplementary material.
CUDA_VISIBLE_DEVICES="0" python ptq_quantize_single.py --config_file scripts/PTQ/config/U/saw_sep/saw_W6A6.yaml

Training Configuration Example:
The example YAML config demonstrates typical usage and can be adapted for different settings.

# device setting
device: "cuda:0"
cali_img_path: "data/cali_dataset" # path of calibration dataset
basic_config: # basic config for OSEDiff inference process
seed: 42
precision: "autocast" # "full", "autocast"
upscale: 4
process_size: 512
scale: 9.0
lora_weights_path: preset/models/osediff.ckpt # OSEDiff ckpt path
pretrained_model_name_or_path: hf-models/sd21 # stable diffusion path
config: hf-models/ldm_Config/stable-diffusion/intel/v2-inference-v-fp32.yaml
ckpt: hf-models/sd21/v2-1_512-ema-pruned.ckpt # stable diffusion ckpt path
context_embedding_path: preset/models/empty_context_embedding.pt # empty text embedding path
align_method: "nofix" # 'wavelet', 'adain', 'nofix'
merge_lora: True # merge lora into weight
quantize_config:
quantize: True # quantize or not
only_Unet: True # only quantize Unet or quantize both Unet and Vae
Unet: # quantize setting for U-net
quantype: PTQ # don't change
method: saw_sep # name of method
only_weight: False # weight only quantization method
weight_quant_bits: 8
weight_sym: False # symmetric weight quantization or not
weight_sign: False # signed weight quantization or not
act_quant_bits: 8
act_sign: False # signed activation quantization or not
act_sym: False # symmetric activation quantization or not
split: True # half split for activation
layer_type: 2Dquant # two quantizer types (2Dquant and normal_quant)
s_alpha: 0.3 # scale factor initialization exponent
Vae:
quantype: PTQ
method: saw
only_weight: False
weight_quant_bits: 8
weight_sym: False
weight_sign: False
act_quant_bits: 8
act_sign: False
act_sym: False
split: True
layer_type: 2Dquant
output_modelpath: results/quantize/saw_sep/UV/W8A8 # output path
# calibration settings
cali_batch_size: 4
cali_learning_rate: 1e-5
cali_epochs: 2
loss_function: mse
scheduler:
milestones: [1]
gamma: 0.1
save_interval: 2

Use the following command to run inference with quantized models. The pipeline supports various datasets (e.g., DIV2K_val, RealSR, DRealSR) and includes options for tiling and LoRA merging.
# Reproduce the W8A8 results of Table 2 in the main paper.
CUDA_VISIBLE_DEVICES="0" python inference_single.py --config scripts/inference/config/saw_sep/UV/saw_U_W8A8_V_W8A8.yaml
# Reproduce the W6A6 results of Table 2 in the main paper.
CUDA_VISIBLE_DEVICES="0" python inference_single.py --config scripts/inference/config/saw_sep/UV/saw_U_W6A6_V_W6A6.yaml
# Reproduce the W8A8 results of Table 1 in the supplementary material.
CUDA_VISIBLE_DEVICES="0" python inference_single.py --config scripts/inference/config/saw_sep/U/saw_W8A8.yaml
# Reproduce the W6A6 results of Table 1 in the supplementary material.
CUDA_VISIBLE_DEVICES="0" python inference_single.py --config scripts/inference/config/saw_sep/U/saw_W6A6.yaml

Inference Configuration Example:
The example YAML config demonstrates typical usage and can be adapted for different settings.

# device setting
device: cuda:0
out_dir: results/quantize/saw_sep/U/W8A8 # output path
# dataset to inference, set detailed dataset path in preset/data_construct.py
dataset: DIV2K_val # ["DIV2K_val", "RealSR", "DRealSR"]
basic_config:
seed: 42
precision: "autocast" # ["full", "autocast"]
process_size: 512
config: hf-models/ldm_Config/stable-diffusion/intel/v2-inference-v-fp32.yaml
ckpt: hf-models/sd21/v2-1_512-ema-pruned.ckpt
lora_weights_path: preset/models/osediff.ckpt
pretrained_model_name_or_path: hf-models/sd21
context_embedding_path: preset/models/empty_context_embedding.pt
upscale: 4
align_method: adain # ['wavelet', 'adain', 'nofix']
merge_lora: True
# scale: 9.0
# tile setting
tile_config:
vae_decoder_tiled_size: 224
vae_encoder_tiled_size: 1024
latent_tiled_size: 64
latent_tiled_overlap: 32
# quantize config
quantize_config:
quantize: True
only_Unet: True
Unet: # keep same with quantize config
quant_ckpt: weights/U_W8A8/PTQ/unet_ckpt_merge_saw_sep.pth # Unet quantize ckpt path
quantype: PTQ
method: saw
only_weight: False
weight_quant_bits: 8
weight_sym: False
weight_sign: False
act_quant_bits: 8
act_sign: False
act_sym: False
split: True
layer_type: 2Dquant
s_alpha: 0.3
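For intuition on the tile settings above: assuming a standard sliding-window scheme with stride `tile_size - overlap` (the repo's edge handling may differ), the number of tiles along one latent dimension can be estimated as:

```python
import math

def num_tiles(length, tile, overlap):
    """Estimated tile count along one axis for a sliding window with
    stride (tile - overlap). Assumed scheme, for illustration only."""
    if length <= tile:
        return 1  # the whole axis fits in a single tile
    return math.ceil((length - overlap) / (tile - overlap))

# latent_tiled_size=64 with latent_tiled_overlap=32 gives stride 32,
# so a 128-wide latent is covered by 3 overlapping tiles.
print(num_tiles(128, 64, 32))  # -> 3
```

Larger overlaps reduce seam artifacts at the cost of more tiles (and thus more UNet forward passes) per image.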
Evaluate model performance by comparing super-resolution outputs against high-resolution ground truth images:
CUDA_VISIBLE_DEVICES="0" python measure.py -i YOUR_IMAGE_PATH -r HR_IMAGE_PATH

This script computes the image quality metrics presented in the paper to assess the effectiveness of quantized inference.
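The exact metric set computed by measure.py is not listed here, but PSNR, a standard fidelity metric in the SR literature, can be sketched as follows (illustrative only, not the script's implementation):

```python
import math

def psnr(ref, test, max_val=255.0):
    """Peak signal-to-noise ratio between two flattened images of equal size."""
    mse = sum((a - b) ** 2 for a, b in zip(ref, test)) / len(ref)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * math.log10(max_val ** 2 / mse)

# Small per-pixel errors yield a high PSNR (here, MSE = 2.5).
print(round(psnr([50, 100, 150, 200], [52, 98, 151, 199]), 2))  # -> 44.15
```

Higher is better; quantized outputs are compared against the high-resolution ground truth in the same way as the full-precision baseline.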
PassionSR significantly outperforms previous methods under the W8A8 and W6A6 settings.
Detailed results can be downloaded at Google Drive.
We would like to thank the developers and maintainers of Stable Diffusion, Diffusers, and OSEDiff for their open-source contributions, which have greatly facilitated our research and development.
This project is supported in part by the Shanghai Jiao Tong University Artificial Intelligence Institute.
We also thank our collaborators and contributors for their valuable feedback and technical discussions.
@inproceedings{zhu2025passionsr,
title={{PassionSR}: Post-Training Quantization with Adaptive Scale in One-Step Diffusion based Image Super-Resolution},
author={Zhu, Libo and Li, Jianze and Qin, Haotong and Zhang, Yulun and Guo, Yong and Yang, Xiaokang},
booktitle={CVPR},
year={2025}
}