DP²O-SR: Direct Perceptual Preference Optimization for Real-World Image Super-Resolution (NeurIPS 2025)
Rongyuan Wu, Lingchen Sun, Zhengqiang Zhang, Shihao Wang, Tianhe Wu, Qiaosi Yi, Shuai Li, Lei Zhang†
We will release:
- [ ] Testing code
- [ ] Pretrained checkpoints: C-SD2, C-FLUX
- [ ] DP²O-SR fine-tuned models: DP²O-SR (SD2), DP²O-SR (FLUX)
- [ ] Training code (diffusion-based & flow-based)
- [ ] Training datasets & IQA reward labels
DP²O-SR post-trains generative SR models to better match human perceptual preferences by optimizing over diverse outputs (sampled via different noise seeds) using IQA-based rewards, without requiring human annotations during training.
DP²O-SR achieves strong perceptual gains in just 500 training steps, outperforming powerful baselines like SeeSR and OSEDiff.
Visual comparison: different reward types
Combines full-reference (fidelity) and no-reference (realism) IQA metrics to guide training with a hybrid reward.
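As a sketch, the hybrid reward can be expressed as a weighted combination of the two normalized scores; the function name, the `alpha` value, and the [0, 1] normalization assumption are illustrative, not the paper's actual setting:

```python
def hybrid_reward(fr_score, nr_score, alpha=0.5):
    """Combine a full-reference fidelity score and a no-reference
    realism score into one scalar reward.

    Both scores are assumed to be pre-normalized to [0, 1];
    `alpha` balances fidelity against realism (placeholder value).
    """
    return alpha * fr_score + (1.0 - alpha) * nr_score
```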
Specific details
Unlike the conventional best-vs-worst strategy, we rank the generated outputs for each input and retain only the top-N and bottom-N samples to form positive and negative sets. Preference pairs are then constructed between these subsets, avoiding uncertain middle samples and leading to richer supervision and more stable training.

Sampling strategy trend
The optimal sampling strategy depends on model capacity:
- Small models prefer broader selection coverage (e.g., a 1/4 top/bottom ratio)
- Large models learn better with stronger contrast (e.g., a 1/16 ratio)
Figure: Sampling strategy trend on C-SD2 (left) and C-FLUX (right).
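A minimal sketch of this top-N/bottom-N pairing; the function name, the all-pairs construction between the two subsets, and the `ratio` values are our illustrative assumptions:

```python
def build_preference_pairs(samples, rewards, ratio=0.25):
    """Rank candidate outputs by reward and pair only the top-N with
    the bottom-N, skipping the ambiguous middle samples.

    `ratio` is the fraction of samples kept at each extreme
    (0.25 ~ the 1/4 setting; 1/16 = 0.0625 for stronger contrast).
    Returns (positive_index, negative_index) pairs.
    """
    order = sorted(range(len(samples)), key=lambda i: rewards[i], reverse=True)
    n = max(1, int(len(samples) * ratio))
    top, bottom = order[:n], order[-n:]
    # every positive is paired with every negative
    return [(p, q) for p in top for q in bottom]
```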
Adaptive intra/inter-group weighting
We adaptively weight each preference pair:
- Intra-group: favor larger reward gaps
- Inter-group: prioritize diverse candidate groups
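A simplified sketch of such a weighting scheme (not the paper's exact formulation): within each group, a pair's weight grows with its reward gap, and each group's weights are normalized to sum to one so that no single candidate group dominates the loss:

```python
def weight_pairs(pair_gaps_per_group):
    """Assign a weight to every preference pair.

    Intra-group: weight is proportional to the pair's reward gap.
    Inter-group: each group's weights are normalized to sum to 1,
    so all candidate groups contribute equally. Simplified sketch,
    not the paper's exact scheme.
    """
    weighted = []
    for gaps in pair_gaps_per_group:
        total = sum(gaps) or 1.0  # guard against an all-zero group
        weighted.append([g / total for g in gaps])
    return weighted
```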
Best@M / Mean@M / Worst@M curves
We explore how perceptual quality varies with the number of sampled outputs M per input, where M increases exponentially from 1 to 64 (i.e., M = 2ⁿ).
Key findings:
- Best@M increases with M — higher perceptual peaks observed
- Worst@M drops in baselines, but improves significantly with DP²O-SR
- Mean@M stays relatively stable — but still benefits slightly from our approach
This shows that DP²O-SR not only improves average perceptual quality but more importantly raises the quality floor, resulting in more consistent and robust outputs across different seeds.
Figure: Perceptual reward curves across varying sample numbers (M). Left: C-SD2; Right: C-FLUX.
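The Best@M / Mean@M / Worst@M statistics can be computed from per-sample rewards with a small helper (function name and reward values are illustrative, not real data):

```python
def reward_stats_at_m(rewards, m):
    """Best@M / Mean@M / Worst@M over the first m sampled outputs
    of a single input; `rewards` holds one precomputed perceptual
    reward per sampled output."""
    window = rewards[:m]
    return max(window), sum(window) / len(window), min(window)

# M grows exponentially from 1 to 64 (M = 2^n), as in the study
rewards = [0.2, 0.8, 0.5, 0.1]  # toy rewards for one input
curves = [reward_stats_at_m(rewards, 2 ** n) for n in range(3)]
```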
Examples of local detail enhancement
DP²O-SR leads to localized visual improvements, even though training is guided by global IQA rewards only.
- Seed sensitivity remains: Even within the same model, different random seeds cause variations in local structures (e.g., wing textures, insect eyes).
- Same-seed refinement: Under the same seed, DP²O-SR outputs consistently show sharper and more accurate textures than the baseline (e.g., clearer wing venation).
- Global-to-local effect: These refinements emerge without any explicit local supervision, suggesting the model learns to enhance perceptually salient regions.
Figure: DP²O-SR enhances local details (e.g., wing structure, red box) while preserving stable regions (e.g., head reflections, green box).
```shell
# clone this repository
git clone https://github.com/cswry/DP2O-SR.git
cd DP2O-SR

# create an environment with python >= 3.8
conda create -n dp2osr python=3.8
conda activate dp2osr
pip install -r requirements.txt
```
- Download the pretrained FLUX.1-dev from Hugging Face.
- Since the official SD2-base has been taken down, we have hosted it together with C-FLUX, DP²O-FLUX, C-SD2, and DP²O-SD2 on Google Drive.
- Additionally, the RAM and DAPE models (used for extracting tag-style text prompts) are also available on Google Drive.
Put the downloaded models into `preset/models` and the testing images into `preset/test_inp`.
```shell
# C-SD2
accelerate launch test_sd2b_controlnet.py \
--pretrained_model_name_or_path "preset/models/stable-diffusion-2-base" \
--controlnet_model_name_or_path "preset/models/c-sd2/model.safetensors" \
--image_path "preset/test_inp" \
--output_dir "preset/test_oup_c_sd2" \
--align_method "adain" \
--ram_path "preset/models/ram_swin_large_14m.pth" \
--dape_path "preset/models/DAPE.pth" \
--guidance_scale 3.5 \
--num_inference_steps 50 \
--mixed_precision "fp16"
```

```shell
# DP²O-SD2
accelerate launch test_sd2b_controlnet.py \
--pretrained_model_name_or_path "preset/models/stable-diffusion-2-base" \
--controlnet_model_name_or_path "preset/models/dp2o-sd2/model.safetensors" \
--image_path "preset/test_inp" \
--output_dir "preset/test_oup_dp2o_sd2" \
--align_method "adain" \
--ram_path "preset/models/ram_swin_large_14m.pth" \
--dape_path "preset/models/DAPE.pth" \
--guidance_scale 3.5 \
--num_inference_steps 50 \
--mixed_precision "fp16"
```

```shell
# C-FLUX
accelerate launch test_flux_controlnet.py \
--pretrained_model_name_or_path "black-forest-labs/FLUX.1-dev" \
--controlnet_model_name_or_path "preset/models/c-flux/model.safetensors" \
--image_path "preset/test_inp" \
--output_dir "preset/test_oup_c_flux" \
--align_method "adain" \
--ram_path "preset/models/ram_swin_large_14m.pth" \
--dape_path "preset/models/DAPE.pth" \
--num_double_layers 4 \
--num_single_layers 0 \
--guidance_scale 2.5 \
--num_inference_steps 25 \
--mixed_precision "fp16"
```

```shell
# DP²O-FLUX
accelerate launch test_flux_controlnet.py \
--pretrained_model_name_or_path "black-forest-labs/FLUX.1-dev" \
--controlnet_model_name_or_path "preset/models/dp2o-flux/model.safetensors" \
--image_path "preset/test_inp" \
--output_dir "preset/test_oup_dp2o_flux" \
--align_method "adain" \
--ram_path "preset/models/ram_swin_large_14m.pth" \
--dape_path "preset/models/DAPE.pth" \
--num_double_layers 4 \
--num_single_layers 0 \
--guidance_scale 2.5 \
--num_inference_steps 25 \
--mixed_precision "fp16"
```
```bibtex
@article{wu2025dp2osr,
  title   = {DP²O-SR: Direct Perceptual Preference Optimization for Real-World Image Super-Resolution},
  author  = {Wu, Rongyuan and Sun, Lingchen and Zhang, Zhengqiang and Wang, Shihao and Wu, Tianhe and Yi, Qiaosi and Li, Shuai and Zhang, Lei},
  journal = {arXiv preprint arXiv:2510.18851},
  year    = {2025}
}
```