DP²O-SR: Direct Perceptual Preference Optimization for Real-World Image Super-Resolution (NeurIPS 2025)

¹The Hong Kong Polytechnic University   ²OPPO Research Institute   ³City University of Hong Kong
Corresponding author

📄 Paper

📋 Todo List

We will release:

  • [ ] Testing code
  • [ ] Pretrained checkpoints:
    • C-SD2, C-FLUX
  • [ ] DP²O-SR fine-tuned models:
    • DP²O-SR(SD2), DP²O-SR(FLUX)
  • [ ] Training code (diffusion-based & flow-based)
  • [ ] Training datasets & IQA reward labels

🧠 TL;DR

DP²O-SR post-trains generative SR models to better match human perceptual preferences
by optimizing over diverse outputs (sampled by varying the initial noise) with IQA-based rewards, requiring no human annotations during training.
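At its core, this kind of preference post-training uses a DPO-style objective on (winner, loser) output pairs ranked by the IQA reward. As a rough illustration (not the paper's exact loss; `beta` is a hypothetical temperature), a single-pair direct preference loss looks like:

```python
import math

def dpo_pair_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Standard DPO loss for one (winner, loser) preference pair.

    logp_* are log-probabilities of the winning / losing sample under the
    policy being trained; ref_logp_* are the same quantities under the
    frozen reference model. beta scales the implicit reward margin.
    """
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    # -log(sigmoid(margin)), written stably as softplus(-margin)
    return math.log1p(math.exp(-margin)) if margin > -30 else -margin
```

The loss shrinks as the policy assigns relatively more probability to the preferred sample than the reference model does, which is what drives the model toward reward-preferred outputs without an explicit reward model at training time.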

🚀 Fast Convergence with Strong Gains

DP²O-SR achieves strong perceptual gains in just 500 training steps, outperforming powerful baselines like SeeSR and OSEDiff.

🌟 Key Contributions

Balanced Perceptual Reward

Visual comparison: different reward types

Combines full-reference (fidelity) and no-reference (realism) IQA metrics to guide training with a hybrid reward.

Balanced Reward
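One simple way to combine metrics on different scales into a single hybrid reward, sketched below as an assumption (the paper's exact normalization and mixing weight `alpha` may differ), is to min-max normalize each metric across the candidate set and take a weighted sum:

```python
def balanced_reward(fr_scores, nr_scores, alpha=0.5):
    """Combine full-reference (fidelity) and no-reference (realism) IQA
    scores into one hybrid reward per candidate.

    Scores are min-max normalized across the candidate set so the two
    metrics become comparable; alpha is a hypothetical mixing weight.
    Higher is assumed better for both inputs.
    """
    def norm(xs):
        lo, hi = min(xs), max(xs)
        span = (hi - lo) or 1.0  # avoid division by zero for constant scores
        return [(x - lo) / span for x in xs]

    fr_n, nr_n = norm(fr_scores), norm(nr_scores)
    return [alpha * f + (1 - alpha) * n for f, n in zip(fr_n, nr_n)]
```

With `alpha = 0.5`, a candidate that wins on fidelity but loses equally on realism ends up tied with its mirror image, which is the balancing behavior the hybrid reward is after.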

Multiple Preference Pairs Learning

Unlike the conventional best-vs-worst strategy, we rank the generated outputs for each input and retain only the top-N and bottom-N samples to form positive and negative sets. Preference pairs are then constructed between these subsets, avoiding uncertain middle samples and yielding richer supervision and more stable training.
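The pairing scheme described above can be sketched in a few lines (index-based, with reward ties broken by rank order):

```python
from itertools import product

def build_preference_pairs(rewards, n):
    """Rank candidates by reward, keep the top-n (positives) and bottom-n
    (negatives), and pair every positive with every negative.

    Returns (winner_idx, loser_idx) tuples. The uncertain middle
    candidates are discarded, giving n*n pairs per input instead of a
    single best-vs-worst pair.
    """
    order = sorted(range(len(rewards)), key=lambda i: rewards[i], reverse=True)
    top, bottom = order[:n], order[-n:]
    return list(product(top, bottom))
```

For 5 candidates and n = 2 this yields 4 pairs while the middle-ranked sample never appears in any pair, which is exactly the "avoid uncertain middle samples" behavior.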

Data Curation Strategy

Sampling strategy trend

The optimal sampling strategy depends on model capacity:

  • Small models prefer broader coverage (e.g., 1/4)
  • Large models learn better with stronger contrast (e.g., 1/16)

Figure: Sampling strategy trend on C-SD2 (left) and C-FLUX (right).

Hierarchical Preference Optimization (HPO)

Adaptive intra/inter-group weighting

We adaptively weight each preference pair:

  • Intra-group: favor larger reward gaps
  • Inter-group: prioritize diverse candidate groups
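As a hedged sketch of the two weighting ideas (the exact functional forms below, a softmax over gaps with temperature `tau` and a spread-proportional group weight, are illustrative assumptions, not the paper's formulas):

```python
import math

def pair_weights(gaps, tau=1.0):
    """Intra-group: softmax over winner-loser reward gaps, so pairs with a
    larger gap receive more weight (tau is a hypothetical temperature)."""
    exps = [math.exp(g / tau) for g in gaps]
    z = sum(exps)
    return [e / z for e in exps]

def group_weights(group_rewards):
    """Inter-group: weight each candidate group by the spread (std) of its
    rewards, prioritizing more diverse groups; uniform if all spreads are 0."""
    def std(xs):
        m = sum(xs) / len(xs)
        return (sum((x - m) ** 2 for x in xs) / len(xs)) ** 0.5
    spreads = [std(g) for g in group_rewards]
    z = sum(spreads)
    return [s / z for s in spreads] if z else [1 / len(spreads)] * len(spreads)
```

Both weightings push gradient signal toward the most informative comparisons: clear-cut pairs within a group, and groups whose candidates actually differ.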

🔍 Interesting Observations

DP²O-SR Improves Output Consistency Across Random Seeds

Best@M / Mean@M / Worst@M curves

We explore how perceptual quality varies with the number of sampled outputs M per input, where M increases exponentially from 1 to 64 (i.e., M = 2ⁿ).

Key findings:

  • Best@M increases with M — higher perceptual peaks observed
  • Worst@M drops in baselines, but improves significantly with DP²O-SR
  • Mean@M stays relatively stable — but still benefits slightly from our approach

This shows that DP²O-SR not only improves average perceptual quality but more importantly raises the quality floor, resulting in more consistent and robust outputs across different seeds.

Figure: Perceptual reward curves across varying sample numbers (M). Left: C-SD2; Right: C-FLUX.
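The three statistics behind these curves are straightforward to compute from per-seed rewards; a minimal sketch (assuming one reward list per input, with the first M entries taken as the M samples):

```python
def reward_at_m(rewards, m):
    """Best@M / Mean@M / Worst@M over the first M sampled outputs of one
    input: the peak, average, and floor of perceptual reward across seeds."""
    sub = rewards[:m]
    return max(sub), sum(sub) / len(sub), min(sub)

# Sweeping m = 1, 2, 4, ..., 64 traces the curves in the figure:
# Best@M can only rise with M, Worst@M can only fall.
```

The monotonicity baked into max/min is why raising the quality floor (Worst@M) is the harder and more telling improvement.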

Local Refinement under Global Reward

Examples of local detail enhancement

DP²O-SR leads to localized visual improvements, even though training is guided by global IQA rewards only.

  • Seed sensitivity remains: Even within the same model, different random seeds cause variations in local structures (e.g., wing textures, insect eyes).
  • Same-seed refinement: Under the same seed, DP²O-SR outputs consistently show sharper and more accurate textures than the baseline (e.g., clearer wing venation).
  • Global-to-local effect: These refinements emerge without any explicit local supervision, suggesting the model learns to enhance perceptually salient regions.

Figure: DP²O-SR enhances local details (e.g., wing structure, red box) while preserving stable regions (e.g., head reflections, green box).

⚙️ Dependencies and Installation

```shell
# clone this repository
git clone https://github.com/cswry/DP2O-SR.git
cd DP2O-SR

# create an environment with python >= 3.8
conda create -n dp2osr python=3.8
conda activate dp2osr
pip install -r requirements.txt
```

🚀 Quick Inference

Step 1: Download the pretrained models

  • Download the pretrained FLUX.1-dev from Hugging Face.
  • Since the official SD2-base has been taken down, we have hosted it together with C-FLUX, DP²O-FLUX, C-SD2, and DP²O-SD2 on Google Drive.
  • Additionally, the RAM and DAPE models (used for extracting tag-style text prompts) are also available on Google Drive.

You can put the models into preset/models.

Step 2: Prepare testing data

You can put the testing images in preset/test_inp.

Step 3: Run the testing command

```shell
# C-SD2
accelerate launch test_sd2b_controlnet.py \
    --pretrained_model_name_or_path "preset/models/stable-diffusion-2-base" \
    --controlnet_model_name_or_path "preset/models/c-sd2/model.safetensors" \
    --image_path "preset/test_inp" \
    --output_dir "preset/test_oup_c_sd2" \
    --align_method "adain" \
    --ram_path "preset/models/ram_swin_large_14m.pth" \
    --dape_path "preset/models/DAPE.pth" \
    --guidance_scale 3.5 \
    --num_inference_steps 50 \
    --mixed_precision "fp16"

# DP²O-SD2
accelerate launch test_sd2b_controlnet.py \
    --pretrained_model_name_or_path "preset/models/stable-diffusion-2-base" \
    --controlnet_model_name_or_path "preset/models/dp2o-sd2/model.safetensors" \
    --image_path "preset/test_inp" \
    --output_dir "preset/test_oup_dp2o_sd2" \
    --align_method "adain" \
    --ram_path "preset/models/ram_swin_large_14m.pth" \
    --dape_path "preset/models/DAPE.pth" \
    --guidance_scale 3.5 \
    --num_inference_steps 50 \
    --mixed_precision "fp16"

# C-FLUX
accelerate launch test_flux_controlnet.py \
    --pretrained_model_name_or_path "black-forest-labs/FLUX.1-dev" \
    --controlnet_model_name_or_path "preset/models/c-flux/model.safetensors" \
    --image_path "preset/test_inp" \
    --output_dir "preset/test_oup_c_flux" \
    --align_method "adain" \
    --ram_path "preset/models/ram_swin_large_14m.pth" \
    --dape_path "preset/models/DAPE.pth" \
    --num_double_layers 4 \
    --num_single_layers 0 \
    --guidance_scale 2.5 \
    --num_inference_steps 25 \
    --mixed_precision "fp16"

# DP²O-FLUX
accelerate launch test_flux_controlnet.py \
    --pretrained_model_name_or_path "black-forest-labs/FLUX.1-dev" \
    --controlnet_model_name_or_path "preset/models/dp2o-flux/model.safetensors" \
    --image_path "preset/test_inp" \
    --output_dir "preset/test_oup_dp2o_flux" \
    --align_method "adain" \
    --ram_path "preset/models/ram_swin_large_14m.pth" \
    --dape_path "preset/models/DAPE.pth" \
    --num_double_layers 4 \
    --num_single_layers 0 \
    --guidance_scale 2.5 \
    --num_inference_steps 25 \
    --mixed_precision "fp16"
```

📜 Citation

@inproceedings{wu2025dp2osr,
  title     = {DP²O-SR: Direct Perceptual Preference Optimization for Real-World Image Super-Resolution},
  author    = {Wu, Rongyuan and Sun, Lingchen and Zhang, Zhengqiang and Wang, Shihao and Wu, Tianhe and Yi, Qiaosi and Li, Shuai and Zhang, Lei},
  booktitle = {Advances in Neural Information Processing Systems (NeurIPS)},
  year      = {2025}
}
