Hi, thank you for releasing your code and pre-trained models. I've been trying to reproduce the results from your paper on FFHQ and CelebA-HQ but am observing significant discrepancies from the reported performance. I wanted to document my findings in case there's something I'm missing or a known issue.
Environment & Setup
- Repository commit: eb4e77c84d00678e409343b52d9804b8ff31f467 (retrieved 2026-01-14)
- Pre-trained models: Iteration 700,000 checkpoints downloaded from BaiduCloud (as linked in README)
- Test data: FFHQ and CelebA-HQ with mixed contamination types (scribbles + rectangular image patches)
Observed Issues
I've attached a comparison image showing Input → GT → Predicted Mask → Output → Predicted Output columns across 4 test samples.

1. Complete Failure on Rectangular Occlusions
The model fails entirely on square/rectangular image patches (rows 2 & 4 in the attached image):
- The calendar text obstruction remains fully visible in the output
- The predicted mask localizes only a small central region rather than the full occlusion
- The Output column shows extreme noise (scattered white pixels across the entire image), suggesting the mask estimation has collapsed
This contrasts with the paper's Table 3 results showing strong performance on "Image Occlusion" contamination patterns.
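To put a number on the mask localization problem, a check along the following lines could be used. This is only a minimal sketch and not code from the repository: `mask_iou` and `mask_recall` are hypothetical helper names, and it assumes the predicted mask is available as a 2-D array with values in [0, 1] and that the true occlusion mask is known from the contamination synthesis step.

```python
import numpy as np


def mask_iou(pred_mask, true_mask, threshold=0.5):
    """Intersection over union between a soft predicted mask and a binary true mask."""
    pred = np.asarray(pred_mask, dtype=np.float64) >= threshold
    true = np.asarray(true_mask, dtype=bool)
    union = np.logical_or(pred, true).sum()
    if union == 0:
        # Both masks are empty, so they agree trivially.
        return 1.0
    return float(np.logical_and(pred, true).sum() / union)


def mask_recall(pred_mask, true_mask, threshold=0.5):
    """Fraction of truly contaminated pixels that the predicted mask covers."""
    pred = np.asarray(pred_mask, dtype=np.float64) >= threshold
    true = np.asarray(true_mask, dtype=bool)
    total = true.sum()
    if total == 0:
        raise ValueError("true_mask contains no contaminated pixels")
    return float(np.logical_and(pred, true).sum() / total)


# Illustrative (made-up) geometry: a 128x128 rectangular occlusion of which
# only a 32x32 central region is detected, similar to the failure in rows 2 & 4.
true_mask = np.zeros((256, 256), dtype=bool)
true_mask[64:192, 64:192] = True
pred_mask = np.zeros((256, 256), dtype=np.float64)
pred_mask[112:144, 112:144] = 1.0

print(mask_iou(pred_mask, true_mask), mask_recall(pred_mask, true_mask))
```

A recall far below 1 on rectangular patches, alongside high recall on scribbles, would confirm that the failure is specific to that contamination type rather than a general checkpoint problem.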
2. Poor Texture Quality on Successful Detections
Even where the model correctly identifies contamination (scribble rows 1 & 3):
- Inpainted regions exhibit over-smoothing / "plastic skin" artifacts
- Loss of high-frequency detail (pores, skin texture) compared to GT
- Color blending issues leaving visible discoloration blobs (especially around mouth/chin areas)
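The over-smoothing could also be measured rather than judged by eye. The sketch below is one possible approach, not something taken from the paper or the repository: `laplacian_variance` is a hypothetical helper that computes the variance of a 4-neighbour discrete Laplacian, optionally restricted to the inpainted region. It assumes grayscale images as 2-D arrays. A much lower value for the output than for the GT inside the same region would indicate lost high-frequency detail.

```python
import numpy as np


def laplacian_variance(gray, region=None):
    """Variance of the 4-neighbour Laplacian, a rough proxy for fine texture.

    gray:   2-D array (grayscale image).
    region: optional 2-D boolean array; if given, only pixels inside it count.
    """
    g = np.asarray(gray, dtype=np.float64)
    if g.ndim != 2 or g.shape[0] < 3 or g.shape[1] < 3:
        raise ValueError("gray must be a 2-D array of at least 3x3 pixels")
    # Discrete Laplacian on interior pixels (borders are dropped).
    lap = (g[:-2, 1:-1] + g[2:, 1:-1] + g[1:-1, :-2] + g[1:-1, 2:]
           - 4.0 * g[1:-1, 1:-1])
    if region is not None:
        inner = np.asarray(region, dtype=bool)[1:-1, 1:-1]
        if not inner.any():
            raise ValueError("region selects no interior pixels")
        lap = lap[inner]
    return float(lap.var())


# Synthetic demonstration only: a textured patch versus a perfectly flat one.
rng = np.random.default_rng(0)
textured = rng.normal(0.5, 0.1, size=(64, 64))
flat = np.full((64, 64), 0.5)
print(laplacian_variance(textured) > laplacian_variance(flat))
```

Comparing `laplacian_variance(output, region=mask)` against `laplacian_variance(gt, region=mask)` per sample would give a simple per-image sharpness ratio to report alongside the visual comparison.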
3. Facial Feature Reconstruction
- Lips appear undefined and blurry when occluded
- Eye reconstruction shows asymmetry and lack of definition
Questions
- Are the BaiduCloud checkpoints the same ones used to generate the paper's quantitative results?
- Is there specific preprocessing required for the test images beyond resizing to 256×256?
- Were the paper results obtained with a different contamination synthesis pipeline than what's in the released code?
- Any known issues with certain contamination types (solid rectangles vs. irregular masks)?
Attached
comparison.jpg: Side-by-side comparison showing the issues described above
I'd appreciate any guidance on reproducing the reported results. Happy to provide additional details or test specific configurations if helpful.
Thanks for your time!