PyTorch implementation of the paper on enhancing wide-angle images by incorporating details from a narrow-angle shot of the same scene via a cross-view attention mechanism.
Paper | Dataset | Models | Supplementary Material
In this paper we propose a novel method that infuses wide shots with the finer detail usually associated with images captured by the primary lens, by capturing the same scene with both narrow and wide field-of-view (FoV) lenses. We train a Generative Adversarial Network (GAN)-based model to extract the visual quality parameters from a narrow-angle shot and transfer them to the corresponding wide-angle image of the scene using residual connections and an attention-based fusion module. The paper describes in detail the proposed technique for isolating the visual essence of one image and transferring it into another.
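For intuition, the cross-view attention fusion described above can be sketched in PyTorch as follows. This is a minimal illustration under assumptions, not the paper's exact module: the use of `nn.MultiheadAttention`, the layer sizes, and the single residual connection are all guesses at the architecture; the repository's code is authoritative.

```python
import torch
import torch.nn as nn

class CrossViewAttentionFusion(nn.Module):
    """Sketch of an attention-based fusion block: queries come from the
    wide-angle features, keys/values from the narrow-angle features, and
    the attended detail is added back through a residual connection.
    Hypothetical structure for illustration only."""

    def __init__(self, channels: int, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(channels)

    def forward(self, wide: torch.Tensor, narrow: torch.Tensor) -> torch.Tensor:
        # wide, narrow: (batch, channels, height, width) feature maps
        b, c, h, w = wide.shape
        q = wide.flatten(2).transpose(1, 2)        # (b, h*w, c) queries
        kv = narrow.flatten(2).transpose(1, 2)     # (b, h'*w', c) keys/values
        fused, _ = self.attn(self.norm(q), kv, kv) # cross-view attention
        # Residual connection back onto the wide-angle features
        return wide + fused.transpose(1, 2).reshape(b, c, h, w)

x_wide = torch.randn(1, 32, 16, 16)
x_narrow = torch.randn(1, 32, 8, 8)
out = CrossViewAttentionFusion(32)(x_wide, x_narrow)
print(out.shape)  # torch.Size([1, 32, 16, 16])
```

Note that the output keeps the wide view's spatial resolution: the narrow features only supply the keys and values, so the two views need not share a spatial size.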
To run the project locally, clone the repository and follow the steps below.
- Model Checkpoints
  - Get the pretrained checkpoints for the Generator and Discriminator from Google Drive.
  - Under the `Models` directory, create a new directory named `checkpoints`.
  - Paste the checkpoints (`.pth` files) inside the `checkpoints` directory.
- Dataset
  - This model is trained on the Landscape HQ dataset.
  - The original dataset can be downloaded from Kaggle.
  - Images from the dataset must be preprocessed before they are usable as training data.
  - The preprocessing pipeline is implemented in `preprocess.py`.
  - Alternatively, 500 images have already been preprocessed for plug-and-play use and can be downloaded from Google Drive. In this case, replace the existing `Datasets` directory with the downloaded one.
- Hyperparameters
  - All necessary hyperparameters and environment variables are stored in the `config.json` file.
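Since the README does not list the keys in `config.json`, the snippet below is only a guess at what the file might contain, using the settings mentioned elsewhere in this document (epochs, images per epoch, learning rate, checkpoint interval). Check the actual file in the repository for the real key names and values.

```python
import json

# Hypothetical config.json contents -- key names and values are
# illustrative assumptions, not the repository's actual configuration.
example = {
    "epochs": 100,              # number of epochs
    "images_per_epoch": 500,    # number of images per epoch
    "learning_rate": 2e-4,      # learning rate for the optimizer
    "checkpoint_every": 10,     # README says checkpoints save every 10 epochs
}

with open("config.json", "w") as f:
    json.dump(example, f, indent=4)

# Loading it back, the way a training script typically would:
with open("config.json") as f:
    config = json.load(f)
print(config["checkpoint_every"])  # 10
```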
- Training
  - All training hyperparameters (e.g. number of epochs, number of images per epoch, learning rate) are loaded from the `config.json` file and can be tuned there.
  - Training data are read from the `Dataset` directory, which must contain three separate subdirectories: `narrow`, `original`, and `wide`.
  - Training can be started with the following command:

    ```
    python train.py
    ```

  - Checkpoints are saved every 10 epochs.
- Inference
  - Inference requires the `generator.pth` checkpoint in the `Model/checkpoints/` directory.
  - For inference, the input image must first be preprocessed as follows:

    ```
    python preprocess.py --single {path/to/file.ext}
    ```

  - Preprocessed images are saved in the `Dataset/uploads/` directory.
  - Preprocessing creates a wide (noisy, 2x downscaled) image and a narrow (limited field of view) image from the given base image; the goal is to enhance the simulated wide-FoV image by 2x.
  - For real-life ultra-wide and primary-lens image pairs, the preprocessing step is not necessary.
  - Inference takes a pair of wide and narrow FoV images and can be run with the following command:

    ```
    python infer.py ./Dataset/uploads/{file_wide.ext} ./Dataset/uploads/{file_narrow.ext}
    ```

  - The output image is saved in the `Outputs` folder.
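The wide/narrow simulation performed by the preprocessing step can be approximated with the NumPy sketch below. The 2x block-averaging downscale, the Gaussian noise level, and the centre-crop fraction are illustrative assumptions; `preprocess.py` defines the actual parameters.

```python
import numpy as np

def simulate_pair(image: np.ndarray, crop_frac: float = 0.5, noise_std: float = 0.02):
    """Simulate a (wide, narrow) pair from one base image in [0, 1].

    The 'wide' view is the full frame, 2x downscaled and corrupted with
    Gaussian noise; the 'narrow' view is a clean centre crop (limited
    field of view). Crop fraction and noise level are illustrative guesses.
    """
    h, w = image.shape[:2]

    # Wide view: 2x downscale by averaging 2x2 pixel blocks, then add noise.
    wide = image[: h - h % 2, : w - w % 2].reshape(h // 2, 2, w // 2, 2, -1).mean(axis=(1, 3))
    wide = np.clip(wide + np.random.normal(0.0, noise_std, wide.shape), 0.0, 1.0)

    # Narrow view: clean centre crop covering `crop_frac` of each dimension.
    ch, cw = int(h * crop_frac), int(w * crop_frac)
    top, left = (h - ch) // 2, (w - cw) // 2
    narrow = image[top : top + ch, left : left + cw]

    return wide, narrow

# Example on a random 256x256 RGB image with values in [0, 1]:
img = np.random.rand(256, 256, 3)
wide, narrow = simulate_pair(img)
print(wide.shape, narrow.shape)  # (128, 128, 3) (128, 128, 3)
```

The model's task is then to recover the clean, full-resolution frame from the degraded wide view, using the narrow view as a source of high-quality detail.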