
Inference at lower resolutions #41

@danier97

Description


Thanks for open-sourcing this project - it's really helpful!

I wanted to evaluate your checkpoint at a lower resolution (the default is 1280x704; I am trying 576x320). To make your gen3c_single_image.py run, I added `self.model.state_shape = [16, 16, 40, 72]` at this line, which I believe corresponds to the latent shape at the lower resolution. I then ran the following command, specifying the height and width arguments:
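For reference, here is how I derived that shape - a minimal sketch, assuming the tokenizer compresses spatially by 8x and that the channel (16) and temporal (16) latent dimensions stay the same as at the default 1280x704 resolution (the helper name and the fixed dims are my assumptions, not from the codebase):

```python
SPATIAL_COMPRESSION = 8  # assumed VAE spatial downsampling factor

def latent_state_shape(height: int, width: int,
                       channels: int = 16, temporal: int = 16) -> list[int]:
    """Return the assumed latent state shape [C, T, H/8, W/8] for a resolution."""
    # The resolution must be divisible by the compression factor.
    assert height % SPATIAL_COMPRESSION == 0 and width % SPATIAL_COMPRESSION == 0
    return [channels, temporal,
            height // SPATIAL_COMPRESSION,
            width // SPATIAL_COMPRESSION]

print(latent_state_shape(320, 576))   # → [16, 16, 40, 72]
print(latent_state_shape(704, 1280))  # default resolution → [16, 16, 88, 160]
```

If the model uses a different compression factor or latent channel count, the numbers above would change accordingly.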

CUDA_HOME=$CONDA_PREFIX PYTHONPATH=$(pwd) python cosmos_predict1/diffusion/inference/gen3c_single_image.py \
    --checkpoint_dir checkpoints \
    --input_image_path assets/diffusion/000000.png \
    --video_save_name test_single_image \
    --guidance 1 \
    --foreground_masking \
    --trajectory clockwise \
    --num_steps 25 \
    --height 320 \
    --width 576 \
    --offload_diffusion_transformer \
    --offload_tokenizer \
    --offload_text_encoder_model \
    --offload_prompt_upsampler \
    --offload_guardrail_models \
    --disable_guardrail \
    --disable_prompt_encoder

The video below shows the result I got. It follows the desired trajectory overall, but has obvious artefacts around the edges and in some disoccluded regions. I wonder:

  1. If the way I am inferring at lower resolutions looks correct to you;
  2. If so, are the artefacts expected from the model at lower resolutions;
  3. If so, how would you suggest I evaluate the checkpoint on low resolution benchmarks, e.g. RE10K at 576x320?

Thanks a lot for your help!

test_single_image.mp4
