Thank you for your excellent work and for releasing the demos and code.
I have a question regarding the image-to-video (I2V) comparisons shown in the paper/demo. In some of the examples, the first frames appear to be identical (or very similar) to those produced by methods like CausVid and Self-Forcing, which makes me wonder:
1. Did you perform direct comparisons with CausVid and Self-Forcing under the same I2V setting?
2. If so, would it be possible to share the exact configs and checkpoints used for I2V when running CausVid and Self-Forcing for comparison?
Access to the specific checkpoints (or configuration details) would greatly help me.
Thank you very much for your time and clarification!