FVDM

Official code for the paper "Redefining Temporal Modeling in Video Diffusion: The Vectorized Timestep Approach".

Authors: Yaofang Liu, Yumeng Ren, Xiaodong Cun, Aitor Artola, Yang Liu, Tieyong Zeng, Raymond H. Chan, Jean-Michel Morel

FVDM (Frame-aware Video Diffusion Model) introduces a vectorized timestep variable (VTV) for video generation, addressing limitations of current video diffusion models (VDMs). Unlike previous VDMs, which apply a single scalar timestep to the whole clip, FVDM lets each frame follow an independent noise schedule, which improves the model's capacity to capture fine-grained temporal dependencies. We demonstrate this flexibility across multiple tasks, including standard video generation, image-to-video generation, video interpolation, and long video synthesis. With a diverse set of VTV configurations, FVDM achieves superior quality in generated videos and avoids problems such as catastrophic forgetting during fine-tuning and the limited generalizability of zero-shot methods.
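To make the idea of a per-frame timestep concrete, here is a minimal sketch in plain Python. It is not the repository's implementation: the linear beta schedule, the toy frame values, and the helper names `linear_alpha_bar` and `add_noise_per_frame` are all assumptions made for illustration. It contrasts a scalar timestep shared by every frame with a vector that holds one timestep per frame, then applies the standard forward noising step frame by frame.

```python
import math
import random


def linear_alpha_bar(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_t) for a linear beta schedule (assumed)."""
    alpha_bar, running = [], 1.0
    for i in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * i / (num_steps - 1)
        running *= 1.0 - beta
        alpha_bar.append(running)
    return alpha_bar


def add_noise_per_frame(frames, timesteps, alpha_bar, rng):
    """Noise each frame with its own timestep: sqrt(a) * x0 + sqrt(1 - a) * eps."""
    noisy = []
    for frame, t in zip(frames, timesteps):
        a = alpha_bar[t]
        noisy.append(
            [math.sqrt(a) * x + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0) for x in frame]
        )
    return noisy


rng = random.Random(0)
num_steps, num_frames = 1000, 4
alpha_bar = linear_alpha_bar(num_steps)

# Toy "video": 4 frames with 3 values each (hypothetical data).
frames = [[0.5, -0.5, 1.0] for _ in range(num_frames)]

# Conventional VDM: one scalar timestep repeated for every frame.
scalar_t = [rng.randrange(num_steps)] * num_frames

# Vectorized timestep: each frame draws its own timestep.
vector_t = [rng.randrange(num_steps) for _ in range(num_frames)]

noisy = add_noise_per_frame(frames, vector_t, alpha_bar, rng)
print("shared timestep:   ", scalar_t)
print("per-frame timestep:", vector_t)
```

In this sketch the only change from the scalar case is the shape of the timestep argument, which is in line with the claim that the approach adds no computational cost.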

Highlights

Vectorized Timestep Variable (VTV) for fine-grained temporal modeling
Zero-shot flexibility across a wide range of video generation tasks
Superior quality in generated videos
No additional computational cost during training or inference

Demos

With different VTV configurations, FVDM can be applied to numerous tasks in a zero-shot manner.

Below are videos generated by FVDM on the FaceForensics, SkyTimelapse, Taichi-HD, and UCF101 datasets. The same models/checkpoints are used across all tasks, which reflects strong zero-shot capability.

demo.mp4
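As a rough illustration of what a "VTV configuration" could look like, the sketch below builds one timestep per frame for several tasks. It assumes that frames supplied as conditions (for example the input image in image-to-video) are held at timestep 0, meaning clean, while the remaining frames share the current sampling timestep. The helper `build_vtv` and the preset layouts are hypothetical and are not taken from this repository.

```python
def build_vtv(num_frames, t, clean_frames=()):
    """Return one timestep per frame: 0 for conditioning frames, t otherwise."""
    clean = set(clean_frames)
    for i in clean:
        if not 0 <= i < num_frames:
            raise ValueError(f"frame index {i} is outside 0..{num_frames - 1}")
    return [0 if i in clean else t for i in range(num_frames)]


num_frames, t = 8, 500

# Standard generation: every frame is denoised from the same timestep.
standard = build_vtv(num_frames, t)

# Image-to-video: the first frame is the given (clean) image.
image_to_video = build_vtv(num_frames, t, clean_frames=[0])

# Interpolation: the first and last frames are given.
interpolation = build_vtv(num_frames, t, clean_frames=[0, num_frames - 1])

# Long video synthesis: the first half is reused from the previous clip.
long_video = build_vtv(num_frames, t, clean_frames=range(num_frames // 2))

for name, vtv in [
    ("standard", standard),
    ("image_to_video", image_to_video),
    ("interpolation", interpolation),
    ("long_video", long_video),
]:
    print(f"{name:15s} {vtv}")
```

Because only the timestep vector changes between tasks in this sketch, the same checkpoint can in principle serve all of them.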

Setup

git clone https://github.com/Yaofang-Liu/FVDM.git
cd FVDM
conda env create -f environment.yml
conda activate latte

Code Structure

.
├── configs/               # Training and sampling configurations
│   ├── ffs/               # FaceForensics configs
│   ├── sky/               # SkyTimelapse configs
│   ├── taichi/            # Taichi-HD configs
│   ├── ucf101/            # UCF101 configs
│   └── t2v/               # Text-to-Video configs
├── datasets/              # Dataset loaders
├── diffusers/             # Diffusion model components
├── diffusion/             # Gaussian diffusion utilities
├── models/                # Model architectures
├── sample/                # Sampling scripts
├── tools/                 # Evaluation metrics (FVD, FID, IS)
├── train_scripts/         # Training shell scripts
├── train.py               # Base training script
├── train_video.py         # Video training script
└── train_with_img.py      # Video-image joint training script

Training

To train FVDM on different datasets:

# FaceForensics
bash train_scripts/ffs_train_video.sh

# SkyTimelapse
bash train_scripts/sky_train_video.sh

# Taichi-HD
bash train_scripts/taichi_train_video.sh

# UCF101
bash train_scripts/ucf101_train_video.sh

Or use torchrun directly, replacing N with the number of processes to launch on the node (typically one per GPU):

torchrun --nnodes=1 --nproc_per_node=N train_video.py --config ./configs/ffs/ffs_train_video.yaml
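If you launch runs from a Python script, the command above can be assembled programmatically. The helper below is a hypothetical convenience, not part of this repository; it uses only the `--nnodes`, `--nproc_per_node`, and `--config` arguments shown above, and the value 4 for the process count is just an example.

```python
import shlex


def torchrun_command(config_path, gpus_per_node, nnodes=1, script="train_video.py"):
    """Build the torchrun argument list for a single training run."""
    if gpus_per_node < 1:
        raise ValueError("gpus_per_node must be at least 1")
    return [
        "torchrun",
        f"--nnodes={nnodes}",
        f"--nproc_per_node={gpus_per_node}",
        script,
        "--config",
        config_path,
    ]


cmd = torchrun_command("./configs/ffs/ffs_train_video.yaml", gpus_per_node=4)
print(shlex.join(cmd))
```

The returned list can be passed directly to `subprocess.run(cmd, check=True)` to start training.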

Sampling

To generate videos:

# FaceForensics
bash sample/ffs_video.sh

# SkyTimelapse
bash sample/sky_video.sh

# Taichi-HD
bash sample/taichi_video.sh

# UCF101
bash sample/ucf101_video.sh

# Text-to-Video
bash sample/t2v.sh

Evaluation

We provide evaluation scripts for FVD, FID, and IS metrics:

bash tools/eval_metrics_ucf101.sh
bash tools/eval_metrics_taichi.sh
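For orientation, FID and FVD are both Fréchet distances between Gaussians fitted to feature vectors of real and generated samples (Inception features for FID, I3D video features for FVD). The sketch below is not the code in tools/. It shows the simplified case where the covariances are assumed diagonal, in which the distance reduces to a sum of squared differences of per-dimension means and standard deviations. The function name and the toy features are made up for illustration.

```python
import statistics


def frechet_distance_diag(feats_a, feats_b):
    """Squared Frechet distance between two diagonal-covariance Gaussians.

    Each argument is a list of feature vectors of equal length. For every
    dimension the contribution is (mu_a - mu_b)^2 + (sigma_a - sigma_b)^2.
    """
    total = 0.0
    for col_a, col_b in zip(zip(*feats_a), zip(*feats_b)):
        mu_a, mu_b = statistics.fmean(col_a), statistics.fmean(col_b)
        sd_a, sd_b = statistics.pstdev(col_a), statistics.pstdev(col_b)
        total += (mu_a - mu_b) ** 2 + (sd_a - sd_b) ** 2
    return total


# Toy 2-D features (hypothetical): the second set is the first shifted by 1.
real_feats = [[0.0, 0.0], [2.0, 2.0]]
fake_feats = [[1.0, 1.0], [3.0, 3.0]]

print(frechet_distance_diag(real_feats, real_feats))
print(frechet_distance_diag(real_feats, fake_feats))
```

The full metrics use complete covariance matrices and a matrix square root, so values from this simplified version are not comparable with reported FID or FVD numbers.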

Citation

If you find our work useful, please consider citing:

@misc{liu2024redefiningtemporalmodelingvideo,
      title={Redefining Temporal Modeling in Video Diffusion: The Vectorized Timestep Approach},
      author={Yaofang Liu and Yumeng Ren and Xiaodong Cun and Aitor Artola and Yang Liu and Tieyong Zeng and Raymond H. Chan and Jean-michel Morel},
      year={2024},
      eprint={2410.03160},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2410.03160},
}

Acknowledgments

This implementation is built upon Latte. We thank the authors for their excellent work.

Contact

For any questions or feedback, please contact yaofanliu2-c@my.cityu.edu.hk.

License

See LICENSE for details.
