
MV-Performer

[SIGGRAPH Asia 2025] The official repo for the conference paper "MV-Performer: Taming Video Diffusion Model for Faithful and Synchronized Multi-view Performer Synthesis". [Paper]

Installation

# Create and activate conda environment
conda create -n mvperformer python=3.10 -y
conda activate mvperformer
pip install torch==2.6.0 torchvision==0.21.0 --index-url https://download.pytorch.org/whl/cu124
pip install -r ./requirements.txt
pip install --extra-index-url https://miropsota.github.io/torch_packages_builder pytorch3d==0.7.8+pt2.6.0cu124 
pip install -e ./DiffSynth-Studio
# Also install ffmpeg (e.g. via your system package manager)

Model

The Wan model will be downloaded automatically into wan_models. If you have already downloaded the Wan2.1-T2V-1.3B model, please link it to wan_models/Wan-AI/Wan2.1-T2V-1.3B.
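If you already have a local copy of Wan2.1-T2V-1.3B, the link can be created like this (a minimal sketch; EXISTING_WAN is a hypothetical path, so adjust it to wherever your download lives):

```shell
# Point the repo's expected path at an existing Wan2.1-T2V-1.3B download.
EXISTING_WAN="$HOME/models/Wan2.1-T2V-1.3B"   # hypothetical location of your copy
mkdir -p wan_models/Wan-AI
ln -sfn "$EXISTING_WAN" wan_models/Wan-AI/Wan2.1-T2V-1.3B
```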

We provide our DiT checkpoint on OneDrive. Please download it and place it at checkpoints/mv-performer/dit/diffusion_pytorch_model.bin:

mkdir -p ./checkpoints/mv-performer/dit
wget "https://cuhko365-my.sharepoint.com/:u:/g/personal/223010099_link_cuhk_edu_cn/IQBPNKjkFpu_RJ1sMhNBk8-GAaaWCkEPzKd28Qn3dvu1ANs?download=1" -O ./checkpoints/mv-performer/dit/diffusion_pytorch_model.bin

Evaluation

1. Prepare the validation dataset

We have uploaded the validation set to OneDrive; it includes the raw data of 10 MVHumanNet actors and 10 DNA-Rendering actors, along with their extracted video latents. Please download each {human_id}.zip into either the data/val_data/dna or data/val_data/mvhuman directory and unzip it. The val_data directory has the following structure:

├── dna / mvhuman
│   ├── {human_id}
│   │   ├── cam.pkl
│   │   ├── crop_gt
│   │   ├── depths
│   │   ├── images
│   │   ├── masks
│   │   ├── partial_render
│   │   ├── smpl_mesh
│   │   └── smpl_params
│   ├── .....
├── val_cache
│   ├── dna
│   ├── mvhuman

If you want to construct more cases, please refer to this for more details.

2. Run validation from Command Line

# Generate novel-view results
python val.py --data_type dna 
python val.py --data_type mvhuman
# The results will be stored in ./outputs/val_results

To compute FVD, we need to download i3d_pretrained_400.pt:

mkdir -p ./checkpoints/fvd
wget https://raw.githubusercontent.com/SongweiGe/TATS/main/tats/fvd/i3d_pretrained_400.pt -O ./checkpoints/fvd/i3d_pretrained_400.pt

Then run the evaluation:

# Compute metrics on MVHumanNet
python evaluation.py --data_root outputs/val_results/mvhuman/dit_step50 --gt_root data/val_data/mvhuman
# Compute metrics on DNA-Rendering
python evaluation.py --data_root outputs/val_results/dna/dit_step50 --gt_root data/val_data/dna

Inference with monocular video

We provide the processed monocular videos here; please download and unzip them into data/wild_data/:

wget "https://cuhko365-my.sharepoint.com/:u:/g/personal/223010099_link_cuhk_edu_cn/IQDtQUpCyG3FSbnPx8ARGPXDAU2Oq5n5kYVRdYALmn-G900?download=1" -O data/test_data.zip
# Unzip
unzip data/test_data.zip -d data/test_data

Inference

python infer.py --vid_name vid01
# The results will be stored in ./outputs/wild_results

Process in-the-wild video

The processing scripts rely on many third-party prior models, and we are working on cleaning up the code. Stay tuned.

Acknowledgement

We thank the authors of CogVideoX, SynCamMaster, DiffSynth-Studio, MonoSDF, ViewCrafter, Pi3, SAMURAI, MoGe, and others for their great work. We build on their code in our project.

Citation

If you find this code useful for your research, please use the following BibTeX entry.

@inproceedings{zhi2025mv,
  title={MV-Performer: Taming Video Diffusion Model for Faithful and Synchronized Multi-view Performer Synthesis},
  author={Zhi, Yihao and Li, Chenghong and Liao, Hongjie and Yang, Xihe and Sun, Zhengwentai and Chang, Jiahao and Cun, Xiaodong and Feng, Wensen and Han, Xiaoguang},
  booktitle={Proceedings of the SIGGRAPH Asia 2025 Conference Papers},
  pages={1--14},
  year={2025}
}
