[SIGGRAPH Asia 2025] The official repo for the conference paper "MV-Performer: Taming Video Diffusion Model for Faithful and Synchronized Multi-view Performer Synthesis". [Paper]
```bash
# Create and activate conda environment
conda create -n mvperformer python=3.10 -y
conda activate mvperformer
pip install torch==2.6.0 torchvision==0.21.0 --index-url https://download.pytorch.org/whl/cu124
pip install -r ./requirements.txt
pip install --extra-index-url https://miropsota.github.io/torch_packages_builder pytorch3d==0.7.8+pt2.6.0cu124
pip install -e ./DiffSynth-Studio
# Install ffmpeg
```

The Wan model will be downloaded automatically into `wan_models/`.
If you have already downloaded the Wan2.1-T2V-1.3B model, please link it to `wan_models/Wan-AI/Wan2.1-T2V-1.3B`.
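For a local copy, the link can be created as follows; `/path/to/Wan2.1-T2V-1.3B` is a placeholder for wherever your download actually lives:

```shell
# Link an existing local Wan2.1-T2V-1.3B download into the expected location.
# /path/to/Wan2.1-T2V-1.3B is a placeholder path.
mkdir -p wan_models/Wan-AI
ln -sfn /path/to/Wan2.1-T2V-1.3B wan_models/Wan-AI/Wan2.1-T2V-1.3B
```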
We provide our DiT checkpoint on OneDrive. Please download it and place it at `checkpoints/mv-performer/dit/diffusion_pytorch_model.bin`:
```bash
wget https://cuhko365-my.sharepoint.com/:u:/g/personal/223010099_link_cuhk_edu_cn/IQBPNKjkFpu_RJ1sMhNBk8-GAaaWCkEPzKd28Qn3dvu1ANs?download\=1 -O ./checkpoints/mv-performer/dit/diffusion_pytorch_model.bin
```

We have uploaded the validation set to OneDrive, which includes the raw data of 10 MVHumanNet actors and 10 DNA-Rendering actors, along with their extracted video latents. Please download each `{human_id}.zip` into either `data/val_data/dna` or `data/val_data/mvhuman` and unzip it. `val_data` has the following folder structure:
```
├── dna / mvhuman
│ ├── {human_id}
│ │ ├── cam.pkl
│ │ ├── crop_gt
│ │ ├── depths
│ │ ├── images
│ │ ├── masks
│ │ ├── partial_render
│ │ ├── smpl_mesh
│ │ └── smpl_params
│ ├── .....
├── val_cache
│ ├── dna
│ ├── mvhuman
```
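As a sanity check after unzipping, here is a minimal sketch that walks this layout and loads each actor's `cam.pkl`. The helper name `list_actors` is ours, and the pickle's exact contents are not documented here, so the sketch just returns whatever each `cam.pkl` holds:

```python
import os
import pickle

def list_actors(val_root):
    """Walk a data/val_data-style layout ({dataset}/{human_id}/cam.pkl ...)
    and return {(dataset, human_id): contents of cam.pkl}."""
    actors = {}
    for dataset in ("dna", "mvhuman"):
        ds_dir = os.path.join(val_root, dataset)
        if not os.path.isdir(ds_dir):
            continue
        for human_id in sorted(os.listdir(ds_dir)):
            cam_path = os.path.join(ds_dir, human_id, "cam.pkl")
            if not os.path.isfile(cam_path):
                continue  # skip stray files or incomplete downloads
            with open(cam_path, "rb") as f:
                actors[(dataset, human_id)] = pickle.load(f)
    return actors
```

Running `list_actors("data/val_data")` should report one entry per unzipped `{human_id}` folder; a missing entry usually means the zip was extracted one directory level too deep.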
If you want to construct more cases, please refer to this for more details.
```bash
# Generate novel-view results
python val.py --data_type dna
python val.py --data_type mvhuman
# The results will be stored in ./outputs/val_results
```

To compute FVD, we need to download `i3d_pretrained_400.pt`:

```bash
wget https://raw.githubusercontent.com/SongweiGe/TATS/main/tats/fvd/i3d_pretrained_400.pt -O ./checkpoints/fvd/i3d_pretrained_400.pt
```
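For reference, FVD is the Fréchet distance between Gaussians fitted to I3D features of real and generated videos. A minimal sketch of that final distance step (the I3D feature extraction with the checkpoint above is omitted, and `frechet_distance` is our own helper name):

```python
import numpy as np
from scipy import linalg

def frechet_distance(mu1, sigma1, mu2, sigma2):
    """Frechet distance between N(mu1, sigma1) and N(mu2, sigma2):
    ||mu1 - mu2||^2 + Tr(sigma1 + sigma2 - 2 * sqrtm(sigma1 @ sigma2))."""
    diff = mu1 - mu2
    covmean = linalg.sqrtm(sigma1 @ sigma2)
    # sqrtm can pick up tiny imaginary parts from numerical error
    if np.iscomplexobj(covmean):
        covmean = covmean.real
    return float(diff @ diff + np.trace(sigma1 + sigma2 - 2.0 * covmean))
```

Identical feature statistics give a distance of zero, which is a quick way to check a wiring mistake in the evaluation pipeline.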
Run the evaluation:

```bash
# Compute metrics on MVHumanNet
python evaluation.py --data_root outputs/val_results/mvhuman/dit_step50 --gt_root data/val_data/mvhuman
# Compute metrics on DNA-Rendering
python evaluation.py --data_root outputs/val_results/dna/dit_step50 --gt_root data/val_data/dna
```

We put the processed monocular videos here; please download and unzip them into `data/wild_data/`:
```bash
wget https://cuhko365-my.sharepoint.com/:u:/g/personal/223010099_link_cuhk_edu_cn/IQDtQUpCyG3FSbnPx8ARGPXDAU2Oq5n5kYVRdYALmn-G900\?download\=1 -O data/test_data.zip
# Unzip
unzip data/test_data.zip -d data/test_data
```

```bash
python infer.py --vid_name vid01
# The results will be stored in ./outputs/wild_results
```

The preprocessing scripts rely on many third-party prior models, and we are still cleaning up the code. Stay tuned.
We thank the authors of CogVideoX, SynCamMaster, DiffSynth-Studio, MonoSDF, ViewCrafter, Pi3, SAMURAI, MoGe, and others for their great work. We use their code in our project.
If you find this code useful for your research, please use the following BibTeX entry.
```bibtex
@inproceedings{zhi2025mv,
  title={MV-Performer: Taming Video Diffusion Model for Faithful and Synchronized Multi-view Performer Synthesis},
  author={Zhi, Yihao and Li, Chenghong and Liao, Hongjie and Yang, Xihe and Sun, Zhengwentai and Chang, Jiahao and Cun, Xiaodong and Feng, Wensen and Han, Xiaoguang},
  booktitle={Proceedings of the SIGGRAPH Asia 2025 Conference Papers},
  pages={1--14},
  year={2025}
}
```
