
# LMM4LMM: Benchmarking and Evaluating Large-multimodal Image Generation with LMMs (ICCV 2025 Highlight)

How can text-to-image generation models be evaluated properly?

(Figure: overall evaluation pipeline)

## T2I Model Ranks

(Figure: model ranks)

## LMM-T2I Models


## LMM-VQA Models


## EvalMi-50K Download

```shell
huggingface-cli download IntMeGroup/EvalMi-50K --repo-type dataset --local-dir ./EvalMi-50K
```

## 🛠️ Installation

Clone this repository:

```shell
git clone https://github.com/IntMeGroup/LMM4LMM.git
```

Create a conda virtual environment and activate it:

```shell
conda create -n LMM4LMM python=3.9 -y
conda activate LMM4LMM
```

Install dependencies using requirements.txt:

```shell
pip install -r requirements.txt
```

Install `flash-attn==2.3.6`:

```shell
pip install flash-attn==2.3.6 --no-build-isolation
```

Alternatively, you can compile it from source:

```shell
git clone https://github.com/Dao-AILab/flash-attention.git
cd flash-attention
git checkout v2.3.6
python setup.py install
```

Alternatively, if you are on CUDA 12, you can download a prepacked conda environment:

```shell
huggingface-cli download IntMeGroup/env LMM4LMM.tar.gz --repo-type dataset --local-dir /home/user/anaconda3/envs
mkdir -p /home/user/anaconda3/envs/LMM4LMM
cd /home/user/anaconda3/envs
tar -xzf LMM4LMM.tar.gz -C /home/user/anaconda3/envs/LMM4LMM
```

Adjust `/home/user/anaconda3` to your own conda installation path. If the archive was created with conda-pack, activate the environment and run `conda-unpack` once so that its hard-coded path prefixes are fixed.

## 🌈 Training

Preparation

```shell
huggingface-cli download IntMeGroup/EvalMi-50K/data --repo-type dataset --local-dir ./data
```

For stage 1 training (text-based quality levels):

```shell
sh shell/train_stage1.sh
```

For stage 2 training (fine-tuning the vision encoder and LLM with LoRA):

```shell
sh shell/train_stage2.sh
```

For question-answering (QA) training:

```shell
sh shell/train_qa.sh
```

## 🌈 Evaluation

Download the pretrained weights:

```shell
huggingface-cli download IntMeGroup/LMM4LMM-Perception --local-dir ./weights/stage2/stage2_mos1
huggingface-cli download IntMeGroup/LMM4LMM-Correspondence --local-dir ./weights/stage2/stage2_mos2
huggingface-cli download IntMeGroup/LMM4LMM-QA --local-dir ./weights/qa
```

For perception and correspondence score evaluation (scores):

```shell
sh shell/eval_scores.sh
```

For question-answering (QA) evaluation:

```shell
sh shell/eval_qa.sh
```
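For intuition, QA predictions are commonly scored by exact-match accuracy against the ground-truth answers. The function below is a hypothetical sketch of that idea, not the metric actually implemented inside `shell/eval_qa.sh` (whose answer format may differ):

```python
def qa_accuracy(predictions, references):
    """Fraction of case- and whitespace-insensitive exact matches."""
    norm = lambda s: " ".join(s.lower().split())
    hits = sum(norm(p) == norm(r) for p, r in zip(predictions, references))
    return hits / len(references)

# Made-up example: 2 of the 3 answers match after normalization.
preds = ["Yes", "a red car ", "two"]
refs = ["yes", "A red car", "three"]
print(qa_accuracy(preds, refs))
```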

## 🌈 Inference

Download the pretrained weights:

```shell
huggingface-cli download IntMeGroup/LMM4LMM-Perception --local-dir ./weights/stage2/stage2_mos1
huggingface-cli download IntMeGroup/LMM4LMM-Correspondence --local-dir ./weights/stage2/stage2_mos2
```

Configuration file paths: before running the inference scripts, update the following paths in the data/infer_mos1.json and data/infer_mos2.json configuration files:

- `root`: path to the root directory where the image data is stored.
- `annotation_infer`: path to the file listing the image paths for inference.
- `img_prompt`: path to the file containing the image prompts for inference.
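As a concrete illustration, such a config can be generated with a short script. The paths below are hypothetical placeholders, and the flat key layout is an assumption; check the shipped data/infer_mos1.json for the exact structure before relying on it:

```python
import json

# Hypothetical placeholder paths -- substitute your own. The keys mirror
# the three fields described above; the real data/infer_mos1.json may
# nest or name them differently.
config = {
    "root": "/path/to/image/root",
    "annotation_infer": "/path/to/inference_image_list.txt",
    "img_prompt": "/path/to/image_prompts.txt",
}

with open("infer_mos1.json", "w") as f:
    json.dump(config, f, indent=2)
```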

For Perception Score Inference:

```shell
sh shell/infer_perception.sh
```

For T2I Correspondence Score Inference:

```shell
sh shell/infer_correspondence.sh
```

## 📌 TODO

- ✅ Release the training code
- ✅ Release the evaluation code
- ✅ Release the inference code
- ✅ Release the EvalMi-50K Database

## Quick Access of T2I Models

| Model | Code/Project Link |
| --- | --- |
| SD_v2-1 | https://huggingface.co/stabilityai/stable-diffusion-2-1 |
| i-Code-V3 | https://github.com/microsoft/i-Code |
| SDXL_base_1 | https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0 |
| DALLE3 | https://openai.com/index/dall-e-3 |
| LLMGA | https://github.com/dvlab-research/LLMGA |
| Kandinsky-3 | https://github.com/ai-forever/Kandinsky-3 |
| LWM | https://github.com/LargeWorldModel/LWM |
| Playground | https://huggingface.co/playgroundai/playground-v2.5-1024px-aesthetic |
| LaVi-Bridge | https://github.com/ShihaoZhaoZSH/LaVi-Bridge |
| ELLA | https://github.com/TencentQQGYLab/ELLA |
| Seed-xi | https://github.com/AILab-CVC/SEED-X |
| PixArt-sigma | https://github.com/PixArt-alpha/PixArt-sigma |
| LlamaGen | https://github.com/FoundationVision/LlamaGen |
| Kolors | https://github.com/Kwai-Kolors/Kolors |
| Flux_schnell | https://huggingface.co/black-forest-labs/FLUX.1-schnell |
| Omnigen | https://github.com/VectorSpaceLab/OmniGen |
| EMU3 | https://github.com/baaivision/Emu |
| Vila-u | https://github.com/mit-han-lab/vila-u |
| SD3_5_large | https://huggingface.co/stabilityai/stable-diffusion-3.5-large |
| Show-o | https://github.com/showlab/Show-o |
| Janus | https://github.com/deepseek-ai/Janus |
| Hart | https://github.com/mit-han-lab/hart |
| NOVA | https://github.com/baaivision/NOVA |
| Infinity | https://github.com/FoundationVision/Infinity |

## 📧 Contact

If you have any inquiries, please don't hesitate to reach out via email at wangjiarui@sjtu.edu.cn.

## 🎓 Citations

If you find our work useful, please cite our paper as:

```bibtex
@misc{wang2025lmm4lmmbenchmarkingevaluatinglargemultimodal,
      title={LMM4LMM: Benchmarking and Evaluating Large-multimodal Image Generation with LMMs},
      author={Jiarui Wang and Huiyu Duan and Yu Zhao and Juntong Wang and Guangtao Zhai and Xiongkuo Min},
      year={2025},
      eprint={2504.08358},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2504.08358},
}
@InProceedings{Wang_2025_ICCV,
    author    = {Wang, Jiarui and Duan, Huiyu and Zhao, Yu and Wang, Juntong and Zhai, Guangtao and Min, Xiongkuo},
    title     = {LMM4LMM: Benchmarking and Evaluating Large-multimodal Image Generation with LMMs},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2025},
    pages     = {17312-17323}
}
```