This repository provides the official PyTorch implementation of the following paper:
RADAR: Revealing Asymmetric Development of Abilities in MLLM Pre-training
[2026.2.10] 🚀 This project page has been built!
- Release the M$^3$-Bench dataset
- Release the evaluation code of RADAR
If you want to use our codebase for reproduction, we recommend building a new environment through the steps below.
We take LLaVA-OneVision as an example. (The following steps are for Linux only; if you are using macOS or Windows, please refer to LLaVA-NeXT.)
- Clone this repository and navigate to the RADAR folder:

```bash
git clone https://github.com/Nieysh/RADAR.git
cd RADAR
```
- Install packages:

```bash
conda create -n llava python=3.10 -y
conda activate llava
pip install -r requirements.txt
python -m pip install --upgrade pip  # enable PEP 660 support
cd LLaVA-NeXT
python -m pip install -e .
pip install git+https://github.com/huggingface/transformers@745bbfe4bb2b61491dedd56e1e8ee4af8ef1a9ec
pip install torch==2.1.2 torchvision==0.16.2 torchaudio==2.1.2 --index-url https://download.pytorch.org/whl/cu121
conda install nvidia/label/cuda-12.1.1::cuda --channel nvidia/label/cuda-12.1.1
```
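After installation, a quick sanity check (optional, not part of the official setup) confirms that the pinned torch build matches the CUDA 12.1 toolkit installed above:

```bash
# Optional sanity check: torch 2.1.2+cu121 should report CUDA 12.1 and see the GPU
python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"
```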
To evaluate other MLLMs for reproduction, please refer to Qwen2-VL and InternVL-3.5 for their environment installation.
Please follow the instructions below to prepare the checkpoints and data directories. (We take LLaVA-OneVision-0.5B (projector) as an example.)
- Download pretrained LLaVA-OneVision weight from here.
- Download Qwen2-0.5B-Instruct model weight from here.
- Download SigLIP-so400m-patch14-384 model weight from here.
For other models, refer to their Hugging Face repositories to download the pretrained weights:
| Model | HF Link |
|---|---|
| InternVL3.5-1B-Pretrained | 🤗 link |
| InternVL3.5-2B-Pretrained | 🤗 link |
| InternVL3.5-4B-Pretrained | 🤗 link |
| InternVL3.5-8B-Pretrained | 🤗 link |
| InternVL3.5-14B-Pretrained | 🤗 link |
| Qwen2-VL-2B | 🤗 link |
| Qwen2-VL-7B | 🤗 link |
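The weights can also be fetched from the command line with `huggingface-cli download`. The repo IDs below are assumptions for the 0.5B example above, so double-check them against the links in the table:

```bash
# Example only: repo IDs are assumptions -- verify against the table above
huggingface-cli download Qwen/Qwen2-0.5B-Instruct --local-dir ./checkpoints/Qwen2-0.5B-Instruct
huggingface-cli download google/siglip-so400m-patch14-384 --local-dir ./checkpoints/siglip-so400m-patch14-384
```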
- Download the M$^3$-Bench dataset from here and unzip it to `YOUR/PATH/TO/M3-BENCH/DATA`. The expected layout is:
```
YOUR/PATH/TO/M3-BENCH/DATA
├─ general_visual_question_answering
│  ├─ images
│  ├─ MMBench_action_recognition_54_pretrained_mllm_eval.json
│  ├─ MMBench_attribute_comparison_44_pretrained_mllm_eval.json
│  └─ ... (.json data files of other tasks)
├─ mathematical_reasoning
│  ├─ images
│  └─ MathVista_1000_pretrained_mllm_eval.json
└─ Wiki_animal_identification
   ├─ images
   └─ Wiki_animal_identification_2000_pretrained_mllm_eval.json
```
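Once unzipped, you can optionally confirm the layout and peek at one split (paths follow the tree above; the per-sample JSON schema is not documented here):

```bash
# Optional: verify the unzip and count samples in one split
ls YOUR/PATH/TO/M3-BENCH/DATA
python -c "import json; print(len(json.load(open('YOUR/PATH/TO/M3-BENCH/DATA/mathematical_reasoning/MathVista_1000_pretrained_mllm_eval.json'))))"
```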
To reproduce the RADAR implementation on this codebase, you can follow these steps:

- Specify the `data_dir`, `dataset`, and `model_path` in the script for RADAR calculation. (Also specify `model_base` and `vision_tower` for LLaVA-OneVision projectors; see the hypothetical excerpt below.)
- Run the script to conduct RADAR evaluation for different models.
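For reference, the variables from the first step might look like the sketch below. This is a hypothetical excerpt: all paths are placeholders, and the actual scripts may organize them differently.

```bash
# Hypothetical excerpt of radar_eval_llava_ov.sh -- all paths are placeholders
data_dir=YOUR/PATH/TO/M3-BENCH/DATA
dataset=mathematical_reasoning
model_path=YOUR/PATH/TO/llava-onevision-projector
model_base=YOUR/PATH/TO/Qwen2-0.5B-Instruct         # projector runs only
vision_tower=YOUR/PATH/TO/siglip-so400m-patch14-384 # projector runs only
```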
For LLaVA-OneVision (projectors):

```bash
bash radar_eval_llava_ov.sh
```

For Qwen2-VL:

```bash
bash radar_eval_qwen2l.sh
```

For InternVL-3.5:

```bash
bash radar_eval_internvl3_5.sh
```

This repo is based on the codebases of LLaVA, Qwen2-VL, and InternVL-3.5. Thanks for their impressive work!
If you find this work useful for your research, please cite our paper:
```bibtex
@article{nie2026radar,
  title={RADAR: Revealing Asymmetric Development of Abilities in MLLM Pre-training},
  author={Nie, Yunshuang and Lin, Bingqian and Niu, Minzhe and Xiang, Kun and Han, Jianhua and Huang, Guowei and Quan, Xingyue and Xu, Hang and Chen, Bokui and Liang, Xiaodan},
  journal={arXiv preprint arXiv:2602.12892},
  year={2026}
}
```