
RADAR: Revealing Asymmetric Development of Abilities in MLLM Pre-training

License: MIT | arXiv

This repository provides the official PyTorch implementation of the following paper:

RADAR: Revealing Asymmetric Development of Abilities in MLLM Pre-training

🎯 News

[2026.2.10] 🚀 This project page has been built!

👨‍💻 Todo

  • Release the M$^3$-Bench dataset
  • Release the evaluation code of RADAR

⭐️ TL;DR

1. Installation

If you want to use our codebase for reproduction, we recommend building a new environment by following the steps below.

We take LLaVA-OneVision as an example. (The following steps are for Linux only; if you are using macOS or Windows, please refer to LLaVA-NeXT.)

  1. Clone this repository and navigate to the RADAR folder
git clone https://github.com/Nieysh/RADAR.git
cd RADAR
  2. Install packages
conda create -n llava python=3.10 -y
conda activate llava
pip install -r requirements.txt
python -m pip install --upgrade pip  # enable PEP 660 support
cd LLaVA-NeXT
python -m pip install -e .
pip install git+https://github.com/huggingface/transformers@745bbfe4bb2b61491dedd56e1e8ee4af8ef1a9ec
pip install torch==2.1.2 torchvision==0.16.2 torchaudio==2.1.2 --index-url https://download.pytorch.org/whl/cu121
conda install nvidia/label/cuda-12.1.1::cuda --channel nvidia/label/cuda-12.1.1
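
To sanity-check the installation, you can print the installed versions (a minimal check based on the packages installed above; the expected CUDA build is cu121):

python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"
python -c "import transformers; print(transformers.__version__)"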

To evaluate other MLLMs for reproduction, please refer to Qwen2-VL and InternVL-3.5 for environment installation.

2. Data Preparation

Please follow the instructions below to prepare the checkpoint and data directories:

(Take LLaVA-OneVision-0.5B (projector) as an example)

  1. Download pretrained LLaVA-OneVision weight from here.
  2. Download Qwen2-0.5B-Instruct model weight from here.
  3. Download SigLIP-so400m-patch14-384 model weight from here.

For other models, refer to their Hugging Face repositories to download the pretrained weights (a command-line download sketch follows the table):

Model                        HF Link
InternVL3.5-1B-Pretrained    🤗 link
InternVL3.5-2B-Pretrained    🤗 link
InternVL3.5-4B-Pretrained    🤗 link
InternVL3.5-8B-Pretrained    🤗 link
InternVL3.5-14B-Pretrained   🤗 link
Qwen2-VL-2B                  🤗 link
Qwen2-VL-7B                  🤗 link
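
These weights can typically be fetched from the command line with huggingface-cli, which ships with huggingface_hub (installed as a dependency of transformers). The repository id and target path below are placeholders; substitute the id from the corresponding link above:

huggingface-cli download <hf-repo-id> --local-dir YOUR/PATH/TO/CHECKPOINTS/<model-name>
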
  4. Download the M$^3$-Bench dataset from here and unzip it to YOUR/PATH/TO/M3-BENCH/DATA.

Referenced Data Directory

YOUR/PATH/TO/M3-BENCH/DATA
├─ general_visual_question_answering
|   ├─ images
|   ├─ MMBench_action_recognition_54_pretrained_mllm_eval.json
|   ├─ MMBench_attribute_comparison_44_pretrained_mllm_eval.json
|   ├─ .json data files of other tasks
├─ mathematical_reasoning
|   ├─ images
|   ├─ MathVista_1000_pretrained_mllm_eval.json
├─ Wiki_animal_identification
|   ├─ images
|   ├─ Wiki_animal_identification_2000_pretrained_mllm_eval.json
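
A quick way to check that the unzipped data matches this layout (a minimal sketch; replace the placeholder path with your actual data directory, and note that only a few of the .json files are listed above):

DATA_DIR=YOUR/PATH/TO/M3-BENCH/DATA
ls $DATA_DIR/general_visual_question_answering/*.json
ls $DATA_DIR/mathematical_reasoning/*.json
ls $DATA_DIR/Wiki_animal_identification/*.json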

3. Evaluation

To reproduce the RADAR evaluation with this codebase, follow these steps:

  1. Specify the data_dir, dataset and model_path in the script for RADAR calculation (also specify model_base and vision_tower for LLaVA-OneVision projectors); an illustrative configuration is sketched after this list.
  2. Run the script to conduct RADAR evaluation for different models.
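
For reference, the variables for the LLaVA-OneVision-0.5B (projector) example above might be set along these lines (a hypothetical sketch; the actual variable names and defaults are defined inside the scripts, and all paths are placeholders):

data_dir=YOUR/PATH/TO/M3-BENCH/DATA
dataset=mathematical_reasoning  # or another task folder under the data directory
model_path=YOUR/PATH/TO/llava-onevision-0.5b-pretrained
model_base=YOUR/PATH/TO/Qwen2-0.5B-Instruct
vision_tower=YOUR/PATH/TO/siglip-so400m-patch14-384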

For LLaVA-OneVision (projectors):

bash radar_eval_llava_ov.sh

For Qwen2-VL:

bash radar_eval_qwen2l.sh

For InternVL-3.5:

bash radar_eval_internvl3_5.sh

Acknowledgement

This repo is based on the codebases of LLaVA, Qwen2-VL, and InternVL-3.5. Thanks for their impressive work!

Citation

If you find this work useful for your research, please cite our paper:

@article{nie2026radar,
  title={RADAR: Revealing Asymmetric Development of Abilities in MLLM Pre-training},
  author={Nie, Yunshuang and Lin, Bingqian and Niu, Minzhe and Xiang, Kun and Han, Jianhua and Huang, Guowei and Quan, Xingyue and Xu, Hang and Chen, Bokui and Liang, Xiaodan},
  journal={arXiv preprint arXiv:2602.12892},
  year={2026}
}
