We provide an SFT training guide for Spatial-MLLM-Instruct-v1.1 models.
First, prepare the necessary pretrained model checkpoints and place them in the `checkpoints` directory.
```bash
mkdir -p checkpoints
# Download the Qwen2.5-VL-3B-Instruct and VGGT-1B checkpoints
hf download Qwen/Qwen2.5-VL-3B-Instruct --local-dir checkpoints/Qwen2.5-VL-3B-Instruct
hf download facebook/VGGT-1B --local-dir checkpoints/VGGT-1B
```

The Spatial-MLLM-v1.1-Instruct-135k model is trained on the following datasets:
- `spatial_mllm_mix_133k`: A mixture of our self-created data and ScanQA/SQA3D data. The annotations are available here.
- `route_plan_scannet_2k`: A subset of the route-planning data used in VLM-3R, containing around 2k samples from ScanNet.
The Spatial-MLLM-v1.1-Instruct-820k model is trained on the following datasets:
- `spatial_mllm_mix_203k`: A mixture of our self-created data and ScanQA/SQA3D data. The annotations are available here.
- `route_plan_4k`: Route-planning data used in VLM-3R.
- `vsi_590k`: The 590k dataset from Cambrian-S.
- `mindcube_21k`: The 21k dataset from MindCube.
For `spatial_mllm_mix_133k` and `spatial_mllm_mix_203k`, please download the annotations from the provided links and place them in the `datasets/annotations` directory, for example:
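A minimal sketch of this step, assuming the annotations are distributed as a Hugging Face dataset repo (the repo id below is a placeholder; use the one behind the links above):

```bash
mkdir -p datasets/annotations
# Placeholder repo id -- substitute the actual annotation repo from the links above.
hf download <org>/<annotation-repo> --repo-type dataset --local-dir datasets/annotations
```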
For the other annotation files, you may need to process them to align with our expected format (similar to this instruction). We provide some scripts in the `scripts/preprocess` directory for your reference.
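A conversion run might look like the sketch below; the script name and flags are hypothetical stand-ins, so check `scripts/preprocess` for the actual entry points:

```bash
# Hypothetical invocation -- the script name and arguments are illustrative only.
python scripts/preprocess/convert_annotations.py \
    --input datasets/annotations/raw/route_plan_4k.json \
    --output datasets/annotations/route_plan_4k.json
```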
The `vsi_590k` and `mindcube_21k` datasets already provide the corresponding visual data.
For the `spatial_mllm_mix` and `route_plan` data, you need to download and process the raw video data from ScanNet, ScanNet++, and ARKitScenes yourself, and place it in the `datasets/visuals` directory (see the layout sketch below).
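The exact layout depends on the dataloader; the subdirectory names below are an assumption, so verify them against the dataset configuration:

```bash
# Assumed layout -- confirm the expected subdirectory names in the dataset config.
mkdir -p datasets/visuals/{scannet,scannetpp,arkitscenes}
```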
Before starting training, you may need to modify the dataset configuration file to ensure `annotation_path` and `data_path` are set correctly, for instance:
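A hypothetical entry, assuming a YAML-style dataset config; the file layout and nesting are assumptions, so match the keys to the repo's actual config file:

```yaml
# Hypothetical dataset entry -- align key names and nesting with the real config file.
spatial_mllm_mix_133k:
  annotation_path: datasets/annotations/spatial_mllm_mix_133k.json
  data_path: datasets/visuals
```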
You can follow the instructions in `scripts/training/spatial_mllm_train_demo.sh` to start training, for example:
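In the simplest case, assuming the demo script is self-contained and the paths above are in place, launching SFT is just:

```bash
# Edit the demo script first to point at your checkpoints, data, and GPUs.
bash scripts/training/spatial_mllm_train_demo.sh
```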