
Reason3D: Searching and Reasoning 3D Segmentation via Large Language Model [3DV 2025]

Kuan-Chih Huang, Xiangtai Li, Lu Qi, Shuicheng Yan, Ming-Hsuan Yang

arXiv Project

πŸ”₯ Update

  • 2025/01/19: Initial code for 3D referring segmentation has been released.
  • 2025/04/05: Code and dataset for 3D reasoning segmentation have been released.
  • 2025/05/18: We release the hierarchical searching code in the search branch.

Overview

(Overview figure)

We introduce Reason3D, a novel LLM for comprehensive 3D understanding that processes point cloud data and text prompts to produce textual responses and segmentation masks. This enables advanced tasks such as 3D reasoning segmentation, hierarchical searching, referring expressions, and question answering with detailed mask outputs.

Installation

  1. Create a conda environment. We use Python 3.8, PyTorch 1.11.0, and CUDA 11.3.
conda create -n reason3d python=3.8
conda activate reason3d
conda install pytorch==1.11.0 torchvision==0.12.0 torchaudio==0.11.0 cudatoolkit=11.3 -c pytorch
pip install -r requirements.txt
  2. Install LAVIS:
git clone https://github.com/salesforce/LAVIS.git SalesForce-LAVIS
cd SalesForce-LAVIS
pip install -e .
  3. Install segmentator from this repo (used for superpoint construction). We also provide an alternative PyTorch implementation, segmentator_pytorch.py, though it may yield slightly lower performance.

  4. Install pointgroup_ops:

cd lavis/models/reason3d_models/lib
sudo apt-get install libsparsehash-dev
python setup.py develop

Data Preparation

ScanNet v2 dataset

Download the ScanNet v2 dataset.

Place the downloaded scans folder as follows:

Reason3D
├── data
│   ├── scannetv2
│   │   ├── scans

Split and preprocess point cloud data for 3D referring and 3D reasoning segmentation tasks:

cd data/scannetv2
bash prepare_data.sh         # ScanRefer
bash prepare_data_reason.sh  # Reason3D

After running the script, the scannetv2 dataset structure should look as follows.

Reason3D
├── data
│   ├── scannetv2
│   │   ├── scans
│   │   ├── train
│   │   │   ├── XXX_refer.pth
│   │   │   ├── XXX_reason.pth
│   │   ├── val
You can also directly download our preprocessed data (train.zip and val.zip); please agree to the official license before downloading.
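As a quick sanity check after preprocessing, you can count the generated .pth files per split. This is a minimal sketch using only the standard library; the directory layout is assumed from the tree above, not taken from the repo's own tooling.

```python
from pathlib import Path

def count_preprocessed(root="data/scannetv2"):
    """Count preprocessed referring/reasoning .pth files per split.

    Layout assumed from the README tree: <root>/{train,val}/XXX_refer.pth
    and XXX_reason.pth.
    """
    counts = {}
    for split in ("train", "val"):
        split_dir = Path(root) / split
        counts[split] = {
            "refer": len(list(split_dir.glob("*_refer.pth"))),
            "reason": len(list(split_dir.glob("*_reason.pth"))),
        }
    return counts
```

If a split reports zero files, re-run the corresponding prepare_data script before training.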

ScanRefer dataset

Download the ScanRefer annotations and place them as follows:

Reason3D
├── data
│   ├── ScanRefer
│   │   ├── ScanRefer_filtered_train.json
│   │   ├── ScanRefer_filtered_val.json

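To inspect the annotations, you can group the referring expressions by scene. This sketch assumes each JSON entry carries "scene_id" and "description" keys, as in the official ScanRefer release; adjust the keys if your copy differs.

```python
import json
from collections import defaultdict

def descriptions_per_scene(path):
    """Group ScanRefer referring expressions by scene.

    Assumes a JSON list of entries with 'scene_id' and 'description'
    keys (the official ScanRefer annotation format).
    """
    with open(path) as f:
        entries = json.load(f)
    by_scene = defaultdict(list)
    for entry in entries:
        by_scene[entry["scene_id"]].append(entry["description"])
    return by_scene
```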
Matterport3D dataset

Please follow the instructions here to access the official download_mp.py script, then run the following in data/matterport/:

python2 download_mp.py -o . --type region_segmentations

Extract files and organize data as follows:

Reason3D
├── data
│   ├── matterport
│   │   ├── scans
│   │   │   ├── 17DRP5sb8fy
│   │   │   │   ├── region_segmentations
│   │   │   │   │   ├── region0.ply
│   │   │   ├── ...

Preprocess the Matterport3D data for the 3D reasoning segmentation task:

cd data/matterport
python3 process_mp3d.py

After running the script, the Matterport3D dataset structure should look as follows.

Reason3D
├── data
│   ├── matterport
│   │   ├── mp3d_data
│   │   │   ├── XXXXX_regionX.pth
│   │   │   ├── ...

You can also directly download our preprocessed data (mp3d_data.zip); please agree to the official license before downloading.
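Each preprocessed file is named after its scene and region. A small helper can recover both parts from a filename; the XXXXX_regionX.pth pattern is inferred from the tree above, so treat the exact regex as an assumption.

```python
import re

# Filename pattern inferred from the tree above, e.g. "17DRP5sb8fy_region0.pth".
_MP3D_RE = re.compile(r"^(?P<scene>.+)_region(?P<region>\d+)\.pth$")

def parse_mp3d_name(filename):
    """Split a preprocessed Matterport3D filename into (scene_id, region_index)."""
    match = _MP3D_RE.match(filename)
    if match is None:
        raise ValueError(f"unexpected filename: {filename}")
    return match.group("scene"), int(match.group("region"))
```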

Reason3D dataset

Download our Reason3D annotations here.

Reason3D
├── data
│   ├── reason3d
│   │   ├── reason3d_train.json
│   │   ├── reason3d_val.json

Pretrained Backbone

Download the SPFormer pretrained backbone (or the one provided by 3D-STMN) and move it to checkpoints.

mkdir checkpoints
mv ${Download_PATH}/sp_unet_backbone.pth checkpoints/

You can also pretrain the backbone yourself and modify the path here.

Training

  • 3D referring segmentation: Train on ScanRefer dataset from scratch:
python -m torch.distributed.run --nproc_per_node=4 --master_port=29501 train.py --cfg-path lavis/projects/reason3d/train/reason3d_scanrefer_scratch.yaml
  • 3D reasoning segmentation: Train on Reason3D dataset using the pretrained checkpoint from the 3D referring segmentation model:
python -m torch.distributed.run --nproc_per_node=2 --master_port=29501 train.py --cfg-path lavis/projects/reason3d/train/reason3d_reason.yaml --options model.pretrained=<path_to_pretrained_checkpoint>

Replace <path_to_pretrained_checkpoint> with the path to your pretrained 3D referring segmentation model. For example: ./lavis/output/reason3d/xxxx/checkpoint_xx.pth
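The --options flag overrides nested config keys using dot-separated paths (e.g. model.pretrained=...). The following is a simplified sketch of that behavior, not the actual LAVIS implementation; key names are illustrative.

```python
def apply_options(config, options):
    """Apply 'key.subkey=value' overrides to a nested config dict.

    A minimal sketch of how LAVIS-style --options overrides behave,
    assuming string values; the real implementation also parses types.
    """
    for opt in options:
        dotted_key, value = opt.split("=", 1)
        *parents, last = dotted_key.split(".")
        node = config
        for key in parents:
            node = node.setdefault(key, {})  # descend, creating levels as needed
        node[last] = value
    return config
```

For example, apply_options({"model": {}}, ["model.pretrained=ckpt.pth"]) sets config["model"]["pretrained"] without disturbing other keys.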

Evaluation

  • 3D referring segmentation: Evaluate on ScanRefer dataset:
python evaluate.py --cfg-path lavis/projects/reason3d/val/reason3d_scanrefer_scratch.yaml --options model.pretrained=<path_to_pretrained_checkpoint> run.save_results=True

Note: this repo currently supports only batch size 1 for inference.

  • 3D reasoning segmentation: Evaluate on our Reason3D dataset:
python evaluate.py --cfg-path lavis/projects/reason3d/val/reason3d_reason.yaml --options model.pretrained=<path_to_pretrained_checkpoint> run.save_results=True

Add the run.save_results=True option if you want to save prediction results.

We provide a pre-trained checkpoint for the 3D reasoning segmentation task. See the table below for its performance.

Dataset        Sample Number   mIoU   Acc@50   Acc@25
ScanNet        308             0.32   0.32     0.44
Matterport3D   837             0.22   0.21     0.33
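These metrics can be summarized from per-sample mask IoUs. The sketch below assumes the common convention that Acc@k is the fraction of samples with IoU above threshold k and mIoU is the mean IoU; this is an assumption about the table, not code from the repo.

```python
def segmentation_metrics(ious):
    """Summarize per-sample mask IoUs into (mIoU, Acc@50, Acc@25).

    Assumes Acc@k = fraction of samples with IoU > k, the usual
    convention for referring/reasoning segmentation benchmarks.
    """
    n = len(ious)
    miou = sum(ious) / n
    acc50 = sum(iou > 0.5 for iou in ious) / n
    acc25 = sum(iou > 0.25 for iou in ious) / n
    return miou, acc50, acc25
```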

Visualization

You can visualize prediction results using:

python visualize.py --idx <sample_index> --result_dir <results_directory>

  • <sample_index>: index of the sample to display.
  • <results_directory>: path to either the reason_preds or refer_preds directory containing the results.

Results

(Qualitative results figure)

TODO List

  • Release the initial code for 3D referring segmentation task.
  • Release final version paper.
  • Release the dataset and code for 3D reasoning segmentation task.
  • Release hierarchical mask decoder code.
  • Release demo and visualization code.
  • ...

Acknowledgment

Our code is mainly based on LAVIS, 3D-LLM, and 3D-STMN. Thanks for their contributions!

Citation

If you find our work useful for your project, please consider citing our paper:

@article{reason3d,
  title={Reason3D: Searching and Reasoning 3D Segmentation via Large Language Model},
  author={Kuan-Chih Huang and Xiangtai Li and Lu Qi and Shuicheng Yan and Ming-Hsuan Yang},
  journal={3DV},
  year={2025}
}
