
# Wild-Drive

**Wild-Drive: Off-Road Scene Captioning and Path Planning via Robust Multi-modal Routing and Efficient Large Language Model**

A vision-LiDAR-language model for off-road autonomous driving. It fuses camera and LiDAR features via MoRo-Former (Modality Routing Transformer), generates scene descriptions through an LLM, and predicts future trajectories with a GRU-based planner.
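The routing idea behind MoRo-Former can be illustrated with a minimal, dependency-free sketch: each modality gets a reliability logit, and the fused feature is a softmax-weighted sum of the per-modality features. This is only an illustration of the general gated-fusion pattern under assumed names (`route_modalities`, the scores, the toy 4-d vectors); the actual MoRo-Former is a transformer and works on token sequences, not single vectors.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def route_modalities(cam_feat, lidar_feat, cam_score, lidar_score):
    """Fuse two feature vectors with softmax routing weights.

    cam_score / lidar_score stand in for learned per-modality
    reliability logits: when one sensor is degraded (e.g. LiDAR in
    heavy dust), its weight shrinks and the other modality dominates.
    """
    w_cam, w_lidar = softmax([cam_score, lidar_score])
    return [w_cam * c + w_lidar * l for c, l in zip(cam_feat, lidar_feat)]

# Toy 4-d features; the camera is scored as more reliable here.
fused = route_modalities([1.0, 0.0, 2.0, 1.0], [0.0, 1.0, 0.0, 1.0],
                         cam_score=1.0, lidar_score=0.0)
```

With equal scores this reduces to a plain average; skewing the scores smoothly shifts the fused feature toward the trusted modality.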


## 1. Data Preparation

### Download the OR-C2P Dataset

Download the dataset from Baidu Netdisk:

Link: https://pan.baidu.com/s/1DI2g7iQC36n4lJKpwHejdA?pwd=t3q1

Extract the three archives into an `OR-C2P` folder:

```bash
mkdir OR-C2P
tar -xzf training.tar.gz -C OR-C2P/
tar -xzf validation.tar.gz -C OR-C2P/
tar -xzf testing.tar.gz -C OR-C2P/
```

### Clone the Repository & Set Up Data

```bash
git clone https://github.com/wangzihanggg/Wild-Drive.git
cd Wild-Drive

# Symlink the dataset
ln -s /path/to/your/OR-C2P ./OR-C2P

# Download chat_jsons.zip from the release page, then extract it:
unzip chat_jsons.zip -d ./
```

### Data Structure

After setup, your directory should look like this:

```text
Wild-Drive/
├── config.py
├── train.py
├── accelerate_config.yaml
├── OR-C2P-training.json
├── OR-C2P-validation.json
├── OR-C2P-testing.json
├── OR-C2P -> /path/to/your/OR-C2P
│   ├── training/
│   │   ├── <sequence_name>/
│   │   │   ├── calib/
│   │   │   ├── image_data/
│   │   │   ├── lidar_data/
│   │   │   ├── path_planning/
│   │   │   ├── scene_caption/
│   │   │   └── ...
│   │   └── ...
│   ├── validation/
│   │   └── ...
│   └── testing/
│       └── ...
├── models/
│   ├── __init__.py
│   ├── vlm.py
│   ├── moroformer.py
│   ├── voxelnet.py
│   └── planner.py
├── data/
│   ├── __init__.py
│   └── dataset.py
└── utils/
    ├── __init__.py
    └── lidar.py
```
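A quick way to catch extraction or symlink mistakes before training is to walk the tree and compare it against the layout above. The following sketch is not part of the repository; the split names and per-sequence subfolders are taken from the tree shown here, and the check is deliberately a subset (sequences may contain additional folders beyond these five).

```python
from pathlib import Path

# Per-sequence subfolders listed in the directory tree (a subset;
# sequences may contain more folders, which we do not check).
REQUIRED = ["calib", "image_data", "lidar_data", "path_planning", "scene_caption"]

def check_split(root, split):
    """Yield (sequence_name, missing_subfolders) for one dataset split."""
    split_dir = Path(root) / split
    if not split_dir.is_dir():
        raise FileNotFoundError(f"missing split directory: {split_dir}")
    for seq in sorted(p for p in split_dir.iterdir() if p.is_dir()):
        missing = [d for d in REQUIRED if not (seq / d).is_dir()]
        if missing:
            yield seq.name, missing

def check_dataset(root="OR-C2P"):
    """Return {split/sequence: missing_subfolders}; empty means OK."""
    problems = {}
    for split in ("training", "validation", "testing"):
        for seq, missing in check_split(root, split):
            problems[f"{split}/{seq}"] = missing
    return problems
```

Run `check_dataset()` from the repo root after creating the symlink; an empty result means every sequence has the expected subfolders.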

## 2. Environment Setup

```bash
conda create -n wilddrive python=3.12 -y
conda activate wilddrive
pip install -r requirements.txt
```

## 3. Training

### Configuration

Edit `config.py` to adjust model and training settings:

```python
VLMConfig(
    llm_model_path="Qwen/Qwen2.5-0.5B-Instruct",  # LLM backbone
    vision_model_path="facebook/dinov2-base",     # Vision encoder
    use_lidar=True,                # Enable LiDAR branch
    use_moroformer=True,           # Enable MoRo-Former fusion
    use_planning=True,             # Enable trajectory planning head
    planning_loss_weight=1.0,      # Weight for the planning loss
    ...
)
```
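For readers who want to experiment before opening `config.py`, here is one way such a config object could be shaped. Only the field names come from the snippet above; the dataclass form and the defaults are assumptions, not the repository's actual implementation.

```python
from dataclasses import dataclass

@dataclass
class VLMConfig:
    """Illustrative config mirroring the fields shown in the README.

    Field names follow the snippet above; the dataclass form and
    defaults are assumptions and may differ from the real config.py.
    """
    llm_model_path: str = "Qwen/Qwen2.5-0.5B-Instruct"
    vision_model_path: str = "facebook/dinov2-base"
    use_lidar: bool = True
    use_moroformer: bool = True
    use_planning: bool = True
    planning_loss_weight: float = 1.0

# Example: ablate the LiDAR branch for a camera-only run.
cfg = VLMConfig(use_lidar=False, use_moroformer=False)
```

Grouping the flags this way makes ablations (camera-only, no planner, etc.) a one-line change at the call site.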

### Launch Training

Single GPU:

```bash
accelerate launch --num_processes=1 train.py
```

Multi-GPU (DDP):

```bash
accelerate launch --num_processes=2 train.py
```

You can also use `torchrun` for multi-GPU training:

```bash
torchrun --nproc_per_node=2 train.py
```

## TODO

- Release pretrained model weights
- Release evaluation code
- Release the full two-stage training pipeline
- Release the data preprocessing pipeline

## About

Official implementation of "Wild-Drive: Off-Road Scene Captioning and Path Planning via Robust Multi-modal Routing and Efficient Large Language Model".
