Wild-Drive: Off-Road Scene Captioning and Path Planning via Robust Multi-modal Routing and Efficient Large Language Model
A vision-LiDAR-language model for off-road autonomous driving. It fuses camera and LiDAR features via MoRo-Former (Modality Routing Transformer), generates scene descriptions through an LLM, and predicts future trajectories with a GRU-based planner.
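The repository's MoRo-Former implementation is not reproduced here, but the core idea of modality routing can be sketched as a learned gate that scores camera and LiDAR tokens per position and fuses the two streams by the resulting soft weights. The sketch below is illustrative only; `route_and_fuse` and the gate shape are assumptions, not the repository's API.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def route_and_fuse(cam_tokens, lidar_tokens, w_gate):
    """Illustrative modality routing (hypothetical, not the repo's MoRo-Former):
    a learned gate scores each modality per token, then the two feature
    streams are fused by the resulting soft routing weights."""
    # cam_tokens, lidar_tokens: (num_tokens, dim); w_gate: (2*dim, 2)
    joint = np.concatenate([cam_tokens, lidar_tokens], axis=-1)  # (T, 2*dim)
    gates = softmax(joint @ w_gate)  # (T, 2), rows sum to 1
    fused = gates[:, :1] * cam_tokens + gates[:, 1:] * lidar_tokens
    return fused, gates

rng = np.random.default_rng(0)
cam = rng.standard_normal((8, 16))
lid = rng.standard_normal((8, 16))
w = rng.standard_normal((32, 2)) * 0.1
fused, gates = route_and_fuse(cam, lid, w)
```

In the actual model this gating sits inside transformer blocks; the point of the sketch is only the routing-then-fusion pattern.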
Download the dataset from Baidu Netdisk:
Link: https://pan.baidu.com/s/1DI2g7iQC36n4lJKpwHejdA?pwd=t3q1
Extract the three archives into an OR-C2P folder:

```shell
mkdir OR-C2P
tar -xzf training.tar.gz -C OR-C2P/
tar -xzf validation.tar.gz -C OR-C2P/
tar -xzf testing.tar.gz -C OR-C2P/
```

Clone the repository and link the dataset:

```shell
git clone https://github.com/wangzihanggg/Wild-Drive.git
cd Wild-Drive

# Symlink the dataset
ln -s /path/to/your/OR-C2P ./OR-C2P

# Download and extract chat JSONs
# (download chat_jsons.zip from the release page, then:)
unzip chat_jsons.zip -d ./
```

After setup, your directory should look like this:
```
Wild-Drive/
├── config.py
├── train.py
├── accelerate_config.yaml
├── OR-C2P-training.json
├── OR-C2P-validation.json
├── OR-C2P-testing.json
├── OR-C2P -> /path/to/your/OR-C2P
│   ├── training/
│   │   ├── <sequence_name>/
│   │   │   ├── calib/
│   │   │   ├── image_data/
│   │   │   ├── lidar_data/
│   │   │   ├── path_planning/
│   │   │   ├── scene_caption/
│   │   │   └── ...
│   │   └── ...
│   ├── validation/
│   │   └── ...
│   └── testing/
│       └── ...
├── models/
│   ├── __init__.py
│   ├── vlm.py
│   ├── moroformer.py
│   ├── voxelnet.py
│   └── planner.py
├── data/
│   ├── __init__.py
│   └── dataset.py
└── utils/
    ├── __init__.py
    └── lidar.py
```
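A quick sanity check that the expected folders exist can catch setup mistakes early. This is a small helper sketch based on the tree above; the subfolder list and function name are assumptions, not part of the repository.

```python
from pathlib import Path

# Expected layout under each split, per the directory tree above (assumed minimal set).
SPLITS = ["training", "validation", "testing"]
SUBDIRS = ["calib", "image_data", "lidar_data", "path_planning", "scene_caption"]

def check_layout(root="OR-C2P"):
    """Return a list of missing paths; an empty list means the layout looks right."""
    missing = []
    root = Path(root)
    for split in SPLITS:
        split_dir = root / split
        if not split_dir.is_dir():
            missing.append(str(split_dir))
            continue
        for seq in split_dir.iterdir():  # each <sequence_name> folder
            if not seq.is_dir():
                continue
            for sub in SUBDIRS:
                if not (seq / sub).is_dir():
                    missing.append(str(seq / sub))
    return missing

if __name__ == "__main__":
    problems = check_layout()
    print("OK" if not problems else "\n".join(problems))
```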
Create the environment and install dependencies:

```shell
conda create -n wilddrive python=3.12 -y
conda activate wilddrive
pip install -r requirements.txt
```

Edit config.py to adjust model and training settings:
```python
VLMConfig(
    llm_model_path="Qwen/Qwen2.5-0.5B-Instruct",  # LLM backbone
    vision_model_path="facebook/dinov2-base",     # Vision encoder
    use_lidar=True,            # Enable LiDAR branch
    use_moroformer=True,       # Enable MoRo-Former fusion
    use_planning=True,         # Enable trajectory planning head
    planning_loss_weight=1.0,  # Weight for planning loss
    ...
)
```

Single GPU:
```shell
accelerate launch --num_processes=1 train.py
```

Multi-GPU (DDP):
```shell
accelerate launch --num_processes=2 train.py
```

You can also use torchrun for multi-GPU training:
```shell
torchrun --nproc_per_node=2 train.py
```

TODO:

- Release pretrained model weights
- Release evaluation code
- Release full two-stage training pipeline
- Release data preprocessing pipeline
