Wild-Drive: Off-Road Scene Captioning and Path Planning via Robust Multi-modal Routing and Efficient Large Language Model
A vision-LiDAR-language model for off-road autonomous driving. It fuses camera and LiDAR features via MoRo-Former (Modality Routing Transformer), generates scene descriptions through an LLM, and predicts future trajectories with a GRU-based planner.
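The repository's MoRo-Former implementation is not reproduced here, but the core idea of modality routing can be sketched as a learned gate that scores camera and LiDAR tokens per position and fuses the two streams by the resulting soft weights. The sketch below is illustrative only; `route_and_fuse` and the gate shape are assumptions, not the repository's API.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def route_and_fuse(cam_tokens, lidar_tokens, w_gate):
    """Illustrative modality routing (hypothetical, not the repo's MoRo-Former):
    a learned gate scores each modality per token, then the two feature
    streams are fused by the resulting soft routing weights."""
    # cam_tokens, lidar_tokens: (num_tokens, dim); w_gate: (2*dim, 2)
    joint = np.concatenate([cam_tokens, lidar_tokens], axis=-1)  # (T, 2*dim)
    gates = softmax(joint @ w_gate)  # (T, 2), rows sum to 1
    fused = gates[:, :1] * cam_tokens + gates[:, 1:] * lidar_tokens
    return fused, gates

rng = np.random.default_rng(0)
cam = rng.standard_normal((8, 16))
lid = rng.standard_normal((8, 16))
w = rng.standard_normal((32, 2)) * 0.1
fused, gates = route_and_fuse(cam, lid, w)
```

In the actual model this gating sits inside transformer blocks; the point of the sketch is only the routing-then-fusion pattern.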
Download the dataset from Baidu Netdisk:
Link: https://pan.baidu.com/s/1DI2g7iQC36n4lJKpwHejdA?pwd=t3q1
Extract the three archives into an OR-C2P folder:

```shell
mkdir OR-C2P
tar -xzf training.tar.gz -C OR-C2P/
tar -xzf validation.tar.gz -C OR-C2P/
tar -xzf testing.tar.gz -C OR-C2P/
```

Clone the repository and link the dataset:

```shell
git clone https://github.com/wangzihanggg/Wild-Drive.git
cd Wild-Drive

# Symlink the dataset
ln -s /path/to/your/OR-C2P ./OR-C2P

# Download and extract chat JSONs
# (download chat_jsons.zip from the release page, then:)
unzip chat_jsons.zip -d ./
```

After setup, your directory should look like this:
```
Wild-Drive/
├── config.py
├── train.py
├── accelerate_config.yaml
├── OR-C2P-training.json
├── OR-C2P-validation.json
├── OR-C2P-testing.json
├── OR-C2P -> /path/to/your/OR-C2P
│   ├── training/
│   │   ├── <sequence_name>/
│   │   │   ├── calib/
│   │   │   ├── image_data/
│   │   │   ├── lidar_data/
│   │   │   ├── path_planning/
│   │   │   ├── scene_caption/
│   │   │   └── ...
│   │   └── ...
│   ├── validation/
│   │   └── ...
│   └── testing/
│       └── ...
├── models/
│   ├── __init__.py
│   ├── vlm.py
│   ├── moroformer.py
│   ├── voxelnet.py
│   └── planner.py
├── data/
│   ├── __init__.py
│   └── dataset.py
└── utils/
    ├── __init__.py
    └── lidar.py
```
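A quick sanity check that the expected folders exist can catch setup mistakes early. This is a small helper sketch based on the tree above; the subfolder list and function name are assumptions, not part of the repository.

```python
from pathlib import Path

# Expected layout under each split, per the directory tree above (assumed minimal set).
SPLITS = ["training", "validation", "testing"]
SUBDIRS = ["calib", "image_data", "lidar_data", "path_planning", "scene_caption"]

def check_layout(root="OR-C2P"):
    """Return a list of missing paths; an empty list means the layout looks right."""
    missing = []
    root = Path(root)
    for split in SPLITS:
        split_dir = root / split
        if not split_dir.is_dir():
            missing.append(str(split_dir))
            continue
        for seq in split_dir.iterdir():  # each <sequence_name> folder
            if not seq.is_dir():
                continue
            for sub in SUBDIRS:
                if not (seq / sub).is_dir():
                    missing.append(str(seq / sub))
    return missing

if __name__ == "__main__":
    problems = check_layout()
    print("OK" if not problems else "\n".join(problems))
```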
Create the environment and install dependencies:

```shell
conda create -n wilddrive python=3.12 -y
conda activate wilddrive
pip install -r requirements.txt
```

Edit config.py to adjust model and training settings:
```python
VLMConfig(
    llm_model_path="Qwen/Qwen2.5-0.5B-Instruct",  # LLM backbone
    vision_model_path="facebook/dinov2-base",     # Vision encoder
    use_lidar=True,            # Enable LiDAR branch
    use_moroformer=True,       # Enable MoRo-Former fusion
    use_planning=True,         # Enable trajectory planning head
    planning_loss_weight=1.0,  # Weight for planning loss
    ...
)
```

Single GPU:
```shell
accelerate launch --num_processes=1 train.py
```

Multi-GPU (DDP):
```shell
accelerate launch --num_processes=2 train.py
```

You can also use torchrun for multi-GPU training:
```shell
torchrun --nproc_per_node=2 train.py
```

TODO:

- Release pretrained model weights
- Release evaluation code
- Release full two-stage training pipeline
- Release data preprocessing pipeline
