ImagiDrive: A Unified Imagination-and-Planning Framework for Autonomous Driving
Jingyu Li, Bozhou Zhang, Xin Jin, Jiankang Deng, Xiatian Zhu, Li Zhang
ICRA 2026
Autonomous driving requires rich contextual comprehension and precise predictive reasoning to navigate dynamic and complex environments safely. Vision-Language Models (VLMs) and Driving World Models (DWMs) have independently emerged as powerful paradigms that address different aspects of this challenge. VLMs provide interpretability and robust action prediction through their ability to understand multi-modal context, while DWMs excel at generating detailed and plausible future driving scenarios essential for proactive planning. Integrating VLMs with DWMs is an intuitive, promising, yet understudied strategy for exploiting the complementary strengths of accurate behavioral prediction and realistic scene generation. Nevertheless, this integration presents notable challenges, particularly in effectively connecting action-level decisions with high-fidelity pixel-level predictions and in maintaining computational efficiency. In this paper, we propose ImagiDrive, a novel end-to-end autonomous driving framework that integrates a VLM-based driving agent with a DWM-based scene imaginer to form a unified imagination-and-planning loop. The driving agent predicts initial driving trajectories from multi-modal inputs, which guide the scene imaginer to generate the corresponding future scenarios. These imagined scenarios are then used to iteratively refine the driving agent's planning decisions. To address the efficiency and predictive-accuracy challenges inherent in this integration, we introduce an early stopping mechanism and a trajectory selection strategy. Extensive experiments on the nuScenes and NAVSIM datasets demonstrate the robustness and superiority of ImagiDrive over previous alternatives under both open-loop and closed-loop conditions.
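For intuition, the sketch below walks through one imagination-and-planning episode as the abstract describes it. Everything in it is a hedged stand-in, not the released API: the `agent.predict`/`agent.score`/`imaginer.rollout` interfaces, the waypoint format, and the `max_rounds`/`eps` thresholds are assumptions for illustration, and the actual early-stopping and trajectory-selection criteria are specified in the paper.

```python
import numpy as np

def trajectory_distance(a, b):
    """Mean L2 distance between two (T, 2) arrays of future waypoints."""
    return float(np.linalg.norm(np.asarray(a) - np.asarray(b), axis=-1).mean())

def imagine_and_plan(agent, imaginer, obs, max_rounds=3, eps=0.1):
    """One imagination-and-planning episode (hypothetical interfaces).

    agent.predict(obs, imagined=None) -> (T, 2) trajectory    # VLM driving agent
    imaginer.rollout(obs, traj)       -> imagined frames      # DWM scene imaginer
    agent.score(traj)                 -> float                # higher is better
    """
    candidates = [agent.predict(obs)]  # initial plan from multi-modal inputs
    for _ in range(max_rounds):
        # The scene imaginer rolls the current plan forward into future frames.
        future = imaginer.rollout(obs, candidates[-1])
        # The agent re-plans with the imagined future added to its context.
        refined = agent.predict(obs, imagined=future)
        candidates.append(refined)
        # Early stopping: quit once successive plans barely change.
        if trajectory_distance(refined, candidates[-2]) < eps:
            break
    # Trajectory selection: keep the candidate the agent rates highest.
    return max(candidates, key=agent.score)
```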
- 2026.03: We release the pipeline code!
- 2026.01: The paper is accepted by ICRA 2026.
- 2025.08: The paper is released on arXiv, and the code will be made publicly available upon acceptance.
conda create -n vla_gen python=3.10 -y
conda activate vla_gen
conda install -y pytorch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 pytorch-cuda=11.7 -c pytorch -c nvidia
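Once the environment resolves, the short check below confirms the pinned PyTorch build is installed and sees your GPU; nothing in it is specific to ImagiDrive.

```python
# Sanity check for the environment created above.
import torch
import torchvision

print(torch.__version__)          # expected: 2.0.1
print(torchvision.__version__)    # expected: 0.15.2
print(torch.cuda.is_available())  # should print True with the CUDA 11.7 build
```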
git clone https://github.com/OpenGVLab/InternVL.git
cd InternVL
pip install -r requirements.txt
cd ..
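To smoke-test the InternVL install independently of this pipeline, a minimal load through Hugging Face transformers looks like the sketch below. The checkpoint name is an assumption chosen for illustration; which InternVL checkpoint ImagiDrive actually uses is configured in the pipeline code, not fixed here.

```python
# Hypothetical smoke test: load an InternVL2 checkpoint via transformers.
# "OpenGVLab/InternVL2-8B" is an example checkpoint, not necessarily the
# one used by ImagiDrive.
import torch
from transformers import AutoModel, AutoTokenizer

path = "OpenGVLab/InternVL2-8B"
model = AutoModel.from_pretrained(
    path,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,  # InternVL ships custom modeling code
).eval().cuda()
tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True)
```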
git clone https://github.com/OpenDriveLab/Vista.git
cd Vista
pip install -r requirements.txt
cd ..
python inference_v2.py

If you find our work useful, please cite:
@article{li2025imagidrive,
title={ImagiDrive: A Unified Imagination-and-Planning Framework for Autonomous Driving},
author={Jingyu Li and Bozhou Zhang and Xin Jin and Jiankang Deng and Xiatian Zhu and Li Zhang},
journal={arXiv preprint},
year={2025}
}