D3D-VLP: Dynamic 3D Vision-Language-Planning Model for Embodied Grounding and Navigation

Zihan Wang, Seungjun Lee, Guangzhao Dai, Gim Hee Lee

Embodied agents face a critical dilemma: end-to-end models lack interpretability and explicit 3D reasoning, while modular systems ignore cross-component interdependencies and synergies. To bridge this gap, we propose the Dynamic 3D Vision-Language-Planning Model (D3D-VLP). Our model introduces two key innovations: 1) a Dynamic 3D Chain-of-Thought (3D CoT) that unifies planning, grounding, navigation, and question answering within a single 3D-VLM and CoT pipeline; 2) a Synergistic Learning from Fragmented Supervision (SLFS) strategy, which uses a masked autoregressive loss to learn from massive, partially annotated hybrid data, allowing the different CoT components to mutually reinforce and implicitly supervise each other. To this end, we construct a large-scale dataset with 10M hybrid samples from 5K real scans and 20K synthetic scenes, compatible with online learning methods such as RL and DAgger. D3D-VLP achieves state-of-the-art results on multiple benchmarks, including Vision-and-Language Navigation (R2R-CE, REVERIE-CE, NavRAG-CE), Object-goal Navigation (HM3D-OVON), and Task-oriented Sequential Grounding and Navigation (SG3D). Real-world mobile manipulation experiments further validate its effectiveness.
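The exact SLFS formulation is not spelled out above, but the core idea of a masked autoregressive loss over partially annotated samples can be sketched as follows. This is a minimal illustration in PyTorch, not the authors' implementation; `annotation_mask` and all other names are assumptions introduced here for clarity.

```python
import torch
import torch.nn.functional as F

IGNORE_INDEX = -100  # positions without supervision are excluded from the loss


def masked_autoregressive_loss(logits, target_ids, annotation_mask):
    """Next-token cross-entropy restricted to annotated tokens.

    logits:          (B, T, V) model outputs
    target_ids:      (B, T)    ground-truth token ids
    annotation_mask: (B, T)    True where the target token has supervision
                               (i.e., its CoT component is annotated)
    """
    # Standard autoregressive shift: predict token t+1 from the prefix up to t.
    shift_logits = logits[:, :-1, :]
    shift_targets = target_ids[:, 1:].clone()
    shift_mask = annotation_mask[:, 1:]

    # Unannotated positions get the ignore index, so they contribute
    # nothing to the loss or its gradient.
    shift_targets[~shift_mask] = IGNORE_INDEX

    return F.cross_entropy(
        shift_logits.reshape(-1, shift_logits.size(-1)),
        shift_targets.reshape(-1),
        ignore_index=IGNORE_INDEX,
    )
```

Under this kind of objective, a single batch can freely mix samples annotated only for planning, grounding, navigation, or question answering, since unannotated spans are simply dropped from the loss rather than requiring complete labels per sample.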
