This is the PyTorch implementation of "NavQ: Learning a Q-Model for Foresighted Vision-and-Language Navigation" (ICCV 2025)
Please follow DUET to prepare the Matterport3D simulator, the environment, and the data and features of REVERIE and R2R.
After that, please download the following files and put them under the `datasets` directory:
- The CLIP features of the image observations, provided by ScaleVLN (`clip_vit-b16_mp3d_original.hdf5`, `clip_vit-b16_mp3d_hm3d_gibson.hdf5`)
- The CLIP features of the textual descriptions (`text_fts_LangNavGlobal.npy`, `text_fts_LangNavGlobal_SVLN.npy`). The textual descriptions on MP3D are borrowed from LangNav, while those on the new ScaleVLN scenes are extracted using BLIP in a similar manner
- The `bert-base-uncased` model
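After downloading, the `datasets` directory is expected to contain the files above, roughly as sketched below (the exact nesting is an assumption; adjust it to match the paths referenced in the configs):

```
datasets/
├── clip_vit-b16_mp3d_original.hdf5
├── clip_vit-b16_mp3d_hm3d_gibson.hdf5
├── text_fts_LangNavGlobal.npy
├── text_fts_LangNavGlobal_SVLN.npy
└── bert-base-uncased/
```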
Also, please download these files related to the Q-Model training and put them under the `Q_pretrain/Q_files` directory.
To train the Q-Model, first enter the `Q_pretrain` directory and run the following commands.
```shell
python train_mae.py --out_dir=ckpt/MAE --use_SVLN=False --eval_interval=1000
python train.py --out_dir=ckpt/Q --use_SVLN=False --eval_interval=1000 --resume_ckpt=ckpt/MAE/ckpt20000.pt
```
To use the additional scenes provided by ScaleVLN, please run:
```shell
python train_mae.py --out_dir=ckpt/MAE-SVLN --use_SVLN=True --eval_interval=2000
python train.py --out_dir=ckpt/Q-SVLN --use_SVLN=True --eval_interval=2000 --resume_ckpt=ckpt/MAE-SVLN/ckpt100000.pt
```
Our pre-trained Q-Model can be found here.
Please modify the `WM_ckpt` entry in `pretrain_src/config/reverie_obj_model_config.json` to the path of the Q-Model you would like to use.
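For example, the relevant entry in `pretrain_src/config/reverie_obj_model_config.json` might look like the fragment below (the key name comes from the instructions above; the checkpoint path is a placeholder for your own Q-Model checkpoint):

```json
{
  "WM_ckpt": "../Q_pretrain/ckpt/Q/ckpt20000.pt"
}
```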
Then, run the following commands to start pretraining the agent.
```shell
cd pretrain_src
bash run_reverie.sh
```
Please modify `Q_ckpt` and `pretrain_ckpt` in `map_nav_src/scripts/run_reverie.sh` to the paths of the Q-Model and the pretrained model, respectively.
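For example, the relevant lines in `map_nav_src/scripts/run_reverie.sh` might look like the sketch below (the variable names come from the instructions above; both paths are placeholders to be replaced with your own checkpoints):

```shell
Q_ckpt=../Q_pretrain/ckpt/Q/ckpt20000.pt
pretrain_ckpt=../pretrain_src/ckpts/reverie/model_step_100000.pt
```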
Then, run the following commands to start fine-tuning the agent.
```shell
cd map_nav_src
bash scripts/run_reverie.sh
```
We have provided our trained model here.
This repository is built upon DUET. The structure and training of the Q model are largely inspired by nanoGPT. We also make use of the data provided by ScaleVLN and LangNav. We sincerely thank these works for their valuable contributions.