# Implementation of "NavQ: Learning a Q-Model for Foresighted Vision-and-Language Navigation"

This is the PyTorch implementation of "NavQ: Learning a Q-Model for Foresighted Vision-and-Language Navigation" (ICCV 2025)

## Installation

Please follow DUET to prepare the Matterport3D simulator, the environment, and the data and features for REVERIE and R2R.

After that, please download the following files and put them under the `datasets` directory:

  1. The CLIP features of the image observations, provided by ScaleVLN (`clip_vit-b16_mp3d_original.hdf5`, `clip_vit-b16_mp3d_hm3d_gibson.hdf5`).
  2. The CLIP features of the textual descriptions (`text_fts_LangNavGlobal.npy`, `text_fts_LangNavGlobal_SVLN.npy`). The textual descriptions of the MP3D scenes are borrowed from LangNav, while those for the new ScaleVLN scenes are extracted with BLIP in a similar manner.
  3. The `bert-base-uncased` model.

Also, please download these files related to Q-Model training and put them under the `Q_pretrain/Q_files` directory.
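
Assuming the files above are placed directly under `datasets` (the exact subdirectory layout may differ depending on your DUET setup, so treat this as an illustrative sketch), the resulting tree would look roughly like:

```
datasets/
├── clip_vit-b16_mp3d_original.hdf5
├── clip_vit-b16_mp3d_hm3d_gibson.hdf5
├── text_fts_LangNavGlobal.npy
├── text_fts_LangNavGlobal_SVLN.npy
└── bert-base-uncased/
Q_pretrain/
└── Q_files/
```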

## Training the Q-Model

To train the Q-Model, first change into the `Q_pretrain` directory and run the following commands:

```shell
cd Q_pretrain
python train_mae.py --out_dir=ckpt/MAE --use_SVLN=False --eval_interval=1000
python train.py --out_dir=ckpt/Q --use_SVLN=False --eval_interval=1000 --resume_ckpt=ckpt/MAE/ckpt20000.pt
```

To use the additional scenes provided by ScaleVLN, please run:

```shell
python train_mae.py --out_dir=ckpt/MAE-SVLN --use_SVLN=True --eval_interval=2000
python train.py --out_dir=ckpt/Q-SVLN --use_SVLN=True --eval_interval=2000 --resume_ckpt=ckpt/MAE-SVLN/ckpt100000.pt
```

Our pre-trained Q-Model can be found here.

## Agent Pretraining

Please set `WM_ckpt` in `pretrain_src/config/reverie_obj_model_config.json` to the path of the Q-Model you would like to use.
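
For reference, the edited entry looks something like the fragment below. `WM_ckpt` is the actual field name from the config; the checkpoint path is an illustrative placeholder, and the config's other fields are omitted:

```json
{
  "WM_ckpt": "../Q_pretrain/ckpt/Q/ckpt20000.pt"
}
```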

Then, run the following commands to start pretraining the agent:

```shell
cd pretrain_src
bash run_reverie.sh
```

## Agent Finetuning

Please set `Q_ckpt` and `pretrain_ckpt` in `map_nav_src/scripts/run_reverie.sh` to the paths of the Q-Model and the pretrained model, respectively.
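
For example, the relevant lines in `map_nav_src/scripts/run_reverie.sh` might be set as follows; the variable names come from the script, but the checkpoint paths are illustrative placeholders to substitute with your own:

```
# Illustrative paths -- replace with your actual checkpoints.
Q_ckpt=../Q_pretrain/ckpt/Q/ckpt20000.pt
pretrain_ckpt=../pretrain_src/ckpt/model_step_best.pt
```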

Then, run the following commands to start finetuning the agent:

```shell
cd map_nav_src
bash scripts/run_reverie.sh
```

Our trained agent model can be found here.

## Acknowledgement

This repository is built upon DUET. The structure and training of the Q model are largely inspired by nanoGPT. We also make use of the data provided by ScaleVLN and LangNav. We sincerely thank these works for their valuable contributions.
