GuideDog: A Real-World Egocentric Multimodal Dataset for Blind and Low-Vision Accessibility-Aware Guidance

Junhyeok Kim*  Jaewoo Park*  Junhee Park  Sangeyl Lee  Jiwan Chung  Jisung Kim  Ji Hoon Joung  Youngjae Yu

* Equal contribution  ·  ACL 2026 (Main)


[Figure: GuideDog overview]

Overview

GuideDog is a real-world egocentric multimodal dataset for accessibility-aware guidance of blind and low-vision (BLV) users. It contains 22,084 image-description pairs (2,106 human-verified gold and 19,978 VLM-generated silver) collected from real walking videos across diverse cities, plus two derived multiple-choice subsets: depth (relative-distance reasoning, 383 questions) and object (object-grounded reasoning, 435 questions).

This repository is the reproduction harness for the paper. It is a focused fork of lmms-eval that adds the GuideDog evaluation tasks; everything else in the repo is upstream lmms-eval. The companion HuggingFace dataset and this repo are the only artifacts you need to reproduce the paper's eval numbers.

Dataset

The dataset lives at kjunh/GuideDog (CC BY-NC 4.0, auto-approve access gate). Three configs:

Config    Split    Rows     Use
default   gold     2,106    Human-verified guidance (evaluation split)
default   silver   19,978   VLM-generated guidance (training split)
depth     train    383      Relative-distance MCQA
object    train    435      Object-grounded MCQA

from datasets import load_dataset
gold   = load_dataset("kjunh/GuideDog",           split="gold")
silver = load_dataset("kjunh/GuideDog",           split="silver")
depth  = load_dataset("kjunh/GuideDog", "depth",  split="train")
obj    = load_dataset("kjunh/GuideDog", "object", split="train")

A HuggingFace token is required (the dataset is gated with auto-approval). Set HF_TOKEN in your environment or run huggingface-cli login.
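
If you prefer to authenticate in code rather than via huggingface-cli login, the sketch below passes the token explicitly. It assumes HF_TOKEN is exported and a recent datasets version that accepts the token argument (older versions use use_auth_token instead):

import os
from datasets import load_dataset

# Pass the token for the gated dataset explicitly instead of relying on a cached CLI login.
token = os.environ["HF_TOKEN"]
gold = load_dataset("kjunh/GuideDog", split="gold", token=token)
print(len(gold), gold[0].keys())  # 2,106 rows; inspect the fields of one gold example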

Installation

Python 3.10+ recommended. From a clean virtual environment:

git clone git@github.com:jun297/GuideDog.git
cd GuideDog
pip install -e .

Optional extras for specific model families:

pip install -e ".[qwen]"      # Qwen-VL utilities
pip install -e ".[gemini]"    # Gemini API
pip install -e ".[all]"       # everything

A reference Dockerfile and docker-compose.yml are included for containerized eval. See .env.example for environment variables (HF_TOKEN, OPENAI_API_KEY, HF_CACHE, DATASET_DIR).
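
Before a containerized run, a quick sanity check that those variables are actually visible to the process can save a failed eval. This is only a sketch; the variable names are the ones listed in .env.example above, and OPENAI_API_KEY matters only for the GPTScore tasks:

import os

# Variables listed in .env.example; OPENAI_API_KEY is only needed for GPTScore tasks.
for name in ("HF_TOKEN", "OPENAI_API_KEY", "HF_CACHE", "DATASET_DIR"):
    print(f"{name}: {'set' if os.environ.get(name) else 'MISSING'}")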

Quickstart: reproduce a single eval

The paper reports numbers across multiple models and seven tasks. Below is a minimal end-to-end command that reproduces the object MCQA result for one model:

python -m accelerate.commands.launch -m lmms_eval \
    --model qwen2_5_vl \
    --model_args pretrained=Qwen/Qwen2.5-VL-7B-Instruct \
    --tasks guidedog_object \
    --batch_size 1 \
    --output_path ./logs/

For the open-ended generation tasks (which include the GPTScore metric), set OPENAI_API_KEY first:

export OPENAI_API_KEY=sk-...
python -m accelerate.commands.launch -m lmms_eval \
    --model qwen2_5_vl \
    --model_args pretrained=Qwen/Qwen2.5-VL-7B-Instruct \
    --tasks guidedog_0shot \
    --batch_size 1 \
    --output_path ./logs/
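
lmms-eval writes per-run result files under the directory given by --output_path. The exact directory layout and file names vary across lmms-eval versions, so the glob below is an assumption; it is only a sketch for locating and printing whatever aggregated result JSON a run produced:

import glob
import json

# Search recursively under ./logs/ for aggregated result files (layout assumption).
for path in sorted(glob.glob("logs/**/*results*.json", recursive=True)):
    with open(path) as f:
        results = json.load(f)
    print(path)
    print(json.dumps(results.get("results", results), indent=2))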

Tasks

Task name                  Config × split    Metrics
guidedog_0shot             default × gold    BLEU, ROUGE-{1,2,L,Lsum}, METEOR, BERTScore, GPTScore
guidedog_3shot             default × gold    (same)
guidedog_socratic_0shot    default × gold    (same)
guidedog_socratic_3shot    default × gold    (same)
guidedog_depth_closer      depth × train     accuracy
guidedog_depth_farther     depth × train     accuracy
guidedog_object            object × train    accuracy
Use --tasks task1,task2,... to evaluate several at once.
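
For reference, MCQA accuracy of the kind reported for the depth and object tasks is typically plain exact match between the predicted option letter and the gold letter. The helper below is an illustration only, not the repository's metric code (the actual answer parsing lives in lmms_eval/tasks/guidedog_depth and lmms_eval/tasks/guidedog_object):

def mcqa_accuracy(predictions, references):
    # Compare only the leading option letter, case-insensitively.
    correct = sum(p.strip().upper()[:1] == r.strip().upper()[:1]
                  for p, r in zip(predictions, references))
    return correct / len(references)

print(mcqa_accuracy(["A", "b) the curb ahead", "C"], ["A", "B", "D"]))  # 2/3 ≈ 0.667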

Citation

@inproceedings{kim2026guidedog,
    title     = {GuideDog: A Real-World Egocentric Multimodal Dataset for Blind and Low-Vision Accessibility-Aware Guidance},
    author    = {Kim, Junhyeok and Park, Jaewoo and Park, Junhee and Lee, Sangeyl and Chung, Jiwan and Kim, Jisung and Joung, Ji Hoon and Yu, Youngjae},
    booktitle = {Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics},
    year      = {2026}
}

License

Dual-licensed, matching the upstream lmms-eval convention:

  • MIT for the original lmms-eval pipeline (everything outside lmms_eval/tasks/guidedog*).
  • Apache 2.0 for the GuideDog task code in lmms_eval/tasks/guidedog, lmms_eval/tasks/guidedog_depth, and lmms_eval/tasks/guidedog_object.

The dataset itself (kjunh/GuideDog) is released under CC BY-NC 4.0 for non-commercial research use.

Acknowledgements

This evaluation pipeline is built on top of lmms-eval by EvolvingLMMs-Lab. We thank the authors and contributors of that project.
