feat(models): add NanoVLM chat model with async multi-GPU eval#1207

Merged
Luodian merged 1 commit into EvolvingLMMs-Lab:dev-v0d7 from Jinghao-Guo:feature/add-nanovlm-model-v0d7
Feb 25, 2026

Conversation

@Jinghao-Guo (Contributor)

Summary

  • Add NanoVLM evaluation model (lmms_eval/models/chat/nanovlm.py) as a chat-style model for lmms-eval
  • NanoVLM is a lightweight VLM architecture (SigLIP2 + MLP projector + Qwen3-0.6B) trained with lmms-engine
  • Supports async multi-GPU inference following the async_hf_model pattern: loads model replicas on N GPUs, dispatches work via job queue with independent worker threads
  • Handles NanoVLM-specific image token expansion (<|image_pad|> -> 256 tokens)
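The image token expansion above can be sketched as follows. This is an illustrative sketch, not the actual `nanovlm.py` implementation: the placeholder string and the `expand_image_tokens` helper are assumptions based on the `<|image_pad|> -> 256 tokens` description, where each placeholder in the prompt is replaced by 256 repeated pad tokens before tokenization so the text sequence reserves one slot per visual embedding.

```python
# Hypothetical sketch of NanoVLM-style image token expansion.
# Each <|image_pad|> placeholder is replaced by 256 copies of the
# pad token, reserving one text-sequence slot per visual embedding.
IMAGE_PAD = "<|image_pad|>"
TOKENS_PER_IMAGE = 256  # assumed expansion factor from the PR description

def expand_image_tokens(prompt: str) -> str:
    """Replace each image placeholder with TOKENS_PER_IMAGE repeated pad tokens."""
    return prompt.replace(IMAGE_PAD, IMAGE_PAD * TOKENS_PER_IMAGE)
```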

Usage

# Single GPU
python -m lmms_eval --model nanovlm --model_args pretrained=LMMs-Lab-Speedrun/NanoVLM_Init --tasks mme --batch_size 1

# Multi-GPU async (auto-detects all visible GPUs)
CUDA_VISIBLE_DEVICES=0,1,2,3 python -m lmms_eval --model nanovlm --model_args pretrained=LMMs-Lab-Speedrun/NanoVLM_Init --tasks mme --batch_size 1

# Explicit GPU selection
python -m lmms_eval --model nanovlm --model_args "pretrained=LMMs-Lab-Speedrun/NanoVLM_Init,worker_gpus=0,1,2,3" --tasks mme --batch_size 1

Test plan

  • Verified single-GPU eval on MME benchmark (2374 samples)
  • Verified async 4-GPU eval on full SpeedRun benchmark suite (11455 samples across 6 benchmarks)
  • Pre-commit hooks (black, isort) pass

Add a chat-style evaluation model for NanoVLM (SigLIP2 + MLP projector +
Qwen3-0.6B) trained with lmms-engine.

Key features:
- Async multi-GPU inference: loads model replicas on N GPUs, dispatches
  work via job queue with independent worker threads (no sync overhead)
- NanoVLM-specific image token expansion (<|image_pad|> -> 256 tokens)
- Single GPU fallback when only one device is available
- Configurable via worker_gpus/worker_count model args
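The job-queue dispatch described above can be sketched with the standard library. This is a minimal illustration of the pattern (one worker thread per GPU pulling from a shared queue, so fast workers never block on slow ones), not the actual lmms-eval code; `run_workers` and the `generate` callback are hypothetical names standing in for the per-replica model call.

```python
# Illustrative sketch of async multi-GPU dispatch: one worker thread
# per GPU drains a shared job queue; there is no synchronization
# barrier between workers, only a final join.
import queue
import threading

def run_workers(requests, gpu_ids, generate):
    """Dispatch requests across one worker per GPU; return results in input order.

    generate(gpu_id, request) stands in for a call to the model replica
    loaded on that GPU.
    """
    jobs: queue.Queue = queue.Queue()
    for idx, req in enumerate(requests):
        jobs.put((idx, req))
    results = [None] * len(requests)

    def worker(gpu_id):
        while True:
            try:
                idx, req = jobs.get_nowait()
            except queue.Empty:
                return  # queue drained; this worker exits independently
            results[idx] = generate(gpu_id, req)

    threads = [threading.Thread(target=worker, args=(g,)) for g in gpu_ids]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Because each worker exits as soon as the queue is empty, a single slow request on one GPU delays only that worker's remaining jobs, not the whole pool.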
@Jinghao-Guo force-pushed the feature/add-nanovlm-model-v0d7 branch from 92e0de2 to 31a42d6 on February 25, 2026 at 14:26
@Luodian merged commit b2f7a92 into EvolvingLMMs-Lab:dev-v0d7 on Feb 25, 2026
1 of 2 checks passed
Luodian added a commit that referenced this pull request Feb 28, 2026

Co-authored-by: Brian Li <drluodian@gmail.com>
