feat(models): add NanoVLM chat model with async multi-GPU eval#1207

Merged
Luodian merged 1 commit into EvolvingLMMs-Lab:dev-v0d7 from Jinghao-Guo:feature/add-nanovlm-model-v0d7
Feb 25, 2026

Conversation

@Jinghao-Guo (Contributor)

Summary

  • Add NanoVLM evaluation model (lmms_eval/models/chat/nanovlm.py) as a chat-style model for lmms-eval
  • NanoVLM is a lightweight VLM architecture (SigLIP2 + MLP projector + Qwen3-0.6B) trained with lmms-engine
  • Supports async multi-GPU inference following the async_hf_model pattern: loads model replicas on N GPUs, dispatches work via job queue with independent worker threads
  • Handles NanoVLM-specific image token expansion (<|image_pad|> -> 256 tokens)
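The image token expansion above can be sketched as follows. This is an illustrative sketch, not the actual `nanovlm.py` implementation: the placeholder string and the `expand_image_tokens` helper are assumptions based on the `<|image_pad|> -> 256 tokens` description, where each placeholder in the prompt is replaced by 256 repeated pad tokens before tokenization so the text sequence reserves one slot per visual embedding.

```python
# Hypothetical sketch of NanoVLM-style image token expansion.
# Each <|image_pad|> placeholder is replaced by 256 copies of the
# pad token, reserving one text-sequence slot per visual embedding.
IMAGE_PAD = "<|image_pad|>"
TOKENS_PER_IMAGE = 256  # assumed expansion factor from the PR description

def expand_image_tokens(prompt: str) -> str:
    """Replace each image placeholder with TOKENS_PER_IMAGE repeated pad tokens."""
    return prompt.replace(IMAGE_PAD, IMAGE_PAD * TOKENS_PER_IMAGE)
```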

Usage

# Single GPU
python -m lmms_eval --model nanovlm --model_args pretrained=LMMs-Lab-Speedrun/NanoVLM_Init --tasks mme --batch_size 1

# Multi-GPU async (auto-detects all visible GPUs)
CUDA_VISIBLE_DEVICES=0,1,2,3 python -m lmms_eval --model nanovlm --model_args pretrained=LMMs-Lab-Speedrun/NanoVLM_Init --tasks mme --batch_size 1

# Explicit GPU selection
python -m lmms_eval --model nanovlm --model_args "pretrained=LMMs-Lab-Speedrun/NanoVLM_Init,worker_gpus=0,1,2,3" --tasks mme --batch_size 1

Test plan

  • Verified single-GPU eval on MME benchmark (2374 samples)
  • Verified async 4-GPU eval on full SpeedRun benchmark suite (11455 samples across 6 benchmarks)
  • Pre-commit hooks (black, isort) pass

Add a chat-style evaluation model for NanoVLM (SigLIP2 + MLP projector +
Qwen3-0.6B) trained with lmms-engine.

Key features:
- Async multi-GPU inference: loads model replicas on N GPUs, dispatches
  work via job queue with independent worker threads (no sync overhead)
- NanoVLM-specific image token expansion (<|image_pad|> -> 256 tokens)
- Single GPU fallback when only one device is available
- Configurable via worker_gpus/worker_count model args
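The job-queue dispatch described above can be sketched with the standard library. This is a minimal illustration of the pattern (one worker thread per GPU pulling from a shared queue, so fast workers never block on slow ones), not the actual lmms-eval code; `run_workers` and the `generate` callback are hypothetical names standing in for the per-replica model call.

```python
# Illustrative sketch of async multi-GPU dispatch: one worker thread
# per GPU drains a shared job queue; there is no synchronization
# barrier between workers, only a final join.
import queue
import threading

def run_workers(requests, gpu_ids, generate):
    """Dispatch requests across one worker per GPU; return results in input order.

    generate(gpu_id, request) stands in for a call to the model replica
    loaded on that GPU.
    """
    jobs: queue.Queue = queue.Queue()
    for idx, req in enumerate(requests):
        jobs.put((idx, req))
    results = [None] * len(requests)

    def worker(gpu_id):
        while True:
            try:
                idx, req = jobs.get_nowait()
            except queue.Empty:
                return  # queue drained; this worker exits independently
            results[idx] = generate(gpu_id, req)

    threads = [threading.Thread(target=worker, args=(g,)) for g in gpu_ids]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Because each worker exits as soon as the queue is empty, a single slow request on one GPU delays only that worker's remaining jobs, not the whole pool.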
@Jinghao-Guo force-pushed the feature/add-nanovlm-model-v0d7 branch from 92e0de2 to 31a42d6 on February 25, 2026 at 14:26
@Luodian merged commit b2f7a92 into EvolvingLMMs-Lab:dev-v0d7 on Feb 25, 2026
1 of 2 checks passed
Luodian added a commit that referenced this pull request Feb 28, 2026

Co-authored-by: Brian Li <drluodian@gmail.com>
