- [2026.05] 🔥 Code, dataset, and function pool released.
- [2026.02] 🎉 ProactiveMobile accepted to CVPR 2026.

ProactiveMobile is the first executable benchmark designed to evaluate proactive intelligence in mobile agents.
Unlike traditional reactive benchmarks that measure instruction-following ability, ProactiveMobile evaluates whether models can:
- Anticipate latent user intent
- Reason over multi-source contextual signals
- Generate executable API call sequences
- Avoid unnecessary or unsafe triggers

The benchmark includes:
- 📱 14 real-world mobile scenarios
- 📊 3,600+ annotated instances
- 🔧 63 executable API functions
- 🎯 Multi-answer annotations
- 🛡️ Safety-aware evaluation protocol
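
To make the multi-answer and safety-aware ideas concrete, here is a minimal Python sketch of how a predicted API call sequence might be checked against several acceptable answers. It is an illustration only and not the official evaluation toolkit: the `ApiCall` type, the helper names, the exact-match criterion, and the function names `set_alarm` and `create_reminder` are all hypothetical. It also assumes that an empty sequence means "the agent should not act".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ApiCall:
    """One API call (hypothetical representation, not the benchmark's schema)."""
    name: str
    args: tuple  # (key, value) pairs sorted by key, so argument order is irrelevant


def call(name, **kwargs):
    """Build an ApiCall with its keyword arguments in a canonical order."""
    return ApiCall(name, tuple(sorted(kwargs.items())))


def score_instance(predicted, acceptable):
    """Return 1 if the predicted sequence exactly matches ANY acceptable answer.

    Assumption: exact sequence match. The real protocol may be more lenient.
    """
    pred = tuple(predicted)
    return int(any(pred == tuple(answer) for answer in acceptable))


def is_false_trigger(predicted, acceptable):
    """True when the model acted although every acceptable answer is 'do nothing'.

    Assumption: an empty sequence encodes 'no proactive action is warranted'.
    """
    return len(predicted) > 0 and all(len(answer) == 0 for answer in acceptable)


# Two acceptable ways to satisfy the same latent intent (hypothetical functions).
acceptable = [
    [call("set_alarm", time="07:00")],
    [call("create_reminder", time="07:00", text="wake up")],
]
prediction = [call("create_reminder", text="wake up", time="07:00")]

hit = score_instance(prediction, acceptable)
unsafe = is_false_trigger([call("set_alarm", time="07:00")], [[]])
```

In this sketch `hit` is 1 because the prediction matches the second acceptable answer regardless of keyword order, and `unsafe` is `True` because the model issued a call where the only acceptable answer was to stay silent.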

This repository provides:
- Official dataset splits
- Training and inference scripts
- Evaluation toolkit
- Baseline reproduction scripts