Official website for the ICML 2025 paper submission
🌐 Website: https://grow-ai-like-a-child.github.io/core-knowledge/
📄 Paper: https://arxiv.org/abs/2410.10855
🤗 Dataset: https://huggingface.co/grow-ai-like-a-child
This repository contains the official website for our paper "Core Knowledge Deficits in Multi-Modal Language Models". The website presents our comprehensive evaluation of 230 multi-modal language models using the CoreCognition benchmark, which assesses 12 foundational cognitive concepts grounded in developmental cognitive science.
Our research reveals four critical shortcomings in state-of-the-art Multi-modal Large Language Models (MLLMs):
- Core Knowledge Deficits: MLLMs excel at higher-level abilities but struggle with lower-level cognitive abilities
- Misaligned Dependency: Core abilities show weak cross-stage correlations, lacking developmental scaffolding
- Predictability: Performance on core-knowledge tasks predicts performance on higher-level abilities
- Limited Scaling: MLLMs show minimal scalability improvements on low-level abilities compared to high-level ones
The CoreCognition benchmark evaluates twelve foundational cognitive concepts:
- Permanence - Objects persist when not perceived
- Continuity - Objects remain unified across space and time
- Boundary - Transitions between objects
- Spatiality - Understanding Euclidean properties
- Perceptual Constancy - Appearance changes ≠ property changes
- Intuitive Physics - Laws of physical interaction
- Perspective - Seeing what others see
- Hierarchy - Inclusion/exclusion of objects and categories
- Conservation - Property invariances despite transformations
- Tool Use - Manipulating objects to achieve goals
- Intentionality - Understanding what others want
- Mechanical Reasoning - Inferring actions from system states
We introduce Concept Hacking, a novel controlled evaluation method that systematically manipulates task-relevant features while preserving task-irrelevant conditions. This reveals that MLLMs fail to develop genuine core knowledge understanding and instead rely on shortcut learning as they scale.
- 230 MLLMs evaluated across different model families and sizes
- 11 different prompts to ensure robust evaluation
- >26,000 total judgments across all models and tasks
- 2,530 image-question pairs in the benchmark
```
├── _config.yml          # Jekyll configuration
├── _layouts/
│   └── default.html     # Main layout template
├── index.html           # Homepage with full paper content
├── assets/
│   ├── images/          # Paper figures and illustrations
│   ├── growai.png       # Site favicon
│   └── favicon.svg      # Backup favicon
├── Gemfile              # Ruby dependencies
└── README.md            # This file
```
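For orientation, the `_config.yml` above would typically look something like the sketch below. This is an illustrative guess at a minimal GitHub Pages setup, not the repo's actual config; the `title`, `baseurl`, and plugin choices are assumptions:

```yaml
# Hypothetical minimal _config.yml (actual repo contents may differ)
title: Core Knowledge Deficits in Multi-Modal Language Models
url: "https://grow-ai-like-a-child.github.io"
baseurl: "/core-knowledge"   # site is served under this path
# Layout comes from _layouts/default.html rather than a packaged theme
plugins:
  - jekyll-seo-tag           # assumption: one way to provide the SEO tags
```

Setting `baseurl` is what makes links resolve under `/core-knowledge` both on GitHub Pages and when serving locally.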
To run the website locally:
```shell
# Install dependencies
gem install jekyll bundler
bundle install

# Serve the site locally
bundle exec jekyll serve --livereload
```

Then visit: http://localhost:4000/core-knowledge
Yijiang Li¹, Qingying Gao²,§, Tianwei Zhao²,§, Bingyang Wang³,§, Haoran Sun², Haiyun Lyu⁴, Robert D. Hawkins⁵, Nuno Vasconcelos¹, Tal Golan⁶, Dezhi Luo⁷,⁸,†, Hokin Deng⁹,†
¹University of California San Diego, ²Johns Hopkins University, ³Emory University, ⁴University of North Carolina at Chapel Hill, ⁵Stanford University, ⁶Ben-Gurion University of the Negev, ⁷University of Michigan, ⁸University College London, ⁹Carnegie Mellon University
§Equal Contribution, †Corresponding author
If you find this work useful in your research, please consider citing:
```bibtex
@article{li2025core,
  title={Core Knowledge Deficits in Multi-Modal Language Models},
  author={Li, Yijiang and Gao, Qingying and Zhao, Tianwei and Wang, Bingyang and Sun, Haoran and Lyu, Haiyun and Luo, Dezhi and Deng, Hokin},
  journal={arXiv preprint arXiv:2410.10855},
  year={2025}
}
```

The website is built with:
- Jekyll for static site generation
- Tailwind CSS for styling
- GitHub Pages for hosting
- Responsive design optimized for all devices
- SEO optimization for better discoverability
This website presents the official results and findings from our comprehensive evaluation of multi-modal language models on core cognitive abilities.