Official website for the ICML 2025 paper submission
🌐 Website: https://grow-ai-like-a-child.github.io/core-knowledge/
📄 Paper: https://arxiv.org/abs/2410.10855
🤗 Dataset: https://huggingface.co/grow-ai-like-a-child
This repository contains the official website for our paper "Core Knowledge Deficits in Multi-Modal Language Models". The website presents our comprehensive evaluation of 230 multi-modal language models using the CoreCognition benchmark, which assesses 12 foundational cognitive concepts grounded in developmental cognitive science.
Our research reveals four critical shortcomings in state-of-the-art Multi-modal Large Language Models (MLLMs):
- Core Knowledge Deficits: MLLMs excel at higher-level abilities but struggle with lower-level cognitive abilities
- Misaligned Dependency: Core abilities show weak cross-stage correlations, lacking developmental scaffolding
- Predictability: Performance on core-knowledge tasks predicts performance on higher-level abilities
- Limited Scaling: MLLMs show minimal scalability improvements on low-level abilities compared to high-level ones
The CoreCognition benchmark evaluates twelve foundational cognitive concepts:
- Permanence - Objects persist when not perceived
- Continuity - Objects remain unified across space and time
- Boundary - Transitions between objects
- Spatiality - Understanding Euclidean properties
- Perceptual Constancy - Appearance changes ≠ property changes
- Intuitive Physics - Laws of physical interaction
- Perspective - Seeing what others see
- Hierarchy - Inclusion/exclusion of objects and categories
- Conservation - Property invariances despite transformations
- Tool Use - Manipulating objects to achieve goals
- Intentionality - Understanding what others want
- Mechanical Reasoning - Inferring actions from system states
We introduce Concept Hacking, a novel controlled evaluation method that systematically manipulates task-relevant features while preserving task-irrelevant conditions. This reveals that MLLMs fail to develop genuine core knowledge understanding and instead rely on shortcut learning as they scale.
- 230 MLLMs evaluated across different model families and sizes
- 11 different prompts to ensure robust evaluation
- >26,000 total judgments across all models and tasks
- 2,530 image-question pairs in the benchmark
```
├── _config.yml          # Jekyll configuration
├── _layouts/
│   └── default.html     # Main layout template
├── index.html           # Homepage with full paper content
├── assets/
│   ├── images/          # Paper figures and illustrations
│   ├── growai.png       # Site favicon
│   └── favicon.svg      # Backup favicon
├── Gemfile              # Ruby dependencies
└── README.md            # This file
```
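For orientation, the `_config.yml` above would typically look something like the sketch below. This is an illustrative guess at a minimal GitHub Pages setup, not the repo's actual config; the `title`, `baseurl`, and plugin choices are assumptions:

```yaml
# Hypothetical minimal _config.yml (actual repo contents may differ)
title: Core Knowledge Deficits in Multi-Modal Language Models
url: "https://grow-ai-like-a-child.github.io"
baseurl: "/core-knowledge"   # site is served under this path
# Layout comes from _layouts/default.html rather than a packaged theme
plugins:
  - jekyll-seo-tag           # assumption: one way to provide the SEO tags
```

Setting `baseurl` is what makes links resolve under `/core-knowledge` both on GitHub Pages and when serving locally.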
To run the website locally:
```shell
# Install dependencies
gem install jekyll bundler
bundle install

# Serve the site locally
bundle exec jekyll serve --livereload
```

Then visit: http://localhost:4000/core-knowledge
Yijiang Li¹, Qingying Gao²,§, Tianwei Zhao²,§, Bingyang Wang³,§, Haoran Sun², Haiyun Lyu⁴, Robert D. Hawkins⁵, Nuno Vasconcelos¹, Tal Golan⁶, Dezhi Luo⁷,⁸,†, Hokin Deng⁹,†
¹University of California San Diego, ²Johns Hopkins University, ³Emory University, ⁴University of North Carolina at Chapel Hill, ⁵Stanford University, ⁶Ben-Gurion University of the Negev, ⁷University of Michigan, ⁸University College London, ⁹Carnegie Mellon University
§Equal Contribution, †Corresponding author
If you find this work useful in your research, please consider citing:
```bibtex
@article{li2025core,
  title={Core Knowledge Deficits in Multi-Modal Language Models},
  author={Li, Yijiang and Gao, Qingying and Zhao, Tianwei and Wang, Bingyang and Sun, Haoran and Lyu, Haiyun and Luo, Dezhi and Deng, Hokin},
  journal={arXiv preprint arXiv:2410.10855},
  year={2025}
}
```

The website is built with:
- Jekyll for static site generation
- Tailwind CSS for styling
- GitHub Pages for hosting
- Responsive design optimized for all devices
- SEO optimization for better discoverability
This website presents the official results and findings from our comprehensive evaluation of multi-modal language models on core cognitive abilities.