The Most Comprehensive Collection of World Models Research
Awesome Generative World Models: Video, 3D, Robotics & Driving
Spanning Video Generation, 3D/4D Modeling, Autonomous Driving, Embodied AI, and Beyond
- [2026-03-06] 🎉 Repository launched! Unified collection of 489 papers from four major World Models repositories
- [2026-03-06] 🆕 Added 11 latest papers from arXiv 2026 (LaST-VLA, ResWorld, DriveWorld-VLA, etc.)
- [2026-03-06] 📊 Added comprehensive statistics and visualizations: 74 starred papers, 195 with code
- [2026-03-06] 🗂️ Introduced dual-dimension taxonomy: Paradigm (VideoGen/OccGen/LiDARGen) + Application domains
- [2026-03-06] 🤖 Integrated learning resources: talks, courses, tutorials, and datasets
- 💡 What are World Models?
- 🗺️ Taxonomy
- 📚 Research
- 🎓 Learning Resources
- 🔧 Practical Resources
- 🌐 Community
- 🤝 Contributing
- 📜 Citation
- ⭐ Star History
World Models are AI systems that learn internal representations of their environment to predict future states, simulate scenarios, and enable intelligent decision-making. They bridge perception and action by building a mental model of how the world works.
- Predictive Modeling: Learning to forecast future observations from current state and actions
- Latent Representations: Compressing high-dimensional sensory data into meaningful internal states
- Simulation: Generating synthetic experiences for planning, training, and evaluation
- Generalization: Transferring learned world knowledge to new scenarios and tasks
- Data Efficiency: Learn from fewer real-world interactions by leveraging simulated experience
- Safety: Test dangerous scenarios in simulation before deployment
- Interpretability: Explicit world representations enable better understanding of AI decisions
- Generalization: Transfer knowledge across tasks and domains
- Planning: Enable look-ahead reasoning for complex decision-making
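The core loop behind these benefits (encode observations into a latent state, step a learned dynamics model forward under candidate actions, decode predictions) can be sketched in a few lines. Everything below (the linear maps, dimensions, and tanh nonlinearities) is a toy stand-in for learned networks, not any particular paper's architecture:

```python
import numpy as np

rng = np.random.default_rng(42)
obs_dim, latent_dim, act_dim = 64, 8, 2

# Toy stand-ins for learned networks: linear encoder, latent dynamics, decoder.
W_enc = rng.normal(scale=0.1, size=(latent_dim, obs_dim))               # obs -> latent
W_dyn = rng.normal(scale=0.1, size=(latent_dim, latent_dim + act_dim))  # (z, a) -> z'
W_dec = rng.normal(scale=0.1, size=(obs_dim, latent_dim))               # latent -> obs

def encode(obs):
    return np.tanh(W_enc @ obs)

def step_latent(z, action):
    # Predictive modeling: next latent state from current state and action.
    return np.tanh(W_dyn @ np.concatenate([z, action]))

def imagine(obs, actions):
    """Roll out an action sequence entirely in latent space ('imagination')."""
    z = encode(obs)
    trajectory = []
    for a in actions:
        z = step_latent(z, a)
        trajectory.append(W_dec @ z)  # decoded prediction of a future observation
    return np.stack(trajectory)

preds = imagine(rng.normal(size=obs_dim), [np.ones(act_dim)] * 5)
print(preds.shape)  # (5, 64): five predicted future observations
```

This is the structure that makes look-ahead planning cheap: candidate action sequences are scored against decoded (or latent) predictions without touching the real environment.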
This repository organizes world models research along two complementary dimensions:
How world models represent and generate environmental states:
- 🎬 VideoGen: Video-based representations using pixel-space generation
  - Leverages powerful video generation models (diffusion, transformers)
  - Natural for camera-based perception systems
  - Examples: Genie, GAIA-1, DriveDreamer
- 🧊 OccGen: Occupancy-based 3D representations
  - Explicit 3D spatial structure using voxel grids or occupancy fields
  - Efficient for 3D reasoning and planning
  - Examples: OccWorld, GaussianWorld, UniScene
- 📡 LiDARGen: LiDAR-based point cloud generation
  - Direct modeling of 3D sensor data
  - Preserves geometric precision
  - Examples: LiDARGen, DynamicCity, LiSTAR
Where world models are applied:
- 🚗 Autonomous Driving: Scene prediction, planning, and simulation for self-driving vehicles
- 🤖 Embodied AI & Robotics: Manipulation, navigation, and interaction in physical environments
- 🎮 Game Simulation & XR: Procedural content generation and interactive experiences
- 🔬 Scientific Applications: Physics simulation, molecular dynamics, climate modeling
📖 For detailed taxonomy explanation, see docs/research/taxonomy.md
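The dual-dimension scheme is easy to mirror in code: tag each entry with a paradigm and an application domain, then filter along either axis. The entries below are illustrative examples drawn from this README, not the repository's actual index format:

```python
# Hypothetical index entries: (title, paradigm, domain) -- illustrative only.
papers = [
    ("GAIA-1",      "VideoGen", "Autonomous Driving"),
    ("OccWorld",    "OccGen",   "Autonomous Driving"),
    ("DynamicCity", "LiDARGen", "Autonomous Driving"),
    ("Genie",       "VideoGen", "Game Simulation & XR"),
]

def by_paradigm(entries, paradigm):
    """Filter along the 'how states are represented' axis."""
    return [title for title, p, _ in entries if p == paradigm]

def by_domain(entries, domain):
    """Filter along the 'where the model is applied' axis."""
    return [title for title, _, d in entries if d == domain]

print(by_paradigm(papers, "VideoGen"))            # ['GAIA-1', 'Genie']
print(by_domain(papers, "Game Simulation & XR"))  # ['Genie']
```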
Comprehensive surveys and review papers on world models:
📖 For complete list of surveys, see docs/research/surveys.md
Video-based world models generate future frames in pixel space, leveraging powerful video generation architectures.
Key Papers:
World models for scene prediction, planning, and simulation in self-driving vehicles.
📄 See complete list above (40 papers total)
World models for manipulation, navigation, and interaction in physical environments.
| Paper | Venue | Resources |
|---|---|---|
| ⭐ DreamDojo | arXiv 2026 | |
| ⭐ World Models as Data Engine | arXiv 2025 | |
| View-Consistent 4D World Model | arXiv 2026 | |
| ⭐ World Models as Reliable Simulators | arXiv 2026 | |
| Paper | Venue | Resources |
|---|---|---|
| ⭐ NWM (Navigation World Models) | - | |
| ⭐ MindJourney | - | |
| NavMorph | - | |
| Unified World Models | - | |
| RECON | - | |
| WMNav | - | |
| NaVi-WM | - | |
| AIF | - | |
| X-MOBILITY | - | |
| MWM | - | |
World models integrated with vision-language-action architectures for robotic control.
General policy learning methods leveraging world models for embodied AI.
| Paper | Venue | Resources |
|---|---|---|
| ⭐ LingBot-VA | - | |
| ⭐ UWM | - | |
| ⭐ UVA | - | |
| DiWA | - | |
| ⭐ Dreamerv4 | - | |
| LVP | - | |
| ⭐ LDA-1B | - | |
World models for procedural content generation and interactive experiences.
| Paper | Venue | Resources |
|---|---|---|
| ⭐ Web World Models | arXiv 2025 | |
| Large-Scale World Model for Web Agent | arXiv 2026 | |
| World-Model-Augmented Web Agents | arXiv 2026 | |
| Multiplayer Video World Model in Minecraft | arXiv 2026 | |
Theoretical foundations and explainability of world models.
| Paper | Venue | Resources |
|---|---|---|
| ⭐ Inductive Biases in Transformers | arXiv 2026 | |
| ⭐ Physical Grounding in World Models | arXiv 2026 | |
World models for social interaction, multi-agent coordination, and human behavior prediction.
Key Topics:
- 🤝 Human-robot interaction modeling
- 🚶 Pedestrian behavior prediction
- 🎭 Social navigation in crowded environments
- 🤖 Multi-agent coordination and communication
- 🧠 Theory of mind for AI agents
📖 For more on multi-agent systems, see Multi-Agent Reinforcement Learning
World models applied to scientific domains including medicine, biology, and social sciences.
Natural Science:
| Paper | Venue | Resources |
|---|---|---|
| World Models for Clinical Prediction | - | |
| ⭐ CellFlux | - | |
| CheXWorld | - | |
| EchoWorld | - | |
| ODesign | - | |
| ⭐ SFP | - | |
| Xray2Xray | - | |
| ⭐ Medical World Model | - | |
| Surgical Vision World Model | - | |
Social Science:
| Paper | Venue | Resources |
|---|---|---|
| Social World Models | - | |
| Social World Model-Augmented Mechanism Design | - | |
| SocioVerse | - | |
Key talks and presentations on world models:
📺 For complete list of talks (50+), see docs/learning/talks.md
Online Courses:
- Deep Reinforcement Learning (UC Berkeley CS285) - Covers model-based RL and world models
- Stanford CS330: Deep Multi-Task and Meta Learning - Includes world model architectures
- MIT 6.S898: Deep Learning - Foundation models and world models
Tutorials:
- Implementing DreamerV3 from Scratch - Official PyTorch implementation
- World Models Tutorial - Interactive introduction to world models
- CARLA Autonomous Driving Tutorial - Simulation-based learning
🎓 For complete list of courses and tutorials (30+), see docs/learning/tutorials.md
Autonomous Driving:
- nuScenes - 1000 scenes with camera, LiDAR, radar
- Waymo Open Dataset - 1000 segments, 200k frames
- KITTI - Classic autonomous driving benchmark
- Argoverse 2 - 1000 scenarios, forecasting
- Occ3D - 16k frames for occupancy prediction
- CARLA - Open-source driving simulator
Robotics:
- CALVIN - 24k manipulation trajectories
- RoboNet - 15M robot manipulation frames
- RoboCasa - 100k kitchen manipulation trajectories
- Open X-Embodiment - 1M+ multi-robot trajectories
- Habitat-Matterport 3D - 90 indoor scenes
Games:
- Minecraft - Procedural 3D environments
- Atari 2600 - Classic RL benchmark (57 games)
- MineRL - 60M frames of Minecraft gameplay
📊 For complete list of datasets (50+), see docs/resources/datasets.md
Evaluation Tools:
- WorldLens - Comprehensive evaluation framework for driving world models
- VBench - Video generation quality metrics
- FVD - Fréchet Video Distance for video quality
- LPIPS - Learned Perceptual Image Patch Similarity
🎯 For complete benchmarks and leaderboards, see docs/resources/benchmarks.md
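Of these metrics, FVD is the Fréchet (2-Wasserstein) distance between Gaussian fits of video features, typically extracted with a pretrained I3D network. As an illustration only: assuming diagonal covariances (the full metric needs a matrix square root of Σ₁Σ₂), the computation fits in a few lines of numpy. The synthetic statistics below stand in for real feature extractions:

```python
import numpy as np

def frechet_distance_diag(mu1, var1, mu2, var2):
    """Frechet distance between two Gaussians with diagonal covariances.

    General formula: ||mu1 - mu2||^2 + Tr(S1 + S2 - 2 (S1 S2)^{1/2}).
    For diagonal covariances the trace term reduces to
    sum((sqrt(v1) - sqrt(v2))^2).
    """
    mu1, var1, mu2, var2 = map(np.asarray, (mu1, var1, mu2, var2))
    mean_term = np.sum((mu1 - mu2) ** 2)
    cov_term = np.sum((np.sqrt(var1) - np.sqrt(var2)) ** 2)
    return float(mean_term + cov_term)

# Synthetic "features" standing in for I3D activations of real vs. generated clips.
rng = np.random.default_rng(0)
real = rng.normal(0.0, 1.0, size=(1000, 16))
fake = rng.normal(0.5, 1.0, size=(1000, 16))
fd = frechet_distance_diag(real.mean(0), real.var(0), fake.mean(0), fake.var(0))
print(f"Frechet distance (diagonal approx.): {fd:.2f}")
```

Identical distributions give a distance of zero; the score grows as the generated feature statistics drift from the real ones.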
Frameworks:
- DreamerV3 - State-of-the-art model-based RL
- Stable Diffusion - Foundation for video generation models
- PyTorch3D - 3D deep learning library
Simulation:
- CARLA - Open-source driving simulator
- Isaac Sim - NVIDIA robotics simulator
- MuJoCo - Physics engine for robotics
🛠️ For complete list of tools, see docs/resources/tools.md
- CVPR 2025 Workshop on World Models
- ICCV 2025 Workshop on 4D World Models - Bridging Generation and Reconstruction
- OpenDriveLab Challenges - Annual autonomous driving competitions
🏆 For complete list of workshops, see docs/community/workshops.md
Leading Labs:
- NVIDIA Toronto AI Lab - Cosmos world foundation models
- Wayve - GAIA series, end-to-end driving with world models
- Tesla AI - FSD world model development
- UC Berkeley RAIL - Model-based RL research
- DeepMind - Genie, DreamerV3
👥 For complete list of research groups, see docs/community/research-groups.md
We welcome contributions! Please see CONTRIBUTING.md for guidelines.
How to contribute:
- Fork the repository
- Add your paper/resource following the format
- Ensure all links are valid
- Submit a pull request
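For the link-validity step, a minimal pre-submit check might extract every markdown link and probe it. The regex and the HEAD-request approach below are one possible sketch, not the repository's official tooling:

```python
import re
import urllib.request

LINK_RE = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")  # matches [text](url)

def extract_links(markdown: str):
    """Return all link targets found in a markdown string."""
    return LINK_RE.findall(markdown)

def link_ok(url: str, timeout: float = 5.0) -> bool:
    """Best-effort reachability check via an HTTP HEAD request."""
    try:
        req = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status < 400
    except Exception:
        return False

sample = "| DreamerV3 | [code](https://github.com/danijar/dreamerv3) |"
print(extract_links(sample))  # ['https://github.com/danijar/dreamerv3']
```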
Format for papers:
| **Paper Title** | `Venue Year` | [](url) [](url) |

If you find this repository useful, please consider citing:

```bibtex
@misc{awesome-world-models-2026,
  title={Awesome World Models: A Comprehensive Collection},
  author={Jing, Bowen},
  year={2026},
  howpublished={\url{https://github.com/Bowen12137/Awesome-World-Models}}
}
```

- Total Papers: 489
- Starred Papers: 74
- Papers with Code: 195 (40%)
- Year Range: 2012 - 2026
- Top Venues: CVPR (41), ICCV (16), ICLR (10)
Last Updated: March 6, 2026
Made with ❤️ by the World Models community