The Most Comprehensive Collection of World Models Research
Awesome Generative World Models: Video, 3D, Robotics & Driving
Spanning Video Generation, 3D/4D Modeling, Autonomous Driving, Embodied AI, and Beyond
- [2026-03-06] 🎉 Repository launched! Unified collection of 489 papers from four major World Models repositories
- [2026-03-06] 🆕 Added 11 latest papers from arXiv 2026 (LaST-VLA, ResWorld, DriveWorld-VLA, etc.)
- [2026-03-06] 📊 Added comprehensive statistics and visualizations: 74 starred papers, 195 with code
- [2026-03-06] 🗂️ Introduced dual-dimension taxonomy: Paradigm (VideoGen/OccGen/LiDARGen) + Application domains
- [2026-03-06] 🤖 Integrated learning resources: talks, courses, tutorials, and datasets
- 💡 What are World Models?
- 🗺️ Taxonomy
- 📚 Research
- 🎓 Learning Resources
- 🔧 Practical Resources
- 🌐 Community
- 🤝 Contributing
- 📜 Citation
- ⭐ Star History
World Models are AI systems that learn internal representations of their environment to predict future states, simulate scenarios, and enable intelligent decision-making. They bridge perception and action by building a mental model of how the world works.
- Predictive Modeling: Learning to forecast future observations from current state and actions
- Latent Representations: Compressing high-dimensional sensory data into meaningful internal states
- Simulation: Generating synthetic experiences for planning, training, and evaluation
- Generalization: Transferring learned world knowledge to new scenarios and tasks
- Data Efficiency: Learn from fewer real-world interactions by leveraging simulated experience
- Safety: Test dangerous scenarios in simulation before deployment
- Interpretability: Explicit world representations enable better understanding of AI decisions
- Generalization: Transfer knowledge across tasks and domains
- Planning: Enable look-ahead reasoning for complex decision-making
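The core loop behind these benefits (encode observations into a latent state, step a learned dynamics model forward under candidate actions, decode predictions) can be sketched in a few lines. Everything below (the linear maps, dimensions, and tanh nonlinearities) is a toy stand-in for learned networks, not any particular paper's architecture:

```python
import numpy as np

rng = np.random.default_rng(42)
obs_dim, latent_dim, act_dim = 64, 8, 2

# Toy stand-ins for learned networks: linear encoder, latent dynamics, decoder.
W_enc = rng.normal(scale=0.1, size=(latent_dim, obs_dim))               # obs -> latent
W_dyn = rng.normal(scale=0.1, size=(latent_dim, latent_dim + act_dim))  # (z, a) -> z'
W_dec = rng.normal(scale=0.1, size=(obs_dim, latent_dim))               # latent -> obs

def encode(obs):
    return np.tanh(W_enc @ obs)

def step_latent(z, action):
    # Predictive modeling: next latent state from current state and action.
    return np.tanh(W_dyn @ np.concatenate([z, action]))

def imagine(obs, actions):
    """Roll out an action sequence entirely in latent space ('imagination')."""
    z = encode(obs)
    trajectory = []
    for a in actions:
        z = step_latent(z, a)
        trajectory.append(W_dec @ z)  # decoded prediction of a future observation
    return np.stack(trajectory)

preds = imagine(rng.normal(size=obs_dim), [np.ones(act_dim)] * 5)
print(preds.shape)  # (5, 64): five predicted future observations
```

This is the structure that makes look-ahead planning cheap: candidate action sequences are scored against decoded (or latent) predictions without touching the real environment.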
This repository organizes world models research along two complementary dimensions:
How world models represent and generate environmental states:
- 🎬 VideoGen: Video-based representations using pixel-space generation
  - Leverages powerful video generation models (diffusion, transformers)
  - Natural for camera-based perception systems
  - Examples: Genie, GAIA-1, DriveDreamer
- 🧊 OccGen: Occupancy-based 3D representations
  - Explicit 3D spatial structure using voxel grids or occupancy fields
  - Efficient for 3D reasoning and planning
  - Examples: OccWorld, GaussianWorld, UniScene
- 📡 LiDARGen: LiDAR-based point cloud generation
  - Direct modeling of 3D sensor data
  - Preserves geometric precision
  - Examples: LiDARGen, DynamicCity, LiSTAR
Where world models are applied:
- 🚗 Autonomous Driving: Scene prediction, planning, and simulation for self-driving vehicles
- 🤖 Embodied AI & Robotics: Manipulation, navigation, and interaction in physical environments
- 🎮 Game Simulation & XR: Procedural content generation and interactive experiences
- 🔬 Scientific Applications: Physics simulation, molecular dynamics, climate modeling
📖 For detailed taxonomy explanation, see docs/research/taxonomy.md
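The dual-dimension scheme is easy to mirror in code: tag each entry with a paradigm and an application domain, then filter along either axis. The entries below are illustrative examples drawn from this README, not the repository's actual index format:

```python
# Hypothetical index entries: (title, paradigm, domain) -- illustrative only.
papers = [
    ("GAIA-1",      "VideoGen", "Autonomous Driving"),
    ("OccWorld",    "OccGen",   "Autonomous Driving"),
    ("DynamicCity", "LiDARGen", "Autonomous Driving"),
    ("Genie",       "VideoGen", "Game Simulation & XR"),
]

def by_paradigm(entries, paradigm):
    """Filter along the 'how states are represented' axis."""
    return [title for title, p, _ in entries if p == paradigm]

def by_domain(entries, domain):
    """Filter along the 'where the model is applied' axis."""
    return [title for title, _, d in entries if d == domain]

print(by_paradigm(papers, "VideoGen"))            # ['GAIA-1', 'Genie']
print(by_domain(papers, "Game Simulation & XR"))  # ['Genie']
```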
Comprehensive surveys and review papers on world models:
📖 For complete list of surveys, see docs/research/surveys.md
Video-based world models generate future frames in pixel space, leveraging powerful video generation architectures.
Key Papers:
World models for scene prediction, planning, and simulation in self-driving vehicles.
📄 See complete list above (40 papers total)
World models for manipulation, navigation, and interaction in physical environments.
| Paper | Venue | Resources |
|---|---|---|
| ⭐ DreamDojo | arXiv 2026 | |
| ⭐ World Models as Data Engine | arXiv 2025 | |
| View-Consistent 4D World Model | arXiv 2026 | |
| ⭐ World Models as Reliable Simulators | arXiv 2026 | |
| Paper | Venue | Resources |
|---|---|---|
| ⭐ NWM (Navigation World Models) | - | |
| ⭐ MindJourney | - | |
| NavMorph | - | |
| Unified World Models | - | |
| RECON | - | |
| WMNav | - | |
| NaVi-WM | - | |
| AIF | - | |
| X-MOBILITY | - | |
| MWM | - | |
World models integrated with vision-language-action architectures for robotic control.
General policy learning methods leveraging world models for embodied AI.
| Paper | Venue | Resources |
|---|---|---|
| ⭐ LingBot-VA | - | |
| ⭐ UWM | - | |
| ⭐ UVA | - | |
| DiWA | - | |
| ⭐ Dreamerv4 | - | |
| LVP | - | |
| ⭐ LDA-1B | - | |
World models for procedural content generation and interactive experiences.
| Paper | Venue | Resources |
|---|---|---|
| ⭐ Web World Models | arXiv 2025 | |
| Large-Scale World Model for Web Agent | arXiv 2026 | |
| World-Model-Augmented Web Agents | arXiv 2026 | |
| Multiplayer Video World Model in Minecraft | arXiv 2026 | |
Theoretical foundations and explainability of world models.
| Paper | Venue | Resources |
|---|---|---|
| ⭐ Inductive Biases in Transformers | arXiv 2026 | |
| ⭐ Physical Grounding in World Models | arXiv 2026 | |
World models for social interaction, multi-agent coordination, and human behavior prediction.
Key Topics:
- 🤝 Human-robot interaction modeling
- 🚶 Pedestrian behavior prediction
- 🎭 Social navigation in crowded environments
- 🤖 Multi-agent coordination and communication
- 🧠 Theory of mind for AI agents
📖 For more on multi-agent systems, see Multi-Agent Reinforcement Learning
World models applied to scientific domains including medicine, biology, and social sciences.
Natural Science:
| Paper | Venue | Resources |
|---|---|---|
| World Models for Clinical Prediction | - | |
| ⭐ CellFlux | - | |
| CheXWorld | - | |
| EchoWorld | - | |
| ODesign | - | |
| ⭐ SFP | - | |
| Xray2Xray | - | |
| ⭐ Medical World Model | - | |
| Surgical Vision World Model | - | |
Social Science:
| Paper | Venue | Resources |
|---|---|---|
| Social World Models | - | |
| Social World Model-Augmented Mechanism Design | - | |
| SocioVerse | - | |
Key talks and presentations on world models:
📺 For complete list of talks (50+), see docs/learning/talks.md
Online Courses:
- Deep Reinforcement Learning (UC Berkeley CS285) - Covers model-based RL and world models
- Stanford CS330: Deep Multi-Task and Meta Learning - Includes world model architectures
- MIT 6.S898: Deep Learning - Foundation models and world models
Tutorials:
- Implementing DreamerV3 from Scratch - Official PyTorch implementation
- World Models Tutorial - Interactive introduction to world models
- CARLA Autonomous Driving Tutorial - Simulation-based learning
🎓 For complete list of courses and tutorials (30+), see docs/learning/tutorials.md
Autonomous Driving:
- nuScenes - 1000 scenes with camera, LiDAR, radar
- Waymo Open Dataset - 1000 segments, 200k frames
- KITTI - Classic autonomous driving benchmark
- Argoverse 2 - 1000 scenarios, forecasting
- Occ3D - 16k frames for occupancy prediction
- CARLA - Open-source driving simulator
Robotics:
- CALVIN - 24k manipulation trajectories
- RoboNet - 15M robot manipulation frames
- RoboCasa - 100k kitchen manipulation trajectories
- Open X-Embodiment - 1M+ multi-robot trajectories
- Habitat-Matterport 3D - 90 indoor scenes
Games:
- Minecraft - Procedural 3D environments
- Atari 2600 - Classic RL benchmark (57 games)
- MineRL - 60M frames of Minecraft gameplay
📊 For complete list of datasets (50+), see docs/resources/datasets.md
Evaluation Tools:
- WorldLens - Comprehensive evaluation framework for driving world models
- VBench - Video generation quality metrics
- FVD - Fréchet Video Distance for video quality
- LPIPS - Learned Perceptual Image Patch Similarity
🎯 For complete benchmarks and leaderboards, see docs/resources/benchmarks.md
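Of these metrics, FVD is the Fréchet (2-Wasserstein) distance between Gaussian fits of video features, typically extracted with a pretrained I3D network. As an illustration only: assuming diagonal covariances (the full metric needs a matrix square root of Σ₁Σ₂), the computation fits in a few lines of numpy. The synthetic statistics below stand in for real feature extractions:

```python
import numpy as np

def frechet_distance_diag(mu1, var1, mu2, var2):
    """Frechet distance between two Gaussians with diagonal covariances.

    General formula: ||mu1 - mu2||^2 + Tr(S1 + S2 - 2 (S1 S2)^{1/2}).
    For diagonal covariances the trace term reduces to
    sum((sqrt(v1) - sqrt(v2))^2).
    """
    mu1, var1, mu2, var2 = map(np.asarray, (mu1, var1, mu2, var2))
    mean_term = np.sum((mu1 - mu2) ** 2)
    cov_term = np.sum((np.sqrt(var1) - np.sqrt(var2)) ** 2)
    return float(mean_term + cov_term)

# Synthetic "features" standing in for I3D activations of real vs. generated clips.
rng = np.random.default_rng(0)
real = rng.normal(0.0, 1.0, size=(1000, 16))
fake = rng.normal(0.5, 1.0, size=(1000, 16))
fd = frechet_distance_diag(real.mean(0), real.var(0), fake.mean(0), fake.var(0))
print(f"Frechet distance (diagonal approx.): {fd:.2f}")
```

Identical distributions give a distance of zero; the score grows as the generated feature statistics drift from the real ones.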
Frameworks:
- DreamerV3 - State-of-the-art model-based RL
- Stable Diffusion - Foundation for video generation models
- PyTorch3D - 3D deep learning library
Simulation:
- CARLA - Open-source driving simulator
- Isaac Sim - NVIDIA robotics simulator
- MuJoCo - Physics engine for robotics
🛠️ For complete list of tools, see docs/resources/tools.md
- CVPR 2025 Workshop on World Models
- ICCV 2025 Workshop on 4D World Models - Bridging Generation and Reconstruction
- OpenDriveLab Challenges - Annual autonomous driving competitions
🏆 For complete list of workshops, see docs/community/workshops.md
Leading Labs:
- NVIDIA Toronto AI Lab - Cosmos world foundation models
- Wayve - GAIA series, end-to-end driving with world models
- Tesla AI - FSD world model development
- UC Berkeley RAIL - Model-based RL research
- DeepMind - Genie, DreamerV3
👥 For complete list of research groups, see docs/community/research-groups.md
We welcome contributions! Please see CONTRIBUTING.md for guidelines.
How to contribute:
- Fork the repository
- Add your paper/resource following the format
- Ensure all links are valid
- Submit a pull request
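For the link-validity step, a minimal pre-submit check might extract every markdown link and probe it. The regex and the HEAD-request approach below are one possible sketch, not the repository's official tooling:

```python
import re
import urllib.request

LINK_RE = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")  # matches [text](url)

def extract_links(markdown: str):
    """Return all link targets found in a markdown string."""
    return LINK_RE.findall(markdown)

def link_ok(url: str, timeout: float = 5.0) -> bool:
    """Best-effort reachability check via an HTTP HEAD request."""
    try:
        req = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status < 400
    except Exception:
        return False

sample = "| DreamerV3 | [code](https://github.com/danijar/dreamerv3) |"
print(extract_links(sample))  # ['https://github.com/danijar/dreamerv3']
```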
Format for papers:
| **Paper Title** | `Venue Year` | [](url) [](url) |

If you find this repository useful, please consider citing:

```bibtex
@misc{awesome-world-models-2026,
  title={Awesome World Models: A Comprehensive Collection},
  author={Jing, Bowen},
  year={2026},
  howpublished={\url{https://github.com/Bowen12137/Awesome-World-Models}}
}
```

- Total Papers: 489
- Starred Papers: 74
- Papers with Code: 195 (40%)
- Year Range: 2012 - 2026
- Top Venues: CVPR (41), ICCV (16), ICLR (10)
Last Updated: March 6, 2026
Made with ❤️ by the World Models community