🌍 Awesome World Models

Awesome Papers License PRs Welcome

The Most Comprehensive Collection of World Models Research

Awesome Generative World Models: Video, 3D, Robotics & Driving

Spanning Video Generation, 3D/4D Modeling, Autonomous Driving, Embodied AI, and Beyond

🌳 World Models Evolutionary Tree

🔥 News & Updates

  • [2026-03-06] 🎉 Repository launched! Unified collection of 489 papers from four major World Models repositories
  • [2026-03-06] 🆕 Added 11 latest papers from arXiv 2026 (LaST-VLA, ResWorld, DriveWorld-VLA, etc.)
  • [2026-03-06] 📊 Added comprehensive statistics and visualizations: 74 starred papers, 195 with code
  • [2026-03-06] 🗂️ Introduced dual-dimension taxonomy: Paradigm (VideoGen/OccGen/LiDARGen) + Application domains
  • [2026-03-06] 🤖 Integrated learning resources: talks, courses, tutorials, and datasets



💡 What are World Models?

World Models are AI systems that learn internal representations of their environment to predict future states, simulate scenarios, and enable intelligent decision-making. They bridge perception and action by building a mental model of how the world works.

Key Concepts

  • Predictive Modeling: Learning to forecast future observations from current state and actions
  • Latent Representations: Compressing high-dimensional sensory data into meaningful internal states
  • Simulation: Generating synthetic experiences for planning, training, and evaluation
  • Generalization: Transferring learned world knowledge to new scenarios and tasks
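The encode → predict → decode loop above can be sketched in a few lines. This is a hypothetical toy, not any specific paper's architecture: the matrices below are random stand-ins for learned networks, and only the interface (compress an observation, step the latent dynamics, imagine future observations) mirrors the concepts listed.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy latent world model. All weights are random placeholders for learned
# networks; in practice they would be trained to minimize prediction error.
OBS_DIM, LATENT_DIM, ACTION_DIM = 16, 4, 2

W_enc = rng.normal(size=(LATENT_DIM, OBS_DIM)) * 0.1   # encoder: obs -> latent
A = rng.normal(size=(LATENT_DIM, LATENT_DIM)) * 0.1    # latent dynamics
B = rng.normal(size=(LATENT_DIM, ACTION_DIM)) * 0.1    # action conditioning
W_dec = rng.normal(size=(OBS_DIM, LATENT_DIM)) * 0.1   # decoder: latent -> obs

def encode(obs):
    return np.tanh(W_enc @ obs)          # compress observation into latent state

def predict(z, action):
    return np.tanh(A @ z + B @ action)   # forecast next latent from state + action

def decode(z):
    return W_dec @ z                     # reconstruct an observation from latent

def rollout(obs0, actions):
    """Imagine a trajectory entirely in latent space (simulation)."""
    z = encode(obs0)
    imagined = []
    for a in actions:
        z = predict(z, a)
        imagined.append(decode(z))
    return imagined

obs0 = rng.normal(size=OBS_DIM)
actions = [rng.normal(size=ACTION_DIM) for _ in range(5)]
frames = rollout(obs0, actions)
print(len(frames), frames[0].shape)  # 5 imagined observations
```

Note that the rollout never touches the real environment: once the model is trained, planning and policy learning can happen in this imagined space.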

Why World Models Matter

  1. Data Efficiency: Learn from fewer real-world interactions by leveraging simulated experience
  2. Safety: Test dangerous scenarios in simulation before deployment
  3. Interpretability: Explicit world representations enable better understanding of AI decisions
  4. Generalization: Transfer knowledge across tasks and domains
  5. Planning: Enable look-ahead reasoning for complex decision-making
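Point 5 (look-ahead planning) is concrete enough to sketch. Below is a minimal random-shooting planner: sample candidate action sequences, roll each forward inside the model, and keep the best-imagined one. The 1-D point-mass "world model" is hand-written here purely so the example runs; in a real system it would be a learned model.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hand-written stand-in for a learned world model: state = (position, velocity).
def model_step(state, action):
    pos, vel = state
    vel = vel + 0.1 * float(np.clip(action, -1, 1))
    pos = pos + 0.1 * vel
    return (pos, vel)

def imagined_return(state, plan, goal=1.0):
    """Roll the plan forward inside the model; reward = -distance to goal."""
    total = 0.0
    for a in plan:
        state = model_step(state, a)
        total -= abs(state[0] - goal)
    return total

def random_shooting(state, horizon=10, n_candidates=256):
    """Look-ahead planning: sample plans, keep the one with the best
    imagined return. No real-environment interaction is needed."""
    plans = rng.uniform(-1, 1, size=(n_candidates, horizon))
    scores = [imagined_return(state, plan) for plan in plans]
    return plans[int(np.argmax(scores))]

start = (0.0, 0.0)
best = random_shooting(start)
print(imagined_return(start, best) > imagined_return(start, np.zeros(10)))  # True
```

More capable planners (CEM, gradient-based trajectory optimization, learned policies) follow the same pattern: evaluate candidate futures inside the model, act on the best one.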

🗺️ Taxonomy

This repository organizes world models research along two complementary dimensions:

Dimension 1: Representation Paradigms

How world models represent and generate environmental states:

  • 🎬 VideoGen: Video-based representations using pixel-space generation

    • Leverages powerful video generation models (diffusion, transformers)
    • Natural for camera-based perception systems
    • Examples: Genie, GAIA-1, DriveDreamer
  • 🧊 OccGen: Occupancy-based 3D representations

    • Explicit 3D spatial structure using voxel grids or occupancy fields
    • Efficient for 3D reasoning and planning
    • Examples: OccWorld, GaussianWorld, UniScene
  • 📡 LiDARGen: LiDAR-based point cloud generation

    • Direct modeling of 3D sensor data
    • Preserves geometric precision
    • Examples: LiDARGen, DynamicCity, LiSTAR
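To make the OccGen/LiDARGen distinction tangible, here is a minimal voxelization sketch: converting a LiDAR-style point cloud into the binary occupancy grid that occupancy-based world models operate on. Grid extents and resolution are arbitrary choices for the illustration.

```python
import numpy as np

def voxelize(points, grid_min=-2.0, grid_max=2.0, resolution=0.5):
    """Map an (N, 3) point cloud to a boolean voxel occupancy grid."""
    n = int(round((grid_max - grid_min) / resolution))   # voxels per axis
    occ = np.zeros((n, n, n), dtype=bool)
    idx = np.floor((points - grid_min) / resolution).astype(int)
    inside = np.all((idx >= 0) & (idx < n), axis=1)      # drop out-of-range points
    occ[tuple(idx[inside].T)] = True
    return occ

points = np.array([
    [0.0, 0.0, 0.0],    # origin
    [0.1, 0.1, 0.1],    # falls in the same voxel as the origin
    [1.9, -1.9, 0.0],   # a different voxel
    [5.0, 5.0, 5.0],    # outside the grid, ignored
])
occ = voxelize(points)
print(occ.shape, int(occ.sum()))  # (8, 8, 8) 2
```

LiDARGen-style models keep the raw points and so preserve exact geometry; OccGen-style models trade that precision for a fixed-size grid that is convenient for 3D prediction and planning.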

Dimension 2: Application Domains

Where world models are applied:

  • 🚗 Autonomous Driving: Scene prediction, planning, and simulation for self-driving vehicles
  • 🤖 Embodied AI & Robotics: Manipulation, navigation, and interaction in physical environments
  • 🎮 Game Simulation & XR: Procedural content generation and interactive experiences
  • 🔬 Scientific Applications: Physics simulation, molecular dynamics, climate modeling

📖 For detailed taxonomy explanation, see docs/research/taxonomy.md


📚 Research

Surveys & Reviews

Comprehensive surveys and review papers on world models:

Title Venue Resources
The Role of World Models in Shaping Autonomous Driving: A Comprehensive Survey arXiv 25.02 arXiv
Aligning Cyber Space with Physical World: A Comprehensive Survey on Embodied AI TMECH 25 arXiv Code
3D and 4D World Modeling: A Survey arXiv 25.09 arXiv
A Comprehensive Survey on World Models for Embodied AI arXiv 25.10 arXiv
A Step Toward World Models: A Survey on Robotic Manipulation arXiv 25.11 arXiv
A Survey of Embodied World Models 25.09 Paper
A Survey on Future Physical World Generation for Autonomous Driving MMAsia 25 Paper
The Safety Challenge of World Models for Embodied AI Agents: A Review arXiv 25.10 arXiv
Progressive Robustness-Aware World Models in Autonomous Driving: A Review and Outlook TechRxiv 25.11 Paper
A Path Towards Autonomous Machine Intelligence (Yann LeCun) OpenReview OpenReview Video

📖 For complete list of surveys, see docs/research/surveys.md


Papers by Paradigm

🎬 VideoGen: Video-based World Models

Video-based world models generate future frames in pixel space, leveraging powerful video generation architectures.

Key Papers:

🚗 Autonomous Driving Papers

Paper Venue Resources
LaST-VLA: Latent Spatio-Temporal VLA for Autonomous Driving arXiv 2026 arXiv
Hyper Diffusion Planner: End-to-End Autonomous Driving with Real-Vehicle Deployment arXiv 2026 arXiv
ResWorld: Temporal Residual World Model for Dynamic Object Modeling arXiv 2026 arXiv
DriveWorld-VLA: Unifying World Modeling and Planning in Latent Space arXiv 2026 arXiv
RaWMPC: Risk-aware World Model Predictive Control arXiv 2026 arXiv
DiffusionHarmonizer: Online Generative Enhancement for Driving Simulation arXiv 2026 arXiv
A Survey of World Models for Autonomous Driving - arXiv
Cosmos-Drive-Dreams - arXiv Code Website
Drive-OccWorld AAAI 2025 arXiv Code Website
DriveDreamer-2 AAAI 2025 arXiv Code Website
DriveDreamer4D CVPR 2025 arXiv Code Website
MagicDrive-V2 ICCV 2025 arXiv Website
SubjectDrive AAAI 2025 arXiv Website
DriveDreamer ECCV 2024 arXiv Code Website
DriveWorld CVPR 2024 arXiv
DrivingDiffusion ECCV 2024 arXiv Code Website
DrivingDojo Dataset NeurIPS 2024 arXiv Website
MagicDrive ICLR 2024 arXiv Code Website
Workshop on Foundation Models for Autonomous Systems CVPR 2024 -
AdaptiveDriver - arXiv Code
Dream to Drive - arXiv
Dream4Drive - arXiv Website
Drive-JEPA - arXiv
DriveGenVLM - arXiv
DrivePhysica - arXiv Code Website
DriveVLA-W0 - arXiv
DrivingGPT - arXiv Website
GAIA-2 - arXiv Website
WorldLens - arXiv Website
STAGE - arXiv
ReSim - arXiv Code Website
Dreamland - arXiv Website
LongDWM - arXiv Website
GeoDrive - arXiv Code
FutureSightDrive - arXiv Code
Raw2Drive - arXiv
VL-SAFE - arXiv Website
PosePilot - arXiv
DriVerse - arXiv
MiLA - arXiv Website
SimWorld - arXiv Website
UniFuture - arXiv Website
EOT-WM - arXiv
MaskGWM - arXiv
HERMES - arXiv
AdaWM - arXiv
AD-L-JEPA - arXiv
DrivingWorld - arXiv Code Website
GEM - arXiv Website
GaussianWorld - arXiv Code
Doe-1 - arXiv Website Code
InfiniCube - arXiv Website
InfinityDrive - arXiv Website
ReconDreamer - arXiv Website
Imagine-2-Drive - arXiv Website
DynamicCity - arXiv Website Code
DOME - arXiv Website
SSR - arXiv Code
LatentDriver - arXiv Code
RenderWorld - arXiv
OccLLaMA - arXiv
CarFormer - arXiv Code
BEVWorld - arXiv Code
TOKEN - arXiv
SimGen - arXiv Code
UnO - arXiv Code
LAW - arXiv Code
Delphi - arXiv Code
OccSora - arXiv Code
Vista - arXiv Code
CarDreamer - arXiv Code
DriveSim - arXiv Code
LidarDM - arXiv Code
GenAD - arXiv Website
GenAD (End-to-End) - arXiv Code
ViDAR - arXiv Code
Drive-WM - arXiv Code
Cam4DOCC - arXiv Code
Panacea - arXiv Code
OccWorld - arXiv Code
SafeDreamer - OpenReview Code
SEM2 - Paper
DrivingGen - arXiv Website
GenieDrive - arXiv Website
ImagiDrive - arXiv Code
Interplay Between Video Generation and World Models in Autonomous Driving - arXiv
MagicDrive3D - arXiv Code Website
MoVieDrive - arXiv
Think Before You Drive - arXiv
Think2Drive - arXiv
TrafficBots - arXiv Code
UniDrive-WM - arXiv Website
World Models for Autonomous Driving: An Initial Survey - arXiv
World4Drive - arXiv
CVPR 2024 Workshop & Challenge (OpenDriveLab) -
CVPR 2025 Workshop & Challenge (OpenDriveLab) -

Papers by Application

🚗 Autonomous Driving

World models for scene prediction, planning, and simulation in self-driving vehicles.

📄 See complete list above (40 papers total)


🤖 Embodied AI & Robotics

World models for manipulation, navigation, and interaction in physical environments.

⭐ DreamDojo arXiv 2026 arXiv Website
⭐ World Models as Data Engine arXiv 2025 arXiv

🤖 Embodied AI Papers

Paper Venue Resources
Hand2World: Autoregressive Egocentric Interaction Generation via Free-Space Hand Gestures arXiv 2026 arXiv Website
GaussTwin: Unified Simulation and Correction with Gaussian Splatting for Robotic Digital Twins arXiv 2026 arXiv
Learning Primitive Embodied World Models: Towards Scalable Robotic Learning - arXiv Website
Agent Learning via Early Experience - arXiv
General Agents Contain World Models - arXiv
Persistent Embodied World Models - arXiv
Self-Improving Embodied Foundation Models - arXiv
World Models for Embodied AI - arXiv
Workshop on Embodied World Models for Decision Making NeurIPS 2025 -
EmbodieDreamer - arXiv Code Website
Embodied AI Agents: Modeling the World - arXiv
PhysicalAgent - arXiv
Video Agent - arXiv Website
Web Agents with World Models - arXiv

View-Consistent 4D World Model arXiv 2026 arXiv
⭐ World Models as Reliable Simulators arXiv 2026 arXiv

🦾 Robotics Papers

Paper Venue Resources
Learning Primitive Embodied World Models: Towards Scalable Robotic Learning - arXiv Website
Multi-Task Interactive Robot Fleet Learning with Visual World Models - arXiv Code Website
Object-Centric World Model for Language-Guided Manipulation - arXiv
Robotic World Model - arXiv
Genie Envisioner - arXiv Website Code
WoW - arXiv Website Code
UnifoLM-WMA-0 - Website Code
iVideoGPT - arXiv Website Code
PointWorld - arXiv Website
Dex-WM - arXiv Website
FLARE - arXiv Website
Enerverse - arXiv Website
AgiBot-World - arXiv Website Code
DyWA - arXiv Website Code
TesserAct - arXiv Website Code
DreamGen - arXiv Website Code
HiP - arXiv Website Code
PAR - arXiv Website Code
iMoWM - arXiv Website
WristWorld - arXiv Website Code
EMMA - arXiv Website
PhysTwin - arXiv Website Code
KeyWorld - arXiv
World4RL - arXiv Website
SAMPO - arXiv
GWM - arXiv Website Code
Flow-as-Action - arXiv
RoboScape - arXiv Code
ParticleFormer - arXiv Website
ManiGaussian++ - arXiv Code
GAF - arXiv Website
3DFlowAction - arXiv Code
ORV - arXiv Code Website
WoMAP - arXiv Website
OSVI-WM - arXiv
LaDi-WM - arXiv Website Code
FlowDreamer - arXiv Website Code
PIN-WM - arXiv Website Code
RoboMaster - arXiv Website Code
ManipDreamer - arXiv Website
AdaWorld - arXiv Website Code
EVA - arXiv Website Code
DexSim2Real² - arXiv Code
LUMOS - arXiv Website Code
DEMO³ - arXiv Website Code
RoboHorizon - arXiv
Dream to Manipulate - arXiv Website
RoboDreamer - arXiv Code Website
Vidar - arXiv
ManiGaussian - arXiv Code Website
WHALE - arXiv
VisualPredicator - arXiv
PIVOT-R - arXiv Website Code
Video2Action - arXiv Website Code
Diffuser - arXiv Website Code
Decision Diffuser - arXiv Code

🚶 Navigation Papers

Paper Venue Resources
NWM (Navigation World Models) - arXiv Website
MindJourney - arXiv Website
NavMorph - arXiv Code
Unified World Models - arXiv Code
RECON - arXiv Website
WMNav - arXiv Website
NaVi-WM - arXiv Website
AIF - arXiv
X-MOBILITY - arXiv
MWM - arXiv Website Code

🦿 Locomotion Papers

Paper Venue Resources
Ego-VCP - arXiv Website Code
RWM-O - arXiv
DWL - arXiv
HRSSM - arXiv Code
WMP - arXiv Website
TrajWorld - arXiv Code
Puppeteer - arXiv Code
ProTerrain - arXiv
Occupancy World Model - arXiv
1X World Model - Blog
GROOT-Dreams - Blog
Humanoid World Models - arXiv
Ego-Agent - arXiv
D²PO - arXiv
COMBO - arXiv Website Code

🤖💬 Vision-Language-Action (VLA) Models

World models integrated with vision-language-action architectures for robotic control.

Paper Venue Resources
CoT-VLA - arXiv Website
UP-VLA - arXiv Code
VPP - arXiv Website
MinD - arXiv Website
DreamVLA - arXiv Code Website
WorldVLA - arXiv Code
3D-VLA - arXiv
LAWM - arXiv Code
UniVLA - arXiv Code
dVLA - arXiv
UD-VLA - arXiv Code Website
Goal-VLA - arXiv Website
Vidarc - arXiv
VideoVLA - arXiv Website
Motus - arXiv Website
mimic-video - arXiv Website
Ctrl-World - arXiv Website Code
VLA-RFT - arXiv
World-Env - arXiv
GigaBrain-0.5M - arXiv Website Code
RISE - arXiv Website
GigaBrain-0 - arXiv Website
WMPO - arXiv Website

🎯 Policy Learning with World Models

General policy learning methods leveraging world models for embodied AI.

Paper Venue Resources
LingBot-VA - arXiv Website Code
UWM - arXiv Website
UVA - arXiv Website Code
DiWA - arXiv Code
Dreamerv4 - arXiv Website
LVP - arXiv Website
LDA-1B - arXiv Website Code

🎮 Game Simulation & XR

World models for procedural content generation and interactive experiences.

⭐ Web World Models arXiv 2025 arXiv Website
Large-Scale World Model for Web Agent arXiv 2026 arXiv
World-Model-Augmented Web Agents arXiv 2026 arXiv
Multiplayer Video World Model in Minecraft arXiv 2026 arXiv

🎮 Game & Simulation Papers

Paper Venue Resources
Is Sora a World Simulator - arXiv Website
Matrix-Game - arXiv Code
Matrix-Game 2.0 - arXiv Code Website
AnimeGamer - arXiv Website
GameFactory - arXiv Code Website
Hunyuan-GameCraft-2 - arXiv Website
Interactive Generative Video as Next-Generation Game Engine - arXiv
Interplay Between Video Generation and World Models in Autonomous Driving - arXiv
World Models and Physical Simulation - arXiv Website
GameNGen - arXiv Website Code
DIAMOND - arXiv Website Code
MineWorld - arXiv
HunyuanWorld 1.0 - arXiv Website
Oasis - arXiv Website Code
Genie - arXiv Website
Genie 2 - Website
WorldCrafter - arXiv Website Code
Cosmos - Website

💡 Theory & Explainability

Theoretical foundations and explainability of world models.

Paper Venue Resources
Inductive Biases in Transformers arXiv 2026 arXiv
Physical Grounding in World Models arXiv 2026 arXiv

👥 Social World Models & Multi-Agent Systems

World models for social interaction, multi-agent coordination, and human behavior prediction.

Paper/Project Type Resources
FreeAskWorld Framework Code Paper
Model-Based Social Navigation Navigation Code
SOMA: Socio-physical Model of Activities Activity Model Code
Mini-Genie: Multi-Agent World Model Multi-Agent Code
Social World Model Simulation Simulation Code
MotionLM: Multi-Agent Motion Forecasting Prediction Website
Melting Pot: Multi-Agent RL Evaluation Benchmark Website

Key Topics:

  • 🤝 Human-robot interaction modeling
  • 🚶 Pedestrian behavior prediction
  • 🎭 Social navigation in crowded environments
  • 🤖 Multi-agent coordination and communication
  • 🧠 Theory of mind for AI agents

📖 For more on multi-agent systems, see Multi-Agent Reinforcement Learning


🔬 World Models for Science

World models applied to scientific domains including medicine, biology, and social sciences.

Natural Science:

Paper Venue Resources
World Models for Clinical Prediction - arXiv
CellFlux - arXiv Website
CheXWorld - arXiv Code
EchoWorld - arXiv Code
ODesign - arXiv Website
SFP - arXiv
Xray2Xray - arXiv
Medical World Model - arXiv
Surgical Vision World Model - arXiv

Social Science:

Paper Venue Resources
Social World Models - arXiv
Social World Model-Augmented Mechanism Design - arXiv
SocioVerse - arXiv Code

🎓 Learning Resources

📺 Talks & Presentations

Key talks and presentations on world models:

Title Speaker Venue Resources
A Path Towards Autonomous Machine Intelligence Yann LeCun Meta AI Video Paper
World Models for Autonomous Driving Ashok Elluswamy Tesla AI Day 2024 Video
GAIA-1: A Generative World Model for Autonomous Driving Wayve Team NVIDIA GTC 2023 Website
NVIDIA Cosmos: Physical AI with World Foundation Models NVIDIA Team GTC 2025 Website
Genie: Generative Interactive Environments DeepMind Team Tech Talk 2024 Website

📺 For complete list of talks (50+), see docs/learning/talks.md


🎓 Courses & Tutorials

Online Courses:

  • Deep Reinforcement Learning (UC Berkeley CS285) - Covers model-based RL and world models Website
  • Stanford CS330: Deep Multi-Task and Meta Learning - Includes world model architectures Website
  • MIT 6.S898: Deep Learning - Foundation models and world models Website

Tutorials:

🎓 For complete list of courses and tutorials (30+), see docs/learning/tutorials.md


🔧 Practical Resources

📊 Datasets

Autonomous Driving:

  • nuScenes - 1000 scenes with camera, LiDAR, radar Website Paper
  • Waymo Open Dataset - 1000 segments, 200k frames Website
  • KITTI - Classic autonomous driving benchmark Website
  • Argoverse 2 - 1000 scenarios, forecasting Website
  • Occ3D - 16k frames for occupancy prediction Website
  • CARLA - Open-source driving simulator Website

Robotics:

  • CALVIN - 24k manipulation trajectories Website Code
  • RoboNet - 15M robot manipulation frames Website
  • RoboCasa - 100k kitchen manipulation trajectories Website
  • Open X-Embodiment - 1M+ multi-robot trajectories Website
  • Habitat-Matterport 3D - 90 indoor scenes Website

Games:

  • Minecraft - Procedural 3D environments
  • Atari 2600 - Classic RL benchmark (57 games)
  • MineRL - 60M frames of Minecraft gameplay Website

📊 For complete list of datasets (50+), see docs/resources/datasets.md


🎯 Benchmarks & Leaderboards

Evaluation Tools:

  • WorldLens - Comprehensive evaluation framework for driving world models arXiv
  • VBench - Video generation quality metrics arXiv
  • FVD - Fréchet Video Distance for video quality
  • LPIPS - Learned Perceptual Image Patch Similarity
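FVD and LPIPS both depend on pretrained networks, so a fully self-contained snippet cannot reproduce them. As a hedged stand-in, here is the simplest per-frame quality measure used alongside them, PSNR, applied over a predicted rollout versus ground truth. This is a sanity-check sketch, not a substitute for the perceptual metrics above.

```python
import numpy as np

def psnr(pred, target, max_val=1.0):
    """Peak signal-to-noise ratio between two frames with values in [0, max_val]."""
    mse = np.mean((pred.astype(np.float64) - target.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")                      # identical frames
    return 10.0 * np.log10(max_val ** 2 / mse)

def rollout_psnr(pred_frames, gt_frames):
    """Average PSNR over a predicted rollout vs. the ground-truth rollout."""
    return float(np.mean([psnr(p, g) for p, g in zip(pred_frames, gt_frames)]))

rng = np.random.default_rng(0)
gt = [rng.random((8, 8, 3)) for _ in range(4)]                      # toy "video"
noisy = [np.clip(f + rng.normal(scale=0.05, size=f.shape), 0, 1) for f in gt]

print(rollout_psnr(gt, gt))                                          # inf
print(rollout_psnr(noisy, gt) > rollout_psnr([1 - f for f in gt], gt))  # True
```

Higher is better; mildly noisy predictions score well above badly wrong ones. Perceptual metrics like LPIPS and distribution-level metrics like FVD exist precisely because per-pixel scores such as PSNR miss temporal coherence and semantic plausibility.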

🎯 For complete benchmarks and leaderboards, see docs/resources/benchmarks.md


🛠️ Tools & Libraries

Frameworks:

  • DreamerV3 - State-of-the-art model-based RL Code
  • Stable Diffusion - Foundation for video generation models
  • PyTorch3D - 3D deep learning library

Simulation:

  • CARLA - Open-source driving simulator Website
  • Isaac Sim - NVIDIA robotics simulator
  • MuJoCo - Physics engine for robotics

🛠️ For complete list of tools, see docs/resources/tools.md


🌐 Community

🏆 Workshops & Challenges

  • CVPR 2025 Workshop on World Models Website
  • ICCV 2025 Workshop on 4D World Models - Bridging Generation and Reconstruction
  • OpenDriveLab Challenges - Annual autonomous driving competitions

🏆 For complete list of workshops, see docs/community/workshops.md


👥 Research Groups

Leading Labs:

  • NVIDIA Toronto AI Lab - Cosmos, GAIA series
  • Wayve - End-to-end driving with world models
  • Tesla AI - FSD world model development
  • UC Berkeley RAIL - Model-based RL research
  • DeepMind - Genie, DreamerV3

👥 For complete list of research groups, see docs/community/research-groups.md


🤝 Contributing

We welcome contributions! Please see CONTRIBUTING.md for guidelines.

How to contribute:

  1. Fork the repository
  2. Add your paper/resource following the format
  3. Ensure all links are valid
  4. Submit a pull request

Format for papers:

| **Paper Title** | `Venue Year` | [![arXiv](badge)](url) [![Code](badge)](url) |

📜 Citation

If you find this repository useful, please consider citing:

@misc{awesome-world-models-2026,
  title={Awesome World Models: A Comprehensive Collection},
  author={Jing, Bowen},
  year={2026},
  howpublished={\url{https://github.com/Bowen12137/Awesome-World-Models}}
}

⭐ Star History

Star History Chart


📊 Repository Statistics

Overview

  • Total Papers: 489
  • Starred Papers: 74
  • Papers with Code: 195 (40%)
  • Year Range: 2012 - 2026
  • Top Venues: CVPR (41), ICCV (16), ICLR (10)

Visualizations

Papers by Year

Top Venues

Category Distribution

Resource Availability

Last Updated: March 6, 2026


Made with ❤️ by the World Models community

⬆ Back to Top
