
World Models Can Render Anything. But Can They Think?
Introducing WM Bench: A Benchmark for Cognitive Intelligence in World Models

FINAL Bench Family · March 2026

The field of world models has made remarkable progress. From NVIDIA Cosmos to Meta V-JEPA 2, from DeepMind Genie 3 to Physical Intelligence π0, the pace of development is extraordinary. Yet a question remains largely unanswered: How do we measure whether a world model actually understands what is happening, rather than just rendering it convincingly?

FID tells us a model's output looks realistic. FVD tells us its videos flow naturally. HumanML3D and BABEL tell us its motions are human-like. None of them tell us whether the model thinks.

The Gap We're Trying to Address

Consider a simple scenario: a charging beast, 3 meters away, closing fast. A world model with excellent FID scores can generate that scene beautifully. But does it know the character should sprint away, not walk? Does it respond differently when the threat is a human rather than an animal? Does it remember that the left c
Continue reading on Dev.to Webdev


