Concepts & Patterns

World Models

An alternative AI paradigm to LLMs — models that simulate the physical world internally rather than predicting text tokens. Relevant to robotics, autonomous vehicles, video generation, and spatial reasoning. Key players: Yann LeCun/AMI, World Labs (Fei-Fei Li), Google (SIMA/Genie), NVIDIA (Cosmos).

Created Apr 9, 2026 · Updated Apr 20, 2026

Overview

World models approach intelligence differently from LLMs. Instead of training on text tokens (the "highest abstraction that describes the physical world"), world models are trained to simulate physical environments internally — learning cause-and-effect, spatial relationships, and physics.

The original architecture (David Ha & Jürgen Schmidhuber, 2018) uses three components:

  1. Vision model (V) — A variational autoencoder that observes the environment and compresses each visual frame into a low-dimensional latent vector
  2. Memory model (M) — An MDN-RNN that processes the compressed observations and maintains a hidden state used to predict the next latent state
  3. Controller (C) — A small linear layer that maps the current latent vector and hidden state to an action (a minimal sketch of all three components follows)
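
Below is a minimal, hedged PyTorch sketch of the three components. The sizes (64x64 RGB frames, 32-dim latent, 256-unit hidden state, 3-dim continuous actions) are illustrative assumptions roughly in line with the paper's CarRacing setup; the VAE decoder and the MDN-RNN's mixture-density head are simplified for brevity.

```python
# Hedged sketch of the Ha & Schmidhuber (2018) V-M-C world model.
# Dimensions are illustrative assumptions, not the paper's exact configuration.
import torch
import torch.nn as nn

LATENT = 32      # z: compressed observation
HIDDEN = 256     # h: RNN hidden state
ACTIONS = 3      # assumed continuous action dimension

class VisionVAE(nn.Module):
    """V: encode a 64x64 RGB frame into a latent vector z (decoder omitted for brevity)."""
    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 128, 4, stride=2), nn.ReLU(),
            nn.Flatten(),
        )
        self.mu = nn.Linear(128 * 6 * 6, LATENT)
        self.logvar = nn.Linear(128 * 6 * 6, LATENT)

    def forward(self, frame):                      # frame: (B, 3, 64, 64)
        feats = self.enc(frame)
        mu, logvar = self.mu(feats), self.logvar(feats)
        return mu + torch.randn_like(mu) * (0.5 * logvar).exp()   # reparameterized z

class MemoryRNN(nn.Module):
    """M: predict the next latent from (z, action). The paper's MDN-RNN outputs a
    mixture of Gaussians; a single-Gaussian head is shown here as a simplification."""
    def __init__(self):
        super().__init__()
        self.rnn = nn.LSTM(LATENT + ACTIONS, HIDDEN, batch_first=True)
        self.next_z = nn.Linear(HIDDEN, LATENT)

    def forward(self, z, action, state=None):      # z: (B, T, LATENT), action: (B, T, ACTIONS)
        out, state = self.rnn(torch.cat([z, action], dim=-1), state)
        return self.next_z(out), state

class Controller(nn.Module):
    """C: map (z, h) to an action; in the paper this tiny layer is trained with CMA-ES."""
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(LATENT + HIDDEN, ACTIONS)

    def forward(self, z, h):
        return torch.tanh(self.fc(torch.cat([z, h], dim=-1)))
```

In the paper, V and M are trained offline on recorded rollouts, while the tiny controller is optimized separately with an evolutionary strategy (CMA-ES) — and can even be trained entirely inside the model's own simulated "dream" environment.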

The key claim: this modeling approach mirrors how humans develop understanding of the world, and may therefore get closer to AGI than pure language models.

The LLM vs. World Models Debate

Yann LeCun (who left Meta to start AMI, seeking a ~$5B valuation) argues LLMs "don't understand the physical world beyond auto-regressive token prediction." But this view has nuances:

  • Language contains more information about the physical world than the pure critique suggests — grammar, figures of speech, and word categories all encode physical world facts
  • Multimodal LLMs (GPT-4, Gemini) blur the boundary by adding vision capabilities via cross-attention
  • Vision-Language-Action (VLA) models combine vision transformers with LLMs to create action tokens — powering systems like the Neo humanoid robot (see the action-token sketch below)
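
As a rough illustration of what "action tokens" means, here is a hedged Python/NumPy sketch that discretizes a continuous action vector into vocabulary ids and back. The bin count, normalized action range, and 7-DoF example are illustrative assumptions, not details of Neo or any particular VLA model.

```python
# Hedged sketch of action tokenization as commonly used in VLA-style models:
# continuous commands are binned into discrete ids appended to the LLM vocabulary.
# All constants below are illustrative assumptions.
import numpy as np

N_BINS = 256                      # each action dimension becomes one of 256 tokens
LOW, HIGH = -1.0, 1.0             # assumed normalized action range

def actions_to_tokens(action: np.ndarray) -> np.ndarray:
    """Map a continuous action vector to discrete token ids in [0, N_BINS)."""
    clipped = np.clip(action, LOW, HIGH)
    return np.floor((clipped - LOW) / (HIGH - LOW) * (N_BINS - 1)).astype(int)

def tokens_to_actions(tokens: np.ndarray) -> np.ndarray:
    """Invert the binning: token ids back to (approximate) continuous actions."""
    return LOW + (tokens + 0.5) / N_BINS * (HIGH - LOW)

# A hypothetical 7-DoF arm command round-trips through the token space:
cmd = np.array([0.1, -0.4, 0.8, 0.0, 0.25, -0.9, 0.5])
print(tokens_to_actions(actions_to_tokens(cmd)))
```

The point of the discretization is that robot actions can then be predicted with the same next-token machinery the LLM already uses for text.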

Andrej Karpathy offers a middle view: LLMs can "regurgitate word-for-word" Wikipedia pages but struggle with abstract concept learning compared to children. Humans' poor memorization is "actually a feature" — it forces pattern-finding at a more general level.

LeCun's Full Position (Wired Interview)

In a detailed Wired interview (Steven Levy), LeCun laid out his comprehensive stance on AI's trajectory and the open-source imperative:

On current AI's limits: "Machine learning is great. But the idea that we're going to just scale up the techniques that we have and get to human-level AI? No. We're missing something big to get machines to learn efficiently, like humans and animals do." LLMs produce fluent text with good style, "but they're boring, and what they come up with can be completely false."

On AGI: LeCun rejects the term entirely — "there is no such thing as general intelligence. Intelligence is not a linear thing you can measure." Machines will eventually be smarter than humans ("could be years, could be centuries"), but there's no reason to believe intelligent systems will want to dominate us. "People are mistaken when they imagine that AI systems will have the same motivations as humans."

Objective-driven AI: LeCun's current research direction. Build machines with explicit objectives and guardrails as architectural constraints, not post-hoc patches. "Putting drives into AI systems is the only way to make them controllable and safe." No working demonstration exists yet.

The open-source imperative: When all interactions with the digital world are mediated by AI, "you do not want that AI system to be controlled by a small number of companies on the West Coast of the US." European governments bought this argument — France, Germany, and Italy blocked EU attempts to regulate open-source AI. LeCun personally influenced the French government's position. "Open platforms progress faster. The systems you end up with are more secure and perform better."

On OpenAI: "They imagined creating a nonprofit as a counterweight to bad guys like Google and Meta. I said that's wrong. They are no longer open. Meta has always been open and still is." The research world "doesn't care too much about OpenAI anymore, because they're not publishing."

On AI doom: LeCun stands firmly against the doomer camp, arguing fear-mongering risks scaring people away from beneficial technology — the same mistake made with the printing press. The way to stay ahead of bad actors: "progress faster. The way to progress faster is to open the research."

On art and soul: AI will produce technically superior music, but lacks "the essence of improvised music, which relies on communication of mood and emotion from a human." Truthfulness is essential to the artistic experience — AI-generated art at Charlie Parker's level would be "like Milli Vanilli."

See also: AGI Definitions for where LeCun's views fit in the broader AGI debate.

Open vs. Closed Models: The Mid-2026 Landscape

Nathan Lambert (Apr 2026) published 13 beliefs on the open vs. closed model dynamic, extending the debate LeCun framed around open-source AI:

The surprise: Closed models did not show a growing capability margin over open models through 2025-2026. Many expected the gap to widen as scaling laws kicked in. Instead, open models kept pace on benchmarks — though with important caveats.

Key beliefs:

  1. Closed models are more robust and generally useful than benchmarks suggest — they handle edge cases, long-context, and multi-step reasoning more reliably
  2. Economics will matter more than capabilities long-term — whoever can serve tokens cheapest wins the commodity layer
  3. Chinese labs (DeepSeek, Alibaba's Qwen team) may face funding difficulties as the geopolitical situation tightens and domestic capital reallocates
  4. The RL (reinforcement learning) era favors closed labs — RL training requires massive compute and proprietary reward signals that open efforts struggle to replicate
  5. Local agents running on consumer hardware are a "dark matter market" for open models — usage is real but doesn't show up in API revenue or benchmarks
  6. Open model progress is increasingly driven by distillation from closed models, which creates a dependency that could become a vulnerability
  7. The best open models in 2026 are fine-tunes of base models from 2-3 labs (Meta, Mistral, sometimes Alibaba) — the open ecosystem is less diverse than it appears

Connection to LeCun's position: LeCun argues open-source AI is essential for democratic control. Lambert's data suggests open models are viable but increasingly dependent on closed-model innovations trickling down — a tension that may test the open-source imperative as the RL era deepens.

Key Players

  • AMI (Yann LeCun) — Advanced Machine Intelligence, JEPA-based world models
  • World Labs (Fei-Fei Li) — Spatial intelligence startup ($230M+ raised). Product: Marble (Gaussian splat environments)
  • Google DeepMind — SIMA (Mar 2024), SIMA 2 (Nov 2025), Genie 3 (hyperrealistic interactive worlds)
  • NVIDIA Cosmos — Open-source world foundation model for data augmentation and downstream training (autonomous vehicles, robots, video agents)

Relevance

Y Combinator's 2026 RFS explicitly calls out spatial reasoning models as a major opportunity — "unlocking the next wave of AI capability" and potentially defining the next foundation model company on the scale of OpenAI or Anthropic.

See Also

  • Physical AI — the five shared technical primitives (perception, planning, control, simulation, hardware) converging across robotics, autonomous science, and new interfaces

Sources

  • "World Models explained in 10min" — Caleb Writes Code (video, Apr 2026) (link)
  • "How Not to Be Stupid About AI, With Yann LeCun" — Steven Levy / Wired (link)
  • "My bets on open models, mid-2026" — Nathan Lambert (Apr 2026) (link)