# DVD-JEPA – a JEPA world model that dreams a bouncing DVD logo

> Source: <https://dvd-jepa.vercel.app>
> Published: 2026-06-13 12:50:06+00:00


A small but real Joint-Embedding Predictive Architecture: a context encoder, an
EMA target encoder, and a predictor that imagines the future in representation
space. It learned the physics of a bouncing DVD logo without ever being told a
coordinate. The decoder is optional — a pure JEPA only speaks in vectors. Everything below
is the trained model running client-side; no server, no GPU.

[Interactive demo. Panels: Reality (ground truth); JEPA's expectation (decoded); Predictive surprise (reality vs. expectation), with an anomaly alert; The model's mind: 32-d latent z.]

Tip: turn the Decoder off to see what a pure JEPA actually gives you:
just the 32 latent bars. It understands the bounce perfectly and refuses to draw it. Turn it
back on to render the dream. Hit "Inject anomaly" to teleport the logo and watch
the surprise meter spike.

01 / predict

Future in latent space

The predictor steps one tick forward as a vector, not a picture. It is trained to match an EMA
target encoder's embedding of the real next frame, which is the core JEPA objective.
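
To make that objective concrete, here is a minimal sketch in plain Python. It is not the demo's training code. The mean-squared-error loss, the `decay` values, the toy 4-d latent, and the names `ema_update` and `latent_loss` are all assumptions chosen for illustration; the page only states that the predictor is trained to match an EMA target encoder's embedding.

```python
from typing import List

Vector = List[float]


def ema_update(target: Vector, context: Vector, decay: float = 0.99) -> Vector:
    """Move target-encoder weights a small step toward the context-encoder weights.

    The target encoder is never trained directly; it only trails the context
    encoder as an exponential moving average (decay value is an assumption).
    """
    return [decay * t + (1.0 - decay) * c for t, c in zip(target, context)]


def latent_loss(predicted: Vector, target_embedding: Vector) -> float:
    """Mean squared error between the predicted latent and the target latent.

    MSE is an assumed choice of distance, not one confirmed by the page.
    """
    if len(predicted) != len(target_embedding):
        raise ValueError("latents must have the same dimension")
    squared = [(p - t) ** 2 for p, t in zip(predicted, target_embedding)]
    return sum(squared) / len(squared)


# Toy numbers only: a 4-d latent instead of the demo's 32-d one.
predicted_next = [0.5, -0.25, 0.0, 1.0]   # predictor output for tick t+1
target_next = [0.5, 0.25, 0.0, 0.0]       # target encoder's embedding of the real frame t+1
loss = latent_loss(predicted_next, target_next)

# After an optimizer step on the context encoder, the target weights follow it.
target_weights = ema_update([0.0, 1.0], [1.0, 1.0], decay=0.75)
```

The point of the sketch is that no pixel ever enters the loss: both arguments to `latent_loss` are vectors in representation space.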

02 / render

The optional decoder

A pure JEPA has no decoder. Bolt one on and the latent dream becomes pixels — turning the
model into a future-frame video predictor you can actually watch.
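
A rough sketch of how a decoder turns latent rollouts into watchable frames is shown here. The `rollout` loop is the only part meant to carry over; `toy_predictor` and `toy_decoder` are hand-written stand-ins (a 1-D bounce on a 5-pixel row), not the learned networks from the demo, and all names are hypothetical.

```python
from typing import Callable, List

Vector = List[float]
Frame = List[int]

WIDTH = 5  # toy 1-D "screen" width, an assumption for illustration


def rollout(z: Vector,
            predictor: Callable[[Vector], Vector],
            decoder: Callable[[Vector], Frame],
            steps: int) -> List[Frame]:
    """Step the latent forward `steps` times, decoding each predicted latent."""
    frames = []
    for _ in range(steps):
        z = predictor(z)           # imagine the next tick in latent space
        frames.append(decoder(z))  # optional: render the latent as pixels
    return frames


def toy_predictor(z: Vector) -> Vector:
    """Hand-written stand-in: latent is [position, velocity], reflect at the walls."""
    x, v = z
    x_next = x + v
    if x_next < 0 or x_next > WIDTH - 1:
        v = -v
        x_next = x + v
    return [x_next, v]


def toy_decoder(z: Vector) -> Frame:
    """Hand-written stand-in: light up the pixel nearest to the position."""
    row = [0] * WIDTH
    row[int(round(z[0]))] = 1
    return row


frames = rollout([3.0, 1.0], toy_predictor, toy_decoder, steps=3)
```

Dropping the `decoder` call leaves the prediction loop intact, which is the sense in which the decoder is optional.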

03 / detect

Surprise = anomaly

When reality stops matching the rendered expectation, prediction error spikes. That's a
usable anomaly signal — the same job a real egocentric-video world model does.
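
As a hedged illustration of that signal, the sketch below scores surprise as the mean squared error between the rendered expectation and the observed frame, then flags an anomaly above a fixed threshold. The error metric, the `threshold` of 0.25, the 4-pixel frames, and the function names are assumptions; the page does not say how the demo computes or thresholds its surprise meter.

```python
from typing import List


def surprise(expected: List[float], observed: List[float]) -> float:
    """Mean squared error between the rendered expectation and the real frame."""
    if len(expected) != len(observed):
        raise ValueError("frames must have the same number of pixels")
    return sum((e - o) ** 2 for e, o in zip(expected, observed)) / len(expected)


def is_anomaly(score: float, threshold: float = 0.25) -> bool:
    """Flag an anomaly when surprise exceeds a hand-picked threshold (assumed value)."""
    return score > threshold


expected = [0.0, 0.0, 1.0, 0.0]    # where the model expects the logo
normal = [0.0, 0.0, 1.0, 0.0]      # reality agrees with the expectation
teleported = [1.0, 0.0, 0.0, 0.0]  # logo jumped somewhere the physics cannot explain

normal_score = surprise(expected, normal)
teleport_score = surprise(expected, teleported)
```

In this toy case the teleported frame scores well above the matching one, mirroring the spike on the surprise meter after an injected anomaly.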
