DVD-JEPA — a world model that dreams a bouncing logo
A small but real Joint-Embedding Predictive Architecture: a context encoder, an EMA target encoder, and a predictor that imagines the future in representation space. It learned the physics of a bouncing DVD logo without ever being told a coordinate. The decoder is optional — a pure JEPA only speaks in vectors. Everything below is the trained model running client-side; no server, no GPU.
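As a rough illustration of what "EMA target encoder" means, the sketch below keeps the target weights as an exponential moving average of the context encoder's weights. The function name `ema_update`, the momentum `tau = 0.99`, and the three-element weight lists are assumptions made for this example; the page does not state the values the demo actually uses.

```python
def ema_update(target, context, tau=0.99):
    """Move each target weight a small step toward the matching context weight."""
    return [tau * t + (1.0 - tau) * c for t, c in zip(target, context)]

# Hypothetical flattened weights; a real encoder has many more parameters.
context_weights = [1.0, -2.0, 0.5]
target_weights = [0.0, 0.0, 0.0]

# The target encoder is never trained by gradient descent in this scheme;
# it only trails the context encoder, one small step per training iteration.
for _ in range(500):
    target_weights = ema_update(target_weights, context_weights)
```

Because the target moves slowly, it gives the predictor a stable embedding to aim at while the context encoder is still changing.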
Reality (ground truth)
JEPA's expectation (decoded)
Predictive surprise (reality vs. expectation)
The model's mind — 32-d latent z
Tip: turn the Decoder off to see what a pure JEPA actually gives you — just the 32 latent bars. It understands the bounce perfectly and refuses to draw it. Turn it back on to render the dream. Hit Inject anomaly to teleport the logo and watch the surprise meter spike.
01 / predict
Future in latent space
The predictor steps one tick forward as a vector, not a picture. It is trained to match the EMA target encoder's embedding of the real next frame, which is the core JEPA objective.
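The objective can be sketched in a few lines. This is a minimal illustration, not the demo's code: the linear `predict`, the 2-d latent, and the rotation matrix `W` are stand-ins chosen so the arithmetic is easy to follow (the demo's latent is 32-d and its predictor is a trained network), and mean squared error is assumed as the distance.

```python
def predict(z, W):
    """Hypothetical linear predictor: advance the latent by one tick."""
    return [sum(w * x for w, x in zip(row, z)) for row in W]

def jepa_loss(z_pred, z_target):
    """Mean squared error in latent space.

    z_target comes from the EMA target encoder and is treated as a
    constant: no gradient flows back through it during training.
    """
    return sum((p - t) ** 2 for p, t in zip(z_pred, z_target)) / len(z_pred)

W = [[0.0, -1.0],
     [1.0, 0.0]]            # toy dynamics: a quarter turn per tick
z_now = [1.0, 0.0]          # context encoder's embedding of the current frame
z_next = [0.0, 1.0]         # target encoder's embedding of the real next frame

loss = jepa_loss(predict(z_now, W), z_next)
```

No pixels appear anywhere in this loss, which is the point: the model is only ever graded on how well it predicts the next embedding.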
02 / render
The optional decoder
A pure JEPA has no decoder. Bolt one on and the latent dream becomes pixels — turning the model into a future-frame video predictor you can actually watch.
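A small sketch of why the decoder is optional: the rollout runs entirely in latent space, and decoding is a separate step applied only when you want something to look at. Everything here is a toy assumption for illustration, including the 4-d latent, the slot-rotating `predict`, and the reshape-style `decode`; a trained decoder would be a learned network.

```python
def predict(z):
    """Toy predictor: rotate the latent by one slot (stand-in for a trained network)."""
    return z[-1:] + z[:-1]

def decode(z, width=2):
    """Toy decoder: clamp values to [0, 1] and reshape into rows of pixels."""
    px = [min(1.0, max(0.0, v)) for v in z]
    return [px[i:i + width] for i in range(0, len(px), width)]

def dream(z, steps, decoder=None):
    """Roll the predictor forward; decode each step only if a decoder is supplied."""
    out = []
    for _ in range(steps):
        z = predict(z)
        out.append(decoder(z) if decoder else z)
    return out

z0 = [1.0, 0.0, 0.0, 0.0]
latents = dream(z0, 2)                  # decoder off: vectors only
frames = dream(z0, 2, decoder=decode)   # decoder on: 2x2 "frames"
```

The predicted latents are identical in both calls; the decoder only changes how they are presented.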
03 / detect
Surprise = anomaly
When reality stops matching the rendered expectation, prediction error spikes. That spike is a usable anomaly signal, the same role a world model trained on real egocentric video would play.
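One simple way to turn prediction error into an alert is to compare each step's surprise against a threshold derived from the steps before it. The sketch below assumes mean squared error as the surprise score and a "mean plus k standard deviations" rule; the names `surprise` and `flag_anomalies`, the parameters `warmup` and `k`, and the sample scores are all illustrative, not taken from the demo.

```python
from statistics import mean, pstdev

def surprise(expected, observed):
    """Mean squared error between what the model expected and what it saw."""
    return sum((e - o) ** 2 for e, o in zip(expected, observed)) / len(expected)

def flag_anomalies(scores, warmup=5, k=4.0, floor=1e-6):
    """Flag steps whose surprise exceeds mean + k * std of all earlier steps."""
    flags = []
    for i, s in enumerate(scores):
        if i < warmup:
            flags.append(False)   # not enough history to set a baseline yet
            continue
        history = scores[:i]
        threshold = mean(history) + k * max(pstdev(history), floor)
        flags.append(s > threshold)
    return flags

# Made-up surprise trace: steady bouncing, then one teleport at step 6.
scores = [0.010, 0.012, 0.009, 0.011, 0.010, 0.011, 0.450, 0.012]
flags = flag_anomalies(scores)
```

Only the teleport step is flagged here. Note that the spike itself enters the history afterwards and raises the threshold, so a production detector would likely use a sliding window or exclude flagged steps from the baseline.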