8 FPS Real‑Time Video on Consumer GPU MoVerse achieves 8 FPS real-time 360° walkthrough video on a single NVIDIA RTX 4090, demonstrating that interactive generative video no longer requires a multi-GPU cluster. The system uses a causal autoregressive renderer and a persistent 3D Gaussian scaffold to stream high-fidelity panoramas on consumer hardware. MoVerse delivers 360° walkthrough video at roughly 8 FPS on a single RTX 4090, proving that interactive generative video no longer requires a multi‑GPU cluster. The system expands a narrow‑field input into a full panorama, lifts it onto a persistent 3D Gaussian scaffold, and streams the result through a causal autoregressive renderer the distilled student model . All of this runs on desktop‑class hardware, not on a laptop or server farm. Before this contribution, high‑fidelity video synthesis depended on either diffusion pipelines that rendered frames offline with dozens of GPUs or explicit 3D representations that struggled to exceed 1–2 FPS on a single card. Researchers typically traded visual quality for speed, and real‑time roaming was considered out of reach for consumer GPUs. Consequently, AR/VR prototypes had to fall back to pre‑recorded assets or low‑resolution sprites. In our deployment configuration, this causal renderer reaches 8 FPS end‑to‑end roaming of the scene on a single NVIDIA RTX 4090 GPU. The authors emphasize that the same hardware can handle the full pipeline—from panorama diffusion to scaffold rendering—without bottlenecking the user‑controlled camera path 1 https://arxiv.org/abs/2606.13376 . The key to that throughput is the decoupling of world construction from observation rendering: a gravity‑aligned 360° panorama is first completed with topology‑aware diffusion, then a geometry‑aware residual predictor populates a dense Gaussian scaffold that can be rasterized in a single pass. Because the scaffold is explicit, the downstream autoregressive student only needs to refine already rendered views, keeping the per‑frame latency bounded. The approach still caps the interactive experience at 8 FPS, well short of the 30 FPS horizon typically required for fluid VR head‑mounted displays. Moreover, the method assumes a single narrow‑field input and a static scene geometry, leaving open how it would cope with dynamic objects or longer trajectories. If these numbers hold, benchmark suites for real‑time generative video should add a consumer‑GPU track that measures end‑to‑end FPS on a single RTX 4090. Downstream developers can now prototype free‑roaming AR experiences without procuring a GPU cluster, shifting the cost curve from capital‑intensive farms to a single high‑end desktop.