{"slug": "neural-voxel-dynamics-learning-implicit-3d-physics-via-volumetric-feature", "title": "Neural Voxel Dynamics: Learning Implicit 3D Physics via Volumetric Feature Advection", "summary": "Researchers introduced Neural Voxel Dynamics, a self-supervised framework that learns implicit 3D physics from video by lifting 2D features into a volumetric latent space. The method achieves long-term structural stability and physical plausibility on benchmarks without relying on explicit simulators, offering a scalable path toward general-purpose dynamic world models.", "body_md": "arXiv:2606.26410v1\nAbstract: We present a self-supervised framework for learning implicit 3D physical dynamics directly from video-derived supervisory signals. While current generative video models achieve high visual fidelity, they lack a 3D geometric foundation, which often results in physical inconsistencies and a failure to maintain object permanence. We address this by shifting the predictive bottleneck from 2D image space to a 'lifted' 3D Volumetric Latent Space. Our method unprojects semantic features from a Video Joint-Embedding Predictive Architecture (V-JEPA) into a voxelized grid, grounded by monocular depth priors. This lifting enables Volumetric Feature Advection: learning an action-conditioned transition operator that treats physics as a spatio-temporal state advection problem, i.e., learning implicit 3D physics. Unlike state-of-the-art hybrid models that rely on explicit classical simulators for training and/or inference, our architecture tracks material states implicitly within high-dimensional V-JEPA features. This allows for the emergent simulation of heterogeneous phenomena (e.g., rigid body motion in fluid flow) within a single, unified pipeline. 
Supervised solely by end-to-end video-derived signals and action conditions, without access to physics-engine internal states, labels, or surrogate models, our model demonstrates good long-term structural stability and physical plausibility on multiple benchmarks (CLEVERER, PhysInOne, PhysGaia). We believe that this work opens a scalable pathway toward general-purpose dynamic world models that internalize the 3D invariants of the physical world solely through passive observation of monocular videos.", "url": "https://wpnews.pro/news/neural-voxel-dynamics-learning-implicit-3d-physics-via-volumetric-feature", "canonical_source": "https://arxiv.org/abs/2606.26410", "published_at": "2026-06-26 04:00:00+00:00", "updated_at": "2026-06-26 04:09:31.431928+00:00", "lang": "en", "topics": ["machine-learning", "computer-vision", "neural-networks", "ai-research"], "entities": ["Neural Voxel Dynamics", "V-JEPA", "CLEVERER", "PhysInOne", "PhysGaia"], "alternates": {"html": "https://wpnews.pro/news/neural-voxel-dynamics-learning-implicit-3d-physics-via-volumetric-feature", "markdown": "https://wpnews.pro/news/neural-voxel-dynamics-learning-implicit-3d-physics-via-volumetric-feature.md", "text": "https://wpnews.pro/news/neural-voxel-dynamics-learning-implicit-3d-physics-via-volumetric-feature.txt", "jsonld": "https://wpnews.pro/news/neural-voxel-dynamics-learning-implicit-3d-physics-via-volumetric-feature.jsonld"}}