04:00
2026-06-25
arxiv.org
machine-learning
MJEPA: A Simple and Scalable Joint-Embedding Predictive Architecture for Audio-Visual Learning
Researchers introduced MJEPA, a joint-embedding predictive architecture for audio-visual learning that uses a single unified encoder and a single predictive objective. The model outperforms prior froz…