{"slug": "predict-and-reconstruct-joint-objectives-for-self-supervised-language-learning", "title": "Predict and Reconstruct: Joint Objectives for Self-Supervised Language Representation Learning", "summary": "An arXiv preprint proposes a hybrid pre-training objective for text encoders that combines a Joint Embedding Predictive Architecture (JEPA) latent-space prediction loss with masked language modelling (MLM). Pre-trained on English Wikipedia, the hybrid encoder produces more uniform embeddings and richer spectral geometry than a pure-MLM baseline, encoding less surface-level lexical information while maintaining similar linear-probe accuracy on five GLUE tasks. The findings suggest that the JEPA objective reshapes the latent space in ways that standard accuracy metrics alone cannot capture.", "body_md": "arXiv:2606.05173v1 Announce Type: new\nAbstract: Masked language modelling (MLM) has been the dominant pre-training objective for text encoders since BERT, yet it encourages representations that are strongly anchored to surface-form token identity rather than deeper semantic structure. Inspired by the success of Joint Embedding Predictive Architectures (JEPA) (LeCun, 2022) in vision and audio, we propose a hybrid pre-training objective that combines a JEPA-style latent-space prediction loss with a standard MLM objective over a single shared encoder. A learnable scalar parameter continuously balances the two objectives during training. We pre-train both a hybrid model and a pure-MLM baseline on English Wikipedia using identical architectures and compute budgets (NVIDIA H100). 
Extensive representation analysis across five GLUE benchmarks (SST-2, MRPC, MNLI, CoLA, STS-B) using four pooling strategies reveals that the hybrid encoder produces significantly more uniform embeddings (uniformity less than -0.16 vs -0.05 for MLM), exhibits richer spectral geometry under max pooling, encodes less surface-level lexical information, and achieves a better semantic-to-lexical balance. Despite similar linear-probe downstream accuracy, the geometric differences are consistent and significant, suggesting that the JEPA predictive objective reshapes the latent space in ways that standard accuracy metrics alone cannot capture.", "url": "https://wpnews.pro/news/predict-and-reconstruct-joint-objectives-for-self-supervised-language-learning", "canonical_source": "https://arxiv.org/abs/2606.05173", "published_at": "2026-06-05 04:00:00+00:00", "updated_at": "2026-06-05 04:19:45.117585+00:00", "lang": "en", "topics": ["machine-learning", "natural-language-processing", "large-language-models", "neural-networks", "ai-research"], "entities": ["BERT", "JEPA", "LeCun", "NVIDIA H100", "GLUE", "SST-2", "MRPC", "MNLI"], "alternates": {"html": "https://wpnews.pro/news/predict-and-reconstruct-joint-objectives-for-self-supervised-language-learning", "markdown": "https://wpnews.pro/news/predict-and-reconstruct-joint-objectives-for-self-supervised-language-learning.md", "text": "https://wpnews.pro/news/predict-and-reconstruct-joint-objectives-for-self-supervised-language-learning.txt", "jsonld": "https://wpnews.pro/news/predict-and-reconstruct-joint-objectives-for-self-supervised-language-learning.jsonld"}}