The Future of Facts: Tracing the Factual Generation-Verification Gap

A new arXiv preprint reports that language models consistently learn to verify factual knowledge before they can generate it, creating a "generation-verification gap" that the authors trace through three training phases: acquisition, continual learning, and updating. The researchers found that verification is more robust to continual learning than generation, and that factual updates can leave models in a "multi-verse" state in which they simultaneously verify both old and new answers as correct. Natural experiments on frontier models reproduce these dynamics at scale, pointing to an asymmetry in how AI systems handle factual knowledge.

arXiv:2605.27564v1. Abstract: Language models are becoming the default interface to factual knowledge, yet they often verify outputs more reliably than they generate them. This generation-verification gap (GV-gap) underlies many recent advances in self-improvement and reasoning, but its dynamics on factual knowledge specifically remain poorly understood. We focus on the training mechanisms underlying factual GV-gaps, distinguishing them from their computational and aesthetic counterparts. We trace generation and verification capabilities through three training phases (acquisition, continual learning, and updating) across four open-source model families at two scales each. Three findings recur across models: (i) verification is consistently learned before generation; (ii) verification is more robust to continual learning than generation; and (iii) factual updates can leave models in a "multi-verse" state, simultaneously verifying both old and new answers as correct. Natural experiments on frontier models reproduce these dynamics at scale and reveal residual verification biases on well-covered facts.
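
To make the two quantities in the abstract concrete, here is a minimal Python sketch of how a GV-gap and a "multi-verse" rate could be computed from per-fact probe results. The abstract does not give the paper's exact metrics, so everything here is an assumption made for illustration: the `FactRecord` schema, the definition of the gap as verification accuracy minus generation accuracy, and the simplification that "verification" means accepting the current answer as correct (a fuller evaluation would presumably also score the rejection of wrong answers).

```python
from dataclasses import dataclass


@dataclass
class FactRecord:
    """Probe results for one fact (hypothetical schema, not from the paper)."""
    generated_correctly: bool   # model produced the current answer when asked
    verified_new: bool          # model accepts the current (new) answer as correct
    verified_old: bool = False  # model accepts a superseded (old) answer as correct


def gv_gap(records):
    """Assumed definition: verification accuracy minus generation accuracy."""
    if not records:
        raise ValueError("need at least one record")
    n = len(records)
    gen_acc = sum(r.generated_correctly for r in records) / n
    ver_acc = sum(r.verified_new for r in records) / n
    return ver_acc - gen_acc


def multiverse_rate(records):
    """Fraction of facts whose old AND new answers are both verified as correct."""
    if not records:
        raise ValueError("need at least one record")
    both = sum(r.verified_new and r.verified_old for r in records)
    return both / len(records)


# Toy data, invented purely to exercise the functions.
records = [
    FactRecord(generated_correctly=True, verified_new=True, verified_old=False),
    FactRecord(generated_correctly=False, verified_new=True, verified_old=True),
    FactRecord(generated_correctly=False, verified_new=True, verified_old=False),
    FactRecord(generated_correctly=False, verified_new=False, verified_old=False),
]

print(gv_gap(records))           # 0.5  (0.75 verification - 0.25 generation)
print(multiverse_rate(records))  # 0.25 (one of four facts verifies both answers)
```

Under this reading, a positive gap means the model recognizes correct facts more often than it produces them, and a nonzero multi-verse rate after an update flags facts where the old answer has not been fully displaced.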