Developmental approach reveals the statistical learning of Neural Language Models: Transformers generalize from the most abstract statistical patterns

Researchers used a developmental approach to study how neural language models learn statistical patterns. They found that Transformers acquire abstract, global statistics first and local dependencies later, and that the over-generalizations made early in training are gradually constrained. The study, posted on arXiv, proposes a new framework for understanding statistical learning and language cognition in these models.

arXiv:2606.27460v1 Abstract: In this study, we use a developmental approach to investigate the statistical learning and mental representation of neural language models (NLMs). A series of generative Transformer models is trained on a synthetic grammar, and the model states are saved at multiple stages over the course of training. By analyzing how the internal representations of these models change along the developmental path, we find that NLMs acquire the most abstract, global statistical knowledge at the beginning of learning and only later acquire relatively local statistical dependencies. This learning path contains many over-generalizations from the very beginning, and these over-generalizations are gradually constrained in the later stages of learning. Based on this observation, we propose a new framework to explain the statistical learning and language cognition of NLMs.
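The abstract does not specify the synthetic grammar or how "global" and "local" statistics are measured. Purely as an intuition aid, the Python sketch below contrasts a context-free word-frequency (unigram) predictor with a previous-word (bigram) predictor on a hypothetical three-slot toy grammar. It then mixes the two with a hypothetical weight `lam` as a caricature of a learner shifting from global to local statistics. The grammar, every function name, and `lam` are illustrative assumptions, not the paper's method and not a model of Transformer internals.

```python
import random
from collections import Counter, defaultdict

# Hypothetical toy grammar, NOT the synthetic grammar used in the paper:
# every sentence is Determiner Noun Verb, and each slot allows only its own words.
GRAMMAR = [
    ["the", "a"],        # slot 0: determiners
    ["cat", "dog"],      # slot 1: nouns
    ["runs", "sleeps"],  # slot 2: verbs
]


def sample_corpus(n_sentences, seed=0):
    """Sample sentences by picking one word uniformly from each slot."""
    rng = random.Random(seed)
    return [[rng.choice(slot) for slot in GRAMMAR] for _ in range(n_sentences)]


def unigram(sentences):
    """Global statistic: word frequencies, ignoring all context."""
    counts = Counter(w for s in sentences for w in s)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def bigram(sentences):
    """Local statistic: P(next word | previous word)."""
    counts = defaultdict(Counter)
    for s in sentences:
        for prev, nxt in zip(s, s[1:]):
            counts[prev][nxt] += 1
    return {
        prev: {w: c / sum(ctr.values()) for w, c in ctr.items()}
        for prev, ctr in counts.items()
    }


def staged_prediction(uni, bi, prev, lam):
    """Hypothetical 'developmental stage': mix global and local statistics.

    lam = 0 uses only the global unigram; lam = 1 uses only the local bigram.
    """
    local = bi.get(prev, {})
    vocab = set(uni) | set(local)
    return {w: (1 - lam) * uni.get(w, 0.0) + lam * local.get(w, 0.0) for w in vocab}


def overgeneralization(pred, allowed):
    """Probability mass a prediction puts on words the grammar forbids here."""
    return sum(p for w, p in pred.items() if w not in allowed)


if __name__ == "__main__":
    sentences = sample_corpus(1000)
    uni, bi = unigram(sentences), bigram(sentences)
    nouns = set(GRAMMAR[1])  # after "the", only nouns are grammatical
    for lam in (0.0, 0.5, 1.0):
        pred = staged_prediction(uni, bi, "the", lam)
        print(f"lam={lam:.1f}  forbidden mass={overgeneralization(pred, nouns):.3f}")
```

Running it prints the probability mass placed on ungrammatical continuations after "the": 0.667 when only global frequencies are used, 0.333 at the halfway mix, and 0.000 once the local dependency dominates. This is only a toy analogue of over-generalizations being gradually constrained; the paper's own analysis works on the models' internal representations.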