{"slug": "do-language-models-know-what-not-to-say-causal-evidence-for-statistical-in-llms", "title": "Do Language Models Know What Not to Say? Causal Evidence for Statistical Preemption in LLMs", "summary": "A new computational study provides causal evidence that large language models acquire knowledge of unacceptable grammatical constructions through statistical preemption, a mechanism previously theorized in Construction Grammar. Across four experiments with 120 English verb-construction pairings, researchers found that LLM surprisal patterns correlate strongly with human acceptability judgments and that manipulating competing-form frequencies directly shifts model behavior. The findings demonstrate that neural language models learn negative linguistic knowledge through distributional competition without explicit negative evidence.", "body_md": "arXiv:2605.23039v1 Announce Type: new\nAbstract: How do learners acquire knowledge of what is unacceptable without negative evidence? Construction Grammar proposes statistical preemption: exposure to a conventional form (e.g., \"donated the books to the library\") preempts structurally possible but unattested alternatives (\"*donated the library the books\"). We present a computational study that, for the first time, directly dissociates statistical preemption from the competing entrenchment hypothesis in large language models within a single converging design. Across four experiments spanning 120 English verb-construction pairings (dative, causative, locative), we show that (1) LLM surprisal patterns correlate strongly with human acceptability judgments ($r = 0.79$), validated against three independent behavioral datasets; (2) these patterns are driven by competing-form frequency rather than overall verb frequency, confirmed by non-circular partial correlations; (3) preemption sensitivity scales as a power law with model size; and (4) a controlled fine-tuning intervention causally demonstrates that manipulating competing-form frequencies shifts preemption behavior in the predicted direction, with reverse-direction controls ruling out frequency-sensitivity confounds. These results provide converging evidence that neural language models acquire negative linguistic knowledge through distributional competition, the core mechanism posited by Construction Grammar.", "url": "https://wpnews.pro/news/do-language-models-know-what-not-to-say-causal-evidence-for-statistical-in-llms", "canonical_source": "https://arxiv.org/abs/2605.23039", "published_at": "2026-05-25 04:00:00+00:00", "updated_at": "2026-05-25 15:26:35.077015+00:00", "lang": "en", "topics": ["large-language-models", "natural-language-processing", "ai-research", "neural-networks", "machine-learning"], "entities": ["Construction Grammar", "LLM"], "alternates": {"html": "https://wpnews.pro/news/do-language-models-know-what-not-to-say-causal-evidence-for-statistical-in-llms", "markdown": "https://wpnews.pro/news/do-language-models-know-what-not-to-say-causal-evidence-for-statistical-in-llms.md", "text": "https://wpnews.pro/news/do-language-models-know-what-not-to-say-causal-evidence-for-statistical-in-llms.txt", "jsonld": "https://wpnews.pro/news/do-language-models-know-what-not-to-say-causal-evidence-for-statistical-in-llms.jsonld"}}