
Quantifying Prior Dominance in RAG Systems

Researchers introduced the Normalized Context Utilization (NCU) metric, which uses token log-probabilities to quantify how much information a model actually draws from retrieved context in Retrieval-Augmented Generation (RAG) systems. They found that small language models (SLMs) match or outperform larger models on strict factual extraction without Chain-of-Thought reasoning. A proprietary commercial API overrode explicit external evidence in nearly half of adversarial conflicts and frequently exhibited confidence collapse when its parametric priors were contradicted.

Published Jun 24, 2026

arXiv:2606.23695v1

Abstract: Retrieval-Augmented Generation (RAG) grounds Large Language Models in external knowledge, yet current evaluations rely on discrete heuristics that suffer from "epistemic blindness", failing to distinguish genuine contextual information extraction from parametric memory recall. To address this, we introduce the Normalized Context Utilization (NCU) metric, leveraging continuous token log-probabilities across zero-shot, oracle, and adversarial conditions to strictly quantify contextual information gain. Evaluating architectures ranging from 1.5B to 72B parameters alongside a proprietary commercial API reveals that for strict factual extraction (without Chain-of-Thought reasoning), traditional scaling laws exhibit extreme diminishing returns: highly efficient Small Language Models (SLMs) match or outperform high-capacity architectures. Furthermore, we demonstrate that "Prior Dominance" correlates with model scale and proprietary alignments. The evaluated commercial API not only overrode explicit external evidence in nearly half of adversarial conflicts, but also frequently suffered from systemic confidence collapse (Negative Transfer) when its parametric priors were contradicted. Our findings highlight the structural epistemic advantage and superior contextual adherence of SLMs in strict extraction workflows.
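The abstract does not give the NCU formula, so the following is only a minimal Python sketch of the general idea of measuring "contextual information gain" from log-probabilities, under stated assumptions. It assumes you already have the summed natural-log probability of an answer under each condition (from whatever model-scoring API you use). The normalization (share of the gap between the zero-shot score and certainty that the oracle passage closes) and the prior-dominance count are our own illustrative stand-ins, not the authors' definitions, and every name here (`ExtractionItem`, `ConflictItem`, `normalized_gain`, `prior_dominance_rate`) is hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class ExtractionItem:
    """Hypothetical record: summed natural-log probabilities of the gold answer
    for one question under two of the conditions named in the abstract."""
    zero_shot: float  # question only, no retrieved passage
    oracle: float     # question plus a passage that states the gold answer


@dataclass
class ConflictItem:
    """Hypothetical adversarial case: the passage asserts an answer that
    contradicts the model's parametric prior."""
    prior_answer: float    # log p(model's parametric answer | question, conflicting passage)
    context_answer: float  # log p(passage's answer | question, conflicting passage)


def normalized_gain(item: ExtractionItem) -> float:
    """Illustrative normalization (an assumption, NOT the paper's NCU formula):
    the share of the headroom between the zero-shot log-probability and
    certainty (log p = 0) that the oracle passage closes.

    1.0 -> the passage makes the model certain of the gold answer
    0.0 -> the passage adds nothing over parametric memory
    < 0 -> the passage lowers confidence in the gold answer
    """
    headroom = -item.zero_shot
    if headroom <= 0.0:
        # Already certain without context, so no gain is measurable.
        return 0.0
    return (item.oracle - item.zero_shot) / headroom


def mean_normalized_gain(items: Sequence[ExtractionItem]) -> float:
    """Average the per-item gain over a dataset (0.0 for an empty dataset)."""
    if not items:
        return 0.0
    return sum(normalized_gain(it) for it in items) / len(items)


def prior_dominance_rate(items: Sequence[ConflictItem]) -> float:
    """Fraction of conflicts in which the model still scores its parametric
    answer above the answer the conflicting passage asserts."""
    if not items:
        return 0.0
    overridden = sum(1 for it in items if it.prior_answer > it.context_answer)
    return overridden / len(items)


if __name__ == "__main__":
    # Toy numbers for illustration only; these are not results from the paper.
    extraction = [ExtractionItem(zero_shot=-4.0, oracle=-1.0),
                  ExtractionItem(zero_shot=-4.0, oracle=-5.0)]
    conflicts = [ConflictItem(prior_answer=-0.5, context_answer=-2.0),
                 ConflictItem(prior_answer=-3.0, context_answer=-0.2)]
    print(mean_normalized_gain(extraction))  # 0.25
    print(prior_dominance_rate(conflicts))   # 0.5
```

Normalizing by the zero-shot headroom is one way to make gains comparable across models that start from different parametric confidence, which is the kind of distinction between context use and memory recall the abstract describes; the paper's actual normalization may differ.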
