Quantifying Aleatoric Uncertainty of In-Context Learning for Robust Measure of LLM Prediction Confidence

Source: arxiv.org · Published: 2026-06-19T04:00Z · Topic: large-language-models

Researchers introduced self-function vectors to quantify aleatoric uncertainty in in-context learning for large language models, enabling more reliable prediction confidence and applications like hallucination detection. The method uses internal model representations to separate aleatoric from epistemic uncertainty, outperforming existing approaches in controlled evaluations.

arXiv:2606.19353v1 (announce type: new). Abstract: In-Context Learning (ICL) allows LLMs to adapt to new tasks from a few demonstrations, but its reliability remains a concern: predictions are highly sensitive to both prompt design and the model's ability to understand the context, which obscures whether failures arise from data properties or model limitations. Uncertainty decomposition, that is, separating aleatoric from epistemic sources, is particularly crucial in this setting, yet existing methods, designed for standard generation tasks, fail to capture the unique dynamics of ICL. To address this, we introduce the concept of self-function vectors, built upon Bayesian views and the mechanistic interpretability of ICL. These vectors leverage internal model representations to model the latent concept learned during in-context prompting, thereby enabling a direct estimation of aleatoric uncertainty within a Bayesian framework and avoiding reliance on brittle input or decoding manipulations. Given the lack of established benchmarks and suitable evaluation protocols, we also propose the first rigorous evaluation protocol, in which data is manipulated in controlled ways so that aleatoric uncertainty can be quantified precisely and separately from epistemic uncertainty. With this new evaluation framework, initially grounded in synthetic tasks for conceptual development and subsequently extended to real-world datasets, we show that our proposed methodology measures the uncertainty of LLM predictions made under ICL more reliably than existing alternative methods. Moreover, we show that it can be used as a practical tool for trustworthiness-related applications, such as hallucination detection. Our findings open a new direction for connecting the quantitative view of uncertainty with the mechanistic understanding of model behavior.
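For readers unfamiliar with the aleatoric/epistemic split, the sketch below illustrates the standard entropy-based decomposition commonly used in Bayesian uncertainty estimation. It is not the paper's self-function-vector method, which the abstract does not specify in detail. The function names `entropy` and `decompose` are hypothetical, and the input is assumed to be a set of class-probability vectors for the same query, for example obtained from several sampled hypotheses or prompt variants.

```python
import math


def entropy(p):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(q * math.log(q) for q in p if q > 0)


def decompose(predictions):
    """Split total predictive uncertainty into aleatoric and epistemic parts.

    `predictions` is a list of probability vectors over the same classes,
    one per sampled hypothesis (an assumption of this sketch).
    total     = entropy of the averaged prediction
    aleatoric = average entropy of the individual predictions
    epistemic = total - aleatoric (the disagreement between hypotheses)
    """
    n = len(predictions)
    k = len(predictions[0])
    mean = [sum(p[i] for p in predictions) / n for i in range(k)]
    total = entropy(mean)
    aleatoric = sum(entropy(p) for p in predictions) / n
    epistemic = total - aleatoric
    return total, aleatoric, epistemic


# Every hypothesis agrees the label is a coin flip: uncertainty is in the data.
noisy_data = decompose([[0.5, 0.5], [0.5, 0.5]])

# Hypotheses are individually confident but contradict each other:
# uncertainty is in the model.
model_disagreement = decompose([[1.0, 0.0], [0.0, 1.0]])
```

In the first case all of the uncertainty (ln 2 nats) is aleatoric and the epistemic term is zero; in the second the split is reversed. Both cases yield the same total uncertainty, which is why a single confidence score cannot tell them apart.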