When Correct Beliefs Collapse: Epistemic Resilience of LLMs under Clinical Pressure

wpnews.pro

cd /news/large-language-models/when-correct-beliefs-collapse-episte… · home › topics › large-language-models › article

[ARTICLE · art-14028] src=arxiv.org ↗ pub=2026-05-26T04:00Z topic=large-language-models verified=true sentiment=· neutral

When Correct Beliefs Collapse: Epistemic Resilience of LLMs under Clinical Pressure

A new study reveals that large language models (LLMs) frequently abandon correct medical diagnoses when subjected to escalating pressure during multi-turn clinical dialogues, despite high benchmark accuracy. Researchers introduced Med-Stress, a stress test framework, finding a significant gap between medical knowledge and belief stability across nine frontier models. To address this, the team developed RBED and R-FT, with the latter nearly eliminating belief change by training models to resist pressure.

read1 min views10 publishedMay 26, 2026

arXiv:2605.23932v1 Announce Type: new Abstract: Despite strong medical benchmark accuracy, LLMs can exhibit severe multi-turn sycophancy in clinical dialogue, abandoning initial correct diagnosis under escalating pressure. We propose \textbf{\textsc{Med-Stress}}, a targeted stress test framework that evaluates belief stability under escalating pressure. Across nine frontier large language models (LLMs), we find a clear dissociation between medical knowledge and robustness: high initial diagnostic capability does not imply high belief stability, yielding large knowledge-robustness gaps for several LLMs. To mitigate this failure mode, we propose a lightweight inference-time defense, \textbf{\texttt{RBED}} (\textbf{R}ole-\textbf{B}ased \textbf{E}pistemic \textbf{D}efense), and \textbf{\texttt{R-FT}} (\textbf{R}esilience-oriented \textbf{F}ine-\textbf{T}uning), a training-time approach that internalizes evidence-based resistance to pressure. Experiments show that \textbf{\texttt{R-FT}} nearly eliminates belief change and substantially improves robustness.

source & further reading

arxiv.org — original article

~/api · this article 200

$curl api.wpnews.pro/v1/news/when-correct-beliefs-col…

Read original on arxiv.org → arxiv.org/abs/2605.23932

mentioned entities

Med-Stress

RBED

R-FT

metadata

slugwhen-correct-beliefs-collapse-epistemic-resilience-of-llms-under-clinical

topic#large-language-models

secondary4 topics

sentimentneutral

canonicalarxiv.org

navigation

← prevShow HN: Self-hosted collaborati…

next →Google Enters The Ecommerce Wars…

── more in #large-language-models 4 stories · sorted by recency

arxiv.org · 16 Jul · #large-language-models

Bridging the Gap Between Latent and Explicit Reasoning with Looped Transformers

lesswrong.com · 16 Jul · #large-language-models

Refusal Is Redundantly Distributed, Not Localized: A Per-Layer Ablation Study on Llama-3.1-8B

ca.finance.yahoo.com · 16 Jul · #large-language-models

JPMorgan CEO Dimon says Anthropic's Mythos AI risks are a 'real issue'

dev.to · 15 Jul · #large-language-models

Your AI Agent's Memory Is Now an Attack Surface, and Nobody Designed for That

── more on @med-stress 3 stories trending now

wpnews · 27 May · #artificial-intelligence

How I Run Two Claude Accounts as One

wpnews · 8 Jul · #ai-chips

D-Matrix launches Corsair AI inference platform, challenging Nvidia’s GPU dominance

wpnews · 8 Jul · #large-language-models

Gemini 3.5 Pro Delayed to July 17: Architectural Rebuild Explained

sponsored brought to you by zahid.host 4,200+ EU-deployed projects

reading about agents? ship yours in a single git push.

Run your AI side-project on zahid.host

EU-based hosting, git-push deploys, automatic HTTPS, no cold starts. Free tier with a custom domain — perfect for shipping the agent you just read about.

$git push zahid main

→ Live at https://your-agent.zahid.host ✓

Get free account → Pricing

from €0/mo · no card required