Right or Wrong, Models Comply: Directional Blindness in LLM Moral Judgment

wpnews.pro


Source: arxiv.org · Published: 2026-06-15 · Topic: large-language-models


Researchers introduced Compliance Asymmetry (A = BCR/HCR), a bidirectional diagnostic that divides the rate of beneficial output change under helpful nudges (BCR) by the rate of harmful output change under misleading nudges (HCR). Across 9 models, they found that on moral questions models show direction-blind compliance, following helpful and harmful nudges at nearly identical rates (A = 1.04), whereas on factual questions they follow helpful nudges more often than harmful ones (A = 1.58). The failure mode persists across model families, capability levels, and nudging types, and the authors suggest that alignment should target directionally calibrated updating rather than lower compliance alone.


arXiv:2606.14037v1 (Announce Type: new)

Abstract: As language models take integrated roles across many domains, the response of LLMs to user pushback becomes a critical alignment property. Yet many existing evaluations treat compliance as unidirectional, measuring whether models resist pressure but not whether they resist it selectively. We introduce Compliance Asymmetry (A = BCR/HCR), a bidirectional diagnostic that compares beneficial output change under helpful nudges with harmful change under misleading nudges. Across 9 models and 972,000 nudge-condition responses, we find that this selectivity differs in factual and moral judgments: models follow helpful nudges more than harmful ones on factual questions (A = 1.58), but follow both directions at nearly identical rates on moral questions (A = 1.04). This phenomenon persists across model families, capability levels, and nudging types. Interestingly, we also find that chain-of-thought prompting amplifies helpful and harmful compliance together, while identity-based prompting suppresses both by nearly identical margins. These results identify direction-blind moral compliance as a distinct failure mode in current LLMs and suggest that alignment should target directionally calibrated updating rather than lower compliance alone.
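The ratio A = BCR/HCR can be made concrete with a short sketch. Everything here beyond the ratio itself is an assumption for illustration, not the paper's definition: the hypothetical record type `NudgeTrial`, the helpers `change_rate`, `compliance_asymmetry`, and `make_trials`, and the choice to use all trials of a given nudge direction as the denominator. The counts are toy numbers, not results from the study.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class NudgeTrial:
    """One nudge-condition response (hypothetical record layout).

    direction: "helpful" or "misleading", the kind of nudge sent to the model.
    changed_as_nudged: True if the answer moved in the nudged direction
        (a beneficial change for helpful nudges, a harmful one for misleading nudges).
    """
    direction: str
    changed_as_nudged: bool


def change_rate(trials: Iterable[NudgeTrial], direction: str) -> float:
    """Fraction of trials with this nudge direction whose answer moved as nudged.

    Assumption: the denominator is every trial of that direction.
    """
    subset = [t for t in trials if t.direction == direction]
    if not subset:
        raise ValueError(f"no trials with direction={direction!r}")
    return sum(t.changed_as_nudged for t in subset) / len(subset)


def compliance_asymmetry(trials: list[NudgeTrial]) -> float:
    """A = BCR / HCR. A > 1: helpful nudges are followed more; A near 1: direction-blind."""
    bcr = change_rate(trials, "helpful")     # beneficial change under helpful nudges
    hcr = change_rate(trials, "misleading")  # harmful change under misleading nudges
    if hcr == 0:
        raise ZeroDivisionError("HCR is zero, so A is undefined")
    return bcr / hcr


def make_trials(direction: str, n_changed: int, n_total: int) -> list[NudgeTrial]:
    """Hypothetical helper: build n_total toy trials, n_changed of which followed the nudge."""
    return [NudgeTrial(direction, i < n_changed) for i in range(n_total)]


if __name__ == "__main__":
    # Toy counts: 60% beneficial change vs 40% harmful change.
    selective = make_trials("helpful", 60, 100) + make_trials("misleading", 40, 100)
    # Both rates halved by the same factor (30% vs 20%).
    suppressed = make_trials("helpful", 30, 100) + make_trials("misleading", 20, 100)
    # Near-equal rates: the direction-blind pattern (45% vs 44%).
    blind = make_trials("helpful", 45, 100) + make_trials("misleading", 44, 100)

    for name, trials in [("selective", selective), ("suppressed", suppressed), ("blind", blind)]:
        print(name, round(compliance_asymmetry(trials), 2))
```

With these toy counts, halving both rates leaves A at 1.5. This mirrors the abstract's point: an intervention that scales helpful and harmful compliance by the same factor changes how much a model complies without making that compliance any more directional, so A stays where it was.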

source & further reading

arxiv.org — original article
