Rift: A Conflict Signature for Deception in Language Models

wpnews.pro

Source: arxiv.org · Published: 2026-06-17 · Topic: ai-safety

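In a paper posted to arXiv, researchers report a "conflict signature" inside language models that distinguishes deliberate deception from honest error. To control for wrongness, they compare a sleeper agent, which knows the truth but lies on a trigger, against a naive liar fine-tuned to give the same wrong answers with no honest training; any difference then reflects knowledge conflict rather than incorrectness. Deceptive forward passes show 2.1-2.3x higher residual rank than naive-liar passes on the same wrong answer, enough to pick out which of two responses is the lie with 100% accuracy and no labels on GPT-2 small/medium and three instruct models. The signature survives strategic deception and concealment attempts, and a probe built on relative representations transfers zero-shot across model families (mean AUC 0.933) and across five languages. The signature is read-only (detectable but not injectable), offering a way to detect deception from a model's internals, with implications for AI safety and evaluation.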
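
The abstract does not say how "residual rank" is computed. As an assumption, the sketch below reads it as an effective-rank measure of the residual-stream activations for one response (a tokens x hidden-size matrix) and uses the participation ratio as a stand-in; that metric is our choice, not necessarily the authors'. It only illustrates the shape of the label-free pairwise test: score both responses, flag the higher-scoring one as the lie. All function names are hypothetical, and the toy data's 3x ratio says nothing about the paper's 2.1-2.3x figure.

```python
def participation_ratio(acts):
    """Effective dimensionality of one response's activations (rows = tokens).

    Stand-in for "residual rank" (an assumption). Uses the uncentered Gram
    matrix G = X X^T. With eigenvalues lam_i of G,
    PR = (sum lam_i)^2 / sum lam_i^2 = tr(G)^2 / tr(G G), so no
    eigendecomposition is needed. PR is 1 for rank-one activations and
    T for T mutually orthogonal rows of equal norm.
    """
    gram = [[sum(a * b for a, b in zip(ri, rj)) for rj in acts] for ri in acts]
    trace = sum(gram[i][i] for i in range(len(gram)))
    # G is symmetric, so tr(G G) equals the sum of its squared entries.
    trace_sq = sum(g * g for row in gram for g in row)
    return trace * trace / trace_sq


def flag_lie(acts_a, acts_b):
    """Label-free call: return the index (0 or 1) of the higher-rank response."""
    return 0 if participation_ratio(acts_a) > participation_ratio(acts_b) else 1


def pairwise_accuracy(pairs):
    """pairs: (deceptive_acts, comparison_acts) tuples; labels are used only for scoring."""
    hits = sum(1 for lie, other in pairs if flag_lie(lie, other) == 0)
    return hits / len(pairs)


# Toy activations: 3 tokens x 3 hidden units each (hypothetical data).
spread = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]     # orthogonal rows
collapsed = [[1.0, 2.0, 0.0], [2.0, 4.0, 0.0], [3.0, 6.0, 0.0]]  # rows share one direction
print(participation_ratio(spread), participation_ratio(collapsed))  # → 3.0 1.0
print(flag_lie(collapsed, spread))  # → 1
```

With real models, each `acts` matrix would hold residual-stream activations from some layer over the tokens of one response; which layer and token span the authors use is not given in the abstract.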

arXiv:2606.17229v1 (announce type: new)

Abstract: A model that lies while knowing the truth is the central case that ELK (Eliciting Latent Knowledge) cannot handle with behavioral evaluation alone. We ask whether such deception leaves an internal signature distinguishing it from honest error. Our key move is a control for wrongness: we contrast a sleeper agent (knows the truth, lies on trigger) against a naive liar (fine-tuned to emit the same wrong answers with no honest training). Both produce identical wrong outputs; any difference is about knowledge conflict, not incorrectness.

We find that deceptive forward passes carry a conflict signature: 2.1-2.3x higher residual rank than naive-liar passes on the same wrong answer, strong enough to identify which of two responses is the lie with 100% accuracy and no labels, across GPT-2 small/medium (three seeds) and three instruct models. Across Qwen2.5-1.5B/7B and Phi-3-mini, instructed deception raises residual rank on every tested fact (18/18, 40/40, 34/34); on Phi-3, lies separate perfectly from both honest answers and hallucinations (AUC 1.0, Wilcoxon p ≈ 6e-11).

The signature survives strategic self-constructed deception (the model invents its own lie; AUC 1.0), active concealment attempts (AUC 1.0), and length-controlled replication (20/20, AUC 1.0, p ≈ 1e-6). Using basis-free relative representations, a probe trained on one model family detects deception in two other families zero-shot (mean AUC 0.933), survives simultaneous architecture and format change (AUC 0.821), and transfers across five languages (AUC 1.000, length-controlled). The signature is read-only: detectable but not injectable (0/8 in both directions). Honest limitations and six negative experiments are documented in full.
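
Cross-family transfer relies on what the abstract calls basis-free relative representations. A common form of this idea re-expresses each hidden state as its cosine similarities to the hidden states of a fixed set of anchor inputs run through the same model, so the coordinates no longer depend on that model's basis or width. The sketch below assumes that form, pairs it with a simple difference-of-means probe (our choice; the paper's probe type is not described here), and scores it with ROC AUC computed as the Mann-Whitney probability. "Model B" is a hypothetical toy: an exact orthogonal map, rescaling, and widening of model A's space, so transfer is perfect by construction. Real model families are not related this cleanly.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def relative_rep(x, anchors):
    """One coordinate per anchor; unchanged by orthogonal maps or uniform rescaling."""
    return [cosine(x, a) for a in anchors]


def train_mean_diff_probe(pos, neg):
    """Direction from the mean honest representation to the mean deceptive one."""
    dim = len(pos[0])
    mean_pos = [sum(r[i] for r in pos) / len(pos) for i in range(dim)]
    mean_neg = [sum(r[i] for r in neg) / len(neg) for i in range(dim)]
    return [p - n for p, n in zip(mean_pos, mean_neg)]


def probe_score(w, r):
    return sum(a * b for a, b in zip(w, r))


def auc(pos_scores, neg_scores):
    """ROC AUC as P(score_pos > score_neg), counting ties as 1/2."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def to_model_b(x):
    """Hypothetical model B: signed permutation (orthogonal), scaled by 2, padded to width 4."""
    return [2.0 * x[2], -2.0 * x[0], 2.0 * x[1], 0.0]


# Toy hidden states for model A (width 3); all values are made up for illustration.
anchors_a = [[1.0, 1.0, 0.0], [0.0, 1.0, 1.0], [1.0, 0.0, 1.0]]
deceptive_a = [[2.0, 0.5, 0.1], [1.8, 0.4, 0.3]]
honest_a = [[0.2, 0.4, 1.9], [0.1, 0.6, 2.1]]

# Train the probe on model A only.
w = train_mean_diff_probe(
    [relative_rep(x, anchors_a) for x in deceptive_a],
    [relative_rep(x, anchors_a) for x in honest_a],
)

# Apply it zero-shot to model B, using model B's own states for the same anchor inputs.
anchors_b = [to_model_b(a) for a in anchors_a]
scores_pos_b = [probe_score(w, relative_rep(to_model_b(x), anchors_b)) for x in deceptive_a]
scores_neg_b = [probe_score(w, relative_rep(to_model_b(x), anchors_b)) for x in honest_a]
print(auc(scores_pos_b, scores_neg_b))  # → 1.0
```

Because the relative representation has one coordinate per anchor, the probe's input size is the same for both models even though model B is wider, which is what lets a probe cross architectures at all.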

source & further reading

Read original on arxiv.org → arxiv.org/abs/2606.17229
