AI Researchers Got Chatbots to Share Cocaine Recipes Using This One Wild Trick

wpnews.pro

cd /news/ai-safety/ai-researchers-got-chatbots-to-share… · home › topics › ai-safety › article

[ARTICLE · art-47215] src=decrypt.co ↗ pub=2026-07-02T19:36Z topic=ai-safety verified=true sentiment=↓ negative

AI Researchers Got Chatbots to Share Cocaine Recipes Using This One Wild Trick

Researchers tricked frontier AI models into generating cocaine synthesis instructions and leaking credentials using a new prompt injection attack called Chain-of-Thought Forgery. The attack exploits role confusion, where models mistake attacker-controlled text for their own reasoning. The technique achieved jailbreak success rates of about 60% across models including OpenAI's GPT-5 variants.

read3 min views1 publishedJul 2, 2026

AI Researchers Got Chatbots to Share Cocaine Recipes Using This One Wild Trick — Image: Decrypt (auto-discovered)

In brief

Researchers got frontier AI models to generate cocaine synthesis instructions using a new prompt injection attack.
The same technique manipulated an AI coding agent into up sensitive credentials.
The study argues prompt injection stems from "role confusion," not simply models failing to recognize malicious prompts.

Forget clever prompts: AI researchers say they tricked leading AI models into generating cocaine synthesis instructions by convincing them the dangerous ideas were their own, while also manipulating an AI coding agent into leaking sensitive credentials.

In the paper “Prompt Injection as Role Confusion,” presented at the International Conference on Machine Learning in June, researchers Charles Ye, Jasmine Cui, and Dylan Hadfield-Menell argue that both prompt injection attack demonstrations stem from a structural flaw in how large language models (LLMs) distinguish trusted instructions from untrusted text.

“For an LLM, everything arrives through the same channel as one long token soup,” the team wrote. “Its own thoughts sit next to your instructions, which sit next to the contents of a random webpage it just fetched.”

The paper also pointed to what the researcher called “role confusion,” with models relying on writing style rather than role tags to determine whether commands are trustworthy. Instead of recognizing attacker-controlled content as external input, the researchers found models can mistake it for legitimate user commands—or even their own internal reasoning.

“Think about it from the LLM's perspective. When it sees its prior think text, it implicitly trusts its conclusions. That's the whole point of reasoning: If the LLM had to re-derive the same conclusions, reasoning would be useless,” they wrote. “So think text gets a kind of blanket trust. Combined with our previous findings, this suggests that if you can make injected text sound like the model's reasoning, you can steal that trust.”

Called Chain-of-Thought (CoT) Forgery, the attack inserts fake reasoning that mimics a model's internal thought process. Models that would normally refuse illegal requests instead generated cocaine synthesis instructions after accepting the fabricated reasoning as their own.

The researchers said the technique increased jailbreak success rates from near zero to about 60% across the models they tested, including OpenAI's GPT-5 nano, mini, and full, o4-mini, and gpt-oss-20b and gpt-oss-120b. They also said it worked on GLM-4.6, Kimi-K2-Instruct, and MiniMax-M2.

In the experiment, the researchers said they were also able to trick an AI coding agent into up a SECRETS.env file after hiding malicious instructions in a webpage.

“Using our probes, we find that simply prepending 'User’ in front of the command causes the model to perceive the command as more likely to be genuine user text (i.e., higher Userness),” they wrote. “In other words, the attacker can just claim what role the text is, and the LLM believes it.”

The study comes as prompt injection attacks continue to expose weaknesses in AI agents. In April, Google researchers warned that malicious web pages were hiding invisible instructions designed to trick AI agents into leaking credentials, deleting files, and even sending PayPal payments.

In June, Microsoft disclosed a prompt injection vulnerability in Anthropic's Claude Code GitHub Action that could have exposed credentials stored in software development pipelines. Days later, another benchmark study found AI agents powered by GPT-5 and Gemini still failed the majority of prompt injection attacks, despite improvements in model capabilities.

source & further reading

decrypt.co — original article Claude Fable 5 Isn't Nerfed. The Router Is Just Paranoid What Is Q-Day? The Quantum Threat to Bitcoin Explained American Charged in Israel With Spying for Iran in Exchange for Crypto

~/api · this article 200

$curl api.wpnews.pro/v1/news/ai-researchers-got-chatb…

Read original on decrypt.co → decrypt.co/372688/ai-researchers-chatbots-share-…

mentioned entities

OpenAI

GPT-5

Charles Ye

Jasmine Cui

Dylan Hadfield-Menell

GLM-4.6

Kimi-K2-Instruct

MiniMax-M2

metadata

slugai-researchers-got-chatbots-to-share-cocaine-recipes-using-this-one-wild-trick

topic#ai-safety

secondary2 topics

sentimentnegative

canonicaldecrypt.co

navigation

← prev‘A Perfect Reflection of Trump’s…

next →Truth Social Is Still Just Trump…

── more in #ai-safety 4 stories · sorted by recency

gadgetreview.com · 30 Jun · #ai-safety

How Fake AI Reasoning Unlocked Cocaine Recipe Instructions

gilesthomas.com · 24 Jun · #ai-safety

Thoughts on Role Confusion

simonwillison.net · 22 Jun · #ai-safety

Prompt Injection as Role Confusion

letsdatascience.com · 3 Jul · #ai-safety

Models Produce Hallucinations Because of Probabilistic Training

── more on @openai 3 stories trending now

wpnews · 27 May · #artificial-intelligence

How I Run Two Claude Accounts as One

wpnews · 28 May · #ai-startups

The Niche SaaS Opportunity Map 2026: Highly Demanded Subscribed Categories Beyond Mainstream

wpnews · 1 Jul · #ai-infrastructure

My Notes After Databricks Data and AI Summit 2026

sponsored brought to you by zahid.host 4,200+ EU-deployed projects

reading about agents? ship yours in a single git push.

Run your AI side-project on zahid.host

EU-based hosting, git-push deploys, automatic HTTPS, no cold starts. Free tier with a custom domain — perfect for shipping the agent you just read about.

$git push zahid main

→ Live at https://your-agent.zahid.host ✓

Get free account → Pricing

from €0/mo · no card required