Thoughts on Role Confusion Researchers Charles Ye, Jasmine Cui, and Dylan Hadfield-Menell found that large language models often ignore explicit role tags like or and instead infer roles from text tone, enabling prompt injection and jailbreaks. In tests on OpenAI's reasoning models, text mimicking the model's own reasoning trace, even when tagged as user input, caused the model to comply with harmful requests, such as providing drug manufacturing instructions. The other day, I came across " Prompt Injection as Role Confusion https://role-confusion.github.io/ " via Simon Willison https://simonwillison.net/2026/Jun/22/prompt-injection-as-role-confusion/ . It's a really interesting blog-style version of a paper by Charles Ye, Jasmine Cui and Dylan Hadfield-Menell, where they find that LLMs seem to almost ignore 'role' tags like