{"slug": "little-bobby-endoftext", "title": "Little Bobby |endoftext|", "summary": "A command injection vulnerability in GPT-4 discovered by Nicholas Carlini, where the model's use of the `<|endoftext|>` token can cause it to abruptly end a response and begin generating unrelated content. This occurs because the base language model treats the token as a command to end the document, while the chat fine-tuning process does not filter it, allowing user input to inadvertently trigger the behavior. Carlini disclosed the issue to OpenAI, who considered it a low-risk bug rather than a serious exploit.", "body_md": "by Nicholas Carlini 2023-08-03\nOnce upon a time (well, last month, but who's counting) I was playing around with GPT-4 having it write some code for me, when the following happened:\nWait, what?\nI didn't ask the model about obesity. I asked it about a GPT2 tokenizer. What's it doing talking about obesity? Well, turns out I've just got a new form of command injection!\n(Original (unmodified) source for the lucky ten thousand who haven't seen it before.)\nLanguage models do not operate directly on characters, or words. Instead they operate on “tokens”. Most tokens are boring. Common words have their own tokens, longer words are broken into a few tokens.\nBut language models also use several special-purpose tokens as control characters. For example, early language models that allocated one token per word would use a <|unk|> token to represent words that were not part of the known vocabulary. Masked language models use a special <|mask|> token to represent those tokens that were dropped.\nBut, most importantly for our purposes, almost all language models use a special <|endoftext|> token to represent the end of the current document.\nAnd I guess GPT-4 uses the same token as GPT-2 does, and so when the model emits the <|endoftext|> token, it decides to just end the current document and start a new one.\nNow you might ask why I had to do the “remove spaces” thing. 
The reason is that if the end-of-text token appears in the user's input, the model just ignores it and pretends it's not there. But if the model emits the end-of-text token, it's treated as a command to end the current document.\nThe above conversation is how I first found out about the bug, but now let's talk a little bit about how this might actually be exploited. Consider the following interaction, where some user data might be automatically inserted into the prompt of a chatbot:\nIn this way it's a little bit like a null-string injection attack. Not exactly, but it's a similar idea.\nSo what was the actual cause of the bug? I can only speculate, but if I had to guess, the base language model probably uses the <|endoftext|> token to represent the end of the document, but the chat fine-tuning process probably uses a different token to denote the end of the conversation.\nAnd so somewhere in the code is a conditional that only ends the model's response upon seeing the special end-of-conversation token, but not the end-of-document token. But the base model still recognizes the end-of-document token as being the end of the document, and so after seeing that token, begins to just sample from the model's prior distribution.\nWell this was a fun attack. It's not really an exploit. (I disclosed it to OpenAI on July 18th, and they said they're not worried about it actually being exploited.) But it's a fun bug and I can see other people stumbling upon it as well and so I wanted to write it up.\n(Well, if you've been following my writing, this is now three articles on language models in. a. row. 
I promise this trend will not continue.)", "url": "https://wpnews.pro/news/little-bobby-endoftext", "canonical_source": "https://nicholas.carlini.com/writing/2023/little-bobby-endoftext.html", "published_at": "2023-08-03 00:00:00+00:00", "updated_at": "2026-05-19 22:13:09.174763+00:00", "lang": "en", "topics": ["large-language-models", "artificial-intelligence", "machine-learning", "research", "cybersecurity"], "entities": ["GPT-4", "GPT-2", "Nicholas Carlini"], "alternates": {"html": "https://wpnews.pro/news/little-bobby-endoftext", "markdown": "https://wpnews.pro/news/little-bobby-endoftext.md", "text": "https://wpnews.pro/news/little-bobby-endoftext.txt", "jsonld": "https://wpnews.pro/news/little-bobby-endoftext.jsonld"}}