Emily Bender Sets the Record Straight on “Stochastic Parrots”

Emily M. Bender, lead author of the influential “Stochastic Parrots” paper, debunked common misconceptions about the term on the paper’s five-year anniversary. The paper, which argued that large language models are stochastic parrots that repeat patterns without comprehension, has been widely cited but often misrepresented. Bender also criticized the term “artificial intelligence” for obscuring the true nature of technologies like LLMs.

In March 2021, a group of four linguists and computer scientists published their now-legendary paper “On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? 🦜” (https://dl.acm.org/doi/10.1145/3442188.3445922). The paper received significant attention at the time in part because Google fired two of the authors, Timnit Gebru (https://spectrum.ieee.org/timnit-gebru-dair-ai-ethics) and Margaret Mitchell, shortly before its publication. It argued that large language models generate text by statistically predicting likely sequences of words rather than understanding what they are saying, a process the authors captured with the metaphor of a “stochastic parrot”: a system that repeats patterns without comprehension.

Over the past five years, the analogy has spread well beyond the academic field where it originated, spawning debates and inspiring projects such as a shoulder-mounted robot named the Stochastic Parrot (https://www.media.mit.edu/projects/the-stochastic-parrot/overview/). But that wider usage has also led to misconceptions about what the phrase originally meant. Lead author Emily M. Bender (https://faculty.washington.edu/ebender/), a professor of computational linguistics at the University of Washington, recently wrote a blog post (https://medium.com/@emilymenonbender/stochastic-parrots-frequently-unasked-questions-49c2e7d22d11) to debunk common misconceptions about the paper on its five-year anniversary. Bender spoke with IEEE Spectrum about these misconceptions, the field of computational linguistics, and the current discourse around artificial intelligence.

How would you describe your work as a computational linguist?

Emily M. Bender: Linguistics, very generally, is the study of how language works and how we work with language. I contribute to that, and I also work in computational linguistics, training students who are going to go on to build language technology.

Language technology actually stands alone as valuable and interesting, independent of whether or not someone wants to use it for their project of artificial intelligence. Language technology includes things like automatic transcription, machine translation, and spell check. And a lot of the work that I do personally, when I am building things, has to do with building machine-readable, but also human-readable, grammars that model linguistic phenomena in different languages. That’s about using computers in the service of linguistic hypothesis testing.

You’ve argued that the term “artificial intelligence” obscures more than it clarifies. Why?

Bender: Many reasons. I think that it makes it difficult to actually have good discussions about technology and make wise decisions about it, if the way we’re talking about it doesn’t make clear what the technology is. The phrase “artificial intelligence” both groups together disparate technologies and oversells what each one of them can do. So if we are trying to decide whether or not to use something, or how to regulate something, we are much better off with clearer descriptions.

In general conversation, AI has become almost synonymous with “chatbots” or “LLMs.” Is that a problem?

Bender: For many people, they’ll say, “I use it to do blah blah blah.” So what do you mean by “it”? And then they’ll say, “Oh, I mean Claude” or ChatGPT or Gemini, so they are talking about these chatbots. But then other people will say, “You can’t say AI is all bad, because what about AlphaFold?” (https://spectrum.ieee.org/alphafold-proves-that-ai-can-crack-fundamental-scientific-problems) So yes, for many people, they are talking about chatbots built on top of large language models, but they’re also not really clear that those things are separate from something like AlphaFold. And when we have news reporting that says “scientists use AI to discover a new drug” (https://spectrum.ieee.org/isomorphic-labs-ai-drug-discovery), well, what did they use?
If what they’re talking about is something much more narrow, maybe it’s protein folding, maybe it’s some other kind of statistical modeling, like in weather modeling (https://spectrum.ieee.org/ai-weather-forecasting). That’s a very different kind of technology than ChatGPT.

Do you think there’s value in an umbrella term like “artificial intelligence”?

Bender: Well, there’s a value to people who are trying to sell this, so to the tech companies trying to raise their valuations. Also, the way research funding is set up right now, it is very hard to get funded if you don’t call what you’re doing artificial intelligence. That, I think, is a net negative, but for any individual trapped in that system, it can have value in the moment.

What are the most common misconceptions about the stochastic parrots metaphor?

Bender: I think one of the biggest ones is, “Bender says AI is a stochastic parrot.” That paper was written in late 2020. We were talking about large language models. I’m pretty sure the word AI comes up only once, at the very end, and that’s talking about how, if you’re going to develop systems that are meant to do things like what people do, you have to be very careful that you are not creating something that can be mistaken for a person. The fact that these systems are designed to mimic the way we use language makes it very easy for people to mistake them for other people. So in the paper, towards the very end, we sort of generalized to AI. But the phrase “stochastic parrots” specifically refers to large language models, and the phrase “artificial intelligence” refers to many different things. So we were never claiming that a chess engine or AlphaFold or an image-labeling system or a machine translation system, or any of those things that are sometimes called artificial intelligence, are stochastic parrots. We were specifically talking about using large language models to produce synthetic text.

Another one is that “stochastic parrot” got picked up and interpreted by other people as a minimization or an insult. It was not meant that way. Other people might be using it that way, but that’s not how I intended it, because it’s just a description of what these systems actually are. To see it as an insult requires either the belief that the large language model is the kind of thing that can take offense, which it isn’t, or that these large language models should be understood as steps towards this grand ideal of artificial intelligence that I don’t hold. What I have been doing in many places, with the octopus thought experiment (https://aclanthology.org/2020.acl-main.463.pdf), stochastic parrots, and the phrase “synthetic text-extruding machines,” is all about trying to make vivid to people who aren’t in the business of building language technology what these systems actually do, which is not the same thing as insulting the systems or insulting the people who like the systems.

For readers who don’t know, the “octopus test” comes from a 2020 paper that imagined an octopus recognizing the statistical patterns within messages passed through an undersea cable. With the octopus test and stochastic parrots, you’ve used animal metaphors a couple of times now. Is that intentional?

Bender: No, it’s not intentional. With the octopus thought experiment, I initially had told the story in terms of a dolphin, because dolphins clearly are intelligent animals. My co-author on that paper, Alexander Koller (https://www.coli.uni-saarland.de/koller/), said it should be an octopus, because first of all, the environment that octopuses live in is much more distinct from where people live. It makes the metaphor more vivid, that the octopus is just feeling these pulses in the cable and has no way to look at what the people are looking at. But also, octopuses are just inherently funnier.

I was looking back at that paper and was surprised that the term “stochastic parrots” actually only appears twice in the text itself. Why did you include it in your title?

Bender: Because we liked it. And a catchy title is good self-marketing of an academic paper. The reason that there’s not so much of it in the paper is that we were really looking at the full range of risks of making language models ever bigger. The phrase “large language model” also doesn’t show up in the paper, because people weren’t talking about them that way. So the section on synthetic text, in some ways it felt like we were on thin ice, because at that point in time it was hard to imagine that anybody would want synthetic text. That part of the paper became much more relevant when OpenAI imposed ChatGPT on the world. Then that particular part of the paper comes out as important. But we also talk about environmental impact. We talk about the ways in which these systems will absorb the biases of their training data. We talk about how the training data is never collected well. There are a lot of different points in there, and the issues about synthetic text were just one.

Researchers at MIT Media Lab created a Stochastic Parrot robot in response to the observation that many chatbots tend to be sycophantic, or overly agreeable. Does that trend relate to the dangers you laid out in your paper?

Bender: When we wrote that paper in late 2020, people were not super excited about synthetic text, nor about chatbots. Chatbots had been around. We had Weizenbaum’s Eliza in the 1960s (https://spectrum.ieee.org/why-people-demanded-privacy-to-confide-in-the-worlds-first-chatbot), and then the very annoying automatic customer service systems that have gotten much more fluent with the large language models, and no less annoying. So, that was the state of things.
OpenAI had put out GPT-2 and GPT-3 for people to play with, and you could get them to extrude synthetic text, but the chat interface hadn’t been wrapped around those yet. We also hadn’t seen the layers of additional training that lead to the behavior that’s interpreted as sycophantic. When you get the chatbot saying, “Oh, that’s a good idea,” or, if you tell it that it’s wrong, saying, “Oh, I’m so sorry, you’re right,” that kind of response has to do with additional layers of training (https://www.ibm.com/think/topics/rlhf) past the original pre-training.

What do you wish more people understood about language models?

Bender: The message that I always bring when I have a chance is that, when the text that comes out of one of these systems makes sense, it’s because we are making sense of it. This is also in the Stochastic Parrots paper. Anytime we are evaluating this kind of technology, we have to account for our ability to make sense of language and keep that in view as we are deciding what’s going on with the technology. That is frequently lost in these discussions.

If you were to redo or update the Stochastic Parrots paper now, is there anything that you would change about it?

Bender: There was one really big form of harm that we did not cover in the paper, and that has to do with exploitative labor practices. Under that, I include both the horrible conditions that many data workers face and the massive theft of people’s creative and intellectual output (https://spectrum.ieee.org/generative-ai-ip-problem) that underlies these systems. Those issues should have been included in the paper. It’s not that they were unknown in the world then, but they didn’t make it into what we surveyed, and they should be there.