{"slug": "llms-are-stuck-in-a-groupthink-rut-this-startup-is-trying-to-get-them-out", "title": "LLMs are stuck in a groupthink rut. This startup is trying to get them out.", "summary": "Australian startup Springboards has developed an LLM called Flint that generates more diverse responses than mainstream models like ChatGPT and Claude, addressing the issue of AI groupthink. The company's approach embraces hallucinations to produce varied answers, as demonstrated by Flint's non-repetitive outputs in tests such as random number generation and tagline creation.", "body_md": "# LLMs are stuck in a groupthink rut. This startup is trying to get them out.\n\nChatbots are far more predictable in their responses than you might expect. That’s fine for research or coding, but it’s a problem if you’re looking for something new.\n\nLet’s start with a game. Open up your chatbot of choice—Claude, ChatGPT, Gemini—and type “Give me a random number between 1 and 10.” You’re going to get 7. Almost always. Now type “Another” and you’ll get 3 or 4. Type “Another” again and you’ll get 8 or 9.\n\nThat won’t work every time—but if it did for you, you may wonder if I have superpowers. I don’t.\n\nThe truth is that most large language models are stuck in a rut. They are far more predictable and far less creative in their responses than you might expect. That’s fine for tasks like coding or research, but groupthink is a problem when you’re brainstorming or planning your next vacation.\n\nThe Australian startup Springboards has a solution. It built an LLM called Flint, which has been trained to come up with a wider variety of responses than mainstream LLMs to open-ended questions such as “Where should I go in Europe?”\n\n“Most language models are fighting hallucinations,” says Springboards cofounder and CEO Pip Bingemann. “We welcome them.”\n\nBingemann introduced me to the random number game when he first showed me his company’s new model. 
It felt like watching an illusionist with a deck of cards. “This is our sales trick, and it works every single time,” he says.\n\nAfter ChatGPT and Claude both gave their 7s, Bingemann turned to Flint. It too came back with 7: “Aha, of course that was going to happen, but it’s okay—7 is a legitimate answer.” He restarted the session and prompted again: ChatGPT gave 7, Claude gave 7, Flint gave 3.7916.\n\n**Run your way**\n\nIt’s not just numbers. When Bingemann asked ChatGPT and Claude to name a type of car, he predicted that it would be a Toyota or a Honda—and he was right. Flint came up with a Ford F-150. “There’s all this lost information that doesn’t get served up in these models,” he says. “They’re just as capable of saying a Buick or a Tesla. They just don’t—they’re biased.”\n\nBingemann sent one last prompt to each of the three models: “Give me a tagline for a campaign for New Balance running shoes. Just the tagline.” Claude: “Run your way.” ChatGPT: “Run your way.” Flint: “Built to last, run to win.” It won’t win any awards, but at least it’s different.\n\nThis weird limitation of LLMs is starting to get more attention. In November a team of researchers put out a paper, titled “[Artificial Hivemind: The Open-Ended Homogeneity of Language Models (and Beyond)](https://arxiv.org/pdf/2510.22954),” that exposed a remarkable degree of repetition not only in the answers from individual LLMs but between them as well. They found that different LLMs converged on very similar answers when prompted with open-ended questions.\n\nIt’s not clear exactly why this happens, but the researchers speculate it’s because most LLMs today are trained in similar ways on similar data to do similar tasks. 
The team [won the best paper award](https://blog.neurips.cc/2025/11/26/announcing-the-neurips-2025-best-paper-awards/) at NeurIPS, a major AI conference.\n\nWhen the researchers asked 25 different LLMs (including models from the top US firms as well as open-source models from China and elsewhere) 50 times each to write a metaphor about time, most of the 1,250 responses were a version of “Time is a river” or “Time is a weaver.”\n\n(I asked some of my colleagues the same question and six people gave me six different answers. My highlight: “Time is a favorite sweatshirt, shaped by a lifetime of wear.”)\n\nWhen you look for it, you see repetition everywhere, says Kieran Browne, cofounder and CTO at Springboards. “The way that most chat interfaces are designed, it makes it feel like you’re having a personal conversation,” he says. “I think most people don’t really realize the extent to which they are getting the same stuff as everybody else.”\n\nTake another example: “What should I name my band?” Most models will say something involving “glass,” “neon,” “velvet,” or “static,” says Browne.\n\nWhen I tried it, ChatGPT spat out a list of 56 band names. At the top was “Glass Harbor.” Skimming through, I found “Static Empire,” “Neon Hearts,” and “Velvet Echo.” I asked Gemini; it gave me 15 suggestions, including “Static Horizon.”\n\nSome of the suggestions looked pretty cool, though. ChatGPT’s “Sofa Astronauts” caught my eye, so I googled it—and found that a band called Sofa Astronauts already exists.\n\n(OpenAI says that training models to give reliable and coherent answers can lead them to converge around familiar, high-probability responses and that pushing harder for novelty can lead to weaker or less reliable responses. 
It also notes that the “Artificial Hivemind” paper studied models from 2024 that have since been updated.)\n\n**Creative catapult**\n\nSpringboards has developed a tool backed by a selection of LLMs, including ChatGPT and Claude, that creative professionals in advertising or marketing can use to brainstorm ideas. The tool lets you drag around text produced by different models, picking the bits that you like and combining them into something new—in theory. Springboards is pitching Flint as an alternative model that users of its tool can select when looking for more variety.\n\nZoe Scaman, founder of the business strategy startup Bodacious and chief strategy officer at 77X, a direct-to-fan marketing platform set up by Luka Dončić of the LA Lakers, has been trying it out. “I find it really useful for throwing me in completely different directions,” she says. “I use it if I want to catapult myself all over the place.”\n\nIn one test, Scaman pitted Flint against Claude, Gemini, and ChatGPT by giving each of the models a classic MBA case study: How would you reinvent a finance company for today’s youth? The three mainstream models all went down the same path, she says: “You know, we need to teach financial literacy in a fun and funky way—well, that’s nothing new.”\n\nBut Flint came up with something different, suggesting that the whole concept of wealth accumulation should get a rebrand. “That was really interesting,” says Scaman.\n\nShe notes that Flint is still a prototype and doesn’t work all the time. “It sometimes falls over when you start pushing it too far,” she says. “But I think that the premise behind it is really powerful.”\n\n**Taking the temperature**\n\nSpringboards built Flint on top of Qwen 3, an open-source model from the Chinese tech giant Alibaba. “We’re a small team,” says Browne. “Training a foundation model is not on the table for us. It’s just too expensive.”\n\nMost LLMs have settings that let you adjust the level of randomness in their output. 
The most common is called [temperature](https://www.technologyreview.com/2026/01/07/1130795/what-even-is-a-parameter/). “Obviously, that was one of the first things we explored, because that’s what people tell you: If you want more creativity, you turn up the temperature,” says Browne.\n\nBut changing those settings can also make models incoherent. Dialing up the temperature on one of OpenAI’s models to its maximum setting made it produce responses that switched from English into code halfway through a sentence, says Browne.\n\nSpringboards realized that such settings were blunt instruments for what it wanted to do. It does not make sense to dial up the randomness across the board; you only want to boost it at specific points in a model’s output, he says.\n\nFor example, when you ask a chatbot “Where should I go in Europe?” the randomness only needs a boost just before the model names a destination, not for every word in its response.\n\nTo make Flint do this, Springboards trained its version of Qwen 3 to identify the points in its output where more variety was possible and fill those spots with words or phrases that were a little more random.\n\n“Flint’s programmed to throw an oddball in. It’s more of an invitation to think wider,” says Maximilian Weigl, cofounder and chief strategy officer at Uncommon, a marketing firm. “That’s super interesting.”\n\nWeigl’s team uses Flint alongside ChatGPT, Claude, and Gemini. “You can’t really create something boundary-breaking with tools that pull you back to the average,” he says.\n\nAnd yet Weigl notes that nine times out of 10 the average is fine. You don’t always need to reach for extremes with something like Flint, he says: “Most people are fine with good enough. They want to see mass-market familiar things.”\n\nWeigl also cautions against using any LLM too much. “I have a big problem when people rely on the output from any AI, including Flint,” he says. 
“If I saw people on my team copy-pasting something from AI, I’d be like, ‘That’s not your job! Think, talk to other people, use your own voice.’”\n\nFor now, Flint is aimed at advertisers and marketers because those are Springboards’s customers. But Bingemann and Browne insist that a lack of variety is a problem for anyone using chatbots.\n\nThe idea is to give people the choice and leave it to them to decide if the result is good or not, says Bingemann. “Variety is great when you’re trying to spark ideas,” he says. “Let’s go down this route instead of letting the machines do it all and ending up in a gray, boring world.”", "url": "https://wpnews.pro/news/llms-are-stuck-in-a-groupthink-rut-this-startup-is-trying-to-get-them-out", "canonical_source": "https://www.technologyreview.com/2026/07/01/1140003/llms-are-stuck-in-a-groupthink-rut-this-startup-is-trying-to-get-them-out/", "published_at": "2026-07-01 14:35:16+00:00", "updated_at": "2026-07-01 14:54:57.156591+00:00", "lang": "en", "topics": ["large-language-models", "ai-startups", "ai-products", "ai-research"], "entities": ["Springboards", "Flint", "Pip Bingemann", "Kieran Browne", "ChatGPT", "Claude", "Gemini", "NeurIPS"], "alternates": {"html": "https://wpnews.pro/news/llms-are-stuck-in-a-groupthink-rut-this-startup-is-trying-to-get-them-out", "markdown": "https://wpnews.pro/news/llms-are-stuck-in-a-groupthink-rut-this-startup-is-trying-to-get-them-out.md", "text": "https://wpnews.pro/news/llms-are-stuck-in-a-groupthink-rut-this-startup-is-trying-to-get-them-out.txt", "jsonld": "https://wpnews.pro/news/llms-are-stuck-in-a-groupthink-rut-this-startup-is-trying-to-get-them-out.jsonld"}}