{"slug": "but-dumbo-could-already-fly", "title": "But Dumbo could already fly?", "summary": "An unnamed internal OpenAI model disproved an 80-year-old geometry conjecture by mathematician Paul Erdős, marking the first verifiably new and non-trivial mathematical contribution autonomously produced by artificial intelligence. Mathematicians consulted by OpenAI confirmed the proof meets high standards, though human rewriting was needed for clarity, and the discovery challenges the long-held assumption that Erdős' original construction was optimal. The achievement raises questions about whether the result stems from genuine model advancement or factors like search across many problems, as OpenAI has not demonstrated that previous models could not have produced the same finding.", "body_md": "# But Dumbo could already fly?\n\n### 100% pure human copium about OpenAI solving Erdős problems\n\nAnd lo, the machine thought, and thought, and thought, and one day it answered.\n\nWe finally have the first truly impactful intellectual contribution where explicit credit must be given to AI. It’s a historic moment. [OpenAI released](https://openai.com/index/model-disproves-discrete-geometry-conjecture/) a disproof of a geometry conjecture first proposed by Paul Erdős 80 years ago, discovered by an unnamed internal model. According to *Scientific American*:\n\n“No previous AI-generated proof has come close” to meeting those high standards, wrote Timothy Gowers, a mathematician at the University of Cambridge, in commentary solicited by OpenAI.\n\n“This is the unique interesting result produced autonomously by AI so far,” says Daniel Litt, a mathematician at the University of Toronto, who was consulted by OpenAI to verify the proof but is not involved with the company.\n\nThe AI’s insight behind the finding is elegant (although the proof needed re-writing by humans to be clear and up-to-standard). 
There are many far greater problems in math, but it is still very much the definition of “new scientific or mathematical knowledge” which, for many—including myself—has been the highest bar when it comes to AI.\n\nNow, “new information” is notoriously hard to define, since of course by any strict definition AI has contributed new information before (just think of all the protein structures that have come out of [AlphaFold](https://en.wikipedia.org/wiki/AlphaFold)). But this discovery does seem different in kind, in that it is:\n\n**(a) Something verifiably true.**\n\n**(b) Non-trivial or even important (at least, relatively so in its subfield of math).**\n\n**(c) Something humans had spent previous time on and failed to crack.**\n\n**(d) The AI was (reportedly) not purpose-built to solve this particular problem, but did so (reportedly) autonomously as a next-gen LLM similar to the current version of ChatGPT.**\n\nIntriguingly, the internal model succeeded by going the opposite of the expected direction. It disproved the optimality of what Paul Erdős thought to be essentially the best construction for this problem (some have suggested that the social pressure of Erdős’ authority pointed humans in the wrong direction). To put it as simply as possible, Erdős was asking: If you place a set of nodes down on a plane, how can you organize this set of nodes such that *as many pairs of nodes as possible* are an exact fixed distance apart?\n\nHere is what the original thought-to-be-optimal construction looked like:\n\nAnd here is the improvement, passing from human to post-human, in an image that will probably go down in history books:\n\nIn response to this development, many are crowing that human mathematics is over. 
Here’s a comment comparing this moment to when the game of [Go fell to deep learning](https://en.wikipedia.org/wiki/AlphaGo_versus_Lee_Sedol), which in turn heralded the modern AI age:\n\nPeople are getting extremely confident about this.\n\n## EXCEPT… A CURRENT MODEL CAN ALREADY DISCOVER THIS?!?\n\nLet’s review the implicit pitch of this announcement: That the newer internal model at OpenAI is a step up in capabilities, and therefore is becoming powerful enough to begin to automate mathematics itself.\n\nBut to actually show that scientifically, you need controls. Specifically, you need to show that previous models *could not do this*. Otherwise this could just be a function of search (and there was indeed probably a lot of search across all open Erdős problems until they got a hit). Or, there might be something uniquely easy about this problem. Or, the result could be via minor improvements in elicitation (“elicitation” means the work of getting the models to accomplish things via prompts or harnesses or even just asking in the first place). These don’t take away that AI solved something, but they would take away the implied conclusion: That the models are getting smarter and smarter at some fixed rate, and are soon to surpass humans.\n\nMeanwhile, a very good mathematician was able to get the currently available-to-all ChatGPT 5.5 Pro to reproduce the output! 
Below is this being described by one of the mathematicians quoted in OpenAI’s initial release, Timothy Gowers, who in turn is quoting the mathematician Xiao Ma, who showed ChatGPT 5.5 could do it (Ma previously made progress on [Hilbert’s 6th problem](https://arxiv.org/abs/2503.01800)).\n\nWe don’t know everything about Xiao Ma’s result, but in general the rediscovery by ChatGPT 5.5 seemed unsurprising to many key players; and, importantly, we also don’t know everything about OpenAI’s results (did they rewrite the prompt a million times, did they actually fine-tune or somehow specialize the “internal model” but still call it the same “internal model” because there are no rules here, etc.).\n\nIn reply, the OpenAI researcher Noam Brown (who leads the reasoning team at OpenAI) said that the real impact of the discovery is that the new model shifts the intelligence curve and so makes such discoveries easier.\n\nBut I read the whole transcript of the exchange which elicited the exact same result from the publicly-available ChatGPT 5.5, and basically the *only* hints given are just saying that the disproof was already known, that it was asked to generate a few profound ideas, and then asked to expand on one of those ideas. That’s a pretty minimal set of prompts that could easily be automated. Just tell the model the problem is known to be solved, tell it to generate a set of profound ideas, and then tell it to pursue each one to the end sequentially. I feel like people probably already use the models like this all the time?\n\nDoes this mean that all that training, all that work internally, went into creating this super-duper model, and all it did (pretty much) was shift the “intelligence curve” to not needing to be told the problem was already solved? 
That’s an improvement in intelligence that looks a lot like Dumbo’s feather: Look, a billion dollars later, we don’t need the feather to fly!\n\nSo if you want to be super skeptical, then all we really know for sure right now is that their new internal model takes away Dumbo’s magic feather.\n\nAnd what about ChatGPT 5.4? 5.3? You see where I’m going. I’m happy to admit there is some version that *definitely* could not, in a million years, be elicited to solve this problem. But I have no idea what version that is… and neither does anyone else.\n\nMaybe that’s what progress on intelligence really is: just feather after feather being removed. Yet if we’re just a feather or two away, why aren’t tons more math problems falling right now? Is no one looking? What’s going on?!\n\n## THIS LOOKS EERILY SIMILAR TO ANTHROPIC’S MYTHOS ANNOUNCEMENT\n\nIn April, Anthropic made worldwide headlines by claiming that their new model, Mythos, was such a cybersecurity threat it couldn’t be released to the public. And there was probably some truth to that, in that some benchmarks showed marked increases—much as this internal model is likely better on certain math benchmarks (but all these things are getting harder to measure as we enter a post-benchmark age). At the same time, Anthropic’s flagship example of a zero-day exploit turned out to be [replicable using existing models](https://aisle.com/blog/system-over-model-zero-day-discovery-at-the-jagged-frontier) by just rigorously trawling the search space. Probably both were true: Mythos was a step forward, but also, in general, serious resources are not spent trying to find exploits, and it’s impossible to disentangle the actual intelligence gains vs. 
just Anthropic devoting its massive war chest of money, talent, and compute to some attention-bottlenecked problem.\n\n## A HESITANT ALTERNATIVE MODEL TO THE POST-BENCHMARK ERA\n\nListen, I do think the world remains consistent with ever-increasing AI capabilities, to the degree that large sections of the economy end up automated quite quickly, that AI intellectual ability outstrips most or even all humans in the next few years, and (I’m most hesitant about this last part) that we rapidly end up in some regime of self-improvement that leads to “superintelligence.”\n\nAt the same time, the issue is *not* decided. It is certainly not decided by this one result.\n\nThere is still not a single explicit large-scale domain of intellectual production (as in broad categories like writing, math, medicine, science, legal advice, etc.) in which AI has truly *surpassed* quality human experts, at least over the scope of most jobs (and yes, [I include programming](https://bitsplitting.org/2026/04/01/the-beginning-of-programming-as-well-know-it/)) in the same way it surpassed chess or Go players. E.g., when it comes to writing, it’s been half a decade since LLMs could produce okay-ish prose, yet after all this progress they are still bad enough to be detectably low-quality and so regularly cause scandals.\n\nSo there remains the hypothesis that the same slow plateauing (not in benchmarks, but in real-world impact) will play out in all intellectual domains for LLMs, even math and science, just as it did for writing. 
As I wrote in “[Bits In, Bits Out](https://www.theintrinsicperspective.com/p/bits-in-bits-out)” earlier this year:\n\nLLMs can now “write a scientific paper” or “write a mathematical paper” in the exact same sense that they’ve been able to “write a book” or “write a short story” or “write an essay” for several years, all to some effect, but overall the results have been objectively mediocre given the hype, and the world is somewhat stupider, rather than smarter, at least on average.\n\nAnother alternative hypothesis is that various intellectual domains like math and programming will indeed fall in their entirety to AI, but there will be sharp distinctions based on whether the domains are verifiable or (as I think is under-discussed) searchable.\n\nThose three futures (full automation, full approximation, partial automation) would all look like extremely AI-bullish predictions five years ago, mainly because the models have indeed gotten absurdly smart. But, up close, the futures they entail still look incredibly different, don’t they?\n\n## THIS IS JUST RIDICULOUS HUMAN COPE\n\nYes!\n\nOf course it is!\n\nWe are well into the era of human cope. We barely have any sources of cope left; we are cope-less, we need an entire copium mining operation to keep up. A human mathematician managed to improve the bound? Go us!\n\nGoal posts are moving rapidly and have been for some time. But there’s also a legitimate argument for why goal posts should indeed move: these subjects are much more complicated from up close, and we are all now very Up Close.\n\nE.g., Helen Toner (ex-OpenAI board member, among many other things) [recently wrote](https://helentoner.substack.com/p/the-term-agi-is-almost-useless-at) about how even the term “AGI” isn’t that useful anymore for talking about the technology of today. 
In fact, I’m going to borrow her great chart here, as it’s a synecdoche of this all:\n\nFrom over on the right, looking out at 2026 from the vantage of 2006, it seems as if “Artificial General Intelligence” has indeed been achieved here in 2026. But if we then zoom into our current AI, it maps onto some definitions of AGI, but not all, and there are no clear lines or borders. In fact, the current definition of “AGI” is so complexified by reality that it’s not a good umbrella term anymore: it dissolves away into a dozen inter-related definitions under the hood (e.g., AI can pass the bar exam, but not even play a simple out-of-distribution video game without human-made harnesses and human interventions).\n\nSimilarly, as our notion of what “AGI” is complexifies, so too does our notion of “an AI intellectual contribution” complexify. How much elicitation was involved? Was it attention bottlenecked? Does this actually represent a new capability or can earlier models accomplish it too?\n\nYou can’t just sneer at these questions!\n\nSolving such a renowned Erdős problem is indeed historic—and an incredible accomplishment. There is no denying any of that. But now that benchmarks have saturated, and measuring upcoming models’ intelligence has become so difficult, at minimum, we should demand that organizations like OpenAI and Anthropic rigorously control for whether their new much-hyped novel discoveries can actually be replicated with previously existing models. That’s a scientific necessity. Otherwise such discoveries, *no matter how amazing*, do not actually give us information about new capabilities themselves. 
Instead, they might simply reflect a long march by already-impressive systems across attention-bottlenecked topics… of which there are more than enough to last until the IPOs.", "url": "https://wpnews.pro/news/but-dumbo-could-already-fly", "canonical_source": "https://www.theintrinsicperspective.com/p/dumbo-could-already-fly", "published_at": "2026-05-23 12:28:22+00:00", "updated_at": "2026-06-05 18:22:22.728319+00:00", "lang": "en", "topics": ["artificial-intelligence", "ai-research", "large-language-models"], "entities": ["OpenAI", "Paul Erdős", "Timothy Gowers", "Daniel Litt", "Scientific American", "University of Cambridge", "University of Toronto"], "alternates": {"html": "https://wpnews.pro/news/but-dumbo-could-already-fly", "markdown": "https://wpnews.pro/news/but-dumbo-could-already-fly.md", "text": "https://wpnews.pro/news/but-dumbo-could-already-fly.txt", "jsonld": "https://wpnews.pro/news/but-dumbo-could-already-fly.jsonld"}}