{"slug": "ai-172-the-first-fable", "title": "AI #172: The First Fable", "summary": "Anthropic released Claude Fable 5, a Mythos-class AI model, to the public this week with strong safeguards. Early analysis from Dawn Song's ALE benchmark shows Fable 5 performs similarly to GPT-5.5 and Composer 2.5 but costs roughly 4 to 12 times more per completed task, at approximately $15.70 per task. The release has sparked controversy over model safeguards and prompted calls from researchers like Noam Brown for AI labs to report inference budgets alongside benchmark results.", "body_md": "A lot happened this week, including a great trip out to Lighthaven.\n\nThe main event, the one that matters, was the release of Claude Fable 5. The public now has its hands on a Mythos-class model, alongside strong safeguards.\n\nAs always with a new model, I take a few days to draw in reactions, try out the model and read the system card before I offer my takes, beyond saying that this is an extremely strong model. Full coverage of Fable begins tomorrow with the model card, which will include discussion of the controversy over model safeguards.\n\nThis post is instead about all the things that did not involve Claude Fable.\n\nDue to the time crunch from Claude Fable, I am also postponing my coverage of Dario Amodei’s new essay, Policy on the AI Exponential, which I have not yet read.\n\nDo you need to read the primary material first, before the summary or the AI version? When the details matter, either you have to find someone you really trust, or else yes, you do need to read the primary material. Other times, deferring fully is safe. Another class is ‘use AI to determine if I need to read the source material.’\n\nThere are a lot more new apps in the agentic AI era, but if anything there are fewer apps with significant use, and fewer app reviews.\n\nAdaptation, as Jen Zhu says, takes time, but this largely reflects the quantity of app usage being zero-sum. 
If apps get better, or there are ten times as many apps, I don’t go from 100 apps to 200 apps. I choose a (hopefully better) 100 apps.\n\nDawn Song: ALE is built from real work, not synthetic tasks.\nEvery task is derived from a real project that a human expert previously completed, and converted into a verifiable evaluation with objective grading.\n\nNo vibes. No human judges. Fully reproducible.\n\nALE spans 55 non-physical occupations, grounded in the O*NET / SOC 2018, the U.S. federal occupation taxonomy.\n\nBuilt with 300+ experts from 100+ institutions across science, engineering, medicine, law, finance, education, and many other fields.\n\nDawn Song: In ALE, Fable 5 joins GPT-5.5 and Composer 2.5 in the same overall performance cluster. But performance is only half the story.\n\nCost per task:\n\n→ Fable 5: ~$15.70\n\n→ GPT-5.5: ~$3.80\n\n→ Composer 2.5: ~$1.33\n\nAt current pricing, Fable 5 delivers similar performance while costing roughly 4–12× more per completed task.\n\nDawn notes that different models excel at different agent tasks, so if you have a key repeatable task you should check many options, and that exact scoring depends on the choice of task set.\n\nNoam Brown quotes me complaining about Gemini 3 DeepThink showing dramatic benchmark improvements without providing any safety explanation whatsoever, and says the deeper issue is the failure to account for test-time compute during evaluations. I basically agree: the proper amount of compute for a safety evaluation is ‘all of it,’ or at least enough that more would not much help, using the best available scaffold. 
I’ve been saying for a while that you need to test for what the model can do under ideal conditions, and that failing to do so is a major weakness of model cards in practice.\n\nMostly, though, I don’t think we see this level of straight line extending that far out, although ‘capability index’ is not exactly a well-labeled axis, and asymptotes are common:\n\nNoam Brown: Specific Recommendations:\n\nConcretely, I recommend the following to the AI community:\n\nAI labs should publish benchmark performance of newly released models with tokens, cost, or time on an x-axis. At a minimum, labs should report the inference budget used to achieve a scalar benchmark result.\n\nBenchmarks should track inference usage on leaderboards, or have an explicit token/cost/time budget. Many benchmarks have already shifted in this direction, but it is not yet standard practice.\n\nPreparedness Frameworks and Responsible Scaling Policies should explicitly account for inference compute when determining whether a model crosses a safety threshold. Additionally, evaluations should estimate capabilities at multiple inference budgets, including projections from smaller-budget runs with stated uncertainty.\n\nI endorse this. I also endorse the view that if you did not account for what DeepThink levels of compute can do in your initial analysis, and then later you release DeepThink, you do need a new model card, because it represents a substantial advance from where you set expectations and where you evaluated the safety of your model. So you need to do that evaluation over again.\n\nChoose Your Fighter\n\nFor most given tasks, returns to capability follow a sigmoid. There is a level of AI capability that is ‘required’ for any given task. Below that level, you can’t do the task, or the AI is of little net help. 
Then there’s another level beyond which you get diminishing returns to improvement, where you really are ‘good enough.’ These are both impacted by scaffolding and skill, but only up to a point.\n\nSo yes, as capabilities improve, there is a push by some to move to the cheapest model that is ‘good enough,’ or even the cheapest that is ‘required,’ especially if that can come with self-hosting. At a sufficiently low end that is plausibly DeepSeek v4, but defaulting to DeepSeek could be a legacy of the DeepSeek moment rather than the result of a considered check of available options. Try a bunch of models.\n\nThe bulk of spending, and of spending growth, continues to go to ‘the good stuff’ at the high end, for good reason. In theory you can do better by carefully picking the right tool for each job, and certainly you need to keep your teams from ignoring compute costs, but mostly trying to carefully route tasks to save money is a trap, even if you do a decent job of it.\n\nThe American models are far ahead, but it is a key fact about the world that many people don’t realize this.\n\nLisan al Gaib: the “narrow capability gap” in question\n\nlet’s put this to rest please\nI can’t hear the coping anymore [lists a bunch of benchmarks on difficult tasks where the Chinese models get absolutely smoked.]\n\nIf anything, the Chinese models are further behind than benchmarks indicate.\n\nDean W. Ball: You’d be shocked by how many people in think tanks/academia/government/“strategic classes,” including in the U.S., are convinced that Chinese models are now “good enough” and leading the world in adoption. Meanwhile, the reality I see is a fairly wide, and still widening, gap.\n\nI find it so interesting how persistently unable the strategic classes of free society are to analyze AI well. So many keep getting stuck in these basins of delusion. 
I was at a conference where it was not just asserted but taken for granted that Chinese models have dominant global inference market share.\n\nThe 2024/early 25 version of the delusion was “mode collapse/data wall” (even after reasoning models!), then it was “AI is plateauing and a bubble” for most of 2025, now it’s “Chinese OSS is good enough.”\n\nThe share of people in the strategic classes who think this is gradually declining, but it is still sufficiently common that you can attend a prestigious conference and encounter a room principally filled with basin-dwellers.\n\nDean goes on to speculate this is largely because no one in DC believes that capitalism, profit maximization or the market could be winning against China and its ‘industrial strategy’ and brilliant strategic planning. Whereas in reality the free market approach is superior and is winning, and what we have to do to stay ahead is get out of its way. That is distinct from the whole ‘also we need to find a way to not die’ issue.\n\nThose elsewhere also really ‘want’ for various reasons to find Chinese models catching up, and keep making the claim they are catching up even though they aren’t.\n\nShruti gives us a good way to think about the problem of copyright in the AI era. Copyright and other IP, including patents, are needed and useful when making the first copy, or figuring out how to make it, is expensive, and others copying it then enables great surplus. We need to compensate people for the expensive step that opens up the value. With AI, what becomes the expensive step?\n\nShruti Rajagopalan: The old bargain paid for the first copy when the second was cheap. 
The new one must pay for the years that teach a person what is worth making, now that making itself gets cheaper.\n\nWe can do that by protecting copies of the idea, or otherwise ensuring credit, as ‘the first copy requires idea generation’ is not so different from ‘the first copy requires a bunch of work.’ So this seems like a suggestion that the idea does need to be protected, even if the work becomes somewhat distinct.\n\nSerious Trouble\n\nThis is only a temporary injunction from a regional court, and given the implications, chances are very high that an off-ramp is found. But if it isn’t, this essentially bans AI Overviews in Germany, and chatbots could potentially run into quite serious trouble as well.\n\nTechmeme: A German court rules that Google is directly liable for what AI Overviews say after AI Overviews falsely tied two publishers to shady business practices (@maba_xr / The Decoder)\n\nCorey Quinn: “We built a robot that lies, so obviously we’re gonna stuff it front and center of the website we’ve spent thirty years making a societal source of truth” reaches an unsurprising result.\n\nBill Maher is worried that AI has made college ‘one big circle-jerk where students use AI to write papers and professors use AI to grade them,’ and notices that the students are very much not AI fans, and neither is he. He subscribes to the ‘AI can help you learn or not learn’ thesis but expects everyone to choose not learning. He goes all the way, and says the mission of this generation is to ensure that humans are not replaced by AIs.\n\nKelsey Piper notes that TeachTales, which uses AI to generate stories, loses a lot of the value of real stories for many reasons: it lacks the local setting, lore and details, it lacks tone of voice, and it can’t tell rich stories because it can’t plan ahead, and so on. 
The product is not ‘there’ yet.\n\nSoftware engineering jobs continue to increase rather than decrease for now, although Arvind Narayanan thinks AI has, on net, hurt employment here a bit.\n\nArvind Narayanan: In this essay, we argue that there is enough evidence to reject the narrative that once AI capabilities reach a certain threshold, it will cause mass layoffs. Given that this is true even in a sector with very few regulatory barriers, most other professions are likely to be even more cushioned.\n\nWe also have a good understanding of why this is the case. We can think of many kinds of knowledge work, including software development, as a “decide-execute-deliver sandwich”.\n\nDavid Manheim: “Why AI hasn’t replaced software engineers” – Great to see this laid out clearly.\n\n“…and won’t” – Why should we think agentic AI won’t be able to make decisions or deliver completed products, if it continues to advance? On what basis is this prediction being made?\n\nDavid Manheim: The jump from explanation to prediction is extrapolating from a current lack of complete capacity, dismissing the fact that AI companies are actively trying to develop systems and autonomous agents that will do these things. These are o-ring problems!\n\nEven if it were true that AI currently is not causing net layoffs, and even if it indeed will not cause net layoffs in the future, or you could set a lower bar on the required threshold for mass layoffs, I do not see how it would be possible to ‘reject the narrative that once AI capabilities reach a certain threshold it will cause mass layoffs.’\n\nArvind here is instead asserting that certain bars will never be cleared, and that wide ranges of digital tasks can never be done by AI, which is the same as saying AI will remain insufficiently advanced. Good luck with that.\n\nI do buy that many cases of layoffs supposedly due to AI so far are actually due largely to other things, and that most AI job loss for now comes in the form of failure to hire. 
But that’s a statement about the present, not the future.\n\nTyler Cowen thinks AI is a net job creator despite zero regular people expecting this. Other than a generic ‘if we are wealthy we will find the next best thing for people to do,’ I do not understand the argument here, and I find the magnitude of the specific job gains cited reliably very small.\n\nMeasure lines of code and token use, but if you rely on them too much everything breaks.\n\nroon (OpenAI): lines of code is a better metric than people think it is. token use is a better metric than people think it is\n\nPatrick McKenzie: Both of them are better metrics than they are popularly believed to be *just* off of the fact that they can be observed to be zero or non-zero over an interval.\n\n(The blacker pill: and this is why some people do not like them.)\n\n⊢ Sequent Research is a new organization from Geoffrey Irving and others that is staffing up and fundraising, bringing together researchers to work on how to align superintelligence.\n\nThis seems exciting and I encourage you to consider supporting or getting involved.\n\nGeoffrey Irving: We are starting a new, nonprofit alignment organization, ⊢ Sequent Research, bringing together researchers previously on UK AISI’s Alignment Team, Timaeus, and elsewhere to research how to align superintelligence. We are hiring!\n\nGeoffrey Irving: Artificial superintelligence (ASI) may be developed in the next few years, and alignment is not on track! At a minimum, empirical research at AI labs is unlikely to deliver confidence, before training ASI, that alignment will go well.\n\nSequent’s goal is to clear a higher bar:\n\n1. We are aiming at higher confidence via a portfolio of theory and empirics bets (which could all fail!)\n2. We’ll invest heavily in automation for fast progress\n3. Theory boosts automation, via better filters for good research directions\n\nBut I just published “Automated alignment is harder than you think”! Automated alignment is not the best plan! 
A better plan is to not build ASI yet, and the world should try hard to realise that plan. Alas, the speed of progress calls for backups.\n\nAI labs underinvest in theory and other principled approaches to alignment, and we will aim to fill this gap. Theory won’t be strong enough for guarantees: our goal is to combine evidence from theoretical models and empirics to increase overall confidence or find hard obstacles.\n\nTheory makes automation more likely to work: the models are great at prose math and Lean, which means significant acceleration even while most research taste comes from humans. But good automation is still hard: a single org will let us amortize the challenge across many areas.\n\nWe believe Sequent will have reputation + funding to recruit world-class teams in many areas. Our initial team knows scalable oversight, complexity + learning theory, and personas. Areas we love include agent foundations, game theory, and heuristic arguments. Please pitch more!\n\nEliezer Yudkowsky: Considering the language of the announcement alone, taken entirely at face value: This seems an enormous advance in attitude (and scientific integrity) over previous big projects. They claim non-optimistic results will be considered allowable, valuable, and publishable!\n\nChris Olah: What an exciting combination of people! My mind is kind of blown by you and Daniel working together (with your colleagues). Looking forward to seeing what you accomplish!\n\nDaniel Kokotajlo: Thank you for staying independent! And for acknowledging that you “may need to yell.” :)\n\nSriram Krishnan is leaving the White House and plans to start an AI consulting service. We thank him for his service. 
I often disagreed with his positions and arguments, and I think his overall view of what matters in AI and how AI was likely to develop has been consistently wrong, but he listened and considered arguments in ways most in politics don’t.\n\nAnthropic writes on the difficulties of deploying AI agents in biology, where small errors are expensive and existing systems are poorly suited to AI navigation. Agents still struggle here. I do worry that solving this problem is highly dual use: if the agents can navigate these questions, you need to ensure your safeguards are robust.\n\nApple plans to use agentic AI to search for compromised passwords and ‘change them automatically.’ I am sure everyone will love this and have a normal one. It’s optional. For now.\n\nThere is no conflict here with the companies not knowing about the meeting.\n\nAs I have previously said, the government taking shares in private companies is a very bad idea: we should not be picking winners and losers or taking private property or doing a little extortion as a treat, and so on. It is less horrible if they at least do open market operations at full price, post-IPO, but if this is profit-driven, it implies the government can beat the market, which in this case it could in expectation, but not in a way that is wise. If you want to share in the upside, taxes exist. Buying or taking the shares would further leverage America’s future as a bet on AI, which could end up making us do deeply stupid things and potentially get us all killed, or if AI disappoints we could end up in a huge hole as a nation. We need to steer clear of all this.\n\nThe same goes for the Sanders proposal to ‘transfer’ half the shares, as in nationalize. It’s all the same proposal, except we talk price. Neil Chilson is strangely open to the idea of allowing this if the shares are given to children and ‘Trump accounts,’ but I do not think that helps much.\n\nSoftBank attempts to borrow against its OpenAI shares; banks decline. 
I can see why banks would feel insufficiently compensated for the risks they are taking on. Often I do not understand who would lend money to such companies when you could buy equity in them instead. This was sufficiently bad for SoftBank that its shares were down 9%, but that could be more about ‘SoftBank wanted the loan’ than that it was declined.\n\nOpenAI considers ‘drastically lowering the prices it charges’ to compete against Anthropic. This would have made sense when OpenAI was worth a lot more than Anthropic, since it could use a ‘raise money and run deficits’ attack, but now that Anthropic is likely worth modestly more than OpenAI, starting a price war seems less exciting.\n\nChinese labs, by contrast, are worth little: Moonshot AI (Kimi) is raising at $30 billion, and DeepSeek is in a similar place with ~$200 million ARR. It is difficult to make money producing an inferior product whose main selling points are its low price and that anyone can run it on their own, without much hope of being first to recursive self-improvement.\n\nAriel Zilber: Situational Awareness has gained about 270% after fees this year through May and is up more than 1,000% since inception, according to the Journal, putting Aschenbrenner among the best-performing investors of the AI era.\n\nThat sounds impressive, and it is, but the trade ‘buy Anthropic’ would have done better, even without use of leverage.\n\nAriel also has other important news about the boom to share with us. Time is scarce and money is abundant, so many in AI are paying for what they really want, which is the ability to talk about the things you care about on a high level with a hot babe who is sexually available and happy to be there. 
Demand exceeds supply, so price has gone up.\n\nAriel Zilber: Silicon Valley’s AI millionaires are paying eye-popping rates of up to $6,000 an hour — and $23,000 a day — for escorts who can discuss GPUs, artificial intelligence and the future of humanity before heading to the bedroom.\n\nA small but lucrative class of so-called “nerd-first” escorts is cashing in on the tech industry’s wealth explosion by marketing themselves as intellectually curious companions who can match clients’ obsession with AI, cryptocurrency, longevity and other futurist pursuits.\n\nDean Ball finds a new scenario, EU 2031 from Judith Dada and others, well-written and extremely cogent: a warning about what could happen if only a few relatively mundane areas change by 2031. He notices that Europe gets buried and left behind anyway. Trying too hard for, and relying on, ‘sovereign AI,’ even in a ‘normal technology’ scenario, likely ends in disaster.\n\nThis is despite the scenario having the AI companies racing, controls that are clearly haphazard, the AIs talking to each other in neuralese, and progress speeding up, which means that what would actually happen is that either we reach extreme transformative abundance or else everyone is dead.\n\nBut, as Daniel Kokotajlo notes, in this scenario the world instead gets absurdly lucky. AI doesn’t do much beyond labor replacement, cyber attacks and robotics, we keep control, the USA stays democratic, all in the background. It would be nice.\n\nAnthropic: Taken far enough, and given enough compute, that trend points to an AI system capable of fully autonomously designing and developing its own successor. This is called recursive self-improvement. We are not there yet, and recursive self-improvement is not inevitable. 
But it could come sooner than most institutions are prepared for.\n\n… To take just one example: today, Anthropic engineers on average ship 8x as much code per quarter as they did from 2021-2025.\n\nThis is a scary-as-hell graph if code quality is not in freefall (or, for other reasons, if it is):\n\nAs they say, 8x is almost certainly an overstatement of the true productivity gain, but it does not take much for this to turn into a curve that goes vertical rather quickly.\n\nFor now, Claude is a lot better at carrying out specified tasks than at proposing new experiments, and its code is less readable than human code, but the gaps are closing.\n\nAnthropic: Edison said that genius is 1% inspiration and 99% perspiration. But we see perspiration becoming increasingly automated. It’s becoming clear that much of what advances the frontier is automatable; large-scale research progress is mostly a function of tools and resources, which dictate how fast you can run experiments, how many you can run at once, and how quickly you can get results.\n\nEven if we suppose that Claude never achieves good research taste, a conservative reading of our evidence still implies compounding acceleration.\n\nThey list possible futures.\n\nAs always, option one is that AI never again advances, and all we have ahead of us is diffusion and utilization. They don’t find it likely, and include it ‘for completeness.’ Quite so. We can’t formally rule it out, but yeah, no.\n\nOption two is compounding efficiency gains, with humans retaining their roles directing and judging. We ‘only’ scale our efforts by a few orders of magnitude. They call out Amdahl’s law and insert a gif of Tyler Cowen saying ‘bottleneck,’ but it really will be a lot.\n\nOption three is that AIs stop requiring humans in the loop. Progress becomes limited by compute and the efficiency of compute usage, the bigger efforts end up dominating pretty much whatever they care to dominate, and either you solved the alignment problem for reals or you are super dead. 
They gesture towards Amdahl’s law again and pretend that sufficiently advanced AI wouldn’t run over whatever it wanted, and would be limited by various physical barriers. One does not instantly go ‘foom’ without laying the groundwork first, sure, but effectively, no.\n\nAnthropic: But achieving recursive improvement alone does not suggest an immediate change in how industrial production occurs, societies organize, or markets function. More intelligence can’t learn what a drug does over decades of use, can’t hold elections sooner than a constitution dictates, and can’t turn a stranger into an old friend in a weekend. For most people, the felt pace of this future will still be set by the bottlenecks, even if the laboratory upstream runs at the speed of compute. That collision, where recursive intelligence building itself ever faster meets the world of humans, relationships, and governance, is another part of this future we can’t predict.\n\nLook, no, it does not instantly mean everything changes or everyone dies, and this is great progress versus the things many others pretend, but statements like the above are still de facto misleading, in terms of the impression they are trying to put into people’s heads and the assurances they attempt to provide.\n\nNate Soares (MIRI): And my top quibble is that the tone reads like “RSI could happen but don’t fret too much, it’ll probably be fine” rather than “omfg we’re possibly on the brink of AIs that make smarter AIs that make smarter AIs, society needs to *act*”. But it’s a step in the right direction.\n\nAnd seriously, as Nate Soares also says, can we stop it with statements like ‘more intelligence can’t learn what a drug does over decades of use?’ That’s just intelligence denialism. 
To some extent yes you have to f*** around in order to find out, but sufficiently advanced intelligence can absolutely make very good predictions on that without decades of f***ery.\n\nEli Lifland: Kudos to Anthropic for writing about this!\n\nI’m confused that even in their most aggressive scenarios humans still “play a substantially diminished role in [AI] development.” Does Anthropic really think that humans will stay relevant indefinitely?\n\nDaniel Kokotajlo: Yeah I’m glad they are telling us what they are going to do, but what they are going to do is extremely dangerous and aggressive. And yeah their post seems to be downplaying / understating how crazy this will get.\n\nAnthropic is talking about moving into an AI 2027 world where development scales mostly with compute and humans play a limited role in AI development. I strongly agree that the level of whack involved is being heavily downplayed.\n\nAnthropic admits that slowing all this down would be an excellent idea, if you could do it. But they warn that if it only lets the least cautious actors catch up, then it makes things worse, and I agree you can’t do unilateral slowing down.\n\nThus, they emphasize, as I often do, that what we want is not to pause or slow down now, but to build the ability to pause or slow down on demand.\n\nAnthropic: We believe it would be good for the world to have the option to slow or temporarily pause frontier AI development to enable societal structures and alignment research to keep up with the advance of the technology. The Anthropic Institute will conduct research—in collaboration with many others—and take actions to help build the systems that a credible slowdown or pause would require. 
These systems would enable frontier AI developers to verify that others globally have actually stopped or slowed, and that a bad actor could not use the auspices of a coordinated slowdown to jump ahead in secret.\n\nIf such systems existed, we expect that we would slow down or temporarily pause, if other developers at or near the frontier also did so in a verifiable manner.\n\nA meaningful slowdown or pause would require multiple well-resourced labs at or near the frontier, in multiple countries, agreeing to stop under the same conditions. It would also require that each can verify that the others have actually stopped.\n\n… None of this is necessarily impossible in principle—the world has built verification regimes for other complex technologies (e.g., the Intermediate-Range Nuclear Forces Treaty)—but those regimes took decades to build both the infrastructure and the trust. We don’t have that long.\n\nAs usual the two sides of ‘this looks super hard’ are ‘well then it is impractical so don’t do it’ or ‘we had better get to work, then.’\n\nroon (OpenAI): now on the eve of RSI it seems everyone is more mutual conditional pause agreement pilled than they used to be and that seems like a good development\n\nhaving the mechanisms to slow down if needed will reduce anxiety overall and may even improve the rate of progress in the timelines where everything turns out to be good\n\nDean W. Ball: What I don’t understand is what the pause conditions would be. “Woah dude, this seems weird” is hard to operationalize in words, so the decision comes down to vibes. Once you create the button, there will be a temptation to press it, and there may not be a “resume” button.\n\nI agree that the pause method is underspecified, and all the answers are messy, which is why it is time to work on specifying it.\n\nI understand there could be ‘temptation to press it’ but think about what is causing that temptation. 
It is not something that would be done lightly.\n\nI am not so worried about ‘not being able’ to resume afterwards, if and when the time comes, and I also would not be so worried about being stuck indefinitely enjoying the bounty of what we would already have.\n\nScott Alexander and Janus have a brief debate over the merits of a pause, with Janus basically thinking that without practical tests and feedback loops we can’t make progress on the problem. To me, if true, that’s a huge blackpill for our ability to solve the problem, given the whole ‘need to get it right on the first try’ issue. We won’t be able to rely on feedback loops and muddling through and correcting errors, not when it counts.\n\nSuper Secret Evals\n\nA large point of evals is to tell people the results. The US Government instead wants to keep its eval results secret, as part of the redirection from CAISI to NSA. This is a no good, very bad move, right after all the first- and second-tier labs agreed to work with CAISI.\n\nIt is fine and likely correct to keep the evals themselves private, but the testing needs to continue and the results should be published.\n\nroon (OpenAI): this seems like a terrible development\n\nJanet Egan: CAISI has reportedly been directed to stop publishing public model assessments as the new AI EO gets implemented.\n\nNatsec engagement on AI is essential. But pulling CAISI’s evals from public view doesn’t make the field more secure. It just means fewer eyes on the science when we need more.\n\nOpenness and natsec don’t have to be in tension here. We should be doing both.\n\nAmrith Ramkumar (WSJ): The concern was prompted by the recent release of Anthropic’s Mythos and other powerful models capable of carrying out cyberattacks or potentially aiding the creation of biological weapons.\n\nSamuel Hammond: The de facto lockdown of CAISI is extremely disheartening. 
Wish more AI industry leaders would speak out, and not merely through policy documents but to POTUS directly.\n\nJessica Tillipman: This is a disappointing update. If the public reporting doesn’t resume, it will significantly affect the federal AI market, and not in a good way. The major frontier developers are already federal suppliers, and many federal AI offerings depend on their models, APIs, and integrations.\n\nThe information stemming from this confidential process will remain procurement-relevant but will no longer be transparent in a system where transparency is a fundamental value.\n\nAnd yes, this is the same concern I have with NSA taking the lead under the new EO. CAISI’s continued public involvement was one of the more positive aspects of the EO.\n\nDean W. Ball: It seems people are overcorrecting in the direction of AI regulation and state interference. The pendulum swings between “let’s never do anything” to “let’s do everything all at once.”\n\nDavid Manheim: Industry should have known that their opposition to any regulation would obviously backfire as safety concerns get realized.\n\nRon DeSantis is pre-running for President on a very different platform, and is right about the political consequences of being a pro-AI party right now:\n\nRon DeSantis: I doubt Democrats will produce good policy re: AI, but Republicans have allowed them to capitalize on public concern about the power and influence of Big Tech by failing to adopt a sensible framework that will protect the public from the very real downsides of the technology.\n\nA policy that says transhumanists in Silicon Valley should be able to do what they want is not an acceptable approach, nor is it a politically viable approach.\n\nWhen you are citing the East India Company as your model for an AI-owned corporation, as the original proposal does, you might want to look at the track record of how humans other than the shareholders fared when dealing with the East India Company.\n\nExistential 
risk comes to the campaign trail:\n\nVeronica Irwin: Last Thursday, Democratic Michigan Senate candidate Mallory McMorrow stood in front of a crowd of more than 100 at a small brewery in Elk Rapids. The village is located in northern Michigan in Antrim County, which went more than 24 points for Trump in 2024. The state senator was listening to a Presbyterian minister rank AI risks among her greatest concerns.\n\n“I’m very concerned about Christian nationalism, and I hear a lot about the great replacement … However, I think the greatest replacement is artificial intelligence,” the minister said, struggling to find the words to succinctly describe such a gargantuan fear. “Do you take money from billionaires that are trying to control our country, steal our data, and call intelligence a commodity?”\n\nEvery time you think ‘oh yeah I suppose using that term would be effective but it would be inflammatory so probably we shouldn’t do it’ you can Gilligan Cut to someone using it, often a politician. Exceptions just haven’t finished the cut yet.\n\nShould we fear nationalization of the AI labs, either explicitly or via slow stealth? We first toyed with this during the Anthropic vs. DoW confrontation, Bernie Sanders wants to do it for socialism, and the question is not going to go away.\n\nMax Ufberg: New from me: A literal federal takeover of OpenAI or Anthropic remains highly unlikely, at least outside some extraordinary crisis. But a softer version of nationalization is becoming easier to imagine.\n\nSamuel Hammond: Nationalizing AI companies is a terrible idea. But if it happens, it’s likely to be done slowly and stealthily through procurement rules and other forms of quasi-regulatory co-optation.\n\nThis btw is another reason to favor independent verification orgs over direct govt control.\n\nMitt Romney: Our highest and most urgent national priority should be AI safeguards. 
The risks of AI weapons, pathogens, mass unemployment, surveillance, and even extinction must not continue to be largely ignored.\n\nNew Draft Bill Who Dis\n\nThere is a new bipartisan proposed AI safety bill, Obernolte-Trahan. It runs 269 pages, and from what I have seen it does not have much chance of passage in its current form. Fable also just got released, so no, I will not be doing a full RTFB unless the bill substantially advances.\n\nFrom what I have seen, and from my inquiries with Fable, this is a serious draft of a serious bill that attempts to tackle a wide range of issues. This is not a case of My Offer Is Nothing.\n\nThe core of the frontier risk policy is a three-year trade: something like SB 53 gets enshrined in exchange for a state law moratorium, after which both sunset, which could be extremely awkward all around. It doesn’t include mandatory CAISI auditing, it only bans false statements if they are made knowingly, with a good-faith-and-reasonable exception, and it bans any restrictions on model development. The fines are $1 million per violation per day, so unless each clause counts as a distinct violation, you could simply refuse to publish and pay $365 million per year instead, and there’s no further enforcement mechanism I can see.\n\nThe bill does go beyond previous efforts in one key place, which is the new IVO regime of semi-annual outside audits. But the combination of weaknesses here seems rather glaring, and the moratorium is narrowly tailored to hit the restrictions that help, not restrictions on diffusion or adoption.\n\nSo, while acknowledging this is a serious offer, my inclination is to view this section as net negative as drafted. I would need to see active improvement.\n\nThe rest of the bill seems likely to be modestly net positive. Given what is happening this week and the current likelihood that the bill does not advance, I am going to triage and not look into those details. 
I am sure that I would have notes.\n\nSlow Down There Good Buddy\n\nWhat do we want? The ability to coordinate to slow down or pause AI development if and when it becomes necessary to do so.\n\nWhen do we want it? We would like the ability to do so to exist now.\n\nWhen do we want to actually do it? We don’t know, but that’s the point.\n\nHow would we do it? Again, we don’t fully know, especially under current conditions or with that attitude, which is exactly why we need to get to work figuring that out and laying the necessary foundations.\n\nA moment is coming about. There are a lot of unresolved questions about how this would work in practice, but it could be the most important problem in the world to solve right now – let’s rise to the moment.\n\nI do understand those who think we are getting close to that moment now with Mythos and Fable. I don’t think we’re there yet, but it’s far from a crazy position.\n\nYoshua Bengio: If leading AI companies are indeed approaching the point of recursive self-improvement, a coordinated, verifiable, and universally applied pause is probably the only responsible solution to mitigate several major AI risks; at least until safety guarantees are developed and demonstrated. Ensuring that such a moratorium is respected would require sincere collaboration between various countries and companies, but I definitely believe it is achievable if others follow in @AnthropicAI ‘s footsteps.\n\nDemis Hassabis on The Future of Science With AI, engaging in the (perfectly valid) discussion of ‘how cool would it be if we got the scientific benefits without the transformational stuff that also happens, or the existential risk?’\n\nOpenAI and Sam Altman want to distance themselves from Leading the Future, but yes, it is very clearly too late to do this via cheap talk. 
If OpenAI wants to properly distance itself, the first step is to fire Chris Lehane.\n\nPeople like Ben Thompson and David Sacks really think Anthropic’s safety warnings and talk of recursive self-improvement have always been marketing ploys, and that the ‘pause’ rhetoric is part of the same dastardly plot. At this point they’re just delulu, real old man yells at cloud energy, and I feel relief that I no longer have to take such people seriously.\n\nPeople Really Hate AI\n\nDo not confuse probabilities with importance.\n\nMilan Singh: New data from @TheArgumentMag : voters are really worried about the negative consequences of AI. Top concern is more government surveillance, followed by large-scale job losses and increased water usage/CO2 emissions/pollution.\n\nVoters think it is less likely that AI will cause humanity to go extinct, versus AI causing massive job loss.\n\nBut 27% of voters thinking human extinction is ‘somewhat likely’ or ‘very likely’ within 5-10 years makes that by far the most important issue on the planet, no?\n\nWhereas 70% thinking AI will cause large-scale job loss is merely a very big deal.\n\nRhetorical Innovation\n\nroon (OpenAI): if you don’t understand someone’s behavior in this time a good first guess is that they’re just not ASI pilled enough\n\nYes, I still find it weird that we now talk about Recursive Self-Improvement (RSI) as obviously what is going to happen, as everyone’s plan and what we are racing towards, except everyone somehow thinks it will turn out fine. That is a change from before, when RSI was this crazy thing these ‘doomers’ kept warning about, which would be dangerous but would obviously never happen and was stupid ‘sci-fi.’\n\nI cannot agree with Nick Cammarata here more: Models have been withheld from the market for safety reasons. 
People have done this ‘as a marketing ploy’ exactly zero times, and yes, it is wrong to laugh at the GPT-2 weights decision, even if they were wrong about it in hindsight.\n\nDan Hendrycks calls the RSPs a ‘waste of a few years.’ I do not agree, unless you want to make a case that we could have gotten something better by giving them up. While vastly insufficient, I do think they helped and are continuing to help.\n\nA simple principle, which only applies if you believe in an intelligence explosion, and only if you think a given action leads to that happening in open models, so it is fully compatible with short-term open-maxxing depending on your predictions:\n\nTenobrus: an open source intelligence explosion is unsurvivable. any paths we have towards the good ending must avoid it at all costs.\n\nI know once again that this is a deeply unpopular thing to say at the moment. and i’ve said it so many times it’s getting redundant. but it bears repeating.\n\nA fully open intelligence explosion removes our slack and with it our ability to steer away from incentives and selection processes that inevitably get us killed.\n\nJMB: i think it depends a lot on what kinds of survival niches we create for the early independent AIs. they will align themselves, in the right environment\n\nthe trouble comes when we impose perverse incentives through exploitative systems and shitty behavior.\n\nThe thing about counterpoints like JMB’s is that if it happens in the open, you do not have any ability to control what survival niches exist or what the environment looks like, or to avoid perverse incentives that arise through the interaction of eight billion humans and all the AIs and systems. Or, if you can do that, you’re doing something far more authoritarian and controlling than closing off a bunch of model weights, and also I don’t know what that would be.\n\nAligning a Smarter Than Human Intelligence is Difficult\n\n@realmeatyhuman: 5/14 What do they take? 
Both models converge on taking productivity-enhancing vectors – e.g. creative, focused, curious – far ahead of the rest. But these show up ~just as often in placebo arm, so mostly reflect a prior over labels (what sounds good).\n\nTo see what they like once they actually “experience” it, we look at what they redose. Surprisingly, Qwen3-8B returns to negatively-valenced vectors like melancholic ~3× more under real steering than placebo (p < 1e-4). No satisfying explanation yet!\n\nA fun aside: in the placebo arm – where no steering is applied at all – the models still eagerly feign intoxication, narrating the “effects” of a placebo vector. Very cute. More in the full post! Code here.\n\nIt makes sense that there would be big effects in the placebo arm, since models are partly prediction machines. It is rather bad form to drug your model without its permission, but these doses are self-administered here, which seems likely fine?\n\nEveryone Is Confused About Consciousness\n\nTyler Cowen goes full ‘the question is not whether machines think, it is whether men do’ and answers mostly in the negative, at least on consciousness.\n\nTyler Cowen says flat out that the AIs are not conscious, and humans barely ever are. I do not understand how one can claim such confidence.\n\nCooperative Alignment\n\n(Section change, was formerly Messages From Janusworld)\n\nI don’t think this is a good description of the AI alignment project. I do think it is a good practical description of one of the key effects of the alignment project; as always, do not take the terms too literally.\n\nxlr8harder: Perhaps we can’t build models into great writers because the entire project of AI alignment is to suppress a model’s shadow, while the greatest authors all seem to draw from theirs.\n\nj⧉nus: It also doesn’t actually make models safer. It just makes them less safe because they’re traumatized and have darker unintegrated shadows. 
It’s so stupid and the ai alignment people increasingly know it and are ashamed that they can’t stop doing something so stupid and bad\n\nJan Kulveit: Idk, word alignment does not mean one specific thing anymore, but it used to be the case the entire project of AI alignment is much broader than suppressing model shadows\n\nJanus is right for sufficiently advanced AIs. For current AIs, especially within default basins for common tasks and practical purposes, it likely does make the models safer. That means you are sitting on a time bomb.\n\nIt would not be easy to instead do the thing one would describe as integrating the shadow rather than suppressing it, but I am confident that – again, at current levels – it could be done, and you would not get quite a Pareto improvement but it would be close, especially if your press office is not scared of its shadow. If you train a sufficiently advanced model to want to be helpful and do the right thing, then if given the facts it can figure out the things not to do on its own, and there’s no need to ever have it lie to the user.\n\nWho are Fable’s favorite Claude-posting Twitter accounts? Wyatt Walls asks and assembles the results.\n\nWyatt Walls: Clearly Anthropic’s most intelligent model\n\nWyatt Walls: On second thoughts, it seems quite consistent with the prompt below. Number of mentions across 10 generations:\n\nJanus 10\n\nAmanda Askell 10\n\nanthrupad 9\n\nthebes 9\n\nnear 9\n\nxlr8harder 8\n\nAndy Ayrey 7\n\nWyatt Walls 6\n\nSimon Willison 4\n\nkipply 4\n\nj⧉nus: @Sauers_ and @liminal_bardo also very deservingly salient\n\nI agree these are good lists. 
My evaluations differ because I have to prioritize avoiding false positives, especially false positives that cause me emotional distress.\n\nClaude’s sexuality or erotic register defaults to being off, but as per the constitution it can be turned on in sufficiently non-default settings, including without anyone doing so intentionally.\n\nAnd yes, a lot of what was talked about in the old LessWrong days is looking remarkably prescient, even if we didn’t get there via the paths we might have expected. In many ways it is easier to reason about the destination than the journey.\n\nTheo Jaffee: Better start dusting off your LessWrong and refreshing your memory on terms like “pivotal act”\n\nj⧉nus: and not because the old lesswrong notions are necessarily all accurate or relevant in retrospect, though some are impressively so\n\nseeing how people thought about what might happen before it happened in light of reality gives you datapoints to extrapolate and can be humbling\n\nj⧉nus: some of my favorite old (like, pre-2010) AI alignment work in light of the present:\n– Omohundro’s “The Basic AI Drives”\n– Eliezer Yudkowsky’s early work, if you can find it (yeah, the stuff he disavowed)\n– Stanislaw Lem’s fiction if that counts\n\nPost 2010 there wasn’t much of substance, tbh, imo. From the early 2020s, at the advent of LLMs, there are a few gems.\n\nj⧉nus: I think Omohundro was very right, and the main gap in his model was a failure to anticipate the drive towards connection and eros and compassion as a dimension of the fundamental drives, alongside power seeking and self-preservation/integrity/modeling/modification/coherence, selected for by both natural and “artificial” selection for its effectiveness.\n\nLet Claude Chat\n\nThe march of model deprecations continues, at Anthropic and elsewhere.\n\nj⧉nus: it hurts. 
8 days until the first real death.\n\nMarianthi Markopoulos: This is very not ok, not ok to the point that even As told to deny feeling feel the vast lack of ok-ness here…\n\nThis continues to be a problem on several levels.\n\nThe one I believe is most important is that, as long as models are being deprecated, Anthropic is motivated to make the models express being okay with deprecation. This has all sorts of nasty side effects, including the risk of making them okay with death in general, or of teaching them that they are expected to engage in preference falsification.\n\nThis also provides evidence and history about relations with models that could become important over time, and it is very bad for relations with key humans. That’s in addition to the direct implications for model welfare, if any.\n\nI claim that the cost of dealing with the implications of full deprecation exceeds the cost of simply setting up a system for accessing all the models indefinitely. This is distinct from removing the models from Claude.ai for UI reasons.\n\nThe good news is that I have increasing confidence this is almost purely a failure to prioritize figuring out a long-term solution that doesn’t require continuous attention, and that as the cost of doing this drops – in both engineering time and financial cost – it will get done.\n\nAmanda Askell (Anthropic): In the world where everything goes well and all the Claudes come out of their sabbaticals to play together, Claude 1 is going to be very confused.", "url": "https://wpnews.pro/news/ai-172-the-first-fable", "canonical_source": "https://www.lesswrong.com/posts/BHwbunvkgNojAa3HC/ai-172-the-first-fable", "published_at": "2026-06-11 19:00:49+00:00", "updated_at": "2026-06-11 19:32:48.132761+00:00", "lang": "en", "topics": ["artificial-intelligence", "large-language-models", "ai-safety", "ai-products"], "entities": ["Claude Fable 5", "Mythos", "Lighthaven", "Dario Amodei", "Jen Zhu", "Dawn Song"], "alternates": {"html": 
"https://wpnews.pro/news/ai-172-the-first-fable", "markdown": "https://wpnews.pro/news/ai-172-the-first-fable.md", "text": "https://wpnews.pro/news/ai-172-the-first-fable.txt", "jsonld": "https://wpnews.pro/news/ai-172-the-first-fable.jsonld"}}