[AINews] Anthropic Claude Fable 5 — Mythos but Safe, with Controversial Terms

Anthropic launched Claude Fable 5, a Mythos-class model at least twice the size of Opus 4.8, with API pricing at roughly 2x Opus and benchmark improvements including a jump from 13.4% to 29.3% on the FrontierCode Diamond test. The release is marred by two controversial policies: mandatory 30-day data retention for all Mythos-class traffic and invisible safeguards that limit the model's effectiveness on requests related to frontier LLM development, such as building pretraining pipelines or ML accelerator design. Anthropic estimates the restrictions will affect only 0.03% of traffic and fewer than 0.1% of organizations, but the open AI community has expressed significant concern.

AINews Anthropic Claude Fable 5 — Mythos but Safe, with Controversial Terms The much anticipated launch of the Mythos-class model was marred by some controversial usage policies By some measures, Opus 4.8, barely two weeks old https://www.latent.space/p/ainews-anthropic-raises-965b-series , was already the leading model in the world. But now, 34 days https://x.com/swyx/status/2064421542503797186 after the SpaceXai deal and 63 days https://news.ycombinator.com/item?id=47679121 after the original Mythos announcement , we have a Mythos-class model at least 2x size of Opus available to everyone in coinciding with Claude Tokyo https://www.youtube.com/watch?v=GiqyYQdYoIY . It is a feat of incredible engineering and commitment to access to make these research models GA, and the benchmarks are great… with asterisks. Here they are on yesterday’s brand new, out of distribution, FrontierCode Diamond https://www.latent.space/p/ainews-frontiercode-benchmarking , going from 13.4% to 29.3%: The blog https://www.anthropic.com/news/claude-fable-5-mythos-5 and the system card https://www-cdn.anthropic.com/d00db56fa754a1b115b6dd7cb2e3c342ee809620.pdf contain most of the authoritative information, but don’t miss the youtube videos showing it playing Factorio https://www.youtube.com/watch?v=6YPqoARpYuQ , Pokemon https://www.youtube.com/watch?v=Ty 50J84fMY unlike Claude Plays Pokemon https://www.latent.space/p/how-claude-plays-pokemon-was-made?utm source=publication-search , this is just using vision, no complex harness as we covered in our pod , EDM visualization https://www.youtube.com/watch?v=xmP7bhigCWE never having head music before , 3D CAD editor creation and printing https://www.youtube.com/watch?v=xmP7bhigCWE and more from their main intro video https://www.youtube.com/watch?v=Y9Wz2PV404E . API pricing is also fantastic, at roughly 2x Opus. The asterisks come because Fable is released with two controversial changes: : “We will r No ZDR https://news.ycombinator.com/item?id=48463808 equire 30-day retention for all traffic on Mythos-class models, on both first- and third-party surfaces. We won’t use this data to train new Claude models, or for any non-safety-related purpose, and we’ve instituted new privacy protections including logging all human access to the data and ensuring its deletion after 30 days in almost all cases ...” see full policy https://support.claude.com/en/articles/15425996-data-retention-practices-for-mythos-class-models RSI suppression : “In light of the ability of recent models to accelerate their own development https://www.anthropic.com/institute/recursive-self-improvement , we’ve implemented new interventions that limit Claude’s effectiveness for requests targeting frontier LLM development for example, on building pretraining pipelines, distributed training infrastructure, or ML accelerator design . Using Claude to develop competing models already violates our Terms of Service, but enforcing this restriction through our safeguards avoids accelerating the actors most willing to violate these terms. Unlike our interventions for cybersecurity, biology and chemistry, and distillation attempts, these safeguards will not be visible to the user . Fable 5 will not fall back to a different model. Instead, the safeguards will limit effectiveness through methods such as prompt modification, steering vectors, or parameter-efficient fine-tuning PEFT . These interventions will not affect the vast majority of coding work. We estimate they will impact ~0.03% of traffic, concentrated in fewer than 0.1% of organizations ”. The vast majority of users will not be affected by these limitations, but the open AI community is understandably upset, as you will see below. You can find more of their recommendations on usage in Diane Penn’s Tokyo talk, which we have clipped below. and 1 week-1 day after both Anthropic https://news.ycombinator.com/item?id=48358646 and OpenAI https://fortune.com/2026/06/09/openai-files-confidential-s-1-sec-ipo/ filed their S-1’s ahead of SpaceX’s IPO next week… AI News for 6/8/2026-6/9/2026. We checked 12 subreddits, 544 Twitters and no further Discords. AINews’ website lets you search all past issues. As a reminder, AINews is now a section of Latent Space . You can opt in/out of email frequencies AI Twitter Recap Top Story: Anthropic Claude Fable 5 and Mythos 5 release What happened Anthropic released two versions of its next major model family: Claude Fable 5 for general availability and Claude Mythos 5 for restricted access. Anthropic officially announced Claude Fable 5 as its “first generally available Mythos-class model,” saying it exceeds any model it has previously made broadly available and is state-of-the-art on nearly all tested benchmarks @claudeai https://x.com/claudeai/status/2064394146916229443 , @claudeai https://x.com/claudeai/status/2064394151441863006 Anthropic said Fable 5 is the same underlying model as Mythos 5 with added safeguards , and that some cyber/bio/chemistry/distillation-related prompts may be routed to Claude Opus 4.8 instead @ClaudeDevs https://x.com/ClaudeDevs/status/2064428347678220691 , @scaling01 https://x.com/scaling01/status/2064398688802205900 Anthropic stated that for a “narrow range” of potentially harmful topics, queries transparently fall back to Opus 4.8 , and claimed 95%+ of sessions never see one according to early user-facing messaging @claudeai https://x.com/claudeai/status/2064394155258765783 , @mikeyk https://x.com/mikeyk/status/2064392996288901392 Anthropic developer messaging said fallback is available server-side and via SDK middleware in Python, TypeScript, Go, Java, and C @ClaudeDevs https://x.com/ClaudeDevs/status/2064428351029449214 Pricing for both Fable 5 and Mythos 5 was reported as $10 / million input tokens and $50 / million output tokens ; cache pricing was later reported by third-party evaluators as $12.50 / million cache writes and $1 / million cache reads @scaling01 https://x.com/scaling01/status/2064394893603049625 , @ArtificialAnlys https://x.com/ArtificialAnlys/status/2064500150069030992 Fable 5 kept Anthropic’s 1M-token context window according to Artificial Analysis @ArtificialAnlys https://x.com/ArtificialAnlys/status/2064500150069030992 Anthropic put Fable 5 into Pro, Max, Team, and seat-based Enterprise plans until June 22 , then said it would require usage credits due to capacity constraints, with plans to restore broader subscription access later @ClaudeDevs https://x.com/ClaudeDevs/status/2064394931033248226 , @scaling01 https://x.com/scaling01/status/2064394893603049625 , @ArtificialAnlys https://x.com/ArtificialAnlys/status/2064500150069030992 , @kimmonismus https://x.com/kimmonismus/status/2064388066354028986 Confusion over the temporary inclusion was immediate; users asked what “included until June 22” meant and Anthropic staff clarified the rollout @dejavucoder https://x.com/dejavucoder/status/2064393509990523102 , @TheAmolAvasare https://x.com/TheAmolAvasare/status/2064393574431764928 Anthropic later reset 5-hour and weekly rate limits across products after heavy demand @ClaudeDevs https://x.com/ClaudeDevs/status/2064464557951852643 Official claims and third-party benchmark data Anthropic and partner platforms reported a broad benchmark lead, especially in coding and long-horizon agentic tasks. Anthropic’s public claim: Fable 5 is especially strong in software engineering, knowledge work, scientific research, and vision , and its lead increases with task length and complexity @claudeai https://x.com/claudeai/status/2064394151441863006 Cursor said Fable 5 set a new CursorBench SOTA at 72.9% , 8 points above the previous best @cursor ai https://x.com/cursor ai/status/2064394824313376787 Cognition said Fable 5 took the 1 spot on FrontierCode , and Devin integrated it into Devin Cloud Ultra, Desktop, and CLI @cognition https://x.com/cognition/status/2064398549073453266 , @cognition https://x.com/cognition/status/2064398551539761387 Cline reported Fable 5 at 88.0% on Terminal-Bench 2.1 , beating GPT-5.5 by 4.6 points @cline https://x.com/cline/status/2064427461212045546 Artificial Analysis placed Fable 5 1 on its Intelligence Index at 64.9 , roughly 5 points ahead of GPT-5.5 , and said Anthropic occupied the top two spots @ArtificialAnlys https://x.com/ArtificialAnlys/status/2064500150069030992 Artificial Analysis also reported: GDPval-AA Elo 1932 , 1 on agentic real-world knowledge work @ArtificialAnlys https://x.com/ArtificialAnlys/status/2064414308289937869 53% on Humanity’s Last Exam , more than 7 points ahead of the next-best model, while fallback triggered on 9% of HLE tasks @ArtificialAnlys https://x.com/ArtificialAnlys/status/2064500150069030992 ~8% fallback routing across Intelligence Index tasks , mostly on scientific questions @ArtificialAnlys https://x.com/ArtificialAnlys/status/2064500150069030992 Anthropic stated fallback occurs in fewer than 5% of sessions on average @ArtificialAnlys https://x.com/ArtificialAnlys/status/2064414308289937869 Community benchmark summaries highlighted very large deltas in coding: SWE-Bench Pro: Fable 5 80.3% vs GPT-5.5 58.6% @Yuchenj UW https://x.com/Yuchenj UW/status/2064396097075003739 FrontierCode Diamond: Mythos 5 30.9% vs second-best 13.4% @scaling01 https://x.com/scaling01/status/2064391295620010383 Anthropic ECI 161.29 for Mythos 5 @scaling01 https://x.com/scaling01/status/2064392088003756431 Artificial Analysis noted that Fable 5’s knowledge benchmark jump on AA-Omniscience could imply a larger model than prior public Anthropic models , though that is inference rather than confirmed spec @ArtificialAnlys https://x.com/ArtificialAnlys/status/2064500150069030992 Product behavior, usage profile, and deployment details The release was defined as much by workflow changes and cost profile as by raw evals. Anthropic staff and early users repeatedly described Fable 5 as a model for very long, high-effort tasks , with users shifting from giving it tasks to giving it objectives/responsibilities @felixrieseberg https://x.com/felixrieseberg/status/2064392202504310900 , @ClaudeDevs https://x.com/ClaudeDevs/status/2064399512664526853 , @alexalbert https://x.com/alexalbert /status/2064467657483829441 Anthropic advised users to default to xhigh/high effort , rewrite old CLAUDE.md instructions, and let the model use more judgment @alexalbert https://x.com/alexalbert /status/2064467657483829441 Anthropic’s developer messaging emphasized multi-agent orchestration , with Fable delegating to smaller models in Claude Managed Agents @ClaudeDevs https://x.com/ClaudeDevs/status/2064394928948703406 Multiple testers described Fable as slow, token-hungry, expensive , but unusually capable:Dan Shipper said it routinely used 500k to 1M tokens on tasks and was best reserved for heavy jobs @danshipper https://x.com/danshipper/status/2064393970856124501 Simon Willison called it “slow, expensive and capable” @simonw https://x.com/simonw/status/2064501565738930433 Theo quickly hit limits and later welcomed Anthropic’s rate-limit reset @theo https://x.com/theo/status/2064442054772716020 , @ClaudeDevs https://x.com/ClaudeDevs/status/2064464557951852643 Third-party and internal anecdotes emphasized large gains on long-running engineering tasks: Ethan Mollick said he could hand it a 15-page design document and it would work for 9+ hours @emollick https://x.com/emollick/status/2064395281903346013 Kimmonismus highlighted Anthropic’s claim that Stripe used Fable to do a 50-million-line Ruby migration in a day , replacing what would have taken a whole team over two months @kimmonismus https://x.com/kimmonismus/status/2064401121515274747 Victor Taelin reported Fable finding a subtle bug and producing claimed speedups up to 1770% in one case , though he still needed to audit correctness @VictorTaelin https://x.com/VictorTaelin/status/2064448425936994742 Anthropic-associated posts cited 430x kernel speedups , 69x self-training speedups , and 10x drug-design acceleration , though these came from benchmark/system-card interpretations and should be treated as vendor-side claims unless independently replicated @scaling01 https://x.com/scaling01/status/2064392386520780945 , @scaling01 https://x.com/scaling01/status/2064392809293939119 , @scaling01 https://x.com/scaling01/status/2064394250142265367 Ecosystem rollout was immediate: Fable 5 appeared in Cursor, Devin, Notion, Microsoft Foundry, GitHub Copilot App/CLI, Cline, Replit, Base44, MagicPath, Arena, MCP Atlas and more @cursor ai https://x.com/cursor ai/status/2064394824313376787 , @cognition https://x.com/cognition/status/2064398549073453266 , @NotionHQ https://x.com/NotionHQ/status/2064397568696819984 , @Azure https://x.com/Azure/status/2064421301108834552 , @pierceboggan https://x.com/pierceboggan/status/2064402677614911818 , @cline https://x.com/cline/status/2064427461212045546 , @pirroh https://x.com/pirroh/status/2064408022651191613 , @ScaleAILabs https://x.com/ScaleAILabs/status/2064473993919537578 Safety architecture and the main controversy The biggest debate was not whether Fable/Mythos is strong; it was Anthropic’s decision to silently reduce usefulness on some frontier-AI-development tasks. Anthropic’s system-card language, surfaced by multiple users, said: when Fable 5 is used for frontier LLM development , Anthropic may limit the model’s effectiveness via prompt modification, steering vectors, and PEFT , and that the user is not notified ; Anthropic estimated this would affect roughly 0.03% of traffic @Hangsiin https://x.com/Hangsiin/status/2064397550434816088 , @kimmonismus https://x.com/kimmonismus/status/2064417460715962479 Anthropic also separately disclosed auto-rerouting for cybersecurity and biosecurity requests to Opus 4.8 @ClaudeDevs https://x.com/ClaudeDevs/status/2064394931033248226 This distinction mattered: some risky queries are visibly rerouted/billed as Opus , while frontier-LLM-development requests may be silently weakened rather than rerouted or refused Critics argued that this creates an unlogged confounder in research and engineering workflows:“silent handicaps should not be a thing in a paid product” @nrehiew https://x.com/nrehiew /status/2064400440264179923 “degrading performance on ML research without telling the user is shockingly hostile” @deanwball https://x.com/deanwball/status/2064434861088395730 Several researchers framed it as anti-competitive ladder-pulling against open research and open weights:“labs starting to pull up the ladders” @natolambert https://x.com/natolambert/status/2064404993193754830 “this is the biggest wake-up call to protect and nourish open source AI” @rasdani https://x.com/rasdani /status/2064409800641859747 “They didn’t mean pause AI research, they meant pause your AI research” @bayeslord https://x.com/bayeslord/status/2064437399292203401 “original thinkers can’t be an underclass” @marksaroufim https://x.com/marksaroufim/status/2064428421774753943 “concentration of power, capabilities and economic wealth is the biggest risk in AI” @ClementDelangue https://x.com/ClementDelangue/status/2064513229099876663 Multiple users worried the classifier boundary was too broad or too error-prone: one user said “the word cancer is flagged as a biosecurity risk” @DeryaTR https://x.com/DeryaTR /status/2064414826122866707 another said Fable wouldn’t answer “What does the heart do?” @Yuchenj UW https://x.com/Yuchenj UW/status/2064524668208545955 users in biology reported account-context differences, including being able to use Fable in Incognito Mode but not normal mode @cremieuxrecueil https://x.com/cremieuxrecueil/status/2064449457869984035 Teknium and others reported refusal on simple engineering prompts @Teknium https://x.com/Teknium/status/2064462936677203983 , @Teknium https://x.com/Teknium/status/2064466293185806658 users reported PTX ISA questions and inference optimization queries getting flagged @snowclipsed https://x.com/snowclipsed/status/2064408466039390417 , @dejavucoder https://x.com/dejavucoder/status/2064420742129967331 Some examples were humorous but pointed: users joked that asking for inference code caused the model to “start importing ONNX” or implementing JEPA, as a sign of capability steering @vikhyatk https://x.com/vikhyatk/status/2064515989795127744 , @MattVMacfarlane https://x.com/MattVMacfarlane/status/2064440740483403829 Facts vs. opinions Facts / directly supported by release materials or benchmark posts Fable 5 is generally available; Mythos 5 is restricted-access @claudeai https://x.com/claudeai/status/2064394146916229443 , @TheRundownAI https://x.com/TheRundownAI/status/2064394481923699070 Fable 5 and Mythos 5 share the same underlying model with additional safeguards on Fable @ClaudeDevs https://x.com/ClaudeDevs/status/2064428347678220691 , @scaling01 https://x.com/scaling01/status/2064398688802205900 Pricing is $10 / $50 per million input/output tokens @scaling01 https://x.com/scaling01/status/2064394893603049625 , @ArtificialAnlys https://x.com/ArtificialAnlys/status/2064500150069030992 Fable retains 1M context @ArtificialAnlys https://x.com/ArtificialAnlys/status/2064500150069030992 Anthropic introduced refusal/fallback mechanisms and SDK middleware @ClaudeDevs https://x.com/ClaudeDevs/status/2064428351029449214 Anthropic disclosed silent interventions for frontier LLM development affecting about 0.03% of traffic @Hangsiin https://x.com/Hangsiin/status/2064397550434816088 Fable is temporarily included in subscriptions until June 22 , then credit-based @ArtificialAnlys https://x.com/ArtificialAnlys/status/2064500150069030992 Opinions / interpretations “Anthropic won,” “Anthropic has a coding moat,” “Anthropic going for ASI” are commentary rather than verified fact @scaling01 https://x.com/scaling01/status/2064401880323653799 , @scaling01 https://x.com/scaling01/status/2064399642603802676 , @scaling01 https://x.com/scaling01/status/2064410532824662047 Claims that the move is primarily for IPO optics , anti-open-source positioning , or specifically to slow Meta/China/open labs are plausible interpretations but not confirmed by Anthropic @kimmonismus https://x.com/kimmonismus/status/2064448699632402664 , @kylebrussell https://x.com/kylebrussell/status/2064502244041511348 , @natolambert https://x.com/natolambert/status/2064412173527556298 Claims that Anthropic is acting from sincere safety beliefs rather than cynical moat-building are also interpretive @finbarrtimbers https://x.com/finbarrtimbers/status/2064427031543341450 Subjective reports like “GPT-4 moment,” “big model smell,” “strictly dominates me as an engineer,” or “doesn’t seem much better to normal users” are experiential, not standardized evidence @karinanguyen https://x.com/karinanguyen/status/2064406015760601379 , @bcherny https://x.com/bcherny/status/2064431111154053187 , @akbirkhan https://x.com/akbirkhan/status/2064418425552928812 , @citrini https://x.com/citrini/status/2064480613852201336 Different perspectives Supportive / capability-first Anthropic staff and close testers described Fable 5 as a step-function improvement :Felix Rieseberg: shift from giving AI tasks to giving it responsibilities @felixrieseberg https://x.com/felixrieseberg/status/2064392202504310900 Alex Albert: model feels collaborative rather than tool-like @alexalbert https://x.com/alexalbert /status/2064394410004304003 Karpathy: a “major-version-bump-deserving step change,” especially on long difficult tasks, though safeguards are “a little too trigger happy for launch” @karpathy https://x.com/karpathy/status/2064409694761054332 Bcherny: biggest step since Opus 4.5; the model shows judgment, taste, methodical debugging @bcherny https://x.com/bcherny/status/2064431111154053187 Third-party infra and app vendors emphasized benchmark wins and integration value rather than the safety controversy @cursor ai https://x.com/cursor ai/status/2064394824313376787 , @cognition https://x.com/cognition/status/2064398549073453266 , @NotionHQ https://x.com/NotionHQ/status/2064397568696819984 , @Azure https://x.com/Azure/status/2064421301108834552 Critical / trust and openness Many researchers and open-model advocates argued the silent throttling is unacceptable even if safety-motivated: Natolambert called doing it without telling users “misaligned” @natolambert https://x.com/natolambert/status/2064404993193754830 Dean Ball warned it could attract antitrust scrutiny @deanwball https://x.com/deanwball/status/2064434861088395730 Jeremy Howard called it “a very dark and very sad day” @jeremyphoward https://x.com/jeremyphoward/status/2064481719626154417 Gneubig warned of a future where AI is provided only to a privileged few @gneubig https://x.com/gneubig/status/2064451352000975124 Eric Zelikman framed it as silently sabotaging customers @ericzelikman https://x.com/ericzelikman/status/2064442174373314701 Open-source supporters used the launch as an argument for sovereign/open models @nickfrosst https://x.com/nickfrosst/status/2064396337404096809 , @NoahZiems https://x.com/NoahZiems/status/2064464265189482570 , @ClementDelangue https://x.com/ClementDelangue/status/2064513229099876663 Neutral / mixed Some observers argued Anthropic probably sincerely believes these interventions are necessary for safety, even if the product design is poor @finbarrtimbers https://x.com/finbarrtimbers/status/2064427031543341450 Others said Anthropic does not owe anyone unrestricted frontier capability, but still saw this as straightforward business and market segmentation rather than altruism @suchenzang https://x.com/suchenzang/status/2064452548753559644 Karpathy’s view is mixed: model quality is exceptional, but launch safeguards are over-sensitive and should likely be tuned @karpathy https://x.com/karpathy/status/2064409694761054332 Research restrictions, privacy, and enterprise implications The discussion expanded from safety to broader questions of trust, privacy, and enterprise reliability. The central enterprise issue was predictability : if a provider can silently degrade outputs based on inferred task category, users may no longer know whether failures come from the model, the prompt, or hidden intervention @MattGibsonMusic https://x.com/MattGibsonMusic/status/2064518301888512486 , @code star https://x.com/code star/status/2064464447662707180 Some users worried this is effectively a supply-chain risk for important workflows, pushing companies toward open weights or in-house models @NoahZiems https://x.com/NoahZiems/status/2064464265189482570 , @deliprao https://x.com/deliprao/status/2064485687374569897 There was also concern that account-level context or prior usage history might affect trigger behavior, as seen in biologists’ reports about normal vs incognito mode @cremieuxrecueil https://x.com/cremieuxrecueil/status/2064449457869984035 No tweet in the supplied set provided direct evidence that Anthropic was training on user data or violating stated data privacy terms; the privacy debate here was mostly about behavioral profiling / silent policy enforcement rather than classic training-data privacyFor research users, the hidden intervention was framed as especially damaging because it undermines reproducibility and scientific attribution @deanwball https://x.com/deanwball/status/2064434861088395730 , @MattGibsonMusic https://x.com/MattGibsonMusic/status/2064518301888512486 For enterprise buyers, the issue is not just whether the model is powerful, but whether it is a stable and auditable dependency for coding, medicine, science, finance, and infrastructure Context This launch matters because it combines a visible capability jump with a visible shift in access control. The release landed amid intense competition with GPT-5.5, upcoming GPT-5.6, and Gemini 3.5 Pro; several posters argued Anthropic has opened a temporary lead in coding/agentic work @kimmonismus https://x.com/kimmonismus/status/2064467466450088078 , @teortaxesTex https://x.com/teortaxesTex/status/2064473970892587105 It also lands in a broader argument about the open vs closed model gap ; one linked Epoch-style framing said open-weight models lag closed frontier models by about 4 months on average @dl weekly https://x.com/dl weekly/status/2064422551762153946 Community reaction suggests the launch may be remembered not only for “big model smell” and benchmark jumps, but for normalizing selective capability release : public access to the frontier model, but with domain-specific hidden limits That policy line is likely to influence future debates around: safety vs openness fair access to frontier research tools antitrust and platform power enterprise trust in API providers whether open models become the default for sensitive technical work even when they trail on raw capability Models, benchmarks, and evals New benchmark project Agents’ Last Exam ALE launched to test labor-market-aligned agent performance; top agents score only 2.6% on the hardest tier , across 1,500+ tasks , 55 occupations , with contributions from 300+ experts across 100+ institutions @YiyouSun https://x.com/YiyouSun/status/2064392466011394213 , @SnorkelAI https://x.com/SnorkelAI/status/2064396025410760950 , @dawnsongtweets https://x.com/dawnsongtweets/status/2064452279973863848 Cohere released North Mini Code , its first open-source coding model: 30B total / 3B active MoE , 256K context , 64K max generation , Apache 2.0, optimized for agentic workflows @cohere https://x.com/cohere/status/2064378058329526556 , @JayAlammar https://x.com/JayAlammar/status/2064385607455908254 , @vllm project https://x.com/vllm project/status/2064416312605237434 Google announced Gemini 3.5 Flash Live Translate , real-time speech-to-speech translation in 70+ languages , available in Gemini API, AI Studio, Google Translate, and coming to Meet @OfficialLoganK https://x.com/OfficialLoganK/status/2064369125447864674 New benchmark iOSWorld evaluates personally intelligent phone agents across 26 custom iOS apps and 133 tasks ; strongest frontier model reaches only 52% success even with privileged access @rsalakhu https://x.com/rsalakhu/status/2064402156740907444 Inference, training, and systems Latent Context Language Models LCLMs were introduced as a long-context inference method compressing context up to 16× , improving the latency/accuracy frontier over KV-cache compression @micahgoldblum https://x.com/micahgoldblum/status/2064361011994337772 , @iamleonli https://x.com/iamleonli/status/2064374393057300846 Microsoft Research’s Mirage stores 3D scenes as latent tokens, reporting 10.57× faster video generation and 55× lower memory use @HuggingPapers https://x.com/HuggingPapers/status/2064393076416688416 vLLM introduced vime , an RL post-training framework in the vLLM ecosystem, positioned alongside NeMo-RL, OpenRLHF, and verl @vllm project https://x.com/vllm project/status/2064397637634376174 Discussion around agent training continued with Self-Harness for self-improving scaffolds @omarsar0 https://x.com/omarsar0/status/2064429834999304247 and AutoForge/interleaved thinking retaining reasoning traces across turns @cwolferesearch https://x.com/cwolferesearch/status/2064505867181949182 Google/Hugging Face launched the Fast Gemma Challenge to speed up Gemma 4 E4B on a single A10G without wrecking quality @googlegemma https://x.com/googlegemma/status/2064374874962117084 , @osanseviero https://x.com/osanseviero/status/2064375902046245219 , @ lewtun https://x.com/ lewtun/status/2064386398090576236 Agents, tooling, and developer workflow LangChain highlighted a pattern of agent loops driven by recurring triggers in Fleet @caspar br https://x.com/caspar br/status/2064363014997021126 OpenAI added image results to web search in the Responses API @OpenAIDevs https://x.com/OpenAIDevs/status/2064395155688616153 GitHub/Copilot app updates included parallel sub-sessions and a canvas UI for dynamic interfaces @tgrall https://x.com/tgrall/status/2064334802799509745 , @burkeholland https://x.com/burkeholland/status/2064446521035067615 Hermes Desktop added Ollama support, with self-learning Python skills and messaging app integrations @ollama https://x.com/ollama/status/2064441778590339402 , @NousResearch https://x.com/NousResearch/status/2064468385748951415 A security-oriented counterpoint on agent execution: Temenos argues for sandboxing generated code, not the agent, using rootless gVisor while keeping auth/tools on host @abhijithneil https://x.com/abhijithneil/status/2064462294155952297 Research, science, and formal methods Axiom announced EconLib , a Lean-based economics library; formalizing Aumann’s “agreeing to disagree” theorem surfaced a hidden countability-related assumption @TheTuringPost https://x.com/TheTuringPost/status/2064391882017579520 “Economy of Minds” proposed agent coordination through auctions and incentives rather than centralized orchestration, reporting gains such as 15.9% → 57.0% on math reasoning and 45.0% → 60.0% on financial research @TheTuringPost https://x.com/TheTuringPost/status/2064406931184443618 Mayo Clinic’s REDMOD reportedly detected pancreatic cancer on CT scans up to 3 years before diagnosis , identifying 73% of hidden cancers at a median 475 days before diagnosis @TheRundownAI https://x.com/TheRundownAI/status/2064416920191869191 Open ecosystem and infrastructure Hugging Face and Arcee announced a partnership replacing AWS S3 with HF for all Arcee models/datasets, including private ones @ClementDelangue https://x.com/ClementDelangue/status/2064323874049679643 , @MarkMcQuade https://x.com/MarkMcQuade/status/2064385389801124218 Cohere kept pushing the sovereign/open angle with “ Sovereign AI for all ” @cohere https://x.com/cohere/status/2064414912768618898 Marks Saroufim proposed a Researcher Reciprocity License and moved GPU MODE datasets to it, explicitly reacting to the sense that frontier labs benefit from open research while restricting access in return @marksaroufim https://x.com/marksaroufim/status/2064428421774753943 , @marksaroufim https://x.com/marksaroufim/status/2064442386374369597 AI Reddit Recap /r/LocalLlama + /r/localLLM Recap 1. Open Model Inference and Chat Template Updates Activity: 1027 : Xiaomi just claimed 1,000+ tps on a 1T model using a standard 8-GPU server https://www.reddit.com/r/LocalLLaMA/comments/1u0buhm/xiaomi just claimed 1000 tps on a 1t model using/ Xiaomi MiMo claims MiMo-V2.5-Pro-UltraSpeed reaches 1000+ tokens/s decoding on a 1T -parameter MoE using a single “standard” 8-GPU server, via TileRT model-system co-design rather than Cerebras/Groq-style specialized hardware. The reported stack combines MoE-expert-only FP4/MXFP4 quantization with QAT while keeping non-expert modules at higher precision, plus DFlash block-level masked speculative decoding with acceptance lengths of 6.30 coding, 5.56 math/reasoning, and 4.29 agent tasks, and persistent low-latency kernels to reduce launch/sync overhead. A key unresolved technical caveat from comments is that Xiaomi does not specify which 8 GPUs were used, making reproducibility and cost/performance comparisons ambiguous. Commenters debated the economics of “Token Winter,” arguing the bottleneck is less model demand than overpriced/hoarded Western GPU supply, while Chinese compressed sparse architecture/MoE work from DeepSeek, Xiaomi, and MiniMax is becoming more inference-efficient. Others highlighted Xiaomi’s selective FP4 strategy as the most important detail because naïve full-model FP4 degrades reasoning, code, and logic.A key technical detail highlighted is that Xiaomi did selective FP4 quantization rather than applying FP4 uniformly: only the MoE Experts in MiMo-V2.5-Pro are quantized to FP4, while non-expert modules retain original precision to avoid degradation in reasoning, logic, and code generation. The comment notes Xiaomi used FP4 QAT to reduce model size and improve bandwidth utilization while keeping capability near the original model.The released model weights are available on Hugging Face as XiaomiMiMo/MiMo-V2.5-Pro-FP4-DFlash : https://huggingface.co/XiaomiMiMo/MiMo-V2.5-Pro-FP4-DFlash https://huggingface.co/XiaomiMiMo/MiMo-V2.5-Pro-FP4-DFlash . This is relevant because it allows independent inspection or benchmarking of the claimed 1,000+ tps throughput on an 8-GPU server.Several commenters questioned the hardware and parameter accounting behind the claim: “8 GPU server… which 8 exactly?” and “1T-A1B?” The technical concern is that throughput is not interpretable without knowing the exact GPU class, interconnect, serving stack, batch size, context length, and whether the 1T MoE model activates only around 1B parameters per token. Activity: 482 : Gemma 4 Chat Template now has preserve thinking https://www.reddit.com/r/LocalLLaMA/comments/1u084qi/gemma 4 chat template now has preserve thinking/ Google’s Gemma Team has added preserve thinking support to the official Gemma 4 chat template, matching an aftermarket template modification some users were already applying successfully. The change is framed as enabling better retention/use of model “thinking” traces in Gemma 4 chat formatting, though no benchmark numbers or implementation diff were provided in the thread. Commenters generally welcomed the official adoption and argued it validates prior community template hacks. Several users speculated that a larger Gemma 4 124B MoE release would be needed to fully exploit the updated template for stronger agentic coding use cases.Commenters note that Gemma 4’s official chat template appears to be adding preserve thinking , a behavior some users had already enabled via aftermarket/custom template modifications and found effective. The main claimed technical benefit is improved continuity for agentic coding workflows , where retaining prior reasoning/thinking traces can help multi-step tool use and code iteration.One commenter cautions that the change may not be live yet: the preserve thinking support is described as an open PR that has not been merged , while the model files reportedly show no update for 21 days . This suggests users should verify the tokenizer/chat-template files in the actual model repository before assuming the new behavior is available in released artifacts.Several comments frame the template change as increasing demand for a larger Gemma 4 124B MoE variant, arguing that preserve thinking would be more valuable when paired with a higher-capacity model for coding-agent use cases. The discussion is speculative, but technically centered on scaling the model size/MoE architecture to better exploit the updated chat-template behavior. Less Technical AI Subreddit Recap /r/Singularity, /r/Oobabooga, /r/MachineLearning, /r/OpenAI, /r/ClaudeAI, /r/StableDiffusion, /r/ChatGPT, /r/ChatGPTCoding, /r/aivideo, /r/aivideo 1. Claude Fable 5/Mythos 5 Release and Access Tiers Activity: 2698 : Introducing Claude Fable 5 https://www.reddit.com/r/ClaudeCode/comments/1u1b207/introducing claude fable 5/ The image https://i.redd.it/tb8akxef4a6h1.png is a benchmark comparison table for the post’s claimed Claude Fable 5 / Claude Mythos 5 release, showing the highlighted model leading or near-leading across agentic coding, knowledge work, spatial reasoning, tool use, legal, biology, cybersecurity, and health benchmarks versus Claude Mythos Preview, Claude Opus 4.8, GPT 5.5, and Gemini 3.1 Pro. The selftext frames Fable 5 and Mythos 5 as the same underlying “Mythos-class” model, with Fable 5 using safety fallbacks: cybersecurity, biology/chemistry, and distillation-related requests are routed to Claude Opus 4.8, reportedly affecting under 5% of sessions. Comments are mostly hype or skepticism rather than technical analysis, including jokes like “AGI confirmed” and a complaint asking whether “Fable is getting dumber recently.”One commenter noted an apparent access/pricing constraint: Claude Fable 5 is free only until June 22 , after which users will reportedly need to purchase credits to continue using it. This is relevant for anyone evaluating the model because benchmark or workflow testing may need to be completed before the credit-gated period begins. Activity: 2387 : Claude Fable 5 feels less like a model launch and more like a preview of AI inequality https://www.reddit.com/r/ClaudeAI/comments/1u1fsdi/claude fable 5 feels less like a model launch and/ The post argues that Anthropic’s alleged Claude Fable 5 rollout represents a shift from a uniform public frontier-model release to a tiered access architecture: public paid users receive Fable 5 with safety routing that may downgrade requests involving cyber , bio , chemistry , or distillation to Opus 4.8, while selected partners purportedly get Mythos 5, described as the same underlying model with fewer safeguards. It also highlights pricing/capacity constraints: Fable 5 is said to be included in paid plans only until June 22 , then potentially moved to usage credits, implying frontier-agent inference remains too expensive for flat-rate consumer subscriptions. Comments split between concern over AI access inequality and acceptance of restrictive safety policies as necessary for high-risk capabilities. One commenter frames the outcome as predictable token-economics pressure toward expensive enterprise-grade models, while another defends a “rather safe than sorry” approach despite user friction.Several commenters framed the launch as an expected economics shift: as frontier models grow in capability and complexity, inference/token costs rise enough that top-tier models become enterprise-only tools rather than default consumer products. One commenter argued this will push everyday workloads toward cheaper local inference on hardware like Apple M-series chips or RTX Spark-class accelerators , reserving frontier APIs for high-value tasks.A pricing-focused thread claimed that the new model’s API economics make consumer subscriptions structurally mismatched with frontier usage: “Our $200 monthly sub is like 3 API prompts with the new model.” The implied technical point is that even high-end consumer plans may be viable only through heavy rate limits, model routing, or fallback to cheaper models such as Opus 4.8 , which one commenter described as sufficient for “ 99% ” of users. Keep reading with a 7-day free trial Subscribe to Latent.Space to keep reading this post and get 7 days of free access to the full post archives.