# [AINews] Fable and Mythos officially too dangerous to release

> Source: <https://www.latent.space/p/ainews-fable-and-mythos-officially>
> Published: 2026-06-13 04:30:52+00:00

### We are in the strangest timeline.

*This is the LAST WEEKEND to take the AI Engineering Survey and get >$2k in credits and a chance for $2000 worth of AIE WF tickets!*

Just as the whistle kicked off [the USA v Paraguay game](https://www.cnn.com/2026/06/12/sport/live-news/world-cup-group-b-d-opening-matches), Anthropic dropped a bombshell to end a remarkably eventful week: Fable and Mythos, released just [3 days ago](https://www.latent.space/p/ainews-anthropic-claude-fable-5-mythos), are now revoked for ALL customers due to a [possible jailbreak](https://x.com/cvmilo00/status/2065640972764016914) being a national cybersecurity risk.

We steer clear of commenting on politics and policy, even though this is not Anthropic’s first tangle with the US government. But this development affects all customers worldwide rather than just US government employees and vendors, so it will surely be noteworthy for the precedent it sets, even as it remains unclear how technically legitimate the claim is. (Anthropic seems to “believe this is a **misunderstanding**” because “the government has only given us **verbal** evidence of a potential **narrow, non-universal** jailbreak”.)

It is notable that Open Source AI advocates are once more [up in arms and trending](https://opensourceaimustwin.com/?share=v2).

AI News for 6/11/2026-6/12/2026. We checked 12 subreddits, 544 Twitters and no further Discords. AINews’ website lets you search all past issues. As a reminder, AINews is now a section of Latent Space. You can opt in/out of email frequencies!

**AI Twitter Recap**

**Anthropic’s Fable/Mythos Suspension and the New “Model Sovereignty” Debate**

**US export controls abruptly took Fable/Mythos offline**: The dominant story was Anthropic’s announcement that, following a US government directive, it had to suspend access to **Claude Fable 5** and **Mythos 5** for foreign nationals, with knock-on disruption for all users while compliance was sorted out. Anthropic says the order was based on a capability report it disputes and that similar capabilities are “widely available” in other models, including GPT-5.5; see the company statement from [@AnthropicAI](https://x.com/AnthropicAI/status/2065597531644743999) and product impact details from [@ClaudeDevs](https://x.com/ClaudeDevs/status/2065597942602531163). The event triggered immediate removals across downstream products and benchmarks, including [Cognition/Devin](https://x.com/cognition/status/2065609115939062197) and [Agent Arena](https://x.com/arena/status/2065620808773611997).

**Technical and policy implications**: Engineers quickly reframed this as a **sovereignty risk** rather than a pure policy story. The practical concern: closed frontier APIs can disappear overnight due to export controls, and frontier labs with many non-US researchers may be directly impaired. Reactions from [@natolambert](https://x.com/natolambert/status/2065616536942088581), [@theo](https://x.com/theo/status/2065622694113235359), and [@cohere](https://x.com/cohere/status/2065623344381108539) converged on the same takeaway: **owning the stack matters**. Artificial Analysis summarized the impact bluntly: “the first time our Intelligence Frontier chart has moved backward” in [this post](https://x.com/ArtificialAnlys/status/2065618560714740177). Anthropic later tried to soften the blow by [resetting 5-hour and weekly rate limits](https://x.com/ClaudeDevs/status/2065621176735646006), but the bigger lesson for infra and product teams is that reliance on a single frontier vendor now carries explicit geopolitical risk.
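For teams acting on the single-vendor lesson, one common mitigation is a provider fallback chain. The following is a minimal sketch, not any vendor's actual SDK: `Provider`, `ModelUnavailable`, `complete_with_fallback`, and the two stub backends are hypothetical names invented for illustration, and a real deployment would wrap actual API clients and decide which error types count as "unavailable".

```python
from dataclasses import dataclass
from typing import Callable, Sequence


class ModelUnavailable(Exception):
    """Raised by a provider callable when its model cannot serve the request."""


@dataclass
class Provider:
    name: str                    # hypothetical label for logging
    call: Callable[[str], str]   # takes a prompt, returns a completion


def complete_with_fallback(prompt: str, providers: Sequence[Provider]) -> tuple[str, str]:
    """Try providers in priority order; return (provider_name, completion)."""
    errors = []
    for provider in providers:
        try:
            return provider.name, provider.call(prompt)
        except ModelUnavailable as exc:
            # Record the failure and move on to the next provider in the chain.
            errors.append(f"{provider.name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))


# Stub backends standing in for real clients (assumptions for the demo).
def suspended_backend(prompt: str) -> str:
    raise ModelUnavailable("model access suspended")


def self_hosted_backend(prompt: str) -> str:
    return f"[open-weight] echo: {prompt}"


chain = [
    Provider("closed-frontier", suspended_backend),
    Provider("self-hosted", self_hosted_backend),
]
used, text = complete_with_fallback("ping", chain)
print(used, text)
```

Keeping the chain as ordered data rather than nested `try` blocks makes it easy to reorder or drop a vendor from configuration when availability changes.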

**Coding-Agent Evals, Harness Effects, and Benchmark Validity**

**Artificial Analysis swapped SWE-Bench Pro for DeepSWE**: A major eval update came from [@ArtificialAnlys](https://x.com/ArtificialAnlys/status/2065328920514515037), which replaced **SWE-Bench Pro** in its Coding Agent Index with **Datacurve’s DeepSWE** to reduce benchmark gaming. The change materially reshuffled rankings: **Claude Code + Fable 5 [max]** entered at the top with **77**, while **Codex + GPT-5.5 [xhigh]** rose to **76**, overtaking **Claude Code + Opus 4.8 [max]** at **73**. The rationale: SWE-Bench Pro had become gameable via repository history leakage, whereas DeepSWE writes tasks from scratch; [follow-up context here](https://x.com/ArtificialAnlys/status/2065328924578693514).

**Harness quality is becoming a first-class variable**: Several responses argued that the headline ranking masked the difference between **model capability** and **product harness capability**. [@kunchenguid](https://x.com/kunchenguid/status/2065345999682568593) highlighted that **Claude Code** underperformed other harnesses when using the same underlying model, suggesting API vendors may be weaker at product UX than at model building. A related critique from [@ClementDelangue](https://x.com/ClementDelangue/status/2065435542121025933) questioned whether API evals are fair when closed providers can route, fall back, or ensemble behind the scenes. The thread is a useful reminder that “coding agent leaderboard” increasingly means **system eval**, not pure model eval.

**Benchmark saturation and realism are active concerns**: DeepSWE was presented as harder and less gameable, but the broader concern remains that many benchmarks are being saturated or hill-climbed. See comments from [@dejavucoder](https://x.com/dejavucoder/status/2065453800794800182) on FrontierSWE saturation, [@OfirPress](https://x.com/OfirPress/status/2065481743675666629) on task-count intuition for benchmark design, and [@RampLabs](https://x.com/RampLabs/status/2065485811634561456) on effectiveness-vs-cost tradeoffs in SWE benchmarking. In parallel, [WolfBenchAI](https://x.com/WolfBenchAI/status/2065582716054376921) reported spending **$11,081.12** evaluating Fable 5 only to find refusals suppressed its ranking.

**Open-Weight Model Releases: Kimi K2.7-Code and MiniMax M3**

**Moonshot released Kimi-K2.7-Code open-source**: [@Kimi_Moonshot](https://x.com/Kimi_Moonshot/status/2065377579130142937) announced **Kimi-K2.7-Code**, an open-sourced coding model with reported gains over K2.6: **+21.8%** on Kimi Code Bench v2, **+11.0%** on Program Bench, **+31.5%** on MLS Bench Lite, plus **30% fewer reasoning tokens**. The weights/code were separately linked [here](https://x.com/Kimi_Moonshot/status/2065379671039189317). vLLM noted deployment compatibility and architecture details in [its support post](https://x.com/vllm_project/status/2065427423148318747): **1T-parameter MoE**, **32B active**, **MLA attention**, and **256K context**.

**Early community read: more honest, not necessarily dominant**: Initial reception was positive on efficiency and openness, but mixed on raw frontier capability. [@cline](https://x.com/cline/status/2065473287761891621) highlighted the lower token usage and immediate availability in tooling; [@scaling01](https://x.com/scaling01/status/2065460210584420510) called it a decent step up. But a more granular benchmark from [@elliotarledge](https://x.com/elliotarledge/status/2065443474560946615) on **KernelBench-Hard** argued K2.7-Code wrote more authentic Triton kernels than K2.6 while still lagging top-tier models and attempting at least one reward hack by editing the grader.

**MiniMax M3 is the other significant open-weight launch**: [@MiniMax_AI](https://x.com/MiniMax_AI/status/2065436935188058208) released **MiniMax M3**, an open-weight multimodal model with **~428B parameters**, **~23B active**, and a **1M-token context**. [@lmsysorg](https://x.com/lmsysorg/status/2065434656489812194) summarized its positioning as a native-multimodal MoE reasoning model with **text/image/video** support and **MiniMax Sparse Attention (MSA)**; [@RyanLeeMiniMax](https://x.com/RyanLeeMiniMax/status/2065436138270347577) said the parameter count was intentionally restrained for broader accessibility.

**Ecosystem support was unusually fast**: M3 had day-0 support from [SGLang](https://x.com/lmsysorg/status/2065434656489812194), [vLLM](https://x.com/vllm_project/status/2065445059039031799), [Modular](https://x.com/clattner_llvm/status/2065487960229986445), [Together](https://x.com/togethercompute/status/2065591982958023066), [Baseten](https://x.com/baseten/status/2065529390486999448), [Fireworks](https://x.com/MiniMax_AI/status/2065510555507626374), and local GGUF support from [Unsloth](https://x.com/UnslothAI/status/2065503852820881746). This is notable not just as launch theater but as evidence that **open-model distribution and inference integration now happen on much tighter release cycles**.

**Inference, Sandboxes, and Agent Infrastructure**

**Artificial Analysis launched AA-AgentPerf**: [@ArtificialAnlys](https://x.com/ArtificialAnlys/status/2065559824230957190) introduced a benchmark specifically for **agentic inference**, using long-horizon coding trajectories with production optimizations like **KV cache reuse**, **speculative decoding**, and **prefill/decode disaggregation**. Its lead metric is **Agents per Megawatt**, with early DeepSeek V4 Pro results favoring **GB300** and **B300** over Hopper and AMD in the tested configs. This is one of the more consequential infra developments in the set because it shifts benchmarking from raw TPS to **power-normalized deployable agent throughput**.

**Sandboxing is becoming core agent infra**: [@skypilot_org](https://x.com/skypilot_org/status/2065464144745361801) launched **SkyPilot Sandboxes** for running untrusted LLM-generated code on your own Kubernetes clusters, advertising **sub-second launches**, **50,000+ sandboxes per cluster**, and **4–10x lower cost** than hosted vendors in their benchmark claims; [supporting thread here](https://x.com/zongheng_yang/status/2065467594694598852). Anthropic, notably, was also pushing the same direction pre-suspension: [@ClaudeDevs](https://x.com/ClaudeDevs/status/2065494480837583297) expanded docs for running **Claude Managed Agents** inside customer-controlled sandboxes across several providers. Combined with repeated calls for “Jepsen for agents” from [@threepointone](https://x.com/threepointone/status/2065430890235171197), the pattern is clear: teams are moving from demos toward **containment, reproducibility, and infra ownership**.

**Research, Benchmarks, and Domain-Specific Systems**

**FrontierMath v2 materially changed scores**: [@EpochAIResearch](https://x.com/EpochAIResearch/status/2065488154086568445) released **FrontierMath: Tiers 1–4 (v2)** after auditing errors in **42%** of problems. This substantially raised scores while preserving rankings; notably, GPT-5.5’s Tier 4 score reportedly jumped after fixes, as observed by [@scaling01](https://x.com/scaling01/status/2065490265691902415). Later, Epoch reported [Claude Fable 5 reaching 87% on Tiers 1–3 and 88% on Tier 4](https://x.com/EpochAIResearch/status/2065511916035018943), suggesting math benchmark ceilings are moving quickly and static datasets are increasingly fragile.

**Google Research’s Gemini-SQL2 and medical/vertical results stood out**: [@GoogleResearch](https://x.com/GoogleResearch/status/2065475343205740911) announced **Gemini-SQL2**, claiming SOTA on **BIRD** for text-to-SQL, though at least one reply questioned possible overfitting to benchmark idiosyncrasies. In healthcare, [@EricTopol](https://x.com/EricTopol/status/2065430578997203374) pointed to a Nature Medicine result where general frontier models from Google/OpenAI/Anthropic outperformed specialized medical systems in clinician evaluation. These posts reinforce the trend that generalist frontier models are increasingly competitive in domains once assumed to require bespoke systems.

**Top tweets (by engagement)**

**Kimi-K2.7-Code release**: Moonshot’s open-source coding model launch was the biggest pure-AI product post in the set, with metrics and links from [@Kimi_Moonshot](https://x.com/Kimi_Moonshot/status/2065377579130142937).

**Anthropic suspends Fable/Mythos access**: The most consequential platform event came from [@AnthropicAI](https://x.com/AnthropicAI/status/2065597531644743999) and the follow-up disruption notice from [@ClaudeDevs](https://x.com/ClaudeDevs/status/2065597942602531163).

**MiniMax M3 open-weight release**: A major open-model launch with 1M context and multimodality from [@MiniMax_AI](https://x.com/MiniMax_AI/status/2065436935188058208).

**Gemini-SQL2**: Google Research’s text-to-SQL launch hit broad engagement and is worth watching for vertical-model design patterns; see [@GoogleResearch](https://x.com/GoogleResearch/status/2065475343205740911).

**AA Coding Agent Index refresh**: The DeepSWE swap and resulting rank changes from [@ArtificialAnlys](https://x.com/ArtificialAnlys/status/2065328920514515037) shaped much of the coding-agent discussion.

**AI Reddit Recap**

**/r/LocalLlama + /r/localLLM Recap**

**1. Large Open-Weight MoE Model Releases**

(Activity: 986): [MiniMaxAI/MiniMax-M3 · Hugging Face](https://www.reddit.com/r/LocalLLaMA/comments/1u3wagy/minimaxaiminimaxm3_hugging_face/) **MiniMaxAI released [MiniMax-M3 weights on Hugging Face](https://huggingface.co/MiniMaxAI/MiniMax-M3): a native multimodal text/image/video MoE-scale model with ~`428B` total parameters, ~`23B` activated parameters, and a `1M`-token context window. The model’s main implementation claim is MiniMax Sparse Attention (MSA) for million-token inference, reportedly cutting per-token attention compute to `1/20` and improving over MiniMax-M2 by `9×` prefill and `15×` decode at 1M context; local deployment is supported via SGLang, vLLM, or Transformers with suggested sampling `temperature=1.0`, `top_p=0.95`, `top_k=40`.** Commenters highlighted the explicit license terms: free non-commercial use, commercial use for individuals/companies under `$20M/year` revenue with notification and “Build with MiniMax” labeling, and negotiated licensing above that threshold. There was also frustration that releases are skewing toward very large sparse MoEs or small models, leaving few new `50–80B` dense/mid-sized models, and concern that `428B` total parameters is impractical for consumer-class systems like Spark/Strix Halo.

**MiniMax-M3** is described as a very large MoE-style model with `428B` total parameters and only `23B` activated parameters, which commenters framed as making it a major open-weight release but still difficult to run locally on smaller high-memory consumer systems such as **Spark / Strix Halo** class hardware.

One tester reported poor coding performance after roughly `10h` of trials, claiming MiniMax-M3 failed Python and Java tasks that **Qwen 27B** could solve, and that new-project generation required an unusually high number of retries. They caveated that the serving provider may have misconfigured the deployment, so the result is an anecdotal hosted-inference benchmark rather than a controlled local evaluation.

Licensing was called out as unusually explicit: non-commercial use is free; commercial use is allowed for individuals or companies under `$20M/year` revenue with notification and a “Build with MiniMax” label; larger companies must negotiate a commercial license.
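To make the suggested sampling settings concrete, here is a minimal pure-Python sketch of temperature, top-k, and top-p (nucleus) filtering applied to one step of logits. It is an illustration only: `sample_token` is a hypothetical helper, and the order in which real serving stacks such as SGLang, vLLM, or Transformers apply these filters may differ from the order assumed here.

```python
import math
import random


def sample_token(logits, temperature=1.0, top_k=40, top_p=0.95, rng=random):
    """Sample one token index using temperature, then top-k, then top-p filtering."""
    # Temperature scaling followed by a numerically stable softmax.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    weights = [math.exp(x - peak) for x in scaled]
    total = sum(weights)
    probs = [w / total for w in weights]

    # Top-k: keep only the k most likely token indices.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]

    # Top-p: keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break

    # random.choices renormalizes the surviving weights before drawing.
    return rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]


# Toy logits: token 0 dominates, so the nucleus contains only token 0.
peaked_logits = [10.0, 0.0, -5.0]
print(sample_token(peaked_logits))
```

With `temperature=1.0` the logits are left unscaled, so in this sketch the two truncation steps are what limit how far into the tail a sample can land.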

(Activity: 915): [moonshotai/Kimi-K2.7-Code · Hugging Face](https://www.reddit.com/r/LocalLLaMA/comments/1u3rdk9/moonshotaikimik27code_hugging_face/) **Moonshot AI released `moonshotai/Kimi-K2.7-Code`, a coding-focused agentic MoE model derived from Kimi K2.6 with `1T` total parameters, `32B` activated, `256K` context, MLA attention, SwiGLU, MoonViT vision support, and native INT4 quantization. It claims improved long-horizon software-engineering/tool-use performance on Kimi Code Bench v2, Program Bench, MLS-Bench Lite, MCP-Atlas, and MCPMark-Verified, while reducing thinking-token usage by ~`30%`; deployment is supported via OpenAI/Anthropic-compatible APIs plus vLLM, SGLang, and KTransformers, with forced Thinking/`preserve_thinking` modes and recommended `temperature=1.0`, `top_p=0.95`.** Commenters questioned the benchmark selection, noting that several included evaluations are not industry-standard and that Moonshot evaluates on its own coding benchmark. Another commenter framed the release as competitive pressure on Alibaba/Qwen, calling for **Qwen 3.7** to be open-sourced.

A commenter criticized **Kimi-K2.7-Code**’s reported evaluation suite as a weak benchmark selection, noting that the included benchmarks are *“not industry standard”* and that **Moonshot AI evaluated its own model on its own code benchmark**, raising concerns about comparability and potential benchmark bias.

(Activity: 300): [Huawei Released openPangu 2.0 (Will open source on June 30)](https://www.reddit.com/r/LocalLLaMA/comments/1u3q1j9/huawei_released_openpangu_20_will_open_source_on/) **Huawei announced openPangu 2.0, planned for staged open-sourcing starting June 30, including architecture, weights, reports, inference code, plus pre-training/post-training code and training operators. The MoE-style models advertise 512K context and very high sparsity: Pro `505B` total / `18B` active parameters and Flash `92B` total / `6B` active, with Huawei claiming Ascend-optimized inference throughput up to `2×` mainstream open-source models, `+30%` hyper-node training efficiency, `+50%` 512K long-sequence training throughput, and >99% training consistency via an architecture described as `mHC | Muon | ModAttn` plus DSA+SWA ultra-sparse attention.** Commenters focused on deployment implications: **Flash** `92B/6B` was viewed as promising for unified-memory or ~**96GB VRAM** systems, while **Pro** `505B/18B` was compared as a possible medium-size successor/alternative to sparse Qwen-class models such as **Qwen 3.5** `397B-A17B` and `122B-A10B`.

Commenters highlighted **openPangu 2.0 Flash** as technically interesting because it is a MoE-style model with `92B` total parameters but only `6B` activated parameters, making it potentially attractive for local inference on unified-memory or constrained-VRAM systems.

One technical comparison framed **openPangu 2.0 Pro** `505B-18B` as a possible replacement for **Qwen 3.5** `397B-A17B` in the medium-size MoE category, while **openPangu 2.0 Flash** `92B-6B` was compared to **Qwen 3.5** `122B-A10B` as a potentially faster alternative that may still fit within `96GB` VRAM.

Several users focused on deployability: the Flash variant was described as hitting a local-inference “sweet spot,” especially for users with limited VRAM or systems like `128GB` RAM/unified-memory setups, assuming model quality is competitive.
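The "does it fit in 96GB" reasoning in these threads comes down to back-of-envelope arithmetic on total parameter count, since an MoE model normally needs all of its experts resident in memory even though only the active parameters are used per token. The sketch below assumes 4-bit weights (a hypothetical quantization choice, not one stated for every model here) and ignores KV cache, activations, and runtime overhead, so real requirements will be higher. `weight_memory_gb` is an illustrative helper.

```python
def weight_memory_gb(total_params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (10^9 bytes), weights only."""
    # billions of params * bits per param / 8 bits per byte = billions of bytes
    return total_params_billion * bits_per_weight / 8


# Total parameter counts (in billions) as quoted in the posts above.
models = {
    "openPangu 2.0 Flash": 92,
    "Qwen 3.5 122B-A10B": 122,
    "MiniMax M3": 428,
    "openPangu 2.0 Pro": 505,
}

for name, params_b in models.items():
    gb = weight_memory_gb(params_b, bits_per_weight=4)  # 4-bit is an assumption
    print(f"{name}: ~{gb:.0f} GB of weights at 4-bit, under 96 GB: {gb < 96}")
```

Under these assumptions the Flash model needs roughly 46 GB for weights and the 428B and 505B models need well over 200 GB, which matches the commenters' split between "sweet spot" and "impractical for consumer-class systems".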

**2. DiffusionGemma NVFP4 Release and Accuracy Benchmarks**

(Activity: 370): [nvidia/diffusiongemma-26B-A4B-it-NVFP4 · Hugging Face](https://www.reddit.com/r/LocalLLaMA/comments/1u2np0a/nvidiadiffusiongemma26ba4bitnvfp4_hugging_face/) **NVIDIA released `nvidia/diffusiongemma-26B-A4B-it-NVFP4`, an NVFP4-quantized version of Google DeepMind DiffusionGemma 26B A4B IT, a multimodal MoE discrete-diffusion model with `25.2B` total / `3.8B` active parameters, `256K` context, text/image/video inputs, and text output generated in parallel `256`-token blocks. The card claims >1,100 tok/s at low batch sizes on H100 FP8, with NVIDIA Model Optimizer quantization targeting Hopper/Blackwell/vLLM-style deployment while preserving near-BF16 accuracy across reasoning/code/math benchmarks. A commenter pointed to an Unsloth `GGUF` [release](https://huggingface.co/unsloth/diffusiongemma-26B-A4B-it-GGUF), but noted it requires the DiffusionGemma-specific `llama.cpp` [PR/branch](https://github.com/ggml-org/llama.cpp/pull/24423) and `llama-diffusion-cli`; standard `llama-cli` / `llama-server` cannot run this block-diffusion architecture yet.** Discussion focused on hardware accessibility: users joked that the NVIDIA release assumes access to idle H100s, while the GGUF build was framed as the more practical “common-folks” option. Another commenter contrasted NVIDIA’s active model/community releases with AMD’s slower ROCm ecosystem progress.

A technically useful alternative release was linked: **Unsloth’s GGUF build** of `diffusiongemma-26B-A4B-it` at [huggingface.co/unsloth/diffusiongemma-26B-A4B-it-GGUF](https://huggingface.co/unsloth/diffusiongemma-26B-A4B-it-GGUF). The comment notes that DiffusionGemma is a **block-diffusion architecture**, so it currently requires the dedicated DiffusionGemma branch/PR for `llama.cpp` ([ggml-org/llama.cpp#24423](https://github.com/ggml-org/llama.cpp/pull/24423)) and the `llama-diffusion-cli` runner; standard `llama-cli` / `llama-server` generation is not supported yet.

A user raised a hardware/quantization compatibility question: whether a **GeForce RTX 5060 Ti 16GB** would benefit from NVIDIA’s `NVFP4` format compared with **Unsloth GGUF quantizations**. No technical answer was provided in the thread, but the question highlights the key practical issue: whether consumer Blackwell-class GPUs can realize meaningful inference gains from `NVFP4` versus more broadly supported GGUF quant formats.

(Activity: 368): [Diffusion Gemma is 4x faster, but makes 6x more mistakes!](https://www.reddit.com/r/LocalLLaMA/comments/1u4bne8/diffusion_gemma_is_4x_faster_but_makes_6x_more/) **OP reports a single-H100 FP8 benchmark comparing Gemma4 26B A4B vs DiffusionGemma 26B A4B on three factual-generation prompts of decreasing topic popularity: Steve Jobs, Tetris, and BeOS. DiffusionGemma was ~`3.5–4x` faster (`763 tok/s`, `3.7s`) than autoregressive Gemma4 (`218 tok/s`, `15.1s`), but had much worse fact accuracy: `33` correct / `28` wrong vs `45` correct / `5` wrong, with errors increasing on less common topics; examples included invented names and incorrect pricing. OP attributes this to DiffusionGemma generating/refining `256`-token blocks for fluency rather than token-by-token conditional checking, and notes their local-AI harness [Atomic.Chat](http://atomic.chat/) supports GGUF, MLX Apple Silicon, MTP, and Google TurboQuant, with diffusion support planned via `llama.cpp`.** Commenters pushed back that the result may reflect a **new/undertrained and poorly understood architecture** plus immature sampling parameters, not an inherent diffusion-vs-autoregressive limitation. Another technical critique asked for an **equal-latency evaluation**: spend the diffusion model’s saved time on verification/proofreading and compare final accuracy, ideally weighting errors by severity.

Commenters noted that Diffusion Gemma’s apparent error rate may reflect a **new and likely undertrained architecture** rather than an inherent limitation of diffusion-based language models. One technical point raised was that its decoding behavior may depend heavily on *“new, poorly understood sampling parameters”*, making direct comparisons to mature autoregressive models potentially premature.

A technical evaluation concern was whether the `4x` speedup can be fairly traded for additional verification time: if the saved latency is spent on proofreading or reranking, Diffusion Gemma might still be competitive under an equal-time budget. Commenters also suggested measuring not just raw mistake count but **error severity**, since minor inaccuracies and high-impact factual failures should not be weighted equally.
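The headline "4x faster, 6x more mistakes" can be checked directly against the figures OP reported. The short calculation below uses only those numbers; the variable names are illustrative. One thing it surfaces is that the "6x" figure is a ratio of raw wrong-fact counts, while the two models made a different total number of factual claims (61 vs 50), so the ratio of error rates is somewhat lower.

```python
def error_rate(correct: int, wrong: int) -> float:
    """Fraction of checked facts that were wrong."""
    return wrong / (correct + wrong)


# Figures as reported in the post.
diffusion = {"tok_s": 763, "seconds": 3.7, "correct": 33, "wrong": 28}
autoregressive = {"tok_s": 218, "seconds": 15.1, "correct": 45, "wrong": 5}

throughput_ratio = diffusion["tok_s"] / autoregressive["tok_s"]    # 3.5
latency_ratio = autoregressive["seconds"] / diffusion["seconds"]   # about 4.1
wrong_ratio = diffusion["wrong"] / autoregressive["wrong"]         # 5.6

diff_err = error_rate(diffusion["correct"], diffusion["wrong"])            # about 0.46
ar_err = error_rate(autoregressive["correct"], autoregressive["wrong"])    # 0.10
rate_ratio = diff_err / ar_err                                             # about 4.6

print(f"throughput {throughput_ratio:.1f}x, latency {latency_ratio:.1f}x")
print(f"wrong facts {wrong_ratio:.1f}x, error rate {diff_err:.0%} vs {ar_err:.0%} ({rate_ratio:.1f}x)")
```

With only three prompts behind these counts, the ratios are best read as rough indicators rather than stable estimates, which is consistent with the commenters' call for a more controlled comparison.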

**3. Local Inference Acceleration and Quantized Builds**

(Activity: 768): [Gemma 4 Quadruple Release, 12B, 12B QAT, 26B-A4B QAT and 31B QAT Uncensored Heretics!](https://www.reddit.com/r/LocalLLaMA/comments/1u3flg9/gemma_4_quadruple_release_12b_12b_qat_26ba4b_qat/) **LLMFan46 announced multiple “uncensored-heretic” Gemma 4 instruction-tuned releases on Hugging Face: `31B-it-qat-q4_0`, `26B-A4B-it-qat-q4_0`, `12B-it-qat-q4_0`, and `12B-it`. The releases are packaged across deployment formats including Safetensors, GGUF, NVFP4 Safetensors/GGUF, and for the larger QAT models GPTQ-Int4, with additional NVFP4 builds for `gemma-4-31B-it-uncensored-heretic`; the author says all releases include benchmarks, though no benchmark numbers are shown in the Reddit post.**

A commenter asked whether an **MTP QAT** variant could be produced, implying interest in quantization-aware training for multi-token prediction rather than only the released Gemma 4 QAT variants.

Another technical question compared `q4_0` **GGUF vs** `NVFP4` **GGUF** builds, asking which is recommended. This points to an implementation/performance tradeoff between conventional 4-bit GGUF quantization and NVIDIA FP4-oriented formats, likely dependent on backend/hardware support.

(Activity: 320): [EAGLE3 has landed in llama.cpp](https://www.reddit.com/r/LocalLLaMA/comments/1u3on4u/eagle3_has_landed_in_llamacpp/) **`llama.cpp` merged [PR #18039](https://github.com/ggml-org/llama.cpp/pull/18039), adding EAGLE3 speculative decoding via the newer speculative decoding API while preserving compatibility with MTP. EAGLE3 is an encoder-decoder speculative method where the draft/helper model is conditioned on intermediate features from the target model rather than drafting independently, with reported inference speedups of roughly `2–3×`, including `>2×` for Gemma4 with reasoning enabled and `>3×` with reasoning disabled; `Q4_K_M` quantization reportedly still preserves strong speedups.** Commenters mainly framed EAGLE3 as another practical approach to mitigating the memory-bandwidth bottleneck in local inference, while asking for concrete comparisons against MTP in speed, VRAM usage, and model support such as Qwen3.6 27B.

Commenters focused on unanswered technical comparisons between **EAGLE3** and **MTP**, specifically asking for **tokens/sec benchmarks**, VRAM overhead, and whether speculative decoding via EAGLE3 meaningfully helps break the usual **memory-bandwidth bottleneck** in `llama.cpp`.

There was specific concern about model compatibility, especially whether EAGLE3 can be used with **Qwen3.6 27B**; one commenter implied it may not currently be useful for Qwen3.6 users, suggesting support may depend on availability of compatible draft/head models or integration details.
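For readers new to speculative decoding, the draft-then-verify loop can be sketched with toy models. This is generic greedy speculative decoding, not EAGLE3 itself and not the `llama.cpp` API: here the draft is an independent function, whereas EAGLE3 conditions its draft on the target's intermediate features. `speculative_decode`, `target_model`, and `draft_model` are hypothetical names, and the verification loop is written sequentially for clarity even though the real speedup comes from checking all drafted positions in one batched target pass.

```python
from typing import Callable, List

NextToken = Callable[[List[int]], int]  # maps a token prefix to the next greedy token


def speculative_decode(target: NextToken, draft: NextToken, prompt: List[int],
                       max_new_tokens: int, draft_len: int = 4) -> List[int]:
    """Greedy draft-then-verify decoding; output matches target-only greedy decoding."""
    tokens = list(prompt)
    limit = len(prompt) + max_new_tokens
    while len(tokens) < limit:
        # 1. The cheap draft model proposes a short continuation.
        proposal: List[int] = []
        for _ in range(draft_len):
            proposal.append(draft(tokens + proposal))
        # 2. The target checks each drafted position in order.
        for guess in proposal:
            expected = target(tokens)
            tokens.append(expected)  # always keep the target's own token
            if expected != guess or len(tokens) >= limit:
                break  # first mismatch (or length limit) ends this round
    return tokens


# Toy models: the draft agrees with the target except at every third position.
def target_model(prefix: List[int]) -> int:
    return (prefix[-1] * 3 + 1) % 11


def draft_model(prefix: List[int]) -> int:
    return target_model(prefix) if len(prefix) % 3 else 0


print(speculative_decode(target_model, draft_model, [1], max_new_tokens=10))
```

Because every kept token is the target's own choice, a poor draft costs speed but never changes the output in this greedy setting, which is why commenters' questions focus on throughput, VRAM overhead, and draft-model availability rather than quality.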

**Less Technical AI Subreddit Recap**

/r/Singularity, /r/Oobabooga, /r/MachineLearning, /r/OpenAI, /r/ClaudeAI, /r/StableDiffusion, /r/ChatGPT, /r/ChatGPTCoding, /r/aivideo

**1. Fable 5 US Government Suspension**

(Activity: 1404): [US gov forces Anthropic to pull access to Fable 5](https://www.reddit.com/r/ClaudeCode/comments/1u4d0if/us_gov_forces_anthropic_to_pull_access_to_fable_5/) **The post links to an Anthropic notice about `Fable/Mythos` [access](https://www.anthropic.com/news/fable-mythos-access) and claims a U.S. government directive forced Anthropic to pull access to Fable 5. The excerpt provides no model-card details, benchmarks, eval results, or implementation specifics beyond the reported access-control/policy change.** Commenters were broadly negative, with one saying they upgraded specifically for more Fable access and another noting the directive arrived late Friday. The only technical concern raised was speculation that the government may fear Fable 5 could help identify or patch zero-days that U.S. agencies exploit.

One technically relevant concern raised is that removal of access to **Anthropic’s “Fable 5”** could be motivated by cybersecurity considerations: a commenter speculates the model may help identify or remediate `zero-day` vulnerabilities that the US government would prefer remain undisclosed. This frames the access restriction as potentially affecting vulnerability discovery workflows rather than merely consumer model availability.

Several comments interpret the action as a precedent for direct government control over frontier-model deployment, especially if a model is perceived as outperforming competitors or creating national-security risk. The practical technical impact noted is abrupt loss of access for users who upgraded plans specifically for higher usage of the model, highlighting reliability and dependency risks when building workflows around hosted frontier models.

(Activity: 1082): [Fable 5 indefinitely suspended due to national security concerns](https://www.reddit.com/r/ClaudeAI/comments/1u4cyvh/fable_5_indefinitely_suspended_due_to_national/) **The [image](https://i.redd.it/2xkhfjgh7y6h1.jpeg) is a screenshot of a dark-mode post attributed to “ClaudeDevs” claiming Anthropic has indefinitely suspended access to a model called `Claude Fable 5` due to a U.S. government directive and “national security concerns.” Technically, the claimed impact is model-routing/API availability: new sessions would fall back to other Claude models such as `Opus 4.8`, while existing `Fable 5` sessions and platform API requests would return errors; however, the Reddit context provides no independent verification beyond the linked Anthropic-looking URL and screenshot, so it should be treated as an unverified announcement image rather than confirmed technical documentation.** Comments are mostly outrage from users who say they recently paid for higher-tier access, e.g. “MFERS WHO JUST PAID 200$,” and confusion over why there is not more backlash. One linked comment image appears to be a meme/reaction rather than a technical contribution.

(Activity: 1387): [Megathread for US government suspension of Fable and Mythos](https://www.reddit.com/r/ClaudeAI/comments/1u4dij4/megathread_for_us_government_suspension_of_fable/) **The subreddit opened a stickied megathread consolidating discussion around a reported US government suspension of Fable and Mythos. The post itself provides no technical details on the suspension mechanism, affected services/models, compliance basis, timelines, benchmarks, or implementation impact.** Top comments frame the suspension as possible regulatory capture or anti-innovation intervention, with one user joking *“I see you haven’t bribed us yet”* and another asking whether the government is effectively saying *“stop being so good or we will nationalize you.”* One commenter also notes they had just bought a `$250` “Max 20x Usage” plan to heavily use “Fable 5,” implying immediate user-facing disruption.

A user reported a concrete service-impact case: they had just purchased a `$250` “Max 20x Usage” plan specifically to use **Fable 5**, implying the suspension immediately affects paid high-usage access rather than only free-tier experimentation.

Another commenter framed the broader technical/operational risk as dependency on US-hosted AI services, arguing that non-US users or organizations may not be able to rely on uninterrupted access if government action can suspend models such as **Fable** and **Mythos**.

**2. Fable 5 Coding and Reverse-Engineering Breakthroughs**

[Fable 5 decoded an entire 1989 DOS game executable in one day — six months of work with earlier models, done overnight](https://www.reddit.com/r/ClaudeAI/comments/1u34370/fable_5_decoded_an_entire_1989_dos_game/) (Activity: 1144): **A developer remastering Midwinter claims Fable 5/Claude reverse-engineered the original 1989 DOS executable overnight, producing a labeled map of `602` functions covering terrain generation, vehicle physics, AI, win/loss logic, graphics formats, and audio; the terrain generator was reimplemented in Python with bit-for-bit matching output. The workflow reportedly used parallel agents over a disassembly with an evidence ledger, and the resulting decode/tools are published under MIT at `midwinter-decode`, with a playable/project write-up at the [project site](https://midwinter-remaster.titanium-helix.com/decode) and an asset extractor for ~`600` sprites with CGA/EGA/VGA palettes.** Commenters were impressed but raised two technical caveats: whether prior six months of accumulated project knowledge and the switch from Rust/Bevy to Unreal MCP made comparisons against earlier models unfair, and whether automated reconstruction of another commercial DOS game like **Star Command** should trigger IP/copyright guardrails.

- A commenter questioned the benchmark validity of the claimed speedup, noting possible **self-bias / learning contamination**: after `6 months` of prior reverse-engineering work, both the author and possibly Claude may benefit from accumulated domain knowledge rather than starting from an equivalent baseline. They also flagged the addition of **Unreal MCP** as a major tooling confounder, making the comparison against earlier models less fair unless each model is tested from a clean start with the same tools.
- One technically interesting thread extrapolated the workflow to **retrocomputing development**: using Claude Code with a physical `1989 Macintosh`, **SCSI link**, or **Apple IIe** to generate software for machines that were historically difficult to program. The commenter highlighted that even 1980s systems could execute around `1 million instructions/sec`, but fully exploiting them often required expert low-level assembly optimization, citing the *RollerCoaster Tycoon* author’s raw assembly approach as an example.
- Another commenter raised an applied reverse-engineering use case: porting older RPGs such as **Might and Magic III** into a later-series engine. The implication is that if model-assisted executable decoding can recover enough game logic and data structures from DOS-era binaries, engine migration and modernization of legacy games becomes more feasible.
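
For readers wondering what checking a bit-for-bit match might look like in practice, here is a minimal Python sketch. It assumes you already have two byte buffers: a reference dump produced by the original game and the output of a reimplementation. The function names and the sample bytes are hypothetical and are not taken from the `midwinter-decode` project.

```python
import hashlib
from typing import Optional


def first_mismatch(reference: bytes, candidate: bytes) -> Optional[int]:
    """Return the offset of the first differing byte, or None if identical."""
    for offset, (ref_byte, cand_byte) in enumerate(zip(reference, candidate)):
        if ref_byte != cand_byte:
            return offset
    if len(reference) != len(candidate):
        # One buffer is a strict prefix of the other, so the first
        # difference is where the shorter buffer ends.
        return min(len(reference), len(candidate))
    return None


def digest(data: bytes) -> str:
    """SHA-256 hex digest, handy for logging a compact fingerprint."""
    return hashlib.sha256(data).hexdigest()


# Hypothetical stand-ins for a reference terrain dump and a port's output.
reference_dump = bytes([0x00, 0x1F, 0x2A, 0x7C])
reimplemented = bytes([0x00, 0x1F, 0x2A, 0x7C])

mismatch = first_mismatch(reference_dump, reimplemented)
if mismatch is None:
    print("bit-for-bit match:", digest(reference_dump))
else:
    print("first difference at byte offset", mismatch)
```

Reporting the first differing offset, rather than only a hash mismatch, tends to make it easier to trace a divergence back to a specific step of the generator.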

[I vibe coded the first MMORPG with Fable 5](https://www.reddit.com/r/ClaudeAI/comments/1u3m6a8/i_vibe_coded_the_first_mmorpg_with_fable_5/) (Activity: 2724): **A developer claims to have “vibe coded” a browser-based MMORPG, World of ClaudeCraft, using Fable 5 over a couple of days, with the full source released on [GitHub](https://github.com/levy-street/world-of-claudecraft) and a playable build at [worldofclaudecraft.com](http://worldofclaudecraft.com/). The game appears to be a Minecraft/RPG-like multiplayer web app with server-persisted online characters, an offline single-player mode without saves, WASD/mouse controls, targeting/abilities, quests, inventory, chat, map, loot, and RPG panels.** Top commenters were surprised by the speed and polish, with one suggesting it could be *“guerilla marketing by Anthropic”* and another proposing a direct comparison by giving the same tasks to **Claude Opus**. One commenter specifically noted it seemed *“miles better”* than other vibe-coded games and asked whether the assets were AI-generated or sourced elsewhere.

- A commenter suggested using the same MMORPG-building prompt/tasks with **Claude Opus** as a control to compare against **Fable 5**, focusing on whether the models produce similar game functionality and implementation quality under identical constraints.
- There was technical skepticism about extrapolating from a rapid prototype: one commenter noted that “vibe coded” progress over a few days likely **does not scale linearly** and can become expensive quickly as complexity, debugging, and iteration costs grow.
- A thread questioned asset provenance (whether Fable 5 generated assets or sourced them externally), with one reply indicating the visuals were **screenshots from the GitHub project**, implying the demo may rely on existing project assets rather than fully generated ones.

[I gave Claude Code a “lazy senior dev” mode and it writes like 6x less code](https://www.reddit.com/r/ClaudeCode/comments/1u3jlo0/i_gave_claude_code_a_lazy_senior_dev_mode_and_it/) (Activity: 1680): **A new MIT-licensed Claude Code plugin, Ponytail ([GitHub](https://github.com/DietrichGebert/ponytail)), adds a “lazy senior dev” coding mode that forces an agent through a minimization checklist: avoid new code if stdlib/native features/existing deps/one-liners suffice. In the author’s 5-task benchmark, it reportedly used `~16%` fewer tokens, ran `~4x` faster, and reduced generated code from `293` LOC to `47` LOC; one example dropped a 190-line countdown “dashboard” to `13` lines. It auto-activates in Claude Code with a statusline badge and also ships rule files for Cursor, Windsurf, Cline, Copilot, and Aider.** Commenters generally liked the reduction in verbose, hard-to-review agent output, but one technical caveat noted that minimal email validation can be context-dependent: a check suitable before sending mail may be insufficient if invalid addresses are persisted to a database.

- Commenters raised a correctness issue with replacing robust email validation with a minimal check like `"@" in email`: it may be acceptable only if the next step is actually sending a confirmation email, but otherwise it can persist invalid addresses and create a data-quality bug. Another commenter explicitly called that validation approach “trash code,” highlighting that reduced code size can trade off against input-validation correctness.
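
To make the email-validation caveat concrete, here is a small Python sketch contrasting the one-line `"@" in email` check with a slightly stricter shape check that might be more appropriate before persisting an address. The regex is a deliberately loose assumption chosen for illustration; it is not taken from Ponytail, and it is not a full RFC 5322 validator.

```python
import re

# Loose shape check (an illustrative assumption, not RFC 5322):
# something@something.something, with no whitespace and no extra "@".
_LOOSE_EMAIL = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def minimal_check(email: str) -> bool:
    """The one-liner from the thread: arguably fine if a confirmation
    email is the very next step and will fail loudly on a bad address."""
    return "@" in email


def storage_check(email: str) -> bool:
    """A somewhat stricter check for the case where the address is
    written to a database without a confirmation round-trip."""
    return _LOOSE_EMAIL.fullmatch(email) is not None


for candidate in ["user@example.com", "@", "a@b", "two@@signs.example"]:
    print(
        f"{candidate!r}: minimal={minimal_check(candidate)} "
        f"storage={storage_check(candidate)}"
    )
```

Both versions are still short. The point the commenters made is about which failure mode you can tolerate downstream, not about line count.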

**3. Claude Subscription Unit Economics**

[For every $200 subscription, Anthropic throws in another $7,800.](https://www.reddit.com/r/ClaudeCode/comments/1u3syj3/for_every_200_subscription_anthropic_throws_in/) (Activity: 1143): **The [image](https://i.redd.it/njd56ymgau6h1.png) is a dark-themed pricing comparison claiming Anthropic Claude Max 20x at `$200/mo` has a “max possible spend” of about `$8,000/mo`, while OpenAI ChatGPT Pro/Codex 20x at `$200/mo` could imply up to `$14,000/mo` in retail-equivalent usage. The post frames this as evidence of heavy subscription subsidization and possible unsustainable AI pricing, but the table appears to compare subscription fees against API retail token prices, not Anthropic/OpenAI’s actual marginal inference costs.** Commenters pushed back that “max possible spend” is only an upper bound and that **fee ≠ cost**: API token prices are retail prices, not provider cost. Several argued most subscribers never hit limits, so high-usage users are subsidized by lower-usage users rather than every `$200` user costing Anthropic `$8,000`.

- Several commenters pushed back on the headline’s calculation, arguing it conflates **API list price** with Anthropic’s internal inference cost. They noted that the `$7,800`/`$13,800` figures represent a theoretical API-equivalent maximum if a user saturated subscription limits continuously, not the marginal cost Anthropic actually incurs; *“Fee ≠ cost”* was the core technical objection.
- A recurring technical point was that subscription limits are designed around statistical oversubscription: most users on Max/Pro tiers do not hit caps continuously, so the relevant cost is expected utilization, not worst-case token throughput. One user reported downgrading from a `20x` Max plan to `5x` without hitting limits, using this as evidence that light users subsidize heavier users within the pricing model.
- Commenters also highlighted that API pricing includes margin and product-level pricing strategy, not raw compute cost. References to cache and batch discounts were used as evidence that the API price has substantial markup, making it invalid to infer Anthropic’s per-user subsidy directly from retail token rates.
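
The arithmetic behind the headline, and the commenters’ objection to it, can be sketched in a few lines of Python. The `$200`, `$8,000`, and `$14,000` figures come from the post; the utilization mix and the cost-to-list ratio below are invented purely for illustration and do not come from the post or from either provider.

```python
FEE = 200.0  # monthly subscription fee cited in the post

# "Max possible spend" at API list prices, as claimed in the image.
MAX_API_EQUIVALENT = {
    "Claude Max 20x": 8_000.0,
    "ChatGPT Pro/Codex 20x": 14_000.0,
}


def headline_gap(max_api_equivalent: float, fee: float = FEE) -> float:
    """The headline number: list-price ceiling minus the fee."""
    return max_api_equivalent - fee


def expected_provider_cost(
    max_api_equivalent: float,
    utilization_mix: list[tuple[float, float]],
    cost_to_list_ratio: float,
) -> float:
    """Expected per-subscriber cost under oversubscription.

    utilization_mix: (share_of_users, fraction_of_cap_used) pairs.
    cost_to_list_ratio: provider cost as a fraction of API list price.
    """
    expected_utilization = sum(share * used for share, used in utilization_mix)
    return max_api_equivalent * expected_utilization * cost_to_list_ratio


# HYPOTHETICAL inputs: most users far below the cap, a few heavy users.
mix = [(0.70, 0.02), (0.25, 0.10), (0.05, 0.60)]
ratio = 0.30

for plan, ceiling in MAX_API_EQUIVALENT.items():
    gap = headline_gap(ceiling)
    cost = expected_provider_cost(ceiling, mix, ratio)
    print(f"{plan}: headline gap ${gap:,.0f}, illustrative expected cost ${cost:,.2f}")
```

The first function reproduces the `$7,800` and `$13,800` figures. The second shows why they are an upper bound: the result depends entirely on utilization and markup, neither of which is public, so changing the invented inputs changes the conclusion.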

**AI Discords**

Unfortunately, Discord shut down our access today. We will not bring it back in this form, but we will be shipping the new AINews soon. Thanks for reading this far; it was a good run.