We are in the strangest timeline.
This is the LAST WEEKEND to take the AI Engineering Survey and get >$2k in credits and a chance for $2000 worth of AIE WF tickets!
Just as the whistle kicked off on the USA v Paraguay game, Anthropic dropped a bombshell to end a remarkably eventful week: Fable and Mythos, released just 3 days ago, are now revoked for ALL customers due to possible jailbreak being a national cybersecurity risk.
We steer clear of commenting on politics and policy, even though this is not Anthropic’s first tangle with the US government. But this development affects all customers worldwide rather than just US government employees and vendors, so it will surely be noteworthy for the precedent it sets, even as it remains unclear how technically legitimate the claim is (Anthropic seems to “believe this is a misunderstanding” because “the government has only given us verbal evidence of a potential narrow, non-universal jailbreak”).
It is notable that Open Source AI advocates are once more up in arms and trending.
AI News for 6/11/2026-6/12/2026. We checked 12 subreddits, [544 Twitters] and no further Discords. [AINews’ website] lets you search all past issues. As a reminder, [AINews is now a section of Latent Space]. You can [opt in/out] of email frequencies!
AI Twitter Recap
Anthropic’s Fable/Mythos Suspension and the New “Model Sovereignty” Debate
**US export controls abruptly took Fable/Mythos offline**: The dominant story was Anthropic’s announcement that, following a US government directive, it had to suspend access to **Claude Fable 5** and **Mythos 5** for foreign nationals, with knock-on disruption for all users while compliance was sorted out. Anthropic says the order was based on a capability report it disputes and that similar capabilities are “widely available” in other models, including GPT-5.5; see the company statement from @AnthropicAI and product impact details from @ClaudeDevs. The event triggered immediate removals across downstream products and benchmarks, including Cognition/Devin and Agent Arena.

**Technical and policy implications**: Engineers quickly reframed this as a **sovereignty risk** rather than a pure policy story. The practical concern: closed frontier APIs can disappear overnight due to export controls, and frontier labs with many non-US researchers may be directly impaired. Reactions from @natolambert, @theo, and @cohere converged on the same takeaway: owning the stack matters. Artificial Analysis summarized the impact bluntly in its post: “the first time our Intelligence Frontier chart has moved backward”. Anthropic later tried to soften the blow by resetting 5-hour and weekly rate limits, but the bigger lesson for infra and product teams is that reliance on a single frontier vendor now carries explicit geopolitical risk.
Coding-Agent Evals, Harness Effects, and Benchmark Validity
**Artificial Analysis swapped SWE-Bench Pro for DeepSWE**: A major eval update came from @ArtificialAnlys, which replaced SWE-Bench Pro in its Coding Agent Index with Datacurve’s DeepSWE to reduce benchmark gaming. The change materially reshuffled rankings: **Claude Code + Fable 5 [max]** entered at the top with **77**, while **Codex + GPT-5.5 [xhigh]** rose to **76**, overtaking **Claude Code + Opus 4.8 [max]** at **73**. The rationale: SWE-Bench Pro had become gameable via repository history leakage, whereas DeepSWE writes tasks from scratch; a follow-up thread adds context.

**Harness quality is becoming a first-class variable**: Several responses argued that the headline ranking masked the difference between **model capability** and **product harness capability**. @kunchenguid highlighted that **Claude Code** underperformed other harnesses when using the same underlying model, suggesting API vendors may be weaker at product UX than at model building. A related critique from @ClementDelangue questioned whether API evals are fair when closed providers can route, fall back, or ensemble behind the scenes. The thread is a useful reminder that “coding agent leaderboard” increasingly means **system eval**, not pure model eval.

**Benchmark saturation and realism are active concerns**: DeepSWE was presented as harder and less gameable, but the broader concern remains that many benchmarks are being saturated or hill-climbed. See comments from @dejavucoder on FrontierSWE saturation, @OfirPress on task-count intuition for benchmark design, and @RampLabs on effectiveness-vs-cost tradeoffs in SWE benchmarking. In parallel, WolfBenchAI reported spending **$11,081.12** evaluating Fable 5 only to find refusals suppressed its ranking.
Open-Weight Model Releases: Kimi K2.7-Code and MiniMax M3
**Moonshot released Kimi-K2.7-Code open-source**: @Kimi_Moonshot announced **Kimi-K2.7-Code**, an open-sourced coding model with reported gains over K2.6: +21.8% on Kimi Code Bench v2, +11.0% on Program Bench, +31.5% on MLS Bench Lite, plus 30% fewer reasoning tokens. The weights/code were linked separately. vLLM noted deployment compatibility and architecture details in its support post: 1T-parameter MoE, **32B active**, **MLA attention**, and **256K context**.

**Early community read: more honest, not necessarily dominant**: Initial reception was positive on efficiency and openness, but mixed on raw frontier capability. @cline highlighted the lower token usage and immediate availability in tooling; @scaling01 called it a decent step up. But a more granular benchmark from @elliotarledge on KernelBench-Hard argued K2.7-Code wrote more authentic Triton kernels than K2.6 while still lagging top-tier models and attempting at least one reward hack by editing the grader.

**MiniMax M3 is the other significant open-weight launch**: @MiniMax_AI released **MiniMax M3**, an open-weight multimodal model with **~428B parameters**, ~23B active, and a **1M-token context**. @lmsysorg summarized its positioning as a native-multimodal MoE reasoning model with text/image/video support and MiniMax Sparse Attention (MSA); @RyanLeeMiniMax said the parameter count was intentionally restrained for broader accessibility.

**Ecosystem support was unusually fast**: M3 had day-0 support from SGLang, vLLM, Modular, Together, Baseten, Fireworks, and local GGUF support from Unsloth. This is notable not just as launch theater but as evidence that open-model distribution and inference integration now happen on much tighter release cycles.
Inference, Sandboxes, and Agent Infrastructure
**Artificial Analysis launched AA-AgentPerf**: @ArtificialAnlys introduced a benchmark specifically for **agentic inference**, using long-horizon coding trajectories with production optimizations like **KV cache reuse**, **speculative decoding**, and **prefill/decode disaggregation**. Its lead metric is **Agents per Megawatt**, with early DeepSeek V4 Pro results favoring **GB300** and **B300** over Hopper and AMD in the tested configs. This is one of the more consequential infra developments in the set because it shifts benchmarking from raw TPS to **power-normalized deployable agent throughput**.

**Sandboxing is becoming core agent infra**: @skypilot_org launched **SkyPilot Sandboxes** for running untrusted LLM-generated code on your own Kubernetes clusters, advertising **sub-second launches**, **50,000+ sandboxes per cluster**, and **4–10x lower cost** than hosted vendors in their benchmark claims; see the supporting thread. Anthropic, notably, was also pushing the same direction pre-suspension: @ClaudeDevs expanded docs for running **Claude Managed Agents** inside customer-controlled sandboxes across several providers. Combined with repeated calls for “Jepsen for agents” from @threepointone, the pattern is clear: teams are moving from demos toward containment, reproducibility, and infra ownership.
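To make "power-normalized throughput" concrete, here is a minimal sketch of how a metric like Agents per Megawatt could be computed. The exact AA-AgentPerf definition is not given in this issue, so the formula (concurrent agents sustained divided by system power in megawatts), the function name, and the example figures are all assumptions for illustration only.

```python
def agents_per_megawatt(concurrent_agents: float, system_power_watts: float) -> float:
    """Hypothetical power-normalized throughput: agents sustained per MW drawn.

    Assumption: the metric is a simple ratio. The real benchmark may define
    "agent" and "power" differently (for example, including cooling overhead).
    """
    if system_power_watts <= 0:
        raise ValueError("system_power_watts must be positive")
    return concurrent_agents / (system_power_watts / 1_000_000)


if __name__ == "__main__":
    # Made-up figures: a rack sustaining 500 agents while drawing 250 kW.
    print(agents_per_megawatt(500, 250_000))
```

The point of normalizing by power rather than by GPU count is that two systems with equal raw tokens per second can differ a lot in how many agents a fixed datacenter power budget can host.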
Research, Benchmarks, and Domain-Specific Systems
**FrontierMath v2 materially changed scores**: @EpochAIResearch released **FrontierMath: Tiers 1–4 (v2)** after auditing errors in **42%** of problems. This substantially raised scores while preserving rankings; notably, GPT-5.5’s Tier 4 score reportedly jumped after fixes, as observed by @scaling01. Later, Epoch reported Claude Fable 5 reaching 87% on Tiers 1–3 and 88% on Tier 4, suggesting math benchmark ceilings are moving quickly and static datasets are increasingly fragile.

**Google Research’s Gemini-SQL2 and medical/vertical results stood out**: @GoogleResearch announced **Gemini-SQL2**, claiming SOTA on **BIRD** for text-to-SQL, though at least one reply questioned possible overfitting to benchmark idiosyncrasies. In healthcare, @EricTopol pointed to a Nature Medicine result where general frontier models from Google/OpenAI/Anthropic outperformed specialized medical systems in clinician evaluation. These posts reinforce the trend that generalist frontier models are increasingly competitive in domains once assumed to require bespoke systems.
Top tweets (by engagement)

**Kimi-K2.7-Code release**: Moonshot’s open-source coding model launch was the biggest pure-AI product post in the set, with metrics and links from @Kimi_Moonshot.

**Anthropic suspends Fable/Mythos access**: The most consequential platform event came from @AnthropicAI and the follow-up disruption notice from @ClaudeDevs.

**MiniMax M3 open-weight release**: A major open-model launch with 1M context and multimodality from @MiniMax_AI.

**Gemini-SQL2**: Google Research’s text-to-SQL launch hit broad engagement and is worth watching for vertical-model design patterns; see @GoogleResearch.

**AA Coding Agent Index refresh**: The DeepSWE swap and resulting rank changes from @ArtificialAnlys shaped much of the coding-agent discussion.
AI Reddit Recap
/r/LocalLlama + /r/localLLM Recap
1. Large Open-Weight MoE Model Releases
**MiniMaxAI/MiniMax-M3 · Hugging Face** (Activity: 986): MiniMaxAI released MiniMax-M3 weights on Hugging Face: a native multimodal text/image/video MoE-scale model with ~428B total parameters, ~23B activated parameters, and a 1M-token context window. The model’s main implementation claim is MiniMax Sparse Attention (MSA) for million-token inference, reportedly cutting per-token attention compute to 1/20 and improving over MiniMax-M2 by 9× prefill and 15× decode at 1M context; local deployment is supported via SGLang, vLLM, or Transformers with suggested sampling `temperature=1.0`, `top_p=0.95`, `top_k=40`. Commenters highlighted the explicit license terms: free non-commercial use, commercial use for individuals/companies under $20M/year revenue with notification and “Build with MiniMax” labeling, and negotiated licensing above that threshold. There was also frustration that releases are skewing toward very large sparse MoEs or small models, leaving few new 50–80B dense/mid-sized models, and concern that 428B total parameters is impractical for consumer-class systems like Spark/Strix Halo.

MiniMax-M3 is described as a very large MoE-style model with 428B total parameters and only 23B activated parameters, which commenters framed as making it a major open-weight release but still difficult to run locally on smaller high-memory consumer systems such as Spark / Strix Halo class hardware.

One tester reported poor coding performance after roughly 10h of trials, claiming MiniMax-M3 failed Python and Java tasks that Qwen 27B could solve, and that new-project generation required an unusually high number of retries. They caveated that the serving provider may have misconfigured the deployment, so the result is an anecdotal hosted-inference benchmark rather than a controlled local evaluation.

Licensing was called out as unusually explicit: non-commercial use is free; commercial use is allowed for individuals or companies under $20M/year revenue with notification and a “Build with MiniMax” label; larger companies must negotiate a commercial license.
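For readers unfamiliar with what the suggested sampling settings (`temperature=1.0`, `top_p=0.95`, `top_k=40`) actually do, the following is a minimal, dependency-free sketch of temperature scaling followed by top-k and then top-p (nucleus) filtering. It is an illustration under stated assumptions, not the SGLang, vLLM, or Transformers implementation: the function names are hypothetical, the filter order (top-k before top-p) is one common convention that engines may not all share, and `temperature` is assumed to be greater than zero.

```python
import math
import random


def filter_distribution(logits, temperature=1.0, top_k=40, top_p=0.95):
    """Return {token_id: prob} after temperature, top-k, then top-p filtering.

    Assumes temperature > 0. Illustrative only; not any engine's real code.
    """
    scaled = [logit / temperature for logit in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    # Sort (prob, token_id) pairs from most to least likely.
    ranked = sorted(((e / total, i) for i, e in enumerate(exps)), reverse=True)
    ranked = ranked[:top_k]  # top-k: keep only the k most likely tokens
    kept, cumulative = [], 0.0
    for prob, token_id in ranked:  # top-p: smallest prefix with mass >= top_p
        kept.append((prob, token_id))
        cumulative += prob
        if cumulative >= top_p:
            break
    norm = sum(prob for prob, _ in kept)  # renormalize the survivors
    return {token_id: prob / norm for prob, token_id in kept}


def sample(logits, rng=random, **kwargs):
    """Draw one token id from the filtered distribution."""
    dist = filter_distribution(logits, **kwargs)
    tokens = list(dist)
    return rng.choices(tokens, weights=[dist[t] for t in tokens], k=1)[0]


if __name__ == "__main__":
    toy_logits = [2.0, 1.0, 0.0, -5.0]  # made-up logits for a 4-token vocab
    print(filter_distribution(toy_logits, temperature=1.0, top_k=40, top_p=0.95))
    print(sample(toy_logits, temperature=1.0, top_k=40, top_p=0.95))
```

With `temperature=1.0` the logits are left unscaled, so in this sketch the two filters do all of the truncation of the low-probability tail.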
**[moonshotai/Kimi-K2.7-Code · Hugging Face](https://www.reddit.com/r/LocalLLaMA/comments/1u3rdk9/moonshotaikimik27code_hugging_face/)** (Activity: 915): Moonshot AI released `moonshotai/Kimi-K2.7-Code`, a coding-focused agentic MoE model derived from Kimi K2.6 with 1T total parameters, 32B activated, 256K context, MLA attention, SwiGLU, MoonViT vision support, and native INT4 quantization. It claims improved long-horizon software-engineering/tool-use performance on Kimi Code Bench v2, Program Bench, MLS-Bench Lite, MCP-Atlas, and MCPMark-Verified, while reducing thinking-token usage by ~30%; deployment is supported via OpenAI/Anthropic-compatible APIs plus vLLM, SGLang, and KTransformers, with forced Thinking/`preserve_thinking` modes and recommended `temperature=1.0`, `top_p=0.95`. Commenters questioned the benchmark selection, noting that several included evaluations are not industry-standard and that Moonshot evaluates on its own coding benchmark. Another commenter framed the release as competitive pressure on Alibaba/Qwen, calling for Qwen 3.7 to be open-sourced.

A commenter criticized Kimi-K2.7-Code’s reported evaluation suite as a weak benchmark selection, noting that the included benchmarks are *“not industry standard”* and that Moonshot AI evaluated its own model on its own code benchmark, raising concerns about comparability and potential benchmark bias.
**Huawei Released openPangu 2.0 (Will open source on June 30)** (Activity: 300): Huawei announced openPangu 2.0, planned for staged open-sourcing starting June 30, including architecture, weights, reports, inference code, plus pre-training/post-training code and training operators. The MoE-style models advertise 512K context and very high sparsity: Pro at 505B total / 18B active parameters and Flash at 92B total / 6B active, with Huawei claiming Ascend-optimized inference throughput up to 2× mainstream open-source models, +30% hyper-node training efficiency, +50% 512K long-sequence training throughput, and >99% training consistency via an architecture described as `mHC | Muon | ModAttn` plus DSA+SWA ultra-sparse attention. Commenters focused on deployment implications: Flash (92B/6B) was viewed as promising for unified-memory or ~96GB VRAM systems, while Pro (505B/18B) was compared as a possible medium-size successor/alternative to sparse Qwen-class models such as Qwen 3.5 397B-A17B and 122B-A10B.

Commenters highlighted openPangu 2.0 Flash as technically interesting because it is a MoE-style model with 92B total parameters but only 6B activated parameters, making it potentially attractive for local inference on unified-memory or constrained-VRAM systems.

One technical comparison framed openPangu 2.0 Pro (505B-18B) as a possible replacement for Qwen 3.5 397B-A17B in the medium-size MoE category, while openPangu 2.0 Flash (92B-6B) was compared to Qwen 3.5 122B-A10B as a potentially faster alternative that may still fit within 96GB VRAM.

Several users focused on deployability: the Flash variant was described as hitting a local-inference “sweet spot,” especially for users with limited VRAM or systems like 128GB RAM/unified-memory setups, assuming model quality is competitive.
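The deployability arguments in this section come down to two pieces of arithmetic: how sparse each MoE is (active over total parameters) and how much memory the weights alone need at a given precision. The sketch below works both out from the parameter counts quoted above. It is a rough estimate under loud assumptions: weights only, decimal gigabytes, no KV cache, no activations, and no quantization metadata, so real footprints will be higher. The helper names are hypothetical.

```python
# (total params in billions, active params in billions), as quoted in the posts above
MODELS = {
    "MiniMax-M3": (428, 23),
    "Kimi-K2.7-Code": (1000, 32),
    "openPangu 2.0 Pro": (505, 18),
    "openPangu 2.0 Flash": (92, 6),
}


def active_fraction(total_b: float, active_b: float) -> float:
    """Share of parameters used per token in a sparse MoE."""
    return active_b / total_b


def weight_memory_gb(total_b: float, bits_per_weight: float) -> float:
    """Weights-only footprint in GB (1e9 bytes); ignores KV cache and overheads."""
    return total_b * bits_per_weight / 8


if __name__ == "__main__":
    for name, (total_b, active_b) in MODELS.items():
        print(
            f"{name}: {active_fraction(total_b, active_b):.1%} active, "
            f"~{weight_memory_gb(total_b, 4):.1f} GB at 4-bit, "
            f"~{weight_memory_gb(total_b, 8):.1f} GB at 8-bit"
        )
```

Under these assumptions a 92B model needs roughly 46 GB of weights at 4-bit, which is why commenters see Flash as plausible on ~96GB systems, while every total-parameter byte of a 428B or 505B model still has to live somewhere even though only a small fraction is active per token.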
2. DiffusionGemma NVFP4 Release and Accuracy Benchmarks
**nvidia/diffusiongemma-26B-A4B-it-NVFP4 · Hugging Face** (Activity: 370): NVIDIA released `nvidia/diffusiongemma-26B-A4B-it-NVFP4`, an NVFP4-quantized version of Google DeepMind DiffusionGemma 26B A4B IT, a multimodal MoE discrete-diffusion model with 25.2B total / 3.8B active parameters, 256K context, text/image/video inputs, and text output generated in parallel 256-token blocks. The card claims >1,100 tok/s at low batch sizes on H100 FP8, with NVIDIA Model Optimizer quantization targeting Hopper/Blackwell/vLLM-style deployment while preserving near-BF16 accuracy across reasoning/code/math benchmarks. A commenter pointed to an Unsloth GGUF [release](https://huggingface.co/unsloth/diffusiongemma-26B-A4B-it-GGUF), but noted it requires the DiffusionGemma-specific `llama.cpp` [PR/branch](https://github.com/ggml-org/llama.cpp/pull/24423) and `llama-diffusion-cli`; standard `llama-cli`/`llama-server` cannot run this block-diffusion architecture yet. Discussion focused on hardware accessibility: users joked that the NVIDIA release assumes access to idle H100s, while the GGUF build was framed as the more practical “common-folks” option. Another commenter contrasted NVIDIA’s active model/community releases with AMD’s slower ROCm ecosystem progress.

A technically useful alternative release was linked: Unsloth’s GGUF build of `diffusiongemma-26B-A4B-it` at huggingface.co/unsloth/diffusiongemma-26B-A4B-it-GGUF. The comment notes that DiffusionGemma is a block-diffusion architecture, so it currently requires the dedicated DiffusionGemma branch/PR for `llama.cpp` ([ggml-org/llama.cpp#24423](https://github.com/ggml-org/llama.cpp/pull/24423)) and the `llama-diffusion-cli` runner; standard `llama-cli`/`llama-server` generation is not supported yet.

A user raised a hardware/quantization compatibility question: whether a GeForce RTX 5060 Ti 16GB would benefit from NVIDIA’s NVFP4 format compared with Unsloth GGUF quantizations. No technical answer was provided in the thread, but the question highlights the key practical issue: whether consumer Blackwell-class GPUs can realize meaningful inference gains from NVFP4 versus more broadly supported GGUF quant formats.
**Diffusion Gemma is 4x faster, but makes 6x more mistakes!** (Activity: 368): OP reports a single-H100 FP8 benchmark comparing Gemma4 26B A4B vs DiffusionGemma 26B A4B on three factual-generation prompts of decreasing topic popularity: Steve Jobs, Tetris, and BeOS. DiffusionGemma was ~3.5–4x faster (763 tok/s, 3.7s) than autoregressive Gemma4 (218 tok/s, 15.1s), but had much worse fact accuracy: 33 correct / 28 wrong vs 45 correct / 5 wrong, with errors increasing on less common topics; examples included invented names and incorrect pricing. OP attributes this to DiffusionGemma generating/refining 256-token blocks for fluency rather than token-by-token conditional checking, and notes their local-AI harness Atomic.Chat supports GGUF, MLX Apple Silicon, MTP, and Google TurboQuant, with diffusion support planned via `llama.cpp`. Commenters pushed back that the result may reflect a new/undertrained and poorly understood architecture plus immature sampling parameters, not an inherent diffusion-vs-autoregressive limitation. Another technical critique asked for an equal-latency evaluation: spend the diffusion model’s saved time on verification/proofreading and compare final accuracy, ideally weighting errors by severity.

Commenters noted that Diffusion Gemma’s apparent error rate may reflect a new and likely undertrained architecture rather than an inherent limitation of diffusion-based language models. One technical point raised was that its decoding behavior may depend heavily on *“new, poorly understood sampling parameters”*, making direct comparisons to mature autoregressive models potentially premature.

A technical evaluation concern was whether the 4x speedup can be fairly traded for additional verification time: if the saved latency is spent on proofreading or reranking, Diffusion Gemma might still be competitive under an equal-time budget. Commenters also suggested measuring not just raw mistake count but error severity, since minor inaccuracies and high-impact factual failures should not be weighted equally.
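The headline ratios can be recomputed directly from the figures OP reported, which also makes the equal-latency argument concrete. The sketch below uses only the numbers quoted in the post; the variable names are illustrative, and the "spare time" line simply assumes the diffusion model would be given the same wall-clock budget as the autoregressive run.

```python
def error_rate(correct: int, wrong: int) -> float:
    """Fraction of checked facts that were wrong."""
    return wrong / (correct + wrong)


# Figures as reported in the post (single H100, FP8, three prompts).
diffusion = {"tok_s": 763, "seconds": 3.7, "correct": 33, "wrong": 28}
autoregressive = {"tok_s": 218, "seconds": 15.1, "correct": 45, "wrong": 5}

throughput_speedup = diffusion["tok_s"] / autoregressive["tok_s"]
latency_speedup = autoregressive["seconds"] / diffusion["seconds"]
raw_mistake_ratio = diffusion["wrong"] / autoregressive["wrong"]
diffusion_error = error_rate(diffusion["correct"], diffusion["wrong"])
autoregressive_error = error_rate(autoregressive["correct"], autoregressive["wrong"])
error_rate_ratio = diffusion_error / autoregressive_error
# Time left for verification if both models get the same wall-clock budget.
spare_seconds = autoregressive["seconds"] - diffusion["seconds"]

if __name__ == "__main__":
    print(f"throughput speedup: {throughput_speedup:.2f}x")
    print(f"latency speedup:    {latency_speedup:.2f}x")
    print(f"raw mistake ratio:  {raw_mistake_ratio:.1f}x")
    print(f"error rates:        {diffusion_error:.1%} vs {autoregressive_error:.1%}")
    print(f"error-rate ratio:   {error_rate_ratio:.2f}x")
    print(f"spare seconds:      {spare_seconds:.1f}")
```

Two things fall out of this. The "6x more mistakes" in the title is the raw count ratio (28 vs 5); normalized by the number of facts checked, the error-rate gap is closer to 4.6x. And an equal-latency comparison would leave the diffusion model roughly 11 seconds per run for a verification pass, which is the experiment commenters asked for.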
3. Local Inference Acceleration and Quantized Builds
**Gemma 4 Quadruple Release, 12B, 12B QAT, 26B-A4B QAT and 31B QAT Uncensored Heretics!** (Activity: 768): LLMFan46 announced multiple “uncensored-heretic” Gemma 4 instruction-tuned releases on Hugging Face: `31B-it-qat-q4_0`, `26B-A4B-it-qat-q4_0`, `12B-it-qat-q4_0`, and `12B-it`. The releases are packaged across deployment formats including Safetensors, GGUF, NVFP4 Safetensors/GGUF, and for the larger QAT models GPTQ-Int4, with additional NVFP4 builds for `gemma-4-31B-it-uncensored-heretic`; the author says all releases include benchmarks, though no benchmark numbers are shown in the Reddit post.

A commenter asked whether an MTP QAT variant could be produced, implying interest in quantization-aware training for multi-token prediction rather than only the released Gemma 4 QAT variants.

Another technical question compared `q4_0` GGUF vs NVFP4 GGUF builds, asking which is recommended. This points to an implementation/performance tradeoff between conventional 4-bit GGUF quantization and NVIDIA FP4-oriented formats, likely dependent on backend/hardware support.
**EAGLE3 has landed in llama.cpp** (Activity: 320): `llama.cpp` merged PR #18039, adding EAGLE3 speculative decoding via the newer speculative decoding API while preserving compatibility with MTP. EAGLE3 is an encoder-decoder speculative method where the draft/helper model is conditioned on intermediate features from the target model rather than drafting independently, with reported inference speedups of roughly 2–3×, including >2× for Gemma4 with reasoning enabled and >3× with reasoning disabled; `Q4_K_M` quantization reportedly still preserves strong speedups. Commenters mainly framed EAGLE3 as another practical approach to mitigating the memory-bandwidth bottleneck in local inference, while asking for concrete comparisons against MTP in speed, VRAM usage, and model support such as Qwen3.6 27B.

Commenters focused on unanswered technical comparisons between EAGLE3 and MTP, specifically asking for **tokens/sec benchmarks**, VRAM overhead, and whether speculative decoding via EAGLE3 meaningfully helps break the usual **memory-bandwidth bottleneck** in `llama.cpp`.

There was specific concern about model compatibility, especially whether EAGLE3 can be used with Qwen3.6 27B; one commenter implied it may not currently be useful for Qwen3.6 users, suggesting support may depend on availability of compatible draft/head models or integration details.
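For context on why speculative decoding helps with the memory-bandwidth bottleneck, here is a toy sketch of the generic greedy draft-and-verify loop. It is explicitly not EAGLE3 and not the `llama.cpp` API: the draft here is an independent stand-in function rather than a head conditioned on target features, tokens are plain integers, and the "models" are hypothetical callables. What it does show is the core property: the output matches plain greedy decoding from the target, while the number of target verification passes drops whenever the draft guesses well.

```python
from typing import Callable, List, Tuple

NextToken = Callable[[List[int]], int]  # maps a context to its greedy next token


def greedy_decode(target: NextToken, prompt: List[int], max_new: int) -> List[int]:
    """Baseline: one target call per generated token."""
    tokens = list(prompt)
    for _ in range(max_new):
        tokens.append(target(tokens))
    return tokens


def speculative_decode(
    target: NextToken, draft: NextToken, prompt: List[int], max_new: int, k: int = 4
) -> Tuple[List[int], int]:
    """Greedy draft-and-verify. Returns (tokens, number of verification passes)."""
    tokens = list(prompt)
    target_passes = 0
    while len(tokens) - len(prompt) < max_new:
        # 1) The cheap draft proposes k tokens autoregressively.
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft(tokens + proposal))
        # 2) The target checks the proposal. A real engine scores all k
        #    positions in ONE batched forward pass; we count it as one pass.
        target_passes += 1
        accepted: List[int] = []
        for guess in proposal:
            expected = target(tokens + accepted)
            if guess == expected:
                accepted.append(guess)
            else:
                accepted.append(expected)  # replace the first wrong guess
                break
        else:
            # Every guess matched, so the same pass yields one bonus token.
            accepted.append(target(tokens + accepted))
        tokens.extend(accepted)
    return tokens[: len(prompt) + max_new], target_passes


if __name__ == "__main__":
    # Toy "models": the target counts upward mod 10; the draft errs after a 2.
    target = lambda ctx: (ctx[-1] + 1) % 10
    draft = lambda ctx: 0 if ctx[-1] == 2 else (ctx[-1] + 1) % 10
    out, passes = speculative_decode(target, draft, [0], max_new=8, k=4)
    print(out, passes)
    print(greedy_decode(target, [0], max_new=8))
```

Because each verification pass reads the target weights once but can emit several tokens, the speedup depends on how often the draft is accepted, which is why commenters asked for per-model benchmarks and draft-head availability.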
Less Technical AI Subreddit Recap
/r/Singularity, /r/Oobabooga, /r/MachineLearning, /r/OpenAI, /r/ClaudeAI, /r/StableDiffusion, /r/ChatGPT, /r/ChatGPTCoding, /r/aivideo
1. Fable 5 US Government Suspension
**US gov forces Anthropic to pull access to Fable 5** (Activity: 1404): The post links to an Anthropic notice about Fable/Mythos access and claims a U.S. government directive forced Anthropic to pull access to Fable 5. The excerpt provides no model-card details, benchmarks, eval results, or implementation specifics beyond the reported access-control/policy change. Commenters were broadly negative, with one saying they upgraded specifically for more Fable access and another noting the directive arrived late Friday. The only technical concern raised was speculation that the government may fear Fable 5 could help identify or patch zero-days that U.S. agencies exploit.

One technically relevant concern raised is that removal of access to Anthropic’s “Fable 5” could be motivated by cybersecurity considerations: a commenter speculates the model may help identify or remediate zero-day vulnerabilities that the US government would prefer remain undisclosed. This frames the access restriction as potentially affecting vulnerability discovery workflows rather than merely consumer model availability.

Several comments interpret the action as a precedent for direct government control over frontier-model deployment, especially if a model is perceived as outperforming competitors or creating national-security risk. The practical technical impact noted is abrupt loss of access for users who upgraded plans specifically for higher usage of the model, highlighting reliability and dependency risks when building workflows around hosted frontier models.
**Fable 5 indefinitely suspended due to national security concerns** (Activity: 1082): The image is a screenshot of a dark-mode post attributed to “ClaudeDevs” claiming Anthropic has indefinitely suspended access to a model called `Claude Fable 5` due to a U.S. government directive and “national security concerns.” Technically, the claimed impact is model-routing/API availability: new sessions would fall back to other Claude models such as Opus 4.8, while existing Fable 5 sessions and platform API requests would return errors; however, the Reddit context provides no independent verification beyond the linked Anthropic-looking URL and screenshot, so it should be treated as an unverified announcement image rather than confirmed technical documentation. Comments are mostly outrage from users who say they recently paid for higher-tier access, e.g. “MFERS WHO JUST PAID 200$,” and confusion over why there is not more backlash. One linked comment image appears to be a meme/reaction rather than a technical contribution.

**Megathread for US government suspension of Fable and Mythos** (Activity: 1387): The subreddit opened a stickied megathread consolidating discussion around a reported US government suspension of Fable and Mythos. The post itself provides no technical details on the suspension mechanism, affected services/models, compliance basis, timelines, benchmarks, or implementation impact. Top comments frame the suspension as possible regulatory capture or anti-innovation intervention, with one user joking *“I see you haven’t bribed us yet”* and another asking whether the government is effectively saying *“stop being so good or we will nationalize you.”* One commenter also notes they had just bought a $250 “Max 20x Usage” plan to heavily use “Fable 5,” implying immediate user-facing disruption.

A user reported a concrete service-impact case: they had just purchased a $250 “Max 20x Usage” plan specifically to use Fable 5, implying the suspension immediately affects paid high-usage access rather than only free-tier experimentation. Another commenter framed the broader technical/operational risk as dependency on US-hosted AI services, arguing that non-US users or organizations may not be able to rely on uninterrupted access if government action can suspend models such as Fable and Mythos.
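The dependency risk described in these threads is usually handled at the application layer with an ordered fallback chain: try the preferred hosted model, and if it reports that it is unavailable, move on to the next backend. The sketch below shows that pattern in a vendor-neutral way. Everything in it is hypothetical: the `ModelUnavailable` exception, the backend names, and the stub functions stand in for real client calls and do not correspond to any provider's actual SDK or error types.

```python
from typing import Callable, List, Tuple


class ModelUnavailable(Exception):
    """Hypothetical error a backend wrapper raises when a model is suspended or down."""


Backend = Tuple[str, Callable[[str], str]]  # (label, function from prompt to completion)


def complete_with_fallback(prompt: str, backends: List[Backend]) -> Tuple[str, str]:
    """Try each backend in order; return (label, completion) from the first that works."""
    errors = []
    for name, call in backends:
        try:
            return name, call(prompt)
        except ModelUnavailable as exc:
            errors.append(f"{name}: {exc}")  # remember why this backend was skipped
    raise RuntimeError("all backends unavailable: " + "; ".join(errors))


if __name__ == "__main__":
    def suspended_frontier_model(prompt: str) -> str:  # stub for a revoked hosted model
        raise ModelUnavailable("access suspended")

    def self_hosted_model(prompt: str) -> str:  # stub for an open-weight deployment
        return f"(self-hosted) echo: {prompt}"

    print(complete_with_fallback(
        "hello",
        [("frontier", suspended_frontier_model), ("self-hosted", self_hosted_model)],
    ))
```

A chain like this only protects availability; prompts, tool schemas, and evals still need to be validated against each fallback model, since behavior will differ between them.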
2. Fable 5 Coding and Reverse-Engineering Breakthroughs
**Fable 5 decoded an entire 1989 DOS game executable in one day - six months of work with earlier models, done overnight** (Activity: 1144): A developer remastering Midwinter claims Fable 5/Claude reverse-engineered the original 1989 DOS executable overnight, producing a labeled map of 602 functions covering terrain generation, vehicle physics, AI, win/loss logic, graphics formats, and audio; the terrain generator was reimplemented in Python with bit-for-bit matching output. The workflow reportedly used parallel agents over a disassembly with an evidence ledger, and the resulting decode/tools are published under MIT at `midwinter-decode`, with a playable/project write-up at the project site and an asset extractor for ~600 sprites with CGA/EGA/VGA palettes. Commenters were impressed but raised two technical caveats: whether prior six months of accumulated project knowledge and the switch from Rust/Bevy to Unreal MCP made comparisons against earlier models unfair, and whether automated reconstruction of another commercial DOS game like Star Command should trigger IP/copyright guardrails.

A commenter questioned the benchmark validity of the claimed speedup, noting possible self-bias / learning contamination: after 6 months of prior reverse-engineering work, both the author and possibly Claude may benefit from accumulated domain knowledge rather than starting from an equivalent baseline. They also flagged the addition of Unreal MCP as a major tooling confounder, making the comparison against earlier models less fair unless each model is tested from a clean start with the same tools.

One technically interesting thread extrapolated the workflow to retrocomputing development: using Claude Code with a physical 1989 Macintosh, SCSI link, or **Apple IIe** to generate software for machines that were historically difficult to program. The commenter highlighted that even 1980s systems could execute around 1 million instructions/sec, but fully exploiting them often required expert low-level assembly optimization, citing the RollerCoaster Tycoon author’s raw assembly approach as an example.

Another commenter raised an applied reverse-engineering use case: porting older RPGs such as Might and Magic III into a later-series engine. The implication is that if model-assisted executable decoding can recover enough game logic and data structures from DOS-era binaries, engine migration and modernization of legacy games becomes more feasible.
**I vibe coded the first MMORPG with Fable 5** (Activity: 2724): A developer claims to have “vibe coded” a browser-based MMORPG, World of ClaudeCraft, using Fable 5 over a couple of days, with the full source released on GitHub and a playable build at worldofclaudecraft.com. The game appears to be a Minecraft/RPG-like multiplayer web app with server-persisted online characters, an offline single-player mode without saves, WASD/mouse controls, targeting/abilities, quests, inventory, chat, map, loot, and RPG panels. Top commenters were surprised by the speed and polish, with one suggesting it could be *“guerilla marketing by Anthropic”* and another proposing a direct comparison by giving the same tasks to Claude Opus. One commenter specifically noted it seemed *“miles better”* than other vibe-coded games and asked whether the assets were AI-generated or sourced elsewhere.

A commenter suggested using the same MMORPG-building prompt/tasks with Claude Opus as a control to compare against Fable 5, focusing on whether the models produce similar game functionality and implementation quality under identical constraints.

There was technical skepticism about extrapolating from a rapid prototype: one commenter noted that “vibe coded” progress over a few days likely does not scale linearly and can become expensive quickly as complexity, debugging, and iteration costs grow.

A thread questioned asset provenance (whether Fable 5 generated assets or sourced them externally), with one reply indicating the visuals were screenshots from the GitHub project, implying the demo may rely on existing project assets rather than fully generated ones.
**I gave Claude Code a “lazy senior dev” mode and it writes like 6x less code** (Activity: 1680): A new MIT-licensed Claude Code plugin, Ponytail (GitHub), adds a “lazy senior dev” coding mode that forces an agent through a minimization checklist: avoid new code if stdlib/native features/existing deps/one-liners suffice. In the author’s 5-task benchmark, it reportedly used ~16% fewer tokens, ran ~4x faster, and reduced generated code from 293 LOC to 47 LOC; one example dropped a 190-line countdown “dashboard” to 13 lines. It auto-activates in Claude Code with a statusline badge and also ships rule files for Cursor, Windsurf, Cline, Copilot, and Aider. Commenters generally liked the reduction in verbose, hard-to-review agent output, but one technical caveat noted that minimal email validation can be context-dependent: a check suitable before sending mail may be insufficient if invalid addresses are persisted to a database.

Commenters raised a correctness issue with replacing robust email validation with a minimal check like `"@" in email`: it may be acceptable only if the next step is actually sending a confirmation email, but otherwise it can persist invalid addresses and create a data-quality bug. Another commenter explicitly called that validation approach “trash code,” highlighting that reduced code size can trade off against input-validation correctness.
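The commenters' point about context-dependent validation can be illustrated with two checks of different strictness. The first is the minimal check quoted in the thread; the second is a slightly stricter shape test of the kind one might want before persisting an address. Both function names and the regular expression are assumptions made for this sketch, not code from the Ponytail plugin, and neither is full RFC 5322 validation (the stricter one also rejects dot-less hosts such as `user@localhost`).

```python
import re

# Hypothetical shape check: one "@", non-empty local part, and a dot in the domain.
_EMAIL_SHAPE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def ok_before_confirmation_email(email: str) -> bool:
    """Minimal check from the thread; a bounce or unconfirmed link catches the rest."""
    return "@" in email


def ok_before_persisting(email: str) -> bool:
    """Stricter shape check for when nothing downstream will verify the address."""
    return bool(_EMAIL_SHAPE.fullmatch(email.strip()))


if __name__ == "__main__":
    for candidate in ["user@example.com", "a@", "@@", "no-at-sign", "a b@example.com"]:
        print(
            f"{candidate!r}: minimal={ok_before_confirmation_email(candidate)}, "
            f"stricter={ok_before_persisting(candidate)}"
        )
```

The extra few lines are the trade the thread is arguing about: the one-liner is fine when a confirmation email acts as the real validator, and too weak when the value goes straight into a database.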
3. Claude Subscription Unit Economics
**For every $200 subscription, Anthropic throws in another $7,800.** (Activity: 1143): The image is a dark-themed pricing comparison claiming Anthropic Claude Max 20x at $200/mo has a “max possible spend” of about $8,000/mo, while OpenAI ChatGPT Pro/Codex 20x at $200/mo could imply up to $14,000/mo in retail-equivalent usage. The post frames this as evidence of heavy subscription subsidization and possible unsustainable AI pricing, but the table appears to compare subscription fees against API retail token prices, not Anthropic/OpenAI’s actual marginal inference costs. Commenters pushed back that “max possible spend” is only an upper bound and that fee ≠ cost: API token prices are retail prices, not provider cost. Several argued most subscribers never hit limits, so high-usage users are subsidized by lower-usage users rather than every $200 user costing Anthropic $8,000.

Several commenters pushed back on the headline’s calculation, arguing it conflates API list price with Anthropic’s internal inference cost. They noted that the $7,800/$13,800 figures represent a theoretical API-equivalent maximum if a user saturated subscription limits continuously, not the marginal cost Anthropic actually incurs; *“Fee ≠ cost”* was the core technical objection.

A recurring technical point was that subscription limits are designed around statistical oversubscription: most users on Max/Pro tiers do not hit caps continuously, so the relevant cost is expected utilization, not worst-case token throughput. One user reported downgrading from a 20x Max plan to 5x without hitting limits, using this as evidence that light users subsidize heavier users within the pricing model.

Commenters also highlighted that API pricing includes margin and product-level pricing strategy, not raw compute cost. References to cache and batch discounts were used as evidence that the API price has substantial markup, making it invalid to infer Anthropic’s per-user subsidy directly from retail token rates.
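The commenters' objection is easy to express as arithmetic: the headline gap is a worst-case figure, while the number that matters is expected usage across all subscribers. The sketch below reproduces the headline gaps from the quoted prices and then computes an expected API-equivalent spend for a usage mix. The usage mix is entirely made up for illustration (no real utilization data appears in the thread), and even the expected figure is still priced at retail API rates, so it is an upper bound on cost rather than an estimate of it.

```python
import math
from typing import List, Tuple


def headline_gap(monthly_fee: float, max_api_equivalent: float) -> float:
    """Worst-case retail-equivalent usage minus the subscription fee."""
    return max_api_equivalent - monthly_fee


def expected_api_equivalent(
    max_api_equivalent: float, usage_mix: List[Tuple[float, float]]
) -> float:
    """Average retail-equivalent spend per subscriber.

    usage_mix: (share of subscribers, fraction of the cap they actually use).
    """
    total_share = sum(share for share, _ in usage_mix)
    if not math.isclose(total_share, 1.0):
        raise ValueError("subscriber shares must sum to 1")
    return sum(share * used * max_api_equivalent for share, used in usage_mix)


if __name__ == "__main__":
    print(headline_gap(200, 8_000))   # the "$7,800" in the post title
    print(headline_gap(200, 14_000))  # the "$13,800" figure for the OpenAI plan

    # HYPOTHETICAL mix: 70% use 5% of the cap, 25% use 25%, 5% saturate it.
    mix = [(0.70, 0.05), (0.25, 0.25), (0.05, 1.00)]
    print(round(expected_api_equivalent(8_000, mix), 2))
```

Under that invented mix the average subscriber consumes about $1,180 of retail-priced usage, far below the $8,000 ceiling, and the provider's actual cost would be lower again once API margin and cache or batch discounts are accounted for.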
AI Discords
Unfortunately, Discord shut down our access today. We will not bring it back in this form but we will be shipping the new AINews soon. Thanks for reading to here, it was a good run.