# [AINews] Microsoft Build: MAI-Thinking-1 and MAI Family models

> Source: <https://www.latent.space/p/ainews-microsoft-build-mai-thinking>
> Published: 2026-06-03 05:49:02+00:00

# [AINews] Microsoft Build: MAI-Thinking-1 and MAI Family models

### Microsoft Build recap, and new MAI model technical details

Today was a big day, not least because we caught up on [the state of GitHub vs Agents](https://www.latent.space/p/github), and recorded a [special pod with No Priors and Satya Nadella](https://x.com/TheTuringPost/status/2061901518522188251?s=20) — at MS Build, Satya and Mustafa announced 7 new MAI models:

This is an impressive lineup, especially considering that the [Microsoft-Inflection deal that set up MAI ](https://news.smol.ai/issues/24-03-20-ainews-shipping-and-dipping-inflection-stability-edition)only happened 2 years ago, and that these are all from-scratch pretrains. MAI today is by no means an unqualified frontier lab, but it is a good tier 2 neolab with obvious incentives to support domain specific finetunes (as opposed to [the frontier labs who have ~all killed finetuning](https://www.latent.space/p/ainews-the-end-of-finetuning)).

The star of the show was the [100+ page MAI tech report](https://microsoft.ai/wp-content/uploads/2026/06/main_20260602_2.pdf), which the research community is giving glowing reviews:

You can catch up on all the rest of the announcement in the excellent Verge recap, and the tweet summaries below:

AI News for 06/1/2026-6/2/2026. We checked 12 subreddits,

[544 Twitters]and no further Discords.[AINews’ website]lets you search all past issues. As a reminder,[AINews is now a section of Latent Space]. You can[opt in/out]of email frequencies!

**AI Twitter Recap**

**Top Story: Microsoft Build recap, and new MAI model technical details**

**What happened**

**Microsoft used Build to position itself as both an AI platform company and a frontier-model lab, pairing broad product launches with unusually detailed disclosures about its new MAI model family.**

Microsoft AI announced

**seven new MAI models** spanning reasoning, code, image, speech transcription, and voice, led by**MAI-Thinking-1**,** MAI-Code-1-Flash**,** MAI-Image-2.5**,** MAI-Transcribe-1.5**, and** MAI-Voice-2**according to[@MicrosoftAI](https://x.com/MicrosoftAI/status/2061887500541366489)and[@mustafasuleyman](https://x.com/mustafasuleyman/status/2061880164498428188)The flagship reasoning model

**MAI-Thinking-1** was presented as Microsoft’s**first reasoning model**, built with** clean data lineage**and** zero distillation from third-party models**in posts from[@mustafasuleyman](https://x.com/mustafasuleyman/status/2061880164498428188),[@baseten](https://x.com/baseten/status/2061878701823066431),[@tuhinone](https://x.com/tuhinone/status/2061879239817969756), and[@HannaHajishirzi](https://x.com/HannaHajishirzi/status/2061901432627044430)Microsoft released a

**109-page technical report** for MAI-Thinking-1, which drew strong positive reactions from technically oriented readers for its level of transparency, including[@eliebakouch](https://x.com/eliebakouch/status/2061877335960281459),[@ethanCaballero](https://x.com/ethanCaballero/status/2061920873297088723),[@nrehiew_](https://x.com/nrehiew_/status/2062013300196700395),[@yacinelearning](https://x.com/yacinelearning/status/2061914159235617056), and[@stochasticchasm](https://x.com/stochasticchasm/status/2061916808626815161)Microsoft also emphasized

**local AI and agent-native Windows**: Build messaging highlighted** secure execution layers for agents**, a new** Surface RTX Spark Dev Box**, Windows AI access to the broader Windows GPU install base, and concept hardware such as** Project Solara/Scout**, summarized by[@yusuf_i_mehdi](https://x.com/yusuf_i_mehdi/status/2061882543641907528),[@TheTuringPost](https://x.com/TheTuringPost/status/2061865165734506683),[@kimmonismus](https://x.com/kimmonismus/status/2061860319547527191), and[@kimmonismus](https://x.com/kimmonismus/status/2061875714933371220)Build also included a major

**GitHub Copilot app** push as the “desktop home for agent-native software development,” with**canvases**, cross-device continuity, and tighter GitHub agent workflows, from[@pierceboggan](https://x.com/pierceboggan/status/2061868635241828688),[@lukehoban](https://x.com/lukehoban/status/2061905434039246939), and reactions from[@techgirl1908](https://x.com/techgirl1908/status/2061870470237164018)Microsoft introduced

**Web IQ**, a new grounding/search API stack for AI agents, claiming the APIs already power “nearly all AI agents and chatbots in the industry today, including Copilot and ChatGPT,” via[@JordiRib1](https://x.com/JordiRib1/status/2061866606670581871)Satya Nadella framed Build as an ecosystem moment rather than a single-product launch, while Mustafa Suleyman framed it as the output of Microsoft’s internal “hill-climbing machine,” in

[@satyanadella](https://x.com/satyanadella/status/2061896503304806521),[@mustafasuleyman](https://x.com/mustafasuleyman/status/2061934667096596657), and reaction from[@nrehiew_](https://x.com/nrehiew_/status/2061983583523475556)

**MAI model family: disclosed facts and technical details**

**MAI-Thinking-1**

Microsoft described

**MAI-Thinking-1** as a**35B active parameter MoE** with a**256K context window** in[@mustafasuleyman](https://x.com/mustafasuleyman/status/2061880164498428188)A separate summary from

[@scaling01](https://x.com/scaling01/status/2061889624847343825)says the model is a**1T@35B parameter model**,** pre-trained on 30T tokens**, and trained using** 8192 GB200 GPUs**; this appears to be a reading of the technical report rather than Microsoft marketing copy[@kimmonismus](https://x.com/kimmonismus/status/2061877528781025381)similarly summarized it as a**mid-size MoE with 45B active params**, but this conflicts with Mustafa’s own** 35B active**figure; the more authoritative figure in the tweet set is the official** 35B active**numberMicrosoft claims

**97% on AIME 2025** and**53% on SWE-Bench Pro**, with blind human raters on Surge preferring it overall to** Sonnet 4.6**, from[@mustafasuleyman](https://x.com/mustafasuleyman/status/2061880164498428188)and[@asadovsky](https://x.com/asadovsky/status/2062008312603070891)Microsoft says the model is

**optimized on MAIA 200**, with** 30% better performance per dollar**and** 1.4x performance-per-watt gain**versus** GB200**when running MAI models end-to-end, per[@mustafasuleyman](https://x.com/mustafasuleyman/status/2061880164498428188)Microsoft and partners repeatedly stressed

**no third-party distillation**, “clean data lineage,” and enterprise-controlled fine-tuning with “100% eyes-off” post-training data through Baseten, in[@baseten](https://x.com/baseten/status/2061878701823066431),[@tuhinone](https://x.com/tuhinone/status/2061879239817969756), and[@MicrosoftAI](https://x.com/MicrosoftAI/status/2061923309344756043)

**MAI-Code-1-Flash**

Microsoft introduced

**MAI-Code-1-Flash** as a fast coding model for**VS Code** and**GitHub Copilot CLI**, first announced by[@pierceboggan](https://x.com/pierceboggan/status/2061877165810131297)and later highlighted by[@mariorod1](https://x.com/mariorod1/status/2061914993550143513)Official Microsoft messaging via

[@mustafasuleyman](https://x.com/mustafasuleyman/status/2061880164498428188)says**Code-1-Flash achieves 51% on SWE-Bench Pro despite having just 5B parameters**, positioning it near Haiku-class size/costA competing summary from

[@scaling01](https://x.com/scaling01/status/2061891478176112794)describes it as a**137B parameter MoE**,** 256K context**, trained on** 10T+ tokens**, and “stronger and more efficient than Claude 4.5 Haiku.” That likely indicates** 5B active parameters**rather than total parameters; the tweets do not fully reconcile this distinction, but together imply** small active footprint within a much larger MoE**Availability at launch was highlighted as

**GitHub Copilot / VS Code-first**, per[@scaling01](https://x.com/scaling01/status/2061891478176112794)and[@mariorod1](https://x.com/mariorod1/status/2061914993550143513)

**MAI-Image-2.5**

Microsoft launched

**MAI-Image-2.5** and a**Flash** variant, claiming both reached**#2 on leaderboards**, with[@mustafasuleyman](https://x.com/mustafasuleyman/status/2061880164498428188)saying they surpass** Nano Banana 2**on image editingIndependent leaderboard accounts supported the high ranking:

[@arena](https://x.com/arena/status/2061887242579382660)reported**#2 in Image Edit Arena** with**score 1401**,**+10 points over Nano Banana 2**, Grok Imagine, and ChatGPT Image Latest HF[@arena](https://x.com/arena/status/2061894541888962712)further said MAI-Image-2.5 “advances the Pareto frontier,” meaning no model at its price tier scores higher on that benchmarkDistribution partners quickly followed, including

[@OpenRouter](https://x.com/OpenRouter/status/2061894672847671724)and[@fal](https://x.com/fal/status/2061920052664820199)

**MAI-Transcribe-1.5**

[@ArtificialAnlys](https://x.com/ArtificialAnlys/status/2061878491860324402)reported**MAI-Transcribe-1.5** as an unusually strong speed/accuracy point on the STT frontier:**~276x realtime**,** 2.4% AA-WER**,**#3 overall** on its leaderboardThe model supports

**43 languages**, including English, French, Arabic, Japanese, and Chinese, and supports** keyword biasing**for rarer terms such as names and medical terminology, per[@ArtificialAnlys](https://x.com/ArtificialAnlys/status/2061878491860324402)Pricing was reported as

**$6 per 1,000 minutes of audio** via Microsoft Foundry in[@ArtificialAnlys](https://x.com/ArtificialAnlys/status/2061878498609053909)OpenRouter also listed the model among the three MAI launches it brought live the same day in

[@OpenRouter](https://x.com/OpenRouter/status/2061894672847671724)

**MAI-Voice-2**

MAI-Voice-2 appears in Microsoft’s “seven models” umbrella and in OpenRouter’s availability post at

[@OpenRouter](https://x.com/OpenRouter/status/2061894672847671724)The tweet set contains little technical detail on Voice-2 itself beyond launch/availability

**Technical-report details that mattered to researchers**

**Why the report stood out**

The dominant technical reaction was that Microsoft released an unusually detailed frontier-model report:

[@eliebakouch](https://x.com/eliebakouch/status/2061965825037254947)called it “one of the most transparent for a model at this scale,”[@nrehiew_](https://x.com/nrehiew_/status/2062023547690828141)said it “could really serve as an updated textbook for LLM training today,” and[@stochasticchasm](https://x.com/stochasticchasm/status/2061879506139557979)called it a “gold mine”Multiple readers highlighted that the report disclosed

**pipeline details, scaling ladder methodology, data curation, infra metrics, and MFU numbers**; this level of specificity is what drew praise from[@ethanCaballero](https://x.com/ethanCaballero/status/2061920873297088723),[@eliebakouch](https://x.com/eliebakouch/status/2062004670017486912), and[@nrehiew_](https://x.com/nrehiew_/status/2062013300196700395)

**Pretraining and data**

A major technical claim repeated across commentary is that MAI-Thinking-1 used

**no synthetic data** and**no distillation**, not only in post-training but throughout the disclosed pipeline, from[@eliebakouch](https://x.com/eliebakouch/status/2061965825037254947),[@stochasticchasm](https://x.com/stochasticchasm/status/2061967095022366924), and[@HannaHajishirzi](https://x.com/HannaHajishirzi/status/2061901432627044430)[@eliebakouch](https://x.com/eliebakouch/status/2061977834558804207)says the report explicitly notes data from**Common Crawl plus private sources**, with** targeted sub-pipelines for different domains**, heavy extraction/dedup work, and an intentional choice of** no synthetic data**The report’s internal

**private NLL set** used for scaling decisions was summarized by[@eliebakouch](https://x.com/eliebakouch/status/2061976608265880004)as:**50% code****17.5% STEM****17.5% math****10% general knowledge****5% multilingual**

[@eliebakouch](https://x.com/eliebakouch/status/2061976230933496176)says architecture promotion in the scaling ladder was based on an**Efficiency Gain (EG)** metric: how much extra compute the baseline would need to match the candidate’s lossThe same thread notes ablations at roughly

**100/200 tokens per parameter**, described as around “Chinchilla optimal” for the setup, while also remarking this differs from dense-model heuristics due to MoE structure in[@eliebakouch](https://x.com/eliebakouch/status/2061975730414633043)

**Post-training / RL**

The most discussed technical choice was that Microsoft appears to have started RL from a checkpoint with

**no prior reasoning exposure**, which several readers found notable.[@stochasticchasm](https://x.com/stochasticchasm/status/2061879070141677615)called this a “very interesting decision,” while[@stochasticchasm](https://x.com/stochasticchasm/status/2061878066314645861)reacted to graphs suggesting a jump from**<20% AIME25 to >95%**[@HannaHajishirzi](https://x.com/HannaHajishirzi/status/2061901432627044430)described the “climbing from scratch” recipe as**simple recipes, rigorous science, self-distillation, patience, and great infra**[@soldni](https://x.com/soldni/status/2061882085573616003)characterized the process as “climbing with no distillation, like the big boys do”Some independent readers inferred from the report that

**synth data remains very valuable** for agentic performance in the broader field, even if Microsoft deliberately avoided it here; see[@stochasticchasm](https://x.com/stochasticchasm/status/2061961874879783376)

**Data curation / judges / DSPy GEPA**

A detail that got substantial attention from the DSPy/late-interaction crowd: Microsoft reportedly used

**GEPA / DSPy-optimized LLM judges** in pretraining data curation and quality scoringThis was highlighted by

[@bj2rn](https://x.com/bj2rn/status/2061941109828301241),[@LakshyAAAgrawal](https://x.com/LakshyAAAgrawal/status/2062013650639241403), and[@lateinteraction](https://x.com/lateinteraction/status/2062015109132873852)

**Infra / utilization / hardware co-design**

Microsoft reportedly disclosed

**exact MFU across iterations**, which multiple readers said is rarely shared at this scale, per[@eliebakouch](https://x.com/eliebakouch/status/2061965825037254947)[@scaling01](https://x.com/scaling01/status/2061889624847343825)summarized the run as using**8192 GB200 GPUs**[@eliebakouch](https://x.com/eliebakouch/status/2062004120098144764)singled out a reported**~40% higher throughput per watt**-type figure as “pretty impressive and bullish on microsoft chips,” though this may refer to rack-level budget or serving configuration and was not fully unpacked in-tweetMicrosoft’s official framing connected model design to

**MAIA 200** custom silicon and emphasized better**performance-per-dollar** and**performance-per-watt** vs NVIDIA GB200 in[@mustafasuleyman](https://x.com/mustafasuleyman/status/2061880164498428188)Build’s broader Windows/local-AI narrative also centered on hardware specifics such as:

**1 trillion parameters running locally on DGX Station****128GB unified memory****110 TOPS AI performance****20 CPU cores****70+ PowerToys utilities** from[@TheTuringPost](https://x.com/TheTuringPost/status/2061852480636653924)

Reactions also pointed to local runs of large models, e.g.

[@kimmonismus](https://x.com/kimmonismus/status/2061852979318427988)on**RTX Spark running a 120B parameter model locally**

**Build product/platform recap beyond the models**

**GitHub Copilot app and agent-native development**

GitHub unveiled the

**GitHub Copilot app**, pitched as a desktop surface for** agent-native software development**by[@pierceboggan](https://x.com/pierceboggan/status/2061868635241828688)Key themes included:

**canvases** for bidirectional work between users and agents, per[@Techmeme](https://x.com/Techmeme/status/2061875738694062419)continuity across

**CLI, mobile, web, local, and cloud**, per[@lukehoban](https://x.com/lukehoban/status/2061905448287322243)a growing role for GitHub as the center of agent workflows, reflected in

[@techgirl1908](https://x.com/techgirl1908/status/2061870470237164018)and[@OrenMe](https://x.com/OrenMe/status/2061873010664001605)

Copilot CLI also got an experimental

**terminal UI with tabs, built-in feedback/rubber duck, prompt scheduling, and voice input**, per[@GHchangelog](https://x.com/GHchangelog/status/2061870684876272123)

**Windows as an agent runtime**

Microsoft’s Windows org framed Build around “faster developer execution, a secure execution layer for agents, and unmetered intelligence that runs locally on device,” per

[@yusuf_i_mehdi](https://x.com/yusuf_i_mehdi/status/2061882543641907528)Several posts stressed that Microsoft wants

**Windows** to be the trusted execution platform for agents, not just Azure[@TheTuringPost](https://x.com/TheTuringPost/status/2061865165734506683)described**Project Solara** as a platform for**agent-first devices**, with concepts including:a

**desktop AI companion** a

**wearable badge** with cameras, microphones, sensors, and secure authentication

[@kimmonismus](https://x.com/kimmonismus/status/2061860319547527191)saw these as handheld/desktop devices for controlling agents and compared them to expectations people had for standalone OpenAI hardware[@kimmonismus](https://x.com/kimmonismus/status/2061875714933371220)separately highlighted**Microsoft Scout** as an “always-on personal agent for work”

**Web IQ and search for agents**

[@JordiRib1](https://x.com/JordiRib1/status/2061866606670581871)announced**Microsoft Web IQ** as a suite of**AI-native grounding APIs** for**web pages, news, images, and videos** His framing is important context: classic search engines were built for humans, but Microsoft believes future search demand will come from agents, potentially

**1000x more queries** than human search trafficHe claimed Web IQ was re-architected from Bing’s stack for

**quality, latency, and token efficiency**, and that it already powers major chatbots including** Copilot and ChatGPT**

**Foundry and open-model distribution**

[@jeffboudier](https://x.com/jeffboudier/status/2061868927207244277)said Satya cited**11,000+ models available in Microsoft Foundry**, of which** 10,928**come from Hugging FaceThis supports Microsoft’s parallel identity at Build: both a first-party model builder and a large multi-model hosting/distribution platform

**Build messaging around datacenters and compute**

Several observers noted Build discussion around

**data center expansion**, community backlash, and Microsoft’s argument that AI infra can expand without raising electricity costs to local communities; see[@kimmonismus](https://x.com/kimmonismus/status/2061854806395015316)and[@kimmonismus](https://x.com/kimmonismus/status/2061903253890330639)[@scaling01](https://x.com/scaling01/status/2061901702324695115)highlighted Mustafa saying AI compute will grow**1000x in the next 3 years**, taking today’s rough** 5e27 FLOPs**frontier scale to** 5e30 FLOPs by 2029**[@mustafasuleyman](https://x.com/mustafasuleyman/status/2061880029315764256)summarized the company’s philosophical theme as**“Humanist superintelligence”**

**Facts vs. opinions**

**Factual claims in the tweet set**

Microsoft launched

**seven new MAI models** at Build:[@MicrosoftAI](https://x.com/MicrosoftAI/status/2061887500541366489)Official metrics for MAI-Thinking-1:

**35B active MoE**,** 256K context**,** 97% AIME 2025**,** 53% SWE-Bench Pro**, and blind human preference over Sonnet 4.6:[@mustafasuleyman](https://x.com/mustafasuleyman/status/2061880164498428188)Official metrics for MAI-Code-1-Flash:

**51% SWE-Bench Pro**,** 5B parameters**as stated in tweet copy:[@mustafasuleyman](https://x.com/mustafasuleyman/status/2061880164498428188)MAI-Image-2.5 ranking claims were independently echoed by

[@arena](https://x.com/arena/status/2061887242579382660)MAI-Transcribe-1.5 speed/accuracy details came from independent benchmark account

[@ArtificialAnlys](https://x.com/ArtificialAnlys/status/2061878491860324402)Microsoft released a

**109-page technical report**:[@eliebakouch](https://x.com/eliebakouch/status/2061877335960281459)

**Opinions / interpretations**

“Microsoft is training serious models now?” from

[@teortaxesTex](https://x.com/teortaxesTex/status/2061892492350407158)is an interpretive reaction to the model/report quality, not a standalone factClaims that the report is “one of the most transparent” or “an updated textbook” are opinions from

[@eliebakouch](https://x.com/eliebakouch/status/2061965825037254947)and[@nrehiew_](https://x.com/nrehiew_/status/2062023547690828141), albeit shared by many readers[@kimmonismus](https://x.com/kimmonismus/status/2061852480636653924)and[@TheTuringPost](https://x.com/TheTuringPost/status/2061865165734506683)framed Build as a strategic shift from cloud-only AI toward local reasoning/agents; that is analysis rather than official wordingPosts claiming Microsoft “leaked” Anthropic Mythos FLOPs, including

[@swyx](https://x.com/swyx/status/2061878629504881151)and[@scaling01](https://x.com/scaling01/status/2061897540161728791), are speculative interpretations of a slide, later contested by the same cluster of commenters

**Different opinions and perspectives**

**Supportive views**

Technical readers were broadly impressed by the

**report’s transparency** and Microsoft’s willingness to publish details usually withheld at this scale:[@eliebakouch](https://x.com/eliebakouch/status/2061965825037254947),[@nrehiew_](https://x.com/nrehiew_/status/2062023547690828141),[@ethanCaballero](https://x.com/ethanCaballero/status/2061920873297088723),[@stochasticchasm](https://x.com/stochasticchasm/status/2061916808626815161)Some saw MAI-Thinking-1 as proof Microsoft is becoming a genuine frontier lab rather than just a model reseller or application layer, e.g.

[@teortaxesTex](https://x.com/teortaxesTex/status/2061892492350407158),[@echen](https://x.com/echen/status/2061907282607100075),[@NandoDF](https://x.com/NandoDF/status/2061901884042985728)Enterprise/platform supporters liked the

**clean-data-lineage**,** fine-tunable**,** eyes-off post-training data**story, especially Baseten/Microsoft’s positioning around ownership and control:[@baseten](https://x.com/baseten/status/2061878701823066431),[@tuhinone](https://x.com/tuhinone/status/2061879239817969756)

**Neutral / analytical views**

Several posts focused on

**reading and unpacking the report** rather than cheering the launch, especially[@stochasticchasm](https://x.com/stochasticchasm/status/2061916808626815161),[@nrehiew_](https://x.com/nrehiew_/status/2062013300196700395), and[@eliebakouch](https://x.com/eliebakouch/status/2061965825037254947)Some commentators were careful on benchmark interpretation.

[@kimmonismus](https://x.com/kimmonismus/status/2061918020843557110)noted Microsoft appeared to compare to**Sonnet 4.6** generally, with**Opus-level comparability only on SWE Pro**[@iScienceLuvr](https://x.com/iScienceLuvr/status/2061926066453962952)specifically appreciated reporting on** health benchmarks**such as HealthBench Professional and MedXpertQA rather than only coding/math

**Skeptical / opposing views**

A subset questioned whether all numbers and comparisons were being interpreted correctly, especially around active params and external-model comparisons

The most visible skepticism concerned the apparent

**Mythos FLOP “leak”**.[@iScienceLuvr](https://x.com/iScienceLuvr/status/2061882397340393514)suggested it was probably just an estimate, not a leak;[@scaling01](https://x.com/scaling01/status/2061989029025853757)later argued the original**6.1e27 FLOP** figure was unrealistic and supplied a lower alternative estimate before posting a correction in[@scaling01](https://x.com/scaling01/status/2061990840138899674)There was also implicit skepticism in the field about whether

**zero synth / zero distillation** is the right long-term recipe for best agentic performance, as noted by readers emphasizing synth-data deltas elsewhere, e.g.[@stochasticchasm](https://x.com/stochasticchasm/status/2061961874879783376)

**Context: why this matters**

Build’s announcements matter because they suggest Microsoft is no longer content with being only:

Azure/OpenAI’s cloud host

GitHub’s developer surface

Copilot’s application shell

It is also trying to be a**first-party frontier model developer** with its own model family, silicon stack, and post-training platform

The

**clean lineage / no distillation** emphasis is strategically significant. It addresses enterprise concerns around IP provenance, future controllability, and dependence on external labsThe

**local AI** emphasis matters because Microsoft is tying AI strategy to Windows and device distribution, not just to Azure. Build messaging repeatedly pushed the idea that reasoning models, planners, and agents can increasingly run**on-device**, not only in the cloud:[@TheTuringPost](https://x.com/TheTuringPost/status/2061852480636653924),[@yusuf_i_mehdi](https://x.com/yusuf_i_mehdi/status/2061882543641907528)The

**109-page report** matters because frontier-model transparency has generally been shrinking, especially around data, infra, and training methodology. Multiple researchers explicitly noted the disclosure level is uncommon at this scale:[@eliebakouch](https://x.com/eliebakouch/status/2061965825037254947),[@nrehiew_](https://x.com/nrehiew_/status/2062023547690828141)The Build recap also showed Microsoft trying to integrate all layers of the stack:

**models**: MAI family** chips**: MAIA 200** cloud**: Azure + Foundry** OS**: Windows agent runtime** developer UX**: Copilot app / VS Code / CLI** retrieval/grounding**: Web IQ** hardware form factors**: Solara / Scout concepts

This combination is why several observers described the event less as a normal dev conference and more as a coordinated move toward an

**agent platform spanning cloud, edge, OS, and custom models**, e.g.[@satyanadella](https://x.com/satyanadella/status/2061896503304806521),[@mustafasuleyman](https://x.com/mustafasuleyman/status/2061934667096596657), and[@TheTuringPost](https://x.com/TheTuringPost/status/2061865165734506683)

**The “Mythos FLOPs leak” mini-story**

During/after Build, some users claimed a Microsoft slide inadvertently revealed training compute for Anthropic’s rumored

**Claude Mythos**, with[@swyx](https://x.com/swyx/status/2061878629504881151)asking if Mustafa had leaked the FLOP count[@scaling01](https://x.com/scaling01/status/2061897540161728791)estimated the slide implied**6.1e27 FLOPs** with a confidence interval based on pixel measurement, while[@kimmonismus](https://x.com/kimmonismus/status/2061908067034517853)noted that would be around**Gemini 3.1 Pro-scale** computeThat interpretation was subsequently challenged by

[@iScienceLuvr](https://x.com/iScienceLuvr/status/2061882397340393514), who argued it was probably an estimate, and then by[@scaling01](https://x.com/scaling01/status/2061989029025853757), who posted a lower-range model-based estimate of**3.37e26 to 1.46e27 FLOPs** and later said the original numbers were**bogus** in[@scaling01](https://x.com/scaling01/status/2061990840138899674)The episode is useful mostly as context: Build’s compute/scaling messaging was detailed enough that people started trying to infer competitor training budgets from presentation materials

**Developer tools, agents, and coding workflows**

OpenAI launched

**Sites in Codex**, letting teams turn ideas/docs/plans into deployed internal websites/apps with auth and dynamic data, first for business/enterprise users, in[@OpenAI](https://x.com/OpenAI/status/2061845949170045346),[@TheRohanVarma](https://x.com/TheRohanVarma/status/2061872164442403139), and[@gdb](https://x.com/gdb/status/2061988413105156128)OpenAI also expanded

**role-specific Codex plugins** across sales, data analytics, creative production, product design, and public equity workflows, with access to**62 apps and 110 skills**, from[@OpenAI](https://x.com/OpenAI/status/2061887650391625870)and[@OpenAIDevs](https://x.com/OpenAIDevs/status/2061888366791246071)GitHub’s

**Copilot app** and Microsoft’s Build push around agent-native software development were central to the day’s tooling news:[@pierceboggan](https://x.com/pierceboggan/status/2061868635241828688),[@lukehoban](https://x.com/lukehoban/status/2061905434039246939),[@GHchangelog](https://x.com/GHchangelog/status/2061870684876272123)Anthropic shipped a

**CLI for Claude Platform** and upgraded Claude Code’s`/fork`

to run a background agent with exact context + prompt cache, in[@ClaudeDevs](https://x.com/ClaudeDevs/status/2061877343078244459)and[@ClaudeDevs](https://x.com/ClaudeDevs/status/2061947411141169494)Nous launched

**Hermes Desktop**, a local/native desktop surface for Hermes agents, in[@NousResearch](https://x.com/NousResearch/status/2061843507417944552),[@Teknium](https://x.com/Teknium/status/2061844602735538266), and later Tailscale/Ollama integration notes from[@Teknium](https://x.com/Teknium/status/2061984430370267210)and[@ollama](https://x.com/ollama/status/2062011585355551231)Cognition launched

**Devin Desktop**, positioned as an agent-neutral desktop for managing local/cloud agents and handoff between local planning and cloud execution, in[@cognition](https://x.com/cognition/status/2061889596703551926),[@ScottWu46](https://x.com/ScottWu46/status/2061998361373532187), and[@russelljkaplan](https://x.com/russelljkaplan/status/2061920322325205007)

**Models, local inference, and routing**

H Company launched

**Holo 3.1**, a local computer-use model family based on Qwen-style architecture, with checkpoints from** 0.8B to 35B**and formats including** NVFP4, FP8, and Q4 GGUF**; a popular summary cited** 79.3% on AndroidWorld**for the 35B model in[@TeksEdge](https://x.com/TeksEdge/status/2061825310669332818), with launch tweet from[@hcompany_ai](https://x.com/hcompany_ai/status/2061815355341725925)Perplexity announced

**hybrid agentic inference** for Perplexity Computer, splitting work between**local models on-device** and frontier cloud models for privacy and token efficiency, in[@perplexity_ai](https://x.com/perplexity_ai/status/2061861293569765847)and[@AravSrinivas](https://x.com/AravSrinivas/status/2061875858542096520)OpenRouter data shared by

[@ttunguz](https://x.com/ttunguz/status/2061846636805177692)showed**open-weight models at 69.1% of token volume**, versus** 30.9%**for closed modelsCommentary around

**model routing** as a key future abstraction came from[@ClementDelangue](https://x.com/ClementDelangue/status/2061871024627482964),[@garrytan](https://x.com/garrytan/status/2061878212213572083),[@matanSF](https://x.com/matanSF/status/2061865185527074914), and the counterpoint from[@glennko](https://x.com/glennko/status/2061896887699964171), who argued enterprise production reliability makes generic routing harder than enthusiasts suggestLocal-AI UX improvements also appeared in Hugging Face’s

**hardware compatibility checks** and oMLX’s native macOS app release from[@m_newhaus](https://x.com/m_newhaus/status/2061824017510584630)and[@jundotkim](https://x.com/jundotkim/status/2061863850874634242)

**Research and evals**

Google DeepMind announced

**Co-Scientist**, a Gemini-based multi-agent hypothesis generation system for science, claiming collaborations that helped identify liver fibrosis targets, ALS approaches, and genetic leads for aging, in[@GoogleDeepMind](https://x.com/GoogleDeepMind/status/2061857539977842793),[@GoogleDeepMind](https://x.com/GoogleDeepMind/status/2061857550438392094), and[@GoogleDeepMind](https://x.com/GoogleDeepMind/status/2061857553076920643)The new

**Crafter / CraftEditor** work on editable scientific figure generation drew attention as a five-agent workflow for producing and refining figures plus raster-to-SVG conversion, in[@HuggingPapers](https://x.com/HuggingPapers/status/2061800325959324069),[@_akhaliq](https://x.com/_akhaliq/status/2061835314599993392), and[@TheTuringPost](https://x.com/TheTuringPost/status/2061883014410629400)Tilde Research introduced

**Wall Attention**, a RoPE-free attention method with diagonal forget gates, claiming training at** 4k**and generalization to** 200k+**tokens plus Triton kernels and strong decode throughput, in[@tilderesearch](https://x.com/tilderesearch/status/2061839600562409581)A robotics vision encoder claiming

**+22.5% real-world OOD success** by encoding dynamics-awareness rather than relying on static-image pretraining was posted by[@jbhuang0604](https://x.com/jbhuang0604/status/2061840469966090308)New evals/benchmarks of note:

**PaintBench** for precise image editing, where best model reached only**17.1%**, from[@itskaixu](https://x.com/itskaixu/status/2061827068170518956)** VSTAT**for video state tracking, arguing frontier MLLMs remain weak at tracking evolving world state, from[@PinzhiHuang](https://x.com/PinzhiHuang/status/2062004108249145442)and[@sainingxie](https://x.com/sainingxie/status/2062011403733512253)**Data Agent Benchmark** for enterprise data workflows, from[@sh_reya](https://x.com/sh_reya/status/2061984097531310378)

**Inference, infrastructure, and agent systems**

Harvey + LangChain shared work on

**cheap verifiers** for legal agents, showing**DeepSeek V4 Flash** could preserve**94–96% agreement** with Opus 4.7 while reducing cost**18x** in per-criterion mode and**~1000x** in batch mode; for**3,200 RL rollouts**, verification cost dropped from**$18,000 to $18**, in[@harvey](https://x.com/harvey/status/2061866491033899371),[@hwchase17](https://x.com/hwchase17/status/2061867746141356427), and[@nikogrupen](https://x.com/nikogrupen/status/2061866707988431039)W&B relaunched

**Weave** as agent-first observability with integrations across common harnesses and automated detection of failure modes, in[@wandb](https://x.com/wandb/status/2061894943203831996)and[@neutralino1](https://x.com/neutralino1/status/2061949197851742525)Prime-RL integrated

**Mooncake Store** with vLLM for cross-node prefix / KV cache reuse, pitched as key for agentic rollouts, in[@m_sirovatka](https://x.com/m_sirovatka/status/2061862853997465738)Together detailed serving optimizations for

**MiniMax-M3**, citing** 81–125% throughput improvements**via KV-block-major sparse attention, paged decode, optimized index scoring, and multimodal preprocessing, in[@togethercompute](https://x.com/togethercompute/status/2061895336486949109)MiniMax itself highlighted

**1M context**, native multimodality, desktop-computer operation, and MSA reducing attention’s share of decode time from**~30% to ~5%**, in[@MiniMax_AI](https://x.com/MiniMax_AI/status/2061944204604101020)

**Ecosystem, hardware, and industrial capacity**

Westmag emerged from stealth to build

**American robot actuators and drone motors**, with**$11M raised** led by a16z and participation from Founders Fund, Lux, NFDG, Menlo and others, in[@boxcardavid](https://x.com/boxcardavid/status/2061825303715123234),[@packyM](https://x.com/packyM/status/2061835223470330100), and[@oyhsu](https://x.com/oyhsu/status/2061837257531670864)PyTorch noted NVIDIA adoption of

**OpenMDW-1.1**, a permissive AI-model licensing framework, across four open-model families in[@PyTorch](https://x.com/PyTorch/status/2061840384817328604)Martin Scorsese publicly demonstrated narrow, preproduction use of

**FLUX** for storyboarding with Black Forest Labs, framed as exploratory and complementary to hand-drawn work rather than generative replacement, in[@robrombach](https://x.com/robrombach/status/2061804823352086681)and[@TheRundownAI](https://x.com/TheRundownAI/status/2061834880917357011)

**AI Reddit Recap**

**/r/LocalLlama + /r/localLLM Recap**

**1. NVIDIA Nemotron 3 Ultra and RTX Spark Specs**

## Keep reading with a 7-day free trial

Subscribe to Latent.Space to keep reading this post and get 7 days of free access to the full post archives.