# [AINews] Google I/O 2026: Gemini 3.5 Flash, Omni (NanoBanana for Video), Spark (background agents), and Antigravity 2.0

> Source: <https://www.latent.space/p/ainews-google-io-2026-gemini-35-flash>
> Published: 2026-05-20 03:34:17+00:00

### Google has been busy!

The [full keynote livestream](https://www.youtube.com/watch?v=wYSncx9zLIU&pp=ygUJZ29vZ2xlIGlv) ran 2 hours, but as usual The Verge has the best supercut, trimmed down to 30 minutes, which is well worth watching to get a narrative sense of the event.

The mainline Gemini 3.5 Flash is GA today (very nice compared to some staged rollouts) and is sold as a decent step up even compared to 3.1 Pro, with 3.5 Pro coming next month. Perhaps more impressive were the Gemini Live (Voice), Omni (Video), and Google Pics/Flow (Images/VFX/music) modalities, where Google demonstrated industry-leading capabilities and latency, all presumably made possible by industry-leading hardware and models.

Per longstanding tradition at every bigtech keynote these days, Google also showed off some smart glasses tech, which seems a little more likely to be seen on the street than many prior iterations from both Google and their peers.

AI News for 5/18/2026-5/19/2026. We checked 12 subreddits, 544 Twitters and no further Discords. AINews’ website lets you search all past issues. As a reminder, AINews is now a section of Latent Space. You can opt in/out of email frequencies!

**AI Twitter Recap**

**Google used I/O to reposition Gemini as both a consumer AI surface and a developer/agent platform, with three core technical announcements: Gemini 3.5 Flash for fast agentic/coding workloads, Gemini Omni for multimodal generation/editing starting with video, and a broader Antigravity agent stack spanning desktop/CLI/SDK/API.** Official posts emphasized scale — Google says it now processes **over 3.2 quadrillion tokens/month**, up **7x YoY** from **480T/month**, while the Gemini app has **900M+ monthly users** and is available in **230+ countries and 70+ languages** ([Google](https://x.com/Google/status/2056783102085640252), [Google](https://x.com/Google/status/2056783643381543253), [GeminiApp](https://x.com/GeminiApp/status/2056799446684578250)). The most technically substantive release was **Gemini 3.5 Flash**, framed by Google as its strongest agentic/coding model yet, **GA immediately**, with **1M-token context**, **65k max output**, **4 thinking levels** (“minimal/low/medium/high”), and “thought preservation” across turns ([GoogleDeepMind](https://x.com/GoogleDeepMind/status/2056787987774816525), [Google](https://x.com/Google/status/2056788266872140232), [_philschmid](https://x.com/_philschmid/status/2056794978517750165)). Google paired that with **Gemini Omni**, a new family combining Gemini reasoning with generative media, initially via **Omni Flash**, capable of taking **text/image/video/audio inputs** and producing video edits/generation in Gemini, Flow, Shorts, and later APIs ([GoogleDeepMind](https://x.com/GoogleDeepMind/status/2056786446636212467), [Google](https://x.com/Google/status/2056786781992071172), [GeminiApp](https://x.com/GeminiApp/status/2056800579159216202)). 
Around those models, Google launched or expanded **Antigravity 2.0 desktop**, **CLI**, **SDK**, **Managed Agents in the Gemini API**, Search-native generative UI/coding, **Gemini Spark** background agents on cloud VMs, and a long list of Gemini-app/Workspace/commerce/media integrations ([Google](https://x.com/Google/status/2056789045548896516), [Google](https://x.com/Google/status/2056838495298367773), [Google](https://x.com/Google/status/2056791134295273554)).

**Facts vs. opinions**

**Facts / directly claimed by official or third-party benchmark sources**

- Google says it now processes **3.2 quadrillion tokens/month**, up from **480 trillion** a year earlier ([Google](https://x.com/Google/status/2056783102085640252)).
- Google says Gemini has **900M+ monthly users** ([Google](https://x.com/Google/status/2056783643381543253)).
- Google says Gemini 3.5 Flash is **GA today** across Gemini app, Search AI Mode, Gemini API, AI Studio, Antigravity, Android Studio, and enterprise surfaces ([Google](https://x.com/Google/status/2056791527314387208), [GeminiApp](https://x.com/GeminiApp/status/2056789742910595342)).
- Google says Gemini 3.5 Flash has **1M context**, **65k max output**, **4 thinking levels**, and “thought preservation” across turns ([_philschmid](https://x.com/_philschmid/status/2056794978517750165)).
- Google says 3.5 Flash beats Gemini 3.1 Pro on **Terminal-Bench 2.1**, **GDPval-AA**, and **MCP Atlas** ([GoogleDeepMind](https://x.com/GoogleDeepMind/status/2056787990110994511), [Google](https://x.com/Google/status/2056788281317306466)).
- Google says 3.5 Flash runs **4x faster than comparable frontier models**, and **up to 12x faster in Antigravity** ([Google](https://x.com/Google/status/2056788266872140232), [JeffDean](https://x.com/JeffDean/status/2056793419033588091)).
- Independent benchmarker Artificial Analysis reports Gemini 3.5 Flash scores **55** on its Intelligence Index, **+9 vs Gemini 3 Flash**, at **>280 output tok/s**, with **MMMU-Pro 84%**, **GDPval-AA Elo 1656**, and pricing of **$1.50 / $9.00 per 1M input/output tokens**; it also reports the model is **5.5x costlier** to run than Gemini 3 Flash on its suite and **75% costlier than Gemini 3.1 Pro** ([ArtificialAnlys](https://x.com/ArtificialAnlys/status/2056795055512596817)).
- Arena reports Gemini 3.5 Flash reached **#9 overall in Text Arena** and **#9 in Code Arena: Frontend**, scoring **1507**, a **+70** jump over Gemini 3 Flash, and becoming the top score in its price tier ([arena](https://x.com/arena/status/2056793176720195693)).
- Google says Gemini Omni Flash is available in Gemini/Flow today for paid users, in Shorts/Create starting this week for free, and via APIs in coming weeks ([Google](https://x.com/Google/status/2056789307856462061)).
- Google says Spark runs on **dedicated Google Cloud virtual machines**, allowing long-running tasks while user devices are closed ([Google](https://x.com/Google/status/2056791134295273554)).
- Google claims an Antigravity + Gemini 3.5 Flash demo built a functioning OS in **12 hours** using **93 parallel sub-agents**, **15k+ model requests**, **2.6B tokens**, and **< $1K** API credits ([Google](https://x.com/Google/status/2056789235500466273)).
- Google says Search will use Antigravity + 3.5 Flash to generate **custom visual tools/simulations** on the fly ([Google](https://x.com/Google/status/2056795269694423065)).

**Opinions / interpretations / skepticism**

- Positive takes: “Google is back,” “insane evals for a Flash model,” “world model towards AGI,” “mind blowing” for Search + Antigravity, etc. ([kimmonismus](https://x.com/kimmonismus/status/2056791681073316071), [Kseniase_](https://x.com/Kseniase_/status/2056798225378783656), [demishassabis](https://x.com/demishassabis/status/2056831486251380783)).
- Neutral caution: some posters explicitly avoided overhyping due to **self-reported benchmarks** and noted pricing/perf concerns ([scaling01](https://x.com/scaling01/status/2056794370909593987), [simonw](https://x.com/simonw/status/2056867815605625172)).
- Negative/skeptical takes focused on:
  - **Price inflation** relative to earlier Flash models ([enricoros](https://x.com/enricoros/status/2056816088785289481)).
  - Comparisons where **GPT-5.5-medium** may be smarter/cheaper/faster end-to-end ([scaling01](https://x.com/scaling01/status/2056803273756000721), [scaling01](https://x.com/scaling01/status/2056798645983334890)).
  - Benchmark caveats such as weak **TerminalBench-Hard**, mediocre **MRCR / ARC-AGI-2**, or not clearly beating Kimi/GLM on some slices ([scaling01](https://x.com/scaling01/status/2056796392899645919), [teortaxesTex](https://x.com/teortaxesTex/status/2056794752167645653), [scaling01](https://x.com/scaling01/status/2056795648742076743)).
  - Product naming/UX confusion around Gemini CLI vs Antigravity CLI and broader interface design criticism ([zachtratar](https://x.com/zachtratar/status/2056848643580482002), [kchonyc](https://x.com/kchonyc/status/2056826706984337726), [teortaxesTex](https://x.com/teortaxesTex/status/2056788641926509010)).

**Gemini 3.5 Flash: the main technical release**

**Official positioning**

Google/DeepMind repeatedly described **Gemini 3.5 Flash** as the company’s strongest model yet for **agents and coding**, not its absolute flagship intelligence model. It’s meant to sit on the high-speed, high-utility part of the Pareto frontier, powering both Google products and developer workloads ([GoogleDeepMind](https://x.com/GoogleDeepMind/status/2056787987774816525), [Google](https://x.com/Google/status/2056788266872140232), [SundarPichai](https://x.com/sundarpichai/status/2056796893951426705)).

**Technical details and metrics**

From Google and affiliated posts:

- **GA availability now** ([Google](https://x.com/Google/status/2056791527314387208))
- **1M token context window**
- **65k max output tokens**
- **Thinking levels:** minimal, low, medium (**new default**), high
- **Thought preservation across multi-turn conversations**
- **Text output**
- Input modalities: **text, image, video, speech** per Artificial Analysis ([_philschmid](https://x.com/_philschmid/status/2056794978517750165), [ArtificialAnlys](https://x.com/ArtificialAnlys/status/2056795055512596817))
- Pricing: **$1.50 / 1M input**, **$9.00 / 1M output**, **90% discount on cached input** ([scaling01](https://x.com/scaling01/status/2056793465715822720), [ArtificialAnlys](https://x.com/ArtificialAnlys/status/2056795055512596817))
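To make the quoted pricing concrete, here is a minimal cost-estimator sketch. The three rates come from the figures above; the function name, the interpretation of the 90% discount as "cached input tokens are billed at 10% of the input rate", and the decision to ignore any other billing rules are assumptions for illustration, not Google's published billing logic.

```python
# Rates quoted in the posts above (USD per 1M tokens).
INPUT_USD_PER_M = 1.50
OUTPUT_USD_PER_M = 9.00
# Assumption: "90% discount on cached input" means cached tokens cost 10% of the input rate.
CACHED_INPUT_DISCOUNT = 0.90


def estimate_cost_usd(input_tokens: int, output_tokens: int, cached_input_tokens: int = 0) -> float:
    """Hypothetical helper: estimate the cost of one request in USD.

    cached_input_tokens is the portion of input_tokens served from cache.
    """
    if cached_input_tokens > input_tokens:
        raise ValueError("cached_input_tokens cannot exceed input_tokens")
    fresh_input = input_tokens - cached_input_tokens
    cached_rate = INPUT_USD_PER_M * (1.0 - CACHED_INPUT_DISCOUNT)
    total = (
        fresh_input * INPUT_USD_PER_M
        + cached_input_tokens * cached_rate
        + output_tokens * OUTPUT_USD_PER_M
    )
    return total / 1_000_000


if __name__ == "__main__":
    # An agent turn with a large, mostly cached context and a short reply.
    print(estimate_cost_usd(200_000, 2_000, cached_input_tokens=180_000))
```

Under these assumptions, output tokens dominate the bill for chatty workloads, while long agentic contexts depend heavily on how much of the input is actually cache-eligible.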

Official benchmark claims:

- **Terminal-Bench 2.1:** **76.2%**
- **GDPval-AA:** **1656 Elo**
- **MCP Atlas:** **83.6%**
- Google-quoted multimodal result: **MMMU-Pro 83.6%** in one engineer post; Artificial Analysis reports **84%**, highest recorded on its setup ([koraykv](https://x.com/koraykv/status/2056795667088204234), [ArtificialAnlys](https://x.com/ArtificialAnlys/status/2056795055512596817))

Speed claims:

- Google marketing claim: **4x faster than comparable frontier models** ([Google](https://x.com/Google/status/2056788266872140232))
- In Antigravity, Google says it is **up to 12x faster** ([JeffDean](https://x.com/JeffDean/status/2056793419033588091), [scaling01](https://x.com/scaling01/status/2056790573961326680))
- Artificial Analysis observed **>280 output tok/s**
- Some discussion cited **~867 tok/s** in Antigravity-specific optimized serving ([scaling01](https://x.com/scaling01/status/2056790573961326680), [scaling01](https://x.com/scaling01/status/2056791726677782743))
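As a rough illustration of what those throughput figures mean in wall-clock terms, the sketch below converts tokens per second into generation time. The helper name is hypothetical, `65_000` is an assumed reading of "65k max output", and the calculation ignores time-to-first-token, thinking tokens, and network overhead.

```python
def generation_seconds(output_tokens: int, tokens_per_second: float) -> float:
    """Hypothetical helper: time to stream output_tokens at a steady decode rate."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return output_tokens / tokens_per_second


# Assumption: "65k max output" is treated as 65,000 tokens for this estimate.
MAX_OUTPUT_TOKENS = 65_000

if __name__ == "__main__":
    for label, rate in (
        ("Artificial Analysis (>280 tok/s)", 280.0),
        ("Antigravity-optimized (~867 tok/s)", 867.0),
    ):
        seconds = generation_seconds(MAX_OUTPUT_TOKENS, rate)
        print(f"{label}: {seconds:.0f} s for a maximum-length response")
```

At these rates a maximum-length response takes just under four minutes at 280 tok/s versus about 75 seconds at 867 tok/s, which is the kind of gap that matters when many agent steps run back to back.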

Third-party evaluation:

- Artificial Analysis says 3.5 Flash is the **leader on the intelligence-vs-speed Pareto frontier**, but the economics are notably worse than prior Flash:
  - Intelligence Index **55**, **+9** over Gemini 3 Flash
  - Hallucination rate reduced to **61%**, a **31-point drop** vs Gemini 3 Flash on its omniscience setup
  - **GDPval-AA 1656 Elo**
  - **5.5x** costlier than Gemini 3 Flash to run on its benchmark suite
  - **75%** costlier than Gemini 3.1 Pro on the same suite ([ArtificialAnlys](https://x.com/ArtificialAnlys/status/2056795055512596817))

Arena:

- **#9 Text Arena**
- **#9 Code Arena: Frontend**
- **1507** score, **+70** over Gemini-3 Flash
- Better than Gemini 3.1 Pro across categories in its frontend coding eval ([arena](https://x.com/arena/status/2056793176720195693), [arena](https://x.com/arena/status/2056803661859479812))

**Implications**

The notable shift is that Google appears to be using a “Flash” label for a model that, in prior cycles, would have been described more like a **high-end product model optimized for deployment** rather than simply a cheap lightweight tier. Several posters called this out directly, arguing Flash is becoming more expensive and possibly absorbing former Pro territory ([enricoros](https://x.com/enricoros/status/2056816088785289481), [simonw](https://x.com/simonw/status/2056867815605625172)).

The strongest technical signal is not “best absolute benchmark model,” but:

- **material agentic gains**
- **extreme serving speed**
- **deep integration into product surfaces**
- **tooling built around subagents and long-horizon execution**

That makes 3.5 Flash strategically important even if some competitors still win on raw price-adjusted intelligence in certain third-party comparisons.

**Gemini Omni: multimodal generation/editing as “create anything from any input”**

**What Google announced**

Google introduced **Gemini Omni** as a new family merging Gemini reasoning/world knowledge with Google’s generative media stack, starting with **video** creation and editing. Official messaging described it as “create anything from any input,” but current rollout is narrower:

- Inputs: **text, images, audio, video**
- Initial output emphasis: **video**
- Product availability: **Gemini app**, **Flow**, **YouTube Shorts/Create**, later **APIs**
- Current shipping model: **Gemini Omni Flash** ([GoogleDeepMind](https://x.com/GoogleDeepMind/status/2056786446636212467), [Google](https://x.com/Google/status/2056786395067552140), [Google](https://x.com/Google/status/2056789307856462061))

Google/DeepMind claims:

- Better **world understanding**
- More robust **physics**
- Multi-turn editing where scene/character consistency is retained
- Ability to “reimagine” user video footage with conversational edits ([Google](https://x.com/Google/status/2056786888930062369), [Google](https://x.com/Google/status/2056786589175677089))

Rollout specifics:

- Paid Gemini users globally in app/Flow “today”
- YouTube Shorts/Create rolling out “starting this week” at no cost
- APIs for developers/enterprise in coming weeks ([Google](https://x.com/Google/status/2056789307856462061), [GeminiApp](https://x.com/GeminiApp/status/2056814117047132301))

**Perspectives**

- Supportive: users and Google employees described Omni as a major quality step, especially for **video editing** and consistency ([joshwoodward](https://x.com/joshwoodward/status/2056827449556845051), [fofrAI](https://x.com/fofrAI/status/2056789242274259242), [osanseviero](https://x.com/osanseviero/status/2056863263305105424)).
- Strategic interpretation: several posters framed Omni as evidence Google is investing in **world models** and embodied/physical priors, not just text/code competition ([demishassabis](https://x.com/demishassabis/status/2056831486251380783), [jparkerholder](https://x.com/jparkerholder/status/2056789448554062232), [kimmonismus](https://x.com/kimmonismus/status/2056802929957568881)).
- Skepticism: some UI/output examples drew criticism for looking like “B-tier video game interface” or too polished/template-like ([teortaxesTex](https://x.com/teortaxesTex/status/2056787895977980172), [shlomifruchter](https://x.com/shlomifruchter/status/2056858151987884087)).

**Context**

Omni matters less as “yet another video model” and more as Google’s attempt to unify:

- multimodal understanding,
- media editing,
- world grounding,
- agent interfaces,
- and eventually any-input/any-output generation.

This aligns with DeepMind’s long-running world-model agenda and Google’s product distribution advantage.

**Antigravity: Google’s agent OS, not just a coding assistant**

A major underappreciated I/O theme was that Google is no longer presenting agents as a thin wrapper around a chat model. Antigravity is becoming the **execution substrate**.

**What launched / expanded**

- **Antigravity 2.0 desktop app**: agent-first desktop with core conversations, artifacts, multi-agent orchestration ([Google](https://x.com/Google/status/2056788868092006891), [Google](https://x.com/Google/status/2056838653855650286))
- **Antigravity SDK** ([Google](https://x.com/Google/status/2056789045548896516))
- **Managed Agents in Gemini API**: single API call gives an agent plus hosted Linux sandbox; supports Bash/Python/Node, files, browsing, custom markdown-defined skills, repo/GCS mounts ([Google](https://x.com/Google/status/2056838495298367773), [GoogleAIStudio](https://x.com/GoogleAIStudio/status/2056836824686059616), [_philschmid](https://x.com/_philschmid/status/2056836567470362955))
- Integrations with **AI Studio**, **Android**, **Firebase**, **Workspace**, web ([Google](https://x.com/Google/status/2056789045548896516), [Google](https://x.com/Google/status/2056837910851449177))
- One-click export from **AI Studio to Antigravity** ([Google](https://x.com/Google/status/2056838913944424469))
- Native **Android app generation** in AI Studio / Android support in Antigravity ([Google](https://x.com/Google/status/2056838230591574098), [AndroidDev](https://x.com/AndroidDev/status/2056841786656711077))

**Technical signaling**

Google’s own demos centered on **parallel sub-agents**, **hosted execution**, **high-frequency iterative loops**, and **artifact-oriented workflows**. Jeff Dean explicitly described 3.5 Flash as a strong engine for “deploy sub-agents that collaborate, run high-frequency iterative loops, and solve real-world problems at scale” ([JeffDean](https://x.com/JeffDean/status/2056793419033588091)).

The marquee proof point:

- OS built in **12h**
- **93** parallel sub-agents
- **15k+** requests
- **2.6B** tokens
- **< $1K** credits ([Google](https://x.com/Google/status/2056789235500466273))
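A quick back-of-the-envelope pass over the demo figures helps put them in scale. The sketch below only divides the numbers Google quoted; because "15k+" and "< $1K" are bounds and not exact values, the per-request and per-token results are upper bounds, and the variable and function names are illustrative.

```python
# Figures as quoted by Google for the OS demo (bounds treated as exact for the arithmetic).
TOTAL_TOKENS = 2.6e9      # "2.6B tokens"
MODEL_REQUESTS = 15_000   # "15k+ model requests" (lower bound)
BUDGET_USD = 1_000        # "< $1K API credits" (upper bound)
SUB_AGENTS = 93
WALL_CLOCK_HOURS = 12


def demo_summary() -> dict:
    """Hypothetical helper: derive per-request and per-token averages from the quoted totals."""
    return {
        "tokens_per_request": TOTAL_TOKENS / MODEL_REQUESTS,
        "requests_per_sub_agent": MODEL_REQUESTS / SUB_AGENTS,
        "blended_usd_per_1m_tokens": BUDGET_USD / (TOTAL_TOKENS / 1e6),
        "aggregate_tokens_per_second": TOTAL_TOKENS / (WALL_CLOCK_HOURS * 3600),
    }


if __name__ == "__main__":
    for name, value in demo_summary().items():
        print(f"{name}: {value:,.2f}")
```

The blended figure works out to under roughly $0.39 per 1M tokens, well below the $1.50 list price for uncached input. One plausible reading, and only an inference, is that most of the 2.6B tokens were repeated context billed at a discounted cached-input rate.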

Even if this is mostly a stage-managed benchmark/demo, it reveals the architecture Google wants developers to adopt: **many fast agents over one slow monolithic run**.
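The "many fast agents" pattern can be sketched with plain `asyncio`: fan a list of subtasks out to concurrent workers, cap the parallelism, and gather results in order. This is a generic illustration, not Antigravity's API; `run_subagent` is a hypothetical stand-in for a real model or tool call, and the sleep simulates latency.

```python
import asyncio


async def run_subagent(task: str) -> str:
    """Hypothetical stand-in for one sub-agent call (model request plus tool use)."""
    await asyncio.sleep(0.01)  # simulated latency; a real agent would call an API here
    return f"done: {task}"


async def orchestrate(tasks: list[str], max_parallel: int = 8) -> list[str]:
    """Run sub-agents concurrently with at most max_parallel in flight at once."""
    semaphore = asyncio.Semaphore(max_parallel)

    async def bounded(task: str) -> str:
        async with semaphore:
            return await run_subagent(task)

    # gather returns results in the same order as the input tasks.
    return await asyncio.gather(*(bounded(task) for task in tasks))


if __name__ == "__main__":
    subtasks = [f"module-{i}" for i in range(20)]
    print(asyncio.run(orchestrate(subtasks, max_parallel=5)))
```

The semaphore is the main design choice here: it keeps the fan-out from exceeding rate limits or sandbox capacity while still letting short, fast calls overlap.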

**Reactions**

- Positive: this is Google’s answer to Codex/Claude Code/OpenClaw/Hermes-style workflows, with a stronger infra story ([iScienceLuvr](https://x.com/iScienceLuvr/status/2056792158988816767), [theo](https://x.com/theo/status/2056826014739890204)).
- Critical: branding and product sprawl remain confusing; some users aren’t sure whether they should use Gemini CLI or Antigravity CLI, and Google’s design choices drew complaints ([kchonyc](https://x.com/kchonyc/status/2056826706984337726), [zachtratar](https://x.com/zachtratar/status/2056848643580482002), [teortaxesTex](https://x.com/teortaxesTex/status/2056788641926509010)).

**Search, Gemini app, and consumer agents**

**Search**

Google announced a redesigned AI-powered Search box, multimodal query support, and the most ambitious consumer-facing move: **Search generating custom visual tools and simulations on the fly** using Antigravity + Gemini 3.5 Flash ([Google](https://x.com/Google/status/2056793802141044786), [Google](https://x.com/Google/status/2056795269694423065)).

It also previewed **information agents** in Search:

- persistent monitoring tasks
- web/news/social/real-time signals
- synthesized updates with links and actions

This is a notable strategic shift: Search moves from retrieval/ranking to **background agentic monitoring + generated applets**.

**Gemini app**

Consumer Gemini updates included:

- new “**Neural Expressive**” design language ([Google](https://x.com/Google/status/2056799862604046663))
- inline/instant **Gemini Live** voice ([Google](https://x.com/Google/status/2056800029688352988))
- **Daily Brief** personalized digest from inbox/calendar/tasks ([Google](https://x.com/Google/status/2056801159071883342), [GeminiApp](https://x.com/GeminiApp/status/2056800978343764238))
- **Gemini Spark** as a 24/7 personal AI agent on cloud VMs, checking with users before major actions ([Google](https://x.com/Google/status/2056791134295273554), [GeminiApp](https://x.com/GeminiApp/status/2056801918018564538))
- macOS app + upcoming Spark/voice desktop workflows ([Google](https://x.com/Google/status/2056802434303869118), [GeminiApp](https://x.com/GeminiApp/status/2056802363269329304))

**Pricing / subscriptions**

Google introduced a new pricing ladder:

This reads as a more aggressive bid for premium power users, especially coders and creators.

**Trust, provenance, and standards**

Google pushed **SynthID** across Search, Gemini, Chrome, and hardware/media surfaces, and announced partnerships with **OpenAI, NVIDIA, Kakao, and ElevenLabs** to bring SynthID to their generated content ([Google](https://x.com/Google/status/2056787498676658576), [Google](https://x.com/Google/status/2056787749965799508)).

That is one of the more consequential standards moves from I/O:

- it gives Google a shot at owning part of the provenance layer for generative media;
- notably, OpenAI separately announced support for checking OpenAI-generated images via **SynthID watermark + C2PA credentials** ([OpenAI](https://x.com/OpenAI/status/2056793648571011232)).

This was less flashy than Omni/3.5 Flash, but likely more durable if provenance becomes mandatory infrastructure.

**Google’s science and world-model angle**

Several I/O items reinforced that Google does not want to compete only on coding/chat:

- **Gemini for Science**: Literature Insights, Hypothesis Generation, Computational Discovery ([GoogleDeepMind](https://x.com/GoogleDeepMind/status/2056808869242826957), [Google](https://x.com/Google/status/2056809034494124118))
- **Nature** publication links around ERA / Co-Scientist ([GoogleResearch](https://x.com/GoogleResearch/status/2056797037426045105), [GoogleResearch](https://x.com/GoogleResearch/status/2056857494107062718))
- **Project Genie + Street View grounding**, using ~20 years of maps imagery to create interactive real-location simulations ([Google](https://x.com/Google/status/2056850758029464009), [poolio](https://x.com/poolio/status/2056796361987850705), [bilawalsidhu](https://x.com/bilawalsidhu/status/2056804315721843024))

This broader context explains why some observers interpreted Omni as “world-model progress” rather than just a content tool ([demishassabis](https://x.com/demishassabis/status/2056831486251380783), [jparkerholder](https://x.com/jparkerholder/status/2056798252264018232)).

**Different opinions**

**Bullish / supportive**

- Gemini 3.5 Flash viewed as a major leap for a speed-tier model, especially on agentic coding ([kimmonismus](https://x.com/kimmonismus/status/2056791681073316071), [SundarPichai](https://x.com/sundarpichai/status/2056796893951426705)).
- Search + Antigravity seen as potentially transformative because Google can deploy generated UI/tools at enormous scale ([Kseniase_](https://x.com/Kseniase_/status/2056798225378783656), [TheTuringPost](https://x.com/TheTuringPost/status/2056795871098913209)).
- Omni praised for editing quality and for hinting at a deeper world-model roadmap ([joshwoodward](https://x.com/joshwoodward/status/2056827449556845051), [kimmonismus](https://x.com/kimmonismus/status/2056802929957568881)).

**Skeptical / opposing**

- Concern that Google is leaning on **self-reported benchmarks**, and independent comparisons still leave room for competitors ([scaling01](https://x.com/scaling01/status/2056794370909593987)).
- Concern that “Flash” is no longer cheap enough to justify the name; pricing has climbed sharply from prior Flash generations ([enricoros](https://x.com/enricoros/status/2056816088785289481), [simonw](https://x.com/simonw/status/2056867815605625172)).
- Some believed **GPT-5.5-medium** still dominates on a combined smart/cheap/latency basis ([scaling01](https://x.com/scaling01/status/2056803273756000721)).
- Some benchmark slices imply unevenness, e.g. poor TerminalBench-Hard or middling reasoning metrics despite strong agentic numbers ([scaling01](https://x.com/scaling01/status/2056796392899645919), [teortaxesTex](https://x.com/teortaxesTex/status/2056794752167645653)).

**Neutral / analytical**

- Artificial Analysis gave the strongest balanced take: **excellent speed-intelligence frontier position**, **substantial agentic gains**, but materially **worse cost** than prior Flash and even higher than 3.1 Pro on their end-to-end suite ([ArtificialAnlys](https://x.com/ArtificialAnlys/status/2056795055512596817)).
- Arena’s data also supports a “real improvement, not just marketing” conclusion, especially for frontend/code tasks, without claiming category dominance ([arena](https://x.com/arena/status/2056793176720195693)).

**Why this matters**

- **Google now has a coherent deployment story.** Earlier Gemini cycles often felt benchmark-heavy and product-fragmented. At I/O, Google tied model, infra, tools, APIs, consumer surfaces, and enterprise rollout together.
- **The center of gravity is shifting from chatbot UX to agent execution.** The important primitives were not just model IQ: they were **subagents, hosted sandboxes, long-running tasks, generated artifacts, and integration with Search/Workspace/Android**.
- **Gemini 3.5 Flash suggests “fast enough to orchestrate many agents” may matter more than max benchmark score.** For coding and tool use, throughput and latency are increasingly product-defining.
- **Omni reveals Google’s differentiation thesis.** Google is betting on multimodal/world-grounded systems rather than purely text-centric competition.
- **Trust/provenance is becoming platform infrastructure.** SynthID partnerships with OpenAI/NVIDIA/ElevenLabs/Kakao suggest some convergence around content-auth provenance layers.
- **The biggest unresolved question is economics.** Technically strong or not, 3.5 Flash drew substantial pushback on cost inflation. If “Flash” is no longer the cheap workhorse tier, Google may win on capability deployment while losing some developer mindshare on predictability and pricing simplicity.

**Talent, Labs, and Ecosystem Moves**

- **Karpathy joins Anthropic**: The day’s most engaged AI tweet was [Andrej Karpathy’s announcement](https://x.com/karpathy/status/2056753169888334312) that he has **joined Anthropic** to “get back to R&D.” The tweet dominated discussion, with subsequent speculation from [@scaling01](https://x.com/scaling01/status/2056773883982762114) citing Axios that he’ll work on **RSI/autoresearch** and start a new pretraining-focused effort. While the details remain unconfirmed by Anthropic, the move was widely interpreted as a major talent win for Anthropic.
- **OpenAI capacity products**: OpenAI announced [Guaranteed Capacity](https://x.com/OpenAI/status/2056823271774101907), a commercial offering that lets customers secure **long-term compute access** for critical workloads. [Sam Altman](https://x.com/sama/status/2056827105401614656) framed it as a response to a world that will remain **capacity constrained** as models become more useful, offering **discounted tokens for 1–3 year commits**.
- **GitHub and coding toolchain integrations**: [GitHub](https://x.com/github/status/2056801675042779279) said **Gemini 3.5 Flash** is rolling out in **Copilot**, citing strong tool use, fast response times, and cache efficiency for iterative agentic coding. [Cursor](https://x.com/cursor_ai/status/2056803731367456993) launched integration with **Jira**, allowing cloud agents to take work items and create merge-ready PRs. [Code/VS Code](https://x.com/code/status/2056803208559759447) also announced Gemini 3.5 Flash availability.

**Training Algorithms, Benchmarks, and Agent Evaluation**

- **RL/post-training discussion is shifting toward denser credit assignment**: [@nrehiew_](https://x.com/nrehiew_/status/2056751826356297834) argued that the next scalable training breakthrough may build on **GRPO** but with **denser, lower-bias credit assignment**, citing directions like **ECHO**, **Composer2**, self-distillation, and OPD. [@lateinteraction](https://x.com/lateinteraction/status/2056770702175318095) countered with a “pedagogical RL” framing: train a self-teacher that samples **correct and easy-to-follow** rollouts.
- **Can coding agents do research? Not yet**: [Intology AI](https://x.com/IntologyAI/status/2056764236668493868) released **NanoGPT-Bench**, an autonomous benchmark based on the NanoGPT Speedrun competition, testing whether coding agents can contribute to real AI R&D progress. Their headline result: **Codex, Claude Code, and Autoresearch recover only 9.3% of human progress**, mostly via hyperparameter tuning rather than algorithmic innovation.
- **Agent harnesses and memory are getting more formalized**: [@omarsar0](https://x.com/omarsar0/status/2056764334181884158) highlighted a 100+ page survey on **code-as-agent-harness**, arguing future systems need to be **executable, inspectable, stateful, and governed**. [François Chollet](https://x.com/fchollet/status/2056777649880752160) made the related point that real tasks are rarely Markovian, so agents without high-fidelity trajectory compression are dramatically less useful.
- **Verifier quality is emerging as a bottleneck**: Threads from [@Shahules786](https://x.com/Shahules786/status/2056773476585816255) emphasized that scaling agent benchmarks now depends less on adding tasks and more on **improving verifier quality**, citing **SWE-bench Verified**, **OSWorld-Verified**, **ComputerRL**, and **BenchGuard**.

**Science, Biology Models, and Domain-Specific Systems**

- **Hugging Face releases Carbon DNA models**: One of the most technically interesting open releases was [Carbon](https://x.com/lvwerra/status/2056774820872831234), a family of generative DNA foundation models. The team says **Carbon-3B matches Evo2-7B while running 250–275x faster at inference**, enough to process the whole human genome on a single GPU in under two days. The key recipe changes: **deterministic 6-mer tokenization**, a **factorized loss (FNS)** replacing plain cross-entropy late in training, and curated staged mixtures of functional DNA + mRNA data per [@LoubnaBenAllal1](https://x.com/LoubnaBenAllal1/status/2056771927570530475). The release includes **models, training code, evals, data, and a demo**.
- **Google pushes AI for science as a product category**: Google introduced [Gemini for Science](https://x.com/GoogleDeepMind/status/2056808869242826957), a suite of prototypes for researchers: **Literature Insights** (paper synthesis via NotebookLM), **Hypothesis Generation** (a Co-Scientist-style multi-agent “idea tournament”), and **Computational Discovery** (built with AlphaEvolve and ERA to generate and score thousands of code variants in parallel). Google Research also noted that **ERA** has now been published in **Nature** ([Google Research](https://x.com/GoogleResearch/status/2056797037426045105)).
- **Specialized pretraining is gaining support**: [@pratyushmaini](https://x.com/pratyushmaini/status/2056780651219804582) pointed to evidence that **early exposure / specialized pretraining** improves robustness to forgetting, arguing that enterprises serious about domain use cases should consider **training custom models from scratch**, not just post-training.
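For readers unfamiliar with the "deterministic 6-mer tokenization" mentioned in the Carbon item, here is a generic sketch of the idea: cut a DNA string into consecutive 6-base chunks and map each chunk to a fixed ID from a 4^6 = 4096-entry vocabulary. The source does not describe Carbon's actual tokenizer, so the non-overlapping windows, the single unknown token, and the dropping of trailing bases are all assumptions made for illustration.

```python
from itertools import product

K = 6
BASES = "ACGT"

# Fixed vocabulary: every possible 6-mer over A/C/G/T, in lexicographic order (4**6 = 4096 entries).
VOCAB = {"".join(kmer): index for index, kmer in enumerate(product(BASES, repeat=K))}
# Assumption: any chunk containing other symbols (e.g. N) maps to one shared unknown ID.
UNK_ID = len(VOCAB)


def tokenize(sequence: str) -> list[int]:
    """Split a DNA sequence into non-overlapping 6-mers and return their vocabulary IDs.

    Assumption: trailing bases that do not fill a whole 6-mer are dropped.
    """
    sequence = sequence.upper()
    token_ids = []
    for start in range(0, len(sequence) - K + 1, K):
        chunk = sequence[start:start + K]
        token_ids.append(VOCAB.get(chunk, UNK_ID))
    return token_ids


if __name__ == "__main__":
    print(tokenize("ACGTACGTTGCA"))
```

Because the mapping is a fixed lookup with no learned merges, the same sequence always yields the same IDs, and each token covers six bases, which shortens sequences sixfold relative to single-nucleotide tokens.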

**Safety, Governance, and Monitoring of Internal Agents**

- **METR’s first Frontier Risk Report**: [METR](https://x.com/METR_Evals/status/2056800023149760666) published a major new report based on unusually deep access across **Anthropic, Google, Meta, and OpenAI**, including model CoTs and non-public information about capabilities, alignment, and control. The report focuses on whether labs could **lose control of their own internally deployed agents** and includes extensive appendices and transcripts ([METR](https://x.com/METR_Evals/status/2056800047258649049)).
- **Monitoring internal agents is now an active practice**: [@idavidrein](https://x.com/idavidrein/status/2056800422422265897) described spending a month embedded at Anthropic stress-testing systems designed to detect whether internal AI agents could “go rogue.” A key caveat he noted is that the exercise allowed Anthropic discretion to redact sensitive information, so he frames it as an **exercise rather than a formal audit**.
- **New safety standards org**: [Steven Adler](https://x.com/sjgadler/status/2056762703033807068) announced **Guidelight**, a new AI safety standards organization co-founded with Page Hedley, releasing its first two standards. While the tweet thread in the dataset is partial, the move is notable as another sign of the field professionalizing around operational standards, not just model evals.

**Top tweets (by engagement)**

- **Karpathy joins Anthropic**: [@karpathy](https://x.com/karpathy/status/2056753169888334312)
- **Google introduces the Gemini 3.5 model series**: [@Google](https://x.com/Google/status/2056788000546386273)
- **Google DeepMind launches Gemini Omni**: [@GoogleDeepMind](https://x.com/GoogleDeepMind/status/2056786446636212467)
- **Gemini 3.5 Flash GA for agents and coding**: [@Google](https://x.com/Google/status/2056788266872140232)
- **OpenAI Guaranteed Capacity**: [@OpenAI](https://x.com/OpenAI/status/2056823271774101907)
- **Google’s 24/7 personal agent, Gemini Spark**: [@Google](https://x.com/Google/status/2056791134295273554)

**AI Reddit Recap**

**/r/LocalLlama + /r/localLLM Recap**

