[AINews] New AI Infra unicorns: Exa, Modal, TurboPuffer

wpnews.pro

a quiet day lets us feature fundraises!

Take the 2026 AI Engineering Survey and get >$2k in credits and AIE WF tickets!

Congrats to all our past guests who reached huge milestones this week:

Turbopuffer: $100M ARR and profitable (our podcast); Exa: $250M at $2.2B, Series C (our podcast); Modal: $355M at $4.7B, Series C (our podcast)

We really need to be raising that Latent Space fund soon… but meanwhile.. help us out by taking the 2026 AI Engineering Survey and get >$2k in Notion and Vercel credits and AIE WF tickets!

AI News for 5/20/2026-5/21/2026. We checked 12 subreddits, 544 Twitters and no further Discords. AINews’ website lets you search all past issues. As a reminder, AINews is now a section of Latent Space. You can opt in/out of email frequencies!

AI Twitter Recap

Model, Benchmark, and Research Updates: RAEv2, Gated DeltaNet-2, Data Filtering, and Open Math

RAEv2 and representation-first tokenization: Several researchers highlighted RAEv2 as a meaningful follow-on to Representation Autoencoders for unified vision understanding and generation. @1jaskiratsingh says the update yields >10x faster convergence, better reconstruction, and better generation, with tests extending to text-to-image and world models. A Chinese summary from @recatm usefully extracts the three main findings: summing the last K encoder layers instead of only the final layer improves both reconstruction and generation without added inference cost; RAE and REPA are complementary across semantics vs. spatial structure; and REPA can be reformulated as an internal self-guidance mechanism, avoiding extra weak-model guidance passes. @sainingxie also points to new evaluation views beyond FID, arguing there is still underexplored headroom in representation-powered pixel decoders.

Alternatives to standard attention and tokenizer assumptions: NVIDIA’s Gated DeltaNet-2 decouples erase and write operations in linear attention with channel-wise gates, outperforming KDA and Mamba-3 at 1.3B parameters on language modeling and commonsense reasoning, with notable long-context retrieval gains on RULER; @rasbt called it one of the more interesting hybrid-attention directions. On tokenization, @NousResearch released a controlled study of why subword tokenization helps, simulating seven hypothesized benefits inside a 1.7B byte-level pipeline; only three of seven interventions moved validation loss at that scale. Separately, @tatsu_hashimoto reported a surprising scaling result on DCLM: with enough compute, the best data filter may be no filter, with projections suggesting the crossover for internet-scale pools lands around 1e30 FLOPs; downstream evals appear noisy but directionally consistent (follow-up).

Mechanistic interpretability and geometry: @GoodfireAI argues the dominant “models think in curved manifolds, SAEs use straight-line features” critique is only partly right. Their proposed fix is to cluster SAE features by joint firing patterns, recovering geometry through feature groups rather than isolated atoms (thread continuation, post). This is a useful update to the current SAE discourse: not a rejection of sparse features, but a warning that interpretation should move from single features to structured ensembles.

Math as an AI research domain: The biggest scientific discussion centered on OpenAI’s reported result on an Erdős unit-distance problem. @markchen90 framed it as evidence that mathematics is currently the domain most amenable to AI-assisted research breakthroughs, while @wtgowers noted that if the reported low human interaction level holds, the result is genuinely interesting. The discourse was immediately shaped by skepticism and benchmark/gameability concerns, with @memecrashes joking that the result was “outdated not even 3 hours later by a human,” and @cloneofsimo pointing out the predictable “goalpost moving” around what counts as legitimate AI mathematics. The interesting technical meta-point is that math continues to function as a relatively legible frontier for AI co-research because outputs can be checked, debated, and extended.
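To make the "cluster SAE features by joint firing patterns" idea concrete, here is a minimal sketch, not Goodfire's actual method. It assumes a feature "fires" when its activation is above zero, measures co-firing with Jaccard overlap, and links features above an arbitrary threshold; the function name, the threshold, and the toy data are all hypothetical.

```python
from itertools import combinations


def cofiring_groups(activations, threshold=0.5):
    """Group features whose firing sets overlap strongly.

    activations: list of samples, each a list of per-feature activations.
                 A feature "fires" on a sample when its value is > 0
                 (an assumption made for this sketch).
    threshold:   minimum Jaccard overlap needed to link two features
                 (hypothetical value, chosen for illustration).
    Returns a sorted list of feature-index groups.
    """
    n_features = len(activations[0])
    # For each feature, the set of sample indices on which it fires.
    fires = [
        {s for s, row in enumerate(activations) if row[f] > 0}
        for f in range(n_features)
    ]

    # Union-find so that linked features end up in the same group.
    parent = list(range(n_features))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for a, b in combinations(range(n_features), 2):
        union = fires[a] | fires[b]
        if not union:
            continue
        jaccard = len(fires[a] & fires[b]) / len(union)
        if jaccard >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for f in range(n_features):
        groups.setdefault(find(f), []).append(f)
    return sorted(groups.values())


# Toy data: features 0 and 1 mostly fire together, as do features 2 and 3.
toy = [
    [1.0, 0.8, 0.0, 0.0],
    [0.5, 0.9, 0.0, 0.0],
    [0.0, 0.0, 0.7, 0.2],
    [0.0, 0.0, 0.3, 0.6],
    [0.4, 0.0, 0.0, 0.1],
]
print(cofiring_groups(toy))
```

On the toy data this yields two groups, which is the "structured ensembles" reading in miniature: the unit of interpretation becomes the group rather than any single feature.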

Agents, Harnesses, and Developer Tooling: Codex, Gemini, Devin, and Agent Infrastructure

Harnesses are still a major source of capability gains: @lvwerra released physics-intern, a science-problem harness that boosts models like Gemini 3.1 Pro from 17.7 to 31.4, surpassing GPT 5.5 Pro in that setup. The notable nuance is that GPT 5.5 Pro itself did not benefit from the harness, suggesting model-specific absorption of scaffolding tricks. In the same spirit, @KLieret made mini-swe-agent runnable on ProgramBench, explicitly aiming to improve harness innovation around software engineering agents.

Agent design patterns are maturing from “single agent first” to explicit subagent orchestration: @cwolferesearch gives a practical synthesis: start with single-agent systems, and only move to manager/sub-agent or decentralized multi-agent topologies when tool sprawl or prompt bloat becomes unmanageable. That advice lines up with more operational observations from users of subagents: @andrew_locke describes Cognition’s sub-Devin workflow as a step change, compressing what previously looked like 2+ engineer-weeks into a couple of hours.

Codex shipped a substantial product layer on top of the model: OpenAI’s “Codex Thursday” updates matter less as standalone features than as signs of where coding agents are going. @OpenAIDevs launched Appshots, which capture both screenshot and text from Mac app windows for richer working context; they also added team plugin sharing (link) and more detailed org analytics (link). The more important systems shift is remote computer use: @OpenAIDevs says Codex can now securely use apps on your Mac from your phone even when the Mac is locked. This is a strong signal that the agent product surface is moving from chat IDEs to persistent cross-device operator workflows.

Gemini’s agent/tool story is broadening quickly: @OfficialLoganK highlighted that Gemini 3.5 Flash ranks #1 on APEX-Agents-AA, outperforming larger models. On the applied side, @_philschmid shows a GitHub issue triage agent built with a single Gemini API call and no orchestration framework, while @skalskip92 demonstrates Gemini 3.5 Flash replacing a custom vision pipeline for lane/car reasoning with one multimodal API call. Google also expanded action surfaces: Daily Brief (announcement) and connected-app actions with OpenTable, Canva, and Instacart (announcement) are essentially consumer-facing agent workflows.

Developer infra is converging around retrieval, streaming, sandboxes, and security boundaries: Weaviate shipped a built-in MCP server inside the database so coding agents can ingest a repo and use hybrid BM25 + vector retrieval without extra processes (announcement). LangChain introduced both a sandbox Auth Proxy for controlling agent-world boundaries (announcement) and a new typed streaming protocol for rendering tools, subagents, media, and interrupts as first-class projections rather than token streams (overview). vLLM’s Elastic Expert Parallelism is also notable systems work: @vllm_project describes live resizing of MoE DP/EP topology without full restarts, using direct GPU-to-GPU transfers over NVLink/RDMA, which matters not just for scaling but for future fault-tolerant serving.
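For readers new to "hybrid BM25 + vector retrieval", the sketch below shows one generic way to merge a keyword ranking and a vector ranking: reciprocal rank fusion. It is an illustration of the general technique, not a description of how Weaviate's MCP server fuses results; the file names and the two input rankings are made up.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several best-first ranked lists of document ids into one ranking.

    rankings: list of lists, each ordered from best to worst match.
    k:        smoothing constant; 60 is a commonly used default.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes more for documents it ranks higher.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical results for the same query from two retrievers.
bm25_hits = ["auth.py", "README.md", "session.py"]      # keyword match
vector_hits = ["session.py", "auth.py", "tokens.py"]    # embedding similarity

print(reciprocal_rank_fusion([bm25_hits, vector_hits]))
```

Documents that both retrievers rank well rise to the top, while documents found by only one retriever still appear further down, which is the practical appeal of hybrid search for repo-scale code retrieval.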

Infrastructure, Compute, and AI Business Signals: Modal, Turbopuffer, Hark, and the Compute Race

The infra layer had one of its clearest “this is where the money is” days: @Sirupsen said turbopuffer crossed $100M run-rate in March, just 19 months after $1M, while being profitable and raising < $1M. The company’s positioning is straightforward and timely: frontier teams know “the magic happens with AI when it draws in just the right context,” which turns a lot of product differentiation into a search/retrieval problem (follow-up). That aligns with broader sentiment from @swyx that “boring” AI infrastructure, not only glamorous frontier research, is where wealth creation is accruing.

Modal raised big and continues to look like a core AI cloud winner: @bernhardsson announced a $355M Series C at a $4.65B valuation. Investors and users emphasized the same thesis: rebuilding the cloud stack for AI workloads from the ground up, with strong performance and developer experience (Redpoint, user endorsement). This sits alongside other signals that agent-native compute is emerging as its own category; @latentspacepod summarized Daytona’s pitch around 60ms sandboxes, 50K startups in 75 seconds, and RL/evals workloads now representing roughly half of usage.

Compute remains the strategic bottleneck, and the market appears tiered: @AymericRoucher sketched a useful compute taxonomy: US leaders (OpenAI, Anthropic, Google, with Meta/xAI joining) in the multi-gigawatt class; Chinese giants scaling from hundreds of MW toward multi-GW, increasingly on domestic stacks; and European contenders such as Mistral at around 90 MW today aiming for 1 GW by 2029. The exact numbers are debatable, but the framing is consistent with @EpochAIResearch, which notes that even if OpenAI kicked off the recent compute buildout, frontier labs still use well under all global compute capacity, leaving open the question of how much further the buildout can accelerate. Component economics also continue to shift toward memory: @EpochAIResearch reports HBM grew from 52% to 63% of total AI chip component spending from Q1 2024 to Q4 2025.

Capital is flowing to interface/hardware bets as well as infra: @adcock_brett announced Hark raised $700M at a $6B valuation, aimed at GPU infrastructure, future model development, hardware, and multimodal/personal intelligence products. The details are sparse beyond hiring areas (foundation models, infra, speech, computer-use agents, hardware), but the size of the raise shows investor appetite for vertically integrated AI-device bets. Hark also reported a 200-hour uninterrupted autonomous run for F.03 (announcement), though without enough technical detail yet to evaluate the underlying robotics stack.
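As a back-of-the-envelope reading of the turbopuffer figures ($1M to $100M run-rate in 19 months), the sketch below computes the constant month-over-month growth rate that would connect the two points. Smooth compounding is an assumption made only for this calculation; the source gives just the two endpoints, and the helper name is hypothetical.

```python
def implied_monthly_growth(start, end, months):
    """Constant month-over-month growth rate that turns `start` into `end`."""
    return (end / start) ** (1.0 / months) - 1.0


# Endpoints quoted above: $1M run-rate to $100M run-rate in 19 months.
rate = implied_monthly_growth(1e6, 100e6, 19)
print(f"{rate:.1%} per month")
```

Under that assumption the implied rate is roughly 27% per month, sustained for more than a year and a half.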

Multimodal, Video, Biology, and Robotics: Runway, Carbon, Earth Models, and Open Humanoids

Video editing and generation are getting more compositional: Runway launched Aleph 2.0 and the new Edit Studio, letting users edit a single frame and propagate that edit through the rest of the video (Runway, product lead). This is a practical productization of the “reference-guided edit propagation” problem that multimodal builders care about. Separately, Alibaba researchers’ MIGA was flagged by @HuggingPapers as a train-free method for infinite-frame video generation with a two-stage alignment mechanism for temporal consistency. On the open-source avatar side, Meituan released LongCat-Video-Avatar 1.5 with Whisper-Large replacing Wav2Vec2, 8-step inference, long-video identity consistency, and broader stylized-domain generalization (announcement).

Foundation models for biology and Earth observation continue to become more usable: Hugging Face Bio’s Carbon DNA model family got follow-on demos and infra validation. @LoubnaBenAllal1 highlighted applications in sequence design, variant effect prediction, and learned representations, while @Shekswess showed Carbon-500M, 3B, and 8B compiling and running on a single Trainium2 trn2.3xlarge with NxD Inference on day one. For geospatial modeling, @cgeorgiaw reported OlmoEarth v1.1 is 3x cheaper/faster by changing the tokenization of multi-resolution Sentinel-2 inputs into 3x fewer tokens, exploiting the quadratic compute savings.

Open robotics is getting more buildable: Hugging Face’s LeRobot Humanoid drew attention as a genuinely full-stack open release rather than a showcase demo. @robotsdigest and @lukas_m_ziegler both emphasize the same package: roughly $2.5k, 3D-printed, complete hardware/CAD, calibration/runtime, simulation, identification tools, and training pipelines. The key point is not just affordability; it’s repairability and iteration speed for real robot learning workflows.
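The "quadratic compute savings" from cutting tokens can be sketched with the usual dense-transformer cost approximation: per layer, roughly 24·n·d² FLOPs for projections and MLP (linear in token count n) plus 4·n²·d for attention (quadratic in n). That cost model, the token counts, and the model width below are illustrative assumptions, not OlmoEarth's actual configuration.

```python
def layer_flops(n_tokens, d_model):
    """Approximate forward FLOPs for one dense transformer layer.

    Returns (linear_part, attention_part):
      linear_part    ~ 24 * n * d^2  (QKV/output projections + 4x-wide MLP)
      attention_part ~  4 * n^2 * d  (QK^T scores + weighted sum over values)
    """
    linear = 24 * n_tokens * d_model ** 2
    attention = 4 * n_tokens ** 2 * d_model
    return linear, attention


def speedup(n_tokens, d_model, reduction=3):
    """Ratio of layer cost before vs. after cutting tokens by `reduction`."""
    before = sum(layer_flops(n_tokens, d_model))
    after = sum(layer_flops(n_tokens // reduction, d_model))
    return before / after


# Hypothetical sizes: 3000 tokens reduced to 1000, model width 768.
lin_before, attn_before = layer_flops(3000, 768)
lin_after, attn_after = layer_flops(1000, 768)
print("linear part shrinks by   ", lin_before / lin_after)
print("attention part shrinks by", attn_before / attn_after)
print("overall layer speedup    ", round(speedup(3000, 768), 2))
```

In this model, a 3x token reduction cuts the linear part by 3x and the attention part by 9x, so the overall saving lands somewhere between the two depending on which term dominates at a given sequence length and width.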

Top tweets (by engagement)

OpenAI / Codex product expansion: Codex can securely use apps on your Mac from your phone, even when the Mac is locked, plus Appshots for richer app context.

Infrastructure winners: turbopuffer at $100M run-rate, profitable, < $1M raised; Modal raises $355M Series C at $4.65B; Hark raises $700M at $6B.

Research discussions with broad technical resonance: OpenAI’s Erdős-related math result discussion; RAEv2 release; “no filter” scaling result for LM data curation.

Agent capability trendlines: Gemini 3.5 Flash tops APEX-Agents-AA; Gemma 4 E4B driving an iOS simulator on-device via Argent; Devin for Windows.

AI Reddit Recap


source & further reading

latent.space: original article

Why AI Infrastructure must evolve for Agent Experience (Akshat Bubna, Modal CTO)

[AINews] Lilian Weng summarizes 35 papers on Harness Engineering for RSI

[AINews] The Field Guide to Fable
