# [AINews] New AI Infra unicorns: Exa, Modal, TurboPuffer

> Source: <https://www.latent.space/p/ainews-new-ai-infra-unicorns-exa>
> Published: 2026-05-22 05:50:58+00:00


### a quiet day lets us feature fundraises!

*Take the 2026 AI Engineering Survey and get >$2k in credits and AIE WF tickets!*

Congrats to all our past guests who reached huge milestones this week:

- [Turbopuffer](https://x.com/Sirupsen/status/2057470756070781400) ([our podcast](https://www.latent.space/p/turbopuffer)): $100M ARR and profitable
- [Exa](https://exa.ai/blog/announcing-series-c) ([our podcast](https://www.latent.space/p/exa)): $250M@$2.2B Series C
- [Modal](https://x.com/bernhardsson/status/2057530320790995262?s=12) ([our podcast](https://www.latent.space/p/modal)): $355M@$4.7B Series C

We really need to be raising that Latent Space fund soon… but meanwhile, **help us out** by taking the [2026 AI Engineering Survey](https://notion.qualtrics.com/jfe/form/SV_bP07tSVMXH7ePCS) and getting >$2k in Notion and Vercel credits and [AIE WF tickets](https://ai.engineer/wf)!

AI News for 5/20/2026-5/21/2026. We checked 12 subreddits, 544 Twitters, and no further Discords. AINews’ website lets you search all past issues. As a reminder, AINews is now a section of Latent Space. You can opt in/out of email frequencies!

**AI Twitter Recap**

**Model, Benchmark, and Research Updates: RAEv2, Gated DeltaNet-2, Data Filtering, and Open Math**

**RAEv2 and representation-first tokenization**: Several researchers highlighted **RAEv2** as a meaningful follow-on to Representation Autoencoders for unified vision understanding and generation. [@1jaskiratsingh](https://x.com/1jaskiratsingh/status/2057568174590304421) says the update yields **>10x faster convergence**, better reconstruction, and better generation, with tests extending to **text-to-image and world models**. A Chinese summary from [@recatm](https://x.com/recatm/status/2057456332861567359) usefully extracts the three main findings: summing the last **K encoder layers** instead of using only the final layer improves both reconstruction and generation without added inference cost; **RAE and REPA are complementary** across semantics and spatial structure; and REPA can be reformulated as an internal self-guidance mechanism, avoiding extra weak-model guidance passes. [@sainingxie](https://x.com/sainingxie/status/2057595509519311077) also points to new evaluation views beyond FID, arguing there is still underexplored headroom in representation-powered pixel decoders.

**Alternatives to standard attention and tokenizer assumptions**: NVIDIA’s [Gated DeltaNet-2](https://x.com/ahatamiz1/status/2057586630450610673) decouples the **erase** and **write** operations in linear attention with channel-wise gates, outperforming **KDA** and **Mamba-3** at **1.3B** parameters on language modeling and commonsense reasoning, with notable long-context retrieval gains on **RULER**; [@rasbt](https://x.com/rasbt/status/2057599925878169761) called it one of the more interesting hybrid-attention directions. On tokenization, [@NousResearch](https://x.com/NousResearch/status/2057610978934546805) released a controlled study of why **subword tokenization** helps, simulating seven hypothesized benefits inside a **1.7B byte-level** pipeline; only **three of seven** interventions moved validation loss at that scale.
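
To make the first RAEv2 finding concrete, here is a minimal sketch of summing the last K encoder layers’ outputs instead of keeping only the final layer. It assumes each layer’s output is available as an equal-length feature vector; the helper name `sum_last_k_layers` and the toy numbers are hypothetical, and the actual RAEv2 recipe may normalize or weight the layers differently.

```python
from typing import List, Sequence


def sum_last_k_layers(hidden_states: Sequence[Sequence[float]], k: int) -> List[float]:
    """Element-wise sum of the last k per-layer feature vectors.

    Hypothetical helper, not RAEv2's code. hidden_states[i] is the flattened
    output of encoder layer i, ordered first to last; all must share a width.
    """
    if not 1 <= k <= len(hidden_states):
        raise ValueError("k must be between 1 and the number of layers")
    selected = hidden_states[-k:]
    width = len(selected[0])
    if any(len(layer) != width for layer in selected):
        raise ValueError("all selected layers must have the same width")
    # The encoder already computes every layer in one forward pass, so reusing
    # the last k outputs needs no extra pass, only a cheap element-wise sum.
    return [sum(layer[j] for layer in selected) for j in range(width)]


# Toy example: 4 layers with 3 features each (made-up numbers).
layers = [
    [0.1, 0.2, 0.3],
    [1.0, 1.0, 1.0],
    [2.0, 0.0, -1.0],
    [0.5, 0.5, 0.5],
]
print(sum_last_k_layers(layers, k=1))  # → [0.5, 0.5, 0.5] (final layer only)
print(sum_last_k_layers(layers, k=2))  # → [2.5, 0.5, -0.5]
```

With `k=1` this reduces to the usual final-layer-only features, so K acts as a knob between the old behavior and a multi-layer summary.
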
Separately, [@tatsu_hashimoto](https://x.com/tatsu_hashimoto/status/2057489411768803526) reported a surprising scaling result on **DCLM**: with enough compute, the best data filter may be **no filter**, with projections suggesting the crossover for internet-scale pools lands around **1e30 FLOPs**; downstream evals appear noisy but directionally consistent ([follow-up](https://x.com/tatsu_hashimoto/status/2057489440273322447)).

**Mechanistic interpretability and geometry**: [@GoodfireAI](https://x.com/GoodfireAI/status/2057487848258101551) argues the dominant “models think in curved manifolds, SAEs use straight-line features” critique is only partly right. Their proposed fix is to cluster SAE features by **joint firing patterns**, recovering geometry through **feature groups** rather than isolated atoms ([thread continuation](https://x.com/GoodfireAI/status/2057487927089954962), [post](https://x.com/GoodfireAI/status/2057487939836502461)). This is a useful update to the current SAE discourse: not a rejection of sparse features, but a warning that interpretation should move from single features to structured ensembles.

**Math as an AI research domain**: The biggest scientific discussion centered on OpenAI’s reported result on an Erdős unit-distance problem. [@markchen90](https://x.com/markchen90/status/2057517045575774598) framed it as evidence that mathematics is currently the domain most amenable to AI-assisted research breakthroughs, while [@wtgowers](https://x.com/wtgowers/status/2057536069218742518) noted that if the reported low level of human interaction holds, the result is genuinely interesting.
The discourse was immediately shaped by skepticism and benchmark/gameability concerns, with [@memecrashes](https://x.com/memecrashes/status/2057478155246440929) joking that the result was “outdated not even 3 hours later by a human,” and [@cloneofsimo](https://x.com/cloneofsimo/status/2057486750004756524) pointing out the predictable “goalpost moving” around what counts as legitimate AI mathematics. The interesting technical meta-point is that math continues to function as a relatively legible frontier for AI co-research because outputs can be checked, debated, and extended.
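
Goodfire’s proposal to group SAE features by joint firing patterns can be illustrated with a simple stand-in. Treat each feature’s firing set (the inputs on which it is active) as a set, connect features whose sets overlap strongly, and read off connected components as feature groups. This is a hypothetical sketch, not Goodfire’s method: the Jaccard overlap measure, the 0.5 threshold, and the function names are all assumptions.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Set


def firing_sets(activations: Sequence[Sequence[float]], eps: float = 0.0) -> List[Set[int]]:
    """For each feature (column), the set of sample indices where it fires (> eps)."""
    n_features = len(activations[0])
    return [
        {i for i, row in enumerate(activations) if row[f] > eps}
        for f in range(n_features)
    ]


def jaccard(a: Set[int], b: Set[int]) -> float:
    """Overlap of two firing sets: |a & b| / |a | b| (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def group_features(activations: Sequence[Sequence[float]], threshold: float = 0.5) -> List[List[int]]:
    """Hypothetical grouping: union features whose Jaccard co-firing is at least
    `threshold`, then return the connected components as sorted groups."""
    sets = firing_sets(activations)
    parent = list(range(len(sets)))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in combinations(range(len(sets)), 2):
        if jaccard(sets[a], sets[b]) >= threshold:
            parent[find(a)] = find(b)

    groups: Dict[int, List[int]] = {}
    for f in range(len(sets)):
        groups.setdefault(find(f), []).append(f)
    return sorted(groups.values())


# Toy activations: rows are inputs, columns are 4 SAE features (made-up numbers).
acts = [
    [1.0, 0.8, 0.0, 0.0],
    [0.9, 0.7, 0.0, 0.0],
    [0.0, 0.0, 0.5, 0.6],
    [0.0, 0.1, 0.4, 0.3],
]
print(group_features(acts))  # → [[0, 1], [2, 3]]
```

Features 0 and 1 co-fire on two of the three inputs where either is active (Jaccard 2/3), and features 2 and 3 fire on identical inputs, so the sketch returns two groups. With real SAEs holding many thousands of features, the all-pairs loop would need a sparser co-activation computation.
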

**Agents, Harnesses, and Developer Tooling: Codex, Gemini, Devin, and Agent Infrastructure**

**Harnesses are still a major source of capability gains**: [@lvwerra](https://x.com/lvwerra/status/2057476832664953225) released **physics-intern**, a science-problem harness that boosts models like **Gemini 3.1 Pro from 17.7 to 31.4**, surpassing **GPT 5.5 Pro** in that setup. The notable nuance is that GPT 5.5 Pro itself did **not** benefit from the harness, suggesting model-specific absorption of scaffolding tricks. In the same spirit, [@KLieret](https://x.com/KLieret/status/2057471442066030795) made **mini-swe-agent** runnable on **ProgramBench**, explicitly aiming to encourage harness innovation around software engineering agents.

**Agent design patterns are maturing from “single agent first” to explicit subagent orchestration**: [@cwolferesearch](https://x.com/cwolferesearch/status/2057486293882282293) gives a practical synthesis: start with **single-agent systems**, and only move to **manager/sub-agent** or decentralized multi-agent topologies when tool sprawl or prompt bloat becomes unmanageable. That advice lines up with more operational observations from users of subagents: [@andrew_locke](https://x.com/andrew_locke/status/2057537633555993058) describes Cognition’s sub-Devin workflow as a step change, compressing what previously looked like **2+ engineer-weeks** into a couple of hours.

**Codex shipped a substantial product layer on top of the model**: OpenAI’s “Codex Thursday” updates matter less as standalone features than as signs of where coding agents are going. [@OpenAIDevs](https://x.com/OpenAIDevs/status/2057530207976989179) launched **Appshots**, which capture both a screenshot and the text of Mac app windows for richer working context; they also added **team plugin sharing** ([link](https://x.com/OpenAIDevs/status/2057530212339097994)) and more detailed **org analytics** ([link](https://x.com/OpenAIDevs/status/2057530213974814844)).
The more important systems shift is remote computer use: [@OpenAIDevs](https://x.com/OpenAIDevs/status/2057536706778378692) says Codex can now securely use apps on your Mac **from your phone, even when the Mac is locked**. This is a strong signal that the agent product surface is moving from chat IDEs to persistent cross-device operator workflows.

**Gemini’s agent/tool story is broadening quickly**: [@OfficialLoganK](https://x.com/OfficialLoganK/status/2057460544643404125) highlighted that **Gemini 3.5 Flash** ranks **#1 on APEX-Agents-AA**, outperforming larger models. On the applied side, [@_philschmid](https://x.com/_philschmid/status/2057513254856151339) shows a GitHub issue triage agent built with a **single Gemini API call** and no orchestration framework, while [@skalskip92](https://x.com/skalskip92/status/2057502215506473121) demonstrates Gemini 3.5 Flash replacing a custom vision pipeline for lane/car reasoning with one multimodal API call. Google also expanded action surfaces: **Daily Brief** ([announcement](https://x.com/GeminiApp/status/2057500470147698936)) and connected-app actions with **OpenTable, Canva, and Instacart** ([announcement](https://x.com/GeminiApp/status/2057550225863246236)) are essentially consumer-facing agent workflows.

**Developer infra is converging around retrieval, streaming, sandboxes, and security boundaries**: Weaviate shipped a built-in **MCP server** inside the database so coding agents can ingest a repo and use **hybrid BM25 + vector retrieval** without extra processes ([announcement](https://x.com/weaviate_io/status/2057476556449010024)). LangChain introduced both a **sandbox Auth Proxy** for controlling agent-world boundaries ([announcement](https://x.com/LangChain/status/2057508777759236401)) and a new **typed streaming protocol** for rendering tools, subagents, media, and interrupts as first-class projections rather than token streams ([overview](https://x.com/bromann/status/2057507753191518602)).
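
Weaviate’s hybrid BM25 + vector retrieval needs some rule for merging a keyword ranking with a vector-similarity ranking. Reciprocal rank fusion (RRF) is one common rule, and the sketch below assumes it. Weaviate’s own hybrid search may fuse results differently, and the file names are hypothetical.

```python
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Fuse several best-first ranked lists of document IDs with RRF.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. A larger k flattens the advantage of top positions.
    """
    scores: Dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical results for the same query from a keyword index and a vector index.
bm25_hits = ["auth.py", "README.md", "server.py"]
vector_hits = ["server.py", "auth.py", "client.py"]
print(reciprocal_rank_fusion([bm25_hits, vector_hits]))
# → ['auth.py', 'server.py', 'README.md', 'client.py']
```

`auth.py` ranks first in one list and second in the other, so it edges out `server.py`, while documents found by only one retriever fall behind. The constant `k=60` is the conventional RRF default, not a Weaviate setting.
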
vLLM’s **Elastic Expert Parallelism** is also notable systems work: [@vllm_project](https://x.com/vllm_project/status/2057602243860574463) describes live resizing of MoE **DP/EP topology** without full restarts, using direct GPU-to-GPU transfers over **NVLink/RDMA**. That matters not just for scaling but for future fault-tolerant serving.
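
Live resizing of an MoE’s expert-parallel topology implies a plan for which experts move where. As a hypothetical illustration (not vLLM’s actual algorithm or API), the sketch below assumes experts are placed in contiguous, near-equal blocks across ranks. It lists the (expert, source rank, destination rank) transfers needed when the rank count changes; in a system like the one described above, those copies would travel GPU-to-GPU rather than through a restart.

```python
from typing import Dict, List, Tuple


def place_experts(num_experts: int, num_ranks: int) -> Dict[int, int]:
    """Hypothetical placement policy: contiguous, near-equal blocks (expert -> rank)."""
    return {e: e * num_ranks // num_experts for e in range(num_experts)}


def plan_resize(num_experts: int, old_ranks: int, new_ranks: int) -> List[Tuple[int, int, int]]:
    """List (expert, src_rank, dst_rank) transfers to go from the old placement
    to the new one. Experts whose rank does not change are left in place."""
    old = place_experts(num_experts, old_ranks)
    new = place_experts(num_experts, new_ranks)
    return [(e, old[e], new[e]) for e in range(num_experts) if old[e] != new[e]]


# Scale an 8-expert layer from 2 ranks to 4 ranks.
print(plan_resize(8, 2, 4))
# → [(2, 0, 1), (3, 0, 1), (4, 1, 2), (5, 1, 2), (6, 1, 3), (7, 1, 3)]
```

Under this naive contiguous policy, growing from 2 to 4 ranks moves 6 of 8 experts. A policy designed to minimize movement could keep more experts where they are, which is part of what makes live resizing an interesting systems problem.
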

**Infrastructure, Compute, and AI Business Signals: Modal, Turbopuffer, Hark, and the Compute Race**

**The infra layer had one of its clearest “this is where the money is” days**: [@Sirupsen](https://x.com/Sirupsen/status/2057470756070781400) said **turbopuffer** crossed a **$100M run-rate** in March, just **19 months after $1M**, while being **profitable** and having raised **< $1M**. The company’s positioning is straightforward and timely: frontier teams know “the magic happens with AI when it draws in just the right context,” which turns a lot of product differentiation into a **search/retrieval problem** ([follow-up](https://x.com/Sirupsen/status/2057470791516844188)). That aligns with broader sentiment from [@swyx](https://x.com/swyx/status/2057543654340710556) that “boring” AI infrastructure, not only glamorous frontier research, is where wealth creation is accruing.

**Modal raised big and continues to look like a core AI cloud winner**: [@bernhardsson](https://x.com/bernhardsson/status/2057530320790995262) announced a **$355M Series C at a $4.65B valuation**. Investors and users emphasized the same thesis: rebuilding the cloud stack for AI workloads from the ground up, with strong performance and developer experience ([Redpoint](https://x.com/Redpoint/status/2057532087570166134), [user endorsement](https://x.com/mathemagic1an/status/2057534253790097788)).
This sits alongside other signals that agent-native compute is emerging as its own category; [@latentspacepod](https://x.com/latentspacepod/status/2057565350187995260) summarized Daytona’s pitch around **60ms sandboxes**, **50K sandbox startups in 75 seconds**, and RL/evals workloads now representing roughly **half** of usage.

**Compute remains the strategic bottleneck, and the market appears tiered**: [@AymericRoucher](https://x.com/AymericRoucher/status/2057492189626720729) sketched a useful compute taxonomy: **US leaders** (OpenAI, Anthropic, Google, with Meta/xAI joining) in the **multi-gigawatt** class; **Chinese giants** scaling from hundreds of MW toward multi-GW, increasingly on domestic stacks; and **European contenders** such as Mistral at around **90 MW** today, aiming for **1 GW by 2029**. The exact numbers are debatable, but the framing is consistent with [@EpochAIResearch](https://x.com/EpochAIResearch/status/2057499893854536185), which notes that even if OpenAI kicked off the recent compute buildout, frontier labs still use far less than all of global compute capacity, leaving open the question of how much further the buildout can accelerate. Component economics also continue to shift toward memory: [@EpochAIResearch](https://x.com/EpochAIResearch/status/2057531410030997789) reports **HBM** grew from **52% to 63%** of total AI chip component spending from Q1 2024 to Q4 2025.

**Capital is flowing to interface/hardware bets as well as infra**: [@adcock_brett](https://x.com/adcock_brett/status/2057462134989263047) announced that **Hark** raised **$700M at a $6B valuation**, aimed at GPU infrastructure, future model development, hardware, and multimodal/personal intelligence products. The details are sparse beyond hiring areas (foundation models, infra, speech, computer-use agents, hardware), but the size of the raise shows investor appetite for vertically integrated AI-device bets.
Hark also reported a **200-hour** uninterrupted autonomous run for **F.03** ([announcement](https://x.com/adcock_brett/status/2057651077928145235)), though without enough technical detail yet to evaluate the underlying robotics stack.
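
The headline numbers in this section support a few quick back-of-envelope checks. The sketch below makes three assumptions. It treats turbopuffer’s $1M and $100M figures as the same run-rate metric and models growth as smooth compounding, although real growth is lumpier. It reads Daytona’s “50K startups in 75 seconds” as an average sandbox start rate. It takes Epoch AI’s HBM shares at face value.

```python
def implied_monthly_growth(start: float, end: float, months: int) -> float:
    """Constant compound monthly growth rate that turns `start` into `end`."""
    return (end / start) ** (1 / months) - 1


def per_second(count: float, seconds: float) -> float:
    """Average rate over a window (says nothing about peak concurrency)."""
    return count / seconds


# turbopuffer: ~$1M -> ~$100M run-rate in 19 months (assumes smooth compounding).
tp_growth = implied_monthly_growth(1e6, 100e6, 19)
# Daytona: 50K sandbox starts in 75 seconds, read as an average rate.
daytona_rate = per_second(50_000, 75)
# Epoch AI: HBM share of AI chip component spending, 52% -> 63%.
hbm_pp = 63 - 52
hbm_rel = (63 - 52) / 52

print(f"turbopuffer implied monthly growth: {tp_growth:.1%}")  # → 27.4%
print(f"Daytona average start rate: {daytona_rate:.0f} sandboxes/s")  # → 667
print(f"HBM share: +{hbm_pp} points, {hbm_rel:.0%} relative increase")  # → +11 points, 21%
```

A 100x increase over 19 months works out to roughly 27% compound growth every month. The HBM shift is 11 percentage points, which is about a fifth more than HBM’s starting share.
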

**Multimodal, Video, Biology, and Robotics: Runway, Carbon, Earth Models, and Open Humanoids**

**Video editing and generation are getting more compositional**: Runway launched **Aleph 2.0** and the new **Edit Studio**, letting users edit a single frame and propagate that edit through the rest of the video ([Runway](https://x.com/runwayml/status/2057530497597600169), [product lead](https://x.com/iamneubert/status/2057535909524824226)). This is a practical productization of the “reference-guided edit propagation” problem that multimodal builders care about. Separately, Alibaba researchers’ **MIGA** was flagged by [@HuggingPapers](https://x.com/HuggingPapers/status/2057506246899724355) as a **training-free** method for **infinite-frame** video generation, with a two-stage alignment mechanism for temporal consistency. On the open-source avatar side, Meituan released **LongCat-Video-Avatar 1.5** with **Whisper-Large** replacing Wav2Vec2, **8-step inference**, long-video identity consistency, and broader stylized-domain generalization ([announcement](https://x.com/Meituan_LongCat/status/2057494106889486646)).

**Foundation models for biology and Earth observation continue to become more usable**: Hugging Face Bio’s **Carbon** DNA model family got follow-on demos and infra validation. [@LoubnaBenAllal1](https://x.com/LoubnaBenAllal1/status/2057488110263435640) highlighted applications in **sequence design, variant effect prediction, and learned representations**, while [@Shekswess](https://x.com/Shekswess/status/2057468970471448787) showed **Carbon-500M, 3B, and 8B** compiling and running on a single **Trainium2 trn2.3xlarge** with NxD Inference on day one.
For geospatial modeling, [@cgeorgiaw](https://x.com/cgeorgiaw/status/2057481909802774664) reported that **OlmoEarth v1.1** is **3x cheaper/faster** because it tokenizes multi-resolution Sentinel-2 inputs into **3x fewer tokens**, benefiting from attention’s quadratic cost in sequence length.

**Open robotics is getting more buildable**: Hugging Face’s **LeRobot Humanoid** drew attention as a genuinely full-stack open release rather than a showcase demo. [@robotsdigest](https://x.com/robotsdigest/status/2057507896129380581) and [@lukas_m_ziegler](https://x.com/lukas_m_ziegler/status/2057515219946205399) both emphasize the same package: roughly **$2.5k**, **3D-printed**, complete hardware/CAD, calibration/runtime, simulation, identification tools, and training pipelines. The key point is not just affordability; it’s repairability and iteration speed for real robot learning workflows.
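
The OlmoEarth claim rests on how transformer cost scales with sequence length $n$. Consider a rough model, which is an assumption and ignores data loading and other overheads. Per-layer compute has an attention term quadratic in $n$ and a projection/MLP term linear in $n$, with model width $d$ and constants $c_1, c_2$. Cutting tokens by 3x therefore shrinks compute by a factor between 3 and 9, depending on which term dominates:

$$
C(n) \approx c_1\, n^2 d + c_2\, n d^2
$$

$$
\frac{C(n)}{C(n/3)} = \frac{c_1 n^2 d + c_2 n d^2}{\tfrac{1}{9}\, c_1 n^2 d + \tfrac{1}{3}\, c_2 n d^2} \in [3,\, 9]
$$

If the linear term dominates, the ratio approaches 3. If attention dominates, it approaches 9. The reported 3x figure is consistent with the lower end of this range.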

**Top tweets (by engagement)**

**OpenAI / Codex product expansion**: [Codex can securely use apps on your Mac from your phone, even when the Mac is locked](https://x.com/OpenAIDevs/status/2057536706778378692), plus [Appshots](https://x.com/OpenAIDevs/status/2057530207976989179) for richer app context.

**Infrastructure winners**: [turbopuffer at $100M run-rate, profitable, < $1M raised](https://x.com/Sirupsen/status/2057470756070781400); [Modal raises $355M Series C at $4.65B](https://x.com/bernhardsson/status/2057530320790995262); [Hark raises $700M at $6B](https://x.com/adcock_brett/status/2057462134989263047).

**Research discussions with broad technical resonance**: [OpenAI’s Erdős-related math result discussion](https://x.com/markchen90/status/2057517045575774598); [RAEv2 release](https://x.com/1jaskiratsingh/status/2057568174590304421); [“no filter” scaling result for LM data curation](https://x.com/tatsu_hashimoto/status/2057489411768803526).

**Agent capability trendlines**: [Gemini 3.5 Flash tops APEX-Agents-AA](https://x.com/OfficialLoganK/status/2057460544643404125); [Gemma 4 E4B driving an iOS simulator on-device via Argent](https://x.com/googlegemma/status/2057570113390551452); [Devin for Windows](https://x.com/cognition/status/2057496130225668360).

**AI Reddit Recap**
