Sail raises $80M to make AI agents cheaper to run

Sail Research has raised $80 million in seed and Series A funding at a $450 million valuation to develop AI inference infrastructure optimized for agents. The startup claims its engine can serve tokens at up to 10 times lower cost than rivals by prioritizing throughput over latency. Founded by former Apple and NVIDIA engineers, Sail aims to make AI agents economically viable for long-running tasks.

Sail Research has raised $80m to make AI agents cheaper to run. The startup, founded by ex-Apple and ex-NVIDIA engineers, says it can serve the tokens agents burn through at up to 10 times lower cost.

AI agents are hungry. Leave one running for hours and it can chew through billions of tokens on a single task. That gets expensive fast, and the bill is what stops many agents from leaving the lab. A new startup called Sail Research thinks it can fix the economics.

Sail has raised $80m (https://www.sailresearch.com/blog/sail-raises-80m) in combined seed and Series A funding at a $450m valuation. Sequoia led the seed round and Kleiner Perkins led the Series A. Redpoint Ventures, Theory Ventures, Vine Ventures, CRV, A and Abstract Ventures also joined.

The angel list reads like its own headline. It includes John Hennessy, the chairman of Alphabet, Lip-Bu Tan, the chief executive of Intel, and Tri Dao, the chief scientist at Together AI. The San Francisco company also drew angels from Anthropic, OpenAI, SpaceX and Thinking Machines.

Built for agents, not people

Sail’s pitch starts with a simple observation. Engineers built today’s AI infrastructure for a human waiting at a prompt. That user wants one thing above all: speed. An agent is different. It works on its own for hours or days, and it cares about scale, reliability and cost.

That gap is the whole opportunity. A person needs a fast reply. An agent needs to sustain thousands of calls over a long stretch without the price spiralling. Sail argues the existing stack optimises for the wrong thing.

“Most inference infrastructure was designed to minimise latency on a single request, but that’s the wrong optimisation for agents,” said Samir Menon, co-founder and chief technology officer. Agents, he says, need to hold throughput across thousands of concurrent calls over hours. Sail rebuilt the stack around that constraint.

The thesis has a name.
Sail calls it “abundant intelligence,” the idea that the more compute and context an agent gets, the better its work. The job is to make that compute cheap enough to hand over freely.

How it claims to cut the cost

Sail sells two things. First comes the inference engine. Sail rebuilt it for throughput, not speed, to serve agents spending billions of tokens on one task. The company claims it delivers up to 10 times lower cost per token than rivals.

The second is a sandbox it calls Sailboxes. These environments run for hours or days, not seconds. Crucially, they only charge for the time an agent is actually working, which trims the dead-time costs that pile up on long tasks.

The savings come from squeezing the whole stack. Sail customises open-source inference engines to push GPU performance toward the frontier. It spreads workloads across providers for resilience. It also hunts for cheap, underused compute wherever it sits.

There is a benchmark to point to. Sail says its inference topped BrowseComp-Plus, a deep-research evaluation. It hit 90.72% accuracy at up to 10 times lower cost than leading alternatives.

The platform also plugs in easily. Its API works with existing OpenAI workflows and supports open models including DeepSeek, Gemma, GLM, Kimi and Nemotron.

The founders and the bet

The team comes from the hardware side of AI. Co-founder and chief executive Neil Movva spent years at NVIDIA pushing GPU performance to its limits, then worked on infrastructure at Apple and Together AI. Menon also comes from Apple, where he built systems at large scale.

That background shapes the product. Sail’s edge, the founders argue, comes from tight integration all the way from the silicon to the API. Control the full path and you can open up the trade-off between cost and latency in a way a single layer cannot.

“Sail exists to make intelligence abundant,” Movva said.
“Every decision we make, from the chip level to the API, is about giving teams the tokens, the scale, and the runtime to build agents without limits.”

The framing is deliberately big. The company wants to sound like plumbing for a much larger future.

Kleiner Perkins is buying the premise. “The infrastructure layer for the agent era is one of the most important bets in AI right now,” said partner Aditya Naganath. He praised the founders’ mix of compute expertise and systems rigour, the kind that comes from building at the limits of scale.

A crowded, costly market

The timing fits a clear trend. Inference, the work of actually running a model, has become the most valuable layer in AI infrastructure. Nebius recently paid $643m for the 20-person startup Eigen AI (https://thenextweb.com/news/nebius-eigen-ai-inference-optimization), a sign of how badly the industry wants people who can make chips produce more tokens for less.

The money is chasing a real problem. Token prices have collapsed, yet enterprise AI bills (https://thenextweb.com/news/token-prices-fell-98-enterprise-ai-bills-tripled-now-the-industry-wants-a-standards-body-to-explain-why) have tripled, because agents consume so many more tokens per task. Cutting the price per token is one of the few levers that bends the curve back down.

Sail is not alone in pulling it. Others attack the same cost from different angles. Fractile is building inference chips (https://thenextweb.com/news/fractile-220m-inference-chip) as an alternative to NVIDIA, while GPU clouds like RunPod (https://thenextweb.com/news/runpod-100m-summit-partners-1bn-valuation) rent raw compute by the hour. The layer is filling up fast.

The capital backs that up. Inference specialist Baseten (https://thenextweb.com/news/baseten-13-billion-valuation-blackbird) recently raised $1.5bn at a valuation as high as $13bn. Against those numbers, Sail’s $450m valuation looks modest, which leaves it plenty of room to grow if the thesis holds.
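
The arithmetic behind falling token prices and rising bills can be sketched in a few lines. This is a back-of-envelope illustration, not Sail's or TNW's model. It assumes a bill is simply price per token multiplied by tokens consumed, takes the 98% price drop from the headline of the linked TNW piece, and uses Sail's claimed upper bound of 10 times lower cost. The function name is hypothetical.

```python
def implied_token_growth(price_drop: float, bill_multiple: float) -> float:
    """Factor by which token volume must have grown.

    Assumes bill = price_per_token * tokens, so
    tokens_new / tokens_old = bill_multiple / (1 - price_drop).
    """
    if not 0 <= price_drop < 1:
        raise ValueError("price_drop must be in [0, 1)")
    return bill_multiple / (1.0 - price_drop)


# Assumed inputs: prices down 98% (linked headline), bills up 3x (article).
growth = implied_token_growth(price_drop=0.98, bill_multiple=3.0)
print(f"Implied growth in tokens consumed: about {growth:.0f}x")

# Assumption: a further 10x price cut (Sail's claimed best case)
# with token volume held constant.
claimed_cut = 10.0
bill_after_cut = 3.0 / claimed_cut
print(f"Bill relative to the original baseline: {bill_after_cut:.1f}x")
```

Under those assumptions, token consumption must have grown roughly 150-fold for bills to triple, and a 10-fold price cut at constant volume would bring the bill to about 0.3 times the original baseline. In practice cheaper tokens tend to invite more usage, so the real saving would likely be smaller.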
The open question

The backdrop is enormous. Forecasters expect global AI spending to hit $2.5tn in 2026, yet the most ambitious agent workloads remain out of reach for most companies. Sail wants to be the reason that changes.

It already has paying customers to point to. The web-data firm Parallel, the code-review platform Detail.dev and the startup Jack and Jill all run on Sail. Detail.dev says it has pushed trillions of tokens through the platform and likes the economics.

The risk is that efficiency is a moving target. Every rival is chasing the same 10x, and frontier labs keep cutting their own prices. A cost edge built on clever engineering can erode as the whole field gets cheaper. Sail is betting its full-stack approach is harder to copy than a single trick.

If agents really do become the main way AI gets used, the company that makes them affordable to run could matter enormously. Whether that company is Sail, at the scale of trillions of tokens, is the question this round leaves open.