Full Sail on Asynchronous Inference


Source: tomtunguz.com · Published: 2026-06-25 · Topic: ai-infrastructure


Sail Research, founded by Neil Movva and Samir Menon, announced a Series A investment from Kleiner Perkins, Redpoint, and Sequoia for its asynchronous inference platform that reduces costs by up to 6x compared to real-time services. The platform uses queueing and spot capacity to optimize throughput for AI agents, serving trillions of tokens in code review, deep research, and cybersecurity.


Today all inference is real-time. A human types, a model responds, & the clock starts over. The infrastructure is built for someone waiting on the other end. Every millisecond of latency costs money because the serving stack optimizes for cold-start, not throughput.

As we built internal AI systems at Theory, we embraced queueing. Parallelize ten agents on a single task, let them run for hours, & the productivity gains are enormous. It is the product of token-maxxing, pushing every dollar of compute to do more work. But the cost was unsustainable.

Enter [Neil Movva](https://www.linkedin.com/in/nmovva) & [Samir Menon](https://www.linkedin.com/in/samir-menon-27954214b) of Sail Research.

Neil Movva built one of the fastest LLM inference stacks at Together AI. Samir Menon ran LLMs inside hardware enclaves at Blyss. Both are systems engineers to the core. They were building the system we needed.

As the inference market segments into real-time, near-real-time, & batch, async inference sits in the batch tier & carries a massive cost advantage. The key is model selection & routing.

Sail distributes requests across open models like DeepSeek, Qwen, Kimi, & GLM, picking the cheapest capable model for each task. GLM-5.1 on Sail costs 6x less per token than Anthropic’s Haiku. Wait two minutes instead of two seconds for a code review, & the same token costs 6x less.
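To make "cheapest capable model" concrete, here is a minimal sketch of that routing rule. It is not Sail's implementation: the model names, capability scores, & prices are hypothetical placeholders, & a real router would weigh far more than a single score.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    capability: int                 # hypothetical 1-10 score, not a real benchmark
    usd_per_million_tokens: float   # hypothetical price, not a real quote


def cheapest_capable(models, required_capability):
    """Return the lowest-priced model whose capability meets the requirement."""
    capable = [m for m in models if m.capability >= required_capability]
    if not capable:
        raise ValueError("no model meets the required capability")
    return min(capable, key=lambda m: m.usd_per_million_tokens)


# Placeholder catalog for illustration only.
CATALOG = [
    Model("model-a", capability=4, usd_per_million_tokens=0.10),
    Model("model-b", capability=7, usd_per_million_tokens=0.40),
    Model("model-c", capability=9, usd_per_million_tokens=1.20),
]
```

A task that needs a capability of 5 skips the cheapest model & lands on `model-b`; an easy task that needs only 1 goes to `model-a`.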

Sail uses spot capacity when it is available & fails over to reliable compute when it is not. Fleet-aware orchestration keeps utilization high & cost low.
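The spot-then-failover pattern can be sketched in a few lines. This is an illustration under assumptions, not Sail's orchestrator: `SpotUnavailable`, `run_on_spot`, & `run_on_reserved` are hypothetical names standing in for whatever the real fleet exposes.

```python
class SpotUnavailable(Exception):
    """Raised by the (hypothetical) spot runner when no spot capacity exists."""


def run_with_failover(job, run_on_spot, run_on_reserved):
    """Try cheap spot capacity first; fall back to reliable compute if needed.

    Returns a (tier, result) pair so the caller can see which tier ran the job.
    """
    try:
        return "spot", run_on_spot(job)
    except SpotUnavailable:
        return "reserved", run_on_reserved(job)
```

Because the job sits in a queue rather than in front of a waiting human, the extra hop on failover costs a little time & nothing else.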

Real-time stacks reserve capacity per request. Queued stacks pack requests into idle capacity. Different architecture, different economics.
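One way to picture "packing requests into idle capacity" is a greedy first-in, first-out fill. The sketch below is a toy under stated assumptions: each machine reports an idle token budget, each queued request has a known token size, & the machine & request names are made up. A production scheduler would be far more careful.

```python
from collections import deque


def pack_into_idle(queue, idle_tokens):
    """Fill each machine's idle token budget with queued requests in FIFO order.

    queue: deque of (request_id, tokens) pairs, oldest first; consumed in place.
    idle_tokens: dict mapping machine name to its spare token budget.
    Returns a dict mapping machine name to the request ids assigned to it.
    """
    assignments = {machine: [] for machine in idle_tokens}
    for machine, budget in idle_tokens.items():
        # Keep taking the oldest request while it still fits on this machine.
        while queue and queue[0][1] <= budget:
            request_id, tokens = queue.popleft()
            assignments[machine].append(request_id)
            budget -= tokens
    return assignments
```

Requests that do not fit anywhere stay in the queue & wait for the next round, which is exactly the trade a queued stack makes: a little latency for much higher utilization.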

Sailboxes are cloud computers for the bursty rhythm of agents. A sailbox stays alive as long as the agent needs, holds state across the entire task, pauses when it waits on inference, & resumes in seconds when the response arrives. You pay for active time. No paying for idle.
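The billing idea is simple arithmetic: sum the intervals when the machine is working & ignore the waits in between. A minimal sketch, with hypothetical timestamps & a hypothetical per-second rate (the source gives no actual pricing):

```python
def billable_seconds(active_intervals):
    """Sum the lengths of (start, end) intervals during which the box was active."""
    return sum(end - start for start, end in active_intervals)


def task_cost(active_intervals, usd_per_second):
    """Cost of a task when only active time is billed."""
    return billable_seconds(active_intervals) * usd_per_second


# Hypothetical task: active 0-30s, waiting on inference 30-150s, active 150-170s.
intervals = [(0, 30), (150, 170)]
```

In this example the task spans 170 seconds of wall-clock time, but only 50 of them are billable; the 120 seconds spent waiting on inference cost nothing.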

Sail has served trillions of tokens to customers in code review, deep research, & cybersecurity.

Today we announced our Series A investment in Sail alongside Kleiner Perkins, Redpoint, & Sequoia.

As agents grow from chat assistants into background workers scanning codebases overnight, enriching every CRM row, processing every document, the vast majority of tokens will flow through a queue. The future runs in the background. We are thrilled to partner with Neil, Samir, & the entire Sail team.

If you’re building agents, get started with Sail.

Sail Research, Cost Efficiency: input tokens per dollar, comparing Sail’s GLM-5.1 to Anthropic Haiku 4.5 & OpenAI GPT-5.4-mini.


Read original on tomtunguz.com → www.tomtunguz.com/sail-inference-queue/
