OpenAI Shipped Your Voice Stack at $0.25/Min. Vapi Went Enterprise. The Infra Layer Abandoned Agencies in Eleven Days.

wpnews.pro

On May 7, OpenAI made gpt-realtime-2 generally available in the Realtime API at roughly $0.25 to $0.35 per minute of conversation, with GPT-5-class reasoning, 128K context, native translation, and 70+ languages. Coverage: TechCrunch, OpenAI's announcement, and a clean teardown at DataCamp. Five days later, on May 12, Vapi closed a $50M Series B at a $500M valuation and publicly positioned itself as the enterprise infrastructure layer, with Amazon Ring running 100% of inbound through them. Two days after that, Synthflow's enterprise page started leading with two deal proof points: a $230M multinational BPO operator running 40+ branded agents at 600K calls per month, and a top U.S.-based CRM platform white-labeling Synthflow at 500K calls per month, both inside 60 days. The full receipts are on Synthflow's enterprise comparison page.

Eleven days. Three events. One outcome. The middle of the AI voice stack collapsed, and an agency owner with five clients is the only customer left in the room nobody is engineering for.

For the last 18 months, the canonical AI voice agency stack was a wrapper or infra platform (Retell, Vapi, Voicerr, Synthflow) plus GoHighLevel plus Zapier plus Stripe plus Twilio plus a custom dashboard. The wrapper layer existed because the orchestration was hard. Stitching STT, an LLM, TTS, telephony, turn-taking, barge-in, latency tuning, and call quality together was a four-vendor problem with 200ms of jitter to manage. You paid the wrapper for the integration, not the model. On May 7 the orchestration disappeared. OpenAI now handles speech-to-speech reasoning inside one API call, in one model, with caching that drops the dominant cost line by 80x. The wrapper's reason to exist was the stitching. The stitching just got eaten by the model layer. eWeek summarized the strategic shift cleanly:

"OpenAI's bet is that doing the audio reasoning inside one model is more defensible than stitching three vendors together. Whether ElevenLabs, Deepgram and the rest can hold their wedge depends on how quickly they push their own integrated stacks."

At the same time, the wrappers and infra companies that could not be commoditized from below decided to escape upward. Vapi's TechCrunch story is the public version of this. The internal version is the customer list at the launch of gpt-realtime-2 on the same week: Zillow, Glean, Genspark, Bluejay, Intercom, Priceline, Foundation Health, Deutsche Telekom. Read it aloud. Not one agency. Synthflow's customer list is the same shape: a $230M BPO, a national CRM platform, 600K calls per month, 500K calls per month, dedicated success engineers, procurement-driven roadmaps.

The implication for the agency layer is concrete, not vibes. Your platform's product roadmap is now being written for a customer who pays a six-figure annual contract and signs a master service agreement. Your support ticket about a missing white-label string in a webhook payload sits behind their compliance review. Your feature request for a campaign retry rule sits behind their procurement cycle. That is what "enterprise focus" actually means in the day-to-day. It is not a marketing line. It is the prioritization queue.

Meanwhile the wrapper era's pricing power evaporated in the other direction. Voicerr already moved from $28 per month to $199 to $299 per month, a 7x to 10x hike documented at Trillet, because the upstream cost moved underneath them and they had no other lever. They lost both directions in the same quarter. Pricing power from above, model commoditization from below. A wrapper sitting on top of an infra layer that just became one OpenAI call has nowhere to go.

Three paths are now visible for the agency operator with three to fifteen clients:

Hermes is the operating platform for AI voice agencies. It is the third path above. We are not a wrapper on Vapi. We are not a thin UI on Retell. We are the application layer the infra companies just walked away from.

The pricing stays where it has been since we wrote it down. Starter is $149 per month with 300 included minutes and 3 workspaces. Business is $399 per month with 1,000 included minutes and 7 workspaces. Agency is $699 per month with 2,000 included minutes and 20 workspaces. Overage is $0.24 per minute against a $0.18 landed cost on the new gpt-realtime-2 economics, and we run a multi-model routing strategy on top so the cache hit rate is actually load-bearing instead of theoretical. The 25% spread is locked because we run the upstream relationship. Wrappers cannot promise this. They do not control the cost structure and they have already proved it by raising prices 7 to 10 times.

The application layer is the work product. White-label is native, not bolted on. Every workspace has its own subdomain, custom branding, end-client portal, and per-workspace billing. The CRM is built into the same database as the campaign engine, so a call outcome writes a lead status without a Zapier hop. The campaign engine knows how to retry, schedule callbacks, hand off to a human, and reconcile A2P 10DLC submissions ($30 pass-through, $0 margin). The billing surface knows the difference between an included minute and an overage minute. None of this is a feature roadmap. It is the live platform, and it is the layer OpenAI's API does not ship.

If gpt-realtime-2 is $0.25 to $0.35 per minute and Hermes overage is $0.24 per minute, how does the math even work?

Hermes runs a multi-model routing strategy with cache configuration, system prompt reuse, and per-call telemetry tuned at the platform level. Cached audio input on gpt-realtime-2 drops from $32 per 1M tokens to $0.40 per 1M tokens, which is an 80x reduction on the dominant cost line in a real outbound conversation. Combine that with selective model routing for non-reasoning segments (greeting, clarifying questions, scheduled callbacks, voicemail) and the landed cost lands around $0.18 per minute. We charge $0.24. That is a 25% spread, locked, on top of three included-minute tiers ($149 / 300 min, $399 / 1,000 min, $699 / 2,000 min). An agency owner does not need to engineer cache hits or pick which model handles which conversation segment. The platform does it.

If OpenAI now ships everything in one API call, why does an agency need a platform at all?

Because the platform is no longer the voice. The platform is the CRM, the campaign engine, the workspace structure, the white-label portal, the billing reconciliation, the A2P 10DLC submission, the lead status writebacks, the call attribution, the recording storage, the transcript pipeline, the knowledge base management, the prompt versioning, the retry logic, and the client-facing dashboard. None of that ships in the gpt-realtime-2 endpoint. The voice infrastructure collapsed into one API call. The application layer did not. That is the wedge an agency-built platform fills.

Should I just build this myself on Pipecat plus OpenAI now that the infra is cheap?

If you have a dedicated engineer and four months to burn, yes, the DIY path got cleaner with Pipecat hitting v1.0.0 on April 14. If you have five clients and need new agents live this week, no. The honest math is that a self-built stack on Pipecat plus OpenAI plus Twilio plus a CRM plus a billing surface plus a white-label portal costs $40K to $60K in developer time before you close your sixth retainer, and the maintenance does not stop. The reason platforms still exist is that the application layer is wide, not deep. We built it once. You charge clients $1,500 to $3,000 a month on top. The infrastructure layer did not pivot to enterprise out of malice. It pivoted because the model layer commoditized it from below and the only revenue ramp left is the six-figure logo. Vapi did the right thing for its cap table. Synthflow did the right thing for its cap table. OpenAI did the right thing for its cap table. None of them owe the AI voice agency owner a different roadmap.

The agency layer does the right thing for itself by noticing this eleven-day window early. The wedge that opens when an infra layer graduates is the wedge an application-layer platform built specifically for agencies should occupy. By builders, for builders. One platform. Your brand. Your margins. From $149 per month. First agent live in 72 hours.

Originally published at buildwithhermes.com/blog/infra-layer-abandoned-agencies-eleven-days-2026-05-18.

source & further reading

dev.to — original article Choosing the Right Model-Routing Threshold for Frontier Models MCP Server CORS: The Preflight Problem That Broke My MCP Server 92 Times And How I Fixed It For Good Humanizing Artificial Intelligence for SRE Teams: Reducing Alert Fatigue With Smarter AI Guidance

OpenAI Shipped Your Voice Stack at $0.25/Min. Vapi Went Enterprise. The Infra Layer Abandoned Agencies in Eleven Days.

Run your AI side-project on zahid.host