Source: vercel.com

Realtime voice, speech, and transcription now supported on AI Gateway

Vercel's AI Gateway now supports realtime voice, speech, and transcription models in beta, enabling developers to build voice agents with low-latency audio-in/audio-out capabilities. The platform offers observability, spend controls, and bring-your-own-key support without markup or platform fees, with integration via AI SDK 7.

Published Jun 29, 2026 · 2 min read

AI Gateway now supports voice and audio models. You can build realtime voice agents, generate speech from text, and transcribe audio to text. Audio models get the same observability, spend controls, and bring-your-own-key support as text, image, and video models in AI Gateway, with no markup or platform fees. These capabilities are in beta and available via AI SDK 7.

With realtime support, a single model takes audio in and audio out, so a user can talk and hear a reply back in near real time instead of waiting on a chain of separate models.

Capability | What it does
Realtime | The model listens to the user, works out a response, and speaks it back in a live, low-latency conversation. It can call your tools mid-conversation to look something up or take an action.
Speech | Generate spoken audio from text, with a selectable voice and output format such as MP3. Use it for voiceovers, audio versions of written content, and spoken responses.
Transcription | Transcribe recordings into text, from a file buffer, base64 string, or URL. Use it for voice notes or other transcriptions.
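
The three transcription input forms listed above (file buffer, base64 string, URL) can be told apart before a request is sent. The sketch below is only an illustration: `normalizeAudioInput`, `AudioInput`, and `NormalizedAudio` are hypothetical names, not part of AI SDK or AI Gateway, and it assumes that any string starting with http:// or https:// is a URL while every other string is base64.

```typescript
// Hypothetical helper for illustration; not an AI SDK or AI Gateway API.
type AudioInput = Uint8Array | string | URL;

type NormalizedAudio =
  | { kind: "bytes"; bytes: Uint8Array }
  | { kind: "url"; url: URL };

function normalizeAudioInput(input: AudioInput): NormalizedAudio {
  // File buffers (including Node Buffers, which extend Uint8Array) pass through.
  if (input instanceof Uint8Array) {
    return { kind: "bytes", bytes: input };
  }
  // URL objects are kept as URLs so the audio can be fetched later.
  if (input instanceof URL) {
    return { kind: "url", url: input };
  }
  // Assumption: strings with an http(s) scheme are URLs.
  if (/^https?:\/\//i.test(input)) {
    return { kind: "url", url: new URL(input) };
  }
  // Assumption: every other string is base64-encoded audio.
  return { kind: "bytes", bytes: new Uint8Array(Buffer.from(input, "base64")) };
}
```

Returning a tagged union keeps the URL case explicit, so the caller decides whether to download the audio or hand the URL on unchanged.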

Two ways to get started:

Follow the realtime example below or the realtime quickstart to add a voice agent to your app.

Use the playground. Talk to a realtime model in the browser, no code required, in the AI Gateway Playground.

A voice agent has two pieces: a server route that mints a short-lived token, so your API key never reaches the client, and a browser component that connects with it.
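
The short-lived token idea can be illustrated without any gateway-specific calls. The sketch below is a generic HMAC-signed token with an expiry, not the AI SDK or AI Gateway API; `mintToken`, `verifyToken`, `TokenClaims`, and the 60-second default lifetime are hypothetical names and values chosen for illustration. It shows only why a client holding such a token never needs the long-lived secret.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Hypothetical shapes and names; this is NOT the AI SDK or AI Gateway API.
interface TokenClaims {
  sessionId: string;
  expiresAt: number; // Unix epoch milliseconds
}

// Sign a payload with a server-side secret that never leaves the server.
function sign(payload: string, secret: string): string {
  return createHmac("sha256", secret).update(payload).digest("base64url");
}

// Mint a token of the form "<base64url claims>.<signature>".
function mintToken(
  sessionId: string,
  secret: string,
  ttlMs: number = 60_000, // assumed lifetime, for illustration only
  now: number = Date.now(),
): string {
  const claims: TokenClaims = { sessionId, expiresAt: now + ttlMs };
  const payload = Buffer.from(JSON.stringify(claims)).toString("base64url");
  return `${payload}.${sign(payload, secret)}`;
}

// Return the claims if the signature matches and the token has not expired.
function verifyToken(
  token: string,
  secret: string,
  now: number = Date.now(),
): TokenClaims | null {
  const [payload, signature] = token.split(".");
  if (!payload || !signature) return null;

  const expected = Buffer.from(sign(payload, secret));
  const actual = Buffer.from(signature);
  // Compare in constant time; lengths must match before timingSafeEqual.
  if (expected.length !== actual.length || !timingSafeEqual(expected, actual)) {
    return null;
  }

  const claims = JSON.parse(
    Buffer.from(payload, "base64url").toString("utf8"),
  ) as TokenClaims;
  return claims.expiresAt > now ? claims : null;
}
```

Because the expiry is part of the signed payload, a client cannot extend a token's lifetime without invalidating the signature.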

Add the token route:

Then connect from the browser. The useRealtime hook fetches that route and manages the WebSocket connection, microphone capture, and audio playback:

You can also try audio models without writing any code. Open the models page, click into a model, and interact with it right in the browser:

Talk to a realtime model to hold a voice conversation

Send text and have a speech model read it back

Speak to a transcription model and have it transcribe your words

For more information on realtime voice, speech, and transcription models on AI Gateway, see the documentation. To view a list of all the supported realtime voice, speech, and transcription models on AI Gateway, check the full list here.
