Realtime voice, speech, and transcription now supported on AI Gateway

Vercel's AI Gateway now supports realtime voice, speech, and transcription models in beta, enabling developers to build voice agents with low-latency audio-in/audio-out capabilities. The platform offers observability, spend controls, and bring-your-own-key support without markup or platform fees, with integration via AI SDK 7.

AI Gateway (https://vercel.com/ai-gateway) now supports voice and audio models. You can build realtime voice agents, generate speech from text, and transcribe audio to text. These models get the same observability, spend controls, and bring-your-own-key support as text, image, and video models in AI Gateway, with no markup or platform fees. These capabilities are in beta and available via AI SDK 7 (https://ai-sdk.dev).

With realtime support, a single model takes audio in and produces audio out, so a user can talk and hear a reply in near real time instead of waiting on a chain of separate models.

| Capability | What it does |
|---|---|
| Realtime voice | The model listens to the user, works out a response, and speaks it back in a live, low-latency conversation. It can call your tools mid-conversation to look something up or take an action. |
| Speech | Generate spoken audio from text, with a selectable voice and output format such as MP3. Use it for voiceovers, audio versions of written content, and spoken responses. |
| Transcription | Transcribe recordings into text, from a file buffer, base64 string, or URL. Use it for voice notes or other transcriptions. |

Two ways to get started:

- Follow the realtime example below or the realtime quickstart (https://vercel.com/docs/ai-gateway/getting-started/realtime) to add a voice agent to your app.
- Use the playground (https://vercel.com/ai-gateway/models/gpt-realtime-2). Talk to a realtime model in the browser, no code required, in the AI Gateway Playground.

A voice agent has two pieces: a server route that mints a short-lived token, so your API key never reaches the client, and a browser component that connects with it. Add the token route first, then connect from the browser. The useRealtime hook fetches that route and manages the WebSocket connection, microphone capture, and audio playback.

You can also try audio models without writing any code.
Open the models page (https://vercel.com/ai-gateway/models), click into a model (https://vercel.com/ai-gateway/models/gpt-realtime-2), and interact with it right in the browser:

- Talk to a realtime model to hold a voice conversation
- Send text and have a speech model read it back
- Speak to a transcription model and have it transcribe your words

For more information on realtime voice (https://vercel.com/docs/ai-gateway/modalities/realtime), speech (https://vercel.com/docs/ai-gateway/modalities/text-to-speech), and transcription (https://vercel.com/docs/ai-gateway/modalities/speech-to-text) models on AI Gateway, see the documentation. To view all supported realtime voice (https://vercel.com/ai-gateway/models?capabilities=realtime), speech (https://vercel.com/ai-gateway/models?capabilities=speech), and transcription (https://vercel.com/ai-gateway/models?capabilities=transcription) models on AI Gateway, check the full list (https://vercel.com/ai-gateway/models).
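
The token-route idea described earlier (the server mints a short-lived token so the API key never reaches the client) can be sketched in a framework-neutral way. This is a minimal illustration of the pattern only, not the AI SDK or AI Gateway API: `EphemeralToken`, `MintToken`, `buildTokenPayload`, `isExpired`, and `fakeMint` are hypothetical names, and the mint function stands in for whatever call actually issues the token. For the real route and the `useRealtime` hook, follow the realtime quickstart.

```typescript
// Hypothetical shape of a short-lived client token; the real response may differ.
interface EphemeralToken {
  token: string;
  expiresAt: number; // Unix epoch milliseconds
}

// Hypothetical minting function. In a real app this would exchange the
// server-side API key for a short-lived token; it is injected here so the
// sketch does not depend on any specific SDK.
type MintToken = (apiKey: string, ttlMs: number, now: number) => EphemeralToken;

// Builds the payload a token route would send to the browser.
// Only the short-lived token and its expiry are copied out, so the
// API key itself is never part of the response.
function buildTokenPayload(
  apiKey: string,
  mint: MintToken,
  ttlMs: number = 60_000,
  now: number = Date.now(),
): EphemeralToken {
  if (!apiKey) {
    throw new Error("Missing server-side API key");
  }
  const { token, expiresAt } = mint(apiKey, ttlMs, now);
  return { token, expiresAt };
}

// Check a client could run before reusing a token it already holds.
function isExpired(t: EphemeralToken, now: number = Date.now()): boolean {
  return now >= t.expiresAt;
}

// Stand-in mint used only for this demonstration.
const fakeMint: MintToken = (_apiKey, ttlMs, now) => ({
  token: `ephemeral-${now}`,
  expiresAt: now + ttlMs,
});

const payload = buildTokenPayload("server-secret", fakeMint, 60_000, 1_000);
console.log(JSON.stringify(payload)); // {"token":"ephemeral-1000","expiresAt":61000}
```

Injecting the mint function keeps the key-handling logic testable on its own: the browser only ever sees the token and its expiry, and can request a fresh one once `isExpired` returns true.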