# Realtime voice, speech, and transcription now supported on AI Gateway

> Source: <https://vercel.com/changelog/realtime-voice-speech-and-transcription-now-supported-on-ai-gateway>
> Published: 2026-06-29 00:00:00+00:00

[AI Gateway](https://vercel.com/ai-gateway) now supports voice and audio models. You can build realtime voice agents, generate speech from text, and transcribe audio to text. Voice and audio models get the same observability, spend controls, and bring-your-own-key support as text, image, and video models in AI Gateway, with no markup or platform fees. These capabilities are in beta and available via [AI SDK](https://ai-sdk.dev) 7.

With realtime support, a single model takes audio in and audio out, so a user can talk and hear a reply back in near real time instead of waiting on a chain of separate models.

| Capability | What it does |
|---|---|
| Realtime voice | The model listens to the user, works out a response, and speaks it back in a live, low-latency conversation. It can call your tools mid-conversation to look something up or take an action. |
| Speech | Generate spoken audio from text, with a selectable voice and output format such as MP3. Use it for voiceovers, audio versions of written content, and spoken responses. |
| Transcription | Transcribe recordings into text, from a file buffer, base64 string, or URL. Use it for voice notes or other transcriptions. |

**Two ways to get started:**

- Follow the realtime example below or the [realtime quickstart](https://vercel.com/docs/ai-gateway/getting-started/realtime) to add a voice agent to your app.
- Use the [playground](https://vercel.com/ai-gateway/models/gpt-realtime-2) to talk to a realtime model in the browser, no code required.

A voice agent has two pieces: a server route that mints a short-lived token, so your API key never reaches the client, and a browser component that connects with it.

Add the token route:
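The route itself uses the AI SDK and is covered in the realtime quickstart. As a library-agnostic illustration of the idea behind it, here is a minimal sketch of minting and checking a short-lived token with Node's built-in `crypto` module. The token format, the `mintToken` and `verifyToken` names, and the 60-second default lifetime are all assumptions made for this example; this is not the AI Gateway token format or the AI SDK API.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Hypothetical token format, for illustration only:
//   base64url(JSON payload) + "." + base64url(HMAC-SHA256 of the payload)
// This is NOT the AI Gateway token format.
interface TokenPayload {
  sub: string; // who the token was issued to
  exp: number; // expiry as Unix time in seconds
}

function sign(data: string, secret: string): string {
  return createHmac("sha256", secret).update(data).digest("base64url");
}

// Server side: mint a token that expires ttlSeconds from now.
// The secret stays on the server; only the token is sent to the browser.
function mintToken(
  sub: string,
  secret: string,
  ttlSeconds = 60,
  nowMs = Date.now(),
): string {
  const payload: TokenPayload = {
    sub,
    exp: Math.floor(nowMs / 1000) + ttlSeconds,
  };
  const body = Buffer.from(JSON.stringify(payload)).toString("base64url");
  return `${body}.${sign(body, secret)}`;
}

// Receiving side: return the payload if the signature matches and the
// token has not expired, otherwise null.
function verifyToken(
  token: string,
  secret: string,
  nowMs = Date.now(),
): TokenPayload | null {
  const [body, sig] = token.split(".");
  if (!body || !sig) return null;
  const expected = Buffer.from(sign(body, secret));
  const actual = Buffer.from(sig);
  if (expected.length !== actual.length || !timingSafeEqual(expected, actual)) {
    return null;
  }
  const payload = JSON.parse(
    Buffer.from(body, "base64url").toString("utf8"),
  ) as TokenPayload;
  return payload.exp * 1000 > nowMs ? payload : null;
}
```

Because the token expires quickly and carries no long-lived credential, a leaked token is only useful for a short window, which is the property the token route relies on.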

Then connect from the browser. The `useRealtime` hook fetches that route and manages the WebSocket connection, microphone capture, and audio playback:

You can also try audio models without writing any code. Open the [models page](https://vercel.com/ai-gateway/models), click into a [model](https://vercel.com/ai-gateway/models/gpt-realtime-2), and interact with it right in the browser:

- Talk to a realtime model to hold a voice conversation
- Send text and have a speech model read it back
- Speak to a transcription model and have it transcribe your words

For more information on [realtime voice](https://vercel.com/docs/ai-gateway/modalities/realtime), [speech](https://vercel.com/docs/ai-gateway/modalities/text-to-speech), and [transcription](https://vercel.com/docs/ai-gateway/modalities/speech-to-text) models on AI Gateway, see the documentation. To see which [realtime voice](https://vercel.com/ai-gateway/models?capabilities=realtime), [speech](https://vercel.com/ai-gateway/models?capabilities=speech), and [transcription](https://vercel.com/ai-gateway/models?capabilities=transcription) models are supported, browse the [full model list](https://vercel.com/ai-gateway/models).
