OpenAI WebRTC Audio Session, now with document context

OpenAI's GPT-Realtime-2 model, promoted as having GPT-5-class reasoning, is now available in a WebRTC audio playground with document context support, enabling conversational audio interactions in the browser. The tool allows users to select the new model and paste document text for spoken discussions, though the model has not yet appeared in the ChatGPT iPhone app.

OpenAI WebRTC Audio Session, now with document context https://tools.simonwillison.net/openai-webrtc

Last month OpenAI introduced a brand new model https://openai.com/index/advancing-voice-intelligence-with-new-models-in-the-api/ to that API called GPT‑Realtime‑2 https://developers.openai.com/api/docs/models/gpt-realtime-2 , which they promoted as "our first voice model with GPT‑5‑class reasoning" - with a Sep 30, 2024 knowledge cut-off.

I've been waiting for that model to show up in the ChatGPT iPhone app but it still hasn't, so I revisited my old playground. You can now pick the better model, and you can also paste in a big chunk of document context so you can have an audio conversation in your browser about whatever information you think would be useful to explore in a conversational way.

Tags: audio https://simonwillison.net/tags/audio , tools https://simonwillison.net/tags/tools , ai https://simonwillison.net/tags/ai , openai https://simonwillison.net/tags/openai , generative-ai https://simonwillison.net/tags/generative-ai , llms https://simonwillison.net/tags/llms , multi-modal-output https://simonwillison.net/tags/multi-modal-output , webrtc https://simonwillison.net/tags/webrtc
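For anyone curious how pasted document context might reach a voice model, here is a minimal sketch. It is not taken from the playground's source. It assumes the context is folded into the session instructions using a `session.update` event, which is my understanding of how OpenAI's Realtime API accepts instructions; the exact fields required can differ between API versions. The helper name `build_session_update`, the `MAX_CONTEXT_CHARS` budget, and the prompt wording are all hypothetical.

```python
import json

# Hypothetical character budget for pasted context (an assumption, not a
# documented limit of the API or the playground).
MAX_CONTEXT_CHARS = 50_000


def build_session_update(document_text: str, max_chars: int = MAX_CONTEXT_CHARS) -> dict:
    """Build a session.update-style event that embeds document context.

    The event shape is an assumption based on OpenAI's Realtime API;
    check the current API reference before relying on it.
    """
    # Trim whitespace, then cap the context at the character budget.
    context = document_text.strip()[:max_chars]
    instructions = (
        "You are a voice assistant. Answer questions using the document below.\n\n"
        "<document>\n" + context + "\n</document>"
    )
    return {"type": "session.update", "session": {"instructions": instructions}}


event = build_session_update("WebRTC carries audio between peers.")
payload = json.dumps(event)  # JSON text ready to send to the session
```

In a browser client, a payload like this would typically be sent as JSON over the session's data channel once the WebRTC connection is open.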