{"slug": "building-a-personal-meeting-assistant-that-routes-through-your-existing-audio", "title": "Building a personal meeting assistant that routes through your existing audio", "summary": "Developers built Otto, a local voice assistant that routes meeting audio through existing devices without joining the call, using Deepgram's streaming transcription and text-to-speech APIs. The assistant listens for a wake word, transcribes speech with diarization and endpointing, and responds via natural-sounding Aura voices, making it platform-agnostic and accessible to all meeting participants.", "body_md": "We built Otto, a voice assistant that runs locally on your machine and acts as your personal assistant in meetings. AI meeting assistants aren't a new idea: Google Meet, Zoom, and Teams all ship built-in AI features with varying levels of usefulness. Otto differs in two ways. First, it doesn't have to 'join' the meeting. Audio is routed through your existing audio devices and forked to Otto, which gives the assistant a clean copy of the meeting audio without needing to know which meeting platform you're using. Second, it's available to everyone in the meeting. Anyone can invoke it with the wake word, and everyone hears its responses, routed back through your audio system.\n\nWhat makes this work is Deepgram. It transcribes the call as people speak and voices the assistant's replies in a natural-sounding voice, fast enough to feel like a real conversation. 
This post is a short tour of how we used Deepgram to build it. The [full code is on GitHub](https://github.com/ritza-co/otto-call-agent), so you can clone it and try it on your own calls.\n\n[View Otto on GitHub →](https://github.com/ritza-co/otto-call-agent)\n\n## Deepgram features we used\n\nWe use Deepgram for two jobs: turning the call into text as people speak, and turning Otto's answers back into speech.\n\n### Transcribing the call\n\nDeepgram's [Listen API](https://developers.deepgram.com/docs/live-streaming-audio) handles live transcription over a streaming WebSocket using the Nova-3 model. We run two streams at once: one for the call's incoming audio (a mix of everyone else on the call) and one for your own microphone. Keeping them separate means we always know whether you or a remote participant said something. A few features do the heavy lifting:\n\n- [Diarization](https://developers.deepgram.com/docs/diarization) labels who said what, so the transcript reads like a conversation instead of a wall of text.\n- [Keyterm prompting](https://developers.deepgram.com/docs/keyterm) boosts the wake word \"Otto\", so the assistant reliably hears its name even over a noisy call.\n- [Endpointing](https://developers.deepgram.com/docs/endpointing) detects when someone has finished a thought, so Otto answers promptly without talking over a half-finished sentence.\n- [Interim results](https://developers.deepgram.com/docs/interim-results) and smart formatting keep the transcript readable as it fills in, with punctuation and capitalization already in place.\n\n### Speaking the reply\n\nWhen Otto has an answer, Deepgram's [Speak API](https://developers.deepgram.com/docs/text-to-speech) turns the text into audio and streams it back as it's generated. We use one of the natural-sounding [Aura voices](https://developers.deepgram.com/docs/tts-models) so Otto doesn't sound robotic in the middle of a human conversation. 
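Both Deepgram calls described in this section boil down to an endpoint plus query parameters. The TypeScript below is a minimal sketch, not the code from the Otto repo: it builds the Listen WebSocket URL with parameters that mirror the features above, and POSTs reply text to the Speak REST endpoint, handing audio chunks to a player as they arrive. It assumes Node 18 or later (global `fetch` and `URL`); the `aura-2-thalia-en` voice, 300 ms endpointing, and 16 kHz `linear16` input are illustrative choices, and `play` is a hypothetical playback callback. Opening the WebSocket itself (authenticated with the same `Token` authorization header) is left out.

```typescript
// Minimal sketch of Otto's two Deepgram calls (not the repo's actual code).
// Assumes Node 18+ for the global fetch and URL APIs.

interface ListenOptions {
  sampleRate: number;    // sample rate (Hz) of the raw PCM you will send
  wakeWord: string;      // boosted with keyterm prompting (a Nova-3 feature)
  endpointingMs: number; // trailing silence (ms) that ends a turn
}

// URL for one live transcription stream. Otto runs two of these:
// one for the call audio and one for your microphone.
function listenUrl(opts: ListenOptions): string {
  const params = new URLSearchParams({
    model: 'nova-3',
    encoding: 'linear16',    // raw 16-bit PCM
    sample_rate: String(opts.sampleRate),
    channels: '1',
    diarize: 'true',         // speaker label on each word
    smart_format: 'true',    // punctuation and capitalization
    interim_results: 'true', // partial transcripts while someone talks
    endpointing: String(opts.endpointingMs),
    keyterm: opts.wakeWord,
  });
  return `wss://api.deepgram.com/v1/listen?${params.toString()}`;
}

// Request for the Speak (text-to-speech) REST endpoint.
// The default voice name is an illustrative choice (assumption).
function speakRequest(text: string, apiKey: string, voice = 'aura-2-thalia-en') {
  const params = new URLSearchParams({
    model: voice,
    encoding: 'linear16',
    sample_rate: '24000',
    container: 'none', // raw PCM, no WAV header
  });
  return {
    url: `https://api.deepgram.com/v1/speak?${params.toString()}`,
    init: {
      method: 'POST',
      headers: { Authorization: `Token ${apiKey}`, 'Content-Type': 'application/json' },
      body: JSON.stringify({ text }),
    },
  };
}

// Fetch the reply audio and pass each chunk to the player as soon as it
// arrives, instead of waiting for the whole clip to download.
async function speak(
  text: string,
  apiKey: string,
  play: (chunk: Uint8Array) => void, // hypothetical sink, e.g. the virtual mic
): Promise<void> {
  const { url, init } = speakRequest(text, apiKey);
  const res = await fetch(url, init);
  if (!res.ok || !res.body) {
    throw new Error(`Deepgram Speak request failed: ${res.status}`);
  }
  const reader = res.body.getReader();
  for (;;) {
    const chunk = await reader.read();
    if (chunk.done) break;
    play(chunk.value);
  }
}
```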
Because the audio streams in rather than arriving all at once, the reply starts playing almost immediately instead of after an awkward pause.\n\n## How it fits together\n\nDeepgram does the listening and the speaking, but a few pieces around it make the live loop work. Here's the whole path, start to finish:\n\n1. **Capture the audio.** Using a macOS Core Audio [system tap](https://github.com/ritza-co/otto-call-agent/blob/main/scripts/system-tap.swift), we capture the call's sound straight from what your computer is already playing, plus your microphone as a separate feed. Nothing in your audio setup changes, and no bot joins the meeting. Both feeds go to Deepgram as the [two transcription streams](https://github.com/ritza-co/otto-call-agent/blob/main/src/deepgram.ts) described above.\n2. **Wait for the name.** Every turn is transcribed, but Otto stays quiet until it hears \"Otto\". A quick check decides whether someone actually addressed it, so a passing mention or a \"thanks, Otto\" doesn't set it off.\n3. **Find an answer.** Once Otto is addressed, we hand the recent transcript and the question to an LLM. It can pull in notes from past meetings or search the web, then write a short reply.\n4. **Say it out loud.** Aura turns that reply into audio, which we feed into a virtual microphone that the call app treats as its mic input. Everyone on the call hears it, and it plays in your headphones too.\n\nApart from the Deepgram and LLM API calls, the whole loop runs on your own machine, so the live transcript and your meeting notes stay local.\n\n## Try it yourself\n\nOtto runs on macOS 14.4 or later and takes a few minutes to set up. You'll need:\n\n- Homebrew, to install the audio tools\n- A Deepgram API key (free trial available) for transcription and text-to-speech\n- An OpenAI API key, which Otto uses to write its answers (you can swap in another LLM provider)\n\nThe [README](https://github.com/ritza-co/otto-call-agent) has the full walkthrough. 
In short, you:\n\n- Clone the repo and install its dependencies.\n- Install BlackHole, the free virtual audio device Otto speaks through.\n- Add your API keys to a `.env` file.\n- Run the setup script, which builds the audio helper and flags anything still missing.\n- Grant the recording permission so Otto can hear the call.\n- Start Otto and join your meeting.\n- Set your call app's microphone to BlackHole 2ch, then ask Otto a question.\n\nThat's the whole thing: a meeting assistant that listens, thinks, and talks back in real time, with Deepgram handling the hearing and the speaking. The complete project (capture pipeline, wake-word logic, Deepgram streams, and all) is on GitHub. Clone it, change the wake word or the voice, and make it your own.\n\n[Get the full project on GitHub →](https://github.com/ritza-co/otto-call-agent)", "url": "https://wpnews.pro/news/building-a-personal-meeting-assistant-that-routes-through-your-existing-audio", "canonical_source": "https://techstackups.com/articles/deepgram-personal-meeting-assistant/", "published_at": "2026-06-15 08:31:22+00:00", "updated_at": "2026-06-15 08:42:29.515685+00:00", "lang": "en", "topics": ["ai-tools", "natural-language-processing", "ai-products", "developer-tools"], "entities": ["Deepgram", "Otto", "Google Meet", "Zoom", "Teams", "Nova-3", "Aura"], "alternates": {"html": "https://wpnews.pro/news/building-a-personal-meeting-assistant-that-routes-through-your-existing-audio", "markdown": "https://wpnews.pro/news/building-a-personal-meeting-assistant-that-routes-through-your-existing-audio.md", "text": "https://wpnews.pro/news/building-a-personal-meeting-assistant-that-routes-through-your-existing-audio.txt", "jsonld": "https://wpnews.pro/news/building-a-personal-meeting-assistant-that-routes-through-your-existing-audio.jsonld"}}