{"slug": "building-real-time-voice-agents-from-scratch", "title": "Building Real-Time Voice Agents from Scratch", "summary": "Nemorize has published a learning roadmap titled \"Building Real-Time Voice Agents from Scratch,\" covering the full pipeline from audio fundamentals and speech detection to ASR, LLM streaming, and TTS. The roadmap is divided into five parts: foundations, pipeline construction, hard problems such as barge-in and latency, engineering best practices, and capstone extensions.", "body_md": "# Building Real-Time Voice Agents from Scratch - Learning Roadmap | Nemorize\n\n## Learning Topics\n\nThis roadmap covers the following topics:\n\n✅ **Part I: Foundations**\n\n- ✅ [Shape of a Voice Agent](/roadmaps/building-real-time-voice-agents-from-scratch/lessons/019e6873-1262-7db5-9311-c80162b6688e)\n  - ⚪ mic → ASR → LLM → TTS Loop\n  - ⚪ Trade Matrix\n- ✅ [Audio Fundamentals](/roadmaps/building-real-time-voice-agents-from-scratch/lessons/019e6873-1262-752c-a717-ef08fc8f6f0b)\n  - ⚪ SR_IN vs SR_OUT\n  - ⚪ float32 ↔ int16 Conversions\n- ✅ [VAD: Detecting Speech](/roadmaps/building-real-time-voice-agents-from-scratch/lessons/019e6873-1262-7413-838c-53bc3384556b)\n  - ⚪ Threshold Tuning\n  - ⚪ Pre-roll Buffer\n\n✅ **Part II: The Pipeline**\n\n- ⚪ ASR with faster-whisper\n  - ⚪ Model Size Trade-offs\n  - ⚪ ASR as a Blocking Call\n- ⚪ LLM Streaming & State\n  - ⚪ Speakable System Prompt\n  - ⚪ The Commit Pattern\n- ⚪ TTS & Latency Trick\n  - ⚪ pop_sentences Deep Dive\n  - ⚪ Kokoro vs Piper Backends\n\n✅ **Part III: The Hard Parts**\n\n- ⚪ Barge-in: Interruption\n  - ⚪ Yield-Point Latency\n  - ⚪ Cancel Wire Protocol\n- ⚪ The Feedback Loop\n  - ⚪ Browser AEC\n- ⚪ Playback State Machine\n  - ⚪ Three Distinct Moments\n\n✅ **Part IV: Engineering It Well**\n\n- ⚪ Frontend Audio Scheduling\n  - ⚪ AudioWorklet for Mic Capture\n  - ⚪ Gapless playHead Scheduling\n- ⚪ Concurrency & Orchestration\n  - ⚪ run_in_executor Pattern\n  - ⚪ 
asyncio vs Threads — Same Shape\n\n✅ **Part V: Make It Yours**\n\n- ⚪ Capstone Extensions\n  - ⚪ Measurable Latency Fork\n  - ⚪ Extension Projects\n- ⚪ The Production Bridge\n  - ⚪ Trade-offs You Now Own\n  - ⚪ Why Hosted APIs Choose as They Do", "url": "https://wpnews.pro/news/building-real-time-voice-agents-from-scratch", "canonical_source": "https://nemorize.com/roadmaps/building-real-time-voice-agents-from-scratch", "published_at": "2026-05-29 10:46:23+00:00", "updated_at": "2026-05-29 11:15:51.685700+00:00", "lang": "en", "topics": ["ai-agents", "artificial-intelligence", "natural-language-processing", "ai-tools", "ai-infrastructure"], "entities": ["Nemorize", "faster-whisper", "Kokoro", "Piper"], "alternates": {"html": "https://wpnews.pro/news/building-real-time-voice-agents-from-scratch", "markdown": "https://wpnews.pro/news/building-real-time-voice-agents-from-scratch.md", "text": "https://wpnews.pro/news/building-real-time-voice-agents-from-scratch.txt", "jsonld": "https://wpnews.pro/news/building-real-time-voice-agents-from-scratch.jsonld"}}