# Building Real-Time Voice Agents from Scratch

> Source: <https://nemorize.com/roadmaps/building-real-time-voice-agents-from-scratch>
> Published: 2026-05-29 10:46:23+00:00

## Learning Topics

This roadmap covers the following topics:

**Part I: Foundations**

- [Shape of a Voice Agent](/roadmaps/building-real-time-voice-agents-from-scratch/lessons/019e6873-1262-7db5-9311-c80162b6688e)
  - mic → ASR → LLM → TTS Loop
  - Trade Matrix
- [Audio Fundamentals](/roadmaps/building-real-time-voice-agents-from-scratch/lessons/019e6873-1262-752c-a717-ef08fc8f6f0b)
  - SR_IN vs SR_OUT
  - float32 ↔ int16 Conversions
- [VAD: Detecting Speech](/roadmaps/building-real-time-voice-agents-from-scratch/lessons/019e6873-1262-7413-838c-53bc3384556b)
  - Threshold Tuning
  - Pre-roll Buffer

**Part II: The Pipeline**

- ASR with faster-whisper
  - Model Size Trade-offs
  - ASR as a Blocking Call
- LLM Streaming & State
  - Speakable System Prompt
  - The Commit Pattern
- TTS & Latency Trick
  - pop_sentences Deep Dive
  - Kokoro vs Piper Backends

**Part III: The Hard Parts**

- Barge-in: Interruption
  - Yield-Point Latency
  - Cancel Wire Protocol
- The Feedback Loop
  - Browser AEC
- Playback State Machine
  - Three Distinct Moments

**Part IV: Engineering It Well**

- Frontend Audio Scheduling
  - AudioWorklet for Mic Capture
  - Gapless playHead Scheduling
- Concurrency & Orchestration
  - run_in_executor Pattern
  - asyncio vs Threads: Same Shape

**Part V: Make It Yours**

- Capstone Extensions
  - Measurable Latency Fork
  - Extension Projects
- The Production Bridge
  - Trade-offs You Now Own
  - Why Hosted APIs Choose as They Do

