Learning Topics
This roadmap covers the following topics:
- ✅ **Part I: Foundations**
  - ✅ [Shape of a Voice Agent](/roadmaps/building-real-time-voice-agents-from-scratch/lessons/019e6873-1262-7db5-9311-c80162b6688e)
    - ⚪ mic → ASR → LLM → TTS Loop
    - ⚪ Trade Matrix
  - ✅ Audio Fundamentals
    - ⚪ SR_IN vs SR_OUT
    - ⚪ float32 ↔ int16 Conversions
  - ✅ [VAD: Detecting Speech](/roadmaps/building-real-time-voice-agents-from-scratch/lessons/019e6873-1262-7413-838c-53bc3384556b)
    - ⚪ Threshold Tuning
    - ⚪ Pre-roll Buffer
- ✅ **Part II: The Pipeline**
  - ⚪ ASR with faster-whisper
  - ⚪ Model Size Trade-offs
  - ⚪ ASR as a Blocking Call
  - ⚪ LLM Streaming & State
  - ⚪ Speakable System Prompt
  - ⚪ The Commit Pattern
  - ⚪ TTS & Latency Trick
  - ⚪ pop_sentences Deep Dive
  - ⚪ Kokoro vs Piper Backends
- ✅ **Part III: The Hard Parts**
  - ⚪ Barge-in: Interruption
  - ⚪ Yield-Point Latency
  - ⚪ Cancel Wire Protocol
  - ⚪ The Feedback Loop
  - ⚪ Browser AEC
  - ⚪ Playback State Machine
  - ⚪ Three Distinct Moments
- ✅ **Part IV: Engineering It Well**
  - ⚪ Frontend Audio Scheduling
  - ⚪ AudioWorklet for Mic Capture
  - ⚪ Gapless playHead Scheduling
  - ⚪ Concurrency & Orchestration
  - ⚪ run_in_executor Pattern
  - ⚪ asyncio vs Threads: Same Shape
- ✅ **Part V: Make It Yours**
  - ⚪ Capstone Extensions
  - ⚪ Measurable Latency Fork
  - ⚪ Extension Projects
  - ⚪ The Production Bridge
  - ⚪ Trade-offs You Now Own
  - ⚪ Why Hosted APIs Choose as They Do