I built a real-time Voice AI assistant that listens, thinks, and talks back β using entirely open-source tools and APIs.
No ChatGPT wrappers.
No expensive SDKs.
Just raw engineering.
π Live Demo
π Try it here:
https://huggingface.co/spaces/Kailashalgo/voice-ai-chat Press and hold the mic button β speak β AI replies out loud.
π§ What This Project Does
The app creates a full voice conversation pipeline:
You speak into the browser
Whisper converts speech β text
LLaMA 3.3 70B generates a response
gTTS converts text β speech
Audio plays back instantly
It feels surprisingly natural and fast.
π οΈ Tech Stack
Layer Tool
π€ Speech to Text Whisper Large V3 Turbo (Groq API)
π§ LLM LLaMA 3.3 70B
π Text to Speech gTTS
β‘ Backend FastAPI + Python
π Frontend Vanilla HTML/CSS/JS
π³ Deployment Docker
βοΈ Hosting HuggingFace Spaces
β‘ Why I Built This
Most AI voice demos online are:
expensive,
closed-source,
or heavily abstracted.
I wanted to understand how real-time voice AI systems actually work under the hood.
This project helped me explore:
streaming workflows,
latency optimization,
speech pipelines,
browser audio APIs,
and LLM orchestration.
π§© System Architecture
The complete flow:
User Voice
β Whisper STT
β LLaMA Processing
β gTTS Voice Generation
β Browser Playback
Simple architecture β but extremely powerful.
π Project Structure
voice-ai-chat/ βββ backend/
β βββ main.py
β βββ stt.py
β βββ tts.py
β βββ requirements.txt
βββ frontend/
β βββ index.html
βββ Dockerfile
βββ .env.example
βββ README.md
βοΈ Running Locally
Clone the repository
git clone [https://github.com/kailashv2/voice-ai-chat.git](https://github.com/kailashv2/voice-ai-chat.git)
cd voice-ai-chat
Create virtual environment
python -m venv venv
Install dependencies
pip install -r requirements.txt
Add Groq API key
GROQ_API_KEY=your_key_here
Start FastAPI server
uvicorn main:app --reload π³ Docker Support
docker build -t voice-ai-chat .
docker run -p 7860:7860 -e GROQ_API_KEY=your_key voice-ai-chat
πΈ Cost
Completely free to build and deploy.
Groq free tier
Whisper via Groq
gTTS
HuggingFace Spaces free hosting
π₯ What I Learned
The hardest part wasn't the AI.
It was reducing latency and making conversations feel natural.
Voice interfaces are fundamentally different from text chat:
response speed matters more,
interruptions matter,
audio processing matters,
UX matters a lot.
This project gave me a much deeper understanding of production-grade AI interaction systems.
π Live Project
Demo:
https://huggingface.co/spaces/Kailashalgo/voice-ai-chat GitHub:
https://github.com/kailashv2/voice-ai-chat π¨π» Built By
Kailash
Building AI systems, full-stack products, and agentic workflows.
If you found this useful, consider starring the repo β