Offline AI Voice Assistant on Raspberry Pi 4 with Gemma

This is a fully offline voice assistant built on a Raspberry Pi 4 or 5. You speak to it, it thinks using a language model running locally on the board, and it talks back through a speaker. The whole pipeline runs on the Pi itself, so once it's set up it needs no internet connection at all. I built it because I wanted something I could actually trust in my own home, and the maker side of it turned out to be a genuinely fun project.

What This Offline Raspberry Pi Voice Assistant Project Does

At a high level the device does three things in sequence, all on-board:

It records a few seconds of audio from a USB microphone when triggered.
It converts that audio to text using Whisper, then feeds the text to a local language model through Ollama, and gets a written answer back.
It turns that answer into natural speech with Piper TTS and plays it through a speaker.

The result is a small box on your desk that answers spoken questions in roughly ten to twenty seconds, with nothing ever sent to a remote server. It's the kind of thing a commercial smart speaker pretends to be but isn't, because the commercial ones do all their thinking in the cloud.

You don't need anything exotic here, which is part of the appeal. Most of this is stuff makers already have in a drawer.

This table lists the core parts and rough roles for the build.

Component

Raspberry Pi 4 or 5
MicroSD card
USB microphone
Speaker
Power supply

The single most important spec is RAM, because it decides which language model you can run. On a 2GB Pi you stick to a 1B-class model and keep things snappy. With 4GB or more you can move up to a larger model and trade speed for noticeably smarter replies.
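As a rough illustration of that RAM rule, here is a minimal Python sketch that reads the board's total memory and picks a model size. The model tags `gemma3:1b` and `gemma3:4b` and the 3 GiB threshold are my assumptions for the example, not values confirmed by the original build, so check that whatever you pull actually fits your board.

```python
def total_ram_gib(meminfo_path="/proc/meminfo"):
    """Return total RAM in GiB from the MemTotal line (reported in kB)."""
    with open(meminfo_path) as f:
        for line in f:
            if line.startswith("MemTotal:"):
                kb = int(line.split()[1])
                return kb / (1024 * 1024)
    raise RuntimeError("MemTotal not found in " + meminfo_path)


def pick_model(ram_gib):
    """Pick a model tag for the given amount of RAM.

    Both tags are illustrative assumptions. A 2GB board reports a little
    under 2 GiB and a 4GB board a little under 4 GiB, so the cut-off
    sits between them at 3 GiB.
    """
    if ram_gib < 3:
        return "gemma3:1b"   # 1B-class model: keeps replies snappy
    return "gemma3:4b"       # larger model: slower, smarter replies


if __name__ == "__main__":
    ram = total_ram_gib()
    print(f"{ram:.1f} GiB RAM, suggested model: {pick_model(ram)}")
```

Keeping the choice in one small function means the rest of the assistant never needs to know which board it is running on.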

On the software side it's all open and free: Raspberry Pi OS Bookworm 64-bit, Ollama for the language model, Whisper for speech-to-text, and Piper for the voice. You only need internet during the initial download phase, after which it runs dark.

How a Local LLM Runs on Raspberry Pi Without the Cloud

The piece that surprises most people is that you can run a real language model on a board this cheap. Ollama is the tool that makes it painless - it's the same one people use on laptops, and it runs on the Pi's ARM CPU with no GPU involved. You pull a model once, and after that it answers locally.

The reason this works on such modest hardware is that the models are small by design. A 1B-class model isn't going to write you an essay, but it handles short questions and quick facts well enough for a voice assistant, where speed matters more than brilliance anyway. If you want the full breakdown of which models fit which boards and how to get Ollama serving them, I covered the model setup in detail when I wrote about how to run a local LLM on Raspberry Pi with Ollama in the previous part of this series.
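To make "answers locally" concrete, here is a minimal sketch of asking a pulled model a question over Ollama's local HTTP API, using only the Python standard library. It assumes Ollama is listening on its default port 11434 and that a model tagged `gemma3:1b` has been pulled; the tag is my assumption, so swap in whichever model you chose.

```python
import json
import urllib.request

# Ollama's default local endpoint; nothing here leaves the board.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt, model="gemma3:1b"):
    """Build the POST request for a single, non-streamed completion."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask_model(prompt, model="gemma3:1b", timeout=120):
    """Send the prompt to the local Ollama server and return its answer text."""
    request = build_request(prompt, model)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read())["response"].strip()


if __name__ == "__main__":
    print(ask_model("In one sentence, what is a Raspberry Pi?"))
```

The generous timeout is deliberate, since a CPU-only board can take many seconds to produce a reply.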

The three components do hand off cleanly, but they're all competing for the same limited memory, and that's where the build gets interesting.

Build Phases and the Problems Worth Knowing About

I won't pretend this was plug-and-play. Here are the phases at a high level, in the order I tackled them:

Get the OS updated and Ollama installed, then pull a model sized for your RAM.
Set up and test the USB mic and the speaker so audio actually flows in and out.
Install the Python pieces for Whisper and Piper and grab a voice model.
Wire the four functions - record, transcribe, ask the model, speak - into one loop.
Optionally add wake word detection so it's hands-free.
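The wiring of the four functions can be sketched as below. This is not the loop from the full tutorial, just a minimal skeleton under my own assumptions: `record`, `transcribe`, `ask` and `speak` are hypothetical callables you supply (for example thin wrappers around your mic capture, Whisper, Ollama and Piper), and the per-stage timing is something I added to show where the ten to twenty seconds go.

```python
import time


def run_turn(record, transcribe, ask, speak):
    """Run one question/answer turn and return per-stage timings in seconds.

    The four arguments are hypothetical callables supplied by the caller:
    record() -> audio, transcribe(audio) -> text, ask(text) -> answer,
    speak(answer) -> None.
    """
    timings = {}

    def timed(name, fn, *args):
        start = time.perf_counter()
        result = fn(*args)
        timings[name] = time.perf_counter() - start
        return result

    audio = timed("record", record)
    text = timed("transcribe", transcribe, audio)
    if not text.strip():
        # Empty transcription: skip the model and the speaker entirely.
        return timings
    answer = timed("ask", ask, text)
    timed("speak", speak, answer)
    return timings


if __name__ == "__main__":
    # Stand-in stages so the skeleton runs without any audio hardware.
    result = run_turn(
        record=lambda: b"fake-audio",
        transcribe=lambda audio: "what time is it",
        ask=lambda text: "You asked: " + text,
        speak=print,
    )
    print({stage: round(seconds, 4) for stage, seconds in result.items()})
```

Passing the stages in as functions keeps each one testable on its own, which helps when one of them (usually audio) misbehaves.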

The problems that cost me real time were almost all in the audio and memory areas. My microphone recorded in a sample format I didn't expect, which made every transcription come back empty, and it took me a while to realise the recording was fine and the format assumption was the bug. Running all three components at once also pushed a 2GB board to its limit, so it needed a bit of breathing room configured before it stayed stable. And adding the always-listening wake word feature uses extra memory that a 2GB Pi can barely spare. None of it is hard once you know, but it helps to have the fixes written down in front of you.
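A quick way to catch the sample-format class of bug is to inspect the WAV header before handing the file to the transcriber. The sketch below uses Python's standard `wave` module. The target of 16-bit, mono, 16 kHz is my assumption of a sensible input for Whisper, not the specific format that tripped up this build, which the full tutorial covers.

```python
import wave


def wav_format(path):
    """Read channel count, bit depth, sample rate and duration from a WAV file."""
    with wave.open(path, "rb") as w:
        rate = w.getframerate()
        return {
            "channels": w.getnchannels(),
            "bits": w.getsampwidth() * 8,
            "rate": rate,
            "seconds": w.getnframes() / rate,
        }


def format_problems(fmt, want_bits=16, want_rate=16000, want_channels=1):
    """Return human-readable mismatches against the assumed target format."""
    problems = []
    if fmt["bits"] != want_bits:
        problems.append(f"bit depth is {fmt['bits']}, expected {want_bits}")
    if fmt["rate"] != want_rate:
        problems.append(f"sample rate is {fmt['rate']} Hz, expected {want_rate}")
    if fmt["channels"] != want_channels:
        problems.append(f"{fmt['channels']} channels, expected {want_channels}")
    if fmt["seconds"] == 0:
        problems.append("file contains no audio frames")
    return problems


if __name__ == "__main__":
    import sys
    for issue in format_problems(wav_format(sys.argv[1])) or ["format looks as expected"]:
        print(issue)
```

An empty list means the header matches the assumed target, so an empty transcription would then point somewhere other than the format.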

Pros and Cons of a DIY Offline Voice Assistant on Raspberry Pi

The honest trade-offs, because a project page that only lists upsides isn't worth much:

It's completely private and works with no internet, which is the entire reason to build it.
It costs nothing to run after setup and the parts are cheap and reusable.
It's slower than any commercial assistant, with a ten to twenty second turnaround.
The smaller models are useful but will separate fact from fiction imperfectly, so don't trust it for anything critical.
Linux audio config is the fiddliest part and will test your patience before the fun starts.

For a maker, the slowness and the rough edges are features as much as flaws, because you learn how every stage works instead of buying a black box.

What You Can Add Next to This Pi Voice Assistant Project

Once the core loop runs, this becomes a great base to extend. A few directions that are well within reach: hands-free wake word activation so you trigger it by voice, GPIO control so spoken commands switch real lights and relays, a camera with a small vision model for object recognition, multilingual support since Whisper handles dozens of languages, and a 3D-printed enclosure to turn it into a finished desk device. Each one reuses the same local pipeline, so you never lose the offline, private nature of the build.

If you want to replicate this exactly, including the audio format flags, the memory fix, the model recommendations per board, and the wake word code, the full write-up is my [complete offline Raspberry Pi voice assistant tutorial with Whisper and Piper TTS](https://rootsaid.com/embedded-systems/ai-assistant-raspberry-pi/) over on RootSaid. That's where every command and config value lives.



source & further reading

hackster.io: original article
