Hestia – a local-first home assistant that trusts timers over the LLM

Hestia, a local-first, self-hosted home assistant, runs a local LLM on user-owned hardware with no cloud or internet exposure. It delegates deterministic tasks to timers and databases, using the LLM only for judgment and conversation, and integrates with Home Assistant, Plex, and media automation tools. The project prioritizes reliability and privacy over a smarter brain.
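Here is a minimal sketch of that split. `Chore`, its fields, and `due_chores` are hypothetical names and are not Hestia's actual records schema. Deciding *whether* something is due is plain date arithmetic. Only the resulting list would ever be handed to the LLM to phrase.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Chore:
    """Hypothetical record: a chore that repeats every `every_days` days."""
    name: str
    last_done: date
    every_days: int

    def next_due(self) -> date:
        # Pure arithmetic: no model is involved in deciding *when*.
        return self.last_done + timedelta(days=self.every_days)


def due_chores(chores: list[Chore], today: date) -> list[str]:
    """Return the names of chores due on or before `today`, soonest first."""
    due = [c for c in chores if c.next_due() <= today]
    return [c.name for c in sorted(due, key=lambda c: c.next_due())]


chores = [
    Chore("water the garden", last_done=date(2026, 5, 1), every_days=2),
    Chore("change the furnace filter", last_done=date(2026, 3, 1), every_days=90),
    Chore("clean the litter box", last_done=date(2026, 5, 3), every_days=1),
]
print(due_chores(chores, today=date(2026, 5, 4)))
# → ['water the garden', 'clean the litter box']
```

The LLM never has to remember that the filter is due at the end of May. The record remembers it, and the model only sees the two chores that are actually due.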

A local-first, self-hosted assistant for your home. One stateful "brain" runs a local LLM on hardware you own. Every window into it (your phone, a terminal, the kitchen mic, Home Assistant) talks to that same brain. Nothing runs in the cloud, nothing is exposed to the internet, and your data never leaves the house.

**The idea it's built on.** Most "AI for the home" points the model at the things it's worst at: remembering a schedule, watching a threshold, firing a reminder at the right minute. Hestia does the opposite. Anything deterministic (a chore is due, the soil is dry, the trash goes out Tuesday) is handed to something dumb and reliable: a timer, a record, a row in a database. The LLM is left to do the one thing it's genuinely good at, which is judgment and conversation. The goal was never a smarter brain. It's a more reliable one.

[ARCHITECTURE.md](/thefullnacho/hestia/blob/main/ARCHITECTURE.md) is the long version. [MEMORY-DESIGN.md](/thefullnacho/hestia/blob/main/MEMORY-DESIGN.md) covers the memory plan.

**What it actually is.**

- **A brain** (`brain/`): an OpenAI-compatible endpoint (`POST /v1/chat/completions`) that wraps a local LLM (Ollama, `qwen3:14b`) with an agent loop. Every client speaks one dialect.
- **Eight scoped tools**: `home` (control Home Assistant), `media` (Plex + arr), `memory`, `records`, `reminder`, `search`, `status`, `weather`. There is deliberately **no shell tool**. The brain can act in your house but cannot run arbitrary commands.
- **Memory that grows**: markdown soft facts plus a SQLite record of the things in your life (pets, garden, wildlife, chores). A background note-taker *proposes* durable facts for you to approve rather than writing them silently.
- **A media appliance**: Plex + the arr stack + Bazarr (subtitles) + qBittorrent behind a fail-closed VPN kill-switch.
- **Voice**: talk to it through Home Assistant's Assist pipeline or the browser.

**What it isn't.** A cloud service, a wrapper around someone else's API, or anything you should put on the public internet.
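The "no shell tool" rule is easiest to see as a fixed registry. This is a minimal sketch that assumes each tool is a plain function taking a dict of arguments. The registry, `dispatch`, and the stub handlers are hypothetical and are not Hestia's actual code. If the model invents a tool name such as `shell`, there is simply no entry to run.

```python
from __future__ import annotations

from typing import Any, Callable

# Hypothetical handler type. The names below mirror the eight tools in this
# README, but the signatures are illustrative only.
ToolFn = Callable[[dict[str, Any]], str]


def _stub(name: str) -> ToolFn:
    """Build a placeholder handler that just reports how it was called."""
    def handler(args: dict[str, Any]) -> str:
        return f"{name} called with {sorted(args)}"
    return handler


TOOLS: dict[str, ToolFn] = {
    name: _stub(name)
    for name in ("home", "media", "memory", "records",
                 "reminder", "search", "status", "weather")
}


def dispatch(tool_name: str, args: dict[str, Any]) -> str:
    """Run a model-requested tool only if it is in the fixed registry."""
    handler = TOOLS.get(tool_name)
    if handler is None:
        # Anything outside the registry (e.g. a "shell" request) is refused.
        # The refusal goes back to the model as the tool result.
        return f"error: unknown tool {tool_name!r}"
    return handler(args)


print(dispatch("weather", {"city": "home"}))  # → weather called with ['city']
print(dispatch("shell", {"cmd": "ls"}))       # → error: unknown tool 'shell'
```

The safety property lives in the shape of the registry, not in the prompt. The model can ask for anything, but only the eight entries can run.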
Hestia runs rootless on your own box and never phones home.

⚠️ **Read SECURITY.md before running it.** The brain has no built-in authentication and can control your devices, so it must stay on a private network (Tailscale or LAN). That's a deliberate trade-off, not an oversight. The doc explains the trust model.

Hestia is part of the Forager / Homesteader Labs constellation, alongside forager_ml, forager-field-station, and the Homesteader Labs site.

- **Phase 0: Reach + brain** ✅ Talk to your home model from your phone (details below).
- **Phase 1: Media appliance** ✅ Plex + qBittorrent + gluetun VPN kill-switch (verified) + the arr automation layer (Prowlarr/Sonarr/Radarr + FlareSolverr + Bazarr subtitles). Full loop: search → download via VPN → hardlink → Plex.
- **Phase 2: House (Home Assistant)** ✅ HA is running; lights and devices are reachable via the `home` tool.
- **Phase 3: Voice** ✅ Speak to Hestia through HA's Assist pipeline and a browser voice loop.
- **Phase 4: The seam (memory + tools)** ✅ Core in place, still growing. The brain is a tool-calling agent with the eight tools above plus deterministic skill injection. HA's conversation agent points at Hestia, so Assist and voice route through the brain (which can control HA back). It also gets smarter over time via the note-taker (see Memory & learning).
- **Next: vision (Eyes).**

**Phase 0: Reach + brain.** Win: talk to your home model from your phone. The brain (`brain/`) is a thin OpenAI-compatible proxy onto Ollama. Every client (terminal, phone, kitchen mic) speaks one dialect: `POST /v1/chat/completions`. In Phase 0 it forces the chosen model, injects Hestia's system prompt (persona + the hardened safety rules from the benchmark A/B), and streams the reply back. Memory and tools land in Phase 4 behind this same URL.
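The Phase 0 proxy's job on each request can be sketched as a pure function over the OpenAI-style body. This sketch uses the model name from this README and a placeholder prompt string. The real prompt is `SYSTEM_PROMPT` in `brain/prompt.py`. `rewrite_request` is a hypothetical name. Dropping client-supplied system messages is also an assumed design choice, not documented behavior.

```python
import copy
from typing import Any

MODEL = "qwen3:14b"  # the resident model named in this README
# Placeholder text: the real persona + safety rules live in brain/prompt.py.
SYSTEM_PROMPT = "You are Hestia, a home assistant."


def rewrite_request(body: dict[str, Any]) -> dict[str, Any]:
    """Force the model and prepend the system prompt to a client request."""
    out = copy.deepcopy(body)  # never mutate the caller's dict
    out["model"] = MODEL  # whatever model the client asked for is overridden
    # Assumption: drop client system messages so the hardened rules stay first and alone.
    msgs = [m for m in out.get("messages", []) if m.get("role") != "system"]
    out["messages"] = [{"role": "system", "content": SYSTEM_PROMPT}, *msgs]
    return out


req = {"model": "gpt-4o", "messages": [{"role": "user", "content": "hello Hestia"}]}
print(rewrite_request(req)["model"])  # → qwen3:14b
```

Because every client goes through this one rewrite, a phone app configured for some other model name still ends up talking to the same model and persona.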
| Service | What | Bind | GPU |
|---|---|---|---|
| `hestia-ollama` | Ollama inference engine | `127.0.0.1:11434` (localhost only) | RTX 5080 only |
| `hestia-brain` | Hestia `/v1` proxy | `0.0.0.0:8730` (reachable over Tailscale) | none |

Both are user systemd services (no root). They are defined in `deploy/systemd/` and installed into `~/.config/systemd/user/`. Linger is enabled, so they survive logout and reboot. Ollama is pinned to the 5080 (`CUDA_VISIBLE_DEVICES`). Per the benchmark verdict, this leaves the 4060 Ti free for Phase 3 Whisper/Piper.

Model: `qwen3:14b`, resident, thinking off. It's the current pick after the model eval (`brain/eval_models.py`). `qwen2.5:14b` is kept on disk as a fallback. See MODEL_EVAL.md.

Day to day, use `deploy/hestiactl` (symlinked into `~/.local/bin`). It is one command for the whole estate, run from the GPU box:

```
hestiactl status                   # brain health + local units + every container on hl-relay
hestiactl health                   # raw /health JSON
hestiactl up|down|restart X        # X: brain ollama | arr services | plex qbit ha adguard ... | all
hestiactl logs X -f                # journalctl (local) or docker logs (remote)
hestiactl vpn                      # verify the qBittorrent kill-switch
```

`all` covers only the Hestia-managed pieces (local units + arr stack). Core containers (AdGuard, which is the house DNS; gluetun; HA) are controlled one at a time and ask for confirmation before stopping.

The underlying commands, for when you need them directly:

```
# status / logs
systemctl --user status hestia-ollama hestia-brain
journalctl --user -u hestia-brain -f

# restart after editing brain code or a service file
systemctl --user daemon-reload          # only if you edited a .service
systemctl --user restart hestia-brain

# health: Ollama up + model present?
# (the brain binds 0.0.0.0:8730 per the table above, so localhost reaches it on the GPU box)
curl -s 127.0.0.1:8730/health | jq

# talk to it
curl -s 127.0.0.1:8730/v1/chat/completions -H 'content-type: application/json' \
  -d '{"messages": [{"role": "user", "content": "hello Hestia"}]}' | jq -r '.choices[0].message.content'
```

If you edit a `deploy/systemd/*.service` file, re-copy it into `~/.config/systemd/user/` before `daemon-reload`.

Tailscale is the one piece that needs root, so it isn't auto-installed. On the GPU box:

```
curl -fsSL https://tailscale.com/install.sh | sh
sudo tailscale up
```

Then install the Tailscale app on the phone and sign in to the same tailnet. The brain is then reachable at `http://<gpu-box-tailscale-name>:8730/v1` from any app that speaks OpenAI. Set that as the base URL. Any API key string works, since Ollama ignores it. Nothing is exposed to the public internet.

```
brain/
  hestia.py            the agent loop: /v1/chat/completions + /health, tools, memory, note-taker hook
  config.py            single source of paths + secret loading; makes the brain relocatable
  prompt.py            SYSTEM_PROMPT: persona + hardened safety rules
  records_store.py     SQLite entities + events
  memory_store.py      markdown soft facts
  note_taker.py        background "gets smarter over time" extractor
  review_notes.py      CLI to review + promote the note-taker's proposals
  tools/               home, media, memory, records, reminder, search, status, weather + skill router
  tests/               pytest: stores, dispatch, note-taker (run: uv run --project brain pytest)
  pyproject.toml       deps + dev deps (pytest) + pytest config; uv-managed, isolated venv
```

**Relocatable.** Every path derives from `config.py`'s own location, so moving or restoring the repo to a new path needs no edits. `HESTIA_ROOT` overrides this if needed. All service URLs, tokens, and thresholds stay env-overridable next to the tools that use them.

**Phase 1: Media appliance.** Win: the media stack runs, independent of the brain.
Most of this already existed on the Micro before Hestia: Plex (`hl-plex`), plus qBittorrent behind gluetun (Surfshark, OpenVPN, NL) with a fail-closed VPN kill-switch. AdGuard, MQTT, and Home Assistant were also already there. The kill-switch is verified: qBittorrent's traffic egresses via the VPN datacenter IP, not the host's. Don't blindly run `docker compose up` on the existing `/opt/home/compose.yml`. Its volume paths are literal `/path/to/...` host dirs that the running containers depend on.

Hestia added the missing automation layer as a separate, isolated stack (`deploy/media/compose.yml`, deployed to `/opt/home/arr/`). It contains Prowlarr (`:9696`, indexer manager), Sonarr (`:8989`, TV), and Radarr (`:7878`, movies), all reachable over Tailscale. FlareSolverr (`:8191`) was also added so Prowlarr can reach Cloudflare-protected indexers. It is wired as a Prowlarr indexer proxy (tag `flaresolverr`).

Wired via API:

- Root folders point at the existing Plex library (`/data/TV Shows`, `/data/Movies`).
- A remote-path mapping (`/downloads` → `/data/downloads`) lets Sonarr/Radarr hardlink from qBittorrent's downloads into the library. This is instant, with no copy, because both are on one filesystem under `/mnt/media`.
- Prowlarr is connected to Sonarr + Radarr (`fullSync`).
- Five reputable public indexers are added and synced down to the apps: The Pirate Bay, Knaben, LimeTorrents, plus 1337x and EZTV via FlareSolverr. YTS is deliberately excluded because of its history of feeding user data to copyright trolls.
- qBittorrent is wired as the download client in both Sonarr (category `tv-sonarr`) and Radarr (`radarr`), tested OK.

The full loop works: search → download through the VPN → hardlink into the Plex library. Both apps report no health warnings.

```
cd /opt/home/arr
docker compose ps
docker compose pull && docker compose up -d    # update arr
```

**Phase 4: The seam.** `deploy/ha/custom_components/hestia/` is a thin custom HA integration. It registers a conversation agent (`conversation.hestia`) that forwards each utterance to Hestia's `/v1` and speaks the reply. Hestia owns the loop (memory + tools, including controlling HA back). HA is just input + a tool.
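Here is a minimal, synchronous sketch of what "forward each utterance and speak the reply" amounts to, assuming an OpenAI-style response shape. The real integration runs inside HA, which is async and has its own conversation-agent API. `handle_utterance`, `http_post_json`, and `HESTIA_URL` are hypothetical names. The URL is copied from the config entry in the wiring notes. The HTTP call is injectable so the forwarding logic can be exercised without a running brain.

```python
import json
import urllib.request
from typing import Any, Callable

# Assumption: same URL the HA config entry points at.
HESTIA_URL = "http://127.0.0.1:8730/v1/chat/completions"

Poster = Callable[[str, dict[str, Any]], dict[str, Any]]


def http_post_json(url: str, payload: dict[str, Any]) -> dict[str, Any]:
    """POST a JSON body and decode the JSON reply (stdlib only)."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"content-type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.load(resp)


def handle_utterance(text: str, post: Poster = http_post_json) -> str:
    """Forward one utterance to Hestia and return the text HA should speak."""
    payload = {"messages": [{"role": "user", "content": text}]}
    reply = post(HESTIA_URL, payload)
    # Hestia owns the loop (memory, tools, calls back into HA); HA only reads the answer.
    return reply["choices"][0]["message"]["content"]
```

Keeping the HA side this thin means the integration holds no memory or tool logic of its own. Swapping the model or adding tools happens entirely in the brain.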
This integration is the architecture's keystone made real. The wiring on hl-relay is not in this repo; it lives in HA's config:

- Integration files are installed to `/opt/home/ha_config/custom_components/hestia/`.
- A config entry points it at `http://127.0.0.1:8730/v1/chat/completions` (Hestia over Tailscale; the HA container can reach it).
- The preferred Assist pipeline's conversation engine is set to `conversation.hestia`, so the Assist chat and voice satellites route through the brain.

Verified via HA's conversation API: "turn on the TV light" drove the real light, and "what coffee should I buy?" recalled a memory. That is a full HA → Hestia → HA round trip.

**Memory & learning.** Two stores back the brain:

- `memory_store`: markdown soft facts and preferences, git-auditable.
- `records_store`: SQLite entities + a uniform event log covering pets/lineage, wildlife, chores, service reminders, and the garden.

Both are injected into the system prompt per request, scoped to what the request implies.

The brain also learns passively. After each exchange, once the answer is already on the wire, a background note-taker (`note_taker.py`) reads the turn. It then proposes durable facts it heard, such as "trash pickup is Tuesday mornings". True to *propose, don't dispose*, those land in a review inbox (`memory/inbox/`), not straight into live memory:

```
uv run --project brain python brain/review_notes.py list
uv run --project brain python brain/review_notes.py promote <id>|--all
uv run --project brain python brain/review_notes.py discard <id>|--all
```

It reuses the resident model by default and never blocks or breaks a request. Tuning knobs:

- `HESTIA_NOTETAKER=0` disables it.
- `HESTIA_NOTETAKER_AUTOWRITE=1` skips the review queue and writes durable memories directly.
- `HESTIA_NOTETAKER_MODEL` points it at a cheaper model (e.g. a second Ollama on the free 4060 Ti) to take the load off the brain.

Hestia is licensed under the GNU Affero General Public License v3.0; see [LICENSE](/thefullnacho/hestia/blob/main/LICENSE).
The AGPL is deliberate. Hestia is built to be self-hosted, so the copyleft keeps it open even for anyone who runs a modified version as a network service. It imposes nothing on you for running it at home.

Before running it, read SECURITY.md. The brain has no built-in authentication and can control your Home Assistant devices. It must therefore stay on a private network (Tailscale/LAN) and must never be exposed to the public internet. It deliberately has no shell tool.

© 2026 TheFullNacho and contributors.