Hestia: a local-first home assistant that trusts timers over the LLM

Hestia, a local-first, self-hosted home assistant, runs a local LLM on user-owned hardware with no cloud or internet exposure. It delegates deterministic tasks to timers and databases, uses the LLM only for judgment and conversation, and integrates with Home Assistant, Plex, and media automation tools. The project prioritizes reliability and privacy over a smarter brain.

A local-first, self-hosted assistant for your home. One stateful "brain" runs a local LLM on hardware you own, and every window into it (your phone, a terminal, the kitchen mic, Home Assistant) talks to that same brain. Nothing runs in the cloud, nothing is exposed to the internet, and your data never leaves the house.

The idea it's built on. Most "AI for the home" points the model at the things it's worst at: remembering a schedule, watching a threshold, firing a reminder at the right minute. Hestia does the opposite. Anything deterministic (a chore is due, the soil is dry, the trash goes out Tuesday) is handed to something dumb and reliable: a timer, a record, a row in a database. The LLM is left to do the one thing it's genuinely good at, which is judgment and conversation. The goal was never a smarter brain. It's a more reliable one.

[ARCHITECTURE.md](/thefullnacho/hestia/blob/main/ARCHITECTURE.md) is the long version; [MEMORY-DESIGN.md](/thefullnacho/hestia/blob/main/MEMORY-DESIGN.md) covers the memory plan.

What it actually is.

- A brain (`brain/`): an OpenAI-compatible endpoint (`POST /v1/chat/completions`) wrapping a local LLM (Ollama, `qwen3:14b`) with an agent loop. Every client speaks one dialect.
- Eight scoped tools: `home` control (Home Assistant), `media` (Plex + arr), `memory`, `records`, `reminder`, `search`, `status`, `weather`. There is deliberately no shell tool: the brain can act in your house but cannot run arbitrary commands.
- Memory that grows: markdown soft-facts plus a SQLite record of the things in your life (pets, garden, wildlife, chores), and a background note-taker that proposes durable facts for you to approve rather than writing them silently.
- A media appliance: Plex + the arr stack + Bazarr subtitles + qBittorrent behind a fail-closed VPN kill-switch.
- Voice: talk to it through Home Assistant's Assist pipeline or the browser.

What it isn't. A cloud service, a wrapper around someone else's API, or anything you should put on the public internet. It runs rootless on your own box and never phones home.

⚠️ Read SECURITY.md before running it. The brain has no built-in authentication and can control your devices, so it must stay on a private network (Tailscale or LAN). That's a deliberate trade-off, not an oversight; the doc explains the trust model.

Hestia is part of the Forager / Homesteader Labs constellation, alongside `forager_ml`, `forager-field-station`, and the Homesteader Labs site.

- Phase 0: Reach + brain ✅. Talk to your home model from your phone (details below).
- Phase 1: Media appliance ✅. Plex + qBittorrent + gluetun VPN (kill-switch verified) + the arr automation layer (Prowlarr/Sonarr/Radarr + FlareSolverr + Bazarr subtitles). Full loop: search → download via VPN → hardlink → Plex.
- Phase 2: House (Home Assistant) ✅. HA running; lights and devices reachable via the `home` tool.
- Phase 3: Voice ✅. Speak to Hestia through HA's Assist pipeline and a browser voice loop.
- Phase 4: The seam (memory + tools) ✅ (core in place, still growing). The brain is a tool-calling agent with the eight tools above plus deterministic skill injection, and HA's conversation agent points at Hestia, so Assist and voice route through the brain (which can control HA back). It also gets smarter over time via the note-taker (see Memory & learning).
- Next: vision (Eyes).

Win: talk to your home model from your phone.

The brain (`brain/`) is a thin OpenAI-compatible proxy onto Ollama.
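
Because the proxy speaks the OpenAI chat-completions dialect, a client can be very small. The following is a minimal Python sketch of one, not the project's own client code. It assumes the brain is reachable at `http://127.0.0.1:8730` (the port used elsewhere in this README) and that it returns a single non-streamed JSON response in the standard OpenAI shape. The names `BASE_URL`, `build_payload`, `extract_reply`, and `ask` are hypothetical.

```python
import json
import urllib.request

# Assumption: address of the brain; adjust to wherever hestia-brain is reachable.
BASE_URL = "http://127.0.0.1:8730"


def build_payload(prompt: str) -> dict:
    """Build an OpenAI-style chat request body holding a single user message."""
    return {"messages": [{"role": "user", "content": prompt}]}


def extract_reply(response: dict) -> str:
    """Pull the assistant text out of an OpenAI-style chat completion response."""
    return response["choices"][0]["message"]["content"]


def ask(prompt: str, base_url: str = BASE_URL, timeout: float = 60.0) -> str:
    """POST the prompt to /v1/chat/completions and return the reply text.

    Assumes a non-streamed JSON response; a streamed reply would need
    incremental parsing instead of a single json.load call.
    """
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={"content-type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return extract_reply(json.load(resp))
```

With the brain running, `print(ask("hello Hestia"))` would print the reply text. No model name is sent in this sketch, on the assumption that the brain selects the model itself.
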
Every client (terminal, phone, kitchen mic) speaks one dialect: `POST /v1/chat/completions`. In Phase 0 the brain forces the chosen model, injects Hestia's system prompt (persona + the hardened safety rules from the benchmark A/B), and streams the reply back. Memory and tools land in Phase 4 behind this same URL.

| Service | What | Bind | GPU |
|---|---|---|---|
| `hestia-ollama` | Ollama inference engine | `127.0.0.1:11434` (localhost only) | RTX 5080 only |
| `hestia-brain` | Hestia `/v1` proxy | `0.0.0.0:8730` (reachable over Tailscale) | none |

Both are user systemd services (no root), defined in `deploy/systemd/` and installed into `~/.config/systemd/user/`. Linger is enabled, so they survive logout/reboot. Ollama is pinned to the 5080 (`CUDA_VISIBLE_DEVICES`), leaving the 4060 Ti free for Phase 3 (Whisper/Piper) per the benchmark verdict.

Model: `qwen3:14b` (resident, thinking off), the current pick after the model eval (`brain/eval_models.py`; `qwen2.5:14b` kept on disk as a fallback). See MODEL_EVAL.md.

Day to day, use `deploy/hestiactl` (symlinked into `~/.local/bin`): one command for the whole estate, run from the GPU box:

    hestiactl status              # brain health + local units + every container on hl-relay
    hestiactl health              # raw /health JSON
    hestiactl up|down|restart X   # X: brain ollama | arr services | plex qbit ha adguard ... | all
    hestiactl logs X -f           # journalctl (local) or docker logs (remote)
    hestiactl vpn                 # verify the qBittorrent kill-switch

`all` covers only the Hestia-managed pieces (local units + arr stack); core containers (AdGuard = house DNS, gluetun, HA) are controlled one at a time and ask for confirmation before stopping.

The underlying commands, for when you need them directly:

    # status / logs
    systemctl --user status hestia-ollama hestia-brain
    journalctl --user -u hestia-brain -f

    # restart after editing brain code or a service file
    systemctl --user daemon-reload    # only if you edited a .service
    systemctl --user restart hestia-brain

    # health (Ollama up + model present?)
    # brain binds the Tailscale IP, not localhost
    curl -s 127.0.0.1:8730/health | jq

    # talk to it
    curl -s 127.0.0.1:8730/v1/chat/completions -H 'content-type: application/json' \
      -d '{"messages":[{"role":"user","content":"hello Hestia"}]}' | jq -r '.choices[0].message.content'

If you edit a `.service` file under `deploy/systemd/`, re-copy it into `~/.config/systemd/user/` before `daemon-reload`.

Tailscale is the one piece that needs root, so it isn't auto-installed. On the GPU box:

    curl -fsSL https://tailscale.com/install.sh | sh
    sudo tailscale up

Then on the phone: install the Tailscale app and sign in to the same tailnet. The brain is then reachable at http://