How I Built AETHER: A Local AI Assistant That Controls My PC, Sends WhatsApp Messages, and Learns…

A developer built AETHER, a fully offline, voice-controlled AI assistant that runs three local language models on a laptop to control a PC, send WhatsApp messages, and perform desktop automation without cloud APIs. The system uses a custom fine-tuned 1.5B parameter model for tool routing, enabling multi-step commands like checking battery and messaging contacts. This demonstrates that specialized small models can replace massive cloud-based AI for practical desktop assistance.

Estimated Read Time: 12–15 min

A deep dive into building a multi-model AI runtime with voice control, desktop automation, and self-evolving capabilities

It started, like most dangerous projects do, with a simple thought: “Why can’t I just talk to my computer and have it do things?”

Not ask ChatGPT a question. Not type a prompt. Actually control my PC with voice commands: open apps, send messages, write code, search the web, like Tony Stark does in the movies.

Siri can set a timer. Alexa can play a song. But neither of them can send a WhatsApp message to a specific contact with a custom message. Neither can write a Python script, execute it, and tell me the results. Neither can look at my screen and explain what’s on it.

So I built one.

Meet AETHER (Adaptive, Evolving, Tactical, Heuristic-Engine Response), a fully offline, voice-controlled AI assistant that runs entirely on my laptop. No OpenAI API. No cloud. No subscriptions. Just three small language models running locally through Ollama, stitched together with a lot of Python and an unreasonable amount of late nights.

This article is the technical deep-dive. If you want to see AETHER in action first, here’s a 4-minute demo: https://www.linkedin.com/posts/anish-pathak-85834a319_ai-machinelearning-opensource-activity-7472483101949128704-zIbD?utm_source=share&utm_medium=member_android&rcm=ACoAAFCswEQBV00_rvnNAzoFn-VHZl9gXJH101I

What AETHER Can Actually Do

Before we get into the architecture, here’s what this thing can do in practice. These aren’t mockups; every single one of these works in the demo video above:

That last one is the feature that makes people pause.

The core insight behind AETHER is that you don’t need one massive AI model to do everything. You need multiple specialized models that each do one thing well, with a routing system that sends each task to the right brain.
AETHER runs three LLMs simultaneously through Ollama, each with a distinct role:

Aether-Orchestrator (1.5B, custom fine-tuned): Tool Router. Takes a command and outputs a JSON array of tool calls.
Qwen 2.5 Coder (1.5B): Conversational Brain. General chat, content generation, response synthesis.
DeepSeek-R1 (1.5B): Reasoning Engine. Complex math, logic, and chain-of-thought problems.

Three 1.5B parameter models running locally. Total: about 4.5B parameters. For context, GPT-4 is estimated at over 1 trillion parameters. AETHER runs on a fraction of that, entirely on a laptop GPU.

When a user says “Send a WhatsApp to Mom saying I’ll be late,” AETHER needs to:

This is called tool routing, and it’s what makes or breaks an AI assistant. I fine-tuned Qwen 2.5 Coder 1.5B specifically for this task. Here’s how:

Step 1: Synthetic Dataset Generation. I used OpenRouter’s API to generate 1,500 diverse training examples. Each example is a user utterance paired with the correct JSON tool call:

{"user": "Hey can you check the battery and then text mom saying I'll be home soon?", "assistant": "[{\"name\": \"get_system_diagnostics\"}, {\"name\": \"send_whatsapp_message\", \"arguments\": {\"contact_name\": \"mom\", \"message\": \"I'll be home soon\"}}]"}

The dataset covers all 15+ tools with variations in phrasing, slang, typos, and multi-tool chains.

Step 2: Fine-Tuning on Kaggle for $0. I used Unsloth + LoRA (rank 16, alpha 32) to fine-tune on Kaggle’s free GPU tier. Three epochs. The entire training took about 40 minutes.

Step 3: Export to GGUF and Deploy via Ollama. I merged the LoRA weights, quantized to Q4_K_M (4-bit), and loaded the model into Ollama with a strict system prompt that forces pure JSON output with zero conversational filler.

The result? A 1.5B parameter model that reliably maps natural language to structured tool calls.
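To make the routing step concrete, here is a minimal sketch of how a JSON array of tool calls, shaped like the training example, might be parsed and dispatched. The `TOOLS` registry, the stub tool bodies, and the `dispatch` helper are assumptions for illustration, not AETHER's actual code; only the tool names and argument keys mirror the example.

```python
import json
from typing import Callable

# Hypothetical stand-ins for real tools; the article does not show
# AETHER's implementations, so these just return descriptive strings.
def get_system_diagnostics() -> str:
    return "battery: 82%"

def send_whatsapp_message(contact_name: str, message: str) -> str:
    return f"queued message to {contact_name}: {message}"

# Assumed registry mapping tool names (as emitted by the router) to callables.
TOOLS: dict[str, Callable[..., str]] = {
    "get_system_diagnostics": get_system_diagnostics,
    "send_whatsapp_message": send_whatsapp_message,
}

def dispatch(router_output: str) -> list[str]:
    """Parse the router's JSON array and run each tool call in order."""
    try:
        calls = json.loads(router_output)
    except json.JSONDecodeError as exc:
        raise ValueError(f"router did not return valid JSON: {exc}") from exc
    if not isinstance(calls, list):
        raise ValueError("expected a JSON array of tool calls")

    results = []
    for call in calls:
        name = call.get("name")
        if name not in TOOLS:
            # Never execute a tool the router hallucinated.
            results.append(f"unknown tool: {name}")
            continue
        results.append(TOOLS[name](**call.get("arguments", {})))
    return results

router_output = (
    '[{"name": "get_system_diagnostics"}, '
    '{"name": "send_whatsapp_message", '
    '"arguments": {"contact_name": "mom", "message": "I\'ll be home soon"}}]'
)
results = dispatch(router_output)
for line in results:
    print(line)
```

Keeping the registry explicit means a malformed or hallucinated tool name is reported rather than executed, which matters once the tools can touch the operating system.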
The fine-tuned router can even handle multi-tool chains in a single pass:

“Check my battery, open LinkedIn, and send a WhatsApp to Anish saying hey” → get_system_diagnostics(), open_url("linkedin"), send_whatsapp_message("Anish", "hey")

AETHER isn’t a chatbot. It’s an operating system layer. Here’s what it can control:

This is my favorite part of AETHER, and the one that gets the biggest reaction in demos. If you ask AETHER to do something it can’t do, you can teach it:

“Learn a new permanent skill called get_bitcoin_price that fetches the current Bitcoin price”

Here’s what happens under the hood:

Once taught, AETHER permanently knows this skill. Next time you boot up, it auto-loads from the skills/ directory.

AETHER isn’t just a terminal script. It has a full React dashboard built with Next.js 14 and React Three Fiber. The centrepiece is a 3D holographic orb that maps to AETHER’s cognitive state:

Idle: Slow breathing pulse, dim blue
Listening: Emerald green glow, expanded
Thinking: Orange rotation, faster particles
Speaking: Purple radiance, audio-reactive particles

The HUD overlay shows real-time system telemetry: status, current mode, active model. Every state transition is streamed via WebSockets from the FastAPI backend to the frontend in real time.

Giving an AI control over your operating system is dangerous. AETHER has a tiered authorization system:

Auto-Authorized (safe): Web searches, opening URLs, media control, system diagnostics, timers
Requires GUI Confirmation: WhatsApp messaging, shell commands, app + type, script execution, screen analysis

When a dangerous action is triggered, the frontend pops up a confirmation modal. The AI thread blocks until you click Allow or Deny.

Building AETHER taught me things that no tutorial covers:

You don’t need GPT-4 to route tool calls. A 1.5B model, fine-tuned on 1,500 domain-specific examples, can reliably output structured JSON. The key is specificity: don’t try to make a small model do everything.
Make it do one thing perfectly.

The other 80% is the plumbing. Getting Whisper STT to not block the main thread. Getting PyAutoGUI to not race against window focus. Getting three Ollama models to share GPU VRAM without OOM kills. Getting WebSockets to stream state without dropping frames.

When you call openai.chat.completions.create, you learn an API. When you run Ollama locally, quantize your own model, and write the inference loop yourself, you learn how language models actually work: attention, tokenization, KV caching, batch sizes, and VRAM management.

Text chat is forgiving: you can rephrase, edit, retry. Voice is one-shot. If the STT mishears you, or the model takes 5 seconds to respond, the experience falls apart. This forced me to optimize every component for latency: CPU-based intent classification (~50ms), cached embeddings, pre-loaded models, and streaming TTS.

Backend: Python, FastAPI, WebSockets, Uvicorn
Frontend: Next.js 14, React Three Fiber, TailwindCSS
LLM Inference: Ollama (localhost), GGUF Quantized Models
Custom Training: Unsloth, LoRA, Kaggle GPU (free tier)
Speech-to-Text: Faster-Whisper (CUDA, float16)
Text-to-Speech: Edge-TTS / Kokoro-82M
Vision: LLaVA-Phi3, PyAutoGUI screenshots
Memory: FAISS Vector DB + MiniLM embeddings
OS Automation: PyAutoGUI, Pyperclip, PyCaw, Win32 ctypes

AETHER is open source. If you have a laptop with an NVIDIA GPU and about 8GB of VRAM, you can run the full stack.

GitHub: https://github.com/Anishp-cell/AETHER_v1.0

If you found this interesting, I’d love to hear what features you’d want in an offline AI assistant. Drop a comment or connect with me on LinkedIn: https://www.linkedin.com/in/anish-pathak

If you’re building with local LLMs, edge AI, or voice interfaces, let’s connect. I’m actively looking for collaborators and research opportunities in this space.
How I Built AETHER: A Local AI Assistant That Controls My PC, Sends WhatsApp Messages, and Learns… https://pub.towardsai.net/how-i-built-aether-a-local-ai-assistant-that-controls-my-pc-sends-whatsapp-messages-and-learns-f225c347590b was originally published in Towards AI https://pub.towardsai.net on Medium.