{"slug": "how-i-built-aether-a-local-ai-assistant-that-controls-my-pc-sends-whatsapp-and", "title": "How I Built AETHER: A Local AI Assistant That Controls My PC, Sends WhatsApp Messages, and Learns…", "summary": "A developer built AETHER, a fully offline, voice-controlled AI assistant that runs three local language models on a laptop to control a PC, send WhatsApp messages, and perform desktop automation without cloud APIs. The system uses a custom fine-tuned 1.5B parameter model for tool routing, enabling multi-step commands like checking battery and messaging contacts. This demonstrates that specialized small models can replace massive cloud-based AI for practical desktop assistance.", "body_md": "**Estimated Read Time:** 12–15 min\n\n*A deep dive into building a multi-model AI runtime with voice control, desktop automation, and self-evolving capabilities*\n\nIt started, like most dangerous projects do, with a simple thought:\n\n**“Why can’t I just talk to my computer and have it do things?”**\n\nNot ask ChatGPT a question. Not type a prompt. Actually *control* my PC with voice commands: open apps, send messages, write code, search the web, like Tony Stark does in the movies.\n\nSiri can set a timer. Alexa can play a song. But neither of them can send a WhatsApp message to a specific contact with a custom message. Neither can write a Python script, execute it, and tell me the results. Neither can look at my screen and explain what’s on it.\n\nSo I built one.\n\nMeet **AETHER:** *Adaptive, Evolving, Tactical, Heuristic-Engine Response*, a fully offline, voice-controlled AI assistant that runs entirely on my laptop. No OpenAI API. No cloud. No subscriptions. Just three small language models running locally through Ollama, stitched together with a lot of Python and an unreasonable amount of late nights.\n\nThis article is the technical deep dive. 
If you want to see AETHER in action first, here’s a 4-minute demo:\n\n[AETHER demo on LinkedIn | Anish Pathak](https://www.linkedin.com/posts/anish-pathak-85834a319_ai-machinelearning-opensource-activity-7472483101949128704-zIbD?utm_source=share&utm_medium=member_android&rcm=ACoAAFCswEQBV00_rvnNAzoFn-VHZl9gXJH101I)\n\n**What AETHER Can Actually Do**\n\nBefore we get into the architecture, here’s what this thing can do in practice. These aren’t mockups; every single one of these works in the demo video above:\n\nThat last one is the feature that makes people pause.\n\nThe core insight behind AETHER is that you don’t need one massive AI model to do everything. You need multiple *specialized* models that each do one thing well, with a routing system that sends each task to the right brain.\n\nAETHER runs **three LLMs simultaneously** through Ollama, each with a distinct role:\n\n**Aether-Orchestrator** (1.5B, custom fine-tuned): Tool Router, which takes a command and outputs a JSON array of tool calls\n\n**Qwen 2.5 Coder** (1.5B): Conversational Brain, for general chat, content generation, and response synthesis\n\n**DeepSeek-R1** (1.5B): Reasoning Engine, for complex math, logic, and chain-of-thought problems\n\nThree 1.5B parameter models running locally. Total: about 4.5B parameters. For context, GPT-4 is unofficially estimated at over 1 *trillion* parameters. AETHER runs on a fraction of that, entirely on a laptop GPU.\n\nWhen a user says *“Send a WhatsApp to Mom saying I’ll be late,”* AETHER needs to:\n\nThis is called **tool routing**, and it’s what makes or breaks an AI assistant.\n\nI fine-tuned Qwen 2.5 Coder (1.5B) specifically for this task. Here’s how:\n\n**Step 1 — Synthetic Dataset Generation:** I used OpenRouter’s API to generate 1,500 diverse training examples. 
Each example is a user utterance paired with the correct JSON tool call:\n\n```\n{  \"user\": \"Hey can you check the battery and then text mom saying I'll be home soon?\",  \"assistant\": \"[{\\\"name\\\": \\\"get_system_diagnostics\\\"}, {\\\"name\\\": \\\"send_whatsapp_message\\\", \\\"arguments\\\": {\\\"contact_name\\\": \\\"mom\\\", \\\"message\\\": \\\"I'll be home soon\\\"}}]\"}\n```\n\nThe dataset covers all 15+ tools with variations in phrasing, slang, typos, and multi-tool chains.\n\n**Step 2 — Fine-Tuning on Kaggle (for $0):** I used Unsloth + LoRA (rank 16, alpha 32) to fine-tune on Kaggle’s free GPU tier. Three epochs. The entire training took about 40 minutes.\n\n**Step 3 — Export to GGUF and deploy via Ollama:** Merged the LoRA weights, quantized to Q4_K_M (4-bit), and loaded into Ollama with a strict system prompt that forces pure JSON output with zero conversational filler.\n\nThe result? A 1.5B parameter model that reliably maps natural language to structured tool calls. It can even handle multi-tool chains in a single pass:\n\n*“Check my battery, open LinkedIn, and send a WhatsApp to Anish saying hey”* → [get_system_diagnostics, open_url(\"linkedin\"), send_whatsapp_message(\"Anish\", \"hey\")]\n\nAETHER isn’t a chatbot. It’s an operating system layer. Here’s what it can control:\n\nThis is my favorite part of AETHER, and the one that gets the biggest reaction in demos.\n\nIf you ask AETHER to do something it *can’t* do, you can teach it:\n\n“Learn a new permanent skill called get_bitcoin_price that fetches the current Bitcoin price”\n\nHere’s what happens under the hood:\n\nAfter teaching it, AETHER now *permanently* knows this skill. Next time you boot up, it auto-loads from the skills/ directory.\n\nAETHER isn’t just a terminal script. 
It has a full React dashboard built with Next.js 14 and React Three Fiber.\n\nThe centrepiece is a **3D holographic orb** that visually maps to AETHER’s cognitive state:\n\n**Idle:** Slow breathing pulse, dim blue\n\n**Listening:** Emerald green glow, expanded\n\n**Thinking:** Orange rotation, faster particles\n\n**Speaking:** Purple radiance, audio-reactive particles\n\nThe HUD overlay shows real-time system telemetry: status, current mode, and active model. Every state transition is streamed via WebSockets from the FastAPI backend to the frontend.\n\nGiving an AI control over your operating system is dangerous. AETHER has a **tiered authorization system**:\n\n**Auto-Authorized (safe):** Web searches, opening URLs, media control, system diagnostics, timers\n\n**Requires GUI Confirmation:** WhatsApp messaging, shell commands, app + type, script execution, screen analysis\n\nWhen a dangerous action is triggered, the frontend pops up a confirmation modal. The AI thread blocks until you click Allow or Deny.\n\nBuilding AETHER taught me things that no tutorial covers:\n\nYou don’t need GPT-4 to route tool calls. A 1.5B model, fine-tuned on 1,500 domain-specific examples, can reliably output structured JSON. The key is *specificity*: don’t try to make a small model do everything. Make it do one thing perfectly.\n\nThe other 80% is the plumbing. Getting Whisper STT to not block the main thread. Getting PyAutoGUI to not race-condition with window focus. Getting three Ollama models to share GPU VRAM without OOM kills. Getting WebSockets to stream state without dropping frames.\n\nWhen you call openai.chat.completions.create(), you learn an API. When you run Ollama locally, quantize your own model, and write the inference loop yourself, you learn how language models actually work: attention, tokenization, KV caching, batch sizes, and VRAM management.\n\nText chat is forgiving: you can rephrase, edit, retry. Voice is one-shot. 
If the STT mishears you, or the model takes 5 seconds to respond, the experience falls apart. This forced me to optimize every component for latency: CPU-based intent classification (~50 ms), cached embeddings, pre-loaded models, and streaming TTS.\n\n**Backend:** Python, FastAPI, WebSockets, Uvicorn\n\n**Frontend:** Next.js 14, React Three Fiber, TailwindCSS\n\n**LLM Inference:** Ollama (localhost), GGUF Quantized Models\n\n**Custom Training:** Unsloth, LoRA, Kaggle GPU (free tier)\n\n**Speech-to-Text:** Faster-Whisper (CUDA, float16)\n\n**Text-to-Speech:** Edge-TTS / Kokoro-82M\n\n**Vision:** LLaVA-Phi3, PyAutoGUI screenshots\n\n**Memory:** FAISS Vector DB + MiniLM embeddings\n\n**OS Automation:** PyAutoGUI, Pyperclip, PyCaw, Win32 ctypes\n\nAETHER is open source. If you have a laptop with an NVIDIA GPU and about 8 GB of VRAM, you can run the full stack.\n\n**GitHub:** [github.com/Anishp-cell/AETHER_v1.0](https://github.com/Anishp-cell/AETHER_v1.0)\n\nIf you found this interesting, I’d love to hear what features you’d want in an offline AI assistant. Drop a comment or connect with me on [LinkedIn](https://www.linkedin.com/in/anish-pathak).\n\n*If you’re building with local LLMs, edge AI, or voice interfaces, let’s connect. 
I’m actively looking for collaborators and research opportunities in this space.*\n\n[How I Built AETHER: A Local AI Assistant That Controls My PC, Sends WhatsApp Messages, and Learns…](https://pub.towardsai.net/how-i-built-aether-a-local-ai-assistant-that-controls-my-pc-sends-whatsapp-messages-and-learns-f225c347590b) was originally published in [Towards AI](https://pub.towardsai.net) on Medium, where people are continuing the conversation by highlighting and responding to this story.", "url": "https://wpnews.pro/news/how-i-built-aether-a-local-ai-assistant-that-controls-my-pc-sends-whatsapp-and", "canonical_source": "https://pub.towardsai.net/how-i-built-aether-a-local-ai-assistant-that-controls-my-pc-sends-whatsapp-messages-and-learns-f225c347590b?source=rss----98111c9905da---4", "published_at": "2026-06-17 00:01:02+00:00", "updated_at": "2026-06-17 00:28:54.539388+00:00", "lang": "en", "topics": ["artificial-intelligence", "large-language-models", "ai-agents", "ai-tools", "ai-infrastructure"], "entities": ["AETHER", "Ollama", "Qwen 2.5 Coder", "DeepSeek-R1", "OpenRouter", "Kaggle", "Unsloth", "LoRA"], "alternates": {"html": "https://wpnews.pro/news/how-i-built-aether-a-local-ai-assistant-that-controls-my-pc-sends-whatsapp-and", "markdown": "https://wpnews.pro/news/how-i-built-aether-a-local-ai-assistant-that-controls-my-pc-sends-whatsapp-and.md", "text": "https://wpnews.pro/news/how-i-built-aether-a-local-ai-assistant-that-controls-my-pc-sends-whatsapp-and.txt", "jsonld": "https://wpnews.pro/news/how-i-built-aether-a-local-ai-assistant-that-controls-my-pc-sends-whatsapp-and.jsonld"}}