{"slug": "i-built-an-ai-that-decides-which-ai-to-talk-to-running-24-7-from-my-living-room", "title": "I Built an AI That Decides Which AI to Talk To — Running 24/7 From My Living Room", "summary": "Based on the article, the author built an autonomous AI agent called OpenClaw that runs 24/7 on a Raspberry Pi and manages tasks like research, coding, and document editing. To optimize costs and performance, the author created a lightweight Python router that automatically directs simple requests to a local, free LLM on a Mac Mini and complex reasoning tasks to paid frontier models. The system uses the open-source AgentGateway to unify the endpoints, handle authentication, and provide observability without the client agent knowing which backend model is used.", "body_md": "When I woke up last Saturday, my AI agent had reviewed 14 restaurant ratings in Indiranagar, updated a shared Google Sheet, signed a 20-page PDF I'd been ignoring for a week, and written a bash script to clean up my server logs.\nI didn't ask it to do any of that. It just... does things now.\nMeet OpenClaw — my long-running autonomous agent that lives on a Raspberry Pi, plugged into Discord, running 24/7. It manages my memory, handles research, writes code, edits documents, finds the best weekend spots in Bangalore by scraping live ratings — basically, it runs half my life on autopilot.\nBut a few weeks ago, I noticed something that bothered me.\nI asked it: \"Write a Python script to parse JSON logs.\" Simple coding task. It sent that request to a cloud API, waited 3 seconds, burned tokens I paid for, and came back with an answer — when I had a perfectly capable local LLM sitting idle on my Mac Mini, three feet away.\nThen I asked: \"Think step by step about the trade-offs between event-driven vs polling architecture for my notification system.\" That's a hard reasoning question. I want that going to a frontier model. That's worth the tokens.\nSame agent. Same endpoint. 
Completely different needs.\nAnd that's when a stupid idea hit me:\nWhat if the system could figure out which brain to use — before the request even reaches a model?\nTurns out, it's not stupid at all. And it took me a weekend, a Raspberry Pi, a Mac Mini, 50 lines of Python, and an open-source gateway to build it.\nHere's how.\nHere's what's running in my living room:\nRaspberry Pi → Runs OpenClaw, my autonomous agent. It takes input from Discord, manages context and memory, and orchestrates everything.\nMac Mini → The brain farm. Runs three things:\nOllama with qwen2.5-coder:7b — a local coding model that never leaves my network\nAgentGateway — an open-source AI gateway, a Linux Foundation project originally created by Solo.io, that handles routing, auth, and observability\nA lightweight Python router — the \"intent classifier\" I wrote in ~50 lines of code\nThe magic? OpenClaw doesn't know any of this is happening. It just sends a request to one endpoint. Behind the scenes, the system figures out the rest.\nThree models. Three price points. One unified endpoint. OpenClaw just hits http://192.168.1.15:1234/v1/chat/completions and forgets about it.\nI evaluated a few options — raw Envoy, Nginx with Lua scripting, even building a full proxy from scratch. But AgentGateway stood out for a few reasons:\nWhat it gives you out of the box:\nProtocol translation — It speaks OpenAI-compatible API on the frontend, but can talk to Gemini, Vertex AI, Bedrock, Ollama, and more on the backend. I don't write a single line of provider-specific code.\nBackend authentication — API keys are managed at the gateway level. OpenClaw never sees or stores any API key. I just set backendAuth: key: $GEMINI_API_KEY in the config and it handles the rest.\nModel aliasing — OpenClaw sends model: \"inteli-llm\" in every request. AgentGateway silently translates that to qwen2.5-coder:7b, gpt-4o, or gemini-2.5-flash depending on which route matched. 
The client has no idea.\nObservability — Every request gets logged with provider name, model, token counts, and latency. I can see exactly how many tokens are going to OpenAI vs staying local.\nPrompt guards & rate limiting — Built-in regex-based PII masking, webhook-based content moderation, and rate limiting. Enterprise-grade features I get for free.\nWeighted load balancing & failover — If Ollama crashes (it happens), I can configure automatic failover to a cloud model. No downtime.\nWhat it doesn't do (yet): Content-aware routing. AgentGateway routes based on path, headers, and methods — which is the right design for a gateway. It doesn't peek into your request body to decide where to send it. That's a feature, not a bug — gateways should be fast and protocol-level, not parsing JSON payloads.\nBut I needed content-aware routing. So instead of searching for another tool, I extended it.\nI wrote a tiny FastAPI proxy that sits in front of AgentGateway. Here's what it does:\ncode, python, script, function, bug? → coding\nthink, analyze, reasoning, deduce? Or prompt > 400 chars? → reasoning\n# Keywords that mark a prompt as a coding task\ncoding_keywords = [\"code\", \"python\", \"javascript\", \"bash\", \"script\", \"function\", \"bug\"]\n# Keywords that mark a prompt as a reasoning task\nreasoning_keywords = [\"think\", \"analyze\", \"explain in detail\", \"reasoning\", \"logic\", \"deduce\"]\n# Assumes prompt already holds the user's message text\nprompt_lower = prompt.lower()\n# Coding keywords win first; long or analytical prompts count as reasoning; the rest is simple\nif any(k in prompt_lower for k in coding_keywords):\n    intent = \"coding\"\nelif len(prompt) > 400 or any(k in prompt_lower for k in reasoning_keywords):\n    intent = \"reasoning\"\nelse:\n    intent = \"simple\"\nHere's what this setup actually saves me:\nBefore this setup, every single request was going to a cloud API. Now, roughly 60-70% of my queries stay local — coding questions, quick lookups, simple formatting tasks. They're fast, free, and private.\nThe expensive reasoning model only gets called when I genuinely need it. 
And the mid-tier Gemini handles everything in between.\nMy monthly API bill dropped significantly, and the local responses are actually faster.\n1. Header-based routing over path-based routing\nInitially, I was going to use URL paths (/coding, /reasoning, /simple) and strip them with URL rewriting. But header injection is cleaner — the original request path stays intact, and AgentGateway's header matching is first-class.\n2. Classification at the proxy, not the gateway\nI could have tried to use AgentGateway's CEL expressions or ExtProc policies for classification. But those run after backend selection, not before. Keeping classification in a separate lightweight layer means I can swap algorithms without touching my gateway config.\n3. Keyword heuristics over ML classifiers\nCould I use a small classifier model or even RouteLLM for smarter routing? Absolutely. But for a homelab, keyword matching is:\n4. One unified model name\nOpenClaw sends model: \"inteli-llm\" for everything. AgentGateway's modelAliases feature translates it per-route. This means I can swap out backend models without touching a single line of OpenClaw's config. Last week it was gemini-1.5-flash, this week it's gemini-2.5-flash. OpenClaw never knew.\nSmarter classification — Maybe a tiny local classifier model, or even using the first few tokens of a response to reclassify and retry on a better model.\nMetrics dashboard — AgentGateway already emits OpenTelemetry traces. I want to hook up a Grafana dashboard to see which models are handling what, with latency and token breakdowns.\nFailover chains — If Ollama is under heavy load, automatically fall back to Gemini for coding tasks. AgentGateway supports priority groups for this.\nMore agents — OpenClaw is just the beginning. I want to run specialized agents for different domains, all routing through the same gateway.\nYou don't need a Kubernetes cluster or a $10K GPU server to build a multi-model AI system. 
A Raspberry Pi, a Mac Mini, an open-source gateway, and 50 lines of Python got me:\n✅ An always-on autonomous agent\n✅ Intelligent routing across 3 different LLMs\n✅ Local-first for privacy and speed\n✅ Cloud when I need the horsepower\n✅ Zero API keys exposed to the client\n✅ A monthly bill I actually don't mind paying\nThe best part? The entire config is a single YAML file and a single Python script. No Docker. No Kubernetes. No Terraform. Just two processes on a Mac Mini and an agent on a Pi.\nSometimes the best infrastructure is the one you can explain in a napkin sketch.\nIf you're building something similar or want to see the config files, drop a comment — happy to share the full setup.", "url": "https://wpnews.pro/news/i-built-an-ai-that-decides-which-ai-to-talk-to-running-24-7-from-my-living-room", "canonical_source": "https://dev.to/anup_sharma_86fa94612fe3c/i-built-an-ai-that-decides-which-ai-to-talk-to-running-247-from-my-living-room-211p", "published_at": "2026-05-23 07:43:58+00:00", "updated_at": "2026-05-23 08:01:55.825546+00:00", "lang": "en", "topics": ["artificial-intelligence", "machine-learning", "large-language-models", "open-source", "developer-tools"], "entities": ["OpenClaw", "Raspberry Pi", "Mac Mini", "Discord", "Indiranagar", "Bangalore", "Google Sheet", "Python"], "alternates": {"html": "https://wpnews.pro/news/i-built-an-ai-that-decides-which-ai-to-talk-to-running-24-7-from-my-living-room", "markdown": "https://wpnews.pro/news/i-built-an-ai-that-decides-which-ai-to-talk-to-running-24-7-from-my-living-room.md", "text": "https://wpnews.pro/news/i-built-an-ai-that-decides-which-ai-to-talk-to-running-24-7-from-my-living-room.txt", "jsonld": "https://wpnews.pro/news/i-built-an-ai-that-decides-which-ai-to-talk-to-running-24-7-from-my-living-room.jsonld"}}