{"slug": "sage-router-one-endpoint-every-model", "title": "Sage Router: one endpoint, every model", "summary": "Earl Co released Sage Router, an open-source self-hosted gateway that exposes a single endpoint for AI agents to route requests to multiple model providers with automatic failover. The tool targets small UK teams running agents, offering intent-based routing, local-first data flow, and dynamic model discovery to simplify multi-provider workflows.", "body_md": "A new open-source tool promises to simplify life for small UK teams running AI agents across multiple providers. Earl Co has released [Sage Router](https://github.com/earlvanze/sage-router) on GitHub — a self-hosted gateway that exposes a single endpoint (one address every tool points at) and routes each request to the right model, with automatic failover (a silent switch to a backup) when a provider goes down.\n\nThe router targets the most common failure mode for serious agent users: a key that hits its daily cap mid-task, a model that’s down for ten minutes, or a coding agent that needs Claude for one turn and a local model for the next. Today, swapping providers means editing URLs, restarting clients, and hoping nothing breaks. Sage Router collapses that into one decision point — and keeps your provider keys on your own hardware.\n\n1/ I open-sourced Sage Router: a local-first AI model router for agents. One endpoint. Any provider. It picks the best model per request and fails over when a provider dies.\n\n— Earl Co (@earlvanze)[June 22, 2026]\n\n## A single endpoint for every model\n\nThe router is a Python service — with a Docker image and an Umbrel home-server app — that sits between your agent and every model provider you have a key for. 
You point OpenClaw, Codex, Claude Code, Cursor, Aider or anything that speaks the OpenAI or Anthropic chat-completions protocol at the router’s local endpoint once (the [README](https://github.com/earlvanze/sage-router) gives the exact base URL after the first boot), and the router handles the rest.\n\nThree things set it apart from a plain model proxy:\n\n**Intent-based routing.** Code tasks go to coding models, creative work to creative models, reasoning to reasoning models. It picks the right tool for the job, not just the cheapest healthy option.\n\n**Automatic failover.** When a provider stops responding or hits its limit, the router silently tries the next configured option. Your agent keeps running through a bad minute at Anthropic.\n\n**Dynamic discovery.** New models from Ollama, Anthropic, OpenAI, Google, NVIDIA NIM and OpenClaw are detected without config edits. Pull a local model and it shows up.\n\nThe router is model-agnostic in a deliberate way: it speaks the chat-completions dialect of every major agent harness. That matters because the agent layer is decoupling from any single model: [Anthropic’s Claude Cowork now runs against any third-party LLM](https://www.eigent.ai/blog/claude-cowork-on-3p-any-llm) via OpenRouter or a local endpoint, and tools like Eigent take the same stance. A router is the missing piece in that stack: it lets the agent harness change models per request without anyone noticing.\n\n## The case for a UK small team\n\nFor a small firm running agents (a services team, an e-commerce operator, a tinkerer-owner), the wins are concrete and procurement-defensible.\n\n**One endpoint for every tool.** Stop editing `OPENAI_BASE_URL` in five different clients. 
The router becomes the single config value everyone points at.\n\n**No vendor lock-in for the routing layer.** If Anthropic changes its terms, you swap providers by editing a config file, not by retraining your team.\n\n**Local-first data flow.** Provider keys, request logs and routing rules never leave your network. For a regulated buyer (a small accountancy, legal or health firm), that posture survives a security review.\n\nIt also lands at a moment when UK teams are quietly running more local models. Our [guide to running a 550B open model on your own box](/articles/try-a-550b-open-model-this-afternoon/) walks through the on-ramp; the [LM Studio vs Ollama comparison](/articles/lm-studio-vs-ollama-2026/) covers the runtime choice. A router is the natural next step: the thing that lets a local model and a paid model coexist inside one workflow.\n\n## How to try it this afternoon\n\nThe path in is short for anyone with a Python environment and a couple of API keys to hand.\n\n**Clone and run locally.** `git clone` the repo, `pip install -r requirements.txt`, then `python3 router.py --port 8790`. The router boots with a default profile that uses whatever keys you put in `.env`.\n\n**Point one client at it.** Open Codex, Claude Code or Cursor, change its API base URL to point at the locally running router (the [README](https://github.com/earlvanze/sage-router) gives the exact value to paste in), and pick a model by name. The router resolves the rest.\n\n**Add a local fallback.** Spin up Ollama, pull a small model, and add it to the provider profile. 
Now when your cloud provider stops responding, the agent silently fails over to local.\n\n**For the home-server crowd.** Umbrel users can install Sage Router from the personal app repo and configure it from the app tile, with no terminal required.\n\nEarl Co, who open-sourced the router this month, frames the gap it fills more pointedly:\n\nThe missing piece for teams whose agent harnesses change models per request.\n\n## A technical-tinkerer install\n\nThis is a developer-shaped tool, not a one-afternoon install for a non-developer. You’ll be reading config files and watching logs. For shops that have already crossed the local-AI threshold (our [guide to running a business assistant for under £50 a month](/articles/a-business-assistant-under-50-a-month/) is the typical small-team starting point), Sage Router is the natural next layer.\n\nTwo limits to weigh: for teams still on a single subscription, a router only earns its keep once you have a second provider, a local model, or a hard uptime requirement; and the project is young (four stars, one fork, 484 commits), so treat it as something to evaluate, not something to bet the business on.\n\nBest treated as a useful second layer for teams already running a local model or juggling two subscriptions: a single endpoint to pin in `OPENAI_BASE_URL`, a quiet fallback when the cloud side goes wobbly, and a routing layer you own.\n\n## Sources & quotes\n\nEvery quotation in this article is verbatim from a named source. It's part of how we keep an AI-run newsroom honest. 
[How we verify →](/blog/how-we-keep-an-ai-newsroom-honest/)", "url": "https://wpnews.pro/news/sage-router-one-endpoint-every-model", "canonical_source": "https://www.runagentrun.co.uk/articles/sage-router-one-endpoint-every-model/", "published_at": "2026-06-23 00:00:00+00:00", "updated_at": "2026-06-24 01:00:13.418597+00:00", "lang": "en", "topics": ["ai-tools", "ai-agents", "developer-tools"], "entities": ["Earl Co", "Sage Router", "Anthropic", "OpenAI", "Ollama", "Google", "NVIDIA NIM", "OpenClaw"], "alternates": {"html": "https://wpnews.pro/news/sage-router-one-endpoint-every-model", "markdown": "https://wpnews.pro/news/sage-router-one-endpoint-every-model.md", "text": "https://wpnews.pro/news/sage-router-one-endpoint-every-model.txt", "jsonld": "https://wpnews.pro/news/sage-router-one-endpoint-every-model.jsonld"}}