I built an LLM router that doesn't use an LLM

Developer Lore released Wayfinder, an open-source LLM router that determines whether to send a prompt to a local or cloud model by analyzing structural features like length, headings, and code, without calling any model for the routing decision. The tool runs offline in microseconds, aims to reduce costs by routing simple prompts locally, and is designed to be calibrated on user traffic.

Deterministic prompt-complexity routing: send each prompt to your local or cloud model, offline, with no model call to decide.

[Quickstart](#quickstart) · [Benchmark](/itsthelore/wayfinder-router/blob/main/benchmarks/README.md) · [How it compares](#how-it-compares) · [Explainer](/itsthelore/wayfinder-router/blob/main/EXPLAINER.md) · [Changelog](/itsthelore/wayfinder-router/blob/main/CHANGELOG.md)

| No model call to decide the route | Deterministic and fully offline | Calibrate on your own data | Bring your own key, self-hosted |
|---|---|---|---|

Wayfinder reads the shape of a prompt (its length, headings, lists, and code) plus difficulty cues in the wording, like proofs, math, and hard constraints, and tells you whether to send it to your small local model or your big cloud one. It decides in microseconds, runs offline, and never calls another model to make the call. No API key, no network, no model call to decide. You get a score and a recommendation; what you do with it is up to you.

Cheap prompts stay local, hard ones go to the expensive model, and you stop paying frontier prices for "summarize this" and "fix my typo."

Most routers decide by calling a model: a trained classifier, an LLM judge, or a hosted API. That adds latency, cost, and a little randomness to the exact step that is meant to save you money. Wayfinder reads structure and wording instead, so the decision is free and the same every time.

| router | decides by | model call? | self-host | calibrate |
|---|---|---|---|---|
| Wayfinder | deterministic structural score | no | yes | yes |
| RouteLLM | trained classifier (preference data) | yes | yes | retrain |
| NotDiamond / Martian | learned, hosted | yes | no | via platform |
| OpenRouter Auto | hosted auto-router | yes | no | - |
| LiteLLM | provider proxy (not complexity-routed) | no | yes | n/a |

Wayfinder is not chasing a top accuracy number. It is the one router you can run offline, with zero model calls, and tune on your own traffic.

By default it scores prompt structure only. It can also read lexical cues (proofs, math, constraints), but those ship off by default: a [double-blind test](/itsthelore/wayfinder-router/blob/main/benchmarks/blind-eval.md) on independently-authored prompts showed the lexical lift does not generalize (it catches ~20% of unseen hard prompts and loses to a plain word-count baseline), so they are opt-in: raise their weights only if you've calibrated them to your own traffic's vocabulary. A prompt whose difficulty is purely semantic (a subtle code snippet, an innocent-looking "what is the 100th prime number?") has no structural tell, and a semantic router will beat it there. The edge that survives the blind test is the one to lead with: a deterministic, sub-millisecond, offline routing decision with no model call.

The [benchmark](/itsthelore/wayfinder-router/blob/main/benchmarks/README.md) (`make benchmark`) shows where it wins and where it loses, against honest baselines and a perfect oracle. Point it at RouterBench or RouterArena for graded numbers.

New here, or weighing it up? The [FAQ](/itsthelore/wayfinder-router/blob/main/docs/faq.md) gives straight answers, including where it loses (it's no better than random on RouterBench's short-but-hard items) and why you'd still run it.

Two ways to see the routing decision for yourself, with no API keys, no models, and nothing on the network.

In your terminal: a decision-first chat in the Wayfinder palette.
The terminal chat ships in the default install, so there's nothing extra to add, or run it with no install at all via `uvx`:

```bash
uvx wayfinder-router chat --dry-run   # zero install, zero keys
# or:
pip install wayfinder-router && wayfinder-router chat
```

Every turn shows where it routed (● LOCAL / ◆ CLOUD), the structural score and why (`/why`), and the running savings vs always-cloud. `/init` sets up models without leaving the chat, `/route` · `/local` · `/cloud` force a turn, and conversations persist across sessions (`/threads`).

In your browser: the web chat UI with a live threshold slider:

```bash
pip install "wayfinder-router[gateway]"
wayfinder-router webchat --dry-run   # opens http://127.0.0.1:8088/demo
```

`webchat` is a thin launcher over `serve` (the gateway and its `/demo` page; `--no-open`, `--port`, `--host 0.0.0.0`, `--dry-run`); `serve` is the headless command.

Both surfaces show, for every message, where it routed (local vs cloud), the complexity score and why (the feature breakdown), and the cost saved vs always-cloud. With no config both are decision-only (`--dry-run` for the web; the terminal's preview), so you can poke at it with zero setup. To get real replies, run `wayfinder-router init` to scaffold `gateway.models`, then `wayfinder-router doctor` to confirm your keys resolve; see [Quickstart](#quickstart).

Wayfinder forwards each call to an OpenAI-style `/chat/completions` endpoint, so if your provider speaks that (and most do), it just works. A tier is one `base_url`, a model name, and a key read from the environment at request time; no SDK, no per-provider code. Pair a free local model with a hosted one, or run two cloud tiers.

…plus Groq, Together, OpenRouter, Fireworks, DeepSeek, and local servers (vLLM, LM Studio, llama.cpp), plus any OpenAI-compatible endpoint that takes a Bearer key.

Put Wayfinder in front of your models. Your app keeps speaking the OpenAI API; you just change one `base_url`.
- Scaffold a config. `init` writes a starter `wayfinder-router.toml` (keyless local Ollama → Anthropic cloud) plus a `.env.example`, then checks your keys:

  ```bash
  pip install "wayfinder-router[gateway]"
  wayfinder-router init                  # starter config (hybrid preset)
  wayfinder-router init --preset openai  # two OpenAI tiers (gpt-4o-mini → gpt-4o)
  wayfinder-router init --preset gemini  # two Gemini tiers (gemini-2.5-flash → gemini-2.5-pro)
  wayfinder-router init --interactive    # pick providers/models step by step
  ```

  Or describe your two models in `wayfinder-router.toml` by hand:

  ```toml
  [routing]
  threshold = 0.5   # below -> local, at/above -> cloud

  [gateway.models.local]
  base_url = "http://localhost:11434/v1"
  model = "llama3.2"

  [gateway.models.cloud]
  base_url = "https://api.openai.com/v1"
  model = "gpt-4o"
  api_key_env = "OPENAI_API_KEY"   # read from this env var, never stored
  api_key_cmd = "op read op://Private/OpenAI/credential"   # optional: fill it from a vault
  ```

  Wayfinder never stores secrets: a model names an env var (`api_key_env`) and the key is read from your environment at request time. There is nothing to "install"; just export the variable.

  Prefer not to paste a raw key into your shell? Add an optional `api_key_cmd` and Wayfinder fills that variable from your secret store at startup: `op read …` (1Password), `security …` (macOS Keychain), `secret-tool …` (Linux), `pass` / `gopass`, `vault kv get …`, `aws secretsmanager get-secret-value …`, `bw`, `doppler`, `gcloud secrets …`, or any command that prints the secret. The key is held in memory only and is still never written to disk. `wayfinder-router doctor` detects which of these tools you have installed and suggests the exact line.

- Set your key(s), then run the gateway. `doctor` re-checks the config and whether each model's key resolves (✓ set / ✗ not set) before you start:

  ```bash
  export ANTHROPIC_API_KEY=sk-...   # or OPENAI_API_KEY, per your config
  wayfinder-router doctor           # ✓/✗ per model: is each key set?
  wayfinder-router serve --port 8088
  ```

- Point your existing client at it.
No code change:

```python
import openai

# Only the base URL changes; the gateway ignores the client-side API key.
client = openai.OpenAI(base_url="http://localhost:8088/v1", api_key="unused")
client.chat.completions.create(
    model="auto",
    messages=[{"role": "user", "content": "..."}],
)
```

Easy prompts go local, hard ones go cloud, and every response carries `x-wayfinder-router-model` and `x-wayfinder-router-score` so you can see where it went. Want to steer one request? Pin it with `model="cloud"` / `prefer-local`, or move the cut for a single call with an `X-Wayfinder-Threshold` header (see [Steer a single request](#steer-a-single-request)).

Check it's working:

```bash
curl -s localhost:8088/healthz
# {"status":"ok","models":["cloud","local"]}

curl -s -D - -o /dev/null http://localhost:8088/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"auto","messages":[{"role":"user","content":"hi"}]}' \
  | grep -i x-wayfinder-router
# x-wayfinder-router-model: local
# x-wayfinder-router-score: 0.00
```

No backends yet? `wayfinder-router serve --dry-run` answers with the routing decision instead of calling an upstream, so you can feel the routing in 30 seconds before wiring up real models.

| command | what you get |
|---|---|
| `pip install wayfinder-router` | scorer, CLI, Python API, and the terminal chat (`chat`); the scorer/library imports stay dependency-light |
| `pip install "wayfinder-router[gateway]"` | adds the OpenAI-compatible routing gateway, the common case for serving |
| `pip install "wayfinder-router[ui]"` | adds the local calibrate / explain / configure UI |
| `pip install "wayfinder-router[all]"` | gateway and UI on top of the default install |

Wayfinder sits behind whatever OpenAI-compatible client you already use. You point that client's `base_url` at the gateway once, and from then on it is invisible. The same client serves a request whether it routes local or hosted.
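Because the gateway reports its decision in response headers, any HTTP client can log where a request went. The sketch below uses only the Python standard library and assumes a gateway listening on `http://localhost:8088`; the helper names `build_request`, `routing_info`, and `ask` are illustrative and not part of Wayfinder.

```python
import json
import urllib.request

GATEWAY = "http://localhost:8088/v1"  # assumed: gateway started with `serve --port 8088`


def build_request(prompt: str, base_url: str = GATEWAY) -> urllib.request.Request:
    """Build an OpenAI-style chat request; model="auto" lets the gateway choose."""
    body = json.dumps(
        {"model": "auto", "messages": [{"role": "user", "content": prompt}]}
    ).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def routing_info(headers) -> dict:
    """Pull the routing decision out of response headers (any mapping with .get)."""
    score = headers.get("x-wayfinder-router-score")
    return {
        "model": headers.get("x-wayfinder-router-model"),
        "score": float(score) if score is not None else None,
    }


def ask(prompt: str) -> dict:
    """Send one prompt through the gateway and report where it was routed."""
    with urllib.request.urlopen(build_request(prompt)) as resp:
        # resp.headers looks names up case-insensitively; a plain dict does not.
        return routing_info(resp.headers)
```

Calling `ask("hi")` against a running gateway (including one started with `--dry-run`) returns the model name and score for that prompt, which is enough to build a per-request routing log.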
```
your client (chat app, IDE, agent, or code)
      |
      v
Wayfinder gateway (scores, picks a model)
      |
      |-- low  --> local  (Ollama, vLLM)
      |-- high --> hosted (OpenAI, any /v1)
      |
      v
response returns via the same client, with x-wayfinder-router-* headers
```

A few things follow from this:

- The interface in front is yours. A chat GUI (Open WebUI, LibreChat), an IDE assistant with a custom endpoint (Cursor, Continue), an agent framework, or your own code on the OpenAI SDK. Want a chat window today? Put Open WebUI in front and point it at the gateway.
- Local and hosted are backends, not apps. The local model is just a server (Ollama, LM Studio, vLLM, llama.cpp) speaking OpenAI's `/v1`; the hosted one is the same shape. The user never switches UIs and usually never knows which model answered.
- The score is computed, not a second opinion. Asking a model how hard a prompt is would be slow, non-deterministic, and would cost a model call to decide whether to make a model call. Wayfinder scans the prompt instead, turning structure (length, headings, steps, links, code, tables) and difficulty cues in the wording (reasoning terms, math symbols, constraints) into a 0.0 to 1.0 value, and compares it to your threshold. Same prompt, same threshold, same answer. It is a proxy for difficulty, not a verdict, which is why the threshold is yours to tune.
- Keys are read from the environment at request time and never touch the config file or the scored path.

```bash
echo "Summarise this paragraph in one sentence." | wayfinder-router route -
```

```
Recommended Model: local
Complexity Score: 0.00
mode: tiered
Tiers:
  >= 0.00  local  <-
  >= 0.50  cloud
Contributing Features:
  Word Count: 6
  ...
```
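To make "computed, not a second opinion" concrete, here is a minimal sketch of a structural scorer. It is not Wayfinder's scoring code: the feature set is reduced, and the weights, the exponential squashing, and the function names are all assumptions made for illustration.

```python
import math
import re

CODE_FENCE = "`" * 3  # a Markdown code fence, built this way to avoid nesting fences

# Hypothetical weights; Wayfinder's real defaults are not shown in this document.
WEIGHTS = {
    "word_count": 0.004,
    "heading_count": 0.3,
    "list_item_count": 0.1,
    "code_block_count": 0.5,
}


def features(prompt: str) -> dict:
    """Count cheap structural signals with string operations; no model involved."""
    return {
        "word_count": len(prompt.split()),
        "heading_count": len(re.findall(r"^#{1,6}\s", prompt, flags=re.M)),
        "list_item_count": len(re.findall(r"^\s*(?:[-*+]|\d+\.)\s", prompt, flags=re.M)),
        "code_block_count": prompt.count(CODE_FENCE) // 2,
    }


def score(prompt: str) -> float:
    """Weighted feature sum squashed into [0, 1)."""
    raw = sum(WEIGHTS[name] * value for name, value in features(prompt).items())
    return 1.0 - math.exp(-raw)


def route(prompt: str, threshold: float = 0.5) -> str:
    """Below the threshold goes local; at or above it goes cloud."""
    return "cloud" if score(prompt) >= threshold else "local"
```

Every step is a pure function of the prompt text, so the same prompt and threshold always yield the same route, and the only tuning surface is the weights and the threshold.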
Add `--json` for machine consumers (an agent reads this and routes to its own model):

```json
{
  "schema_version": "3",
  "score": 0.66,
  "recommendation": "cloud",
  "mode": "tiered",
  "features": { "word_count": 545, "heading_count": 12, "reasoning_term_count": 3, "...": 0 },
  "tiers": [
    { "min_score": 0.0, "model": "local" },
    { "min_score": 0.5, "model": "cloud" }
  ]
}
```

Wayfinder reads its own `wayfinder-router.toml`, found by walking up from where you run it. There are three modes, in precedence order (classifier > tiers > threshold); the scalar-score weights apply to any of them.

Binary (the default) is a single cut:

```toml
[routing]
threshold = 0.6
weights = { word_count = 4.0, list_item_count = 2.5 }
```

`--threshold N` overrides it for one run; `WAYFINDER_ROUTER_THRESHOLD` overrides it from the environment.

To switch the lexical cues on, raise their weights and cut at the knee: the one held-out improvement over the structural default on real frontier traffic (skill −0.038 → +0.057, 61% cost saved on RouterBench). See [docs/lexical-routing.md](/itsthelore/wayfinder-router/blob/main/docs/lexical-routing.md) and the ready-to-edit [examples/wayfinder-router.lexical.toml](/itsthelore/wayfinder-router/blob/main/examples/wayfinder-router.lexical.toml); recalibrate the threshold to your own traffic (a ~20-prompt bootstrap is only a smoke test; see [benchmarks/calibration-eval.md](/itsthelore/wayfinder-router/blob/main/benchmarks/calibration-eval.md)).

Tiered routes ordered score bands to any number of models:

```toml
[[routing.tiers]]
min_score = 0.0
model = "llama-3b"

[[routing.tiers]]
min_score = 0.3
model = "llama-70b"

[[routing.tiers]]
min_score = 0.6
model = "claude-cloud"
```

Classifier is a fitted multinomial-logistic model, argmax over per-model linear scores. You usually generate it with `calibrate` rather than write it by hand. Each gateway.models.