{"slug": "i-built-an-llm-router-that-doesn-t-use-an-llm", "title": "I built an LLM router that doesn't use an LLM", "summary": "Developer Lore released Wayfinder, an open-source LLM router that determines whether to send a prompt to a local or cloud model by analyzing structural features like length, headings, and code, without calling any model for the routing decision. The tool runs offline in microseconds, aims to reduce costs by routing simple prompts locally, and is designed to be calibrated on user traffic.", "body_md": "**Deterministic prompt-complexity routing — send each prompt to your\nlocal or cloud model, offline, with no model call to decide.**\n\n[Quickstart](#quickstart) ·\n[Benchmark](/itsthelore/wayfinder-router/blob/main/benchmarks/README.md) ·\n[How it compares](#how-it-compares) ·\n[Explainer](/itsthelore/wayfinder-router/blob/main/EXPLAINER.md) ·\n[Changelog](/itsthelore/wayfinder-router/blob/main/CHANGELOG.md)\n\nNo model call to decide the route ·\nDeterministic and fully offline ·\nCalibrate on your own data ·\nBring your own key, self-hosted\n\nWayfinder reads the shape of a prompt — its length, headings, lists, and code — plus difficulty cues in the wording, like proofs, math, and hard constraints, and tells you whether to send it to your small local model or your big cloud one. It decides in microseconds, runs offline, and never calls another model to make the call. No API key, no network, no model call to decide. You get a score and a recommendation; what you do with it is up to you.\n\nCheap prompts stay local, hard ones go to the expensive model, and you stop paying frontier prices for \"summarize this\" and \"fix my typo.\"\n\nMost routers decide by calling a model: a trained classifier, an LLM judge, or a hosted API. That adds latency, cost, and a little randomness to the exact step that is meant to save you money. 
Wayfinder reads structure and wording instead, so the decision is free and the same every time.\n\n| router | decides by | model call? | self-host | calibrate |\n|---|---|---|---|---|\n| Wayfinder | deterministic structural score | no | yes | yes |\n| RouteLLM | trained classifier (preference data) | yes | yes | retrain |\n| NotDiamond / Martian | learned, hosted | yes | no | via platform |\n| OpenRouter (Auto) | hosted auto-router | yes | no | — |\n| LiteLLM | provider proxy (not complexity-routed) | no | yes | n/a |\n\nWayfinder is not chasing a top accuracy number. It is the one router you can run\noffline, with zero model calls, and tune on your own traffic. By default it scores\nprompt *structure* only. It can also read lexical cues (proofs, math, constraints),\nbut those ship **off by default**: a [double-blind test](/itsthelore/wayfinder-router/blob/main/benchmarks/blind-eval.md)\non independently-authored prompts showed the lexical lift does *not* generalize (it\ncatches ~20% of unseen hard prompts and loses to a plain word-count baseline), so\nthey are opt-in — raise their weights only if you've calibrated them to your own\ntraffic's vocabulary. A prompt whose difficulty is purely semantic — a subtle code\nsnippet, an innocent-looking \"what is the 100th prime number?\" — has no structural\ntell, and a semantic router will beat it there. The edge that survives the blind\ntest is the one to lead with: a deterministic, sub-millisecond, offline routing\ndecision with no model call. The [benchmark](/itsthelore/wayfinder-router/blob/main/benchmarks/README.md) (`make benchmark`)\nshows where it wins and where it loses, against honest baselines and a perfect\noracle. Point it at RouterBench or RouterArena for graded numbers.\n\nNew here, or weighing it up? 
The [FAQ](/itsthelore/wayfinder-router/blob/main/docs/faq.md) gives straight answers —\nincluding where it loses (it's no better than random on RouterBench's short-but-hard\nitems) and why you'd still run it.\n\nTwo ways to see the routing decision for yourself — no API keys, no models, nothing on the network.\n\n**In your terminal** — a decision-first chat in the Wayfinder palette. The terminal\nchat ships in the default install, so there's nothing extra to add — or run it with\nno install at all via `uvx`:\n\n```\nuvx wayfinder-router chat --dry-run      # zero install, zero keys\n# or:  pip install wayfinder-router && wayfinder-router chat\n```\n\nEvery turn shows where it routed (`● LOCAL` / `◆ CLOUD`), the structural score and *why*\n(`/why`), and the running savings vs always-cloud. `/init` sets up models without leaving\nthe chat, `/route` · `/local` · `/cloud` force a turn, and conversations persist across\nsessions (`/threads`).\n\n**In your browser** — the web chat UI with a live threshold slider:\n\n```\npip install \"wayfinder-router[gateway]\"\nwayfinder-router webchat --dry-run\n# opens http://127.0.0.1:8088/demo\n```\n\n`webchat` is a thin launcher over `serve` (the gateway and its `/demo` page; `--no-open`,\n`--port`, `--host 0.0.0.0`, `--dry-run`); `serve` is the headless command. Both surfaces\nshow, for every message, where it routed (local vs cloud), the complexity score and *why*\n(the feature breakdown), and the cost saved vs always-cloud. With no config both are\ndecision-only (`--dry-run` for the web; the terminal's preview), so you can poke at it with\nzero setup. 
To get real replies, run `wayfinder-router init` to scaffold `[gateway.models]`\n(then `wayfinder-router doctor` to confirm your keys resolve) — see [Quickstart](#quickstart).\n\nWayfinder forwards each call to an OpenAI-style `/chat/completions` endpoint — so if\nyour provider speaks that (and most do), **it just works.** A tier is one `base_url`,\na model name, and a key read from the environment at request time; no SDK, no\nper-provider code. Pair a free local model with a hosted one, or run two cloud tiers.\n\n…plus Groq, Together, OpenRouter, Fireworks, DeepSeek, and local servers\n(vLLM, LM Studio, llama.cpp) — + any OpenAI-compatible endpoint\nthat takes a Bearer key.\n\nPut Wayfinder in front of your models. Your app keeps speaking the OpenAI API; you\njust change one `base_url`.\n\n- Scaffold a config — `init` writes a starter `wayfinder-router.toml` (keyless local Ollama → Anthropic cloud) plus a `.env.example`, then checks your keys:\n\n```\npip install \"wayfinder-router[gateway]\"\nwayfinder-router init                 # starter config (hybrid preset)\nwayfinder-router init --preset openai # two OpenAI tiers (gpt-4o-mini → gpt-4o)\nwayfinder-router init --preset gemini # two Gemini tiers (gemini-2.5-flash → gemini-2.5-pro)\nwayfinder-router init --interactive   # pick providers/models step by step\n```\n\nOr describe your two models in `wayfinder-router.toml` by hand:\n\n``` toml\n[routing]\nthreshold = 0.5            # below -> local, at/above -> cloud\n\n[gateway.models.local]\nbase_url = \"http://localhost:11434/v1\"\nmodel = \"llama3.2\"\n\n[gateway.models.cloud]\nbase_url = \"https://api.openai.com/v1\"\nmodel = \"gpt-4o\"\napi_key_env = \"OPENAI_API_KEY\"   # read from this env var, never stored\n# api_key_cmd = \"op read op://Private/OpenAI/credential\"  # optional: fill it from a vault\n```\n\nWayfinder never stores secrets: a model names an env var (`api_key_env`) and the key is read from your 
environment at request time. There is nothing to \"install\" — just export the variable. Prefer not to paste a raw key into your shell? Add an optional `api_key_cmd` and Wayfinder fills that variable from your secret store at startup — `op read …` (1Password), `security …` (macOS Keychain), `secret-tool …` (Linux), `pass` / `gopass`, `vault kv get …`, `aws secretsmanager get-secret-value …`, `bw`, `doppler`, `gcloud secrets …`, or any command that prints the secret. The key is held in memory only, still never written to disk. `wayfinder-router doctor` detects which of these tools you have installed and suggests the exact line.\n\n- Set your key(s), then run the gateway. `doctor` re-checks the config and whether each model's key resolves (`✓ set` / `✗ not set`) before you start:\n\n```\nexport ANTHROPIC_API_KEY=sk-...     # or OPENAI_API_KEY, per your config\nwayfinder-router doctor             # ✓/✗ per model — is each key set?\nwayfinder-router serve --port 8088\n```\n\n- Point your existing client at it. No code change:\n\n```\nclient = openai.OpenAI(base_url=\"http://localhost:8088/v1\", api_key=\"unused\")\nclient.chat.completions.create(model=\"auto\", messages=[{\"role\": \"user\", \"content\": \"...\"}])\n```\n\nEasy prompts go local, hard ones go cloud, and every response carries\n`x-wayfinder-router-model` and `x-wayfinder-router-score` so you can see where it\nwent. Want to steer one request? 
Pin it with `model=\"cloud\"` / `prefer-local`, or\nmove the cut for a single call with an `X-Wayfinder-Threshold` header (see\n[Steer a single request](#steer-a-single-request)).\n\nCheck it's working:\n\n```\ncurl -s localhost:8088/healthz\n# {\"status\":\"ok\",\"models\":[\"cloud\",\"local\"]}\n\ncurl -s -D - -o /dev/null http://localhost:8088/v1/chat/completions \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\"model\":\"auto\",\"messages\":[{\"role\":\"user\",\"content\":\"hi\"}]}' \\\n  | grep -i x-wayfinder-router\n# x-wayfinder-router-model: local\n# x-wayfinder-router-score: 0.00\n```\n\nNo backends yet? `wayfinder-router serve --dry-run` answers with the routing\ndecision instead of calling an upstream, so you can feel the routing in 30 seconds\nbefore wiring up real models.\n\n| command | what you get |\n|---|---|\n| `pip install wayfinder-router` | scorer, CLI, Python API, and the terminal chat (`chat`); the scorer/library imports stay dependency-light |\n| `pip install \"wayfinder-router[gateway]\"` | adds the OpenAI-compatible routing gateway, the common case for serving |\n| `pip install \"wayfinder-router[ui]\"` | adds the local calibrate / explain / configure UI |\n| `pip install \"wayfinder-router[all]\"` | gateway and UI on top of the default install |\n\nWayfinder sits behind whatever OpenAI-compatible client you already use. You point\nthat client's `base_url` at the gateway once, and from then on it is invisible. 
The\nsame client serves a request whether it routes local or hosted.\n\n```\n  your client   (chat app, IDE, agent, or code)\n       |\n       v\n  Wayfinder gateway   scores, picks a model\n       |\n       |-- low  -->  local    (Ollama, vLLM)\n       |-- high -->  hosted   (OpenAI, any /v1)\n       |\n       v\n  response returns via the same client,\n  with x-wayfinder-router-* headers\n```\n\nA few things follow from this:\n\n- **The interface in front is yours.** A chat GUI (Open WebUI, LibreChat), an IDE assistant with a custom endpoint (Cursor, Continue), an agent framework, or your own code on the OpenAI SDK. Want a chat window today? Put Open WebUI in front and point it at the gateway.\n- **Local and hosted are backends, not apps.** The local model is just a server (Ollama, LM Studio, vLLM, llama.cpp) speaking OpenAI's `/v1`; the hosted one is the same shape. The user never switches UIs and usually never knows which model answered.\n- **The score is computed, not a second opinion.** Asking a model how hard a prompt is would be slow, non-deterministic, and would cost a model call to decide whether to make a model call. Wayfinder scans the prompt instead — structure (length, headings, steps, links, code, tables) and difficulty cues in the wording (reasoning terms, math symbols, constraints) — into a `0.0`-`1.0` value and compares it to your threshold. Same prompt, same threshold, same answer. 
It is a proxy for difficulty, not a verdict, which is why the threshold is yours to tune.\n\nKeys are read from the environment at request time and never touch the config file or the scored path.\n\n```\necho \"Summarise this paragraph in one sentence.\" | wayfinder-router route -\nRecommended Model: local\nComplexity Score: 0.00  (mode: tiered)\n\nTiers:\n  >= 0.00  local <-\n  >= 0.50  cloud\n\nContributing Features:\n  Word Count: 6\n  ...\n```\n\nAdd `--json` for machine consumers (an agent reads this and routes to its own\nmodel):\n\n```\n{\n  \"schema_version\": \"3\",\n  \"score\": 0.66,\n  \"recommendation\": \"cloud\",\n  \"mode\": \"tiered\",\n  \"features\": { \"word_count\": 545, \"heading_count\": 12, \"reasoning_term_count\": 3, \"...\": 0 },\n  \"tiers\": [{ \"min_score\": 0.0, \"model\": \"local\" }, { \"min_score\": 0.5, \"model\": \"cloud\" }]\n}\n```\n\nWayfinder reads its own `wayfinder-router.toml`, found by walking up from where you\nrun it. There are three modes, in precedence order (classifier > tiers >\nthreshold); the scalar-score `weights` apply to any of them.\n\n**Binary** (the default) is a single cut:\n\n```\n[routing]\nthreshold = 0.6\nweights = { word_count = 4.0, list_item_count = 2.5 }\n```\n\n`--threshold N` overrides it for one run; `WAYFINDER_ROUTER_THRESHOLD` overrides it\nfrom the environment.\n\nTo switch the lexical cues on, raise their `weights` and cut at the knee — the one\nheld-out improvement over the structural default on real frontier traffic (skill\n−0.038 → +0.057, 61% cost saved on RouterBench). 
See\n[docs/lexical-routing.md](/itsthelore/wayfinder-router/blob/main/docs/lexical-routing.md) and the ready-to-edit\n[`examples/wayfinder-router.lexical.toml`](/itsthelore/wayfinder-router/blob/main/examples/wayfinder-router.lexical.toml); recalibrate the threshold to your own traffic (a ~20-prompt bootstrap is only a smoke test — see\n[`benchmarks/calibration-eval.md`](/itsthelore/wayfinder-router/blob/main/benchmarks/calibration-eval.md)).\n\n**Tiered** routes ordered score bands to any number of models:\n\n```\n[[routing.tiers]]\nmin_score = 0.0\nmodel = \"llama-3b\"\n[[routing.tiers]]\nmin_score = 0.3\nmodel = \"llama-70b\"\n[[routing.tiers]]\nmin_score = 0.6\nmodel = \"claude-cloud\"\n```\n\n**Classifier** is a fitted multinomial-logistic model, `argmax` over per-model\nlinear scores. You usually generate it with `calibrate` rather than write it by\nhand.\n\nEach `[gateway.models.<name>]` block maps a routed name to an upstream `base_url`, a\n`model`, and an optional `api_key_env` (the name of an environment variable, never\nthe secret itself). The gateway is the only part that touches keys or the network;\nthe scorer, config, and calibrator stay pure and offline.\n\nThe cut is a proxy, so tune it against your own traffic. `wayfinder-router calibrate`\nreads a labeled JSONL dataset (`{\"text\": ..., \"label\": ...}`) and prints\na config fragment. It runs offline and never calls a model; the labels are your\nground truth.\n\n```\nwayfinder-router calibrate data.jsonl --mode threshold              # sweep the binary cut\nwayfinder-router calibrate data.jsonl --mode tiers                  # ordinal multi-model\nwayfinder-router calibrate data.jsonl --mode classifier --out wayfinder-router.toml\n```\n\nThe fragment drops straight into `wayfinder-router.toml`; the accuracy and chosen\nbreakpoints print to stderr. 
The classifier is fit by deterministic L2-regularized\nNewton/IRLS, pure Python, converging in a handful of iterations.\n\nTo pick a cut in cost terms instead of bare accuracy, use a cost-aware objective.\n`--objective knee` chooses the cost-aware knee automatically (it maximizes\nquality-recovered × cost-saved — no target to guess, and it can't collapse to\nalways-routing-to-the-expensive-model the way pure accuracy does on skewed labels);\n`--objective cost-quality --target-savings X` instead holds a specific savings floor.\nAdd `--weights` to score with — and emit — custom feature weights, e.g. the lexical\nopt-in, so the output is a complete, deployable config (see\n[docs/lexical-routing.md](/itsthelore/wayfinder-router/blob/main/docs/lexical-routing.md)):\n\n```\nwayfinder-router calibrate data.jsonl --mode threshold --objective knee \\\n  --costs local=0.2,cloud=1.0 \\\n  --weights reasoning_term_count=5,math_symbol_count=3,constraint_term_count=1.5\n```\n\nCost is metadata only — it shapes the calibrated cut and is reported on the\n`/metrics` endpoint, but never enters a per-request decision, which stays\ndeterministic and free.\n\nThe deployment's config sets the default boundary, but a client can override the decision for one request over plain OpenAI transport. 
An override only changes where the request goes; the prompt is still scored, and nothing adds a model call.\n\n- **The `model` field is a routing directive.** `auto` (or any normal model id) lets Wayfinder decide; a configured endpoint name (`local`, `cloud`) pins the request there; `prefer-local` / `prefer-hosted` pin to the low / high end of your router (`prefer-cloud` still works as an alias of `prefer-hosted`).\n- **An `X-Wayfinder-Threshold` header re-cuts the decision** for that request, a number in `0.0`-`1.0` reusing your weights (binary routers only).\n\n```\n# Pin one call to cloud regardless of score:\nclient.chat.completions.create(model=\"cloud\", messages=[...])\n# Or move the cut for one call (keep model=\"auto\"):\nclient.chat.completions.create(\n    model=\"auto\", messages=[...], extra_headers={\"X-Wayfinder-Threshold\": \"0.8\"}\n)\n```\n\nEach response adds `x-wayfinder-router-mode` (`scored` / `pinned` /\n`threshold-override`) next to the `-model` and `-score` headers, so you can see\nwhich channel decided the route.\n\nBecause the `model` field is a routing directive, any OpenAI-compatible chat UI can\ndrive routing with no code change: the app's normal model dropdown becomes a\nper-conversation routing picker (`auto` / `prefer-local` / `prefer-hosted` / a\npinned endpoint). The gateway lists these at `GET /v1/models`, so a UI discovers\nthem on its own.\n\n- **LibreChat** — copy `examples/librechat.yaml` and `examples/docker-compose.override.yml` into your checkout, run `docker compose up`, and pick the \"Wayfinder\" endpoint.\n- **Open WebUI** — add an OpenAI connection pointing at the gateway; it auto-discovers the routing options.\n\nSee [examples/](/itsthelore/wayfinder-router/blob/main/examples) for both. 
The one thing a stock UI can't express is a\nlive per-conversation threshold slider; that's what the `wayfinder-chat` fork adds,\nand this no-fork path proves it out first.\n\nWayfinder's controls are spread across the tools you already run, so it's easy not to notice it working. Four surfaces show or steer routing:\n\n| surface | what it shows | where |\n|---|---|---|\n| Model dropdown | the routing picker (`auto` / `prefer-local` / `prefer-hosted` / a pinned endpoint) | your client, from `GET /v1/models` |\n| Response headers | where each request went and why (`-model` / `-score` / `-mode` / `-request-id`) | every response |\n| Debug body field | the decision inside the response body, opt-in | request header `X-Wayfinder-Debug: true` |\n| Dashboard | recent decisions, per-model counts, scores — metadata only, never prompt text | `GET /router` (JSON at `/router/recent`) |\n\nThe dashboard is separate from the off-path `wayfinder-router ui` console, which is\nfor tuning, not production traffic.\n\nDon't guess the cut; learn it from your own judgment of local versus hosted output. The loop is: collect judgments, calibrate, route automatically.\n\nBootstrap it with A/B onboarding. 
For each sample prompt, `wayfinder-router onboard` runs both arms and asks which was good enough; the answer is a label:\n\n```\nwayfinder-router onboard prompts.jsonl --arms local,cloud --calibrate > wayfinder-router.toml\n```\n\nThe comparison goes to stderr; `--calibrate` prints the resulting config to stdout.\nEach judgment appends a `{\"text\", \"label\"}` line to a feedback log, which is itself\nthe `calibrate` dataset, so the log turns straight into a config.\n\nOnce you're routing automatically, keep it honest by recording which model was actually good enough:\n\n```\ncurl localhost:8088/v1/feedback -d '{\"text\": \"...\", \"label\": \"cloud\"}'\n```\n\nThen re-fit on a schedule from cron, a k8s CronJob, or a click in the UI.\nRecalibration rewrites only the `[routing]` section and preserves your `[gateway]`\nendpoints, and a running gateway hot-reloads the result with no restart:\n\n```\nwayfinder-router recalibrate                  # log -> calibrate -> write config\nwayfinder-router recalibrate --min-labels 50  # no-op until you have enough signal\n```\n\nThe judging runs models, so it lives in the gateway layer (with your key); the scoring core stays untouched and the log carries no secrets.\n\nThe CLI, onboarding, and UI are for operators and bootstrapping. In production, prompts flow through the gateway (transparent) or the library (in-process), so routing happens where prompts already are.\n\nRun the gateway as a service, sidecar or standalone:\n\n```\ndocker build -t wayfinder-router . && docker run -p 8088:8088 -v \"$PWD/data:/data\" wayfinder-router\n# or: docker compose up gateway   (see docker-compose.example.yml)\n```\n\nPoint your existing client at it with no app change. 
Anything that speaks the\nOpenAI API takes a `base_url`, including agent frameworks (LangChain, LlamaIndex),\nIDE assistants with a custom endpoint (Cursor, Continue), and gateways like LiteLLM:\n\n```\nclient = openai.OpenAI(base_url=\"http://localhost:8088/v1\", api_key=\"unused\")\n```\n\nSee **Integration recipes** for copy-paste setup across chat UIs\n(Open WebUI, LibreChat, Jan), editors (Continue, Cline, Zed, JetBrains), agent frameworks\n(LangChain, LlamaIndex, CrewAI, AutoGen, the OpenAI Agents SDK, the Vercel AI SDK), and\nCLIs (aider, Copilot CLI) — plus the canonical `OPENAI_BASE_URL` / `OPENAI_API_KEY` pair.\n\n**Claude Code** speaks Anthropic's Messages API rather than OpenAI's, so the gateway exposes a\n`POST /v1/messages` adapter (WF-DESIGN-0011) that translates Anthropic ⇄ OpenAI in both\ndirections — streaming and tool use included. Point it at the gateway root and Claude Code\nroutes through Wayfinder like any other client:\n\n```\nexport ANTHROPIC_BASE_URL=\"http://localhost:8088\"   # client appends /v1/messages\nexport ANTHROPIC_API_KEY=\"unused\"                   # the gateway uses each upstream's own key\nclaude\n```\n\nWire feedback from wherever your users are. Your app, IDE, or chat shows a thumbs-up or thumbs-down and posts the judgment; the next recalibration learns from it:\n\n```\nfetch(\"http://localhost:8088/v1/feedback\", {\n  method: \"POST\",\n  body: JSON.stringify({ text: prompt, label: wasGoodEnough ? \"local\" : \"cloud\" }),\n});\n```\n\nThe gateway forwards asynchronously and streams: a request with `stream: true`\ncomes back as Server-Sent Events, so chat clients render tokens as they arrive. An\nupstream timeout or connection failure returns an OpenAI-shaped error instead of a\nbare 500, every response carries a request id for tracing, and routing decisions\nand reload failures are logged. 
The knobs:\n\n| setting | effect |\n|---|---|\n| `WAYFINDER_ROUTER_TIMEOUT` / `serve --timeout` | upstream timeout in seconds (default 60) |\n| `WAYFINDER_ROUTER_FEEDBACK_TOKEN` | when set, `/v1/feedback` requires `Authorization: Bearer <token>` |\n| `serve --dry-run` | return routing decisions without calling any upstream |\n| `GET /healthz` | reports `degraded` and lists `missing_keys` when a configured `api_key_env` is unset |\n| `GET /router` | read-only dashboard of recent decisions, with `X-Wayfinder-Debug: true` surfacing one in the body |\n| `GET /v1/savings?period=today\\|7d\\|30d\\|all` | realized vs always-frontier cost and the savings between them, per route (WF-DESIGN-0007) |\n| `WAYFINDER_ROUTER_SAVINGS_FILE` | where the savings ledger is persisted (default `<config-dir>/wayfinder-savings.json`) |\n| `[gateway] retries` / `breaker_threshold` / `breaker_cooldown` | reliability: bounded retries on transport/`429`/`5xx`, and a per-target circuit breaker (WF-ADR-0031) |\n| `[gateway] failover = same-tier\\|degrade\\|escalate` | on exhaustion, stay on the tier (default), fall to a cheaper one (never raises cost), or a dearer one (opt-in); per-request `X-Wayfinder-Failover` |\n| `[gateway.models.<name>] fallbacks = [...]` / `context_window` | same-tier endpoints to try on failure; skip a target whose window can't fit the prompt. Responses carry `x-wayfinder-router-served-by` |\n| `[gateway.budget] limit` / `window = day\\|month\\|all` / `on_breach = degrade\\|block` | spend cap: once `limit` realized cost is reached, degrade to the cheapest tier (default, never raises cost) or block with HTTP 402. 
Surfaced via `x-wayfinder-router-budget`; needs real `cost_per_1k` prices (WF-ADR-0032) |\n\nTo see why a prompt routed where it did, ask for the per-feature breakdown: each feature's value, its normalized level, its weight, and its share of the score.\n\n```\nwayfinder-router route prompt.md --explain\n```\n\nFor interactive tuning there's a local web UI:\n\n- **Explain** — paste a prompt; see the score, the tier ladder, and contribution bars, and drag a threshold slider to watch routing change live.\n- **Calibrate** — paste a labeled dataset, run a mode, and see accuracy, the sweep curve, and the resulting config fragment.\n- **Configure** — edit `wayfinder-router.toml` with live validation and save.\n- **Onboard** — A/B a local and a hosted model in the browser, judge each, and calibrate from the log (needs `[gateway]` for the model calls).\n\n```\npip install \"wayfinder-router[ui]\"\nwayfinder-router ui --port 8099    # then open http://localhost:8099\n```\n\nThe UI is a thin wrapper over the same pure functions; it never calls a model, and no secret appears in it.\n\n``` python\nfrom wayfinder_router import score_complexity, RoutingConfig, explain_score\n\nprompt_text = \"Summarise this paragraph in one sentence.\"  # any prompt string\nresult = score_complexity(prompt_text, config=RoutingConfig.binary(threshold=0.7))\nprint(result.recommendation, result.score, result.features)\nfor fc in explain_score(result.features, RoutingConfig().weights):\n    print(fc.name, fc.contribution)\n```\n\nWayfinder started as a `route` experiment inside a larger requirements tool and was\nsplit out because routing is a runtime concern, not a knowledge one: a prompt router\nshouldn't make you install an engine you don't need. 
The result is a small, focused\ntool whose scoring core stays dependency-free — you can `import wayfinder_router` and\nscore prompts with nothing but the standard library (WF-ADR-0001, WF-ADR-0029).\n\n```\nwayfinder-router/\n  wayfinder_router/   the package: scorer, tiers + classifier, config loader/writer,\n                      offline calibration (Newton/IRLS), explain, the feedback log and\n                      onboarding harness, recalibration, CLI, and the optional gateway\n                      and local UI (the impure layers, behind their extras)\n  tests/              scorer, config, calibration, explain, feedback, onboard,\n                      recalibrate, CLI, gateway, and UI coverage\n  decisions/          design notes behind the tool's own choices\n  docs/               the FAQ and the lexical-routing guide\n  Dockerfile, docker-compose.example.yml   deploy the gateway as a service\n```\n\n```\npip install -e .[dev]   # or: pip install pytest\nmake test\n```\n\n", "url": "https://wpnews.pro/news/i-built-an-llm-router-that-doesn-t-use-an-llm", "canonical_source": "https://github.com/itsthelore/wayfinder-router", "published_at": "2026-06-24 06:20:00+00:00", "updated_at": "2026-06-24 06:44:15.629395+00:00", "lang": "en", "topics": ["large-language-models", "ai-tools", "developer-tools"], "entities": ["Wayfinder", "RouteLLM", "NotDiamond", "Martian", "OpenRouter", "LiteLLM", "RouterBench", "RouterArena"], "alternates": {"html": "https://wpnews.pro/news/i-built-an-llm-router-that-doesn-t-use-an-llm", "markdown": "https://wpnews.pro/news/i-built-an-llm-router-that-doesn-t-use-an-llm.md", "text": "https://wpnews.pro/news/i-built-an-llm-router-that-doesn-t-use-an-llm.txt", "jsonld": "https://wpnews.pro/news/i-built-an-llm-router-that-doesn-t-use-an-llm.jsonld"}}