[ARTICLE · art-41043] src=github.com ↗ pub= topic=ai-tools verified=true sentiment=↑ positive

Show HN: Smart model routing directly in Claude, Codex and Cursor

Weave released a smart model routing proxy that works with Claude, Codex, and Cursor, ranking #1 on the RouterArena leaderboard. The open-source tool uses an on-box embedder to select the best model per request across Anthropic, OpenAI, Gemini, and open-source providers, with support for streaming, tools, and vision. It is available as a hosted service via npx or self-hosted with Docker and Postgres.

Published Jun 26, 2026 · 4 min read

One endpoint. Every model. Always the right one.

A drop-in proxy for Anthropic, OpenAI, and Gemini that picks the best model for every request, using a tiny on-box embedder rather than a vibes-based prompt.

🥇 #1 on the RouterArena leaderboard (Acc-Cost Arena: 176.09).

Built by Weave: The #1 engineering intelligence platform, loved by Robinhood, PostHog, Reducto, and hundreds of others.

Point Claude Code, Codex, Cursor, or your own app at localhost:8080. The router:

  • 🎯 Routes per request. A cluster scorer derived from Avengers-Pro picks the right model from your enabled providers, every turn. [2]

  • 🔌 Speaks everyone's API. Anthropic Messages, OpenAI Chat Completions, Gemini native. Streaming, tools, vision, the works.

  • 🧠 Knows OSS too. DeepSeek, Kimi, GLM, Qwen, Llama, Mistral via OpenRouter (or any OpenAI-compatible endpoint).

  • 🔒 BYOK by default. Provider keys stay on your box, encrypted at rest.

  • 📊 Observable. OTLP traces out of the box. See your traces in the Weave dashboard (http://localhost:8080/ui/dashboard) or drop in Honeycomb, Datadog, Grafana, whatever.
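To make the "cluster scorer" idea concrete, here is a minimal sketch of cluster-based routing in the spirit of Avengers-Pro: embed the request, find the nearest cluster, then pick the enabled model with the best accuracy/cost trade-off for that cluster. This is not the router's actual code. The centroids, model names, per-cluster numbers, and the `alpha` weight are all made up for illustration, and the real embedder is replaced by a toy 2-D vector.

```python
import math

# Hypothetical cluster centroids in a toy 2-D embedding space.
CENTROIDS = {
    "code": [1.0, 0.0],
    "chat": [0.0, 1.0],
}

# Hypothetical per-cluster stats: model -> (accuracy, normalized cost), both in [0, 1].
CLUSTER_STATS = {
    "code": {"big-model": (0.90, 1.00), "small-model": (0.40, 0.10)},
    "chat": {"big-model": (0.92, 1.00), "small-model": (0.90, 0.10)},
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def route(embedding, enabled, alpha=0.7):
    """Return (cluster, model) for one request.

    alpha weights accuracy against cheapness; 1.0 means "accuracy only".
    """
    # 1. Assign the request to its nearest cluster.
    cluster = max(CENTROIDS, key=lambda name: cosine(embedding, CENTROIDS[name]))
    # 2. Keep only models the user has enabled.
    candidates = {m: s for m, s in CLUSTER_STATS[cluster].items() if m in enabled}
    if not candidates:
        raise ValueError("no enabled model has stats for cluster " + cluster)

    # 3. Score each candidate on the accuracy/cost trade-off and take the best.
    def score(model):
        accuracy, cost = candidates[model]
        return alpha * accuracy + (1 - alpha) * (1 - cost)

    return cluster, max(candidates, key=score)
```

With these toy numbers, a code-like request goes to the larger model because the small one is much less accurate there, while a chat-like request goes to the small model because it is nearly as accurate and far cheaper.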

The fastest way: point Claude Code, Codex, or opencode at the hosted Weave Router with one command. No clone, no Docker, no Postgres.

npx @workweave/router

That's it. The installer asks which tool (Claude Code, Codex, or opencode), walks you through scope (user vs. project), grabs a router key, and wires the right config file. Other flavors:

npx @workweave/router --claude              # skip the picker, Claude Code
npx @workweave/router --codex               # skip the picker, OpenAI Codex CLI
npx @workweave/router --opencode            # skip the picker, opencode
npx @workweave/router --scope project       # per-repo, commits settings.json (or .codex/ / opencode.json)
npx @workweave/router --local               # self-hosted localhost:8080
npx @workweave/router --base-url https://router.acme.internal
npx @workweave/router@0.1.0                 # pin a version

Requires Node ≥ 18 (Claude Code and opencode paths also need jq). Full flag reference: install/npm/README.md.

If you want the router (and dashboard) running on your own box:

echo "OPENROUTER_API_KEY=sk-or-v1-..." >> .env.local

make full-setup

The router is up at http://localhost:8080, the dashboard at http://localhost:8080/ui/ (password: admin), and your rk_... key prints in the logs.

curl -sS http://localhost:8080/v1/messages \
  -H "Authorization: Bearer rk_..." \
  -d '{"model":"claude-sonnet-4-5","max_tokens":256,
       "messages":[{"role":"user","content":"hi"}]}'

curl -sS http://localhost:8080/v1/chat/completions \
  -H "Authorization: Bearer rk_..." \
  -d '{"model":"gpt-4o-mini",
       "messages":[{"role":"user","content":"hi"}]}'

curl -sS http://localhost:8080/v1/route -H "Authorization: Bearer rk_..." -d '...'
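The same calls can be made from application code. The sketch below builds (but does not send) a request equivalent to the first two curl examples, using only the Python standard library. The `build_request` helper and the explicit `Content-Type` header are assumptions added for illustration. The paths, the Bearer scheme, and the `rk_...` placeholder come from the examples above.

```python
import json
import urllib.request

ROUTER_URL = "http://localhost:8080"  # self-hosted default from the text above

# API format -> router path, as shown in the curl examples.
PATHS = {
    "anthropic": "/v1/messages",
    "openai": "/v1/chat/completions",
}


def build_request(api_format, router_key, model, prompt, max_tokens=256):
    """Build a POST request for the router (hypothetical helper)."""
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    if api_format == "anthropic":
        # max_tokens is a required field in the Anthropic Messages format.
        body["max_tokens"] = max_tokens
    return urllib.request.Request(
        ROUTER_URL + PATHS[api_format],
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {router_key}",
            # Assumption: sent explicitly here; the curl examples omit it.
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_request("anthropic", "rk_...", "claude-sonnet-4-5", "hi")

# To actually send it (needs a running router and a real router key):
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```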

Claude Code. Run make install-cc to point Claude Code at the local self-hosted router (it's also invoked automatically at the end of make full-setup). For the hosted router, use npx @workweave/router above.

Codex (OpenAI CLI). npx @workweave/router --codex patches ~/.codex/config.toml (or <repo>/.codex/config.toml with --scope project) with a managed [model_providers.weave] block and sets model_provider = "weave". Codex's existing OPENAI_API_KEY flows through to api.openai.com for the plan-based passthrough; the router key rides in an X-Weave-Router-Key HTTP header. Re-install and --uninstall --codex rewrite/remove only the managed block, leaving the rest of your Codex config untouched.
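The "managed block" behaviour described for Codex (re-install rewrites only the installer's own section, uninstall removes only that section) can be sketched as a text-level upsert between marker comments. The installer's real mechanism is not documented here, so the marker strings and both function names below are hypothetical.

```python
import re

# Hypothetical marker comments delimiting the installer-owned section.
BEGIN = "# >>> weave managed block (hypothetical marker) >>>"
END = "# <<< weave managed block (hypothetical marker) <<<"

_BLOCK_RE = re.compile(
    re.escape(BEGIN) + r".*?" + re.escape(END) + r"\n?", re.DOTALL
)


def upsert_block(config_text, block_body):
    """Insert the managed block, or replace it if one already exists."""
    managed = f"{BEGIN}\n{block_body.strip()}\n{END}\n"
    if _BLOCK_RE.search(config_text):
        # A callable replacement avoids backslash handling in the body.
        return _BLOCK_RE.sub(lambda _match: managed, config_text, count=1)
    separator = "" if (not config_text or config_text.endswith("\n")) else "\n"
    return config_text + separator + managed


def remove_block(config_text):
    """Strip the managed block, leaving everything else untouched."""
    return _BLOCK_RE.sub("", config_text, count=1)
```

Running `upsert_block` twice with the same body yields the same file, which is the property that makes re-installs safe: user-written settings outside the markers are never touched.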

opencode. npx @workweave/router --opencode merges a provider.weave entry into ~/.config/opencode/opencode.json (or <repo>/opencode.json with --scope project). It uses opencode's bundled @ai-sdk/anthropic provider pointed at the router's /v1 endpoint. The router speaks the Anthropic Messages API natively, so opencode works unmodified. The router key and identity headers ride alongside the provider config; re-install rewrites only the managed block and --uninstall --opencode strips it.

Cursor (early beta, performance may not be the best). In Settings → Models → Override OpenAI Base URL, enter http://localhost:8080/v1 and paste rk_... as the API key.

Switching on/off. After installing, npx @workweave/router off --claude (or --codex / --opencode) routes that client straight to its provider again without discarding the router config; on flips it back, and status reports which way it's pointing. Claude Code also gets /router-off, /router-on, and /router-status slash commands. Cursor toggles via the same Settings → Models override above. See install/README.md.

Two keys, don't mix them up:

  • sk-or-... / sk-ant-... / sk-... = your upstream provider key. Lives in .env.local.

  • rk_... = your router key. Clients send this as a Bearer token.
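A client-side guard can catch the mix-up early. The sketch below classifies a key purely by the prefixes listed above. The function name is hypothetical, and a prefix check says nothing about whether the key is actually valid.

```python
def classify_key(key):
    """Classify a key by prefix only (hypothetical helper)."""
    if key.startswith("rk_"):
        return "router"    # sent by clients as a Bearer token
    if key.startswith(("sk-or-", "sk-ant-", "sk-")):
        return "upstream"  # provider key, belongs in .env.local
    return "unknown"
```

A client could refuse to send anything that is not classified as "router" in its Authorization header.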

Endpoint                                          Format
POST /v1/messages                                 Anthropic Messages, routed
POST /v1/chat/completions                         OpenAI Chat Completions, routed
POST /v1beta/models/:action                       Gemini generateContent, routed
POST /v1/route                                    Returns the decision, no upstream call
GET /v1/models · POST /v1/messages/count_tokens   Anthropic passthrough
GET /health · GET /validate                       liveness + key check
  • 📐 Configuration reference: every env var, BYOK encryption, OTel knobs, cluster routing.

  • 🛠️ Contributing: layering rules, hot-reload dev, migrations, tests, the whole engineering loop.

  • 🏗️ Architecture: package layout, import contracts, recipes for adding endpoints / providers / strategies.

  • Token-aware rate limiting (Redis sliding window per installation)

  • Sub-installations for tenant hierarchies

  • Speculative dispatch + hedging for tail latency
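As an illustration of the first item, here is a minimal in-memory sketch of a token-aware sliding-window limiter keyed by installation. The text names Redis as the backing store; this version swaps in a plain dictionary so it runs standalone. The class name, limits, and injectable clock are assumptions, not the project's implementation.

```python
import time
from collections import deque


class TokenWindowLimiter:
    """Caps tokens (not request count) per installation over a sliding window."""

    def __init__(self, max_tokens, window_seconds, clock=time.monotonic):
        self.max_tokens = max_tokens
        self.window_seconds = window_seconds
        self.clock = clock    # injectable so the window can be tested
        self._events = {}     # installation_id -> deque of (timestamp, tokens)
        self._totals = {}     # installation_id -> tokens currently in the window

    def allow(self, installation_id, tokens):
        """Record the request and return True if it fits in the window."""
        now = self.clock()
        events = self._events.setdefault(installation_id, deque())
        # Evict entries that have slid out of the window.
        while events and events[0][0] <= now - self.window_seconds:
            _, expired = events.popleft()
            self._totals[installation_id] -= expired
        used = self._totals.get(installation_id, 0)
        if used + tokens > self.max_tokens:
            return False
        events.append((now, tokens))
        self._totals[installation_id] = used + tokens
        return True
```

Counting tokens rather than requests means one very large prompt consumes as much budget as many small ones, which matches how upstream providers bill.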

Footnotes

[1] Lu, Y., Liu, R., Yuan, J., Cui, X., Zhang, S., Liu, H., & Xing, J. RouterArena: An Open Platform for Comprehensive Comparison of LLM Routers. arXiv:2510.00202, 2025. https://arxiv.org/abs/2510.00202

[2] Zhang, Y. et al. Beyond GPT-5: Making LLMs Cheaper and Better via Performance–Efficiency Optimized Routing (Avengers-Pro). arXiv:2508.12631, 2025. https://arxiv.org/abs/2508.12631
