# Save Claude Code Tokens with Smart Routing

Brick, a Mixture-of-Models routing gateway, reads each prompt's capability and complexity to route queries to the best backend LLM, matching top model quality at lower cost. It offers a drop-in OpenAI-compatible endpoint for tools like Claude Code and Codex, cutting costs without losing quality.

Brick is a Mixture-of-Models (MoM) routing gateway. It reads each prompt's capability and complexity, then routes it to the best backend in a pool of open- and closed-weight LLMs, matching the strongest single model's quality at a fraction of its cost. No cascades. No wasted calls. Drop-in `model: "brick"`.

When to use Brick · Quickstart · Why Brick · Claude Code · Codex · FAQ · Benchmarks · How it works · Paper

## When can I use Brick?

Brick is for anyone running against more than one model, or paying flat rate for a single strong one. Three common cases:

- You have a pool of models and want each query to reach the right one. Cheap prompts should not burn your most expensive model, and hard prompts should not be starved on a small one. Brick reads capability and complexity per query and dispatches accordingly, so the pool works as one graded system instead of a manual pick.
- You want to cut Claude Code / Codex costs without losing quality. Put Brick in front of your coding agent and every request is routed to the cheapest model that can actually do the job, escalating only when the task needs it. You keep the same UX and pay for the hard turns, not the easy ones.
- You want to unify different models behind one tool. Use OpenAI models, GLM, DeepSeek, Kimi, Qwen and others from inside Claude Code or Codex through a single OpenAI-compatible endpoint. Define the pool once in `config.yaml` and call `model: "brick"` everywhere.
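The "cheapest model that can actually do the job" idea in the cases above can be illustrated with a toy sketch. This is not Brick's router: the `Model` type, the pool entries, the prices and the single `max_difficulty` score are all hypothetical, and Brick is described as reading capability and complexity per query rather than one scalar.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str              # hypothetical backend name
    cost_per_mtok: float   # hypothetical price, arbitrary units
    max_difficulty: int    # hardest task level (1-3) this model is trusted with


# Hypothetical pool; a real deployment would define its own.
POOL = [
    Model("small", 0.25, 1),
    Model("medium", 3.0, 2),
    Model("large", 15.0, 3),
]


def pick_model(difficulty: int, pool: list[Model] = POOL) -> Model:
    """Return the cheapest model whose ceiling covers the task difficulty."""
    capable = [m for m in pool if m.max_difficulty >= difficulty]
    if not capable:
        raise ValueError(f"no model in the pool can handle difficulty {difficulty}")
    # One decision per query: no model is called just to be rejected.
    return min(capable, key=lambda m: m.cost_per_mtok)
```

With this pool, an easy task (difficulty 1) lands on the cheapest entry and only a difficulty-3 task reaches the most expensive one, which is the behavior the use cases describe.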
## Quickstart

The fastest working path today is the CLI, which self-hosts the router and wires it into Claude Code for you. Requires Node >= 18 and Docker.

```bash
git clone https://github.com/regolo-ai/brick-SR1.git
cd brick-SR1/apps/cli && npm install && npm run build && npm link
brick claude on   # starts the router + wires ANTHROPIC_BASE_URL in ~/.claude/settings.json
```

Then open a new Claude Code session and pick `brick-claude` in the `/model` picker. Every request now routes to haiku / sonnet / opus by capability and complexity. See the Brick + Claude Code section for modes, the effort picker, and the live `brick claude status` dashboard.

Prefer a raw OpenAI-compatible gateway (no CLI)? Once the Docker image is published (see Distribution channels), you'll be able to run the gateway directly:

```bash
# Image is published at the next v2.1.0 tag
docker run --rm -p 18000:18000 \
  -e REGOLO_API_KEY="$REGOLO_API_KEY" \
  ghcr.io/regolo-ai/brick:latest
```

Then call it like any OpenAI endpoint, just set `"model": "brick"`:

```bash
curl http://localhost:18000/v1/chat/completions \
  -H "Authorization: Bearer $REGOLO_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"brick","messages":[{"role":"user","content":"Prove that sqrt(2) is irrational"}]}'
```

The `x-selected-model` response header tells you which backend Brick picked. That math prompt routes to a reasoning model; "Hello" routes to the cheapest one.

Until the image is published, `brick serve` from the CLI above runs the same router locally from source.
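The same request can be issued from Python with only the standard library. The sketch below assumes a gateway listening on `localhost:18000` as in the curl example and an OpenAI-style JSON response; the helper names `build_request` and `ask` are illustrative and not part of Brick.

```python
import json
import urllib.request
from typing import Optional

# Assumed local gateway address, matching the port in the curl example.
BRICK_URL = "http://localhost:18000/v1/chat/completions"


def build_request(prompt: str, api_key: str, url: str = BRICK_URL) -> urllib.request.Request:
    """Build an OpenAI-style chat completion request that targets model "brick"."""
    body = {"model": "brick", "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def ask(prompt: str, api_key: str) -> tuple[Optional[str], dict]:
    """Send the prompt; return (x-selected-model header value, parsed JSON body)."""
    with urllib.request.urlopen(build_request(prompt, api_key), timeout=60) as resp:
        selected = resp.headers.get("x-selected-model")  # None if the header is absent
        return selected, json.loads(resp.read().decode("utf-8"))
```

Against a running gateway, `ask("Prove that sqrt(2) is irrational", os.environ["REGOLO_API_KEY"])` should return the backend name reported in the header alongside the response body, which makes it easy to log routing decisions per prompt.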
## Why Brick

| | Single model | RouteLLM | FrugalGPT / Cascade | Brick |
|---|---|---|---|---|
| One call per query (no cascade waste) | ✅ | ✅ | ❌ | ✅ |
| Capability-aware (6 dimensions) | n/a | ❌ (binary) | ❌ | ✅ |
| Complexity-aware | n/a | partial | ✅ | ✅ |
| Pool of N open + closed models | n/a | 2 | few | ✅ |
| Continuous cost ↔ quality knob | ❌ | ❌ | threshold | ✅ r ∈ [-1, 1] |
| Native multimodal (image / audio) | varies | ❌ | ❌ | ✅ |
| Drop-in OpenAI-compatible | n/a | n/a | n/a | ✅ |

Cascade routers (FrugalGPT, Cascade Routing) call models one after another until a confidence check passes, paying for every miss in tokens and latency. Brick makes a single forward decision per query, so there is nothing to waste.

## Brick + Claude Code

Put one OpenAI/Anthropic-compatible endpoint in front of Claude Code, and Brick routes every request to `haiku`, `sonnet`, or `opus` based on capability and complexity. You keep the Claude Code UX; Brick picks the cheapest model that can do the job.

```bash
brick claude on   # wires ANTHROPIC_BASE_URL in ~/.claude/settings.json, auto-starts the router
```

Then:

- Open a new Claude Code session (your current session is unaffected).
- In the `/model` picker, select `brick-claude` (it sits alongside the built-in opus/sonnet/haiku aliases, which it does not replace).

To revert:

```bash
brick claude off   # restores ANTHROPIC_BASE_URL, optionally stops the router
```

Use `brick claude on --no-start` to require an already-healthy router instead of auto-starting one, and `brick claude off --stop` / `--keep` to control the router without a prompt.

A mode is how you tell Brick how much to spend. Each one maps easy/medium/hard queries to a model tier, from cheapest (`eco`, always haiku) to strongest (`max`, always opus), with `lite`, `mid` and `pro` in between. Pick one and Brick handles the per-query routing inside it.

You switch mode straight from the thinking effort slider in Claude Code's `/model` picker: `low` selects `eco`, `medium` selects `lite`, `high` selects `mid`, `xhigh` selects `pro`, and `max` selects `max`.
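The slider-to-mode mapping above is small enough to hold as a lookup table. A minimal sketch follows; the names `EFFORT_TO_MODE`, `FIXED_TIER` and `mode_for_effort` are illustrative and not part of Brick's CLI or API.

```python
# Effort-slider value -> Brick mode, as listed in the text above.
EFFORT_TO_MODE = {
    "low": "eco",
    "medium": "lite",
    "high": "mid",
    "xhigh": "pro",
    "max": "max",
}

# Only the two fixed endpoints are stated in the text; the per-difficulty
# tiers of lite, mid and pro are not, so they are left out here.
FIXED_TIER = {"eco": "haiku", "max": "opus"}


def mode_for_effort(effort: str) -> str:
    """Look up the Brick mode that a Claude Code effort level selects."""
    try:
        return EFFORT_TO_MODE[effort]
    except KeyError:
        raise ValueError(f"unknown effort level: {effort!r}") from None
```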
So the effort control does not set a thinking budget; it selects the model tier. You can also switch explicitly with brick claude mode or brick claude