
Maslul – Smart LLM router – one call, the right model

Maslul, a new open-source Python library, provides smart LLM routing and provider normalization across Anthropic, Gemini, Grok, and OpenAI, allowing developers to route each request to the right model tier by difficulty without hardcoding model choices or rewriting plumbing for each provider. The library is async, fully typed, and embeddable as a library rather than a gateway, with features including difficulty-based routing, tool use, structured output, web search, and caching.

Published Jun 18, 2026 · 7 min read

Smart LLM router — one call, the right model.

Async and fully typed, across Anthropic, Gemini, xAI Grok, and OpenAI — routing each request to the right model tier by difficulty. Stop hardcoding model choices and stop re-writing the tool-use / structured-output / web-search / retry plumbing for every provider.

maslul (Hebrew מסלול, "route / lane") is a small library that does exactly two things: routing (pick a model tier per request, or pin one) and provider normalization (one Request/Response shape for every SDK). No server, no CLI, no heavy ML deps: providers live behind extras, and the core is stdlib-only.

import asyncio
from maslul import Router, Request, Message

router = Router.from_toml("maslul.toml")           # tiers + classifier + providers, from config

async def main() -> None:
    resp = await router.complete(Request(messages=[Message(role="user", content="Hello!")]))
    print(resp.text, "·", resp.level_used, "·", resp.usage.output_tokens, "tokens")

asyncio.run(main())

pip install "maslul[anthropic,gemini,grok]"     # or just the providers you use

Each provider's SDK lives behind an extra, so import maslul pulls in none of them, and you only install what you route to. The extras map to SDK packages as follows: maslul[anthropic] installs anthropic; maslul[gemini] installs google-genai; maslul[grok] installs xai-sdk; maslul[openai] installs openai.

maslul is a library, not a gateway — you embed the routing brain in your app, you don't run a proxy in front of it.

| | maslul | RouteLLM | LiteLLM |
|---|---|---|---|
| Shape | async library you embed (no server) | research framework / trained router | unified SDK + proxy server |
| Routing | difficulty tiers + swappable strategies (route_default / classify / classify_and_answer / verify_cascade) + injectable bypass / classifier / verifier hooks | a trained strong-vs-weak router | manual config / fallback lists, load-balancing |
| Providers | Anthropic · Gemini · Grok · OpenAI, normalized | model-agnostic (you wire models) | 100+ providers |
| Tools / structured / vision | one normalized loop for all | - | per-provider |
| Web search | one flag, every provider → Response.sources | - | per-provider |
| Caching | exact + semantic (in-process) | - | exact + semantic (proxy) |
| Typing / footprint | fully typed, py.typed; stdlib core, SDKs behind extras | research code | larger; server to operate |

Choose maslul when you want a typed async library you embed, with difficulty routing driven by your own strategy and hooks and one Request/Response over several providers (tools, structured output, vision, web search, retries, cost cache), without standing up a gateway. Reach for LiteLLM when you want a provider proxy across 100+ models, or RouteLLM when you specifically want a trained router.

flowchart LR
    R["complete(req)"] --> M{"model= pin?"}
    M -- yes --> RUN["run that model"]
    M -- no --> L{"level= pin?"}
    L -- yes --> RUN
    L -- no --> B{"bypass_predicate?"}
    B -- "tier" --> RUN
    B -- "None" --> H{"hard_signal?<br/>(media · code · long · intent verbs)"}
    H -- "yes" --> HARD["HARD tier"] --> RUN
    H -- "no" --> S["strategy<br/>route_default · classify ·<br/>classify_and_answer · verify_cascade"] --> RUN
    RUN --> X["tool loop · web search ·<br/>retry / fallback · usage breakdown"]

Difficulty is not readable from surface features: a short prompt can be very hard and a long paste trivial, so maslul never applies a short ⇒ simple rule. You choose how each request is routed, in this precedence order:

from maslul import Level

await router.complete(req, model="anthropic:claude-opus-4-8")  # 0. pin an exact model
await router.complete(req, level=Level.HARD)                   # 1. pin a difficulty tier
await router.complete(req)                                     # 2-4. let the router decide
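
As a rough illustration of that precedence (exact model pin, then tier pin, then bypass, then hard signal, then strategy), the decision can be sketched in plain Python. This is not maslul's implementation: the Decision type, the decide function, the greet_bypass and looks_hard helpers, and the stand-in Level enum are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class Level(Enum):  # stand-in enum; only the tier names come from the README
    SIMPLE = "simple"
    MEDIUM = "medium"
    HARD = "hard"


@dataclass
class Decision:
    target: str  # an exact model id or a tier name
    reason: str  # which rule fired


def decide(
    prompt: str,
    model: Optional[str] = None,
    level: Optional[Level] = None,
    bypass: Optional[Callable[[str], Optional[Level]]] = None,
    hard_signal: Optional[Callable[[str], bool]] = None,
    strategy: Callable[[str], Level] = lambda _prompt: Level.HARD,
) -> Decision:
    """Walk the precedence order and stop at the first rule that fires."""
    if model is not None:  # 0. exact model pin
        return Decision(model, "model pin")
    if level is not None:  # 1. tier pin
        return Decision(level.value, "level pin")
    if bypass is not None:  # 2. deterministic fast-path
        tier = bypass(prompt)
        if tier is not None:
            return Decision(tier.value, "bypass")
    if hard_signal is not None and hard_signal(prompt):  # 3. up-only hard signal
        return Decision(Level.HARD.value, "hard signal")
    return Decision(strategy(prompt).value, "strategy")  # 4. ambiguous middle


def greet_bypass(prompt: str) -> Optional[Level]:
    # Hypothetical fast-path: plain greetings go straight to the cheapest tier.
    return Level.SIMPLE if prompt.strip().lower() in {"hi", "hello", "hello!"} else None


def looks_hard(prompt: str) -> bool:
    # Hypothetical hard-signal check: code-like text or demanding intent verbs.
    lowered = prompt.lower()
    return "def " in lowered or any(verb in lowered for verb in ("refactor", "prove", "debug"))
```

Note that the hard signal can only raise a request to HARD and never lowers it, which matches the "up-only" wording in the flowchart.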

When you don't pin, the routing brain runs: a deterministic bypass (your fast-path, e.g. greetings → SIMPLE) → a hard-signal detector (intent verbs, code, attachments, long context → HARD, up-only) → the configured strategy for the ambiguous middle:

| Strategy | Cost for the middle | What it does |
|---|---|---|
| ROUTE_DEFAULT | 0 calls | Default-to-capable (default_level). Best for low volume. |
| CLASSIFY | 1 classify + 1 answer | A cheap dedicated classifier model labels the level (cached + budget-guarded), then dispatches. |
| CLASSIFY_AND_ANSWER | 1 call | The classifier model answers directly, or emits an escalation sentinel to bump to a stronger tier. |
| VERIFY_CASCADE | 1 cheap + verify | Answer cheap, run your verifier, escalate if it rejects; catches silent under-escalation. |
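
To make the CLASSIFY_AND_ANSWER row concrete, here is a minimal sketch of the idea: the cheap model either answers or returns a sentinel that triggers a second call to a stronger tier. The sentinel string, the function name, and the two stub models are assumptions made for illustration; the source does not say what sentinel maslul actually uses.

```python
import asyncio

ESCALATE = "<<ESCALATE>>"  # hypothetical sentinel, not maslul's real one


async def classify_and_answer(prompt, cheap, strong):
    """Ask the cheap model first; escalate only when it emits the sentinel."""
    first = await cheap(prompt)
    if first.strip() == ESCALATE:
        return "strong", await strong(prompt)
    return "cheap", first


async def stub_cheap(prompt):
    # Stand-in model: answers short prompts, asks for escalation otherwise.
    return "Hi there!" if len(prompt) < 20 else ESCALATE


async def stub_strong(prompt):
    return f"Detailed answer to: {prompt}"


easy = asyncio.run(classify_and_answer("Hello!", stub_cheap, stub_strong))
hard = asyncio.run(classify_and_answer("Explain the CAP theorem trade-offs", stub_cheap, stub_strong))
```

The common case costs one call; only escalated requests pay for a second one.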

All three injection points (bypass, classifier, verifier) are yours to supply:

def my_classifier(req):      # your own difficulty call (sync or async); None defers to the strategy
    return Level.SIMPLE if is_trivial(req) else None

def my_verifier(req, resp):  # VERIFY_CASCADE: True keeps the cheap answer, False escalates
    return "I don't know" not in resp.text

router = Router.from_toml("maslul.toml", classifier=my_classifier, verifier=my_verifier)
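
The verifier hook drives VERIFY_CASCADE. The following sketch shows the general cascade pattern: answer with the cheapest tier, and move up only when the verifier rejects the answer. It is simplified for illustration and is not the library's code. Tiers are plain async callables and the verifier takes strings, whereas maslul's verifier receives the request and response objects. The function name verify_cascade and the stubs are hypothetical.

```python
import asyncio
from typing import Awaitable, Callable, List, Tuple

Answer = Callable[[str], Awaitable[str]]


async def verify_cascade(
    prompt: str,
    tiers: List[Answer],
    verifier: Callable[[str, str], bool],
) -> Tuple[int, str]:
    """Try tiers cheapest-first; return (tier index, text) of the first accepted answer."""
    for index, answer in enumerate(tiers):
        text = await answer(prompt)
        is_last = index == len(tiers) - 1
        # The top tier's answer is returned even if the verifier dislikes it,
        # because there is nowhere left to escalate to.
        if is_last or verifier(prompt, text):
            return index, text
    raise ValueError("tiers must not be empty")


async def cheap(prompt: str) -> str:
    return "I don't know"


async def strong(prompt: str) -> str:
    return "Paris"


def reject_unknowns(prompt: str, text: str) -> bool:
    return "I don't know" not in text


result = asyncio.run(verify_cascade("Capital of France?", [cheap, strong], reject_unknowns))
```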

The same Request/Response works across every supported provider:

from maslul import Request, Message, ToolDef, ToolCall, MediaPart

async def get_weather(call: ToolCall) -> str:
    return f"18°C in {call.input['city']}"

req = Request(
    messages=[Message(role="user", content="Weather in Paris?")],
    tools=[ToolDef(name="get_weather", description="Current weather for a city.",
                   input_schema={"type": "object", "properties": {"city": {"type": "string"}},
                                 "required": ["city"]})],
    tool_executor=get_weather,
)

# Structured output: response_format is a JSON-Schema-style dict describing the reply
req = Request(messages=[Message(role="user", content="Extract name + age")],
              response_format={"type": "object", "properties": {"name": {"type": "string"},
                                                                "age": {"type": "integer"}}})

# Vision: png_bytes is the raw image data (bytes) that you have already loaded
req = Request(messages=[Message(role="user", content="What's in this image?")],
              media=[MediaPart(mime_type="image/png", data=png_bytes)])

# Web search: one flag, same for every provider
req = Request(messages=[Message(role="user", content="Latest news on X?")], web_search=True)

def on_usage(resp):                         # per-model token breakdown for monitoring
    for rec in resp.usage_records:
        metrics.incr(f"{rec.provider}:{rec.model}", rec.usage.output_tokens)

router = Router.from_toml("maslul.toml", on_complete=on_usage)

Transient errors (RateLimited, Timeout) retry with exponential backoff; on persistent failure the request falls back to the next-higher tier, which may be a different provider, giving you cross-provider failover for free. AuthError fails fast. Hooks: on_route (the RoutingDecision), on_complete (the final Response with usage_records), on_error (each failed attempt).
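
The retry-then-fallback behaviour described above can be sketched as follows. This is a generic pattern under stated assumptions, not maslul's code. The exception classes are local stand-ins that reuse the names from the text, and the delay formula (base delay doubled per attempt, plus jitter), call_with_retry, and complete_with_fallback are hypothetical.

```python
import asyncio
import random


class RateLimited(Exception): ...
class Timeout(Exception): ...
class AuthError(Exception): ...


TRANSIENT = (RateLimited, Timeout)


async def call_with_retry(call, max_retries=2, base_delay=0.5, sleep=asyncio.sleep):
    """Retry transient errors with exponential backoff plus jitter; other errors propagate at once."""
    for attempt in range(max_retries + 1):
        try:
            return await call()
        except TRANSIENT:
            if attempt == max_retries:
                raise  # retries exhausted: let the caller decide about fallback
            await sleep(base_delay * 2 ** attempt + random.uniform(0, base_delay))


async def complete_with_fallback(tiers, start, **retry_kwargs):
    """Try tiers[start], then each higher tier, until one succeeds."""
    last_error = None
    for index in range(start, len(tiers)):
        try:
            return index, await call_with_retry(tiers[index], **retry_kwargs)
        except TRANSIENT as exc:
            last_error = exc  # persistent transient failure: escalate one tier
    if last_error is None:
        raise IndexError("no tier at or above the starting index")
    raise last_error
```

Because AuthError is not in TRANSIENT, it is neither retried nor escalated, which mirrors the fail-fast rule: a bad key will not be fixed by waiting or by asking a bigger model.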

Build a router with missing_provider="degrade" and any tier whose provider isn't configured (e.g. a Grok tier with no XAI_API_KEY) falls back to the nearest available tier instead of erroring, so one config runs across deploys that have different keys.
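
One plausible reading of "nearest available tier" is sketched below. The source does not define how distance or ties are handled, so the rule used here (smallest distance in tier order, preferring the more capable tier on a tie) is an assumption, as are the function and variable names.

```python
from typing import Dict, Set

ORDER = ["simple", "medium", "hard"]  # tier names from the config, cheapest first


def nearest_available(wanted: str, tier_provider: Dict[str, str], configured: Set[str]) -> str:
    """Return the usable tier closest to `wanted`; ties go to the higher tier (assumed rule)."""
    available = [i for i, name in enumerate(ORDER) if tier_provider.get(name) in configured]
    if not available:
        raise LookupError("no tier has a configured provider")
    want = ORDER.index(wanted)
    best = min(available, key=lambda i: (abs(i - want), -i))
    return ORDER[best]


tiers = {"simple": "gemini", "medium": "grok", "hard": "anthropic"}
# This deploy has no XAI_API_KEY, so the Grok-backed medium tier is unavailable.
chosen = nearest_available("medium", tiers, configured={"gemini", "anthropic"})
```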

A [maslul.cache] config returns a prior Response instead of calling a model: exact (identical request) or semantic (nearest request above a cosine threshold, using an embedder you inject, since maslul ships no embeddings). A hit comes back with cached=True and zeroed usage, so monitoring sees the saving. Tool-using requests are never cached.

[maslul.cache]
mode = "semantic"          # off | exact | semantic
max_entries = 1000
ttl_seconds = 86400
similarity_threshold = 0.95

router = Router.from_toml("maslul.toml", embed=my_async_embed)   # embed only needed for semantic
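
To show what the semantic mode's settings mean in practice, here is a small in-process cache keyed by embedding vectors. It is a sketch of the general technique, not maslul's cache. The SemanticCache class and its linear scan are assumptions, and the vectors would normally come from the embedder you inject. The constructor parameters reuse the names from the [maslul.cache] keys above.

```python
import math
import time
from collections import OrderedDict


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class SemanticCache:
    def __init__(self, max_entries=1000, ttl_seconds=86400,
                 similarity_threshold=0.95, clock=time.monotonic):
        self.max_entries = max_entries
        self.ttl = ttl_seconds
        self.threshold = similarity_threshold
        self.clock = clock
        self._entries = OrderedDict()  # key -> (vector, response, stored_at)
        self._next_key = 0

    def get(self, vector):
        """Return the most similar unexpired response at or above the threshold, else None."""
        now = self.clock()
        best_key, best_score = None, self.threshold
        for key, (stored, _response, stored_at) in list(self._entries.items()):
            if now - stored_at > self.ttl:
                del self._entries[key]  # drop expired entries lazily
                continue
            score = cosine(vector, stored)
            if score >= best_score:
                best_key, best_score = key, score
        if best_key is None:
            return None
        self._entries.move_to_end(best_key)  # mark as recently used
        return self._entries[best_key][1]

    def put(self, vector, response):
        self._entries[self._next_key] = (vector, response, self.clock())
        self._next_key += 1
        while len(self._entries) > self.max_entries:
            self._entries.popitem(last=False)  # evict the least recently used entry
```

A threshold as high as 0.95 keeps the cache conservative: only near-paraphrases of an earlier request are served from memory.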

Configuration is a TOML file (or a plain dict passed as Router(config={...})):

[maslul]
strategy = "route_default"        # route_default | classify | classify_and_answer | verify_cascade
default_level = "hard"            # default-to-capable for the ambiguous middle
min_tokens_to_classify = 40       # CLASSIFY budget guard
request_timeout = 60              # per-call seconds (optional)
max_retries = 2
fallback = true                   # escalate to a higher tier on persistent failure

[maslul.tiers.simple]
provider = "gemini"
model = "gemini-2.5-flash-lite"
[maslul.tiers.medium]
model = "anthropic:claude-haiku-4-5"   # or the provider:model shorthand
[maslul.tiers.hard]
model = "anthropic:claude-sonnet-4-6"

[maslul.classifier]               # required for the classify strategies
model = "anthropic:claude-haiku-4-5"

[maslul.providers.anthropic]
api_key_env = "ANTHROPIC_API_KEY"      # secrets by env-var name, never inlined
[maslul.providers.gemini]
vertex_project = "my-gcp-project"      # Vertex AI + Application Default Credentials (no key)
vertex_location = "global"
[maslul.providers.grok]
api_key_env = "XAI_API_KEY"

Pointing a capability at a different model or provider is a one-line config change, with no code deploy. Providers can also be injected directly (Router(config, providers={...})) for tests or custom wiring.
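
The tier tables accept either separate provider and model keys or the provider:model shorthand. A minimal sketch of how such a tier entry could be normalized is shown below. The resolve_tier helper is hypothetical, and the nested dict is only a stand-in that mirrors the TOML tables, since the exact dict layout Router(config={...}) expects is not shown in the source.

```python
from typing import Dict, Tuple


def resolve_tier(tier: Dict[str, str]) -> Tuple[str, str]:
    """Return (provider, model) from either explicit keys or the 'provider:model' shorthand."""
    raw = tier["model"]
    provider = tier.get("provider")
    if provider is not None:
        return provider, raw
    provider, sep, model = raw.partition(":")
    if not sep or not provider or not model:
        raise ValueError(f"expected 'provider:model' or an explicit provider, got {raw!r}")
    return provider, model


# Stand-in for the [maslul.tiers.*] tables from the TOML example.
tier_config = {
    "simple": {"provider": "gemini", "model": "gemini-2.5-flash-lite"},
    "medium": {"model": "anthropic:claude-haiku-4-5"},
    "hard": {"model": "anthropic:claude-sonnet-4-6"},
}
resolved = {name: resolve_tier(tier) for name, tier in tier_config.items()}
```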

| Provider | SDK (extra) | Auth |
|---|---|---|
| anthropic | anthropic | ANTHROPIC_API_KEY |
| gemini | google-genai | Vertex AI + ADC (vertex_project), or a Gemini Developer API key |
| grok | xai-sdk | XAI_API_KEY |
| openai | openai | OPENAI_API_KEY |

Beta (0.2.x), fully typed (py.typed), async-first. Routing, tool use, structured output, vision, web search across all three providers (web_search=True), the four strategies, and retry/fallback resilience are implemented and exercised against live APIs.

MIT © Ilia Tankelevich
