Maslul – Smart LLM router – one call, the right model

Maslul, a new open-source Python library, provides smart LLM routing and provider normalization across Anthropic, Gemini, Grok, and OpenAI, allowing developers to route each request to the right model tier by difficulty without hardcoding model choices or rewriting plumbing for each provider. The library is async, fully typed, and embeddable as a library rather than a gateway, with features including difficulty-based routing, tool use, structured output, web search, and caching.

Async and fully typed, across Anthropic, Gemini, xAI Grok, and OpenAI, routing each request to the right model tier by difficulty. Stop hardcoding model choices and stop rewriting the tool-use / structured-output / web-search / retry plumbing for every provider.

maslul (Hebrew מסלול, "route / lane") is a small library that does exactly two things: routing (pick a model tier per request, or pin one) and provider normalization (one `Request` / `Response` shape for every SDK). No server, no CLI, no heavy ML deps: providers live behind extras, and the core is stdlib-only.

```python
import asyncio
from maslul import Router, Request, Message

router = Router.from_toml("maslul.toml")  # tiers + classifier + providers, from config

async def main() -> None:
    resp = await router.complete(Request(messages=[Message(role="user", content="Hello")]))
    print(resp.text, "·", resp.level_used, "·", resp.usage.output_tokens, "tokens")

asyncio.run(main())
```

```bash
pip install "maslul[anthropic,gemini,grok]"   # or just the providers you use
```

Each provider's SDK lives behind an extra, so `import maslul` pulls in none of them; you only install what you route to. `maslul[anthropic]` → `anthropic`; `maslul[gemini]` → `google-genai`; `maslul[grok]` → `xai-sdk`; `maslul[openai]` → `openai`.

maslul is a library, not a gateway: you embed the routing brain in your app, you don't run a proxy in front of it.
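The "SDKs behind extras" arrangement is commonly implemented with lazy imports. The following is a minimal sketch of that general pattern, not maslul's actual code: the `load_sdk` helper and the `SDK_MODULES` registry are hypothetical names, and the module names are assumed to be the usual import names of each SDK package.

```python
import importlib
from types import ModuleType

# Provider name -> (import name of the SDK, pip extra that installs it).
# Hypothetical registry for illustration; not part of maslul's API.
SDK_MODULES = {
    "anthropic": ("anthropic", "maslul[anthropic]"),
    "gemini": ("google.genai", "maslul[gemini]"),
    "grok": ("xai_sdk", "maslul[grok]"),
    "openai": ("openai", "maslul[openai]"),
}

def load_sdk(provider: str, registry: dict = SDK_MODULES) -> ModuleType:
    """Import a provider SDK only at the moment a tier needs it."""
    module_name, extra = registry[provider]
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:  # ModuleNotFoundError is a subclass of ImportError
        raise ImportError(
            f"provider {provider!r} needs the {module_name!r} SDK; "
            f'install it with: pip install "{extra}"'
        ) from exc
```

Because nothing is imported at module load time, a deployment that only routes to one provider never needs the other SDKs installed, and a missing SDK surfaces as an error that names the extra to install.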
| | maslul | RouteLLM | LiteLLM |
|---|---|---|---|
| Shape | async library you embed (no server) | research framework / trained router | unified SDK + proxy server |
| Routing | difficulty tiers + swappable strategies (`route_default` / `classify` / `classify_and_answer` / `verify_cascade`) + injectable bypass / classifier / verifier hooks | a trained strong-vs-weak router | manual config / fallback lists, load-balancing |
| Providers | Anthropic · Gemini · Grok · OpenAI, normalized | model-agnostic (you wire models) | 100+ providers |
| Tools / structured / vision | one normalized loop for all | - | per-provider |
| Web search | one flag, every provider → `Response.sources` | - | per-provider |
| Caching | exact + semantic (in-process) | - | exact + semantic (proxy) |
| Typing / footprint | fully typed, `py.typed`; stdlib core, SDKs behind extras | research code | larger; server to operate |

Choose maslul when you want a typed async library you embed: difficulty routing with your own strategy + hooks, and one `Request` / `Response` over several providers (tools, structured output, vision, web search), retries, cost, cache, all without standing up a gateway. Reach for LiteLLM when you want a provider proxy across 100+ models, or RouteLLM when you specifically want a trained router.

```mermaid
flowchart LR
  R["complete(req)"] --> M{"model= pin?"}
  M -- yes --> RUN["run that model"]
  M -- no --> L{"level= pin?"}
  L -- yes --> RUN
  L -- no --> B{"bypass predicate?"}
  B -- "tier" --> RUN
  B -- "None" --> H{"hard signal?<br/>(media · code · long · intent verbs)"}
  H -- "yes" --> HARD["HARD tier"] --> RUN
  H -- "no" --> S["strategy<br/>(route_default · classify ·<br/>classify_and_answer · verify_cascade)"] --> RUN
  RUN --> X["tool loop · web search ·<br/>retry / fallback · usage breakdown"]
```

Difficulty is not readable from surface features (a short prompt can be very hard, a long paste trivial), so maslul never applies a "short ⇒ simple" rule.
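The precedence in the flowchart above (model pin, then level pin, then bypass, then hard signal, then strategy) can be sketched as a plain function. This is an illustrative stand-in under stated assumptions, not maslul's implementation: the `Level` enum here is a local substitute for the library's own, and `Decision`, `decide`, `greet`, and `has_code` are hypothetical names.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Callable, Optional

class Level(IntEnum):  # local stand-in for maslul's Level
    SIMPLE = 1
    MEDIUM = 2
    HARD = 3

@dataclass
class Decision:
    target: str  # an exact model id, or a tier name
    reason: str  # which step of the precedence chain decided

def decide(
    text: str,
    *,
    model: Optional[str] = None,
    level: Optional[Level] = None,
    bypass: Optional[Callable[[str], Optional[Level]]] = None,
    hard_signal: Callable[[str], bool] = lambda t: False,
    strategy: Callable[[str], Level] = lambda t: Level.HARD,  # default-to-capable
) -> Decision:
    if model is not None:                 # 0. exact model pin wins outright
        return Decision(model, "model pin")
    if level is not None:                 # 1. tier pin
        return Decision(level.name, "level pin")
    if bypass is not None:                # 2. deterministic fast-path; None defers
        tier = bypass(text)
        if tier is not None:
            return Decision(tier.name, "bypass")
    if hard_signal(text):                 # 3. up-only: can only raise to HARD
        return Decision(Level.HARD.name, "hard signal")
    return Decision(strategy(text).name, "strategy")  # 4. the ambiguous middle

# Toy predicates, for illustration only.
greet = lambda t: Level.SIMPLE if t.strip().lower() in {"hi", "hello"} else None
has_code = lambda t: "def " in t

print(decide("hello", bypass=greet, hard_signal=has_code))
print(decide("def f(): ...", bypass=greet, hard_signal=has_code))
```

Note that neither toy predicate looks at prompt length, which keeps the sketch consistent with the "never short ⇒ simple" rule.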
You choose how each request is routed, in this precedence order:

```python
from maslul import Level

await router.complete(req, model="anthropic:claude-opus-4-8")  # 0. pin an exact model
await router.complete(req, level=Level.HARD)                   # 1. pin a difficulty tier
await router.complete(req)                                     # 2-4. let the router decide
```

When you don't pin, the routing brain runs: a deterministic bypass (your fast-path, e.g. greetings → SIMPLE) → a hard-signal detector (intent verbs, code, attachments, long context → HARD, up-only) → the configured strategy for the ambiguous middle:

| Strategy | Cost for the middle | What it does |
|---|---|---|
| `ROUTE_DEFAULT` | 0 calls | Default-to-capable (`default_level`). Best for low volume. |
| `CLASSIFY` | 1 classify + 1 answer | A cheap dedicated classifier model labels the level (cached + budget-guarded), then dispatch. |
| `CLASSIFY_AND_ANSWER` | 1 call | The classifier model answers directly, or emits an escalation sentinel to bump to a stronger tier. |
| `VERIFY_CASCADE` | 1 cheap + verify | Answer cheap, run your verifier, escalate if it rejects; this catches silent under-escalation. |

All three injection points (bypass, classifier, verifier) are yours to supply:

```python
def my_classifier(req):  # your own difficulty call (sync or async); None defers to the strategy
    return Level.SIMPLE if is_trivial(req) else None  # is_trivial: your own predicate

def my_verifier(req, resp):  # VERIFY_CASCADE: True keeps the cheap answer, False escalates
    return "I don't know" not in resp.text

router = Router.from_toml("maslul.toml", classifier=my_classifier, verifier=my_verifier)
```

The same `Request` / `Response` works across every provider:

```python
from maslul import Request, Message, ToolDef, ToolCall, MediaPart

# Tools: the router runs a provider-agnostic tool-use loop
async def get_weather(call: ToolCall) -> str:
    return f"18°C in {call.input['city']}"

req = Request(
    messages=[Message(role="user", content="Weather in Paris?")],
    tools=[ToolDef(
        name="get_weather",
        description="Current weather for a city.",
        input_schema={"type": "object", "properties": {"city": {"type": "string"}}, "required": ["city"]},
    )],
    tool_executor=get_weather,
)

# Structured output: response_format → resp.structured (parsed)
req = Request(
    messages=[Message(role="user", content="Extract name + age")],
    response_format={"type": "object", "properties": {"name": {"type": "string"}, "age": {"type": "integer"}}},
)

# Vision: images / PDFs
req = Request(
    messages=[Message(role="user", content="What's in this image?")],
    media=[MediaPart(mime_type="image/png", data=png_bytes)],
)

# Web search: one flag, grounded on ANY provider
# (Anthropic web search / Gemini Google Search / Grok Agent Tools);
# citations land in resp.sources regardless of which model answers.
req = Request(messages=[Message(role="user", content="Latest news on X?")], web_search=True)
```

```python
def on_usage(resp):  # per-model token breakdown for monitoring
    for rec in resp.usage_records:
        metrics.incr(f"{rec.provider}:{rec.model}", rec.usage.output_tokens)  # metrics: your own client

router = Router.from_toml("maslul.toml", on_complete=on_usage)
```

Transient errors (`RateLimited`, `Timeout`) retry with exponential backoff; on persistent failure the request falls back to the next-higher tier, which may be a different provider, giving you cross-provider failover for free. `AuthError` fails fast. Hooks: `on_route` (the `RoutingDecision`), `on_complete` (the final `Response` with `usage_records`), `on_error` (each failed attempt).

Build a router with `missing_provider="degrade"` and any tier whose provider isn't configured (e.g. a Grok tier with no `XAI_API_KEY`) falls back to the nearest available tier instead of erroring, so one config runs across deploys that have different keys.

A `[maslul.cache]` config returns a prior `Response` instead of calling a model: exact (identical request) or semantic (nearest request above a cosine threshold, using an embedder you inject, since maslul ships no embeddings).
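The exact-then-semantic lookup described above can be sketched with a toy in-memory cache. This is a simplified illustration, not maslul's cache: `TinyCache` and `cosine` are hypothetical names, the vectors stand in for whatever your injected embedder returns, and TTL and entry limits are left out.

```python
import math
from typing import Optional, Sequence

def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

class TinyCache:
    def __init__(self, threshold: float = 0.95) -> None:
        self.threshold = threshold
        self.exact: dict = {}    # prompt text -> cached answer
        self.vectors: list = []  # (embedding, cached answer) pairs

    def put(self, prompt: str, vector: Sequence[float], answer: str) -> None:
        self.exact[prompt] = answer
        self.vectors.append((list(vector), answer))

    def get(self, prompt: str, vector: Optional[Sequence[float]] = None) -> Optional[str]:
        if prompt in self.exact:  # exact: identical request
            return self.exact[prompt]
        if vector is None:        # no embedding supplied: exact mode only
            return None
        # semantic: nearest stored request, accepted only above the threshold
        best = max(self.vectors, key=lambda e: cosine(vector, e[0]), default=None)
        if best is not None and cosine(vector, best[0]) >= self.threshold:
            return best[1]
        return None

cache = TinyCache(threshold=0.95)
cache.put("capital of France?", [1.0, 0.0], "Paris")
print(cache.get("France's capital?", [0.99, 0.05]))  # near-duplicate embedding
print(cache.get("weather today?", [0.0, 1.0]))       # unrelated embedding
```

A high threshold such as 0.95 trades hit rate for safety: a lower value returns cached answers for prompts that are only loosely similar.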
A hit comes back with `cached=True` and zeroed `usage`, so monitoring sees the saving. Tool-using requests are never cached.

```toml
[maslul.cache]
mode = "semantic"            # off | exact | semantic
max_entries = 1000
ttl_seconds = 86400
similarity_threshold = 0.95
```

```python
router = Router.from_toml("maslul.toml", embed=my_async_embed)  # embed only needed for semantic
```

A TOML file or a plain dict (`Router(config={...})`):

```toml
[maslul]
strategy = "route_default"       # route_default | classify | classify_and_answer | verify_cascade
default_level = "hard"           # default-to-capable for the ambiguous middle
min_tokens_to_classify = 40      # CLASSIFY budget guard
request_timeout = 60             # per-call seconds (optional)
max_retries = 2
fallback = true                  # escalate to a higher tier on persistent failure

[maslul.tiers.simple]
provider = "gemini"
model = "gemini-2.5-flash-lite"

[maslul.tiers.medium]
model = "anthropic:claude-haiku-4-5"   # or the provider:model shorthand

[maslul.tiers.hard]
model = "anthropic:claude-sonnet-4-6"

[maslul.classifier]                    # required for the classify strategies
model = "anthropic:claude-haiku-4-5"

[maslul.providers.anthropic]
api_key_env = "ANTHROPIC_API_KEY"      # secrets by env-var name, never inlined

[maslul.providers.gemini]
vertex_project = "my-gcp-project"      # Vertex AI + Application Default Credentials (no key)
vertex_location = "global"

[maslul.providers.grok]
api_key_env = "XAI_API_KEY"
```

Pointing a capability at a different model or provider is a one-line config change, with no code deploy. Providers can also be injected directly (`Router(config, providers={...})`) for tests or custom wiring.

| Provider | SDK extra | Auth |
|---|---|---|
| `anthropic` | `anthropic` | `ANTHROPIC_API_KEY` |
| `gemini` | `google-genai` | Vertex AI + ADC (`vertex_project`), or a Gemini Developer API key |
| `grok` | `xai-sdk` | `XAI_API_KEY` |
| `openai` | `openai` | `OPENAI_API_KEY` |

Beta (0.2.x), fully typed (`py.typed`), async-first.
Routing, tool use, structured output, vision, web search across all three providers (`web_search=True`), the four strategies, and retry/fallback resilience are implemented and exercised against live APIs.

[MIT](/iliatankelevich/maslul/blob/main/LICENSE) © Ilia Tankelevich