If you build on top of LLMs, you've probably hit this: you ship a feature, traffic spikes, and the API bill comes back way higher than you expected. Per-token pricing makes costs hard to predict — you're billed by how verbose the model is, not by the value you ship.
I got tired of that (plus juggling three API keys), so here's a setup that fixes both: one OpenAI-compatible endpoint that auto-picks the best model and charges a flat price per call.
Instead of calling each provider directly, you point your existing OpenAI SDK at a single gateway and send one model name: modelis-auto. It routes each request to the best model for the task (GPT-5.5, Claude Opus 4.8, Gemini 3.1, Grok, DeepSeek…) and bills a flat per-call rate, so your cost is predictable regardless of which model handled it.
If you already use the OpenAI SDK, this is a one-line change.
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_MODELIS_KEY",
    base_url="https://modelishub.com/v1",  # the only change
)

resp = client.chat.completions.create(
    model="modelis-auto",  # let it pick the best model
    messages=[{"role": "user", "content": "Explain CRDTs in two sentences."}],
)
print(resp.choices[0].message.content)
Or with curl:
curl https://modelishub.com/v1/chat/completions \
  -H "Authorization: Bearer YOUR_MODELIS_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"modelis-auto","messages":[{"role":"user","content":"Hi"}]}'
That's it. Your existing code, SDKs, and OpenAI-compatible tools keep working.
Which model did you actually get? Fair question: auto-routing shouldn't be a black box. Every response carries a header naming the model that handled the request:
X-Modelis-Routed-Model: claude-opus-4-8
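To read that header from Python rather than from curl output, one option is the OpenAI SDK's raw-response wrapper, which exposes the HTTP headers alongside the parsed completion. This is a minimal sketch, assuming the v1 Python SDK and the header name shown above; the helper names `routed_model` and `ask` are made up for illustration and are not part of any SDK.

```python
from typing import Mapping, Optional, Tuple

ROUTED_HEADER = "X-Modelis-Routed-Model"  # header name as shown above


def routed_model(headers: Mapping[str, str]) -> Optional[str]:
    """Return the routed model from response headers, or None if absent.

    HTTP header names are case-insensitive, so compare in lowercase.
    """
    wanted = ROUTED_HEADER.lower()
    for name, value in headers.items():
        if name.lower() == wanted:
            return value
    return None


def ask(prompt: str) -> Tuple[Optional[str], Optional[str]]:
    """Send one prompt; return (answer text, model that handled it)."""
    # Imported lazily so routed_model() can be used without the SDK installed.
    from openai import OpenAI

    client = OpenAI(
        api_key="YOUR_MODELIS_KEY",
        base_url="https://modelishub.com/v1",
    )
    # with_raw_response returns the HTTP response; .parse() yields the
    # usual ChatCompletion object.
    raw = client.chat.completions.with_raw_response.create(
        model="modelis-auto",
        messages=[{"role": "user", "content": prompt}],
    )
    completion = raw.parse()
    return completion.choices[0].message.content, routed_model(raw.headers)
```

Logging the second value next to each request is a cheap way to see, over a week of traffic, how the router actually distributes your workload.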
And if you want control, you can stay in a quality tier or call a specific model directly:
model: "modelis-auto:premium" # stay in a quality tier
model: "gpt-5.5" # or pin a specific model
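If different parts of an app need different levels of control, the three options (auto, tier, pinned model) can live in one lookup table. A small sketch: the route names are hypothetical, and "premium" is the only tier name taken from the example above, so other tier names would need checking against the gateway's documentation.

```python
from typing import Dict, Optional

AUTO = "modelis-auto"


def model_name(tier: Optional[str] = None, pin: Optional[str] = None) -> str:
    """Build the model string: a pinned model wins, then a tier, then plain auto."""
    if pin:
        return pin
    if tier:
        return f"{AUTO}:{tier}"
    return AUTO


# Hypothetical routes, for illustration only.
ROUTE_MODELS: Dict[str, str] = {
    "autocomplete": model_name(),                      # let the router decide
    "contract-review": model_name(tier="premium"),     # stay in a quality tier
    "regression-suite": model_name(pin="gpt-5.5"),     # fixed model for reproducibility
}
```

Each value can be passed straight through as the `model` argument of `client.chat.completions.create`.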
The point isn't "cheaper than everyone"; it's predictable. With a flat per-call price, the bill tracks the number of calls you make, not how verbose the model happened to be.
If your workload is steady, you control prompt/response sizes tightly, and you've already optimized model choice per route, per-token billing can be cheaper. Flat per-call shines when traffic is bursty, prompts vary, or you just don't want to babysit model selection and cost. Pick what fits your reality.
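To see which side of that trade-off a workload falls on, the arithmetic is short enough to script. The sketch below compares the two billing schemes and computes the response length at which they break even. Every price used with it is a placeholder you supply, not a real rate from any provider.

```python
def per_token_cost(calls: int, in_tokens: float, out_tokens: float,
                   usd_per_m_in: float, usd_per_m_out: float) -> float:
    """Total cost when billed per token (prices are USD per million tokens)."""
    per_call = (in_tokens * usd_per_m_in + out_tokens * usd_per_m_out) / 1_000_000
    return calls * per_call


def flat_cost(calls: int, usd_per_call: float) -> float:
    """Total cost when billed a flat price per call."""
    return calls * usd_per_call


def break_even_output_tokens(in_tokens: float, usd_per_m_in: float,
                             usd_per_m_out: float, usd_per_call: float) -> float:
    """Average response length at which both schemes cost the same per call.

    Below this length per-token billing is cheaper; above it, flat wins.
    """
    return (usd_per_call * 1_000_000 - in_tokens * usd_per_m_in) / usd_per_m_out


if __name__ == "__main__":
    # Placeholder numbers: 500-token prompts, $3 / $15 per million tokens,
    # and a flat $0.01 per call. Substitute your own figures.
    tokens = break_even_output_tokens(500, 3.0, 15.0, 0.01)
    print(f"break-even response length: {tokens:.0f} tokens")
```

Running it against your real average prompt and response sizes, along with the spread around those averages, shows quickly whether your traffic is the steady kind that favors per-token billing or the bursty kind that favors a flat rate.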
There's a free tier: modelishub.com. I'd genuinely love feedback — especially whether predictable pricing actually matters for how you build, or if you prefer per-token control.