Source: dev.to

Stop guessing your AI bill: one endpoint for GPT-5.5, Claude & Gemini at a flat per-call price

A developer created ModelisHub, a single OpenAI-compatible endpoint that auto-selects the best LLM (GPT-5.5, Claude Opus 4.8, Gemini 3.1, Grok, DeepSeek) for each request and charges a flat per-call price, eliminating unpredictable per-token billing and the need to manage multiple API keys. The service routes requests via a 'modelis-auto' model name and returns a header indicating which model handled the call, with options to pin specific models or quality tiers.

Published Jun 18, 2026 · 2 min read

If you build on top of LLMs, you've probably hit this: you ship a feature, traffic spikes, and the API bill comes back way higher than you expected. Per-token pricing makes costs hard to predict — you're billed by how verbose the model is, not by the value you ship.

I got tired of that (plus juggling three API keys), so here's a setup that fixes both: one OpenAI-compatible endpoint that auto-picks the best model and charges a flat price per call.

Instead of calling each provider directly, you point your existing OpenAI SDK at a single gateway and send one model name: modelis-auto. The gateway routes each request to the best model for the task (GPT-5.5, Claude Opus 4.8, Gemini 3.1, Grok, DeepSeek…) and bills a flat per-call rate, so your cost is predictable regardless of which model handled it.

If you already use the OpenAI SDK, this is a one-line change.

from openai import OpenAI

client = OpenAI(
    api_key="YOUR_MODELIS_KEY",
    base_url="https://modelishub.com/v1",   # the only change
)

resp = client.chat.completions.create(
    model="modelis-auto",                    # let it pick the best model
    messages=[{"role": "user", "content": "Explain CRDTs in two sentences."}],
)
print(resp.choices[0].message.content)

Or with curl:

curl https://modelishub.com/v1/chat/completions \
  -H "Authorization: Bearer YOUR_MODELIS_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"modelis-auto","messages":[{"role":"user","content":"Hi"}]}'

That's it. Your existing code, SDKs, and OpenAI-compatible tools keep working.

Auto-routing shouldn't be a black box, so every response returns a header telling you exactly which model handled the request:

X-Modelis-Routed-Model: claude-opus-4-8
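To make that concrete, here is a minimal sketch of reading the header from Python using only the standard library. It assumes the gateway accepts the same JSON body as the curl example and returns an OpenAI-style response; the helper names `routed_model` and `chat` are made up for illustration and are not part of any SDK.

```python
import json
import urllib.request
from typing import Mapping, Optional

ROUTED_HEADER = "X-Modelis-Routed-Model"  # header name as quoted in the article


def routed_model(headers: Mapping[str, str]) -> Optional[str]:
    """Return the routed model name from response headers, or None if absent.

    HTTP header names are case-insensitive, so compare in lowercase.
    """
    wanted = ROUTED_HEADER.lower()
    for name, value in headers.items():
        if name.lower() == wanted:
            return value.strip()
    return None


def chat(prompt: str, api_key: str, model: str = "modelis-auto"):
    """Send one chat request and return (reply_text, routed_model_name).

    Assumption: the response body follows the OpenAI chat-completions shape.
    """
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    request = urllib.request.Request(
        "https://modelishub.com/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
        headers = dict(response.headers.items())
    return body["choices"][0]["message"]["content"], routed_model(headers)
```

Calling `chat("Hi", "YOUR_MODELIS_KEY")` would return the reply text together with whatever model name the gateway reports. Logging that second value per request is one way to see how traffic is actually being routed.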

And if you want control, you can stay in a quality tier or call a specific model directly:

model: "modelis-auto:premium"     # stay in a quality tier
model: "gpt-5.5"                   # or pin a specific model
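One way to use these options is to keep auto-routing as the default and override it only for specific routes in your app. The sketch below assumes nothing beyond the model strings shown above; the route names and the `model_for` / `build_payload` helpers are hypothetical.

```python
from typing import Dict

DEFAULT_MODEL = "modelis-auto"  # let the gateway choose

# Hypothetical route names; the model strings are the ones quoted in the article.
ROUTE_MODELS: Dict[str, str] = {
    "contract-review": "modelis-auto:premium",  # stay in a quality tier
    "regression-evals": "gpt-5.5",              # pin one model for reproducibility
}


def model_for(route: str) -> str:
    """Return the model name to send for a route, defaulting to auto-routing."""
    return ROUTE_MODELS.get(route, DEFAULT_MODEL)


def build_payload(route: str, prompt: str) -> dict:
    """Build a chat-completions request body for the given route."""
    return {
        "model": model_for(route),
        "messages": [{"role": "user", "content": prompt}],
    }
```

Keeping the overrides in one table means a route can be moved between auto, a tier, and a pinned model without touching the calling code.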

The point isn't "cheaper than everyone"; it's predictability. With a flat per-call price, a request costs the same whichever model handles it and however long the response runs.

If your workload is steady, you control prompt/response sizes tightly, and you've already optimized model choice per route, per-token billing can be cheaper. Flat per-call shines when traffic is bursty, prompts vary, or you just don't want to babysit model selection and cost. Pick what fits your reality.
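A quick way to check which side of that trade-off you are on is to compute the break-even point for your own traffic. The sketch below is plain arithmetic; every price in the usage note is a made-up placeholder, not a real rate from ModelisHub or any provider, so substitute your own numbers.

```python
def per_token_cost(input_tokens: int, output_tokens: int,
                   usd_per_m_input: float, usd_per_m_output: float) -> float:
    """Cost in USD of one call under per-token billing.

    Prices are expressed per million tokens.
    """
    total = input_tokens * usd_per_m_input + output_tokens * usd_per_m_output
    return total / 1_000_000


def break_even_output_tokens(flat_usd: float, input_tokens: int,
                             usd_per_m_input: float,
                             usd_per_m_output: float) -> float:
    """Output length at which per-token billing costs the same as the flat price.

    Shorter responses favor per-token billing; longer ones favor the flat price.
    Returns 0.0 if the input alone already costs more than the flat price.
    """
    remaining = flat_usd * 1_000_000 - input_tokens * usd_per_m_input
    return max(0.0, remaining / usd_per_m_output)
```

For example, with placeholder prices of $2 per million input tokens, $10 per million output tokens, and a flat $0.01 per call, a 1,000-token prompt breaks even at 800 output tokens. If your responses usually run shorter than that, per-token billing wins; if they run longer or vary widely, the flat price does.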

There's a free tier: modelishub.com. I'd genuinely love feedback — especially whether predictable pricing actually matters for how you build, or if you prefer per-token control.
