Stop guessing your AI bill: one endpoint for GPT-5.5, Claude & Gemini at a flat per-call price

A developer created ModelisHub, a single OpenAI-compatible endpoint that auto-selects the best LLM (GPT-5.5, Claude Opus 4.8, Gemini 3.1, Grok, DeepSeek) for each request and charges a flat per-call price, eliminating unpredictable per-token billing and the need to manage multiple API keys. The service routes requests via a 'modelis-auto' model name and returns a header indicating which model handled the call, with options to pin specific models or quality tiers.

If you build on top of LLMs, you've probably hit this: you ship a feature, traffic spikes, and the API bill comes back way higher than you expected. Per-token pricing makes costs hard to predict — you're billed by how verbose the model is, not by the value you ship. I got tired of that plus juggling three API keys , so here's a setup that fixes both: one OpenAI-compatible endpoint that auto-picks the best model and charges a flat price per call. Instead of calling each provider directly, you point your existing OpenAI SDK at a single gateway and send one model name: modelis-auto . It routes each request to the best model for the task GPT-5.5, Claude Opus 4.8, Gemini 3.1, Grok, DeepSeek… and bills a flat per-call rate — so your cost is predictable regardless of which model handled it. If you already use the OpenAI SDK, this is a one-line change. python from openai import OpenAI client = OpenAI api key="YOUR MODELIS KEY", base url="https://modelishub.com/v1", the only change resp = client.chat.completions.create model="modelis-auto", let it pick the best model messages= {"role": "user", "content": "Explain CRDTs in two sentences."} , print resp.choices 0 .message.content Or with curl: curl https://modelishub.com/v1/chat/completions \ -H "Authorization: Bearer YOUR MODELIS KEY" \ -H "Content-Type: application/json" \ -d '{"model":"modelis-auto","messages": {"role":"user","content":"Hi"} }' That's it. Your existing code, SDKs, and OpenAI-compatible tools keep working. Fair question — auto-routing shouldn't be a black box. Every response returns a header telling you exactly which model handled the request: X-Modelis-Routed-Model: claude-opus-4-8 And if you want control, you can stay in a quality tier or call a specific model directly: model: "modelis-auto:premium" stay in a quality tier model: "gpt-5.5" or pin a specific model The point isn't "cheaper than everyone" — it's predictable . With a flat per-call price: If your workload is steady, you control prompt/response sizes tightly, and you've already optimized model choice per route, per-token billing can be cheaper. Flat per-call shines when traffic is bursty, prompts vary, or you just don't want to babysit model selection and cost. Pick what fits your reality. There's a free tier: modelishub.com https://modelishub.com . I'd genuinely love feedback — especially whether predictable pricing actually matters for how you build, or if you prefer per-token control.