Otari: Own Your AI Stack

wpnews.pro

Announcement Meet Otari, an open-source LLM gateway powered by any-llm, and Otari.ai, the hosted platform built on the same foundation. Run frontier or open-weights models through one API with usage tracking, budget controls, routing policies, observability, and team management.

Closed-source frontier providers offer what looks like a complete stack: tools, MCP server integrations, execution environments, web search, spend controls, etc. Choosing one for your next project feels like a no-brainer.

Then you decide to run an open-weights model, for cost, sovereignty, or simply because you can. Most of that stack disappears. You get a chat endpoint. The rest is yours to rebuild.

That's the gap Otari closes.

Today we're launching [Otari](https://github.com/mozilla-ai/otari?ref=blog.mozilla.ai), an open-source LLM gateway built on top of

[, and](https://github.com/mozilla-ai/any-llm?ref=blog.mozilla.ai)

any-llm, the hosted platform built around it. Together, they let you choose any model, whether it is frontier or open-weights, hosted or self-served, without giving up the developer experience and capabilities you expect and, most importantly, without compromising your privacy.

Otari.ai## What Otari Is

Otari brings the missing pieces to your stack: user management, provider key management, usage and budget tracking, and a set of tools to make open source models more capable.

Better cost and privacy without compromising on capabilities. And you are not locked into Python, you can connect via one of our SDKs or by hitting the API directly.

Closing the Capability Gap

Frontier providers ship more than just weights. They ship code execution, web search, transcription, image generation, and batching. When you switch a workload from Claude or GPT to an open-weights model, those tools do not come with you. The model regresses to a simple chat endpoint, and your application code must grow a layer it did not need before.

Otari ships those capabilities as server-side, model-agnostic tools. The gateway dispatches them to any model that supports tool calls:

Sandboxed code execution. A Docker-isolated Python REPL, invoked server-side when a model needs to run code. Any tool-using model now has a code interpreter. You don't fine-tune for it; you don't write the sandbox; it's just there.
Web search. Current-information retrieval via SearXNG out of the box, with the option to plug in Tavily, Brave, or Exa. Your open-weights model is no longer stuck at its training cutoff.
Audio in, images out. OpenAI-compatible transcription and image generation endpoints, so multimodal pipelines keep working when you swap the model behind them.
Reranking. LLM-powered document reranking for RAG, independent of your generation model.
Batch processing. OpenAI-compatible asynchronous batch API for workloads where latency doesn't matter and cost does.

Choosing open-source models shouldn't mean losing capabilities. Otari levels the playing field. The same tools you use with closed-source providers are attached to whatever model you choose. Pair an open-weights chat model with Otari and you get a fully equipped agent runtime, not a stripped-down one.

And we're not stopping here. Guardrails powered by [ llamafile](https://github.com/mozilla-ai/llamafile?ref=blog.mozilla.ai),

[, and](https://github.com/mozilla-ai/encoderfile?ref=blog.mozilla.ai)

encoderfileare next, so the safety and classification layers around your model run fast and locally, even without a GPU.

any-guardrail### The Operational Layer

The other half of why a gateway exists is the boring, important stuff every team ends up building for itself. Otari ships it:

Virtual API keys: Hashed, named, optionally-expiring keys bound to a user, so clients never see your upstream provider credentials.
User management and budgets: Per-user spending caps with configurable reset windows.
Usage and spend tracking: Real-time cost calculation across providers.
Rate limiting: Configurable RPM caps per user, with hits exported as Prometheus metrics.
Health and Prometheus metrics.
Platform mode: Delegation-based multi-tenant authorization, which is the seam Otari.ai is built on.

Otari.ai: The Hosted Platform #

Otari is the engine. Otari.ai is what you get when you don't want to run it yourself. It is the managed, team-oriented surface built on top of the OSS gateway.

Identity and teams. User accounts, organizations with role-based access (owner, admin, member), workspaces scoped to organizations, each with their own keys, members, playground, and spending dashboards.
Routing Policies. Define how requests flow across providers and models at the workspace level. We are starting with a simple fallback system and we will be expanding on more elaborate routers in the near future.
Secure vault. Provider credentials encrypted at rest.
Managed providers. Reach frontier models through Otari.ai without bringing your own API key. Billed against your wallet at transparent per-token pricing.
Mozilla.ai provider. A first-party managed provider routes to open-weights models. Auto-provisioned per organization. Same gateway, same budgets, same traces. Open-weights as a first-class citizen.
Multi-level budgets and wallets. Spend limits per provider key, plus per-member-per-provider-key caps for fine-grained control, each with their own reset cadence.
Declarative configuration. Describe an entire organization — workspaces, provider keys, routing policies, budgets, member budget policies, custom pricing, platform keys — in a single YAML document. Plan a diff against the current state, commit it to your Git, and re-create your environment with a single click.
Observability. OTLP trace ingest for any OpenTelemetry-instrumented application, OpenSearch-backed analytics, a session explorer with filtering and per-session metrics, and usage dashboards with cost visualization.

Why We Built It This Way #

Otari is open core. Otari.ai is a transparent business layer on top, same engine, same API surface, just hosted and operated for you. Use the platform for velocity. Self-host to keep your privacy: Your prompts, your completions, your traces, and your usage logs never touch us. We don't see what your users type. We don't see what your models say back. Switch directions later without rewriting your application code: the wire format is the same.

The other design choice we care about is making open-weights models a first-class citizen. Not a checkbox, not a fallback, not a thing you have to bring your own infrastructure for. Same dashboards, same budgets, same tool calls, same managed provider experience. That's the bet behind both the OSS project and the platform, and behind Mozilla.ai's broader work.

Getting Started #

[Otari.ai](http://otari.ai/?ref=blog.mozilla.ai) (hosted): sign up, top up your wallet, start calling frontier or open-weights models. Bring your own keys or use the managed providers.

[ Otari](https://github.com/mozilla-ai/otari?ref=blog.mozilla.ai) (open source): clone the repo, run docker compose up, point your OpenAI client at the gateway URL.

We'd love your feedback. File issues on GitHub, find us in the Mozilla.ai community channels (X, LinkedIn, Bluesky, Mastodon, Discord), or just start building and tell us what breaks.

source & further reading

blog.mozilla.ai — original article You cannot sell AI written software

Otari: Own Your AI Stack

Closing the Capability Gap

Otari.ai: The Hosted Platform #

Why We Built It This Way #

Getting Started #

Run your AI side-project on zahid.host