# Build a Unified AI Gateway with LiteLLM and Ollama

> Source: <https://dev.to/everylocalai/build-a-unified-ai-gateway-with-litellm-and-ollama-387a>
> Published: 2026-06-14 21:54:58+00:00

Unify all your AI models - local and cloud - behind a single OpenAI-compatible API with LiteLLM and Ollama.

LiteLLM is a proxy server that exposes 100+ LLM providers through one endpoint. Point it at Ollama for local inference, and the same gateway handles load balancing, cost tracking, rate limits, and automatic fallback routing across your local and cloud models.

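Install the proxy:

```bash
pip install 'litellm[proxy]'
```

Define your models in `config.yaml`:

```yaml
model_list:
  - model_name: qwen3-local
    litellm_params:
      model: ollama/qwen3:14b
      api_base: http://localhost:11434
      rpm: 30
  - model_name: gpt-4o-mini
    litellm_params:
      model: openai/gpt-4o-mini
      api_key: os.environ/OPENAI_API_KEY

general_settings:
  master_key: sk-your-key
```

Start the gateway:

```bash
litellm --config config.yaml --port 4000
```

Then call it from Python with the standard OpenAI SDK:

```python
from openai import OpenAI

# Point the OpenAI client at the LiteLLM gateway; the API key is the master_key from config.yaml.
client = OpenAI(api_key="sk-your-key",
  base_url="http://localhost:4000/v1")

# "qwen3-local" is the model_name alias defined in config.yaml.
response = client.chat.completions.create(
  model="qwen3-local",
  messages=[{"role": "user", "content": "Hello!"}])
print(response.choices[0].message.content)
```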
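To make the fallback idea concrete, here is a minimal client-side sketch that tries the local alias first and moves on to the cloud alias if the call fails. It is only an illustration of the concept, not how LiteLLM implements fallback inside the proxy. The helper names `complete_with_fallback` and `demo` are hypothetical; the model aliases, key, and URL are the ones from the config above.

```python
from __future__ import annotations

from typing import Any, Sequence


def complete_with_fallback(
    client: Any, models: Sequence[str], messages: list[dict]
) -> tuple[str, Any]:
    """Try each model alias in order; return (alias, response) for the first success.

    Hypothetical helper: `client` is anything exposing
    `client.chat.completions.create(model=..., messages=...)`, such as openai.OpenAI.
    """
    last_error: Exception | None = None
    for model in models:
        try:
            return model, client.chat.completions.create(model=model, messages=messages)
        except Exception as exc:  # broad on purpose: any failure moves on to the next alias
            last_error = exc
    raise RuntimeError(f"all models failed: {list(models)}") from last_error


def demo() -> None:
    """Assumes the gateway from this post is running on localhost:4000."""
    from openai import OpenAI

    client = OpenAI(api_key="sk-your-key", base_url="http://localhost:4000/v1")
    used, response = complete_with_fallback(
        client,
        ["qwen3-local", "gpt-4o-mini"],  # local first, cloud as backup
        [{"role": "user", "content": "Hello!"}],
    )
    print(used, response.choices[0].message.content)
```

Call `demo()` with the gateway running to see which alias answered. Because every model sits behind the same endpoint, switching or falling back is just a different `model` string on the same client.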

| | LiteLLM + Ollama | Direct Cloud APIs |
|---|---|---|
| Gateway | Free, self-hosted | Free |
| Local inference | $0 | N/A |
| Model switching | One endpoint | Multiple SDKs |
| Failover | Automatic | Manual |

Full guide with advanced config examples: [https://everylocalai.com/stack/litellm-ollama-gateway](https://everylocalai.com/stack/litellm-ollama-gateway)