Unify all your AI models, local and cloud, behind a single OpenAI-compatible API with LiteLLM and Ollama.
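LiteLLM is an open-source proxy server (an "AI gateway") that exposes 100+ LLM providers through a single OpenAI-compatible endpoint. Connect it to Ollama for local inference, and the same gateway also gives you load balancing, cost tracking, rate limits, and fallback routing between models once fallbacks are configured.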
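Fallback routing boils down to "try the preferred deployment, and move to the next one if it fails." The sketch below is only a conceptual illustration in plain Python, not LiteLLM's implementation. `complete_with_fallback`, `local_model` and `cloud_model` are hypothetical names, and the two "deployments" are stubs standing in for the Ollama model and a cloud model. In the proxy itself, fallbacks are declared in the config rather than written by hand; check the LiteLLM docs for the exact keys.

```python
from typing import Callable, Sequence


def complete_with_fallback(
    prompt: str,
    deployments: Sequence[tuple[str, Callable[[str], str]]],
) -> tuple[str, str]:
    """Try each (name, call) pair in order; return (name, reply) from the first that succeeds."""
    errors: list[str] = []
    for name, call in deployments:
        try:
            return name, call(prompt)
        except Exception as exc:  # any failure moves on to the next deployment
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all deployments failed: " + "; ".join(errors))


# Hypothetical stand-ins: the local model is "down", the cloud model answers.
def local_model(prompt: str) -> str:
    raise ConnectionError("Ollama not reachable")


def cloud_model(prompt: str) -> str:
    return f"cloud reply to {prompt!r}"


if __name__ == "__main__":
    order = [("qwen3-local", local_model), ("gpt-4o-mini", cloud_model)]
    print(complete_with_fallback("Hello!", order))
```

The order of the list encodes preference: local first (free, private), cloud only when the local model is unavailable.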
Install the proxy:

```bash
pip install 'litellm[proxy]'
```
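Create `config.yaml`:

```yaml
model_list:
  - model_name: qwen3-local              # the name clients send as "model"
    litellm_params:
      model: ollama/qwen3:14b            # run `ollama pull qwen3:14b` first
      api_base: http://localhost:11434   # default Ollama address
      rpm: 30                            # requests-per-minute limit for this deployment
  - model_name: gpt-4o-mini
    litellm_params:
      model: openai/gpt-4o-mini
      api_key: os.environ/OPENAI_API_KEY # read from the OPENAI_API_KEY env var

general_settings:
  master_key: sk-your-key                # clients authenticate with this; must start with "sk-"
```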
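The `api_key: os.environ/OPENAI_API_KEY` line tells LiteLLM to read the key from the `OPENAI_API_KEY` environment variable instead of storing it in the file, so that variable must be set before the proxy starts. The sketch below only illustrates that lookup convention on the same config written as a Python dict (what a YAML loader would produce). `resolve_env_refs` is a hypothetical helper, not part of LiteLLM.

```python
import os
from typing import Any

ENV_PREFIX = "os.environ/"


def resolve_env_refs(value: Any) -> Any:
    """Recursively replace 'os.environ/NAME' strings with the value of env var NAME."""
    if isinstance(value, dict):
        return {key: resolve_env_refs(item) for key, item in value.items()}
    if isinstance(value, list):
        return [resolve_env_refs(item) for item in value]
    if isinstance(value, str) and value.startswith(ENV_PREFIX):
        name = value[len(ENV_PREFIX):]
        if name not in os.environ:
            raise KeyError(f"environment variable {name} is not set")
        return os.environ[name]
    return value  # numbers, plain strings, etc. pass through unchanged


# config.yaml from above, as the dict a YAML loader would return
config = {
    "model_list": [
        {"model_name": "qwen3-local",
         "litellm_params": {"model": "ollama/qwen3:14b",
                            "api_base": "http://localhost:11434", "rpm": 30}},
        {"model_name": "gpt-4o-mini",
         "litellm_params": {"model": "openai/gpt-4o-mini",
                            "api_key": "os.environ/OPENAI_API_KEY"}},
    ],
    "general_settings": {"master_key": "sk-your-key"},
}

if __name__ == "__main__":
    os.environ.setdefault("OPENAI_API_KEY", "sk-placeholder")  # placeholder for the demo only
    resolved = resolve_env_refs(config)
    print(resolved["model_list"][1]["litellm_params"]["api_key"])
```

Keeping secrets out of `config.yaml` means the file can be committed or shared without leaking keys.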
Start the proxy:

```bash
export OPENAI_API_KEY="sk-..."   # needed for the gpt-4o-mini entry
litellm --config config.yaml --port 4000
```
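A quick way to confirm the proxy is up is to list models through its OpenAI-compatible `/v1/models` endpoint, authenticating with the master key. A minimal standard-library sketch, assuming the proxy runs on `localhost:4000` with the `sk-your-key` master key from the config; the helper names are hypothetical. If everything is wired correctly, the returned ids should be the `model_name` values from `config.yaml`.

```python
import json
import urllib.request

PROXY_URL = "http://localhost:4000"  # matches --port 4000
MASTER_KEY = "sk-your-key"           # master_key from config.yaml


def build_models_request(base_url: str, api_key: str) -> urllib.request.Request:
    """Build a GET request for the OpenAI-compatible /v1/models endpoint."""
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/v1/models",
        headers={"Authorization": f"Bearer {api_key}"},
    )


def parse_model_ids(body: str) -> list[str]:
    """Extract ids from an OpenAI-style {"data": [{"id": ...}, ...]} response body."""
    return [entry["id"] for entry in json.loads(body).get("data", [])]


def list_proxy_models(base_url: str = PROXY_URL, api_key: str = MASTER_KEY) -> list[str]:
    """Call the running proxy and return the model ids it exposes."""
    with urllib.request.urlopen(build_models_request(base_url, api_key), timeout=5) as resp:
        return parse_model_ids(resp.read().decode("utf-8"))


if __name__ == "__main__":
    print(list_proxy_models())  # requires the proxy to be running
```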
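Call it with the standard OpenAI SDK:

```python
from openai import OpenAI

# Point the official OpenAI client at the LiteLLM proxy instead of api.openai.com
client = OpenAI(api_key="sk-your-key",  # the proxy's master_key
                base_url="http://localhost:4000/v1")

response = client.chat.completions.create(
    model="qwen3-local",  # any model_name from config.yaml
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```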
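Because every model sits behind the same endpoint, switching between local and cloud only changes the `model` string. A minimal sketch, assuming the proxy and config above; `ask` and `ask_each` are hypothetical helpers that accept any OpenAI-compatible client object, and the `openai` import is deferred to the demo so the helpers load without it.

```python
from typing import Any, Iterable


def ask(client: Any, model: str, prompt: str) -> str:
    """Send one user message through an OpenAI-compatible client and return the reply text."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content


def ask_each(client: Any, models: Iterable[str], prompt: str) -> dict[str, str]:
    """Send the same prompt to several models behind one endpoint; only the model name changes."""
    return {model: ask(client, model, prompt) for model in models}


if __name__ == "__main__":
    from openai import OpenAI  # imported here so the helpers above don't require it

    client = OpenAI(api_key="sk-your-key", base_url="http://localhost:4000/v1")
    for model, reply in ask_each(client, ["qwen3-local", "gpt-4o-mini"], "Hello!").items():
        print(f"{model}: {reply}")
```

Taking the client as a parameter also makes the helpers easy to test with a stub instead of a live proxy.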
| | LiteLLM + Ollama | Direct Cloud APIs |
|---|---|---|
| Gateway | Free, self-hosted | None (each provider's own endpoint) |
| Local inference | $0 per token (runs on your hardware) | N/A |
| Model switching | One endpoint, one SDK | Multiple SDKs |
| Failover | Automatic, once fallbacks are configured | Manual |
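The "$0" row refers to per-token API charges; local hardware and electricity are not free. To compare against a cloud model, per-token cost is simply tokens divided by one million, times the price per million tokens, for input and output separately. A sketch with placeholder prices and a hypothetical monthly volume; `token_cost_usd` is an illustrative name, and real numbers should come from the provider's current pricing page.

```python
def token_cost_usd(input_tokens: int, output_tokens: int,
                   input_price_per_million: float,
                   output_price_per_million: float) -> float:
    """Per-token API cost: (tokens / 1,000,000) * price per million, input plus output."""
    return (input_tokens / 1_000_000 * input_price_per_million
            + output_tokens / 1_000_000 * output_price_per_million)


if __name__ == "__main__":
    # Placeholder prices in USD per million tokens; NOT real provider pricing.
    CLOUD_IN, CLOUD_OUT = 1.00, 2.00
    monthly_in, monthly_out = 3_000_000, 1_000_000  # hypothetical monthly volume

    cloud = token_cost_usd(monthly_in, monthly_out, CLOUD_IN, CLOUD_OUT)
    local = token_cost_usd(monthly_in, monthly_out, 0.0, 0.0)  # Ollama: no per-token charge
    print(f"cloud: ${cloud:.2f}  local: ${local:.2f}")  # cloud: $5.00  local: $0.00
```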
Full guide with advanced config examples: https://everylocalai.com/stack/litellm-ollama-gateway