Build a Unified AI Gateway with LiteLLM and Ollama

Unify all your AI models, local and cloud, behind a single OpenAI-compatible API with LiteLLM and Ollama.

LiteLLM is a proxy server that exposes 100+ LLM providers through one endpoint. Connect it to Ollama for local inference, and you get load balancing, cost tracking, rate limits, and automatic fallback routing.

Install the proxy:

```
pip install 'litellm[proxy]'
```

Define your models in `config.yaml`:

```yaml
model_list:
  # Local model served by Ollama
  - model_name: qwen3-local
    litellm_params:
      model: ollama/qwen3:14b
      api_base: http://localhost:11434
      rpm: 30
  # Cloud model; the API key is read from the environment
  - model_name: gpt-4o-mini
    litellm_params:
      model: openai/gpt-4o-mini
      api_key: os.environ/OPENAI_API_KEY

general_settings:
  master_key: sk-your-key
```

Start the gateway:

```
litellm --config config.yaml --port 4000
```

Call it with the standard OpenAI SDK:

```python
from openai import OpenAI

# Point the OpenAI client at the gateway and authenticate with the master key
client = OpenAI(
    api_key="sk-your-key",
    base_url="http://localhost:4000/v1",
)

# "qwen3-local" is the model_name defined in config.yaml
response = client.chat.completions.create(
    model="qwen3-local",
    messages=[{"role": "user", "content": "Hello!"}],
)
```

| | LiteLLM + Ollama | Direct Cloud APIs |
|---|---|---|
| Gateway | Free, self-hosted | Free |
| Local inference | $0 | N/A |
| Model switching | One endpoint | Multiple SDKs |
| Failover | Automatic | Manual |

Full guide with advanced config examples: https://everylocalai.com/stack/litellm-ollama-gateway
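To make the fallback idea concrete, here is a minimal client-side sketch of "try the local model first, then the cloud model" against the gateway's OpenAI-compatible endpoint. It is an illustration only: LiteLLM performs fallback routing inside the proxy, so clients normally do not need this logic. The helper names (`build_chat_request`, `send_over_http`, `complete_with_fallback`) are hypothetical, and the URL, key, and model names are assumed to match the example config.

```python
import json
import urllib.request

# Assumed values taken from the example config; adjust to your setup.
GATEWAY_URL = "http://localhost:4000/v1/chat/completions"
MASTER_KEY = "sk-your-key"


def build_chat_request(model, prompt):
    """Build an OpenAI-style chat completion request for the gateway."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return urllib.request.Request(
        GATEWAY_URL,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {MASTER_KEY}",
        },
        method="POST",
    )


def send_over_http(request):
    """Send the request and decode the JSON reply (needs a running gateway)."""
    with urllib.request.urlopen(request, timeout=60) as resp:
        return json.load(resp)


def complete_with_fallback(models, prompt, send=send_over_http):
    """Try each model in order; return (model_used, response) on first success."""
    last_error = None
    for model in models:
        try:
            return model, send(build_chat_request(model, prompt))
        except OSError as exc:  # network errors such as URLError are OSErrors
            last_error = exc
    raise RuntimeError(f"all models failed: {list(models)}") from last_error
```

With the gateway running, `complete_with_fallback(["qwen3-local", "gpt-4o-mini"], "Hello!")` would return the name of the model that answered along with the decoded response. The `send` parameter exists so the ordering logic can be exercised without a live server.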