{"slug": "build-a-unified-ai-gateway-with-litellm-and-ollama", "title": "Build a Unified AI Gateway with LiteLLM and Ollama", "summary": "A developer built a unified AI gateway using LiteLLM and Ollama, enabling a single OpenAI-compatible API endpoint for both local and cloud LLMs. The setup provides load balancing, cost tracking, rate limits, and automatic fallback routing across 100+ providers.", "body_md": "Unify all your AI models - local and cloud - behind a single OpenAI-compatible API with LiteLLM and Ollama.\n\nLiteLLM is a proxy server that exposes 100+ LLM providers through one endpoint. Connect it to Ollama for local inference, and you get load balancing, cost tracking, rate limits, and automatic fallback routing.\n\nInstall LiteLLM with the proxy extras:\n\n```bash\npip install 'litellm[proxy]'\n```\n\nDefine the models in `config.yaml`:\n\n```yaml\nmodel_list:\n  - model_name: qwen3-local\n    litellm_params:\n      model: ollama/qwen3:14b\n      api_base: http://localhost:11434\n      rpm: 30\n  - model_name: gpt-4o-mini\n    litellm_params:\n      model: openai/gpt-4o-mini\n      api_key: os.environ/OPENAI_API_KEY\n\ngeneral_settings:\n  master_key: sk-your-key\n```\n\nStart the gateway:\n\n```bash\nlitellm --config config.yaml --port 4000\n```\n\nCall it from the OpenAI Python SDK:\n\n```python\nfrom openai import OpenAI\n\n# Point the standard OpenAI client at the local LiteLLM proxy.\nclient = OpenAI(api_key=\"sk-your-key\",\n  base_url=\"http://localhost:4000/v1\")\nresponse = client.chat.completions.create(\n  model=\"qwen3-local\",\n  messages=[{\"role\": \"user\", \"content\": \"Hello!\"}])\nprint(response.choices[0].message.content)\n```\n\n| | LiteLLM + Ollama | Direct Cloud APIs |\n|---|---|---|\n| Gateway | Free, self-hosted | Free |\n| Local inference | $0 | N/A |\n| Model switching | One endpoint | Multiple SDKs |\n| Failover | Automatic | Manual |\n\nFull guide with advanced config examples: [https://everylocalai.com/stack/litellm-ollama-gateway](https://everylocalai.com/stack/litellm-ollama-gateway)", "url": "https://wpnews.pro/news/build-a-unified-ai-gateway-with-litellm-and-ollama", "canonical_source": "https://dev.to/everylocalai/build-a-unified-ai-gateway-with-litellm-and-ollama-387a", "published_at": "2026-06-14 21:54:58+00:00", "updated_at": "2026-06-14 22:10:43.730727+00:00", "lang": "en", "topics": ["large-language-models", "ai-infrastructure", "developer-tools", "ai-products"], "entities": ["LiteLLM", "Ollama", "OpenAI"], "alternates": {"html": "https://wpnews.pro/news/build-a-unified-ai-gateway-with-litellm-and-ollama", "markdown": "https://wpnews.pro/news/build-a-unified-ai-gateway-with-litellm-and-ollama.md", "text": "https://wpnews.pro/news/build-a-unified-ai-gateway-with-litellm-and-ollama.txt", "jsonld": "https://wpnews.pro/news/build-a-unified-ai-gateway-with-litellm-and-ollama.jsonld"}}