Turn ~800M Free AI Tokens Into a Single OpenAI API with FreeLLMAPI

FreeLLMAPI is a self-hosted proxy that aggregates the free tiers of 14 AI providers (such as Gemini, Groq, and Mistral) behind a single OpenAI-compatible API endpoint, offering approximately 800 million free tokens per month combined. It features automatic failover, per-key rate tracking, and an admin dashboard. It is intended for personal use only and has limitations, including no tool calling or vision support and unpredictable latency. The project is open source under the MIT license and aims to let developers and researchers prototype without upfront costs.

The Problem Nobody Talks About

Every major AI lab now offers a free tier. Gemini, Groq, Mistral, Cerebras: they all give you a few million tokens a month and a few thousand requests a day. On paper, that's generous. In practice, you end up juggling 14 different SDKs, 14 rate limits, and 14 places where a request can silently fail. FreeLLMAPI solves exactly that.

What It Does

It's a self-hosted proxy that aggregates free tiers from 14 providers behind a single `/v1/chat/completions` endpoint, fully compatible with the OpenAI SDK. Supported providers:

| Provider | Notable Models |
|---|---|
| Google Gemini | 2.5 Pro / Flash |
| Groq | Llama 4, Qwen, Kimi |
| Cerebras | Llama 3.3, Qwen |
| SambaNova | Llama 3.3 70B |
| NVIDIA NIM | Full catalog |
| Mistral | La Plateforme |
| OpenRouter | Free-tier models |
| GitHub Models | GPT-4o, Llama, Phi |
| Hugging Face | Inference Providers |
| Cloudflare | Workers AI |
| Zhipu | GLM-4 series |
| Moonshot | Kimi |
| MiniMax | abab / hailuo |

Combined, that's roughly 800M tokens/month across all providers.

Zero Code Changes

Point your existing OpenAI SDK at `localhost:3001/v1`:

```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:3001/v1",
    api_key="freellmapi-your-unified-key",
)

# with_raw_response exposes the HTTP headers alongside the parsed completion
raw = client.chat.completions.with_raw_response.create(
    model="auto",  # the router picks the best available model
    messages=[{"role": "user", "content": "Summarise the fall of Rome in one sentence."}],
)
resp = raw.parse()

print(resp.choices[0].message.content)
print("Routed via:", raw.headers.get("x-routed-via"))
```

That's it. Every response includes an `X-Routed-Via` header, so you know which provider actually served the request.

Technical Highlights

- Automatic failover: on a 429, a timeout, or a 5xx, the router cools down the key and retries the next provider in your chain, up to 20 attempts.
- Sticky sessions: multi-turn conversations stay on the same model for 30 minutes. This matters more than it sounds, because switching models mid-conversation causes subtle hallucination spikes.
- Per-key rate tracking: requests per minute (RPM), requests per day (RPD), tokens per minute (TPM), and tokens per day (TPD) are counted per (platform, model, key). The router always picks a key that's under its caps.
- Encrypted key storage: keys are encrypted with AES-256-GCM before they hit SQLite. Upstream provider keys never leave your machine.
- Admin dashboard: a React + Vite UI to manage keys, reorder the fallback chain, inspect analytics, and test prompts in a playground.
- Lightweight: runs on a Raspberry Pi 4 at ~40MB RAM idle.

Setup in 3 Lines

```bash
git clone https://github.com/tashfeenahmed/freellmapi
cd freellmapi && npm install
cp .env.example .env && npm run dev
```

Open `localhost:5173`, add your provider API keys, grab your unified key → done.

The Honest Part

A few things the README states clearly, and you should know upfront:

- Intelligence degrades throughout the day. Gemini 2.5 Pro and GPT-4o via GitHub Models have the lowest daily caps. Once they're exhausted, the router falls back to smaller models. Expect effective quality to drop in the late hours, then reset at UTC midnight.
- Tool calling and vision are not yet supported. It's text only for now; PRs are welcome.
- Latency is unpredictable. Cerebras and Groq are extremely fast; others are not. You get whichever one is available.
- Personal use only. There is no multi-tenant auth, so don't expose this to the internet.
- Free tiers change without notice. When a provider tightens its limits, you'll see 429s until the catalog is updated.

Who This Is For

- ✅ Developers building AI agents or coding assistants who want to prototype without spending money upfront
- ✅ Researchers and students who hit rate limits on one provider and want seamless fallback
- ✅ Anyone tired of maintaining multiple SDK integrations
- ❌ Production workloads: use a paid API with an SLA

Quick ToS Note

The project includes a detailed review of each provider's terms. Most are fine for single-user personal use. Notable exceptions: Cohere's trial ToS explicitly forbids personal/household use, and NVIDIA NIM's free tier is scoped to evaluation only.
Read the full table in the README before adding keys.

FreeLLMAPI is MIT licensed and actively welcomes contributors, especially for adding embeddings, tool calling, and new providers.
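The automatic failover described under Technical Highlights boils down to a simple loop. The Python sketch below is not FreeLLMAPI's implementation. `ProviderKey`, `UpstreamError`, `route`, and the 60-second cooldown are hypothetical, and only the 20-attempt cap comes from the article. Each provider call is a plain callable, so the loop can run without network access.

```python
import time
from dataclasses import dataclass
from typing import Callable, List, Tuple

MAX_ATTEMPTS = 20         # attempt cap stated in the article
COOLDOWN_SECONDS = 60.0   # hypothetical cooldown length


class UpstreamError(Exception):
    """Hypothetical error standing in for a 429, timeout, or 5xx from a provider."""


@dataclass
class ProviderKey:
    name: str
    call: Callable[[str], str]   # hypothetical: sends a prompt, returns the completion text
    cooldown_until: float = 0.0  # monotonic time when the key becomes usable again


def route(chain: List[ProviderKey], prompt: str,
          now: Callable[[], float] = time.monotonic) -> Tuple[str, str]:
    """Walk the fallback chain in order, cooling down any key that fails."""
    attempts = 0
    for key in chain:
        if attempts >= MAX_ATTEMPTS:
            break
        if key.cooldown_until > now():
            continue  # still cooling down from an earlier failure
        attempts += 1
        try:
            return key.name, key.call(prompt)  # (routed-via, completion text)
        except UpstreamError:
            key.cooldown_until = now() + COOLDOWN_SECONDS
    raise RuntimeError(f"no provider succeeded after {attempts} attempts")
```

Because a failed key is timestamped rather than removed, it rejoins the chain on its own once its cooldown expires.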
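Per-key rate tracking, also listed under Technical Highlights, can be sketched as four counters checked before a key is chosen. This is an illustrative Python version under stated assumptions: the `Caps` and `RateTracker` names are hypothetical, the per-minute limits use a sliding 60-second window, and the daily limits reset at UTC midnight, matching the reset time mentioned in The Honest Part. FreeLLMAPI's actual bookkeeping may differ.

```python
import time
from collections import defaultdict, deque
from dataclasses import dataclass

MINUTE, DAY = 60.0, 86_400.0


@dataclass(frozen=True)
class Caps:
    rpm: int  # requests per minute
    rpd: int  # requests per day
    tpm: int  # tokens per minute
    tpd: int  # tokens per day


class RateTracker:
    """Hypothetical usage counters per (platform, model, key) tuple."""

    def __init__(self, now=time.time):
        self.now = now                    # Unix-time clock, injectable for tests
        self.events = defaultdict(deque)  # ident -> deque of (timestamp, tokens)

    def _usage_since(self, ident, since):
        tokens = [tok for ts, tok in self.events[ident] if ts >= since]
        return len(tokens), sum(tokens)

    def under_caps(self, ident, caps, tokens):
        """True if one more request of `tokens` tokens keeps every counter within its cap."""
        now = self.now()
        req_m, tok_m = self._usage_since(ident, now - MINUTE)     # sliding minute
        req_d, tok_d = self._usage_since(ident, now - now % DAY)  # since last UTC midnight
        return (req_m + 1 <= caps.rpm and req_d + 1 <= caps.rpd
                and tok_m + tokens <= caps.tpm and tok_d + tokens <= caps.tpd)

    def record(self, ident, tokens):
        now = self.now()
        q = self.events[ident]
        q.append((now, tokens))
        while q and q[0][0] < now - DAY:  # older than 24h: irrelevant to both windows
            q.popleft()

    def pick_key(self, candidates, caps, tokens):
        """Return the first candidate ident that is under its caps, or None."""
        for ident in candidates:
            if self.under_caps(ident, caps[ident], tokens):
                return ident
        return None
```

Keying everything by the full (platform, model, key) tuple means two keys for the same provider, or two models behind one key, exhaust their limits independently.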
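Sticky sessions need some way to recognise that two requests belong to the same conversation. The sketch below makes two hypothetical assumptions that the article does not confirm: a conversation is identified by a hash of its first message, and the 30-minute window is refreshed on every turn. `StickySessions` and `model_for` are illustrative names.

```python
import hashlib
import time

STICKY_TTL = 30 * 60  # 30 minutes, per the article


class StickySessions:
    """Hypothetical sketch: pin a conversation to one model for STICKY_TTL seconds."""

    def __init__(self, now=time.monotonic):
        self.now = now
        self.pins = {}  # conversation id -> (model, expires_at)

    @staticmethod
    def conversation_id(messages):
        # Assumption: hash the first message, which stays fixed as the chat grows.
        first = messages[0]
        return hashlib.sha256(f"{first['role']}:{first['content']}".encode()).hexdigest()

    def model_for(self, messages, choose):
        """Reuse the pinned model while the pin is fresh; otherwise call choose() and pin it."""
        cid = self.conversation_id(messages)
        pinned = self.pins.get(cid)
        if pinned and pinned[1] > self.now():
            model = pinned[0]
        else:
            model = choose()
        # Assumption: refresh the pin on every turn so an active conversation stays put.
        self.pins[cid] = (model, self.now() + STICKY_TTL)
        return model
```

With this choice, an active chat stays on one model for as long as turns keep arriving, while an abandoned one is released after 30 minutes of inactivity.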