# Turn ~800M Free AI Tokens Into a Single OpenAI API with FreeLLMAPI

> Source: <https://dev.to/mervindublin/turn-800m-free-ai-tokens-into-a-single-openai-api-with-freellmapi-2gm9>
> Published: 2026-05-21 08:21:17+00:00

## The Problem Nobody Talks About

Every major AI lab now offers a free tier. Gemini, Groq, Mistral, Cerebras — they all give you a few million tokens a month, a few thousand requests a day.

On paper, that's generous. In practice, you end up juggling 14 different SDKs, 14 rate limits, and 14 places a request can silently fail.

**FreeLLMAPI** solves exactly that.

## What It Does

It's a self-hosted proxy that aggregates free tiers from 14 providers behind a **single /v1/chat/completions endpoint** — fully compatible with the OpenAI SDK.

Supported providers include:

| Provider | Notable Models |
|---|---|
| Google Gemini | 2.5 Pro / Flash |
| Groq | Llama 4, Qwen, Kimi |
| Cerebras | Llama 3.3, Qwen |
| SambaNova | Llama 3.3 70B |
| NVIDIA NIM | Full catalog |
| Mistral | La Plateforme |
| OpenRouter | Free-tier models |
| GitHub Models | GPT-4o, Llama, Phi |
| Hugging Face | Inference Providers |
| Cloudflare | Workers AI |
| Zhipu | GLM-4 series |
| Moonshot | Kimi |
| MiniMax | abab / hailuo |

Combined: roughly **800M tokens/month** across all providers.

## Zero Code Changes

Point your existing OpenAI SDK at `localhost:3001/v1`:

``` python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:3001/v1",
    api_key="freellmapi-your-unified-key",
)

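# with_raw_response exposes the HTTP headers alongside the parsed body;
# a plain ChatCompletion object has no .headers attribute.
raw = client.chat.completions.with_raw_response.create(
    model="auto",  # router picks the best available
    messages=[{"role": "user", "content": "Summarise the fall of Rome in one sentence."}],
)
resp = raw.parse()  # the usual ChatCompletion object

print(resp.choices[0].message.content)
print("Routed via:", raw.headers.get("x-routed-via"))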
```

That's it. Every response includes an `X-Routed-Via` header so you know which provider actually served the request.
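If you are not using the OpenAI SDK, the same endpoint should be reachable with a plain HTTP POST. Here is a minimal sketch using only the standard library. It assumes the proxy accepts the standard OpenAI request shape with a Bearer token, which the post implies but does not spell out, and `build_chat_request` is a hypothetical helper name. The function only builds the request and never sends it.

```python
import json
import urllib.request


def build_chat_request(base_url: str, api_key: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style chat completion request."""
    payload = {
        "model": "auto",  # from the post: the router picks a provider
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # assumption: standard Bearer auth
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

With the proxy running, you would pass the result to `urllib.request.urlopen(...)` and read `X-Routed-Via` from the response headers.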

## Technical Highlights

**Automatic failover** — On 429 / timeout / 5xx, the router cools down the key and retries the next provider in your chain, up to 20 attempts.
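To make the failover idea concrete, here is a rough sketch of a retry loop with cooldowns. It is not the project's actual code. The 20-attempt cap comes from the description above, but the 60-second cooldown, the `UpstreamError` class, and the `route_with_failover` name are assumptions for illustration. It also cools down a whole provider rather than an individual key, to keep things short.

```python
import time
from typing import Callable, Dict, List, Tuple

MAX_ATTEMPTS = 20        # from the post: up to 20 attempts
COOLDOWN_SECONDS = 60.0  # assumption: the post does not give the cooldown length


class UpstreamError(Exception):
    """Hypothetical stand-in for a 429, timeout, or 5xx from a provider."""


def route_with_failover(
    chain: List[str],
    call: Callable[[str], str],
    cooldown_until: Dict[str, float],
    now: Callable[[], float] = time.monotonic,
) -> Tuple[str, str]:
    """Try providers in chain order; return (provider, reply) from the first success."""
    attempts = 0
    for provider in chain:
        if attempts >= MAX_ATTEMPTS:
            break
        if cooldown_until.get(provider, float("-inf")) > now():
            continue  # still cooling down after an earlier failure
        attempts += 1
        try:
            return provider, call(provider)
        except UpstreamError:
            # Cool the provider down and fall through to the next one.
            cooldown_until[provider] = now() + COOLDOWN_SECONDS
    raise RuntimeError(f"no provider succeeded after {attempts} attempt(s)")
```

The `cooldown_until` dict is shared across requests, so a provider that just returned a 429 is skipped by later calls until its cooldown expires.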

**Sticky sessions** — Multi-turn conversations stay on the same model for 30 minutes. This matters more than it sounds: switching models mid-conversation can cause subtle hallucination spikes.
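One way to picture sticky sessions is a small map with a time-to-live. The sketch below makes two assumptions. The caller supplies some session identifier (the post does not say how conversations are recognised), and each new turn refreshes the 30-minute window. `StickySessions` and `model_for` are hypothetical names.

```python
import time
from typing import Callable, Dict, Tuple

STICKY_TTL_SECONDS = 30 * 60  # from the post: 30 minutes


class StickySessions:
    """Pin each session to the model that served its previous turn."""

    def __init__(self, now: Callable[[], float] = time.monotonic) -> None:
        self._now = now
        self._pins: Dict[str, Tuple[str, float]] = {}  # session -> (model, expiry)

    def model_for(self, session_id: str, pick_model: Callable[[], str]) -> str:
        pinned = self._pins.get(session_id)
        if pinned is not None and pinned[1] > self._now():
            model = pinned[0]     # pin still valid: reuse the same model
        else:
            model = pick_model()  # no pin or expired: let the router choose
        # Assumption: every turn extends the pin by another 30 minutes.
        self._pins[session_id] = (model, self._now() + STICKY_TTL_SECONDS)
        return model
```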

**Per-key rate tracking** — RPM, RPD, TPM, and TPD counters per `(platform, model, key)`. The router always picks a key that's under its caps.
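Here is a simplified sketch of what per-key tracking might look like. It keeps a timestamped log per `(platform, model, key)` tuple and checks all four caps before choosing a key. It uses rolling 60-second and 24-hour windows, which is an assumption, since real providers may reset daily caps at a fixed time. `Caps` and `RateTracker` are hypothetical names, not the project's API.

```python
import time
from collections import defaultdict, deque
from dataclasses import dataclass
from typing import Callable, Deque, Dict, Optional, Tuple

Key = Tuple[str, str, str]  # (platform, model, key id)
MINUTE, DAY = 60.0, 86400.0


@dataclass(frozen=True)
class Caps:
    rpm: int  # requests per minute
    rpd: int  # requests per day
    tpm: int  # tokens per minute
    tpd: int  # tokens per day


class RateTracker:
    def __init__(self, now: Callable[[], float] = time.time) -> None:
        self._now = now
        self._events: Dict[Key, Deque[Tuple[float, int]]] = defaultdict(deque)

    def record(self, key: Key, tokens: int) -> None:
        """Log one completed request and the tokens it consumed."""
        self._events[key].append((self._now(), tokens))

    def _usage(self, key: Key, window: float) -> Tuple[int, int]:
        cutoff = self._now() - window
        recent = [tokens for ts, tokens in self._events[key] if ts > cutoff]
        return len(recent), sum(recent)

    def under_caps(self, key: Key, caps: Caps) -> bool:
        events = self._events[key]
        while events and events[0][0] <= self._now() - DAY:
            events.popleft()  # anything older than a day no longer counts
        rpm, tpm = self._usage(key, MINUTE)
        rpd, tpd = self._usage(key, DAY)
        return rpm < caps.rpm and rpd < caps.rpd and tpm < caps.tpm and tpd < caps.tpd

    def pick(self, candidates: Dict[Key, Caps]) -> Optional[Key]:
        """Return the first candidate still under all four caps, or None."""
        for key, caps in candidates.items():
            if self.under_caps(key, caps):
                return key
        return None
```

Because dicts preserve insertion order, the order of `candidates` doubles as the fallback priority in this sketch.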

**Encrypted key storage** — AES-256-GCM before hitting SQLite. Upstream provider keys never leave your machine.

**Admin dashboard** — React + Vite UI to manage keys, reorder the fallback chain, inspect analytics, and test prompts in a playground.

**Lightweight** — Runs on a Raspberry Pi 4 at ~40MB RAM idle.

## Setup in 3 Lines

```
git clone https://github.com/tashfeenahmed/freellmapi
cd freellmapi && npm install
cp .env.example .env && npm run dev
```

Open `localhost:5173`, add your provider API keys, grab your unified key → done.

## The Honest Part

A few things the README says clearly, and you should know upfront:

**Intelligence degrades throughout the day.** Gemini 2.5 Pro and GPT-4o (via GitHub Models) have the lowest daily caps. Once they're exhausted, the router falls back to smaller models. Expect effective quality to drop in the late hours — then reset at UTC midnight.
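If you want to schedule heavier jobs right after quotas refill, you can compute how long remains until the next UTC midnight. This is a small sketch that takes the UTC-midnight reset described above at face value. Individual providers may reset on a different schedule, and `seconds_until_utc_reset` is a hypothetical helper name.

```python
from datetime import datetime, timedelta, timezone


def seconds_until_utc_reset(now: datetime) -> float:
    """Seconds from `now` (timezone-aware) until the next UTC midnight."""
    now_utc = now.astimezone(timezone.utc)
    next_midnight = (now_utc + timedelta(days=1)).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    return (next_midnight - now_utc).total_seconds()
```

In practice you would call it as `seconds_until_utc_reset(datetime.now(timezone.utc))`.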

**Tool calling and vision are not yet supported.** Text-only for now. PRs are welcome.

**Latency is unpredictable.** Cerebras and Groq are extremely fast. Others are not. You get whichever one is available.

**Personal use only.** No multi-tenant auth. Don't expose this to the internet.

**Free tiers change without notice.** When a provider tightens limits, you'll see 429s until the catalog is updated.

## Who This Is For

✅ Building AI agents or coding assistants and want to prototype without spending money upfront

✅ Researchers and students who hit rate limits on one provider and want seamless fallback

✅ Anyone tired of maintaining multiple SDK integrations

❌ Production workloads — use a paid API with an SLA

## Quick ToS Note

The project includes a detailed review of each provider's terms. Most are fine for single-user personal use. Notable exceptions: **Cohere's trial ToS explicitly forbids personal/household use**, and **NVIDIA NIM's free tier is scoped to evaluation only**.

Read the full table in the README before adding keys.

FreeLLMAPI is MIT licensed and actively welcoming contributors — especially for adding embeddings, tool calling, and new providers.