[ARTICLE · art-4697] src=dev.to pub= topic=artificial-intelligence verified=true sentiment=↑ positive

Turn ~800M Free AI Tokens Into a Single OpenAI API with FreeLLMAPI

FreeLLMAPI is a self-hosted proxy that aggregates the free tiers of 14 AI providers (such as Gemini, Groq, and Mistral) behind a single OpenAI-compatible API endpoint, offering roughly 800 million free tokens per month combined. It features automatic failover, per-key rate tracking, and an admin dashboard, but it is intended for personal use only and has limitations, including no tool calling or vision support and unpredictable latency. The project is open source under the MIT license and aims to let developers and researchers prototype without upfront costs.

3 min read · 7 views · published May 21, 2026

The Problem Nobody Talks About

Every major AI provider now offers a free tier. Gemini, Groq, Mistral, Cerebras: they all give you a few million tokens a month and a few thousand requests a day.

On paper, that's generous. In practice, you end up juggling 14 different SDKs, 14 rate limits, and 14 places a request can silently fail.

FreeLLMAPI solves exactly that.

What It Does

It's a self-hosted proxy that aggregates the free tiers of 14 providers behind a single /v1/chat/completions endpoint, fully compatible with the OpenAI SDK.

Supported providers:

Provider      | Notable models / service
------------- | ------------------------
Google        | Gemini 2.5 Pro / Flash
Groq          | Llama 4, Qwen, Kimi
Cerebras      | Llama 3.3, Qwen
SambaNova     | Llama 3.3 70B
NVIDIA NIM    | Full catalog
Mistral       | La Plateforme
OpenRouter    | Free-tier models
GitHub Models | GPT-4o, Llama, Phi
Hugging Face  | Inference Providers
Cloudflare    | Workers AI
Zhipu         | GLM-4 series
Moonshot      | Kimi
MiniMax       | abab / hailuo

Combined: roughly 800M tokens/month across all providers.

Zero Code Changes

Point your existing OpenAI SDK at http://localhost:3001/v1:

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:3001/v1",
    api_key="freellmapi-your-unified-key",
)

# with_raw_response returns the HTTP response, so the headers are readable;
# .parse() turns it into the usual ChatCompletion object.
raw = client.chat.completions.with_raw_response.create(
    model="auto",  # router picks the best available
    messages=[{"role": "user", "content": "Summarise the fall of Rome in one sentence."}],
)
resp = raw.parse()

print(resp.choices[0].message.content)
print("Routed via:", raw.headers.get("x-routed-via"))

That's it. Every response includes an X-Routed-Via header, so you know which provider actually served the request.
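Because the endpoint speaks the OpenAI wire format, any HTTP client should work, not only the SDK. The sketch below builds the same request with Python's standard-library urllib. It assumes the proxy listens on localhost:3001 as in the example above and that the unified key is accepted as a standard Authorization: Bearer header, which is how the OpenAI SDK sends api_key; that header detail is an assumption, not something the article states. build_chat_request and send are hypothetical helper names.

```python
import json
import urllib.request
from typing import Optional, Tuple

BASE_URL = "http://localhost:3001/v1"     # proxy address from the example above
API_KEY = "freellmapi-your-unified-key"   # placeholder unified key


def build_chat_request(prompt: str, model: str = "auto") -> urllib.request.Request:
    """Build an OpenAI-style chat completion request without sending it."""
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            # Assumption: the unified key is accepted as a Bearer token,
            # which is how the OpenAI SDK transmits api_key.
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(req: urllib.request.Request, timeout: float = 60.0) -> Tuple[str, Optional[str]]:
    """POST the request; return the reply text and the X-Routed-Via header."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        payload = json.load(resp)
        routed_via = resp.headers.get("X-Routed-Via")  # case-insensitive lookup
    return payload["choices"][0]["message"]["content"], routed_via
```

With the proxy running, send(build_chat_request("Summarise the fall of Rome in one sentence.")) returns the reply text together with the provider name from X-Routed-Via. The generous default timeout is a hypothetical choice, since latency varies a lot between providers.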

Technical Highlights

Automatic failover: on a 429, a timeout, or a 5xx, the router puts the key on cooldown and retries with the next provider in your chain, up to 20 attempts.
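The failover rule can be sketched in a few lines of Python. This is a minimal sketch, not FreeLLMAPI's code: the 60-second cooldown, the UpstreamError type (with status None standing in for a timeout), the Target and FailoverRouter names, and the choice to re-raise errors that don't trigger failover are all hypothetical. Only the 429/timeout/5xx trigger and the 20-attempt cap come from the README.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

MAX_ATTEMPTS = 20         # attempt cap stated in the README
COOLDOWN_SECONDS = 60.0   # hypothetical; the real cooldown length is not stated


class UpstreamError(Exception):
    """Hypothetical upstream failure; status=None stands for a timeout."""

    def __init__(self, status: Optional[int]):
        super().__init__(f"upstream error: {status}")
        self.status = status


def triggers_failover(err: UpstreamError) -> bool:
    # 429 (rate limited), a timeout, or any 5xx moves on to the next provider.
    return err.status is None or err.status == 429 or 500 <= err.status < 600


@dataclass
class Target:
    provider: str
    key: str
    call: Callable[[str], str]  # hypothetical: sends the prompt, returns the reply


class FailoverRouter:
    def __init__(self, chain: List[Target], clock: Callable[[], float] = time.monotonic):
        self.chain = chain                           # fallback order, first = preferred
        self.clock = clock
        self.cooldown_until: Dict[str, float] = {}   # key -> time it is usable again

    def complete(self, prompt: str) -> Tuple[str, str]:
        attempts = 0
        last_err: Optional[UpstreamError] = None
        for target in self.chain:
            if attempts >= MAX_ATTEMPTS:
                break
            if self.cooldown_until.get(target.key, 0.0) > self.clock():
                continue  # key is cooling down: skip it without using an attempt
            attempts += 1
            try:
                return target.provider, target.call(prompt)
            except UpstreamError as err:
                if not triggers_failover(err):
                    raise  # assumption: other errors (e.g. 400) are passed through
                self.cooldown_until[target.key] = self.clock() + COOLDOWN_SECONDS
                last_err = err
        raise RuntimeError(f"no provider succeeded after {attempts} attempts") from last_err
```

Injecting the clock keeps the cooldown testable: after one request that hits a 429 on the first key and a 503 on the second, both keys are skipped until their cooldown expires, so the next request goes straight to the third provider.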

Sticky sessions: multi-turn conversations stay on the same model for 30 minutes. This matters more than it sounds, because switching models mid-conversation can introduce subtle spikes in hallucination.
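One way to implement sticky routing is a small table that pins a conversation to the model chosen for its first turn. The README doesn't say how FreeLLMAPI recognises a conversation, so the fingerprint here (a hash of the messages up to and including the first user turn) is an assumption, as is restarting the 30-minute window on every turn. StickySessions and choose_model are hypothetical names.

```python
import hashlib
import json
import time
from typing import Callable, Dict, List, Tuple

STICKY_TTL_SECONDS = 30 * 60  # "same model for 30 minutes"


def session_fingerprint(messages: List[dict]) -> str:
    """Hypothetical session key: hash of the messages up to the first user turn.

    Later turns of the same chat resend that prefix, so the hash stays stable.
    """
    prefix = []
    for msg in messages:
        prefix.append(msg)
        if msg.get("role") == "user":
            break
    blob = json.dumps(prefix, sort_keys=True).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()


class StickySessions:
    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self.clock = clock
        self.pins: Dict[str, Tuple[str, float]] = {}  # fingerprint -> (model, expiry)

    def pick(self, messages: List[dict], choose_model: Callable[[], str]) -> str:
        key = session_fingerprint(messages)
        now = self.clock()
        pinned = self.pins.get(key)
        if pinned is not None and pinned[1] > now:
            model = pinned[0]         # still inside the window: keep the same model
        else:
            model = choose_model()    # new or expired conversation: route normally
        # Assumption: every turn restarts the 30-minute window (sliding expiry).
        self.pins[key] = (model, now + STICKY_TTL_SECONDS)
        return model
```

A side effect of this particular fingerprint is that two chats opening with an identical first message would share a pin. The sketch also ignores what should happen when the pinned model runs into its cap.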

Per-key rate tracking: RPM, RPD, TPM, and TPD counters (requests and tokens, per minute and per day) are kept for each (platform, model, key). The router always picks a key that is still under its caps.
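A per-key tracker needs little more than a timestamped log of requests and their token counts. The sketch below assumes a rolling 60-second window for the per-minute caps and a UTC calendar day for the per-day caps, matching the UTC-midnight reset the README mentions. The Caps values, the est_tokens estimate of a request's size, and the RateTracker name are hypothetical.

```python
import time
from collections import defaultdict, deque
from dataclasses import dataclass
from typing import Callable, Deque, Dict, List, Optional, Tuple

KeyId = Tuple[str, str, str]  # (platform, model, key), as in the README


@dataclass
class Caps:
    rpm: int  # requests per minute
    rpd: int  # requests per day
    tpm: int  # tokens per minute
    tpd: int  # tokens per day


class RateTracker:
    """Sketch: rolling one-minute window plus a UTC calendar-day window."""

    def __init__(self, clock: Callable[[], float] = time.time):
        self.clock = clock  # Unix seconds, so day boundaries fall on UTC midnight
        self.events: Dict[KeyId, Deque[Tuple[float, int]]] = defaultdict(deque)

    def record(self, key: KeyId, tokens: int) -> None:
        self.events[key].append((self.clock(), tokens))

    def usage(self, key: KeyId) -> Tuple[int, int, int, int]:
        """Return (requests/min, requests/day, tokens/min, tokens/day)."""
        now = self.clock()
        day_start = now - (now % 86400)   # most recent UTC midnight
        log = self.events[key]
        while log and log[0][0] < day_start:
            log.popleft()                 # drop events from earlier UTC days
        minute = [tok for ts, tok in log if ts > now - 60]
        return len(minute), len(log), sum(minute), sum(tok for _, tok in log)

    def under_caps(self, key: KeyId, caps: Caps, est_tokens: int) -> bool:
        rpm, rpd, tpm, tpd = self.usage(key)
        return (rpm < caps.rpm and rpd < caps.rpd
                and tpm + est_tokens <= caps.tpm and tpd + est_tokens <= caps.tpd)

    def pick(self, candidates: List[Tuple[KeyId, Caps]], est_tokens: int) -> Optional[KeyId]:
        """Return the first candidate still under all four caps, else None."""
        for key, caps in candidates:
            if self.under_caps(key, caps, est_tokens):
                return key
        return None
```

pick walks the candidates in fallback order and returns the first (platform, model, key) that can absorb the request under all four caps, or None once every key is exhausted.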

Encrypted key storage: keys are encrypted with AES-256-GCM before they are written to SQLite. Upstream provider keys never leave your machine.

Admin dashboard: a React + Vite UI to manage keys, reorder the fallback chain, inspect analytics, and test prompts in a playground.

Lightweight: runs on a Raspberry Pi 4 at about 40 MB of RAM when idle.

Setup in 3 Lines

git clone https://github.com/tashfeenahmed/freellmapi
cd freellmapi && npm install
cp .env.example .env && npm run dev

Open http://localhost:5173, add your provider API keys, and grab your unified key → done.

The Honest Part

A few things the README states clearly that you should know upfront:

Intelligence degrades throughout the day. Gemini 2.5 Pro and GPT-4o (via GitHub Models) have the lowest daily caps. Once they're exhausted, the router falls back to smaller models. Expect effective quality to drop late in the UTC day, then recover when the caps reset at UTC midnight.
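Because the daily caps reset at UTC midnight, it helps to know how far away that is in your local time. A minimal sketch using only the standard library; it assumes the README's UTC-midnight reset applies to the quotas you care about, and next_daily_reset and hours_until_reset are hypothetical helper names. Pass a timezone-aware datetime.

```python
from datetime import datetime, timedelta, timezone


def next_daily_reset(now: datetime) -> datetime:
    """Next UTC midnight after `now`, when per-day caps roll over."""
    now_utc = now.astimezone(timezone.utc)
    midnight_today = now_utc.replace(hour=0, minute=0, second=0, microsecond=0)
    return midnight_today + timedelta(days=1)


def hours_until_reset(now: datetime) -> float:
    """Hours remaining until the next daily reset; expects an aware datetime."""
    return (next_daily_reset(now) - now).total_seconds() / 3600
```

For example, at 23:00 in UTC-5 (04:00 UTC the next day), hours_until_reset returns 20.0, so a late-evening session in the Americas starts a fresh UTC day rather than running on nearly exhausted caps.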

Tool calling and vision are not yet supported. Text-only for now. PRs are welcome.

Latency is unpredictable. Cerebras and Groq are extremely fast. Others are not. You get whichever one is available.

Personal use only. No multi-tenant auth. Don't expose this to the internet.

Free tiers change without notice. When a provider tightens limits, you'll see 429s until the catalog is updated.

Who This Is For

✅ Developers building AI agents or coding assistants who want to prototype without spending money upfront

✅ Researchers and students who hit rate limits on one provider and want seamless fallback

✅ Anyone tired of maintaining multiple SDK integrations

❌ Production workloads: use a paid API with an SLA

Quick ToS Note

The project includes a detailed review of each provider's terms. Most are fine for single-user personal use. Notable exceptions: Cohere's trial ToS explicitly forbids personal/household use, and NVIDIA NIM's free tier is scoped to evaluation only.

Read the full table in the README before adding keys.

FreeLLMAPI is MIT-licensed and actively welcomes contributors, especially for adding embeddings, tool calling, and new providers.
