Run your own local LLM with rate limits via API-keys

Source: github.com. Published: 2026-05-27T18:39Z. Topic: large-language-models.

A developer released a small Ruby prototype for an OpenAI-compatible LLM proxy that enforces per-user rate limits using a refillable token bucket system. The proxy, built entirely with Ruby standard libraries and no external dependencies, assigns each bearer token its own bucket and returns an OpenAI-style limit message when tokens are exhausted. The tool allows users to control token refill rates and costs, making it suitable for managing access to local LLM instances.

Small Ruby prototype for an OpenAI-compatible LLM proxy with a refillable token bucket.

It uses only Ruby standard libraries: no gems, no Rack, no WEBrick.

BASE_API_URL=http://192.168.0.124:8888/v1 \
BASE_API_KEY=1mmer \
BASE_MODEL=gemma4 \
ruby llm_proxy.rb

The proxy listens on 0.0.0.0:8899 by default.

For your local LLM at 192.168.0.124:8888, run the saved local setup:

./run_local_proxy.sh

That starts the Ruby proxy at http://127.0.0.1:8899/v1 and forwards to http://192.168.0.124:8888/v1.

The saved local curl check is:

./curl_local_proxy.sh

Manual equivalent:

curl -sS -i -m 60 http://127.0.0.1:8899/v1/chat/completions \
  -H 'Authorization: Bearer user-a' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "gemma4",
    "messages": [{"role": "user", "content": "Reply with exactly: proxy ok"}],
    "max_tokens": 16
  }'

Verified result through the proxy: the upstream replied with proxy ok, and the proxy returned X-RateLimit-Remaining: 0 with the local test bucket.

Run the smoke test:

ruby test_llm_proxy.rb

The bucket is configured through environment variables:

MAX_TOKENS=10                 # max saved tokens per user
REFILL_TOKENS=2               # tokens added each refill
REFILL_INTERVAL_SECONDS=300   # 5 minutes
REQUEST_TOKEN_COST=1          # cost per accepted completion request
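
To make the four settings concrete, here is a minimal sketch of a refillable token bucket in plain Ruby. It is not taken from llm_proxy.rb: the class name TokenBucket, its method names, and the choices to start a new bucket full and to refill only in whole intervals are assumptions made for illustration. Locking for concurrent requests is left out.

```ruby
# Hypothetical sketch of a refillable token bucket; not the code from llm_proxy.rb.
class TokenBucket
  attr_reader :tokens

  def initialize(max_tokens:, refill_tokens:, refill_interval:, now: Time.now.to_f)
    @max_tokens = max_tokens            # MAX_TOKENS
    @refill_tokens = refill_tokens      # REFILL_TOKENS
    @refill_interval = refill_interval  # REFILL_INTERVAL_SECONDS
    @tokens = max_tokens                # assumption: a new bucket starts full
    @last_refill = now
  end

  # Add refill_tokens for every whole interval that has elapsed, capped at max_tokens.
  def refill(now)
    intervals = ((now - @last_refill) / @refill_interval).floor
    return if intervals <= 0

    @tokens = [@tokens + intervals * @refill_tokens, @max_tokens].min
    @last_refill += intervals * @refill_interval
  end

  # Try to spend `cost` tokens (REQUEST_TOKEN_COST); true means the request is accepted.
  def take(cost, now: Time.now.to_f)
    refill(now)
    return false if @tokens < cost

    @tokens -= cost
    true
  end
end

bucket = TokenBucket.new(max_tokens: 10, refill_tokens: 2, refill_interval: 300)
puts bucket.take(1) ? "accepted" : "limit reached"
```

Under these assumptions and the values listed above, an empty bucket regains 2 tokens every 5 minutes, so a user who spends all 10 tokens needs 25 minutes to get back to a full bucket.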

Each bearer token gets its own bucket. Requests without a bearer token are bucketed by remote IP. Set PROXY_API_KEYS=key1,key2 if the proxy should reject unknown client keys.
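
The bucketing rule above can be sketched as a small key-selection function. The helper names (parse_allowed_keys, bearer_token, bucket_key_for) and the "key:" / "ip:" prefixes are hypothetical, not names from the repository. One further assumption is labeled in the code: when an allowlist is set, a request with no bearer token is rejected as well.

```ruby
# Hypothetical helpers for choosing a bucket; names are illustrative only.

# Parse a comma-separated allowlist such as PROXY_API_KEYS=key1,key2.
def parse_allowed_keys(raw)
  raw.to_s.split(",").map(&:strip).reject(&:empty?)
end

# Extract the token from an "Authorization: Bearer <token>" header value.
def bearer_token(authorization_header)
  match = authorization_header.to_s.strip.match(/\ABearer\s+(\S+)\z/i)
  match && match[1]
end

# Returns [:ok, bucket_key] or [:rejected, nil].
def bucket_key_for(authorization_header, remote_ip, allowed_keys)
  token = bearer_token(authorization_header)
  if allowed_keys.any?
    # Assumption: with an allowlist configured, a missing token is rejected too.
    return [:rejected, nil] unless token && allowed_keys.include?(token)
  end
  [:ok, token ? "key:#{token}" : "ip:#{remote_ip}"]
end

allowed = parse_allowed_keys(ENV["PROXY_API_KEYS"])
p bucket_key_for("Bearer user-a", "192.168.0.50", allowed)
```

Prefixing the key keeps a bearer token that happens to look like an IP address from sharing a bucket with anonymous traffic from that address.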

When the bucket is empty, /v1/chat/completions and /v1/completions return a normal OpenAI-style assistant response:

limit reached, wait 5 min

Example request:

curl http://localhost:8899/v1/chat/completions \
  -H 'Authorization: Bearer user-a' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "anything",
    "messages": [{"role": "user", "content": "hello"}]
  }'
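
For readers wondering what an OpenAI-style limit reply might look like on the wire, the sketch below builds one for the chat endpoint with Ruby's bundled json library. The field layout follows the general shape of an OpenAI chat completion object; the exact fields the prototype emits are not shown in the source, so treat every field here as an assumption. The helper names and the idea of deriving the wait time from REFILL_INTERVAL_SECONDS are also hypothetical.

```ruby
require "json"
require "securerandom"

# Hypothetical: derive the wait text from the refill interval (300 s gives 5 min).
def limit_message(refill_interval_seconds)
  minutes = (refill_interval_seconds / 60.0).ceil
  "limit reached, wait #{minutes} min"
end

# Hypothetical: a chat-completion-shaped body whose assistant message is the limit text.
def limit_chat_response(model, refill_interval_seconds)
  {
    "id" => "chatcmpl-#{SecureRandom.hex(8)}",
    "object" => "chat.completion",
    "created" => Time.now.to_i,
    "model" => model,
    "choices" => [
      {
        "index" => 0,
        "message" => {
          "role" => "assistant",
          "content" => limit_message(refill_interval_seconds)
        },
        "finish_reason" => "stop"
      }
    ]
  }
end

puts JSON.pretty_generate(limit_chat_response("gemma4", 300))
```

Returning the notice as an ordinary assistant message means existing OpenAI clients display it like any other reply instead of raising an error.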

By default, one completion request costs REQUEST_TOKEN_COST bucket tokens. To charge roughly by prompt size plus expected output:

TOKEN_COST_MODE=estimate RESPONSE_TOKEN_RESERVE=256 ruby llm_proxy.rb

This is only an approximation for the prototype.
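
One way such an estimate could work is sketched below. The source does not say how the prototype counts prompt tokens, so the 4-characters-per-token heuristic, the constant CHARS_PER_TOKEN, and both function names are assumptions for illustration only.

```ruby
# Assumption: roughly 4 characters per token; real tokenizers vary by model.
CHARS_PER_TOKEN = 4

# Hypothetical: estimate prompt tokens from chat messages, or from a plain prompt string.
def estimate_prompt_tokens(request_body)
  messages = request_body.fetch("messages", [])
  text = messages.map { |m| m["content"].to_s }.join("\n")
  text = request_body["prompt"].to_s if messages.empty?
  (text.length / CHARS_PER_TOKEN.to_f).ceil
end

# Hypothetical: prompt estimate plus a fixed reserve for the expected output
# (the role RESPONSE_TOKEN_RESERVE appears to play).
def estimated_cost(request_body, response_reserve: 256)
  estimate_prompt_tokens(request_body) + response_reserve
end

request = { "messages" => [{ "role" => "user", "content" => "hello" }] }
puts estimated_cost(request)
```

With a reserve of 256, even a short prompt costs far more than one bucket token, so MAX_TOKENS and REFILL_TOKENS would presumably need much larger values in this mode.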

Read original on github.com → github.com/skorotkiewicz/llm-rt
