[ARTICLE · art-38828] src=github.com ↗ pub= topic=ai-tools verified=true sentiment=↑ positive

Show HN: Built AI-Gateway reverse proxy to reduce LLM API costs and token burn

A developer released AI-Gateway, an open-source reverse proxy that uses semantic caching to reduce LLM API costs by 40-70% with no code changes. The tool sits between apps and providers like OpenAI and Groq, caching responses to similar questions so repeated queries don't incur API calls. It supports multi-tenant isolation, Redis caching, and features like rate limiting and circuit breakers.

5 min read · Published Jun 25, 2026

Cut your LLM API costs by 40-70% with zero code changes.

A semantic caching layer that sits between your app and AI providers (OpenAI, Groq, etc.). When you ask a similar question twice, it returns the cached answer instantly instead of calling the API again.

You're building an AI app and your API bill is $500/month. 40-70% of that is for repeat questions:

  • "What is RAG?" asked 100 times = 100 API calls
  • "How do I reset my password?" asked 50 times = 50 API calls

With AI Gateway: Those 150 calls become 2 calls (one for each unique question). You save $200-350/month.

How was your deployment experience?

Takes 30 seconds. Helps us improve AI Gateway for everyone.

What we want to know:

  • ⭐ How did deployment go? (Excellent / Average / Bad)
  • 🐛 Any problems you faced?
  • 💡 What features would you like to see?
  • 📊 How much are you saving on API costs?

Your feedback directly shapes the roadmap.

Steps:

  • Click the button above
  • Sign in with GitHub
  • Enter your API key (Groq or OpenAI)
  • Click "Deploy"
  • Done! Your gateway is live at https://your-app.up.railway.app

What you get:

  • ✅ Hosted gateway (no server management)
  • ✅ Redis included (persistent cache)
  • ✅ Auto-scaling
  • ✅ HTTPS enabled
  • ✅ $5/month free credit

Steps:

  • Click the button
  • Sign in with GitHub
  • Add environment variable: UPSTREAM_API_KEY=your_key
  • Click "Create Web Service"
  • Done!

Note: You'll need to add a Redis add-on separately in the Render dashboard.

Prerequisites:

  • Docker installed
  • Docker Compose installed
  • A Groq or OpenAI API key

Steps:

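# 1. Clone the repository
git clone https://github.com/Arnab758/ai-gateway.git
cd ai-gateway

# 2. Set your upstream API key (Groq or OpenAI)
export UPSTREAM_API_KEY=gsk_your_groq_key_here

# 3. Start the gateway and Redis in the background
docker compose up -d

# 4. Check that the gateway is healthy
curl http://localhost:8080/health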

That's it! Your gateway is now running at http://localhost:8080

curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "X-Gateway-Token: my-app" \
  -H "Authorization: Bearer sk-your-openai-or-groq-key" \
  -d '{
    "model": "gpt-4",
    "messages": [{"role": "user", "content": "What is RAG?"}]
  }'

Python:

import requests

GATEWAY_URL = "https://your-app.up.railway.app"
API_KEY = "sk-your-key"

response = requests.post(
    f"{GATEWAY_URL}/v1/chat/completions",
    headers={
        "Content-Type": "application/json",
        "X-Gateway-Token": "my-app",
        "Authorization": f"Bearer {API_KEY}"
    },
    json={
        "model": "gpt-4",
        "messages": [{"role": "user", "content": "What is RAG?"}]
    }
)

print(response.json())

JavaScript:

const response = await fetch('https://your-app.up.railway.app/v1/chat/completions', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'X-Gateway-Token': 'my-app',
    'Authorization': 'Bearer sk-your-key'
  },
  body: JSON.stringify({
    model: 'gpt-4',
    messages: [{ role: 'user', content: 'What is RAG?' }]
  })
});

const data = await response.json();
console.log(data);

  • Semantic Caching: matches similar questions, not just exact duplicates ("What is RAG?" = "Explain RAG" = "RAG definition")
  • Multi-Tenant: each customer gets their own isolated cache
  • 4-Tier Matching:
      • Exact match (100% identical)
      • Template match ("weather in London" = "weather in Paris")
      • Semantic match (similar meaning)
      • Word overlap (partial matches)
  • Redis + In-Memory Fallback: works with or without Redis
  • Request Deduplication: 100 concurrent identical requests = 1 API call
  • Rate Limiting: prevents abuse per tenant
  • Circuit Breaker: automatically stops calling the provider if it is down
  • Cost Tracking: see how much you saved
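
To make the four matching tiers concrete, here is a minimal sketch of how a lookup could fall through them in order. It is not the gateway's actual implementation: the `SLOTS` vocabulary, the function names, and the bag-of-words cosine (a toy stand-in for real embedding vectors) are all assumptions made for illustration. Only the default thresholds (0.85 and 0.75) come from the configuration shown in this README.

```python
import math
import re
from collections import Counter

# Hypothetical slot vocabulary for the template tier; a real gateway would
# need a richer way to detect the variable parts of a prompt.
SLOTS = frozenset({"london", "paris"})


def tokens(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def template_key(words):
    """Replace slot words with a placeholder so templated prompts collide."""
    return tuple("<slot>" if w in SLOTS else w for w in words)


def cosine(a, b):
    """Cosine similarity of bag-of-words counts (toy stand-in for embeddings)."""
    va, vb = Counter(a), Counter(b)
    dot = sum(va[w] * vb[w] for w in va)
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0


def jaccard(a, b):
    """Word-overlap ratio: shared words divided by all distinct words."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb) if (sa or sb) else 0.0


def match_tier(query, cached, vector_threshold=0.85, jaccard_threshold=0.75):
    """Return the first tier at which `query` matches `cached`, or None."""
    if query == cached:
        return "exact"
    q, c = tokens(query), tokens(cached)
    if template_key(q) == template_key(c):
        return "template"
    if cosine(q, c) >= vector_threshold:
        return "semantic"
    if jaccard(q, c) >= jaccard_threshold:
        return "overlap"
    return None


print(match_tier("weather in London", "weather in Paris"))
print(match_tier("How do I reset my password", "How do I reset my password please"))
```

With word-count vectors the semantic tier catches almost everything the overlap tier would, so the fourth tier rarely fires in this sketch. Real embeddings behave differently, which is presumably why the gateway keeps both.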

Scenario: Customer support chatbot with 10,000 users

Without AI Gateway:

  • 10,000 users ask 100 common questions each
  • 1,000,000 API calls/month
  • Cost: $500/month (at $0.0005/call)

With AI Gateway:

  • First user's 100 questions: 100 API calls (cache misses)
  • Remaining 9,999 users asking the same questions: 0 API calls (cache hits)
  • Total: 100 API calls/month
  • Cost: $0.05/month

Savings: $499.95/month (99.99%)

Even with 30% unique questions:

  • 300,000 API calls
  • Cost: $150/month

Savings: $350/month (70%)
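
The arithmetic behind both scenarios can be checked with a few lines of Python. This is a sketch under the scenario's own assumptions: a flat $0.0005 per call and every repeated question served from cache. It ignores cache expiry (the default TTL is 24 hours), which would add some misses in practice. The function name `monthly_savings` is made up for this example.

```python
def monthly_savings(total_calls, calls_after_caching, cost_per_call):
    """Return (cost_with_gateway, savings, savings_percent) for one month."""
    baseline = total_calls * cost_per_call
    cost = calls_after_caching * cost_per_call
    savings = baseline - cost
    return round(cost, 2), round(savings, 2), round(100 * savings / baseline, 2)


# Best case from the scenario: only the first 100 questions reach the provider.
print(monthly_savings(1_000_000, 100, 0.0005))

# 30% unique questions: 300,000 calls still reach the provider.
print(monthly_savings(1_000_000, 300_000, 0.0005))
```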

Edit gateway.yaml to customize:

cache:
  redis_url: "redis://localhost:6379"  # Or your Redis URL
  vector:
    enabled: true
    similarity_threshold: 0.85  # 85% similar = cache hit
  ttl_hours: 24  # Cache entries expire after 24 hours

rate_limiter:
  enabled: true
  max_requests: 60  # Per minute per tenant
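
For intuition about what `max_requests: 60` per minute per tenant means, here is a minimal fixed-window limiter. The README does not say which algorithm the gateway uses (fixed window, sliding window, or token bucket), so the class below is an assumption for illustration, not the project's code. The `clock` parameter is a hypothetical hook that makes the limiter easy to test.

```python
import time
from collections import defaultdict


class FixedWindowLimiter:
    """Allow at most `max_requests` per tenant in each fixed time window."""

    def __init__(self, max_requests=60, window_seconds=60, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock
        # tenant -> [index of the current window, requests seen in it]
        self._state = defaultdict(lambda: [0, 0])

    def allow(self, tenant):
        window = int(self.clock() // self.window_seconds)
        state = self._state[tenant]
        if state[0] != window:
            # A new window has started: reset this tenant's counter.
            state[0], state[1] = window, 0
        if state[1] >= self.max_requests:
            return False
        state[1] += 1
        return True


limiter = FixedWindowLimiter(max_requests=60, window_seconds=60)
print(limiter.allow("my-app"))
```

Each tenant has its own counter, so one noisy tenant exhausting its quota does not affect the others.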

Endpoint              Method  Description
/v1/chat/completions  POST    Main proxy endpoint with caching
/health               GET     Health check
/stats                GET     Cache statistics
/metrics              GET     Prometheus metrics

curl http://localhost:8080/stats

Response:

{
  "uptime": 1234567890,
  "cache": {
    "local_index_entries": 150,
    "vector_dimensions": 128,
    "vector_threshold": 0.85,
    "jaccard_threshold": 0.75,
    "template_enabled": true,
    "dedup_enabled": true,
    "ttl_hours": 24
  }
}

Every response includes cache information:

X-Gateway-Cache: HIT          # or MISS
X-Gateway-Similarity: 0.95    # 95% similar (if HIT)
X-Gateway-Time-Saved: 1234ms  # Time saved (if HIT)
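
A client can read these headers to track its own hit rate. The helper below is a small sketch. The header names and value formats come from the example above, while the function name and the returned dictionary shape are assumptions. It takes a plain dict. With `requests`, `response.headers` can be passed directly.

```python
def parse_cache_headers(headers):
    """Extract cache status, similarity, and time saved from response headers."""
    hit = headers.get("X-Gateway-Cache", "MISS").upper() == "HIT"
    similarity = headers.get("X-Gateway-Similarity")
    saved = headers.get("X-Gateway-Time-Saved")
    if saved is not None and saved.endswith("ms"):
        saved = saved[:-2]  # drop the "ms" unit suffix
    return {
        "hit": hit,
        "similarity": float(similarity) if hit and similarity else None,
        "time_saved_ms": int(saved) if hit and saved else None,
    }


print(parse_cache_headers({
    "X-Gateway-Cache": "HIT",
    "X-Gateway-Similarity": "0.95",
    "X-Gateway-Time-Saved": "1234ms",
}))
```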

Solution: Redis is optional! The gateway will fall back to in-memory cache automatically. For production, add Redis:

  • Railway: Add Redis from the "New" button
  • Render: Add Redis from "New" → "Database" → "Redis"
  • Docker: Already included in docker-compose.yml

Cause: You're hitting rate limits on free tier (Groq/OpenAI)

Solutions:

  • Wait 1-2 minutes and try again
  • Upgrade to paid tier ($0.002/request vs free limits)
  • Add your own API key with higher limits

Cause: Too many requests from one tenant

Solution: Increase rate limits in gateway.yaml:

rate_limiter:
  max_requests: 120  # Increase from 60
  window_minutes: 1

Cause: Prompts are too different

Solution: Lower the similarity threshold in gateway.yaml:

cache:
  vector:
    similarity_threshold: 0.75  # Lower from 0.85
  jaccard:
    threshold: 0.65  # Lower from 0.75
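
To see what lowering the word-overlap threshold changes, the sketch below computes a Jaccard score for two prompts that differ by one word. The tokenization here is an assumption and the gateway may split words differently, so treat the exact score as illustrative.

```python
import re


def jaccard(a, b):
    """Shared words divided by all distinct words across both prompts."""
    wa = set(re.findall(r"[a-z0-9]+", a.lower()))
    wb = set(re.findall(r"[a-z0-9]+", b.lower()))
    union = wa | wb
    return len(wa & wb) / len(union) if union else 0.0


score = jaccard("How do I reset my password?", "How do I change my password?")
print(round(score, 3))          # 5 shared words out of 7 distinct words
print(score >= 0.75)            # miss at the default threshold
print(score >= 0.65)            # hit at the lowered threshold
```

A lower threshold raises the hit rate but also raises the chance of returning an answer to a slightly different question, so it is worth checking the X-Gateway-Similarity header on real traffic after changing it.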

Your App → AI Gateway → [Cache Check] → Redis
                ↓
            [Cache HIT] → Return cached response (instant, $0)
                ↓
            [Cache MISS] → Call LLM Provider → Cache response → Return
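
The flow above is a cache-aside pattern. The sketch below shows the same hit/miss path with an in-memory store and a TTL. It matches on the exact prompt only and uses a plain dict instead of Redis, so it illustrates the control flow rather than the gateway's real matching. `TTLCache`, `handle_request`, and `call_provider` are hypothetical names.

```python
import time


class TTLCache:
    """In-memory cache whose entries expire after `ttl_seconds`."""

    def __init__(self, ttl_seconds=24 * 3600, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        self._store = {}  # key -> (value, expiry time)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self.clock() >= expires_at:
            del self._store[key]  # expired: treat as a miss
            return None
        return value

    def set(self, key, value):
        self._store[key] = (value, self.clock() + self.ttl_seconds)


def handle_request(prompt, cache, call_provider):
    """Return (response, "HIT" or "MISS"), calling the provider only on a miss."""
    cached = cache.get(prompt)
    if cached is not None:
        return cached, "HIT"
    response = call_provider(prompt)
    cache.set(prompt, response)
    return response, "MISS"


cache = TTLCache()
print(handle_request("What is RAG?", cache, lambda p: "stub answer"))
print(handle_request("What is RAG?", cache, lambda p: "stub answer"))
```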

Contributions are welcome! Please:

  • Fork the repo
  • Create a feature branch
  • Make your changes
  • Submit a pull request

MIT License - feel free to use this commercially!

  • Issues: GitHub Issues
  • Discussions: GitHub Discussions
  • Demo: Live Demo

If this project helps you, please give it a star! It helps others find it.

Built with ❤️ for the AI community

Questions? Open an issue and I'll respond within 24 hours.
