Show HN: Wyolet Relay – high throughput, open source LLM router

wpnews.pro


[ARTICLE · art-34005] src=github.com ↗ pub=2026-06-19T13:36Z topic=large-language-models verified=true sentiment=↑ positive


Wyolet released Wyolet Relay, an open-source, self-hosted LLM router that provides a single OpenAI- and Anthropic-compatible endpoint for multiple providers, enabling automatic failover, rate-limit pooling, and cost tracking. The tool, available under Apache-2.0, supports 400+ models and can be deployed via Docker with sub-2ms added latency.


One endpoint in front of every LLM provider.

Self-hosted, bring-your-own-keys, built for scale.


Wyolet Relay puts a single OpenAI- and Anthropic-compatible endpoint in front of every provider you use. Pool your own API keys for automatic failover and higher effective rate limits, see exactly what every request costs, and run the whole thing on your own infrastructure — a drop-in for the SDK code you already have.

Start a full relay — API, admin UI, database, and a pre-seeded model catalog — in one command:

docker run -p 8080:8080 -p 8081:8081 wyolet/relay:standalone

Open the admin UI at http://localhost:8081, then let the setup wizard walk you through adding a provider key and minting a relay key. Now call it like the OpenAI API:

curl http://localhost:8080/openai/v1/chat/completions \
  -H "Authorization: Bearer <your-relay-key>" \
  -H "Content-Type: application/json" \
  -d '{"model":"gpt-4o","messages":[{"role":"user","content":"hello"}]}'

That's it. The full walkthrough, configuration, and production deployment guides live at docs.wyolet.com.
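The curl call above can also be issued from Python. This is a minimal sketch using only the standard library; the URL, header names, and JSON body are taken from the curl example, while the function names `build_chat_request` and `send` are illustrative and not part of Relay.

```python
import json
import urllib.request

# Data-plane address from the quickstart; adjust if you map a different port.
RELAY_URL = "http://localhost:8080/openai/v1/chat/completions"


def build_chat_request(relay_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) the same POST request the curl command issues."""
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        RELAY_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {relay_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON reply (requires a running relay)."""
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

With a relay running and a key minted in the admin UI, `send(build_chat_request("<your-relay-key>", "gpt-4o", "hello"))` should return the decoded response body.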

One API, every provider. OpenAI- and Anthropic-shape endpoints in front of OpenAI, Anthropic, Bedrock, Vertex, Azure, Ollama, Groq: anything speaking either wire format. No code changes to switch upstreams.

Disposable, rate-limited keys. Mint relay keys scoped to whatever limits you set. Hand them out freely; even if one leaks, the damage is capped at those limits and your real provider keys are never exposed.

Pool accounts and providers. Combine many keys, accounts, or providers into one pool behind a single endpoint. Relay load-balances and fails over across them, so per-account rate limits stop being your ceiling.

Per-key access control. Decide exactly which models and providers each relay key may reach: allow or deny at the key level via policies.

400+ models, open catalog. Ships knowing 400+ models out of the box, and the catalog is open and extensible; we add hosts and models on demand.

Batch processing (in progress). Batch requests against any provider. Relay simulates batching where there's no native API, and routes through the native one (OpenAI, Gemini, Anthropic) where it exists, passing the cost discount straight through. Configure a webhook to fire when a batch completes.

Proxy mode. Point Relay at a provider with your own upstream key and use it as a transparent proxy: no policy enforcement, just full usage, cost, and payload logging.

Usage & cost tracking. Every request is metered and stored in Postgres or ClickHouse. Optional full request/response payload capture (off by default).

Metrics & logs. First-class Prometheus /metrics and structured JSON logs. (OpenTelemetry tracing is on the way.)

Self-hostable, built for scale. Bring your own keys; nothing phones home. Sub-2 ms added latency, thousands of requests/sec per pod, Kubernetes-native.
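The pooling and failover idea in the feature list can be illustrated with a small sketch. This is not Relay's implementation: `KeyPool`, `UpstreamError`, and the round-robin strategy are assumptions chosen for illustration, and the `send` callable stands in for a real upstream call.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


class UpstreamError(Exception):
    """Raised by a hypothetical upstream call, e.g. on a rate limit or server error."""


@dataclass
class KeyPool:
    """Hypothetical pool of provider keys with round-robin selection and failover."""

    keys: List[str]
    _next: int = field(default=0, init=False)

    def call(self, send: Callable[[str], str]) -> str:
        """Try each key at most once, starting from the round-robin cursor."""
        if not self.keys:
            raise UpstreamError("pool is empty")
        last_error: Optional[Exception] = None
        for offset in range(len(self.keys)):
            index = (self._next + offset) % len(self.keys)
            try:
                result = send(self.keys[index])
            except UpstreamError as exc:
                last_error = exc  # fail over to the next key in the pool
                continue
            # Advance the cursor so the next request starts on a different key.
            self._next = (index + 1) % len(self.keys)
            return result
        raise UpstreamError("all keys in the pool failed") from last_error
```

Because every request can fall through to another key, a single rate-limited account delays a request by one retry rather than failing it, which is the sense in which per-account limits stop being the ceiling.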

Relay runs two listeners: a data plane that accepts your inference requests and a control plane that serves the admin UI and API. Each request is authenticated by a relay key, matched to a policy that decides which models and providers it may reach, rate-limited, and routed to a healthy upstream key from the pool — then streamed straight back to you. Provider, model, and pricing data comes from an open, versioned catalog, so a fresh container already knows hundreds of models.
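The request path described above (authenticate, match a policy, rate-limit, route) can be sketched roughly as follows. Everything here is an assumption for illustration: the `Policy` and `Relay` classes, the sliding 60-second window, and the status strings are hypothetical and do not reflect Relay's actual internals or responses.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Policy:
    """Hypothetical policy attached to a relay key."""

    allowed_models: Set[str]
    requests_per_minute: int


@dataclass
class Relay:
    """Toy data plane: relay key -> policy -> rate limit -> upstream."""

    policies: Dict[str, Policy]
    healthy_upstreams: List[str]
    _windows: Dict[str, List[float]] = field(default_factory=dict)

    def handle(self, relay_key: str, model: str, now: Optional[float] = None) -> str:
        now = time.monotonic() if now is None else now

        # 1. Authenticate: the relay key must be known.
        policy = self.policies.get(relay_key)
        if policy is None:
            return "401 unknown relay key"

        # 2. Policy: the key may only reach the models it was granted.
        if model not in policy.allowed_models:
            return "403 model not allowed"

        # 3. Rate limit: count this key's requests in the last 60 seconds.
        recent = [t for t in self._windows.get(relay_key, []) if now - t < 60.0]
        if len(recent) >= policy.requests_per_minute:
            self._windows[relay_key] = recent
            return "429 rate limit exceeded"
        recent.append(now)
        self._windows[relay_key] = recent

        # 4. Route: pick a healthy upstream (here simply the first one).
        if not self.healthy_upstreams:
            return "503 no healthy upstream"
        return f"200 routed to {self.healthy_upstreams[0]}"
```

Ordering the checks this way means a leaked or over-limit relay key is rejected before any upstream provider key is touched.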

Want the full architecture, API reference, and configuration? → docs.wyolet.com

Relay is Apache-2.0: free to use, self-host, and build on, in commercial and closed-source products alike. Want managed hosting, enterprise builds, or priority support instead of running it yourself? We're happy to talk: business@wyolet.com.

Issues and pull requests are welcome. See CONTRIBUTING.md for the build, test, and PR workflow.

Apache-2.0. Use it in anything — commercial, closed-source, hosted, or embedded — no copyleft strings attached. See Commercial support if you'd rather we run or support it for you.


Read original on github.com → github.com/wyolet/relay
