{"slug": "show-hn-wyolet-relay-high-throughput-open-source-llm-router", "title": "Show HN: Wyolet Relay – high throughput, open source LLM router", "summary": "Wyolet released Wyolet Relay, an open-source, self-hosted LLM router that provides a single OpenAI- and Anthropic-compatible endpoint for multiple providers, enabling automatic failover, rate-limit pooling, and cost tracking. The tool, available under Apache-2.0, supports 400+ models, can be deployed via Docker, and is advertised as adding under 2 ms of latency.", "body_md": "**One endpoint in front of every LLM provider.**\n\nSelf-hosted, bring-your-own-keys, built for scale.\n\n[Quickstart](https://docs.wyolet.com/quickstart) · [Docs](https://docs.wyolet.com) · [Discord](https://discord.gg/KUhJ8X3w) · [X](https://x.com/wyolethq) · [Bluesky](https://bsky.app/profile/wyolet.bsky.social)\n\nWyolet Relay puts a single **OpenAI- and Anthropic-compatible** endpoint in front\nof every provider you use. Pool your own API keys for automatic failover and\nhigher effective rate limits, see exactly what every request costs, and run the\nwhole thing on your own infrastructure, as a drop-in for the SDK code you already\nhave.\n\nStart a full relay (API, admin UI, database, and a pre-seeded model catalog) in one command:\n\n```\ndocker run -p 8080:8080 -p 8081:8081 wyolet/relay:standalone\n```\n\nOpen the admin UI at **http://localhost:8081**, then let the setup wizard walk you\nthrough adding a provider key and minting a relay key. Now call it like the\nOpenAI API:\n\n```\ncurl http://localhost:8080/openai/v1/chat/completions \\\n  -H \"Authorization: Bearer <your-relay-key>\" \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\"model\":\"gpt-4o\",\"messages\":[{\"role\":\"user\",\"content\":\"hello\"}]}'\n```\n\nThat's it. 
The full walkthrough, configuration, and production deployment guides\nlive at **docs.wyolet.com**.\n\n- **One API, every provider.** OpenAI- and Anthropic-shape endpoints in front of OpenAI, Anthropic, Bedrock, Vertex, Azure, Ollama, Groq, or anything else speaking either wire format. No code changes to switch upstreams.\n- **Disposable, rate-limited keys.** Mint relay keys scoped to whatever limits you set. Hand them out freely: even if one leaks, the damage is capped at those limits and your real provider keys are never exposed.\n- **Pool accounts and providers.** Combine many keys, accounts, or providers into one pool behind a single endpoint. Relay load-balances and fails over across them, so per-account rate limits stop being your ceiling.\n- **Per-key access control.** Decide exactly which models and providers each relay key may reach, with allow or deny rules set at the key level via policies.\n- **400+ models, open catalog.** Ships knowing 400+ models out of the box, and the [catalog](https://github.com/wyolet/relay-catalog) is open and extensible; we add hosts and models on demand.\n- **Batch processing** *(in progress)*. Batch requests against any provider. Relay simulates batching where there's no native API, and routes through the native one (OpenAI, Gemini, Anthropic) where it exists, passing the cost discount straight through. Configure a webhook to fire when a batch completes.\n- **Proxy mode.** Point Relay at a provider with your own upstream key and use it as a transparent proxy: no policy enforcement, just full usage, cost, and payload logging.\n- **Usage & cost tracking.** Every request is metered and stored in Postgres or ClickHouse. Optional full request/response payload capture (off by default).\n- **Metrics & logs.** First-class Prometheus `/metrics` and structured JSON logs. (OpenTelemetry tracing is on the way.)\n- **Self-hostable, built for scale.** Bring your own keys; nothing phones home. 
Sub-2 ms added latency, thousands of requests/sec per pod, Kubernetes-native.\n\nRelay runs two listeners: a **data plane** that accepts your inference requests\nand a **control plane** that serves the admin UI and API. Each request is\nauthenticated by a relay key, matched to a **policy** that decides which models\nand providers it may reach, rate-limited, and routed to a healthy upstream key\nfrom the **pool**, then streamed straight back to you. Provider, model, and\npricing data comes from an open, versioned\n[catalog](https://github.com/wyolet/relay-catalog), so a fresh container already\nknows hundreds of models.\n\nWant the full architecture, API reference, and configuration?\n→ [docs.wyolet.com](https://docs.wyolet.com)\n\nRelay is Apache-2.0: free to use, self-host, and build on, in commercial and\nclosed-source products alike. Want managed hosting, enterprise builds, or priority\nsupport instead of running it yourself? We're happy to talk:\n**business@wyolet.com**.\n\nIssues and pull requests are welcome. See [CONTRIBUTING.md](https://github.com/wyolet/relay/blob/main/CONTRIBUTING.md) for\nthe build, test, and PR workflow.\n\n[Apache-2.0](https://github.com/wyolet/relay/blob/main/LICENSE). Use it in anything (commercial, closed-source, hosted, or\nembedded) with no copyleft strings attached. 
See\n[Commercial support](#commercial-support) if you'd rather we run or support it for\nyou.", "url": "https://wpnews.pro/news/show-hn-wyolet-relay-high-throughput-open-source-llm-router", "canonical_source": "https://github.com/wyolet/relay", "published_at": "2026-06-19 13:36:00+00:00", "updated_at": "2026-06-19 14:08:37.340806+00:00", "lang": "en", "topics": ["large-language-models", "ai-infrastructure", "developer-tools"], "entities": ["Wyolet", "OpenAI", "Anthropic", "Bedrock", "Vertex", "Azure", "Ollama", "Groq"], "alternates": {"html": "https://wpnews.pro/news/show-hn-wyolet-relay-high-throughput-open-source-llm-router", "markdown": "https://wpnews.pro/news/show-hn-wyolet-relay-high-throughput-open-source-llm-router.md", "text": "https://wpnews.pro/news/show-hn-wyolet-relay-high-throughput-open-source-llm-router.txt", "jsonld": "https://wpnews.pro/news/show-hn-wyolet-relay-high-throughput-open-source-llm-router.jsonld"}}