Show HN: LLMhop – A tiny, stateless router for LLMs with a NixOS module

A developer has released LLMhop, a stateless HTTP router that directs OpenAI-compatible API requests to the appropriate LLM inference backend based on the model name in the request body. The single-binary tool, built in pure Go with no external dependencies, can route requests to multiple backends including vLLM, Ollama, and OpenAI itself, and ships with a NixOS module for hardened deployment. The project addresses the need for a lightweight, model-aware gateway when running multiple single-model inference servers behind a unified endpoint.

One port, many models: a tiny, stateless HTTP router for OpenAI-compatible LLM inference backends.

LLMhop peeks at the `model` field of an incoming OpenAI-compatible request and reverse-proxies it to the matching backend. It is primarily designed for single-model inference servers like vLLM (https://github.com/vllm-project/vllm) and sglang (https://github.com/sgl-project/sglang), which serve one model per process and need a thin model-aware gateway in front of them. It also works with any other OpenAI-compatible backend, including multi-model servers and hosted providers, whenever you want to consolidate several upstreams behind a single endpoint.

- OpenAI-compatible reverse proxy, model router and request dispatcher for self-hosted LLM inference.
- Stateless single-binary HTTP service: no database, no cache, no background workers, safe behind any load balancer.
- Zero external dependencies: pure Go, no third-party packages, no CGO.
- Works with any OpenAI API-compatible backend, self-hosted or remote: vLLM, sglang, TabbyAPI, Aphrodite, Ollama, LocalAI, OpenRouter, together.ai, DeepInfra, etc.
- Ships as a static binary, a minimal Docker image and a hardened NixOS module that can optionally spin up llama.cpp, sglang or vLLM workers alongside the router.

A request is handled as follows:

- Client sends a request with a JSON body containing `{"model": "..."}`.
- LLMhop reads the `model` field and looks it up in its config.
- The request is forwarded verbatim to the configured backend URL.
- Unknown models return `404`.

LLMhop can optionally gate incoming requests with a list of bearer tokens and inject a per-model `Authorization` header, or any other headers, when forwarding to the backend. Both sides are opt-in: with `authTokens` and the per-model `headers` under `models` left unset, headers are forwarded verbatim. When `authTokens` is set, the router validates the incoming Authorization: Bearer
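
The routing steps and the optional bearer-token gate described above can be sketched in a few dozen lines of Go using only the standard library. This is a minimal illustration, not LLMhop's actual source: the `routes` map, the model names, the backend addresses, the listen address and the helper names `modelOf`, `authorized` and `router` are all hypothetical, and the `401` and `400` responses are assumptions (the text above only specifies `404` for unknown models).

```go
package main

import (
	"bytes"
	"crypto/subtle"
	"encoding/json"
	"io"
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"
	"strings"
)

// routes maps a model name to a backend base URL. The entries are
// hypothetical placeholders, not LLMhop's real configuration format.
var routes = map[string]string{
	"llama-3-8b": "http://127.0.0.1:8001",
	"qwen-2-7b":  "http://127.0.0.1:8002",
}

// modelOf extracts the top-level "model" field from a JSON request body.
func modelOf(body []byte) (string, error) {
	var req struct {
		Model string `json:"model"`
	}
	if err := json.Unmarshal(body, &req); err != nil {
		return "", err
	}
	return req.Model, nil
}

// authorized reports whether the Authorization header carries one of the
// allowed bearer tokens. An empty token list disables the check (opt-in).
func authorized(header string, tokens []string) bool {
	if len(tokens) == 0 {
		return true
	}
	got, ok := strings.CutPrefix(header, "Bearer ")
	if !ok {
		return false
	}
	for _, t := range tokens {
		if subtle.ConstantTimeCompare([]byte(got), []byte(t)) == 1 {
			return true
		}
	}
	return false
}

// router returns a handler that picks a backend from the request's model.
func router(tokens []string) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if !authorized(r.Header.Get("Authorization"), tokens) {
			http.Error(w, "unauthorized", http.StatusUnauthorized) // assumed status
			return
		}
		body, err := io.ReadAll(r.Body)
		if err != nil {
			http.Error(w, "cannot read body", http.StatusBadRequest) // assumed status
			return
		}
		model, err := modelOf(body)
		if err != nil {
			http.Error(w, "invalid JSON body", http.StatusBadRequest) // assumed status
			return
		}
		target, ok := routes[model]
		if !ok {
			http.Error(w, "unknown model", http.StatusNotFound)
			return
		}
		u, err := url.Parse(target)
		if err != nil {
			http.Error(w, "bad backend URL", http.StatusInternalServerError)
			return
		}
		// Restore the consumed body so the backend receives it unchanged.
		r.Body = io.NopCloser(bytes.NewReader(body))
		httputil.NewSingleHostReverseProxy(u).ServeHTTP(w, r)
	})
}

func main() {
	// Passing nil disables the token gate; pass a slice of tokens to enable it.
	log.Fatal(http.ListenAndServe("127.0.0.1:8080", router(nil)))
}
```

The sketch buffers the whole body in order to read `model` and then replays the same bytes to the backend, which keeps the handler stateless between requests. Per-model header injection is left out, since the text above does not show the configuration for it.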