Show HN: Self hosting a modern LLM stack Llmaker, an open-source platform for self-hosting a complete LLM stack including models, vector databases, embeddings, caching, observability, and an agent layer, launched on Hacker News. The platform provisions and manages the entire stack from a single CLI command, enabling private retrieval-augmented chatbots and recommendation engines without third-party APIs. It aims to eliminate the complexity of assembling and networking multiple containerized services for production LLM applications. llmaker is an open-source platform for running the complete modern LLM stack on your own infrastructure — large language models, vector databases, embeddings, caching, observability, and a built-in retrieval & agent layer — provisioned, networked, and production-shaped from a single command. Build private retrieval-augmented chatbots, FAQ assistants, and recommendation engines locally. No third-party API keys. No data leaving your machine. Quickstart quickstart · Why llmaker why-self-host-your-llm-stack · Stacks stacks · The agent the-agent · Architecture architecture · CLI cli-reference · Roadmap roadmap Running a model locally is easy. Shipping an application is not. A production retrieval system needs a vector database, an embeddings service, a caching layer, an orchestration layer, and observability — each containerized, networked, and configured to discover the others. Assembling that is a recurring tax: a sprawl of docker run flags, a brittle Compose file, and hundreds of lines of framework glue. llmaker removes that tax. One CLI provisions the entire stack on a private network and operates it as a single fleet — live status, logs, and a resource dashboard across every model and service. Stacks are declarative and reconcilable apply --prune , models are OpenAI-compatible , and retrieval is traced out of the box . From a single model to a complete application: ── Build a complete application stack ────────────────────────── llmaker stack up assistant one command → a private ChatGPT-style UI over a local model llmaker stack init rag …or scaffold any stack to edit, then apply it: llmaker apply assistant · voice · rag · research · code · chatbot · faq · recommend · sql ── …or run a single model OpenAI-compatible ────────────────── llmaker up --model llama3:8b a local endpoint — explicit, or a preset: llmaker up chat chat · code · small · embed · vision llmaker chat