llama-dash - Local LLM Ops

A developer built llama-dash, a dashboard and logging proxy for self-hosted local LLM inference stacks. It proxies OpenAI/Anthropic-compatible endpoints, logs requests with token counts and cost estimates, and adds API key management, rate limits, model allow-lists, and UI-based model control. The tool ships as a Docker Compose stack and can be used with tools like Claude Code.

I've been building llama-dash, a single-pane dashboard and logging proxy for a self-hosted local inference stack. I run llama-swap + llama.cpp on a box at home and got tired of having zero visibility: no request log, no idea which model was loaded when, no way to hand out scoped access without exposing the raw backend. So llama-dash sits in front as one public port: it proxies the OpenAI/Anthropic-compatible /v1/ endpoints unchanged (streaming SSE passes straight through), logs every request with token counts and cost estimates, and adds the stuff llama-swap doesn't have: hashed API keys, per-key rate limits and model allow-lists, routing rules, and model load/unload from the UI. The bit I like most is that you can point Claude Code at it via ANTHROPIC_BASE_URL and watch your own usage flow through. It ships as a Docker Compose stack with the backend hidden internally.
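
To make the access-control part concrete, here is a minimal sketch of how hashed API keys, a per-key model allow-list, and a per-key rate limit could fit together in front of a backend. This is not llama-dash's actual code: the names `KeyPolicy`, `hash_key`, and `authorize`, the SHA-256 digest, and the token-bucket limiter are all assumptions chosen for illustration.

```python
from __future__ import annotations

import hashlib
import hmac
import time
from dataclasses import dataclass, field


def hash_key(raw_key: str) -> str:
    """Return a SHA-256 hex digest so the raw key never needs to be stored."""
    return hashlib.sha256(raw_key.encode("utf-8")).hexdigest()


@dataclass
class KeyPolicy:
    """Hypothetical per-key policy: allow-list plus a token-bucket rate limit."""
    key_hash: str
    allowed_models: frozenset[str]
    rate_per_sec: float              # bucket refill rate (requests per second)
    burst: int                       # bucket capacity
    tokens: float = field(init=False, default=0.0)
    last_refill: float = field(init=False, default=0.0)

    def __post_init__(self) -> None:
        self.tokens = float(self.burst)  # start with a full bucket


def authorize(policies: list[KeyPolicy], raw_key: str, model: str,
              now: float | None = None) -> tuple[bool, str]:
    """Check key, model allow-list, and rate limit; return (allowed, reason)."""
    now = time.monotonic() if now is None else now
    digest = hash_key(raw_key)
    # Constant-time comparison against each stored digest.
    policy = next(
        (p for p in policies if hmac.compare_digest(p.key_hash, digest)), None
    )
    if policy is None:
        return False, "unknown key"
    if model not in policy.allowed_models:
        return False, "model not allowed"
    # Refill the bucket for the time elapsed, capped at the burst size.
    elapsed = max(0.0, now - policy.last_refill)
    policy.tokens = min(policy.burst, policy.tokens + elapsed * policy.rate_per_sec)
    policy.last_refill = now
    if policy.tokens < 1.0:
        return False, "rate limited"
    policy.tokens -= 1.0
    return True, "ok"


if __name__ == "__main__":
    # Hypothetical key and model names, for demonstration only.
    policies = [KeyPolicy(hash_key("sk-demo"), frozenset({"model-a"}),
                          rate_per_sec=1.0, burst=2)]
    print(authorize(policies, "sk-demo", "model-a"))
    print(authorize(policies, "sk-demo", "model-b"))
```

A proxy built along these lines would run a check like `authorize` before forwarding each request, so a rejected key or a disallowed model never reaches the backend. A token bucket is one common choice here because it allows short bursts while capping the sustained rate; a fixed-window counter would also work.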