Llama.cpp now has an official website: llama.app The open-source AI inference engine llama.cpp has launched an official website at llama.app, providing users with a streamlined installation process via a single curl command. The platform enables local AI model execution without API keys, telemetry, or usage limits, and supports optimized performance across hardware from laptops to clusters. llama.app ./ GitHub 112.2K https://github.com/ggml-org/llama.cpp curl -LsSf https://llama.app/install.sh | sh Prefer Brew or Winget? Package managers https://github.com/ggml-org/llama.cpp/blob/master/docs/install.md Rather build from source? Follow instructions https://github.com/ggml-org/llama.cpp/blob/master/docs/build.md AI that lives on your computer. Open-source, private, always local. Run frontier AI entirely on your machine. No API keys, no telemetry, no limits. Take AI back. 1. Serve a model llama serve 2. Install the pi-llama plugin pi install git:github.com/huggingface/pi-llama 3. Run Pi, everything is set pi Pair it with a local coding agent. Run llama serve , then launch Pi https://github.com/badlogic/pi-mono . It auto-discovers your local model. No config, no API keys. Files stay on your machine, requests never leave it. Optimized for any hardware. From your laptop to a cluster, llama.cpp runs on whatever you have. Same binary, same models, same hand-tuned kernels for every GPU and CPU. Apple Silicon M Ultra RTX 5090 H100 MI300 RTX 4090 M Max A100 DGX Spark T4 Jetson B200 Intel Arc CPU Radeon RX M Pro RTX 3090