Getting Started with Ollama: Run LLMs Locally in 10 Minutes

Ollama is a tool for running large language models locally on macOS, Linux, and Windows without an API key or cloud service. It packages model weights, a runtime based on llama.cpp, and a CLI/REST API, so you can download and run a model like Llama 3.2 with a single command. Ollama's library includes hundreds of models for various use cases, and the tool exposes a REST API on localhost:11434 for integration with other applications.

If you've ever wanted to run a large language model on your own machine (no API key, no cloud bill, no data leaving your laptop), Ollama is the easiest way to get there, and it works the same way on macOS, Linux, and Windows. This guide covers installation, running your first model, the core commands you'll actually use, picking a model for your hardware, and hooking Ollama into your own code via its API.

The tradeoff: local models are generally smaller and slightly behind frontier cloud models (GPT, Claude, Gemini) on raw capability, though the gap keeps shrinking fast.

On macOS, download the app from https://ollama.com/download, or use Homebrew:

    brew install ollama

On Linux, run the install script:

    curl -fsSL https://ollama.com/install.sh | sh

This installs the ollama binary and sets up a systemd service so it runs in the background. Check it's alive:

    systemctl status ollama

On Windows, download OllamaSetup.exe from https://ollama.com/download and run it; no admin rights are required. Recent versions ship a full desktop app with a chat window, so you can skip the terminal entirely if you prefer. A native ARM64 build is also available for Windows-on-Arm devices.

With Docker:

    docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama

Add --gpus=all if you have an NVIDIA GPU and the NVIDIA Container Toolkit installed.
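Whichever install route you take, the server ends up listening on localhost:11434, so you can also check it from code. The following is a minimal sketch in Python, assuming the default address and the `/api/tags` endpoint, which lists locally installed models. The helper names (`tags_url`, `parse_model_names`, `list_local_models`) are made up for this illustration and are not part of Ollama.

```python
import json
import urllib.request

# Default address the Ollama server listens on (assumed unchanged).
OLLAMA_URL = "http://localhost:11434"


def tags_url(base_url: str = OLLAMA_URL) -> str:
    """Return the URL of the endpoint that lists locally installed models."""
    return base_url.rstrip("/") + "/api/tags"


def parse_model_names(body: bytes) -> list[str]:
    """Extract model names from a /api/tags JSON response body."""
    data = json.loads(body)
    # Treat a missing or null "models" field as "no models installed".
    return [model["name"] for model in (data.get("models") or [])]


def list_local_models(base_url: str = OLLAMA_URL, timeout: float = 5.0) -> list[str]:
    """Ask a running Ollama server which models it has downloaded."""
    with urllib.request.urlopen(tags_url(base_url), timeout=timeout) as resp:
        return parse_model_names(resp.read())
```

Calling `list_local_models()` against a running server returns the installed model names. If nothing is listening on the port, `urlopen` raises `urllib.error.URLError`, which is a quick signal that the daemon or container is not up.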
Verify the install:

    ollama --version
    ollama list

An empty list is expected on a fresh install; it just confirms the daemon is up and responding.

Run your first model:

    ollama run llama3.2

This pulls the model (a few GB, a one-time download) and drops you into an interactive chat session. Type a prompt, hit Enter, and get a response. Ctrl+D or /bye exits.

| Command | What it does |
|---|---|
ollama run