Getting Started with Ollama: Run LLMs Locally in 10 Minutes

[ARTICLE · art-42206] src=dev.to ↗ pub=2026-06-28T01:18Z topic=large-language-models verified=true sentiment=↑ positive

Ollama provides a tool for running large language models locally on macOS, Linux, and Windows without requiring an API key or cloud service. The tool packages model weights, a runtime based on llama.cpp, and a CLI/REST API, enabling users to download and run models like Llama 3.2 with a single command. Ollama's library includes hundreds of models for various use cases, and it exposes a REST API on localhost:11434 for integration with other applications.

5 min read · published Jun 28, 2026

If you've ever wanted to run a large language model on your own machine, with no API key, no cloud bill, and no data leaving your laptop, Ollama is the easiest way to get there. It packages model weights, a runtime (built on llama.cpp), and a simple CLI/REST API into one tool that works the same way on macOS, Linux, and Windows.

This guide covers installation, running your first model, the core commands you'll actually use, picking a model for your hardware, and hooking Ollama into your own code via its API.

The tradeoff: local models are generally smaller and slightly behind frontier cloud models (GPT, Claude, Gemini) on raw capability — though the gap keeps shrinking fast.

On macOS, download the app from ollama.com/download, or use Homebrew:

brew install ollama

On Linux, run the install script:

curl -fsSL https://ollama.com/install.sh | sh

This installs the ollama binary and sets up a systemd service so it runs in the background. Check that it's alive:

systemctl status ollama

On Windows, download OllamaSetup.exe from ollama.com/download and run it; no admin rights are required. Recent versions ship a full desktop app with a chat window, so you can skip the terminal entirely if you prefer. A native ARM64 build is also available for Windows-on-Arm devices.

With Docker:

docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama

Add --gpus=all to the command if you have an NVIDIA GPU and the NVIDIA Container Toolkit installed.

Verify the installation:

ollama --version
ollama list

An empty list is expected on a fresh install; it just confirms the daemon is up and responding.
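The same check can be made from code. The sketch below assumes the server is on its default address and uses the /api/tags endpoint, which returns the locally installed models; the helper names are made up for illustration.

```python
import json
import urllib.request

# Assumption: Ollama is listening on its default local address.
BASE_URL = "http://localhost:11434"


def model_names(tags_payload: dict) -> list[str]:
    """Extract model names from an /api/tags response body."""
    return [m["name"] for m in tags_payload.get("models", [])]


def list_installed_models(base_url: str = BASE_URL) -> list[str]:
    """Ask the running server which models are installed locally."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
        return model_names(json.load(resp))
```

On a fresh install, list_installed_models() should return an empty list. A connection error (urllib.error.URLError) usually means the server is not running.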

Then run your first model:

ollama run llama3.2

This pulls the model (a few GB, one-time download) and drops you into an interactive chat session. Type a prompt, hit enter, get a response. Ctrl+D or /bye exits.

Command	What it does
`ollama run <model>`	Pull (if needed) and chat with a model
`ollama pull <model>`	Download a model without starting a chat
`ollama list`	Show models you have installed
`ollama ps`	Show models currently loaded in memory
`ollama show <model>`	Show details/parameters for a model
`ollama rm <model>`	Delete a model to free disk space
`ollama stop <model>`	Unload a model from memory
`ollama create <name> -f Modelfile`	Build a custom model from a Modelfile

Always pull with an explicit tag for anything you depend on (ollama pull qwen2.5-coder:7b), since :latest can change under you.
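One way to enforce that habit is a small check over the model references a project uses. This is a sketch with hypothetical helper names; it treats a missing tag the same as :latest, since that is what an untagged reference resolves to.

```python
def has_explicit_tag(model_ref: str) -> bool:
    """Return True if the reference pins a tag other than the floating default."""
    # Only the last path segment can carry a tag; an earlier colon could be
    # a registry port (for example "host:5000/name").
    last_segment = model_ref.rsplit("/", 1)[-1]
    _name, sep, tag = last_segment.partition(":")
    return bool(sep) and tag not in ("", "latest")


def unpinned(model_refs: list[str]) -> list[str]:
    """List the references that would float with :latest."""
    return [ref for ref in model_refs if not has_explicit_tag(ref)]


print(unpinned(["qwen2.5-coder:7b", "llama3.2", "llama3.2:latest"]))
```

Anything the check reports is a candidate for pinning before it ends up in a script or a deployment.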

Ollama's library has hundreds of models. As a starting point:

Use case	Try	Rough RAM/VRAM
General daily driver, light hardware	`llama3.2:3b`	~4 GB
General daily driver, mid hardware	`llama3.1:8b` or `qwen3:8b`	~6–8 GB
Coding	`qwen2.5-coder:7b` or `qwen3-coder:30b` (MoE, runs lighter than its size suggests)	6–20 GB
Reasoning / math / step-by-step logic	`deepseek-r1:7b` or `:14b`	6–12 GB
Best quality you can fit on a single consumer GPU	`qwen3.6:27b` or `gpt-oss:20b`	~16–24 GB
Vision (images + text)	`llava` or `gemma3:12b`	8–16 GB
Embeddings (for RAG / semantic search)	`nomic-embed-text`	<1 GB

Rule of thumb for sizing: a 7–8B model at Q4 quantization needs roughly 5–6 GB of memory. Treat these as rough numbers, not gospel. Mixture-of-experts models (the ones with an "active/total" split, like qwen3-coder:30b) only activate a fraction of their parameters for each token, so they're often faster than their parameter count implies, but they still need the full model in memory, not just the active slice. Always check ollama.com/library for the current tag list, since model lineups change frequently.
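The rule of thumb can be written as back-of-the-envelope arithmetic. In this sketch, the 4.8 effective bits per weight and the flat 1 GB allowance for context and runtime are assumptions chosen to line up with the 5–6 GB figure above, not measurements.

```python
def estimate_memory_gb(params_billion: float,
                       bits_per_weight: float = 4.8,
                       overhead_gb: float = 1.0) -> float:
    """Rough memory estimate: quantized weights plus a flat allowance."""
    # 1e9 params * (bits / 8) bytes each is about params_billion * bits / 8 GB.
    weights_gb = params_billion * bits_per_weight / 8
    return weights_gb + overhead_gb


for size in (7, 8):
    print(f"{size}B at ~Q4: about {estimate_memory_gb(size):.1f} GB")
```

For a mixture-of-experts model, pass the total parameter count rather than the active count, since the whole model has to sit in memory.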

If you're not sure where to start: pull a small model, use it for a week on your actual tasks, and let what it struggles with point you toward the next one.

Ollama exposes a REST API on localhost:11434. This is how IDE plugins, chat UIs, and frameworks talk to it under the hood.

curl http://localhost:11434/api/chat -d '{
  "model": "llama3.2",
  "messages": [{ "role": "user", "content": "Explain Ollama in one sentence." }],
  "stream": false
}'
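The same request can be sent from Python with only the standard library. This is a minimal sketch assuming the default address from the curl example; build_chat_request and ask are illustrative names, not part of any Ollama package.

```python
import json
import urllib.request

# Assumption: the server runs on the default address used in the curl example.
CHAT_URL = "http://localhost:11434/api/chat"


def build_chat_request(model: str, prompt: str) -> urllib.request.Request:
    """Build the same non-streaming request as the curl example."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }
    return urllib.request.Request(
        CHAT_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(model: str, prompt: str) -> str:
    """Send the request and return the assistant's reply text."""
    request = build_chat_request(model, prompt)
    with urllib.request.urlopen(request, timeout=120) as resp:
        reply = json.load(resp)
    # With "stream": false the reply is one JSON object with a "message" field.
    return reply["message"]["content"]
```

With the server running and the model pulled, ask("llama3.2", "Explain Ollama in one sentence.") returns the reply as a string.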

It also exposes an OpenAI-compatible endpoint, so anything built for the OpenAI SDK can point at Ollama with a base URL change:

http://localhost:11434/v1/chat/completions
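The main practical difference on this endpoint is the response shape, which follows the OpenAI chat completion layout. A hedged sketch using only the standard library, with illustrative function names and the default local address assumed:

```python
import json
import urllib.request

# Assumption: default local address; the path is the endpoint shown above.
OPENAI_STYLE_URL = "http://localhost:11434/v1/chat/completions"


def first_choice_text(completion: dict) -> str:
    """Pull the reply text out of an OpenAI-style chat completion body."""
    return completion["choices"][0]["message"]["content"]


def ask_openai_style(model: str, prompt: str) -> str:
    """POST an OpenAI-style chat request to the local server."""
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    req = urllib.request.Request(
        OPENAI_STYLE_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=120) as resp:
        return first_choice_text(json.load(resp))
```

Here the reply sits under choices[0].message instead of a top-level message field. Clients built on an OpenAI SDK handle that parsing themselves and typically only need their base URL set to http://localhost:11434/v1.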

From Python, install the ollama package and call its chat function:

pip install ollama

from ollama import chat

response = chat(model='llama3.2', messages=[
    {'role': 'user', 'content': 'Why is the sky blue?'}
])
print(response.message.content)

Want a model with a fixed system prompt or different default parameters? Create a Modelfile:

FROM llama3.2

PARAMETER temperature 0.7
PARAMETER num_ctx 4096

SYSTEM """
You are a terse code reviewer. Point out bugs and style issues only — no praise, no fluff.
"""

Build it:

ollama create code-reviewer -f Modelfile
ollama run code-reviewer

Now code-reviewer is its own model in ollama list, with your settings baked in.
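If you maintain several variants, the Modelfile text can be generated instead of hand-edited. The sketch below only covers the three instructions used above (FROM, PARAMETER, SYSTEM); render_modelfile is a hypothetical helper, and it assumes the system prompt contains no triple quotes.

```python
def render_modelfile(base: str, system: str, params: dict[str, object]) -> str:
    """Render FROM, PARAMETER, and SYSTEM lines in the layout shown above."""
    lines = [f"FROM {base}", ""]
    lines += [f"PARAMETER {key} {value}" for key, value in params.items()]
    # Assumption: the system prompt itself contains no triple quotes.
    lines += ["", 'SYSTEM """', system.strip(), '"""', ""]
    return "\n".join(lines)


text = render_modelfile(
    base="llama3.2",
    system="You are a terse code reviewer. Point out bugs and style issues only.",
    params={"temperature": 0.7, "num_ctx": 4096},
)
print(text)
```

Write the result to a file named Modelfile and build it with the same ollama create command as before.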

By default, Ollama listens only on 127.0.0.1. Setting OLLAMA_HOST=0.0.0.0 exposes the API to your whole network.

OLLAMA_NUM_PARALLEL and OLLAMA_MAX_LOADED_MODELS control concurrency if you're serving more than one model.

Set num_ctx deliberately in a Modelfile instead of leaving it at whatever default your VRAM tier triggers.

Check ollama ps; it shows whether a model is running on CPU or GPU. Driver issues (CUDA/ROCm) are the most common cause of silent CPU fallback.

Point OpenAI-compatible tools at http://localhost:11434/v1 to swap in local models with minimal code changes.

Pair an embedding model (nomic-embed-text) with a chat model to build a local RAG pipeline with zero API cost.

That's the whole loop: install, pull, run, integrate. Everything else is just picking the right model for the job.

Read original on dev.to → dev.to/mohitkumar4/getting-started-with-ollama-r…
