cd /news/large-language-models/why-i-run-ai-locally-instead-of-usin… · home topics large-language-models article
[ARTICLE · art-37810] src=dev.to ↗ pub= topic=large-language-models verified=true sentiment=↑ positive

Why I Run AI Locally Instead of Using ChatGPT for Client Work

A developer explains why they run AI locally using Ollama instead of cloud-based ChatGPT for client work involving sensitive data. The approach keeps data on local machines, addressing compliance concerns for law firms, accounting practices, and medical offices. The developer has deployed local AI solutions for client intake summaries, document Q&A, and proprietary process assistants.

read4 min views5 publishedJun 24, 2026

Let me start with a question my clients ask me a lot:

"Can't we just use ChatGPT for this?"

My answer is always the same: it depends on what "this" is.

When "this" involves client intake forms for a law firm, tax documents for an accounting practice, or patient records for a medical office — the answer is no. And once I explain why, they always get it.

This post is about that explanation, and the toolchain I actually use instead.

When you send a prompt to ChatGPT or Claude via the API, that data leaves your network. It travels to a third-party server, gets processed, and comes back. The companies have policies about how they handle it — and you should read them — but the fundamental truth is: you handed your client's sensitive information to someone else.

For a lot of use cases, that's totally fine. Write me a landing page? Sure, use whatever.

But when the prompt contains:

...you're in a different conversation. One that involves client trust, potential legal exposure, and in some industries, real regulatory obligations. HIPAA doesn't care that the AI gave a good answer.

Ollama is the cleanest tool I've found for running large language models locally. It runs on Mac, Linux, and Windows, wraps model management into a simple CLI, and exposes a local REST API. That API is compatible with the OpenAI format — which means most integrations you'd build against ChatGPT work against Ollama with one line changed.

Getting started takes about five minutes:

curl -fsSL https://ollama.com/install.sh | sh

ollama pull llama3.2

ollama serve

Once it's running, you have a local API at http://localhost:11434

. No API key. No rate limits. No bill at the end of the month. Here's a basic Python call:

import requests

def ask_local_llm(prompt: str, model: str = "llama3.2") -> str:
    response = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": model,
            "prompt": prompt,
            "stream": False
        }
    )
    return response.json()["response"]

intake_text = "Client Jane Doe, referred by attorney Martinez, is seeking..."

summary = ask_local_llm(
    f"Summarize this client intake in 3 concise bullet points:\n\n{intake_text}"
)

print(summary)

That's it. The model runs on the local machine. The data never leaves the building.

Here's where it gets concrete. These are the kinds of deployments I've built:

Law office — client intake summaries

Attorneys were drowning in intake forms and needed quick summaries before consultations. The obvious fix is AI. The blocker: those forms contain PII, case details, and sometimes confidential disclosures that flat-out cannot go to a cloud provider.

Solution: Ollama running on a local machine in their office, a Python script that reads the intake PDF, summarizes it with llama3.2

, and outputs a clean brief. Setup time: half a day. Data never leaves their network.

Accounting firm — document Q&A

Staff needed to locate specific information across large financial documents and past filings quickly. Paired Ollama with a basic RAG (retrieval-augmented generation) pipeline — documents get chunked and embedded locally, queries get answered against the local vector store. The client's financial data stays on their server. As a bonus, it's actually faster than cloud solutions for this use case because there's zero round-trip latency.

Small business — proprietary process assistant

This one was less about compliance and more about competitive advantage. The client had a pricing model they'd refined over ten years. They were not interested in that logic ending up anywhere near a third-party training pipeline. Local deployment was the only acceptable option, full stop.

I'm not going to oversell this. Here's what you give up going local:

Model capabilityllama3.2

is impressive for its size. It is not GPT-4o. For pure reasoning tasks with no sensitivity concerns, the frontier cloud models still have an edge on harder problems.

Hardware requirements — Running a useful model locally needs real resources. I typically recommend at least 16GB of RAM and, ideally, a dedicated GPU. Clients who already have a server are usually fine. Clients on thin hardware turn into a hardware conversation first.

Setup and maintenance overhead — There's no sign-up-and-get-a-key path here. You're managing software, models, and updates. For non-technical clients, that means building something bulletproof or staying on the hook for maintenance.

For the right client, these trade-offs are absolutely worth it.

The clients who care most about local deployment aren't always the most technical. They're often the ones who've been in business long enough to be careful. When I tell them their data stays in-house — no monthly API bill that scales with usage, no third-party terms of service to worry about, they own the whole stack — that lands differently than any feature comparison I could make.

Local AI isn't for everyone. But when the fit is right, it's a genuinely different value proposition than "here's your ChatGPT wrapper with some prompt engineering on top."

If you're building for clients who handle sensitive data, have this conversation before you default to the cloud. You might be surprised how often they've already been thinking about it.

I'm stickytr33 — I build AI integrations, local LLM deployments, and IT infrastructure for small businesses. If this is relevant to what you're working on, find me on GitHub or drop a comment.

── more in #large-language-models 4 stories · sorted by recency
── more on @ollama 3 stories trending now
sponsored brought to you by zahid.host 4,200+ EU-deployed projects
reading about agents? ship yours in a single git push.

Run your AI side-project on zahid.host

EU-based hosting, git-push deploys, automatic HTTPS, no cold starts. Free tier with a custom domain — perfect for shipping the agent you just read about.

$git push zahid main
Live at https://your-agent.zahid.host
Get free account → Pricing
from €0/mo · no card required
LIVE [news/why-i-run-ai-locally…] indexed:0 read:4min 2026-06-24 ·