Local Inference

wpnews.pro

[ARTICLE · art-35564] src=av.codes ↗ pub=2026-06-21T12:26Z topic=large-language-models verified=true sentiment=↓ negative

A user describes the frustrations and hidden costs of running large language models locally, including waking up at 2 a.m. to kill a runaway process, wincing at electricity bills, and dealing with broken updates. The piece portrays local inference as a quiet, labor-intensive commitment that offers privacy and independence from cloud services.

Published Jun 21, 2026 · 2 min read

I woke up at 2 a.m. because the fans were screaming. The sound was different from the soft whoosh of a GPU under load. It was the high-pitched panic of a model that had swallowed its own context and was eating the swap file. I sat up for a second, then walked to the other room and killed the process. The house was quiet again. I went back to bed angry but not surprised. That is local inference now. The romance of rebellion against the cloud wore off long ago; now it is mostly the quiet labor of checking on a sick animal at night.

For a couple of years it was a hobby. I downloaded weights the way some people "download" vinyl. I ran quants I barely understood and felt clever when a reply came back. Then agents arrived, and the model stopped being a toy I visit. It became the room my work lives in. OpenCode drafts here and Hermes listens here. When it breaks, the damage is immediate: my notes stop, my drafts stall, ideas do not validate themselves, and I have to decide whether to fix it now or in the morning. I do not think about the tools when things are good. I think about llama.cpp or a container frontend when I am staring at the wrong quant at midnight, or when an update breaks two years of chat history. They are the broom and the bucket I use to clean up the mess.

Last week the electricity bill arrived and I winced before opening it. I sometimes only notice a silent break when replies get dumb or when the bill is higher than expected.

I keep it alive because the alternative is renting intelligence that forgets where it lives. My rig is slow and loud. The answers are sometimes dumb enough to laugh at, though mostly they just make me angry. When I type at 3 a.m. and the answer comes back in my own room, no one else is there. That is local inference in 2026: a room kept warm by a machine thinking. The work is invisible and the reward is smaller than I admit, but I keep the room warm anyway.

source & further reading

av.codes — original article

Read original on av.codes → av.codes/blog/on-local-inference/
