Show HN: Local LLM Hardware Calculator

wpnews.pro


[ARTICLE · art-35543] src=vettedconsumer.com ↗ pub=2026-06-21T11:47Z topic=large-language-models verified=true sentiment=· neutral


A new Local LLM Hardware Calculator helps users estimate memory requirements for running large language models on their own hardware, factoring in weights, KV cache, and overhead. The tool also compares buying hardware versus renting cloud GPUs or using APIs, and provides guidance on quantization and model fit.

2 min read · published Jun 21, 2026


One step earlier: not sure you should buy hardware at all? Our cost calculator compares buying vs renting cloud GPUs vs just paying for an API, with break-even math for your usage.

Two ways to use it: leave "Your machine" empty to shop across everything we track, or pick the hardware you already own (or enter its memory) to get a personal verdict. When the model doesn't fit, the verdict names the exact quant, context, or KV-cache change that would make it fit.

How the estimate works

The tool uses the same math from our guides, shown in the open because that's the point of this site. A model's memory cost has three parts:

Weights: parameters × bits-per-weight ÷ 8. A 70B model at Q4_K_M (~4.8 bits/weight) is about 42 GB. Quantization choices are covered in our plain-English quantization guide.

KV cache: grows with every token of context. We assume a GQA-typical attention shape and an FP16 cache; the KV-precision selector in the tool shows exactly what a Q8 or Q4 cache saves. Full math in The KV cache, explained.

Overhead: a flat ~1.5 GB buffer for the runtime and activations.
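The three parts above can be sketched in a few lines of Python. This is a rough illustration, not the calculator's actual code: the function names are made up for this example, 1 GB is taken as 10^9 bytes, and the attention shape in the usage lines (80 layers, 8 KV heads, head dimension 128) is an assumed GQA-style shape, since the article does not state the exact one the tool uses.

```python
OVERHEAD_GB = 1.5  # flat runtime/activation buffer, per the article


def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Weights: parameters x bits-per-weight / 8, in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8


def kv_cache_gb(context_tokens: int, n_layers: int, n_kv_heads: int,
                head_dim: int, bytes_per_value: float = 2.0) -> float:
    """KV cache: one key and one value vector per KV head, per layer, per token.

    bytes_per_value defaults to 2.0 (FP16); 1.0 approximates a Q8 cache.
    """
    bytes_per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_value
    return bytes_per_token * context_tokens / 1e9


def total_gb(params_billion: float, bits_per_weight: float, context_tokens: int,
             n_layers: int, n_kv_heads: int, head_dim: int,
             bytes_per_value: float = 2.0) -> float:
    """Sum of weights, KV cache, and the flat overhead."""
    return (weights_gb(params_billion, bits_per_weight)
            + kv_cache_gb(context_tokens, n_layers, n_kv_heads, head_dim,
                          bytes_per_value)
            + OVERHEAD_GB)


if __name__ == "__main__":
    # 70B at ~4.8 bits/weight with an 8,192-token FP16 cache (assumed shape).
    print(f"weights:  {weights_gb(70, 4.8):.1f} GB")
    print(f"KV cache: {kv_cache_gb(8192, 80, 8, 128):.2f} GB")
    print(f"total:    {total_gb(70, 4.8, 8192, 80, 8, 128):.1f} GB")
```

Halving bytes_per_value halves the KV-cache term and leaves the weights untouched, which is the kind of saving the KV-precision selector is described as showing.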

For Mixture-of-Experts models, memory follows total parameters but speed follows active parameters, which is why a 120B MoE can be fast on a box that would crawl on a dense 70B. The one-line rule: buy memory for the total, expect speed from the active (MoE, explained). The "gen ceiling" column is memory bandwidth ÷ bytes streamed per token, a theoretical upper bound from the fact that token generation is bandwidth-bound, not compute-bound (why that is). Real speeds come in below it.
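A small sketch of the gen-ceiling arithmetic, under one simplifying assumption that is not spelled out in the article: the bytes streamed per token are approximated as the size of the active weights alone, ignoring KV-cache reads. The 400 GB/s bandwidth and the 5B active-parameter count for the MoE are hypothetical numbers chosen only for illustration.

```python
def gen_ceiling_tok_s(bandwidth_gb_s: float, active_params_billion: float,
                      bits_per_weight: float) -> float:
    """Upper bound on tokens/second: memory bandwidth / bytes streamed per token.

    Assumption: each generated token streams the active weights once, so
    bytes per token ~= active parameters x bits-per-weight / 8.
    """
    gb_per_token = active_params_billion * bits_per_weight / 8
    return bandwidth_gb_s / gb_per_token


if __name__ == "__main__":
    bandwidth = 400.0  # GB/s, hypothetical machine
    dense = gen_ceiling_tok_s(bandwidth, 70, 4.8)  # dense 70B: all 70B active
    moe = gen_ceiling_tok_s(bandwidth, 5, 4.8)     # 120B MoE, assumed 5B active
    print(f"dense 70B ceiling: {dense:.1f} tok/s")
    print(f"120B MoE ceiling:  {moe:.1f} tok/s")
```

The MoE still needs memory for all 120B parameters; only the per-token traffic, and so the ceiling, tracks the active subset.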

Honest limits

These are estimates, not lab measurements. Real usage varies by runtime (llama.cpp vs vLLM vs MLX), KV-cache precision, batch settings, and model architecture. Unified-memory machines share RAM with the OS, so we subtract an 8 GB reserve; discrete GPUs lose ~1 GB to the desktop. When a result says "tight fit," believe it: being within 10% of capacity means long context or background apps will push you over. Hardware listings come from our methodology; affiliate links never influence what appears or how it ranks.
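The reserve and "tight fit" rules can be sketched as a simple check. The function names are hypothetical, and reading "within 10% of capacity" as "needs at least 90% of the usable memory" is an interpretation rather than the tool's documented threshold.

```python
UNIFIED_RESERVE_GB = 8.0   # OS share on unified-memory machines, per the article
DISCRETE_RESERVE_GB = 1.0  # approximate desktop usage on a discrete GPU


def usable_gb(memory_gb: float, unified: bool) -> float:
    """Memory left for the model after the reserve is subtracted."""
    reserve = UNIFIED_RESERVE_GB if unified else DISCRETE_RESERVE_GB
    return memory_gb - reserve


def verdict(required_gb: float, memory_gb: float, unified: bool) -> str:
    """Classify a model's estimated footprint against a machine's usable memory."""
    usable = usable_gb(memory_gb, unified)
    if required_gb > usable:
        return "does not fit"
    if required_gb >= 0.9 * usable:  # assumed meaning of "within 10% of capacity"
        return "tight fit"
    return "fits"


if __name__ == "__main__":
    need = 46.2  # GB, e.g. a 70B Q4_K_M model with a modest context
    print(verdict(need, 64, unified=True))   # 64 GB unified-memory machine
    print(verdict(need, 48, unified=False))  # 48 GB discrete GPU
    print(verdict(need, 24, unified=False))  # 24 GB discrete GPU
```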



Read original on vettedconsumer.com → vettedconsumer.com/can-i-run-it/
