Ask HN: How do we measure software in LLM era?


[ARTICLE · art-41742] src=news.ycombinator.com ↗ pub=2026-06-27T10:18Z topic=large-language-models verified=true sentiment=↓ negative


A Hacker News user laments the difficulty of measuring software quality in the LLM era, citing high variability in metrics like accuracy, cost, and latency even with the same model and provider. The post questions how developers can triage issues or assure users when probabilistic AI components introduce unpredictable behavior.

1 min read · published Jun 27, 2026

A bit of a rant. Sorry!

With a probabilistic, pluggable 'brain' sitting in parts of the solution, how are you measuring whether anything is better or worse?

I am at a loss to quantify whether any change is improving or worsening anything. It is probably also because of the various metrics that keep popping up:

* Accuracy

* Cost of running

* Context

* Size

* Time

* Turns

All of these vary in a large band even with the same 'brain' on the same 'provider'. It is not so different from a database straining under load, to draw from simpler times. But here, working out which elastic band is getting pulled in which direction is worse than playing 3D Tetris.
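One way to make that "large band" concrete is to repeat the same task several times against the same 'brain' and 'provider', then report each metric as a band (p10, median, p90) instead of a single number. The sketch below is only an illustration under that assumption: `RunMetrics`, `band`, `summarize`, and `bands_overlap` are hypothetical names, and the choice of the 10th and 90th percentiles is arbitrary.

```python
from dataclasses import dataclass
from statistics import median, quantiles


@dataclass
class RunMetrics:
    """One run of the same task on the same model and provider (hypothetical record)."""
    accuracy: float   # fraction of checks passed, 0.0 to 1.0
    cost_usd: float   # cost of running
    latency_s: float  # wall-clock time
    turns: int        # number of model turns


def band(values):
    """Return (p10, median, p90) so a metric is reported as a band, not a point."""
    values = sorted(values)
    if len(values) < 2:
        raise ValueError("need at least two runs to describe a band")
    deciles = quantiles(values, n=10, method="inclusive")  # 9 cut points
    return deciles[0], median(values), deciles[-1]


def summarize(runs):
    """Summarize every metric across repeated runs of one configuration."""
    return {
        name: band([getattr(run, name) for run in runs])
        for name in ("accuracy", "cost_usd", "latency_s", "turns")
    }


def bands_overlap(a, b):
    """True when two (p10, median, p90) bands overlap, so the gap may just be noise."""
    return a[0] <= b[2] and b[0] <= a[2]
```

Comparing `summarize(before)["latency_s"]` with `summarize(after)["latency_s"]` through `bands_overlap` gives a rough first answer to "better or worse": if the bands overlap, the change is hard to tell apart from run-to-run variation. It is a coarse heuristic, not a statistical test.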

Then there is the harness-side variability of tool choices, which seems to be the only knob the developer has some control over these days, other than the deterministic parts of the system.

How are we even going to triage a ticket with so much variability? In a runtime. That apparently is still called software.

Do we just tell the users, "You are on your own, and whatever you need to solve is between you and your brain of choice"?

What are you doing?

Comments URL: [https://news.ycombinator.com/item?id=48696916](https://news.ycombinator.com/item?id=48696916)

Points: 1
