cd /news/ai-research/how-hard-is-your-idea-strength-is-wh… · home topics ai-research article
[ARTICLE · art-37806] src=dev.to ↗ pub= topic=ai-research verified=true sentiment=· neutral

How hard is your idea? Strength is what survives an honest attempt to break it

A developer argues that the strength of an idea is measured by how well it survives deliberate attempts to break it, drawing on Karl Popper's concept of corroboration and Deborah Mayo's severity principle. The post distinguishes between skeptics, who propose testable conditions that can strengthen a claim, and haters, who attack conclusions without offering tests. It advises against using absolute language like 'impossible' and instead recommends stating positional limitations.

read5 min views1 publishedJun 24, 2026

TL;DR. A hypothesis is strong only to the degree it has survived deliberate, competent attempts to break it. Call that its hardness. Confirmation cannot establish a claim, since one counterexample can refute it while no amount of agreement ever finishes proving it, so strength has to be read off the breaking side. This restates Popper's corroboration (a claim earns standing by surviving refutation and is never verified) and Mayo's severity (a test counts only if it would very probably have caught the error, had one been present).

Here is a trap that gets easier to fall into every year. You have a claim. You check it against a few sources, or you ask a couple of capable assistants, and they agree. It reads cleanly. It sounds settled. And none of that means the claim is any good, because agreement is cheap and you never put the claim at risk.

The deeper you work by synthesis, pulling many sources and instruments together, often with AI in the loop, the sharper this trap gets. A good synthesis is coherent by design, and coherence feels like strength. So you need a way to tell a claim that has earned its standing from one that merely sounds finished. The way is to stop asking how settled a claim looks and start asking how hard it is to break.

Testing is lopsided. One solid counterexample can sink a general claim, while no amount of agreement ever finishes proving it. That sounds like bad news, and it is actually the whole method: since you can never finish confirming, you measure strength by how hard you have tried to break the thing and failed.

Philosophy of science has two precise versions of this. Karl Popper called it corroboration: a claim earns standing by surviving serious attempts to refute it, and it never graduates to "proven," it only ever reaches "has held so far under hard testing." Deborah Mayo sharpened the bar with severity: a claim is well tested only to the degree it passed a check that very probably would have caught the error, if there were one. Passing a weak check, one that would have waved the error through, tells you almost nothing.

Put those together and you get a single, usable word. Call a claim hard to the degree it has resisted competent attempts to break it. A claim that was restated, agreed with, or fitted to numbers you already had is not hard at all, no matter how confident it sounds. Hardness is bought only by passing tests the claim could have failed.

Once you measure strength this way, two kinds of attack on a claim stop looking alike, and the difference has nothing to do with how polite either one is.

The skeptic cuts toward a test. They tell you the condition under which your claim would fail, and that condition can be checked. Run it. If the claim falls, good, it deserved to. If it survives, it is now harder than it was an hour ago, because a real chance to break it has been spent and it held. A skeptic leaves you better off either way.

The hater cuts toward the conclusion. They want the thing to collapse, and they hand you no test it could have passed or failed. Survive a hater and you have gained nothing, because there was no test to survive. So the question to put to any objection is plain: what test does this propose, and could my claim have failed it? If there is no answer, you are looking at rhetoric in skeptic's clothing, and you can set it aside without guilt.

This also tells you how to be useful to other people's ideas. Offering a way the claim could fail, with a way to check it, is a gift. Swinging at the conclusion is noise. The first builds, the second only knocks down.

There is one more habit that hardness discipline forces, and it is about language. When you get stuck, it is tempting to write the stall down as a fact about the world: this is impossible, this is a wall. But "impossible" is one of the strongest claims you can make, and being tired is not evidence for it.

The honest version is positional: "I do not see a way through from here." That keeps the limit where it actually sits, in your current vantage, which can move with a new method, a fresh instrument, or a better question. A wall declared from exhaustion is just an untested guess wearing the costume of a result. This matters most in exactly the setting that tempts the trap at the top: when you can always ask one more assistant and get one more confident answer, the difference between a charted limit and a tired one is the only thing keeping you honest.

A severe test is one your idea could have flunked. When general relativity predicted that starlight passing the Sun would bend by about double what older physics allowed, the 1919 eclipse could have come back with the smaller, older number, and the theory would have been in real trouble. It came back the other way. The test counted because failure was on the table.

A non-test is one nothing could have flunked. Tuning knobs until your model matches data you already hold cannot fail by construction, so it buys no hardness. The version of this that bites synthetic work is agreement among assistants that were trained alike: that can be one test wearing several hats, and treating it as several independent confirmations is the mistake to watch for.

So the prediction, and the bet, is this: sort your claims by how many real attempts to break them they have survived, and separately by how many soft confirmations they have collected. The first sort will track which ones hold up later. The second will not. If soft confirmations predicted durability just as well, hardness would be an empty word, and you could go back to counting nods. I do not think you can.

── more in #ai-research 4 stories · sorted by recency
── more on @karl popper 3 stories trending now
sponsored brought to you by zahid.host 4,200+ EU-deployed projects
reading about agents? ship yours in a single git push.

Run your AI side-project on zahid.host

EU-based hosting, git-push deploys, automatic HTTPS, no cold starts. Free tier with a custom domain — perfect for shipping the agent you just read about.

$git push zahid main
Live at https://your-agent.zahid.host
Get free account → Pricing
from €0/mo · no card required
LIVE [news/how-hard-is-your-ide…] indexed:0 read:5min 2026-06-24 ·