# The Decision Subtraction Framework: How to Evaluate Any AI Tool

> Source: <https://dev.to/harryfloyd/the-decision-subtraction-framework-how-to-evaluate-any-ai-tool-1o1l>
> Published: 2026-05-28 10:39:34+00:00

Last week someone asked me which AI tools they should be using. The question hides a problem that costs real money: there are more capable AI tools available than any single person can evaluate.

ChatGPT Plus at $20/month. Claude at $20. Grok at $30. Cursor at $20. Copilot at $10. Each has a $100, $200, or $300 tier above it. Each claims to earn its place.

The real question is not which tool is best. The real question is: which tools subtract more decisions than they add?

**Formula:** decisions replaced by the tool ÷ decisions it creates.

List every decision the tool makes for you. Then list every new decision it forces you to make. Divide the first by the second.

**Thresholds:**

**Example:** A code completion tool that writes a function body (replaces 5 decisions about syntax, structure, naming) but requires review (adds 2 decisions about correctness) has a ratio of 2.5. It passes.

A meeting summariser that replaces 1 decision (should I re-listen?) but creates 3 (verify accuracy, add context, decide what to share) has a ratio of 0.33. It fails.
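As a rough sketch, the ratio for the two examples above can be computed like this. The function name `decision_ratio` and the handling of a tool that creates zero new decisions are assumptions for illustration, not part of the framework.

```python
def decision_ratio(replaced: int, created: int) -> float:
    """Decisions the tool makes for you, divided by new decisions it forces on you."""
    if replaced < 0 or created < 0:
        raise ValueError("decision counts cannot be negative")
    if created == 0:
        # Assumption: a tool that adds no decisions is treated as an unbounded ratio.
        return float("inf")
    return replaced / created


code_completion = decision_ratio(replaced=5, created=2)
meeting_summariser = decision_ratio(replaced=1, created=3)

print(f"code completion:    {code_completion:.2f}")     # 2.50
print(f"meeting summariser: {meeting_summariser:.2f}")  # 0.33
```

Counting the decisions honestly is the hard part. The division is trivial once both lists are written down.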

**Formula (Friction Delta):** time without the tool ÷ time with the tool.

Include onboarding time amortised over your first 10 uses. A tool that saves 30 minutes per use but took 2 hours to learn breaks even at 4 uses. After that, it is pure gain.

**Threshold:** Break-even within 5 uses.

**Catch:** This lens breaks for tools that enable tasks you could not do at all before. A drug discovery simulation has infinite Friction Delta because the alternative is impossible. Score those as "can't evaluate on this lens" and rely on the others.
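A minimal sketch of this lens, assuming times are tracked in minutes. The function names, the choice to add the amortised onboarding to the "with the tool" time, and the 45-minute and 15-minute figures are illustrative assumptions. Only the 2-hour onboarding, the 30 minutes saved, the 10-use amortisation, and the 5-use threshold come from the text.

```python
import math
from typing import Optional


def break_even_uses(onboarding_min: float, saved_per_use_min: float) -> Optional[int]:
    """Number of uses before time saved covers the time spent learning the tool."""
    if saved_per_use_min <= 0:
        return None  # the tool never pays back its onboarding
    return math.ceil(onboarding_min / saved_per_use_min)


def friction_delta(
    without_min: Optional[float],
    with_min: float,
    onboarding_min: float = 0.0,
    amortise_over: int = 10,
) -> Optional[float]:
    """Time without the tool divided by time with it, onboarding spread over early uses."""
    if without_min is None:
        # The task was impossible before: can't evaluate on this lens.
        return None
    return without_min / (with_min + onboarding_min / amortise_over)


uses = break_even_uses(onboarding_min=120, saved_per_use_min=30)
print(uses)       # 4
print(uses <= 5)  # True: breaks even within the 5-use threshold

# Hypothetical task: 45 minutes by hand, 15 minutes with the tool.
print(round(friction_delta(45, 15, onboarding_min=120), 2))  # 1.67
print(friction_delta(None, 15))                              # None
```

Returning `None` for a capability-expanding tool keeps it from being confused with a very large but finite score.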

**Formula (Attention ROI):** output quality ÷ attention consumed.

Estimate cognitive load per use on a simple scale: 1 (fire and forget) to 4 (full attention required). Track whether it goes up or down over 10 uses.

**Threshold:** Attention per use should decrease over time. If you need to watch it more closely after ten uses than after one, something is wrong.
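One way to track this is sketched below. The text does not define a scale for output quality, so the quality number here is an assumption, as are the function names and the logged scores. The trend check compares only the first and latest use, which is the simplest reading of the threshold.

```python
def attention_roi(quality: float, attention: int) -> float:
    """Output quality divided by attention consumed (1 = fire and forget, 4 = full attention)."""
    if attention not in (1, 2, 3, 4):
        raise ValueError("attention must be 1, 2, 3 or 4")
    return quality / attention


def attention_trend(scores: list[int]) -> str:
    """Compare the attention score of the latest use with that of the first use."""
    if len(scores) < 2:
        raise ValueError("need at least two logged uses")
    first, last = scores[0], scores[-1]
    if last < first:
        return "decreasing"
    if last > first:
        return "increasing"  # something is wrong, per the threshold
    return "flat"            # the text does not say how to treat this case


# Hypothetical attention scores logged over ten uses.
log = [4, 4, 3, 3, 3, 2, 2, 2, 2, 1]
print(attention_trend(log))                   # decreasing
print(attention_roi(quality=8, attention=2))  # 4.0
```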

I tested this framework against the hardest cases I could find. It failed in five ways. Knowing them makes it useful:

**Decision quality matters more than quantity.** One high-stakes judgment (should I deploy?) outweighs 10 trivial picks (camelCase or snake_case?). Weight each decision by its stakes before you count it.

**Friction Delta can't measure capability expansion.** If a tool lets you do something new rather than just faster, skip this lens.

**Attention ROI rewards deskilling.** The descending attention threshold is a Goodhart target: it rewards tools that train you to rubber-stamp.

**Erasure cost is invisible.** The framework never asks: if I use this for a year, what can I no longer do without it?

**Error asymmetry is invisible.** Two tools can score identically while producing catastrophically different results when they fail.
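The first failure mode can be partly addressed by summing stake weights instead of counting decisions. The sketch below is one possible adjustment, not part of the original framework. The weights (1 for a trivial pick, 20 for a deploy call) are hypothetical.

```python
def weighted_ratio(replaced: list[float], created: list[float]) -> float:
    """Sum of stake weights for replaced decisions over the sum for created ones."""
    total_created = sum(created)
    if total_created == 0:
        # Assumption: no created decisions is treated as an unbounded ratio.
        return float("inf")
    return sum(replaced) / total_created


trivial_picks = [1.0] * 10  # ten naming-style choices the tool makes for you
deploy_call = [20.0]        # one high-stakes judgment it hands back to you

unweighted = len(trivial_picks) / len(deploy_call)
weighted = weighted_ratio(trivial_picks, deploy_call)

print(unweighted)  # 10.0
print(weighted)    # 0.5
```

The same tool looks excellent by raw count and poor once stakes are included, which is the point of the first failure mode.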

Ask: "If I use this tool for six months and then stop, what skill will I have lost?"

Score it: 1 (nothing lost) to 4 (core competency outsourced). Score 1-2 is safe. Score 3 is a deliberate trade. Score 4 is dependency, not tooling.
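The scoring bands above map directly to a small lookup. The function name `erasure_verdict` is an assumption. The bands and their labels are taken from the text.

```python
def erasure_verdict(score: int) -> str:
    """Map an erasure score (1 = nothing lost, 4 = core competency outsourced) to a verdict."""
    if score not in (1, 2, 3, 4):
        raise ValueError("score must be 1, 2, 3 or 4")
    if score <= 2:
        return "safe"
    if score == 3:
        return "deliberate trade"
    return "dependency, not tooling"


for score in (1, 2, 3, 4):
    print(score, erasure_verdict(score))
```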

*This framework connects to a deeper structural principle: a tool's value is the difficulty it removes. If it creates new difficulty of a different kind, it is not a tool. It is a job.*

*Full framework with diagram:* <https://telegra.ph/The-Decision-Subtraction-Framework-How-to-Evaluate-Any-AI-Tool-05-28>
