Last week someone asked me which AI tools they should be using. The question hides a problem that costs real money: there are more capable AI tools available than any single person can evaluate.
ChatGPT Plus at $20/month. Claude at $20. Grok at $30. Cursor at $20. Copilot at $10. Each with a $100, $200, or $300 variant underneath. Each claims to earn its place.
The real question is not which tool is best. The real question is: which tools subtract more decisions than they add?
Formula: decisions replaced by the tool ÷ decisions it creates.
List every decision the tool makes for you. Then list every new decision it forces you to make. Divide the first by the second.
Thresholds:
Example: A code completion tool that writes a function body (replaces 5 decisions about syntax, structure, naming) but requires review (adds 2 decisions about correctness) has a ratio of 2.5. It passes.
A meeting summariser that replaces 1 decision (should I re-listen?) but creates 3 (verify accuracy, add context, decide what to share) has a ratio of 0.33. It fails.
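The ratio is easy to compute by hand. Writing it down makes the bookkeeping explicit. Below is a minimal Python sketch. The decision labels are hypothetical stand-ins for the 5 and 2 (and 1 and 3) decisions in the two examples. The function computes only the ratio and encodes no pass/fail cutoff.

```python
def decision_ratio(decisions_replaced: list[str], decisions_created: list[str]) -> float:
    """Decisions the tool makes for you divided by the new decisions it forces on you."""
    if not decisions_created:
        # A tool that adds no decisions at all has an unbounded ratio.
        return float("inf")
    return len(decisions_replaced) / len(decisions_created)


# Hypothetical labels for the two examples in the text.
completion_tool = decision_ratio(
    decisions_replaced=["syntax", "structure", "variable names", "function name", "control flow"],
    decisions_created=["is it correct?", "does it handle edge cases?"],
)
meeting_summariser = decision_ratio(
    decisions_replaced=["should I re-listen?"],
    decisions_created=["verify accuracy", "add context", "decide what to share"],
)

print(f"completion tool: {completion_tool:.2f}")        # completion tool: 2.50
print(f"meeting summariser: {meeting_summariser:.2f}")  # meeting summariser: 0.33
```

A ratio above 1 means the tool subtracts more decisions than it adds.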
Friction Delta formula: time without the tool ÷ time with the tool.
Include onboarding time amortised over your first 10 uses. A tool that saves 30 minutes per use but took 2 hours to learn breaks even at 4 uses. After that, it is pure gain.
Threshold: Break-even within 5 uses.
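The break-even rule and the amortised ratio fit in a few lines of Python. This sketch assumes all times are in minutes. The 45- and 15-minute figures are a hypothetical split of the 30-minute saving, and the function names are illustrative.

```python
import math


def break_even_uses(onboarding_min: float, saved_per_use_min: float) -> float:
    """Uses needed before the time saved covers the time spent learning the tool."""
    if saved_per_use_min <= 0:
        return math.inf  # the tool never pays back its onboarding cost
    return math.ceil(onboarding_min / saved_per_use_min)


def amortised_friction_delta(
    minutes_without: float, minutes_with: float, onboarding_min: float, uses: int = 10
) -> float:
    """Time without the tool divided by time with it, with onboarding spread over `uses` uses."""
    return (uses * minutes_without) / (uses * minutes_with + onboarding_min)


# The text's example: 2 hours to learn, 30 minutes saved per use.
uses_needed = break_even_uses(onboarding_min=120, saved_per_use_min=30)
print(uses_needed, "uses to break even; passes:", uses_needed <= 5)  # 4 uses to break even; passes: True

# Hypothetical split of that 30-minute saving: 45 minutes by hand, 15 with the tool.
print(round(amortised_friction_delta(45, 15, 120), 2))  # 1.67
```

With the same numbers, the amortised ratio over exactly 4 uses is 1.0, which is the break-even point seen from the other direction.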
Catch: This lens breaks for tools that enable tasks you could not do at all before. A drug discovery simulation has infinite Friction Delta because the alternative is impossible. Score those as "can't evaluate on this lens" and rely on the others.
Attention ROI formula: output quality ÷ attention consumed.
Estimate cognitive load per use on a simple scale: 1 (fire and forget) to 4 (full attention required). Track whether it goes up or down over 10 uses.
Threshold: Attention per use should decrease over time. If you need to watch it more closely after ten uses than after one, something is wrong.
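Here is a sketch of tracking this lens across ten uses. It assumes you log one attention score (1 to 4) and one quality score per use. The quality scale (1 to 5 here) and both logs are hypothetical. The check flags a tool that needs closer watching at the last use than at the first.

```python
def attention_roi(quality: list[float], attention: list[int]) -> list[float]:
    """Per-use output quality divided by attention consumed (1 = fire and forget, 4 = full attention)."""
    if len(quality) != len(attention):
        raise ValueError("need one quality score and one attention score per use")
    if any(a not in (1, 2, 3, 4) for a in attention):
        raise ValueError("attention scores must be on the 1-4 scale")
    return [q / a for q, a in zip(quality, attention)]


def needs_more_watching(attention: list[int]) -> bool:
    """True if the last use demanded more attention than the first: the failure case."""
    return attention[-1] > attention[0]


# Hypothetical logs for ten uses of two tools.
steady_attention = [4, 4, 3, 3, 3, 2, 2, 2, 2, 1]
steady_quality = [3, 3, 4, 4, 4, 4, 4, 4, 4, 4]
drifting_attention = [2, 2, 2, 3, 3, 3, 3, 4, 4, 4]

rois = attention_roi(steady_quality, steady_attention)
print(rois[0], rois[-1])                        # 0.75 4.0
print(needs_more_watching(steady_attention))    # False
print(needs_more_watching(drifting_attention))  # True
```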
I tested this framework against the hardest cases I could find. It failed in five ways. Knowing them makes it useful:
Decision quality matters more than quantity. One high-stakes judgment (should I deploy?) outweighs 10 trivial picks (camelCase or snake_case?). Weight decisions by their stakes instead of counting them equally.
Friction Delta can't measure capability expansion. If a tool lets you do something new rather than just faster, skip this lens.
Attention ROI rewards deskilling. The descending-attention threshold becomes a Goodhart target: a tool can satisfy it by training you to rubber-stamp its output rather than by genuinely needing less oversight.
Erasure cost is invisible. The framework never asks: if I use this for a year, what can I no longer do without it?
Error asymmetry is invisible. Two tools can score identically while producing catastrophically different results when they fail.
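The first failure, counting decisions instead of weighing them, is easy to see numerically. This Python sketch assumes each decision carries a stakes weight. The weights are hypothetical: 1 for a trivial pick and 15 for a deploy decision. They only need to reflect that the deploy call outweighs ten trivial ones.

```python
def weighted_decision_ratio(replaced: dict[str, float], created: dict[str, float]) -> float:
    """Stakes-weighted decisions replaced divided by stakes-weighted decisions created."""
    created_weight = sum(created.values())
    if created_weight == 0:
        return float("inf")  # the tool adds no weighted burden at all
    return sum(replaced.values()) / created_weight


# Hypothetical tool: removes ten naming picks, adds one deploy judgment.
trivial = {f"naming pick {i}": 1.0 for i in range(10)}
added = {"should I deploy this?": 15.0}

raw = len(trivial) / len(added)
weighted = weighted_decision_ratio(trivial, added)
print(raw, round(weighted, 2))  # 10.0 0.67
```

Counted raw, this tool scores 10.0. Weighted, it drops below 1, because it adds more decision burden than it removes.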
To make erasure cost visible, ask: "If I use this tool for six months and then stop, what skill will I have lost?" Score it from 1 (nothing lost) to 4 (core competency outsourced). A score of 1 or 2 is safe. A 3 is a deliberate trade. A 4 is dependency, not tooling.
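The erasure score maps directly onto three verdicts. Here is a minimal sketch of that mapping; the function name is hypothetical.

```python
def classify_erasure(score: int) -> str:
    """Map the 1-4 erasure score (skill lost after six months of use, then stopping) to a verdict."""
    if score in (1, 2):
        return "safe"
    if score == 3:
        return "deliberate trade"
    if score == 4:
        return "dependency, not tooling"
    raise ValueError("erasure score must be 1, 2, 3 or 4")


for s in range(1, 5):
    print(s, classify_erasure(s))
```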
This framework connects to a deeper structural principle: a tool's value is the difficulty it removes. If it creates new difficulty of a different kind, it is not a tool. It is a job.
Full framework with diagram: https://telegra.ph/The-Decision-Subtraction-Framework-How-to-Evaluate-Any-AI-Tool-05-28