Source: marginalrevolution.com · Topic: artificial-intelligence

How well does current AI find errors in economics papers?

A new study by Alexis Akira Toda tested four AI models (Gemini, Refine, Claude, and ChatGPT) on their ability to detect errors in four published economic theory papers. ChatGPT Pro performed best, but no model located a true error without substantial human guidance. The findings suggest that a competent human paired with a frontier model can outperform current peer review, while AI alone cannot yet refute economic theory.

Published Jun 9, 2026 · 1 min read

Can artificial intelligence (AI) refute economic theory? I document experiments in which I asked several AI models (Gemini, Refine, Claude, and ChatGPT) to check the correctness of four published papers in economic theory, each containing an error that I helped identify or correct. ChatGPT Pro performed best, occasionally constructing counterexamples and corrected proofs, while other models fared worse. However, no model located a true error without substantial human guidance, and data contamination complicates interpretation. I argue that a competent human paired with a frontier model can outperform current peer review, but AI cannot yet refute economic theory on its own.

That is from a new piece by Alexis Akira Toda.
