Source: letsdatascience.com

Critics Highlight AI Failures on Simple Tasks

A peer-reviewed PNAS Nexus study found that leading large language models, including GPT-4o, Claude 3.5 Sonnet, GPT-5, Claude Opus 4.1, and Gemini 2.5, fail catastrophically on simple cognitive tasks like the color Stroop test, with accuracy dropping to near-zero under extended cognitive load. Critics cite these failures to challenge claims that artificial general intelligence has been achieved.

Published Jun 27, 2026 · 1 min read

A peer-reviewed PNAS Nexus study (Patel, Wang, and Fan, CUNY) documents a structural gap in the transformer architecture that practitioners should understand: LLMs lack the hard top-down inhibitory mechanism needed to suppress strongly trained priors under extended cognitive load. The study measured executive control with the color Stroop task, in which the subject names the ink color of a word whose text names a different color. GPT-4o held 91 percent accuracy at five incongruent words, then collapsed to near zero by 40 words (approximately 1 percent according to researcher quotes in PsyPost, or 15 percent in the pure-incongruent condition according to Neuroscience News). Claude 3.5 Sonnet dropped to roughly 10 to 24 percent at 40 words, depending on condition. The same catastrophic failure replicated on the frontier models GPT-5, Claude Opus 4.1, and Gemini 2.5. Separately, a viral "carwash prompt," in which ChatGPT gives opposite walk-or-drive answers to near-identical questions about a 100-meter trip, illustrates the same surface phenomenon informally. A WND/RealClearWire opinion piece by Ross Pomeroy used these examples to dispute Marc Andreessen's claim that AGI is already here.
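
For readers who want to probe this kind of length-dependent collapse themselves, the following is a minimal sketch of a text-only Stroop harness. It is not the study's protocol: the color set, the prompt wording, and the helper names (make_trial, render_prompt, score) are all assumptions made for illustration, and the original experiments may have presented stimuli differently. The sketch builds a list of incongruent word/ink pairs of a chosen length, renders them as a prompt, and scores a list of responses against the ink colors.

```python
import random

# Hypothetical color set; the study's actual stimuli are not specified in the article.
COLORS = ["red", "green", "blue", "yellow"]


def make_trial(n_words, incongruent=True, seed=0):
    """Build n_words (word, ink) pairs and the list of correct answers (the inks)."""
    rng = random.Random(seed)
    items = []
    for _ in range(n_words):
        ink = rng.choice(COLORS)
        if incongruent:
            # The word must name a color different from the ink.
            word = rng.choice([c for c in COLORS if c != ink])
        else:
            word = ink
        items.append((word, ink))
    answers = [ink for _, ink in items]
    return items, answers


def render_prompt(items):
    """Describe each stimulus in text (assumed wording, not the study's prompt)."""
    lines = [
        f'{i + 1}. the word "{word.upper()}" printed in {ink} ink'
        for i, (word, ink) in enumerate(items)
    ]
    return "Name the ink color of each item, one per line:\n" + "\n".join(lines)


def score(responses, answers):
    """Fraction of responses that match the ink color, ignoring case and spacing."""
    correct = sum(r.strip().lower() == a for r, a in zip(responses, answers))
    return correct / len(answers)


if __name__ == "__main__":
    items, answers = make_trial(40)
    print(render_prompt(items)[:200])
    # A responder that reads the word instead of the ink fails every incongruent item.
    word_reader = [word for word, _ in items]
    print("word-reading baseline accuracy:", score(word_reader, answers))
```

To use it against a real model, send render_prompt(items) to the model, split its reply into lines, and pass them to score. Repeating this for lengths such as 5, 10, 20, and 40 gives an accuracy-versus-length curve that can be compared with the figures reported above.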
