{"slug": "critics-highlight-ai-failures-on-simple-tasks", "title": "Critics Highlight AI Failures on Simple Tasks", "summary": "A peer-reviewed PNAS Nexus study found that leading large language models, including GPT-4o, Claude 3.5 Sonnet, GPT-5, Claude Opus 4.1, and Gemini 2.5, fail catastrophically on simple cognitive tasks like the color Stroop test, with accuracy dropping to near-zero under extended cognitive load. Critics cite these failures to challenge claims that artificial general intelligence has been achieved.", "body_md": "A peer-reviewed PNAS Nexus study (Patel, Wang, and Fan, CUNY) documents a structural gap in transformer architecture that practitioners should understand: LLMs lack the hard top-down inhibitory mechanism needed to suppress strongly trained priors under extended cognitive load. The study measured executive control with the color Stroop task: naming the ink color of a word when the word's text conflicts with that color. GPT-4o held 91 percent accuracy at five incongruent words, then collapsed to near zero by 40 words (approximately 1 percent according to researcher quotes given to PsyPost, or 15 percent in the pure-incongruent condition according to Neuroscience News); Claude 3.5 Sonnet dropped to roughly 10-24 percent at 40 words, depending on condition. The same catastrophic failure was replicated on the frontier models GPT-5, Claude Opus 4.1, and Gemini 2.5. Separately, a viral \"carwash prompt\", in which ChatGPT gives opposite walk-or-drive answers to near-identical questions about a 100-meter trip, informally illustrates the same surface phenomenon.
A WND/RealClearWire opinion piece by Ross Pomeroy used these examples to dispute Marc Andreessen's claim that AGI is already here.", "url": "https://wpnews.pro/news/critics-highlight-ai-failures-on-simple-tasks", "canonical_source": "https://letsdatascience.com/news/critics-highlight-ai-failures-on-simple-tasks-c7d60ac9", "published_at": "2026-06-27 18:12:30+00:00", "updated_at": "2026-06-27 19:38:36.178092+00:00", "lang": "en", "topics": ["large-language-models", "artificial-intelligence", "ai-research", "ai-safety"], "entities": ["GPT-4o", "Claude 3.5 Sonnet", "GPT-5", "Claude Opus 4.1", "Gemini 2.5", "CUNY", "Marc Andreessen", "Ross Pomeroy"], "alternates": {"html": "https://wpnews.pro/news/critics-highlight-ai-failures-on-simple-tasks", "markdown": "https://wpnews.pro/news/critics-highlight-ai-failures-on-simple-tasks.md", "text": "https://wpnews.pro/news/critics-highlight-ai-failures-on-simple-tasks.txt", "jsonld": "https://wpnews.pro/news/critics-highlight-ai-failures-on-simple-tasks.jsonld"}}