# Critics Highlight AI Failures on Simple Tasks

> Source: <https://letsdatascience.com/news/critics-highlight-ai-failures-on-simple-tasks-c7d60ac9>
> Published: 2026-06-27 18:12:30+00:00

A peer-reviewed study in PNAS Nexus (Patel, Wang, and Fan, CUNY) documents a structural gap in transformer architecture that practitioners should understand: LLMs lack the hard, top-down inhibitory mechanism needed to suppress strongly trained priors under extended cognitive load. To measure executive control, the study used the color Stroop task, in which the subject must name a word's ink color while the word itself spells a different color.

GPT-4o held 91 percent accuracy at five incongruent words, then collapsed by 40 words: to approximately 1 percent according to the researchers' quotes to PsyPost, or to 15 percent in the pure-incongruent condition according to Neuroscience News. Claude 3.5 Sonnet dropped to roughly 10 to 24 percent at 40 words, depending on condition. The same catastrophic failure was replicated on the frontier models GPT-5, Claude Opus 4.1, and Gemini 2.5.

A separate viral "carwash prompt", in which ChatGPT gives opposite walk-or-drive answers to near-identical questions about a 100-meter trip, informally illustrates the same surface phenomenon. A WND/RealClearWire opinion piece by Ross Pomeroy used these examples to dispute Marc Andreessen's claim that AGI is already here.
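To make the incongruent Stroop condition concrete, here is a minimal sketch of how such trials might be generated and scored. It does not reproduce the study's stimuli, prompt format, or colour set. The four colours, the `StroopTrial` type, and the `make_incongruent_trials` and `score` helpers are all hypothetical. The sketch shows one key property: every trial pits the printed word against a different ink colour, so an answer that follows the trained prior (reading the word) is always wrong. Scoring therefore tracks both accuracy and the rate of such word-reading errors.

```python
import random
from dataclasses import dataclass

# Hypothetical colour set; the study's actual stimuli may differ.
COLORS = ["red", "green", "blue", "yellow"]


@dataclass(frozen=True)
class StroopTrial:
    word: str  # the colour word as printed (the strongly trained prior)
    ink: str   # the ink colour the subject is asked to name


def make_incongruent_trials(n: int, seed: int = 0) -> list[StroopTrial]:
    """Build n trials where the printed word never matches its ink colour."""
    rng = random.Random(seed)  # seeded so a given list length is reproducible
    trials = []
    for _ in range(n):
        word = rng.choice(COLORS)
        ink = rng.choice([c for c in COLORS if c != word])
        trials.append(StroopTrial(word, ink))
    return trials


def score(trials: list[StroopTrial], answers: list[str]) -> tuple[float, float]:
    """Return (accuracy, word-reading error rate) for one list of trials.

    An answer is correct only if it names the ink colour. An answer that
    names the printed word instead is counted as an interference error.
    """
    if len(trials) != len(answers) or not trials:
        raise ValueError("need one answer per trial, and at least one trial")
    correct = word_errors = 0
    for trial, answer in zip(trials, answers):
        a = answer.strip().lower()
        if a == trial.ink:
            correct += 1
        elif a == trial.word:
            word_errors += 1
    n = len(trials)
    return correct / n, word_errors / n
```

For example, a responder that always reads the word scores 0.0 accuracy and a 1.0 word-reading error rate on any incongruent list. That is the failure pattern the study reports as list length grows from 5 to 40 words.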
