A peer-reviewed PNAS Nexus study (Patel, Wang, and Fan, CUNY) documents a structural gap in transformer architecture that practitioners should understand: LLMs lack the hard top-down inhibitory mechanism needed to suppress strongly trained priors under extended cognitive load. The study used the color Stroop task - naming ink color when word text conflicts with that color - to measure executive control. GPT-4o held 91 percent accuracy at five incongruent words, then collapsed to near-zero (approximately 1 percent per researcher quotes to PsyPost, or 15 percent in the pure-incongruent condition per Neuroscience News) by 40 words; Claude 3.5 Sonnet dropped to roughly 10-24 percent at 40 words depending on condition. The same catastrophic failure replicated on frontier models GPT-5, Claude Opus 4.1, and Gemini 2.5. A separately viral "carwash prompt" - where ChatGPT gives opposite walk-or-drive answers to near-identical questions about a 100-meter trip - illustrates the same surface phenomenon informally. A WND/RealClearWire opinion piece by Ross Pomeroy used these examples to dispute Marc Andreessen's claim that AGI is already here.
The AI architecture "attention" can't hold attention