Critics Highlight AI Failures on Simple Tasks

wpnews.pro

[ARTICLE · art-42012] src=letsdatascience.com ↗ pub=2026-06-27T18:12Z topic=large-language-models verified=true sentiment=↓ negative

A peer-reviewed PNAS Nexus study found that leading large language models, including GPT-4o, Claude 3.5 Sonnet, GPT-5, Claude Opus 4.1, and Gemini 2.5, fail catastrophically on simple cognitive tasks like the color Stroop test, with accuracy dropping to near-zero under extended cognitive load. Critics cite these failures to challenge claims that artificial general intelligence has been achieved.

Published Jun 27, 2026 · 1 min read

A peer-reviewed PNAS Nexus study (Patel, Wang, and Fan, CUNY) documents a structural gap in the transformer architecture that practitioners should understand: LLMs lack the hard top-down inhibitory mechanism needed to suppress strongly trained priors under extended cognitive load. The study measured executive control with the color Stroop task, in which the subject must name the ink color of a word whose text names a different color. GPT-4o held 91 percent accuracy at five incongruent words, then collapsed to near-zero by 40 words (approximately 1 percent per researcher quotes to PsyPost, or 15 percent in the pure-incongruent condition per Neuroscience News); Claude 3.5 Sonnet dropped to roughly 10-24 percent at 40 words, depending on condition. The same catastrophic failure replicated on the frontier models GPT-5, Claude Opus 4.1, and Gemini 2.5. Separately, a viral "carwash prompt", in which ChatGPT gives opposite walk-or-drive answers to near-identical questions about a 100-meter trip, informally illustrates the same surface phenomenon. A WND/RealClearWire opinion piece by Ross Pomeroy used these examples to dispute Marc Andreessen's claim that AGI is already here.
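To make the task concrete, here is a minimal Python sketch of how an incongruent Stroop sequence of a given length might be generated and scored. It is not the study's protocol: the article does not describe the stimuli, prompt format, or scoring, so the color set, the (word, ink) pair representation, and every function name below are assumptions made for illustration. The two "responders" are toy stand-ins, not models.

```python
import random

# Assumed color set; the study's actual stimuli are not given in the article.
COLORS = ["red", "green", "blue", "yellow"]


def make_incongruent_trials(n, seed=0):
    """Return n (word, ink) pairs in which the word never names its ink color."""
    rng = random.Random(seed)
    trials = []
    for _ in range(n):
        word = rng.choice(COLORS)
        ink = rng.choice([c for c in COLORS if c != word])
        trials.append((word, ink))
    return trials


def score(trials, answers):
    """Fraction of trials answered with the ink color; missing answers count as wrong."""
    if not trials:
        return 0.0
    correct = sum(
        1 for (_, ink), answer in zip(trials, answers)
        if answer.strip().lower() == ink
    )
    return correct / len(trials)


def ink_namer(trials):
    """Toy responder that always performs the task correctly."""
    return [ink for _, ink in trials]


def word_reader(trials):
    """Toy responder that follows the stronger prior and reads the word instead."""
    return [word for word, _ in trials]


if __name__ == "__main__":
    # Sequence lengths taken from the article: 5 and 40 incongruent words.
    for n in (5, 40):
        trials = make_incongruent_trials(n)
        print(n, score(trials, ink_namer(trials)), score(trials, word_reader(trials)))
```

Because every trial is incongruent, a responder that simply reads the word scores zero at any length. The study's reported pattern is different: accuracy that starts high and degrades as the sequence grows. In a harness like this, that would show up as `score` falling between the two extremes as `n` increases.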

Read original on letsdatascience.com → letsdatascience.com/news/critics-highlight-ai-fa…
