Why AI can solve hard math problems but can't count


Source: theargumentmag.com · Published: 2026-06-03T10:00Z · Topic: large-language-models

Large language models continue to fail at simple letter-counting tasks, such as identifying the number of R's in "strawberry" or P's in "Google," despite making rapid advances in complex scientific reasoning and agentic coding. The persistent failure, which has become a canonical example of AI limitations since 2024, challenges the common explanation that tokenization—the process of breaking words into subword units—is the root cause. This discrepancy raises questions about fundamental gaps in how these models process language, even as they are increasingly deployed in critical business sectors like finance.

2 min read · Published Jun 3, 2026

AI keeps whiffing on this simple question

There are no P’s in the word “Google,” but someone still needs to tell Gemini this — Google’s own AI Overview keeps suggesting there are P’s when asked.

Back in 2024, asking a model to count the R’s in “strawberry” became the internet’s canonical example of weird AI failure modes. Most models of that generation counted letters wrong about half of the time.

Since then, LLMs have made rapid leaps in everything from scientific reasoning to agentic coding, while major businesses have begun to rely on them in critical areas like finance. And yet, somehow, they still stumble on tasks like counting the R’s in “strawberry,” the P’s in “Google,” or the N’s in the days of the week, which I asked ChatGPT to do today.

How is it that models can now solve historic math problems but still fail to count letters in a word?

The typical explanation here has to do with “tokenization.” LLMs don’t read English characters; they break words into subwords, like “st-raw-berry.” Each segment of the word is a token, and tokens are used because they are the most efficient unit of language to compute over (compared to the vast library of whole words). Under this theory, the counting issue has to do with how familiar an LLM is with a given token from its training data.

That explanation never quite made sense to me. “Strawberry” is a very common word. Even if it were tokenized several different ways within the training data (“st-raw-berry” and “straw-be-rry”), its tokens would still be quite familiar.
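To make the tokenization account concrete, here is a minimal Python sketch of the two segmentations mentioned above. The vocabulary and the integer IDs are invented for illustration and do not correspond to any real model's tokenizer; the only point is that a model receives opaque IDs rather than letters, while the letter count itself is identical no matter how the word is split.

```python
# Toy illustration only: TOY_VOCAB and its IDs are hypothetical and do not
# match any real model's tokenizer.
TOY_VOCAB = {"st": 101, "raw": 102, "berry": 103,
             "straw": 104, "be": 105, "rry": 106}


def encode(pieces):
    """Map subword pieces to the integer IDs a model would actually receive."""
    return [TOY_VOCAB[piece] for piece in pieces]


def count_letter(text, letter):
    """Count a letter the way a person would: character by character."""
    return text.lower().count(letter.lower())


# The two segmentations of "strawberry" quoted in the article.
split_a = ["st", "raw", "berry"]
split_b = ["straw", "be", "rry"]

# What the model sees: two different ID sequences for the same word.
ids_a = encode(split_a)
ids_b = encode(split_b)

# What the question asks about: letters, which sum to the same total
# regardless of where the word was cut.
r_count_a = sum(count_letter(piece, "r") for piece in split_a)
r_count_b = sum(count_letter(piece, "r") for piece in split_b)

if __name__ == "__main__":
    print("IDs, split A:", ids_a)
    print("IDs, split B:", ids_b)
    print("R's, split A:", r_count_a)
    print("R's, split B:", r_count_b)
    print("P's in Google:", count_letter("Google", "p"))
```

Counting characters is trivial once the letters are visible. Under the tokenization theory, the difficulty is that the spelling behind each ID has to be recalled from training rather than read off the input.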



