{"slug": "why-ai-can-solve-hard-math-problems-but-can-t-count", "title": "Why AI can solve hard math problems but can't count", "summary": "Large language models continue to fail at simple letter-counting tasks, such as identifying the number of R's in \"strawberry\" or P's in \"Google,\" despite making rapid advances in complex scientific reasoning and agentic coding. The persistent failure, which has become a canonical example of AI limitations since 2024, challenges the common explanation that tokenization—the process of breaking words into subword units—is the root cause. This discrepancy raises questions about fundamental gaps in how these models process language, even as they are increasingly deployed in critical business sectors like finance.", "body_md": "# Why AI can solve hard math problems but can't count\n\n### AI keeps whiffing on this simple question\n\nThere are no P’s in the word “Google,” but someone still needs to tell Gemini this — Google’s own AI Overview keeps suggesting there are P’s when asked.\n\nBack in 2024, asking a model to count the R’s in “strawberry” became the internet’s [canonical](https://www.reddit.com/r/singularity/comments/1enqk04/how_many_rs_in_strawberry_why_is_this_a_very/) [example](https://medium.com/@danisaysskol/breaking-down-llm-thought-process-the-strawberry-question-bdc564cc77a4) of weird AI failure modes. Most models of that generation counted letters [wrong](https://arxiv.org/pdf/2412.18626) about half of the time.\n\nSince then, LLMs [have made](https://metr.org/time-horizons/) rapid [leaps](https://artificialanalysis.ai/articles/claude-opus-4-8-analysis-and-benchmarks) in everything from scientific reasoning to agentic coding, while major businesses have begun to rely on them in critical areas like [finance](https://www.cfo.com/news/inside-anthropic-claude-rapid-expansion-across-corporate-finance-cfo-/820806/). 
And yet, somehow, they still stumble on tasks like counting the R’s in “strawberry,” the P’s in “Google,” or the N’s in the days of the week, which I asked ChatGPT to do today.\n\nHow is it that models can now solve [historic math problems](https://www.understandingai.org/p/openais-milestone-math-breakthrough) but still fail to count letters in a word?\n\nThe [typical](https://www.runpod.io/blog/llm-tokenization-limitations) explanation here has to do with “[tokenization](https://huggingface.co/docs/transformers/en/main_classes/tokenizer).” LLMs don’t read English characters; they break words into subwords, like “st - raw - berry.” Each segment is a token, a unit models use because a compact subword vocabulary is more efficient to compute over than a vast library of whole words. Under this theory, the counting problem comes down to how familiar an LLM is with a given token from its training data.\n\nThat explanation never quite made sense to me. “Strawberry” is a very common word. Even if it were tokenized several different ways within the training data (“st - raw - berry” and “straw - be - rry”), its tokens would still be quite familiar.", "url": "https://wpnews.pro/news/why-ai-can-solve-hard-math-problems-but-can-t-count", "canonical_source": "https://www.theargumentmag.com/p/why-ai-can-solve-hard-math-problems", "published_at": "2026-06-03 10:00:49+00:00", "updated_at": "2026-06-03 10:38:01.558454+00:00", "lang": "en", "topics": ["large-language-models", "artificial-intelligence", "ai-research", "ai-products", "ai-tools"], "entities": ["Gemini", "ChatGPT", "Google", "Claude"], "alternates": {"html": "https://wpnews.pro/news/why-ai-can-solve-hard-math-problems-but-can-t-count", "markdown": "https://wpnews.pro/news/why-ai-can-solve-hard-math-problems-but-can-t-count.md", "text": 
"https://wpnews.pro/news/why-ai-can-solve-hard-math-problems-but-can-t-count.txt", "jsonld": "https://wpnews.pro/news/why-ai-can-solve-hard-math-problems-but-can-t-count.jsonld"}}