# One Ruler to Measure Them All: How Language Affects LLM Quality

> Source: <https://dev.to/__2ddbae6bb7d/one-ruler-to-measure-them-all-how-language-affects-llm-quality-5f54>
> Published: 2026-05-29 06:01:46+00:00

Most discussions about LLM performance focus on the model architecture and prompting. But there's a hidden factor: the tokenizer. It determines how much of your text fits in the context window.
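To make the context-window point concrete, here is a back-of-the-envelope sketch. It assumes a single average "fertility" (tokens per word) for each language. That is a simplification, since real ratios vary by tokenizer, domain and text. The function `words_that_fit` and the fertility numbers in the usage loop are hypothetical placeholders, not measured values.

```python
import math


def words_that_fit(context_tokens: int, tokens_per_word: float, reserved_tokens: int = 0) -> int:
    """Estimate how many words of a language fit into a context window.

    tokens_per_word ("fertility") is an assumed average for one tokenizer
    and one language; measure it on your own data before relying on it.
    reserved_tokens covers the system prompt, instructions and the answer.
    """
    if tokens_per_word <= 0:
        raise ValueError("tokens_per_word must be positive")
    # Tokens left for the actual document after the reserved budget.
    available = max(context_tokens - reserved_tokens, 0)
    return math.floor(available / tokens_per_word)


# Hypothetical fertility values for illustration only, not measurements.
budget = 128_000
for lang, fertility in {"lang_a": 1.3, "lang_b": 2.0}.items():
    print(lang, words_that_fit(budget, fertility, reserved_tokens=4_000))
```

With the same window, a language whose tokenizer needs more tokens per word fits proportionally fewer words, and the same document sits deeper in the context.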

Russian text typically consumes more tokens than English to express the same content. Some developers even switch to English prompts to save tokens and improve performance.
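One mechanical source of this overhead, assuming a byte-level BPE tokenizer like those used by GPT-2-style models: text is first converted to UTF-8 bytes, and each Cyrillic letter takes two bytes while each basic Latin letter takes one. Learned merges can pack several bytes into a single token, so final token counts depend on how much Russian the tokenizer saw during training. This sketch measures only the raw byte cost; it is not a tokenizer.

```python
def utf8_bytes_per_char(text: str) -> float:
    """Average number of UTF-8 bytes per character in `text`."""
    if not text:
        return 0.0
    return len(text.encode("utf-8")) / len(text)


english = "hello"
russian = "привет"  # "hello" in Russian

print(len(english.encode("utf-8")), utf8_bytes_per_char(english))  # 5 1.0
print(len(russian.encode("utf-8")), utf8_bytes_per_char(russian))  # 12 2.0
```

The same six-letter greeting starts out at twice the bytes per character, so the tokenizer's merges have more ground to cover before Russian reaches English-level compression.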

A recent arXiv study benchmarked multilingual long-context language models across many languages. The winner? **Polish**, at 88% accuracy.

Despite its token overhead, **Russian placed 5th at 84%**, just ahead of English at 83.9%.

The gap between languages widens as the context grows longer: more tokens mean more opportunities for the model to lose coherence.

The test used models that are "weaker" by 2026 standards.

Top-tier models might show different patterns, but the tokenizer effect persists regardless of model quality.

I'm monitoring whether newer models (Kimi k2.5, GLM-5, GPT-5.2 series) show the same pattern. Early signs suggest top-tier models compress better across languages, but the gap doesn't fully disappear.

*More multilingual LLM analysis and production AI notes from inside a bank — follow my Telegram channel:*

**<https://t.me/ai_tablet>** (Russian, technical)
