04:00
2026-06-16
arxiv.org
large-language-models
Equity with Efficiency: An Empirical Study of Tokenizers for Multilingual Large Language Models
A new empirical study systematically compares tokenizers for multilingual large language models across 11 Southeast Asian languages, finding that Parity-aware BPE achieves the best balance of compress…