04:00
2026-06-24
arxiv.org
natural-language-processing
QuechuaTok: Morphological Boundary Accuracy as a Necessary Metric for Tokenizer Evaluation in Agglutinative Low-Resource Languages
Researchers introduced QuechuaTok, a benchmark evaluating tokenization strategies for Southern Quechua, a low-resource agglutinative language. They found that BPE achieved the lowest fertility rate bu…