{"slug": "neural-machine-translation-for-low-resource-tangkhul-english", "title": "Neural Machine Translation for Low-Resource Tangkhul-English", "summary": "Researchers present a low-resource machine translation system for the Tangkhul-English language pair, achieving a BLEU score of 39.97 using a ByT5-large model fine-tuned on 38,336 parallel sentences. The study highlights orthographic challenges and domain bias in the training corpus, which consists of biblical, story, and conversational data.", "body_md": "arXiv:2606.25365v1 Announce Type: new\nAbstract: We present a study on low-resource machine translation for the Tangkhul-English (nmf-en) language pair. Tangkhul is a severely under-resourced Tibeto-Burman language spoken primarily in Manipur, India, with virtually no prior natural language processing infrastructure. We describe two systems: (1) a primary system based on ByT5-large fine-tuned on 38,336 Tangkhul-English parallel sentence pairs, and (2) a contrastive system based on mT5-small fine-tuned on the same corpus. Our primary ByT5-large system achieves a corpus BLEU score of 39.97, chrF++ of 58.07, BERTScore F1 of 0.8104, and COMET (wmt22-comet-da) of 0.7302 on a held-out test set of 3,856 sentences. We further discuss the orthographic challenges specific to Tangkhul's Latin-script diacritics, the domain bias of our training corpus (which comprises biblical text, stories, and conversational data), and avenues for future improvement through data diversification and domain adaptation.", "url": "https://wpnews.pro/news/neural-machine-translation-for-low-resource-tangkhul-english", "canonical_source": "https://arxiv.org/abs/2606.25365", "published_at": "2026-06-25 04:00:00+00:00", "updated_at": "2026-06-25 04:16:04.131200+00:00", "lang": "en", "topics": ["natural-language-processing", "machine-learning", "large-language-models"], "entities": ["Tangkhul", "English", "ByT5-large", "mT5-small", "Manipur", "India", "Tibeto-Burman"], "alternates": {"html": "https://wpnews.pro/news/neural-machine-translation-for-low-resource-tangkhul-english", "markdown": "https://wpnews.pro/news/neural-machine-translation-for-low-resource-tangkhul-english.md", "text": "https://wpnews.pro/news/neural-machine-translation-for-low-resource-tangkhul-english.txt", "jsonld": "https://wpnews.pro/news/neural-machine-translation-for-low-resource-tangkhul-english.jsonld"}}