
Neural Machine Translation for Low-Resource Tangkhul-English

Researchers present a low-resource machine translation system for the Tangkhul-English language pair. A ByT5-large model fine-tuned on 38,336 parallel sentence pairs achieves a corpus BLEU score of 39.97 on a held-out test set of 3,856 sentences. The study highlights orthographic challenges and the domain bias of the training corpus, which consists of biblical text, stories, and conversational data.

Published Jun 25, 2026

arXiv:2606.25365v1

Abstract: We present a study on low-resource machine translation for the Tangkhul-English (nmf-en) language pair. Tangkhul is a severely under-resourced Tibeto-Burman language spoken primarily in Manipur, India, with virtually no prior natural language processing infrastructure. We describe two systems: (1) a primary system based on ByT5-large fine-tuned on 38,336 Tangkhul-English parallel sentence pairs, and (2) a contrastive system based on mT5-small fine-tuned on the same corpus. Our primary ByT5-large system achieves a corpus BLEU score of 39.97, chrF++ of 58.07, BERTScore F1 of 0.8104, and COMET (wmt22-comet-da) of 0.7302 on a held-out test set of 3,856 sentences. We further discuss the orthographic challenges specific to Tangkhul's Latin-script diacritics, the domain bias of our training corpus (which comprises biblical text, stories, and conversational data), and avenues for future improvement through data diversification and domain adaptation.
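
To make the orthographic point concrete, here is a minimal sketch of how a byte-level model such as ByT5 would see a Latin-script string that carries a diacritic. It is an illustration under stated assumptions, not the authors' preprocessing: the sample string is an arbitrary placeholder rather than a real Tangkhul word, `byte_ids` is a hypothetical helper, and the offset of 3 mirrors the special ids that the Hugging Face ByT5 tokenizer reserves, as far as we are aware. The abstract does not say which Unicode normalization, if any, the corpus uses.

```python
import unicodedata


def byte_ids(text: str, offset: int = 3) -> list[int]:
    """Hypothetical helper: one id per UTF-8 byte, shifted by `offset`.

    The default offset of 3 is an assumption modelled on ByT5-style
    tokenizers, which reserve the lowest ids for special tokens.
    """
    return [b + offset for b in text.encode("utf-8")]


# Placeholder string (NOT a real Tangkhul word): "kh" + "a" + combining macron.
sample = "kha\u0304"

# The same visible text can be stored composed (NFC) or decomposed (NFD).
nfc = unicodedata.normalize("NFC", sample)
nfd = unicodedata.normalize("NFD", sample)

for name, form in (("NFC", nfc), ("NFD", nfd)):
    ids = byte_ids(form)
    print(f"{name}: {len(form)} code points, {len(ids)} byte ids -> {ids}")
```

The two forms render identically but produce different byte sequences, so a corpus that mixes them would present a byte-level model with two spellings of the same word. Normalizing every source sentence to a single form before fine-tuning is one plausible mitigation; whether the paper does so is not stated in the abstract.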
