{"slug": "afrisud-a-dependency-treebank-collection-for-evaluating-models-on-african", "title": "AfriSUD: A Dependency Treebank Collection for Evaluating Models on African Languages", "summary": "Researchers introduced AfriSUD, the first large-scale collection of syntactically annotated treebanks for nine African languages, using the Surface-Syntactic Universal Dependencies framework to capture typological features like agglutination and tone. The community-led effort provides native-speaker-verified data to address the underrepresentation of African languages in NLP resources. Evaluations of multiple models for part-of-speech tagging and dependency parsing revealed a significant syntax gap, suggesting that current architectures may not fully capture the structural diversity of African-language syntax.", "body_md": "arXiv:2606.12708v1 Announce Type: new\nAbstract: Despite their linguistic diversity and global significance, African languages remain underrepresented in research and resources to support NLP. We aim to bridge this gap by introducing AfriSUD, the first large-scale collection of syntactically annotated treebanks for nine diverse African languages spanning major language families and regions across Sub-Saharan Africa. Using the Surface-Syntactic Universal Dependencies (SUD) framework, our community-led effort provides high-quality, native-speaker-verified data that capture key typological features such as agglutination and tone. We evaluate a range of models on AfriSUD for part-of-speech tagging and dependency parsing, including non-transformer baselines, multilingual pretrained encoders, and LLMs. 
Our results reveal a significant syntax gap, where models still show clear limitations across the nine languages, suggesting that existing architectures may not fully capture the structural diversity of African-language syntax.", "url": "https://wpnews.pro/news/afrisud-a-dependency-treebank-collection-for-evaluating-models-on-african", "canonical_source": "https://arxiv.org/abs/2606.12708", "published_at": "2026-06-12 04:00:00+00:00", "updated_at": "2026-06-12 04:55:15.767723+00:00", "lang": "en", "topics": ["natural-language-processing", "machine-learning", "large-language-models", "ai-research"], "entities": ["AfriSUD", "Surface-Syntactic Universal Dependencies", "SUD"], "alternates": {"html": "https://wpnews.pro/news/afrisud-a-dependency-treebank-collection-for-evaluating-models-on-african", "markdown": "https://wpnews.pro/news/afrisud-a-dependency-treebank-collection-for-evaluating-models-on-african.md", "text": "https://wpnews.pro/news/afrisud-a-dependency-treebank-collection-for-evaluating-models-on-african.txt", "jsonld": "https://wpnews.pro/news/afrisud-a-dependency-treebank-collection-for-evaluating-models-on-african.jsonld"}}