{"slug": "bioelx-cross-lingual-biomedical-entity-linking-via-alias-based-retrieval-and-llm", "title": "BioELX: Cross-lingual Biomedical Entity Linking via Alias-based Retrieval and LLM Ranking", "summary": "Researchers have developed BioELX, a two-stage cross-lingual biomedical entity linking framework that requires no task-specific annotated training data. The system improves candidate retrieval by enriching SapBERT training with multilingual aliases from Wikidata and performs context-aware disambiguation using a pre-trained LLM ranker. BioELX achieved state-of-the-art performance across five benchmarks, with Recall@1 gains on low-resource languages of +30.8 on Thai and +22.1 on Korean.", "body_md": "arXiv:2605.27380v1 Announce Type: new\nAbstract: Cross-lingual biomedical entity linking (BEL) maps mentions in any language to unique identifiers in a biomedical knowledge base (KB), supporting clinical and biomedical NLP applications. However, expert-annotated training data for BEL are costly, especially for low-resource languages. Moreover, many cross-lingual BEL systems rely on SapBERT-based retrievers trained on predominantly English aliases in the KB, leading to poor generalization to unseen non-English mentions and limited context-aware disambiguation. We propose BioELX, a two-stage cross-lingual BEL framework that requires no task-specific annotated training corpora. In Stage 1, we enrich SapBERT training with Wikidata-derived multilingual aliases and use the resulting retriever to improve cross-lingual candidate retrieval. In Stage 2, we perform context-aware disambiguation with a pre-trained LLM ranker that jointly considers the mention context and candidate, eliminating the need for supervised training. Experiments on five benchmarks (XL-BEL, EMEA, Patent, WikiMed-DE, and MedMentions) show that BioELX achieves new state-of-the-art performance. 
It improves average Recall@1 on XL-BEL by +19.2, with especially large gains for low-resource languages (e.g., +21.6 on Turkish, +22.1 on Korean, and +30.8 on Thai), and delivers consistent improvements on EMEA (+6.2), Patent (+5.4), and WikiMed-DE (+12.8). Code and resources will be released upon publication.", "url": "https://wpnews.pro/news/bioelx-cross-lingual-biomedical-entity-linking-via-alias-based-retrieval-and-llm", "canonical_source": "https://arxiv.org/abs/2605.27380", "published_at": "2026-05-28 04:00:00+00:00", "updated_at": "2026-05-28 04:34:20.729947+00:00", "lang": "en", "topics": ["natural-language-processing", "large-language-models", "artificial-intelligence", "machine-learning", "ai-research"], "entities": ["BioELX", "SapBERT", "XL-BEL", "EMEA", "Patent", "WikiMed-DE", "MedMentions", "Wikidata"], "alternates": {"html": "https://wpnews.pro/news/bioelx-cross-lingual-biomedical-entity-linking-via-alias-based-retrieval-and-llm", "markdown": "https://wpnews.pro/news/bioelx-cross-lingual-biomedical-entity-linking-via-alias-based-retrieval-and-llm.md", "text": "https://wpnews.pro/news/bioelx-cross-lingual-biomedical-entity-linking-via-alias-based-retrieval-and-llm.txt", "jsonld": "https://wpnews.pro/news/bioelx-cross-lingual-biomedical-entity-linking-via-alias-based-retrieval-and-llm.jsonld"}}