{"slug": "a-modular-architecture-for-typologically-controlled-lexicon-generation", "title": "A Modular Architecture for Typologically Controlled Lexicon Generation", "summary": "Researchers have developed a modular framework for generating artificial lexicons that are pronounceable, typologically plausible, and semantically structured. The system samples phoneme inventories from the PHOIBLE database, generates word forms using interchangeable phonological grammars, and assigns meanings via a Swadesh–Leipzig–Jakarta ontology. Evaluation shows probabilistic grammars outperform deterministic and random baselines on phonotactic coherence and typological realism across lexicon sizes of 100 to 5,000 forms.", "body_md": "arXiv:2605.28824v1 Announce Type: new\nAbstract: Constructing artificial lexicons that are pronounceable, typologically plausible, and semantically structured remains an open challenge in computational linguistics. Existing conlang generators either lack formal phonotactic guarantees or delegate generation to opaque, non-reproducible LLM-based pipelines. We propose a modular framework that samples phoneme inventories from PHOIBLE, generates word forms under interchangeable phonological grammars (deterministic, OT, and MaxEnt), and assigns meanings via a Swadesh–Leipzig–Jakarta ontology with explicit form–meaning alignment. 
Evaluation on character $n$-gram perplexity, log-likelihood, and KL divergence against PHOIBLE across lexicon sizes of 100 to 5,000 forms shows that probabilistic grammars consistently outperform deterministic and random baselines on both phonotactic coherence and typological realism.", "url": "https://wpnews.pro/news/a-modular-architecture-for-typologically-controlled-lexicon-generation", "canonical_source": "https://arxiv.org/abs/2605.28824", "published_at": "2026-05-29 04:00:00+00:00", "updated_at": "2026-05-29 04:23:57.679856+00:00", "lang": "en", "topics": ["natural-language-processing", "artificial-intelligence", "ai-research"], "entities": ["PHOIBLE", "Swadesh", "Leipzig", "Jakarta"], "alternates": {"html": "https://wpnews.pro/news/a-modular-architecture-for-typologically-controlled-lexicon-generation", "markdown": "https://wpnews.pro/news/a-modular-architecture-for-typologically-controlled-lexicon-generation.md", "text": "https://wpnews.pro/news/a-modular-architecture-for-typologically-controlled-lexicon-generation.txt", "jsonld": "https://wpnews.pro/news/a-modular-architecture-for-typologically-controlled-lexicon-generation.jsonld"}}