
A Modular Architecture for Typologically Controlled Lexicon Generation

Researchers have developed a modular framework for generating artificial lexicons that are pronounceable, typologically plausible, and semantically structured. The system samples phoneme inventories from the PHOIBLE database, generates word forms using interchangeable phonological grammars, and assigns meanings via a Swadesh–Leipzig–Jakarta ontology. Evaluation shows probabilistic grammars outperform deterministic and random baselines on phonotactic coherence and typological realism across lexicon sizes of 100 to 5,000 forms.
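To make the idea of a probabilistic (MaxEnt) phonological grammar concrete, the sketch below scores candidate forms by weighted constraint violations and samples a lexicon from the resulting distribution, P(x) ∝ exp(−Σ w·violations). It is a minimal sketch under stated assumptions, not the authors' implementation. The six-consonant, three-vowel inventory, the (C)V(C) candidate space, the three constraints (`*CC`, `NoCoda`, `Onset`) and their weights are all hypothetical stand-ins, and the inventory is hard-coded rather than sampled from PHOIBLE.

```python
import itertools
import math
import random

# Hypothetical toy inventory; the framework described here samples real inventories from PHOIBLE.
CONSONANTS = ["p", "t", "k", "m", "n", "s"]
VOWELS = ["a", "i", "u"]

# Hypothetical constraint weights (higher weight = stronger penalty).
WEIGHTS = {"*CC": 3.0, "NoCoda": 1.5, "Onset": 1.0}


def candidates(max_syllables=2):
    """Enumerate distinct candidate forms made of 1..max_syllables (C)V(C) syllables."""
    margins = [""] + CONSONANTS
    syllables = [o + v + c for o in margins for v in VOWELS for c in margins]
    forms = set()
    for n in range(1, max_syllables + 1):
        for combo in itertools.product(syllables, repeat=n):
            forms.add("".join(combo))  # the set removes duplicates such as "pat"+"a" == "pa"+"ta"
    return sorted(forms)


def violations(form):
    """Count violations of each toy markedness constraint (one character = one segment)."""
    is_c = [ch in CONSONANTS for ch in form]
    return {
        "*CC": sum(a and b for a, b in zip(is_c, is_c[1:])),  # adjacent consonants
        "NoCoda": int(is_c[-1]),  # word-final consonant
        "Onset": int(not is_c[0]),  # vowel-initial word
    }


def maxent_distribution(forms, weights):
    """P(x) = exp(-H(x)) / Z, where H(x) is the weighted sum of constraint violations."""
    scores = [
        math.exp(-sum(weights[k] * v for k, v in violations(f).items()))
        for f in forms
    ]
    z = sum(scores)
    return [s / z for s in scores]


def sample_lexicon(size, seed=0):
    """Draw `size` distinct forms from the MaxEnt distribution (reproducible via `seed`)."""
    rng = random.Random(seed)
    forms = candidates()
    if size > len(forms):
        raise ValueError("requested lexicon is larger than the candidate set")
    cum = list(itertools.accumulate(maxent_distribution(forms, WEIGHTS)))
    lexicon = set()
    while len(lexicon) < size:
        lexicon.add(rng.choices(forms, cum_weights=cum, k=1)[0])
    return sorted(lexicon)


if __name__ == "__main__":
    print(sample_lexicon(10))
```

Raising the weight on `*CC` pushes the sampled lexicon toward simple CV shapes, while setting every weight to zero reduces the grammar to uniform sampling over the candidate set, a rough analogue of a random baseline. Fixing the seed keeps the generated lexicon reproducible.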

Published May 29, 2026 · 1 min read

Abstract (arXiv:2605.28824v1): Constructing artificial lexicons that are pronounceable, typologically plausible, and semantically structured remains an open challenge in computational linguistics. Existing conlang generators either lack formal phonotactic guarantees or delegate generation to opaque, non-reproducible LLM-based pipelines. We propose a modular framework that samples phoneme inventories from PHOIBLE, generates word forms under interchangeable phonological grammars (deterministic, OT, and MaxEnt), and assigns meanings via a Swadesh–Leipzig–Jakarta ontology with explicit form–meaning alignment. Evaluation on character n-gram perplexity, log-likelihood, and KL divergence against PHOIBLE across lexicon sizes of 100–5,000 forms shows that probabilistic grammars consistently outperform deterministic and random baselines on both phonotactic coherence and typological realism.
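The three evaluation measures named in the abstract can be sketched roughly as follows. This assumes a character bigram model with add-one smoothing for log-likelihood and perplexity, and compares smoothed segment-frequency distributions with KL divergence. The n-gram order, the smoothing scheme, the boundary symbols and the toy word lists are hypothetical choices for illustration; the paper's exact PHOIBLE-derived reference distribution is not reproduced here.

```python
import math
from collections import Counter

BOS, EOS = "<", ">"  # hypothetical word-boundary symbols


def padded(word):
    return [BOS] + list(word) + [EOS]


def train_bigram(words):
    """Count character bigrams (including boundaries) in a training lexicon."""
    pairs, contexts, outcomes = Counter(), Counter(), set()
    for w in words:
        seq = padded(w)
        outcomes.update(seq[1:])  # every symbol that can be predicted (all but BOS)
        for a, b in zip(seq, seq[1:]):
            pairs[(a, b)] += 1
            contexts[a] += 1
    return pairs, contexts, len(outcomes)


def log_likelihood(words, model):
    """Add-one smoothed bigram log-likelihood (nats) and the number of predicted symbols."""
    pairs, contexts, v = model
    total, n = 0.0, 0
    for w in words:
        seq = padded(w)
        for a, b in zip(seq, seq[1:]):
            total += math.log((pairs[(a, b)] + 1) / (contexts[a] + v))
            n += 1
    return total, n


def perplexity(words, model):
    """exp of the average negative log-likelihood per predicted symbol."""
    ll, n = log_likelihood(words, model)
    return math.exp(-ll / n)


def segment_distribution(words, support, alpha=1.0):
    """Additively smoothed relative frequency of each segment in `support`."""
    counts = Counter(ch for w in words for ch in w)
    total = sum(counts[s] for s in support) + alpha * len(support)
    return {s: (counts[s] + alpha) / total for s in support}


def kl_divergence(p, q):
    """D_KL(P || Q) in nats; assumes both share the same keys and q > 0 everywhere."""
    return sum(p[s] * math.log(p[s] / q[s]) for s in p if p[s] > 0)


if __name__ == "__main__":
    reference = ["pata", "kimu", "sana", "tuki", "mapu"]  # hypothetical reference lexicon
    generated = ["pati", "kuma", "sinu", "tapa"]  # hypothetical generated lexicon
    model = train_bigram(reference)
    print("perplexity:", perplexity(generated, model))
    support = sorted(set("".join(reference + generated)))
    p_gen = segment_distribution(generated, support)
    p_ref = segment_distribution(reference, support)
    print("KL(gen || ref):", kl_divergence(p_gen, p_ref))
```

Lower perplexity (equivalently, higher log-likelihood) means the generated forms are more predictable under a model of the reference forms, and a KL divergence near zero means the two segment distributions are close. KL divergence is asymmetric, so the argument order matters.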
