Why Definitions Need Events — and How to Build a New Kind of Dictionary from Wikipedia Using LLMs

A new knowledge representation architecture called Synapedia defines concepts using events sourced from Wikipedia, addressing a structural flaw in existing systems like WordNet and Wikidata. The system uses four layers and 15 semantic roles to produce grounded, minimal definitions that disambiguate terms. A Python bootstrapping script is published to generate validated entries via LLMs.

Every existing AI knowledge representation system (WordNet, Wikidata, FrameNet, Cyc, every ontology, every knowledge graph, every large language model) is structurally incapable of fully defining what a concept IS, because behavior is absent from its definitional structure.

This is not a limitation that can be patched. It is an architectural flaw baked into the foundation. No amount of additional data, larger models, or better embeddings can fix it, because the fundamental unit of representation is wrong.

The fix is not complicated. A definition of wagon must include the events transport, pull, and roll, because those three events are what separate a wagon from a cart, a carriage, a trailer, and a sled. Without them, the definition is incomplete. A definition of bank (financial institution) must include deposit and lend, because those two events are what separate it from bank (river edge), which requires border and erode. Without events, every definition is ambiguous and every lexicon is structurally unreliable.

Events are not metadata. Events are not optional annotation. Events are the load-bearing structure of the definition itself.

But crucially: these events are not declarations of ultimate truth. They are sourced anchors, drawn from authoritative references like Wikipedia for the purpose of disambiguating concepts from their lemma-mates. The system does not claim that these events are true in every possible world. It claims that they are the established, consensus-based identifiers that distinguish one concept from another. Truth (and falsehood, and lies, and conflicting perspectives) belongs to the ABox layer, which holds the raw claims. Synapedia is the TBox: the definitional substrate that makes those claims addressable.

This essay describes a complete architecture, Synapedia, that implements this principle.
It is a new kind of dictionary: every concept is defined using four layers (Lexical, Ontological, Mereological, Perdurantist), a closed grammar of exactly 15 semantic roles for event representation, and six axioms that guarantee every definition is grounded, minimal, consistent, and unique. The Perdurantist events are sourced from high-quality references; the validation axioms ensure that those events are sufficient for disambiguation. Only the minimal number of events needed to separate a term from its lemma-mates is included, nothing more.

I am also publishing the Python bootstrapping script alongside this article. It downloads Wikipedia articles, generates proposed Synapedia entries using LLM calls, validates them against all six axioms, and outputs a validated JSON lexicon. It produces draft entries: a starting point for building this new kind of dictionary incrementally. The architecture is designed for scale; the script gives you the on-ramp.

Let me be precise about the failure mode. It is not that existing lexicons are wrong. It is that they are incomplete in a specific, structural way that no amount of extension can fix.

WordNet defines a wagon as “a four-wheeled wagon drawn by draft animals.” This is circular. WordNet gives you a hypernym chain (wagon → vehicle → instrumentality → artifact) and a gloss. It does not tell you what a wagon does. It cannot distinguish a wagon from a cart because the difference is behavioral, not categorical.

Wikidata provides structured IDs and property-value pairs. You can assert that a wagon has part wheel, and that it is a subclass of vehicle. This is more structured, but still static. There is no standard way to represent that a wagon participates in events of pulling, transporting, and rolling. The predicate space is open (anyone can invent a new property), which means no two graphs use the same vocabulary and no machine can reliably interpret across them.

FrameNet is the closest existing approach.
It defines frames (event structures with participants) and maps words to them. FrameNet was a direct inspiration. But FrameNet has no grounding requirement (its frames float unattached to any reality coordinate), no closed grammar (new frame elements are constantly added), no minimality constraint (a FrameNet frame can include dozens of participants, many of which are not definitionally necessary), and no IS-A hierarchy. FrameNet is a valuable resource; it is not a substrate.

Large language models have no explicit definitions at all. They rely on distributional semantics. This works surprisingly well for many tasks, but it fails catastrophically when precision is required. An LLM cannot tell you why a tomato is both a fruit and a vegetable; it can only report that the phrase occurs in its training data. It cannot reason about the difference between botanical and culinary classification because it has no model of classification at all.

All four systems share the same root cause: behavior is absent from the definitional structure. Every system treats concepts as static things with properties and categories. None treats events as definitional. And because events are the primary way that concepts interact with the world, any lexicon that omits them is building on an incomplete foundation.

Before describing the architecture, I need to state the principle that governs it. A definition does not need to be an encyclopedia entry. It does not need to list every property, every usage, every historical fact about a concept. It needs to be enough: enough to distinguish this concept from every other concept that shares its word and part of speech. I call this the Principle of Minimal Sufficient Disambiguation.

For bank (financial institution), the minimal sufficient definition is: it accepts deposits, it lends money. Two events. That is enough to separate it from bank (river edge), which is defined by: it borders water, it erodes.
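As a rough illustration of the bank example, the sketch below stores each sense's defining events as a set and selects the sense whose events overlap what was observed in context. The sense labels, the event names, and the helpers `disambiguate` and `senses_are_separable` are hypothetical; they are not Synapedia canonical IDs and are not taken from the published script.

```python
# Hypothetical toy lexicon: each sense of a lemma maps to its defining events.
# Event names are illustrative strings, not real Synapedia canonical IDs.
SENSES = {
    "bank": {
        "financial institution": {"accept_deposit", "lend_money"},
        "river edge": {"border_water", "erode"},
    }
}


def disambiguate(lemma, observed_events):
    """Return the senses of `lemma` whose defining events overlap the observed ones."""
    observed = set(observed_events)
    return sorted(
        sense
        for sense, events in SENSES.get(lemma, {}).items()
        if events & observed
    )


def senses_are_separable(lemma):
    """True if no two senses of `lemma` have identical defining event sets."""
    event_sets = [frozenset(events) for events in SENSES.get(lemma, {}).values()]
    return len(event_sets) == len(set(event_sets))


if __name__ == "__main__":
    print(disambiguate("bank", ["lend_money"]))
    print(disambiguate("bank", ["erode"]))
    print(senses_are_separable("bank"))
```

Plain set overlap is the simplest possible matching rule and is used here only to make the idea concrete; a real system would compare full event structures with their roles.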
For wagon, the minimal sufficient definition includes: it transports cargo, it is pulled by draft animals, it rolls on wheels. Three events. That is enough to separate it from cart (pulled by humans or smaller animals), carriage (carries passengers), and sled (does not roll).

For tomato, the minimal sufficient definition needs no events at all: the microgloss “edible red fruit” plus the dual IS-A parents (botanical fruit, culinary vegetable) is enough to disambiguate it from any other concept called “tomato.” The Perdurantist layer is empty, and that is valid.

For persons with identical names (say, four different men named Theodore Roosevelt), the minimal sufficient definition is a handful of sourced events: birth date, death date, profession, notable achievement. These are pulled from Wikipedia or an equivalent authoritative reference, not as ultimate truth, but as the established consensus identifiers that distinguish one Theodore Roosevelt from another.

The principle is simple. The implementation requires a complete rethinking of what a dictionary entry is.

Before diving into the architecture, I need to make something clear. Synapedia is the TBox, the terminological box. It defines concepts. It does not assert facts about the world. The ABox is where claims live. Claims can be true, false, mistaken, or deliberately deceptive. SGF is a collection of claims.

Consider two statements about the same collision. “I was going the speed limit and I braked before collision” (defendant). “The car was obviously going way too fast, and didn’t appear to brake at all. Made me wonder if driver was distracted” (witness 1). Both are stored as ABox synapses. The system does not decide which is true.

Synapedia’s role is to provide the stable definitions (the reference structure) that those claims point to. The Perdurantist events are sourced from authoritative references (Wikipedia, Wikidata, academic databases) so that the TBox can serve as a reliable disambiguation anchor. But those events are not “the truth” in any ultimate sense.
They are the best available consensus identifiers, and they can be updated via deprecation if new evidence emerges.

Synapedia is the substrate layer for the Symbol Grounding Framework. It defines every concept using exactly four orthogonal layers.

The Lexical Layer holds what to call the concept: lemma, part of speech, a short disambiguation tag called a microgloss, an optional gloss for human readability, and an optional vector embedding. This layer anchors the concept to language. It is not the definition. A Lexical Node has no outgoing edges.

The Ontological Layer positions the concept in a type hierarchy via IS-A parents. Multiple parents are permitted. Tomato is IS-A fruit (botanical) AND IS-A vegetable (culinary). This is not a conflict to be resolved; it is a fact to be represented. The IS-A graph must be strictly acyclic.

The Mereological Layer captures part-whole composition, but only when those parts are definitionally essential. A wagon needs wheels, axle, hitch, bed, and shaft. Remove any of them and a wagon becomes a different kind of thing. Beethoven has no mereology: his body parts are not definitional. The layer is optional.

The Perdurantist Layer is the innovation. It captures the minimal set of events that distinguish this concept from every other concept sharing its word and part of speech. These events are expressed using a closed grammar of exactly 15 semantic roles. The events are sourced from authoritative references: Wikipedia for historical figures, established taxonomies for biological species, consensus definitions for artifacts. They are not declarations of absolute truth; they are the established identifiers that make disambiguation possible. Only the minimal number of events required for disambiguation is included.

Here is the full Synapedia entry for wagon. Every field is load-bearing.
```json
{
  "canonical id": "sgf:en.wagon.n.horse drawn cargo",
  "layer": "perdurantist",
  "lexical": {
    "lemma": "wagon",
    "pos": "n",
    "microgloss": "horse drawn cargo",
    "gloss": "A four-wheeled vehicle for hauling goods behind draft animals."
  },
  "ontological": {
    "is a": [
      "sgf:en.vehicle.n.land transport",
      "sgf:en.container.n.holds cargo"
    ]
  },
  "mereological": {
    "has component": [
      "sgf:en.wheel.n.wagon wheel",
      "sgf:en.axle.n.wagon axle",
      "sgf:en.hitch.n.wagon hitch",
      "sgf:en.bed.n.wagon bed",
      "sgf:en.shaft.n.wagon shaft"
    ]
  },
  "perdurantist": {
    "essential events": [
      {
        "hub": "sgf:en.transport.v.carry cargo",
        "spokes": {
          "HAS THEME": "sgf:en.cargo.n.goods transported",
          "HAS INSTRUMENT": "SELF",
          "HAS AGENT": "sgf:en.draft animal.n.horse or ox"
        }
      },
      {
        "hub": "sgf:en.pull.v.exert force",
        "spokes": {
          "HAS AGENT": "sgf:en.draft animal.n.horse or ox",
          "HAS PATIENT": "SELF",
          "HAS INSTRUMENT": "sgf:en.hitch.n.wagon hitch"
        }
      },
      {
        "hub": "sgf:en.roll.v.rotate on axis",
        "spokes": {
          "HAS AGENT": "SELF",
          "HAS THEME": "sgf:en.wheel.n.wagon wheel",
          "HAS LOCATION": "sgf:en.ground.n.earth surface"
        }
      }
    ]
  }
}
```

Now walk through the Event Minimality test. Remove the transport event: the wagon is no longer distinguishable from a carriage. Both are wheeled vehicles pulled by animals, but a wagon carries cargo and a carriage carries passengers. Remove the pull event: the wagon is no longer distinguishable from a trailer. Remove the roll event: the wagon is no longer distinguishable from a sled.

Three events. That is the exact boundary. Remove any one and the definition collapses. Add any more and the definition is bloated.

This is what a dictionary entry looks like when behavior is at the center. It is not a gloss. It is not a category label. It is a behavioral fingerprint that uniquely identifies the concept.

Every event in Synapedia is expressed using exactly 15 semantic roles. This set is closed. It will never grow.

Six core roles:
• HAS AGENT: the deliberate initiator of the action; must be sentient.
• HAS PATIENT: the entity that undergoes a change of state.
• HAS THEME: the entity that is moved, located, or held; no state change.
• HAS EXPERIENCER: the entity that experiences non-deliberately.
• HAS RECIPIENT: the entity that receives the theme.
• HAS BENEFICIARY: the entity for whose benefit the action is performed.

Nine circumstance roles:
• HAS TIME: temporal coordinate.
• HAS LOCATION: spatial region where the event occurs.
• HAS SOURCE: initial state or location of the theme.
• HAS DESTINATION: final state or location of the theme.
• HAS MANNER: the manner or style of execution.
• HAS INSTRUMENT: a non-sentient tool used to perform the action.
• HAS CAUSE: an inanimate or non-deliberate trigger of a state change.
• HAS REASON: a unified role covering both backward-looking motive and forward-looking purpose.
• HAS ATTRIBUTE: an event-result property assigned to a participant.

Why no separate HAS PURPOSE role? The distinction between reason and purpose is real in natural language, but in the vast majority of cases they refer to the same fact viewed from different temporal directions. Requiring authors to choose at ingestion time creates boundary disputes and inconsistency. A single HAS REASON role captures both. When the distinction genuinely matters (in legal reasoning, for example), a Frame object attached to the Synapse refines the interpretation. The role grammar stays stable. The nuance lives in the frame.

A definition system is only as good as its constraints. Synapedia enforces six axioms. Every entry is checked against every axiom at submission time. If an entry fails any axiom, it is rejected.

Axiom I: Ontological Acyclicity. The IS-A graph cannot contain cycles. No concept can be its own ancestor.

Axiom II: Foundational Grounding. Every concept must trace via IS-A edges to either a Prime Registry node or a specific spacetime coordinate. The chain terminates.

Axiom III: Component Transitivity.
If a wagon has a component axle, and the axle has a component axle pin, then the wagon has a component axle pin. This holds for HAS-COMPONENT only.

Axiom IV: Microgloss Sufficiency. Within the set of all concepts sharing the same lemma and part of speech in the same language, every microgloss must be unique.

Axiom V: Event Minimality. Removing any event from a Perdurantist layer must break disambiguation from at least one other concept sharing the same lemma. Every event must be necessary.

Axiom VI: Identity Uniqueness. No two concepts may share the same Canonical ID.

Fictional entities. Sherlock Holmes does not exist. Fictional entities trace to the Prime Registry via a dedicated branch and carry a work-of-origin spacetime coordinate: the publication date and location of A Study in Scarlet (1887, London). The sourcing is from the reference work that establishes the character.

Borderline cases. For sorites-style vagueness, Synapedia provides a prototypical definition sourced from standard reference works. The ABox handles boundary negotiation.

Changing meanings. Synapedia handles this through deprecation and replacement. The old entry is marked DEPRECATED. A new entry is created with a new Canonical ID. A SUPERSEDED-BY edge connects them. No information is destroyed.

Conflicting parents. When IS-A parents conflict, Synapedia uses four rules applied in order: Explicit Override, Specificity (depth in hierarchy), Temporal Priority (newer parent wins), and Manual Annotation.

Existing systems. FrameNet is the closest, but it lacks grounding, a closed grammar, a minimality constraint, and an IS-A hierarchy. Cyc collapsed under predicate explosion. Wikidata has scale but no axiomatic constraints. WordNet is circular. Synapedia is not a competitor; it is a substrate that these systems could reference.

Synapedia is not a truth database. It does not decide whether “Beethoven composed Symphony №9” is true. It defines what composing is, what Beethoven is, and what Symphony №9 is, using events sourced from authoritative references.
Truth and falsehood belong to the ABox.

Synapedia is not an encyclopedia. The full biography belongs in Knowledge Packs.

Synapedia is not a replacement for large language models. It is what they are missing: a reliable, grounded, machine-verifiable dictionary that tells them what words mean.

A four-layer dictionary with a closed grammar of 15 roles and six axioms may sound like a solution in search of a problem. It is not. The problem costs billions of dollars every year.

Consider the NASA screwdriver, a specific, real-world procurement failure that illustrates the depth of the problem. A government agency issued a Request for Proposal (RFP) for a flight-qualified torque driver, the kind used on the Mars mission. A vendor responded with a titanium torque driver with a red anodized collar. The procurement officer had no way to prove whether the offered part was equivalent to the requirement. Keyword search could not distinguish it from a hundred other torque drivers. Embedding similarity could not tell the difference between “vacuum-rated” and “atmospheric.” The officer guessed. The cost of being wrong was not just the price of the part; it was a safety risk on flight hardware.

This is the problem that Synapedia and SOAM (the Structural Ontology Alignment Method, introduced later in this essay) solve structurally. But not every search has the same stakes. The system needs to handle three different magnitudes of matching, each governed by a different policy decision about how deep to dig.

Magnitude 1: A single search. A customer needs one part, or a procurement officer checks one line item. The system runs a fast comparison (lemma plus embedding) to find candidates. This is the same trick two humans use in ordinary conversation: you hear the word “torque driver” and your brain activates the nearest concept. It works because the consequence of ambiguity is low; you can clarify in the next sentence.

Magnitude 2: A full RFP with numerous items. A procurement officer publishes a 200-line-item RFP.
Each item must be matched against vendor proposals. Human review takes days and misses mismatches. The system aligns all 200 items simultaneously, slot by slot, producing a compliance matrix. The scale changes the game: a single error is amplified across the entire procurement.

Magnitude 3: A bulk catalog enrichment or cross-vendor search. Hundreds of thousands of SKUs must be normalized against a shared lexicon. The system runs full structural alignment on every entry, building the ontology automatically from raw product data.

How deep should the system compare? The answer is driven by one thing: the consequence of getting it wrong.

Depth 1: Lemma + Embedding. Fast, cheap, good for exploration. This is what every search engine and chatbot does today. It is also what humans do most of the time in ordinary conversation. You hear “screwdriver” and you immediately access the tool concept; you don’t decompose it into events or parts unless the context forces you. Depth 1 is sufficient when the cost of ambiguity is low: browsing, casual questions, internal staging.

Depth 2: Decompose into ontology, mereology, and events, then compare one level deeper than the entry row. The system looks at the concept itself, its immediate IS-A parents, its components, and its minimal events. It checks every populated slot. This is the right depth for purchase commitments, safety-critical parts, and any situation where a wrong match would trigger a return or a compliance audit. The NASA screwdriver requires at least Depth 2, because the difference between “vacuum-rated” and “atmospheric” lives in the HAS ATTRIBUTE slot and the HAS REASON (use case) slot.

Depth 3: Go to the parents of the parents. Traverse the ontology two or three generations up. This catches cases where the immediate parent does not match, but the grandparent does. For example, “bell-bottom hip huggers” might not match “trousers” directly, but both trace via IS-A to lower body garment.
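The idea of climbing more generations until two concepts meet can be sketched as a bounded walk up the IS-A graph. This is only an illustration under stated assumptions: the graph below is a made-up toy, the intermediate node "flared pants" is invented for the example, and `ancestors_within` and `shared_ancestors` are hypothetical helper names, not part of RDP or SOAM.

```python
# Hypothetical IS-A edges (child -> parents). Only "lower body garment" comes
# from the article's example; the intermediate node is invented for illustration.
IS_A = {
    "bell-bottom hip huggers": ["flared pants"],
    "flared pants": ["lower body garment"],
    "trousers": ["lower body garment"],
    "lower body garment": [],
}


def ancestors_within(concept, max_generations):
    """Collect IS-A ancestors of `concept` up to `max_generations` levels up."""
    found = set()
    frontier = {concept}
    for _ in range(max_generations):
        # Move the frontier one generation up the hierarchy.
        frontier = {parent for node in frontier for parent in IS_A.get(node, [])}
        found |= frontier
    return found


def shared_ancestors(a, b, max_generations):
    """Ancestors common to both concepts within the given number of generations."""
    return ancestors_within(a, max_generations) & ancestors_within(b, max_generations)


if __name__ == "__main__":
    # One generation up: the immediate parents differ, so nothing is shared.
    print(shared_ancestors("bell-bottom hip huggers", "trousers", 1))
    # Two generations up: the grandparent of one is the parent of the other.
    print(shared_ancestors("bell-bottom hip huggers", "trousers", 2))
```

Bounding the walk by a generation count keeps the cost of a comparison tied to the chosen depth rather than to the size of the whole hierarchy.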
Depth 3 is rarely needed for product identification; the definitional events at the entry level usually suffice. But it becomes important when the lexicon is sparse or the domain has deeply nested hierarchies (e.g., biological taxonomy, military specifications).

How deep should we go? It is a policy decision, not a one-size-fits-all rule. A luxury watch retailer might set Depth 2 for every purchase, because the return cost is high. A grocery delivery service might stay at Depth 1 for most items, reserving Depth 2 only for items with allergen or dietary restrictions. The same system enforces different depths for different contexts, and the policy is published transparently so the customer knows what they are getting.

Everyday conversation between two people runs on Depth 1 almost exclusively; we use the lemma+embedding trick constantly. We do not decompose every noun into its parts and events. That would be exhausting. But when the stakes rise (when you are ordering a part for flight hardware, or signing a contract), Depth 2 or even Depth 3 becomes necessary. Synapedia gives you the substrate to make that depth possible, and the alignment engine (RDP/SOAM) lets you choose the depth based on the cost of being wrong.

Synapedia provides the lexicon; the Recursive Decompression Protocol (RDP) and its alignment engine, the Structural Ontology Alignment Method (SOAM), are built directly on top. They work as follows:

1. Every concept in an RFP (each line item) is parsed into Synapedia’s four-layer schema: its IS-A parents, its parts, and the minimal events that define it.
2. Every concept in the vendor’s proposal (each offered product) is parsed into the same schema.
3. SOAM compares them recursively at the chosen depth: slot by slot, generation by generation, until every component traces to the Prime Registry or a physical constant.
4. The output is not a similarity score.
It is a ProofTrace (a signed, verifiable record of every comparison) or a GapReport that tells the vendor exactly which slot failed and why.

For a single item, this takes milliseconds. For a 200-line RFP, it takes seconds. The same infrastructure handles both, because the alignment engine does not care whether it is comparing one concept or two hundred.

A procurement officer using this system can publish an RFP as a set of Synapedia-schema requirements. Vendors propose solutions in the same schema. The system aligns them automatically and outputs a compliance report with passed items (ProofTraces) and failed items (GapReports). No guesswork. No billion-dollar mistakes.

Synapedia is not the alignment engine. It is what makes the alignment engine possible. The closed grammar, the Prime Registry, the minimal event definitions, the sourcing metadata: every axiom and every layer exists because a deterministic alignment engine requires a stable, grounded, machine-verifiable reference. Synapedia is that reference. And because the depth of comparison is a policy decision, not a technical fix, the same system can serve both casual browsing and mission-critical procurement, tuning its rigor to match the cost of being wrong.

I am publishing the bootstrapping script alongside this article. It is a Python tool that:

1. Downloads a target set of Wikipedia articles (configurable by category, size, or specific list).
2. Extracts candidate concepts (nouns, verbs, adjectives) with their definitions, hypernyms, and usage examples.
3. For each candidate, generates a proposed four-layer Synapedia entry using LLM calls (API key required; any major provider works).
4. Validates every generated entry against all six axioms.
5. Outputs a validated JSON lexicon ready for ingestion.

The script does not produce a complete dictionary in a weekend; that would be unrealistic. What it does is give you the pipeline to build incrementally.
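To make the validation step more concrete, here is a minimal sketch of how three of the six axioms (I, IV, and VI) might be checked over a batch of draft entries. It is not the published script: the function name `find_axiom_violations` and the flat entry fields (`id`, `lemma`, `pos`, `microgloss`, `is_a`) are assumptions chosen for brevity and do not mirror the four-layer JSON layout.

```python
from collections import Counter


def find_axiom_violations(entries):
    """Check draft entries against simplified versions of Axioms I, IV, and VI."""
    violations = []

    # Axiom VI: no two concepts may share a canonical ID.
    id_counts = Counter(entry["id"] for entry in entries)
    for canonical_id, count in id_counts.items():
        if count > 1:
            violations.append(("VI", canonical_id))

    # Axiom IV: microglosses must be unique per (lemma, part of speech).
    gloss_counts = Counter(
        (entry["lemma"], entry["pos"], entry["microgloss"]) for entry in entries
    )
    for key, count in gloss_counts.items():
        if count > 1:
            violations.append(("IV", key))

    # Axiom I: the IS-A graph must be acyclic (depth-first search with marks).
    parents = {entry["id"]: entry.get("is_a", []) for entry in entries}
    state = {}  # id -> "visiting" while on the current path, "done" afterwards

    def has_cycle(node):
        if state.get(node) == "visiting":
            return True  # reached a node already on the current path
        if state.get(node) == "done":
            return False
        state[node] = "visiting"
        cyclic = any(has_cycle(parent) for parent in parents.get(node, []))
        state[node] = "done"
        return cyclic

    for canonical_id in parents:
        if has_cycle(canonical_id):
            violations.append(("I", canonical_id))

    return violations
```

Each violation is returned as a pair of axiom number and offending key, so a reviewer can see which entry to fix. Axioms II, III, and V need the Prime Registry, the component graph, and the contrast set respectively, so they are left out of this sketch.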
Start with a small seed set (a thousand entries for your domain) and grow from there. The script supports parallelization (multithreading, batching) so you can scale up as your API budget allows.

Why Wikipedia? Because it is the largest existing corpus of grounded, structured definitions. Every Wikipedia article is already a kind of dictionary entry: it tells you what something is, what it does, what parts it has, and what events it participates in. The script extracts that structure and maps it into the four-layer schema. The LLM handles the mapping; the validation tool catches the hallucinations. The result is a machine-verifiable lexicon from the first entry.

The bootstrapping process is honest: it generates drafts, not final definitions. Each entry requires human review before it becomes canonical. But the architecture gives you a framework for that review: the axioms flag violations, the minimality test catches redundancy, the grounding check prevents floating definitions. You are not starting from zero.

The bootstrapping scripts are on the SGF git repo at: https://github.com/SymbolGroundingFramework/SGF-manifest/tree/main/code

The current state of AI knowledge representation is structurally incomplete. Not wrong, but incomplete. Our systems can generate fluent text but cannot answer “What is a bank?” with certainty. They can classify millions of images but cannot define what distinguishes a wagon from a cart. They can reason about complex categories but cannot ground a single definition in behavior. This is not fixable by scaling. The flaw is that definitions do not include events, and without events, no dictionary can be complete.

Synapedia fixes this. The four-layer architecture, the closed grammar of 15 roles, the six axiomatic constraints: together they form a dictionary that is grounded, minimal, stable, and verifiable.
The Perdurantist events are sourced from authoritative references, not as ultimate truth, but as the consensus anchors that make disambiguation possible. SGF is a collection of claims; truth is a separate negotiation in the ABox.

Synapedia is the substrate layer of the Symbol Grounding Framework (SGF), a larger architecture for transforming raw human thought into crystalline-structured formats that can be stored in databases, transmitted between machines with zero prior integration, and governed by formal protocols.

The architecture is complete. The axioms are enforced. The examples are worked through. The bootstrapping script is published. The ground is ready. Build on it.

These patterns, and the Synapedia substrate, are tools being used to build out the Symbol Grounding Framework (SGF). SGF is an architecture for transforming raw, unstructured human thought into a crystalline-structured format that can be stored at rest in a database or knowledge graph, or sent from machine to machine using a protocol that makes machine-to-machine communication with zero prior integration possible and trustworthy. There is also a formal governance language that controls what a machine can, may, and must do.
You can learn more about the SGF project here:

• SGF Language: https://sgf-lang.org
• Omega, the machine governance language: https://omega-lang.org
• The six-volume SGF book series: https://www.amazon.com/dp/B0H3FGSPK6
• YouTube channel (The Third Protocol): https://www.youtube.com/@The-Third-Protocol
• GitHub repository: https://github.com/SymbolGroundingFramework/SGF-manifest
• RFC specifications: https://github.com/SymbolGroundingFramework/SGF-manifest/tree/main/specs
• All books in the SGF book series (free PDF): https://github.com/SymbolGroundingFramework/SGF-manifest/tree/main/books
• The Symbol Grounding Company: https://symbolgroundingcompany.ai
• Bootstrapping script repository: https://github.com/SymbolGroundingFramework/SGF-manifest/tree/main/code

The full Synapedia Architecture Specification is available as a companion document on the same GitHub org. For questions or proposals to contribute to the initial entry set, contact the SGF Architecture Review Board.

This article was originally published in Towards AI (https://pub.towardsai.net) on Medium: https://pub.towardsai.net/why-definitions-need-events-and-how-to-build-a-new-kind-of-dictionary-from-wikipedia-using-llms-094efd600871