Part 2 of 4 on building a conversational analytics engine. ~9 min read.
Point a language model at a raw database schema and ask it a real business question.
Watch it guess.
It sees Sales.SalesOrderHeader, Person.BusinessEntity, and three hundred columns named things like rowguid and TerritoryID. It has no idea that "customer" is one table, "orders" is another, and the two connect through a key it would never find on its own. So it invents a join. The SQL runs. A number comes back.
The number is wrong, and nobody notices.
That single failure mode, the confident wrong answer, is the reason this whole system exists. In Part 1, I called it the most dangerous of the seven walls that break text-to-SQL on real data. This article builds the first half of the fix.
And it is why I spent most of my time not on the chatbot, but on the unglamorous thing that comes before it: a metadata map I call the domain graph.
TL;DR
The system runs on an AdventureWorks dataset (Microsoft’s public sample, so you can verify every name here) spread across three engines:
Different engines. Different SQL dialects. Different conventions. And the numbers that should worry you:
That last row is the problem in miniature. Real databases are full of relationships that live in someone’s head and in the application code but were never written down as constraints. The CSV source had zero declared keys, because CSV files have no constraints at all.
So when a user asks “show me orders for the customer Bike World,” the planner needs to know, reliably and ahead of time:
None of that is safely derivable at query time by a model staring at a schema dump. All of it is derivable once, offline. That offline derivation is the domain graph build.
How to read it:
The rest of this article is just those five steps, in order.
Introspection is the deterministic read of each database. One introspector per engine, all returning the same typed structure.
One rule I enforced from day one: introspectors read databases and hand back plain data. They never write to the graph. Keeping the reader and the writer separate saved me later, when the graph build became its own concern.
The leverage is in the per-column record. For every column, I keep:
Two of those are where a naive introspector quietly fails.
Plenty of tables (especially the CSV ones) have no declared primary key. A planner that gives up there is useless. So when there is no database key, I score every column and pick the best candidate.
The heuristic is deliberately boring, because boring is auditable:
An accepted guess gets pk_source = "heuristic" and pk_confidence set to its score. A declared key gets pk_source = "database" and confidence exactly 1.0.
Why I like this:it never lies about its certainty. A guessed key carries its own doubt in a field you can read downstream.
Here is the non-obvious one. For most columns I grab a capped sample. But when a column’s cardinality is low, I fetch every distinct value, not a sample.
Why pull all of them?
The introspector spends a little more on low-cardinality columns specifically to make the next step possible. Everything that comes out gets written to raw_schema.json, the contract between the deterministic world and the LLM world.
The schema is captured, but it is still just structure. It does not know that SalesOrderHeader is what a human means by "orders," that a salesperson calls revenue "rev," or that "last quarter" is a date range.
This is where an LLM comes in. And the single most important decision in the whole system is when.
I bring the LLM in offline, at build time. Never on the hot path for this work.
The model enriches the schema into business knowledge once, the output gets written to versioned files, and at query time everything reads those frozen files. The intelligence is precomputed. That is the trade that makes the system both smart and fast.
Several generators run in order, each feeding the next:
A few things worth pulling out:
There is an operational trap hiding in all this generated config:
I have watched teams “solve” this by never regenerating, and living with stale config forever. My answer is an ownership split enforced at the file level:
The payoff, from a real incident:an expert once added a couple of synonyms through an override file. Months later the entire base config got regenerated from scratch, and those synonyms simplyre-merged on top, untouched. Nobody had to remember anything. That is the difference between config you can maintain and config you are afraid to touch.
Now the build turns all that validated config into an actual graph in Neo4j. This is the heart of the article.
How to read it (the four design decisions that matter): The load-bearing rule:the LLM never writes the joins. It gets them from the graph, deterministically, or the query fails loudly with “no path found” rather than hallucinating a relationship. A made-up join is a silently wrong answer, and silently wrong is the worst failure an analytics system has.
Two engineering footnotes: Neo4j properties must be flat scalars, so lists get JSON-serialized in and parsed out. And every insert is an idempotent MERGE on a stable id, so I can rerun the build over an existing graph without creating duplicates. Rebuilds are routine, not surgery.
The graph is perfect at one kind of question: given a concept, how does it connect? Given CUSTOMER, what table is it, how does it reach ORDERS? That is exact-key lookup and traversal.
It is hopeless at the question that actually shows up first: the user typed “orders,” so which concept did they mean? That is similarity, not traversal. A graph has no notion of “close.”
So I built a second index in Qdrant that does nothing but answer “what did this phrase most likely mean.” Six collections:
Each point is text embedded into a 1536-dimension vector (OpenAI text-embedding-3-small, cosine distance). The clever one is C1, where the embedded text is built from a template:
"{ENTITY_TYPE}: {description}. Also known as: {synonyms}. Common questions: {sample questions}" That template is **exactly why “orders” finds **SALESORDERHEADER at runtime. The user's word is cosine-closest to that blob of description and synonyms, so the match happens by meaning, not by spelling. The graph then does the exact lookup to the real table.
The two stores form a pipeline, and neither can do the other’s job:
Qdrant maps language to concept. Neo4j maps concept to structure."orders" → (similarity) → SALESORDERHEADER → (exact lookup) → Sales.SalesOrderHeader + columns + FK to Customer
Shipped vs. next:I will be straight about this, because pretending otherwise is the wrong kind of credibility.Four of the six collections are live(entities, metrics, glossary, vocabulary).Two are built on every bootstrap but not yet read(columns, entity values). They are deferred, with a note in the code. The build is cheap and wiring them in is a small, isolated change for later. If you assumed every collection was on the hot path, it is not, and I would rather tell you.
When a real query runs, it does not query Neo4j for schema metadata. It reads an in-memory snapshot loaded once at startup, then served entirely from RAM.
The cache is blunt about its contract, and I wrote it that way:
How to read it:
The rule: the graph is the source of truth and gets read in bulk at startup; the hot path reads a local snapshot. The domain graph is immutable between builds, so the snapshot never goes stale mid-session.
If you build one of these, steal the sequence. The ordering is the insight. That is the map. In Part 3, I get to the part everyone actually asks about: how a plain-English question becomes a safe query end to end, how the system resolves the specific things a user names (“Bike World,” “my team”), and the one architectural line that made me comfortable putting this in front of real users with real permissions.
Next up, Part 3: From Schema to Conversation, building the subject graph and the query pipeline.
Your LLM Needs a Map! was originally published in Towards AI on Medium, where people are continuing the conversation by highlighting and responding to this story.