Context engineering for analytics agents: six months of building and rebuilding

wpnews.pro

Analytics agents need context. Great. How should you structure it?

Over the past six months, we have built and rebuilt the context behind our analytics agents several times. We started with a structured entity graph, machine-friendly definitions, and a layer that checked each query plan against the model and refused anything it could not ground, before generating SQL.

We liked it. Then we removed most of it.

We only kept the part that mattered: when the agent cannot ground an answer, it refuses instead of guessing. The problem was not a lack of context. It was deciding where each fact belonged, how the agent would find it, and how that fact could change without leaving stale copies behind.

The rule that matters the most here is more for maintenance than pure context quality: one fact, one authoritative authored home. The same fact can appear in generated views or be referenced elsewhere. But there should only be one place where it is edited. This is what keeps maintenance manageable.

(root) Company data                              global context
├── customers                                   domain
│   ├── CUSTOMERS                               table
│   │   ├── CUSTOMER_ID                         column
│   │   └── COUNTRY                             column
│   ├── CUSTOMER_ACTIVITY                       table
│   ├── active_customers                        metric
│   └── defining-active-customers               additional context document
└── revenue                                     domain
    ├── INVOICES                                table
    ├── PAYMENTS                                table
    ├── net_revenue                             metric
    └── measuring-revenue                       additional context document

This is the navigation tree, not the complete storage format. Every element has an address, its path from the root, and any document can link to any other element by that address. So the tree is really a directed graph: a document under customer activity can point straight at a revenue metric, not only at its own parent.

The agent follows those links the way you would follow references in a wiki. Tables and metrics are assigned to domain paths, columns sit inside tables, and joins are structured links with column pairs, conditions, and cardinality. Each domain can also carry markdown for the business context that does not fit into those fields.

A synonym for COUNTRY

goes on the column. The grain of CUSTOMERS

goes on the table. The formula for active_customers

goes on the metric. A rule spanning invoices and payments goes in measuring-revenue

. Only rules that apply everywhere sit at the root.

For everything else, we move up this ladder until we find the lowest level that fully owns the fact:

column synonym
└── column description or note
    └── table description
        └── domain context
            └── root context

Markdown carries the connective tissue: how several tables work together, which path answers a question, or how to disambiguate two definitions. It should not repeat schema or joins the agent already receives in structured form.

We first used flat domains that mostly mirrored warehouse tables. It looked clean until a rule spanned three tables. Nested domains, including tableless ones, gave those rules an address. The join between customer activity and revenue stays in structured metadata; the domain context explains when crossing it is correct for the business question.

Let the agent navigate the same structure #

At query time, the agent starts with a small map of the root domains and a single tool to explore the context. It requests a path and receives that node’s context, children, table summaries, and metrics, then asks for detailed schemas and joins only when needed. This is model-directed routing over a deterministic graph, not a search or grep tool that hands back a pile of more or less relevant matches for the agent to sort through.

That makes placement matter. A table in the wrong domain is difficult to discover; an unassigned one is effectively invisible. Put everything at the root, and every question receives almost everything, which is how you get context rot: the more loosely relevant material sits in the window, the worse the model reasons over what actually matters.

In one internal test, constraining the agent to explore the relevant branches instead of giving it a full search tool reduced context-token usage by 45%.

Walking the graph also gives a guarantee that free exploration does not. To reach a low-level domain, the agent has to pass through the path that leads to it, usually its parent branch, sometimes a link from another document. Whatever it pulls arrives already framed by the context it traveled through, so a metric is never read in isolation from the domain that defines what it means.

Consider a user asking: “How many active customers did we have in France last month?”

Question
└── root rule detects that "active" is ambiguous
    └── agent asks which definition to use
        └── user selects "paid invoice in the previous 30 days"
            ├── retrieve customers/defining-active-customers
            ├── retrieve revenue and the relevant metric
            ├── inspect CUSTOMERS, INVOICES, and PAYMENTS
            ├── follow the structured joins and their cardinality
            └── generate SQL using the stored country value 'FR'

The answer can then cite the context paths and tables it used, alongside the generated SQL and context version. If the user disagrees, there is something concrete to inspect.

Ambiguity is not hypothetical here. On one test dataset, two reasonable definitions of “active” returned counts almost four times apart. In a five-question internal check, a root instruction to surface undefined qualifiers made the agent ask for clarification four times. Without it, it asked twice.

The boring details matter as much as the business definitions. If France is stored as 'FR'

, the context should record that mapping. Join cardinality belongs on the join. None of this replaces access control. The context an agent can retrieve, and the queries it can execute, still need to respect warehouse permissions, row-level policies, and restrictions on sensitive data.

Make the update path obvious #

Nobody wants another catalog that requires Friday afternoons of manual maintenance. If a column changes, a machine can detect it by inspecting the warehouse. If a conversation reveals a changed business definition, it can find the domain or metric that owns it, draft an edit, and show the diff to a reviewer.

The human stays in the loop for the decision, not the search. A model can propose that trials should no longer count as active customers. It should not make that company policy by itself.

The same tree/graph structure described above makes failures debuggable along a short, determined path. When the agent gets something wrong, the first question is whether the fact is in the context at all. If it is not, add it to the element that owns it. If it is there but the agent never reached it, the problem is upstream, in the domain description that should have routed it there, and the fix is to repair that pointer or move the fact somewhere it will be found. The failure points to one of the two, so you are not guessing where to look. It makes it way easier to optimize than RAG, specifically for each project, and more broadly as a system.

Published context versions keep the history and can be exported to Git as generated, readable files. Domain ownership helps route a review, but it does not settle every governance question. When Finance and Sales disagree, someone still needs the authority to decide.

This is also why we dropped our favorite abstraction. We had built a logical entity layer with one user

concept over several country-specific tables. It looked elegant from the agent’s side. In practice, its fields mirrored the underlying tables and drifted from them. We had created a second, manually maintained copy of the schema.

That does not make reusable semantic models a bad idea. We still use governed metrics, nested business domains, and shared context. We removed a free-floating layer that repeated physical fields without a reliable way to synchronize them. The structure is now anchored to inspectable physical tables; the manually governed layer focuses on what the warehouse cannot tell us.

In one implementation, 55% to 90% of the handwritten context, depending on the domain, repeated information already available in the schema. Larger context windows would have fitted every copy. They would not have told the agent which copy the company considered current.

Make the context auditable #

Why optimize context for humans if an agent is the primary reader? Because this is the governed record of what the company considers true for that agent: what counts as an active customer, what Finance means by net revenue, which identifier to use. It is not the operational truth for every dashboard. It is authoritative for the agents consuming it.

We initially stored much of this in structured definitions with lots of fields. Very tidy. Also difficult to review. Moving the connective business meaning into plain markdown made contradictions easier to see and changes easier to discuss.

The tree also becomes a data-discovery interface. Someone exploring revenue can see its tables and metrics, then open measuring-revenue

to understand how invoices and payments fit together. They do not need the correct table name before they start. Useful when your warehouse contains 1,400 tables.

People should not have to browse the tree before asking a question. But when an answer matters, they should be able to inspect the context version, retrieved domains and tables, metric, SQL, and execution time. That record is what you check against. Without it, there is no way to reconstruct what the agent actually retrieved, so there is no real way to verify the answer.

Author once, generate the rest #

We did not arrive at this cleanly.

In one test the agent kept counting customers with the wrong identifier, which inflated every total by about 3%. Stating the rule in plain English was not enough. It still slipped on some queries.

What fixed it was adding a worked example that showed the correct join. To be sure the agent always saw both, we copied the rule and the example into three places: a root rule, the table, and the metric.

The answers came out right, and we had just broken our own one-home rule, because the same instruction now lived in three copies we had to keep in sync by hand.

That is what changed how we think about redundancy. The copies were not the problem. A rule in plain English often fails on a specific case, and a worked example that pins it down is what makes the agent reliable, even though the example restates the rule. The problem was that we wrote those copies by hand, which is where drift starts.

So the real test is not how many times a fact appears, it is where the fact is authored. Author it once, in the place that owns it. If it should also show up as a root rule, a table note, or an example query, those copies should be generated from that one source, not typed out again. We are not there yet. But we can already ask three things of any fact: where is it authored, how are its other appearances produced, and can someone see why the agent used it?

None of this is unique to us. While we were converging on it, Google Cloud published the Open Knowledge Format, a vendor-neutral spec that represents knowledge as a directory of markdown concepts linked into a graph, with the authoring side deliberately kept separate from the consuming side. The overlap is not an accident. Once you treat an agent as something that reads context the way it reads a wiki, you end up wanting addresses, links, and a clean split between who writes a fact and who reads it. What an analytics agent needs on top is the structured part: governed metrics, joins with cardinality, and anchoring to inspectable physical tables.

That is the part of context engineering we underestimated. Giving an agent more information is easy. Giving every fact a single address, an update path, and a visible role in the answer is the actual work.

source & further reading

blog.getcassis.com — original article

Context engineering for analytics agents: six months of building and rebuilding

Our current context structure: a navigation tree #

Let the agent navigate the same structure #

Make the update path obvious #

Make the context auditable #

Author once, generate the rest #

Run your AI side-project on zahid.host