{"slug": "beyond-the-semantic-layer-building-a-context-layer-for-the-agentic-era", "title": "Beyond the Semantic Layer: Building a Context Layer for the Agentic Era", "summary": "A new context layer called ktx aims to improve AI agent accuracy for database queries by combining hard semantics (warehouse schema, joins, metric definitions) with soft semantics (business context from docs, wikis, Notion) into a git-reviewed, governed system. This approach gives agents a warm start, reducing errors and query costs compared to relying solely on metadata from git repos or cold database connections.", "body_md": "Writing SQL was never the hard part. Making it *accurate and trustworthy* against your warehouse always was. Point an AI agent like Claude or Codex at your data stack and ask a real analytics question, and the answer is usually mediocre: the agent can scrape some context from your git repos or whatever metadata it can find, but it doesn’t know your joins, your metric definitions, or the business rules that give a number its actual meaning.\n\nSo how do we make data agents reliable and accurate for database queries? Everyone talks about harnesses, evals, and context layers, but the real challenge is bringing them together with data engineering and the context you already have, such as database schemas, a semantic layer, metric definitions, plus the business knowledge that normally never reaches the agent.\n\nThat’s the question this blog tackles: how agents can work with the data stack and analytics, and how a context layer fits in. 
We also take an inside look at **ktx**, a new context layer that reads from the usual sources but also the less obvious ones (Markdown, Notion, etc.), driven by agentic workers.\n\nThe idea is to pull two kinds of knowledge into one reviewable place: the hard semantics (your warehouse schema, joins, and metric definitions as YAML and SQL) and the soft semantics (the business context living in docs, wikis, and Notion that agents usually never see). Both are committed to git and reviewed like code, so a human stays in the loop while agents get a *warm start* instead of a cold database connection. The payoff: more accurate answers with fewer (and cheaper) queries against the warehouse.\n\n[Figure: sources (warehouse, BI tools, Notion + wiki, docs + Markdown) feed a context layer that is auto-built and reviewed in git, split into hard semantics (YAML + SQL the warehouse runs) and soft semantics (Markdown the team reads), which serves an AI agent accurate, governed SQL.]\n\nA context layer ingests your warehouse, BI tools, and docs into hard and soft semantics, then serves governed SQL to your agent.\n\nEvery business or data analyst faces mediocre results when prompting Claude or Codex on their data stack. It might figure out some context by reading the git repos or other metadata it can find. Still, the hard part is teaching the internals and business context that are unique to each business domain and company. So, how do we make AI agents for data and **analytics reliable and accurate for database queries**? Do we need to manually copy and paste Google Docs and Markdown files into the prompts, or can a context layer provide a more reliable, safe, and governed way to ask an AI assistant for analytics?\n\nThese days, it’s much easier to ingest or add almost any number of new data sources with custom-built ETL data pipelines, whereas before we had to make hard decisions about what to include and what not. 
An AI agent, such as Claude or Codex, can merge multiple data pipelines that access the source database via CLIs and reach destinations via MCP, API, or CLI. But we still need an API and a repeatable process for updating source data, not a one-off load. We need to test and verify that the data is correct, potentially more than ever.\n\nThe challenge remains modeling the data in a way that represents what the business is, making sure data flows fast but is also correct. And any AI assistant is only as good as the context we give it, and how easily it can read and express context, metrics, models, and knowledge. So, does the context layer solve these problems?\n\nThe context layer primarily supports the **accuracy of SQL queries, continuous updates to the business context, and governance**. Additionally, with a newer context layer, we can include relevant business insights that are stored internally, usually in unstructured form, as business documentation in tools like Notion. Traditionally, this data wasn’t included because it wasn’t a “pure data source”, but it can be helpful for decision-making, especially when processed in an agentic way. Every bit of **human-written data can help the agent make better decisions**. So the ultimate problem we solve is to include more useful data through an agentic engineering workflow that is faster and more streamlined, especially when getting started.\n\nThe core is to turn warehouse metadata, BI tool definitions, query history, docs, and approved metric definitions into reviewable files that agents can search and execute. We feed valuable metadata and actual data to agents and humans so that both can discover it more easily in one place.\n\nIt’s a further evolution of a semantic layer, such as Cube or AtScale, which focuses more on modeling metrics and domain knowledge and usually knows less about documentation and about metadata beyond metrics, joins, and source data schema. 
And building a semantic layer from scratch can take time, as we need to extract metrics and unify them in a single repository in an additional layer. The main driver for a context layer is that it’s automatically generated and includes **additional business knowledge** that lives outside the technical containers of data engineers, such as data catalogs, DDLs, or YAMLs, in tools like Confluence, Notion, and internal wikis.\n\nI have written about how to model data in the agentic era, [Data Modeling for the Agentic Era (Semantics, Speed, and Stewardship)](https://www.rilldata.com/blog/data-modeling-for-the-agentic-era-semantics-speed-and-stewardship), and we’ve seen how metrics and context can strengthen a BI tool or insights. The context layer is all about context that we already have from our database catalogs (information_schema, data types, tables), more technical metadata, but also from our metrics inside the BI tools or in a declarative YAML, while adding new sources such as documentation in Markdown, rich text, fetched via API, CLI, etc. All with the goal of giving agents and LLMs more information to make the right calls autonomously.\n\nMaxime Beauchemin calls documentation, wikis, and Markdown soft semantics, whereas YAML and the SQL metrics are hard facts or hard semantics. More soft semantics, such as business context from domain experts, help agents, and we need to make them available to other agents and humans too, building a **full context layer** that grows fast, where **governance is essential**.\n\nGoogle describes [How Looker’s semantic layer enhances gen AI trustworthiness](https://cloud.google.com/blog/products/business-intelligence/how-lookers-semantic-layer-enhances-gen-ai-trustworthiness), and a semantic layer *can* reduce errors for generative AI and autonomous agents creating queries. 
They say:\n\n> With the foundation of a semantic layer, rather than asking an LLM to write SQL code against raw tables with ambiguous field names (e.g., `order.sales_sku_price_US`), the LLM is empowered to do what it excels at: searching through clearly defined business objects within LookML (e.g., `Orders > Total Revenue`). These objects can include metadata and human-friendly descriptions (e.g., “The sum of transaction amounts or total sales price”).\n\nWe also acknowledge that data modeling for agents is not the same as data modeling for humans, and that shapes how we work with agents and context together.\n\nIn the end, it also saves money: more governance means the agent knows the system in more detail and therefore needs fewer queries against the actual database, or gets there in fewer iterations.\n\nQueries on Snowflake, BigQuery, and Fabric can be expensive, and agents work at a different speed than humans. If we give the agents more context, they need fewer queries because the information can be extracted from the retrieved context.\n\nThe agents have access to all this data, either directly or through CLI, skills, or MCP. The big advantage is that the agents get a warm start and do not begin with a cold database connection. All the warehouse schema, business semantics, BI usage patterns, and human documentation are **in one single place**.\n\nOn the other end, verifying that the queries are correct is hard. More context can help with this, too.\n\n`ktx`: Open-source Context Layer\n\nKaelio launches its new open-source context layer solution called [ktx](https://github.com/Kaelio/ktx). The goal is to get an OSS context layer solution that everyone can use for the future of analytics.\n\nIt started as an AI data agent platform, with the agent performing well on accuracy benchmarks against other platforms, which helped make the case for developing it further. 
Dashboards and agent interfaces are easier to build than ever, but these data apps need more durable and complete knowledge from all the company’s data. This extended data added with ktx makes the agent **more accurate**.\n\n[!info] Want to get started immediately?\n\nYou can quickstart with a demo project that fetches data from Postgres and Notion, with a dbt project and a Metabase interface. See [Quick Start with public demo data], or check out the [GitHub repo] for the full code. If you already have a project with relevant context and sources, get started with a simple prompt: `Run npx skills add Kaelio/ktx --skill ktx and use the ktx skill to install and configure ktx`. More at the [Quickstart docs].\n\nSo how do we enhance business context for agents to make use of it?\n\nIngestion happens through two parallel items:\n\nIf we look at a high-level overview, compared to a common semantic layer or logical layers in BI tools, you can build a **knowledge layer** around your business, with not only structured data but also unstructured knowledge from your company-wide internal documentation on Notion, or your data in Markdown or a git repository. ktx helps navigate and maintain this business knowledge.\n\nBoth of these can substantially enhance your “context”, basically critical business insights that are usually internal or in a format that can’t be easily used. The challenge here is to separate current, accurate documentation from outdated material. That takes a stricter documentation process: assigning an owner, keeping the latest updates, and going back to invalidate docs that are no longer relevant, all to **improve the context quality**.\n\n*A knowledge layer wraps structured data and unstructured company knowledge into one place agents can use. 
See the full video.*\n\nSmart agents such as Claude, Codex, or any other agent that gets context have built-in tools to understand business metrics. Combined with this newly ingested unstructured knowledge, they can help us get accurate responses against our warehouse or across our various data sources, and act like a human curator of data sets and entities for BI dashboards and apps.\n\n*Agents pair their built-in reasoning with ingested unstructured knowledge to answer questions across your sources. See the full video.*\n\nIf you compare raw Claude Code with the ktx semantic layer component¹ and its extended context, you get faster and more correct answers with ktx. Here’s one such example:\n\nktx has basically two connected sides: the first **builds and maintains** the context layer, and the second **serves that context** to agents at runtime, which you see above.\n\nThe anatomy of a context layer is based on two file types and two jobs. **YAML** for what the warehouse can execute. **Markdown** for what the team needs to interpret it. Both are committed to git and reviewed like code.\n\nThe wiki is the self-organizing collection of ingested Markdown files. And for the YAML files, ktx has a similarly self-organizing executable semantic layer.\n\n`ktx ingest` pulls raw data from your data stack into a context layer, where you can also edit and create additional context manually. Additionally, you plug ktx into your agent of choice, which allows that agent to send extra memories for ingestion to keep the context up to date based on actual source data.\n\nThe semantic part is a set of functions that define metrics precisely, like “monthly revenue” or “profit”, so when you ask your agents, ktx knows exactly how to pull the data and answer the query with SQL, running it consistently.\n\nktx is open source and you can install it with `npm install -g @kaelio/ktx`. 
It has a **self-improving context layer** that teaches agents how to query your warehouse accurately, from approved metric definitions, joinable columns, and business knowledge it builds and maintains for you.\n\nWith the ktx CLI you get **context as code**, writing wiki pages and semantic-layer definitions as git-based files you can review, diff, and merge. This is to update and enhance current context (documentation) you have in your company, and provide it to your agents as well. Git-based, so humans and agents can collaborate on it by editing the same context.\n\nSo how does it work? Here are the most common commands you can run and use. We don’t go through the whole [CLI reference](https://docs.kaelio.com/ktx/docs/cli-reference/ktx), but cover the most important ones to get a feel for how to work with the `ktx` CLI. Six commands cover the loop from `check status → discover → verify → serve`.\n\n[!note] Setup: Demo data\n\nThis assumes you have set up ktx and want to work with it. If you want to follow along, or build up your own, check the [quickstart with public data] Kaelio provides.\n\n`ktx status`\n\nFirst, get the status, which tells you which connections, LLM, and embeddings are live before you do anything else.\n\n```\nktx status\n```\n\n```\n  LLM           claude-code · sonnet  ✓\n  Embeddings    all-MiniLM-L6-v2 (384d)  ✓\n  Storage       sqlite (state) · sqlite-fts5 (search)\n\n  Connections (4)\n    ✓ orbit      postgres\n    ✓ metabase   metabase\n    ✓ dbt        dbt\n    ✓ notion     notion\n```\n\n`ktx sl`\n\n`sl` is the **semantic layer**, the YAML definitions that tell agents how to turn “revenue” into correct SQL. Bare `ktx sl` lists every source and you can add a query to search. 
Add `--json` when an agent (not a human) is reading.\n\n```\nktx sl                       # list all semantic sources\nktx sl \"revenue\" --json      # search, machine-readable\nktx sl \"int_active_contract_arr\" --json\n```\n\nThe JSON result is rich: a relevance `score`, a `snippet`, and `matchReasons` showing *why* it matched (lexical, token, semantic). Trimmed:\n\n```\n{\n  \"name\": \"int_active_contract_arr\",\n  \"path\": \"semantic-layer/orbit/_schema/orbit_analytics.yaml#int_active_contract_arr\",\n  \"columnCount\": 5,\n  \"score\": 0.036,\n  \"matchReasons\": [\"lexical\", \"token\"]\n}\n```\n\nThis is what makes an agent *cheaper and more accurate* as it finds the right metric from context instead of guessing field names against your warehouse.\n\n`validate` + `query --format sql`\n\nWhen you edit a metric, always test first against the live schema with the following command:\n\n```\nktx sl validate mart_revenue_daily --connection-id orbit\n```\n\n`validate` catches missing columns, bad joins, and unsafe source names, so you fix the YAML *before* it hits production data. Then compile the SQL and **read it** before running anything:\n\n```\nktx sl query \\\n  --connection-id orbit \\\n  --measure mart_revenue_daily.total_net_revenue \\\n  --dimension mart_revenue_daily.revenue_date \\\n  --format sql\n```\n\nSwap `--format sql` → `--execute --max-rows 100` once the SQL looks right. The pattern is always: *compile, eyeball, then run.*\n\n`ktx wiki`\n\nMetrics are the *hard* facts. The wiki is the **soft semantics** we were talking about, such as the business definitions, rules, and gotchas in Markdown that agents search for context. 
Same shape as `sl`: bare lists, query searches.\n\n```\nktx wiki \"new hire\"\n```\n\n```\n  GLOBAL (4 pages)\n    #1  new-hire-onboarding-requirements  — what every new hire must know by week one\n    #2  orbit-arr-methodology             — contract-first ARR definition + NRR treatment\n    #3  orbit-company-overview            — what Orbit sells, plan tiers, workflow\n    #4  implementation-handoff-process    — Sales Ops → CS handoff requirements\n```\n\nWhen [sqlite-fts5](https://docs.kaelio.com/ktx/docs/cli-reference/ktx-wiki#options) is configured, wiki search is\n\n`ktx sql`\n\nNeed to peek at the actual rows? `ktx sql` runs **read-only** SQL (single `SELECT`/`WITH` only) against a connection.\n\n```\nktx sql --connection orbit \"select count(*) from orbit_analytics.int_procurement_qualifying_actions\"\n```\n\n`metabase` connection has no native scan connector. Supported drivers are `sqlite`, `postgres`, `mysql`, `clickhouse`, `sqlserver`, `bigquery`, and `snowflake`.\n\n`relation \"public.orders\" does not exist` usually means you’re guessing the schema, which is exactly the problem the semantic layer (`ktx sl`) exists to solve.\n\nMost of the time you want `ktx sl query`, not raw `ktx sql`. Reach for `sql` only when you genuinely need rows the semantic layer doesn’t model yet.\n\n`ktx mcp` + `ktx setup`\n\nThis is what your business users or engineers might use most often, connected with actual agents. Start the MCP server, then wire up your agent of choice:\n\n```\nktx mcp start\nktx setup --agents --target claude-code\n```\n\n`setup` drops a `.mcp.json` into your project and installs the analytics skill. 
Next time Claude Code opens in that directory, it picks up the `ktx` MCP server and starts answering questions *with your governed context*, calling `ktx sl` and `ktx wiki` under the hood instead of hallucinating field names.\n\n```\n● Skill(ktx-analytics)\n  Called ktx 8 times\n\n● Here are the metrics defined in the orbit semantic layer:\n  mart_arr_daily         arr                  ARR ($) — contract-first\n  mart_revenue_daily     total_net_revenue    gross − credits − refunds\n  mart_customer_health   at_risk_customer_count   distinct high-risk accounts\n  ...\n```\n\nThe agent discovered every metric on its own. No copy-pasted schema, no guesswork.\n\nRecapping the full workflow in one loop:\n\n```\nktx status              # 1. is it wired up?\nktx sl \"revenue\"        # 2. discover metrics\nktx sl validate …       # 3. verify against schema\nktx wiki \"ARR\"          # 4. pull in business context\nktx mcp start           # 5. serve it to your agent\n```\n\nWhen you’re done, shut the runtime down cleanly:\n\n```\nktx admin runtime stop\n```\n\n**Go deeper:** [Building Context](https://docs.kaelio.com/ktx/docs/guides/building-context) · [Writing Context](https://docs.kaelio.com/ktx/docs/guides/writing-context) · [Reviewing Context](https://docs.kaelio.com/ktx/docs/guides/reviewing-context) · [Serving Agents](https://docs.kaelio.com/ktx/docs/guides/serving-agents) · [LLM configuration](https://docs.kaelio.com/ktx/docs/guides/llm-configuration)\n\nThe pipeline runs from reading configured source connectors (1) for databases, BI tools, modeling code, docs, and notes, to building and extracting the context (2) of each, to reconciliation (3), which creates new semantics and metrics from it, to validation (4), which checks references before agents rely on them.\n\nThe result is a wiki and semantic layer that is referenced and self-improving, and this is how the semantic layer part looks within ktx:\n\nWrapping up, I know this has been a lot of new information, but I’m very excited about the direction of [ktx](https://github.com/Kaelio/ktx) and about integrating logical layers such as metrics and semantic layers with more advanced business context, with everything fully open source for you to try out.\n\nBut the challenge that natural language is too imprecise for semantic layers is real. A context layer with both hard and soft semantics defined will tremendously help the conversational interface that agents provide. More context doesn’t always help, but if it’s well maintained and fetched from the actual source, or pulls in actual domain knowledge from Notion pages, it’s priceless.\n\nAlso, being able to automate commands via the CLI and quickly add or fix context in a central repository is super helpful. 
This keeps the human in the loop and provides a consistent way to verify a governed layer that holds the company’s context while using the power of agentic engineering.\n\nIf you like this, try ktx with the [getting started guide](https://docs.kaelio.com/ktx/docs/getting-started/quickstart), and [star it on GitHub](https://github.com/Kaelio/ktx). It’s all open source.\n\n```\nFull article published at Kaelio.com - written as part of my services\n```\n\n¹ The wiki component helps, too, but the semantic layer is key for the speedup.", "url": "https://wpnews.pro/news/beyond-the-semantic-layer-building-a-context-layer-for-the-agentic-era", "canonical_source": "https://www.ssp.sh/blog/agentic-context-layer/", "published_at": "2026-06-11 06:00:08+00:00", "updated_at": "2026-06-22 08:10:17.347378+00:00", "lang": "en", "topics": ["artificial-intelligence", "ai-agents", "ai-infrastructure", "large-language-models", "generative-ai"], "entities": ["ktx", "Claude", "Codex", "Notion"], "alternates": {"html": "https://wpnews.pro/news/beyond-the-semantic-layer-building-a-context-layer-for-the-agentic-era", "markdown": "https://wpnews.pro/news/beyond-the-semantic-layer-building-a-context-layer-for-the-agentic-era.md", "text": "https://wpnews.pro/news/beyond-the-semantic-layer-building-a-context-layer-for-the-agentic-era.txt", "jsonld": "https://wpnews.pro/news/beyond-the-semantic-layer-building-a-context-layer-for-the-agentic-era.jsonld"}}