cd /news/artificial-intelligence/building-a-trusted-semantic-layer-wi… · home topics artificial-intelligence article
[ARTICLE · art-40320] src=pub.towardsai.net ↗ pub= topic=artificial-intelligence verified=true sentiment=· neutral

Building a Trusted Semantic Layer with Snowflake Horizon Context

Snowflake Horizon Context enables a trusted semantic layer for enterprise AI by using domain-oriented Semantic Views, metric certification workflows, and governance at the meaning layer. Without such a layer, AI agents make decisions based on unvalidated definitions, leading to governance failures. The approach recommends starting with high-impact metrics like revenue, ARR, churn, and active users.

read14 min views1 publishedJun 26, 2026

In Part 1, we established why enterprise AI struggles: context fragmentation means the same question yields different answers depending on which system resolves it. Snowflake Horizon Context addresses this architecturally — but architecture without implementation is just a diagram on a whiteboard.

This article is where we build. We’ll design domain-oriented Semantic Views, implement metric certification workflows, establish governance at the meaning layer, define organizational responsibilities, and walk through the production patterns that separate a proof-of-concept from an enterprise deployment.

The central claim of this article: the semantic layer is the trust boundary for AI. If your metrics aren’t governed, versioned, and certified, then your AI agents are making financial decisions based on unvalidated definitions. That’s not an AI problem. That’s a governance failure.

A Semantic View is more than a database view with a description attached. It’s a governed business interface that declares:

When Cortex Context (the AI orchestration engine) needs to answer “What was Q3 revenue by region?”, it doesn’t scan raw tables and infer meaning. It reads the Semantic View, finds the certified total_revenue metric, understands that Q3 maps to the fiscal calendar starting in February, knows the join path to the customer dimension for regional breakdown, and generates a governed query.

Without Semantic Views, AI guesses. With them, AI knows.

The first design decision: how to partition your semantic layer.

We strongly recommend domain orientation over monolithic models:

Why domain orientation:

Architect Note:Start with revenue, ARR, churn, and active users. These four metrics typically generate the highest semantic confusion and the fastest ROI. Don’t try to model your entire business on day one.

Here’s a production-grade Semantic View for the Finance revenue domain. Every decision is annotated.

name: finance_revenuedescription: >  Certified revenue metrics for the Finance domain.  All revenue figures follow ASC 606 recognition rules.  Fiscal year starts February 1. All amounts in USD  after currency conversion at daily spot rates.tables:  - name: fct_revenue    base_table: analytics.marts.fct_revenue    description: >      Fact table containing recognized revenue transactions.      One row per revenue recognition event. Grain: transaction-level.    columns:      - name: revenue_id        description: Unique identifier for each revenue recognition event        data_type: VARCHAR      - name: recognized_date        description: >          Date revenue was recognized per ASC 606 rules.          This is NOT the booking date or invoice date.        data_type: DATE      - name: amount_usd        description: >          Revenue amount in USD. Converted from local currency          at daily spot rate on recognition date.        data_type: NUMBER(18,2)      - name: customer_id        description: Foreign key to dim_customer        data_type: VARCHAR      - name: product_line        description: >          Product categorization. Values: Platform, Services, Add-ons.          Platform = core subscription. Services = professional services.          Add-ons = optional product extensions.        data_type: VARCHAR      - name: is_recurring        description: >          TRUE if revenue is from a subscription contract.          FALSE for one-time services, implementation fees, overages.        data_type: BOOLEAN      - name: contract_id        description: Source contract identifier from billing system        data_type: VARCHAR  - name: dim_customer    base_table: analytics.marts.dim_customer    description: >      Customer dimension. One row per customer account.      Segmentation based on current ARR tier.    columns:      - name: customer_id        data_type: VARCHAR      - name: customer_name        data_type: VARCHAR      - name: segment        description: >          Enterprise: ARR > $100K.          Mid-Market: ARR $25K-$100K.          SMB: ARR < $25K.          Segments reassessed quarterly.        data_type: VARCHAR      - name: region        description: >          Geographic region based on billing address.          Values: AMER, EMEA, APAC, LATAM.        data_type: VARCHAR      - name: industry        description: NAICS-based industry classification        data_type: VARCHARrelationships:  - name: revenue_to_customer    left_table: fct_revenue    right_table: dim_customer    join_type: many_to_one    on:      - left_column: customer_id        right_column: customer_id    description: >      Every revenue event belongs to exactly one customer.      Orphan revenue records (customer_id IS NULL) are excluded      from all metrics and investigated by data quality team.metrics:  - name: total_revenue    description: >      Total recognized revenue in USD across all product lines.      Includes recurring and non-recurring. Excludes deferred      revenue, credits, and refunds (handled in separate metrics).    expression: SUM(fct_revenue.amount_usd)    certified: true    owner: finance-data-team@company.com    last_certified: "2025-12-15"  - name: recurring_revenue    description: >      Revenue from subscription contracts only (is_recurring = TRUE).      This is the numerator for MRR and ARR calculations.      Does NOT include overage charges even on subscription contracts.    expression: >      SUM(CASE WHEN fct_revenue.is_recurring = TRUE           THEN fct_revenue.amount_usd ELSE 0 END)    certified: true    owner: finance-data-team@company.com    last_certified: "2025-12-15"  - name: arr    description: >      Annual Recurring Revenue. Current month's recurring revenue      multiplied by 12. Point-in-time metric - always computed      from the most recent COMPLETE month. Do not use partial      month data. This is the company's primary growth metric      reported to the board.    expression: >      SUM(CASE WHEN fct_revenue.is_recurring = TRUE           THEN fct_revenue.amount_usd ELSE 0 END) * 12    default_filters:      - column: recognized_date        operator: between        value: "FIRST DAY OF PREVIOUS COMPLETE MONTH to LAST DAY OF PREVIOUS COMPLETE MONTH"    certified: true    owner: cfo-office@company.com    last_certified: "2025-12-01"  - name: mrr    description: >      Monthly Recurring Revenue. Same as recurring_revenue when      filtered to a single month. Reported monthly to investors.    expression: >      SUM(CASE WHEN fct_revenue.is_recurring = TRUE           THEN fct_revenue.amount_usd ELSE 0 END)    default_filters:      - column: recognized_date        operator: within        value: "MOST RECENT COMPLETE MONTH"    certified: true    owner: finance-data-team@company.com  - name: gross_margin    description: >      (Total Revenue - COGS) / Total Revenue. COGS sourced from      finance_costs semantic view. Cross-domain metric - requires      join to costs domain.    expression: >      (SUM(fct_revenue.amount_usd) - SUM(costs.cogs_amount)) /      NULLIF(SUM(fct_revenue.amount_usd), 0)    certified: true    owner: finance-data-team@company.com    dependencies:      - semantic_view: finance_costs        metric: total_cogs  - name: net_revenue_retention    description: >      NRR measures revenue retained and expanded from the existing      customer base. Trailing 12-month calculation. Denominator:      recurring revenue from customers active 12 months ago.      Numerator: current recurring revenue from those same customers.      Target: >110% for Enterprise, >100% for SMB.    expression: >      SUM(current_period_recurring) /      NULLIF(SUM(base_period_recurring), 0)    certified: true    owner: finance-data-team@company.comtime_dimensions:  - name: recognized_date    table: fct_revenue    column: recognized_date    time_grains: [day, week, month, quarter, year]    fiscal_year_start_month: 2    description: >      Primary time dimension. Fiscal year starts February 1.      Q1 = Feb-Apr, Q2 = May-Jul, Q3 = Aug-Oct, Q4 = Nov-Jan.filters:  - name: product_line    table: fct_revenue    column: product_line    description: Filter by product line (Platform, Services, Add-ons)  - name: customer_segment    table: dim_customer    column: segment    description: Filter by customer segment (Enterprise, Mid-Market, SMB)  - name: region    table: dim_customer    column: region    description: Filter by geographic region (AMER, EMEA, APAC, LATAM)  - name: industry    table: dim_customer    column: industry    description: Filter by customer industry

Why so much description text? Because Cortex Context reads these descriptions when resolving ambiguity. When a user asks “What’s our subscription revenue?”, Cortex Context needs enough context in the metric description to determine that recurring_revenue is the right metric — not total_revenue. Sparse descriptions force AI to guess.

Why explicit fiscal calendar rules? Because “Q3” means different things in different fiscal calendars. With February fiscal year start, Q3 is August through October. Without this declaration, Cortex Context would default to calendar Q3 (July-September) and return wrong numbers to the CEO.

Why declare join relationships explicitly? Because AI agents are notorious for generating incorrect multi-table joins. By declaring that fct_revenue.customer_id joins to dim_customer.customer_id as a many-to-one relationship, we prevent Cortex Context from ever generating a query that produces row duplication through a bad join.

Churn is the metric most likely to have competing definitions in your organization. Here’s how to handle it:

name: product_churndescription: >  Customer churn and retention metrics based on PRODUCT ACTIVITY.  A customer is churned if they have zero platform activity  (no logins, no API calls, no data processed) for 90 consecutive  days. This is the PRODUCT definition of churn.IMPORTANT: This differs from the SALES definition of churn  (contract non-renewal). For contract-based churn, see the  sales_churn semantic view owned by revenue-ops@company.com.  Both definitions are valid for different analytical purposes.  When in doubt, ask the user which definition they need.metrics:  - name: churned_customers    description: >      Count of customers with zero platform activity for 90+      consecutive days. Excludes: (1) customers in first 30 days      of onboarding, (2) accounts flagged as seasonal by Customer      Success, (3) accounts in planned migration/downtime.    expression: >      COUNT(DISTINCT CASE        WHEN days_since_last_activity > 90        AND days_since_signup > 30        AND NOT is_seasonal_account        AND NOT is_in_planned_downtime        THEN customer_id      END)    certified: true    owner: product-analytics@company.com    last_certified: "2025-11-20"  - name: churn_rate    description: >      Monthly gross churn rate. Denominator: customers active at      start of month. Numerator: customers who crossed the 90-day      inactivity threshold during that month. Expressed as a      percentage. Industry benchmark for B2B SaaS: <2% monthly.    expression: >      churned_customers_in_period /      NULLIF(active_customers_at_period_start, 0)    certified: true    owner: product-analytics@company.com

Notice the explicit callout: “This differs from the SALES definition.” This is intentional. When Cortex Context encounters a question about churn, it needs to understand that multiple valid definitions exist and which one to use — or when to ask the user for clarification.

A metric without certification is just an opinion with SQL. Certification is what transforms a metric from “something an analyst wrote” into “the organization’s agreed-upon truth.”

Before a metric can be certified:

CREATE OR REPLACE TABLE governance.semantic.metric_certifications (    metric_name VARCHAR NOT NULL,    semantic_view VARCHAR NOT NULL,    domain VARCHAR NOT NULL,    business_definition TEXT NOT NULL,    technical_expression TEXT NOT NULL,    certified_by VARCHAR NOT NULL,    certification_date TIMESTAMP_NTZ NOT NULL,    next_review_date TIMESTAMP_NTZ NOT NULL,    status VARCHAR NOT NULL DEFAULT 'draft',    deprecation_reason TEXT,    replacement_metric VARCHAR,    CONSTRAINT valid_status CHECK (status IN ('draft', 'in_review', 'certified', 'deprecated')));CREATE OR REPLACE TABLE governance.semantic.certification_audit (    audit_id VARCHAR DEFAULT UUID_STRING(),    metric_name VARCHAR NOT NULL,    semantic_view VARCHAR NOT NULL,    action VARCHAR NOT NULL,    performed_by VARCHAR NOT NULL,    action_timestamp TIMESTAMP_NTZ DEFAULT CURRENT_TIMESTAMP(),    previous_definition TEXT,    new_definition TEXT,    reason TEXT);

Production Reality:Certification isn’t a one-time event. It’s a cadence. We recommend quarterly reviews for all certified metrics. Market conditions change, business rules evolve, and what was true last quarter may not be true today. Build the review cadence into your governance council’s calendar from day one.

Traditional governance focuses on access: who can see which tables. Semantic governance extends this to meaning: who can define what data means, and how those definitions flow to consumers.

Level Question Traditional Approach Horizon Context Approach 1. Access Who can see the data? RBAC, row policies RBAC + semantic view permissions 2. Quality Is the data correct? DMFs, tests DMFs + semantic validation tests 3. Semantics What does it mean? Wiki pages, glossaries Certified Semantic Views 4. AI How is meaning consumed by AI? (nothing) Agent-metric binding, confidence thresholds

Most organizations operate at Levels 1–2. Horizon Context enables Levels 3–4 — which is where AI trust lives.

It’s not sufficient to control table access. You must control semantic definition access:

-- Governance role hierarchy for semantic assetsCREATE ROLE IF NOT EXISTS semantic_author;    -- Can create/modify views in devCREATE ROLE IF NOT EXISTS semantic_certifier; -- Can approve for productionCREATE ROLE IF NOT EXISTS semantic_consumer;  -- Can query certified viewsCREATE ROLE IF NOT EXISTS semantic_admin;     -- Full controlGRANT ROLE semantic_author TO ROLE analytics_engineering;GRANT ROLE semantic_certifier TO ROLE data_governance_council;GRANT ROLE semantic_consumer TO ROLE cortex_agents;GRANT ROLE semantic_admin TO ROLE data_platform_admin;

Access policies apply at the semantic layer — meaning AI agents automatically respect them without needing policy-aware prompts:

CREATE OR REPLACE ROW ACCESS POLICY governance.policies.region_accessAS (region VARCHAR) RETURNS BOOLEAN ->    IS_ROLE_IN_SESSION('FINANCE_ADMIN')    OR region = CURRENT_SESSION_CONTEXT('REGION');ALTER TABLE analytics.marts.fct_revenue    ADD ROW ACCESS POLICY governance.policies.region_access ON (region);

When the EMEA Sales Director asks the Finance Agent “What was Q3 revenue?”, they see $41.1M (EMEA only). When the CFO asks the same question, they see $142.3M (global). Same agent, same Semantic View, same metric — different access context. No prompt engineering required.

Clear ownership prevents the “everybody’s job is nobody’s job” failure mode:

| Activity             | Data Governance | Data Engineering | Analytics Engineering | BI Teams | AI Engineering | Security || -------------------- | :-------------: | :--------------: | :-------------------: | :------: | :------------: | :------: || Define Metrics       |        C        |         I        |           R           |     C    |        I       |     I    || Certify Metrics      |        C        |         I        |           I           |     R    |        I       |     I    || Build Semantic Views |        I        |         C        |           R           |     C    |        I       |     I    || Configure Connectors |        I        |         R        |           C           |     I    |        I       |     I    || Manage Lineage       |        C        |         R        |           C           |     I    |        I       |     I    || Build AI Agents      |        I        |         C        |           I           |     I    |        R       |     I    || Set Access Policies  |        C        |         I        |           I           |     I    |        I       |     R    || Monitor Usage        |        R        |         C        |           C           |     C    |        C       |     I    || Deprecate Metrics    |        A        |         I        |           R           |     C    |        I       |     I    |

Snowflake Horizon isn’t owned by a single team — it’s a team sport.

What happens: Team creates a single Semantic View with 200 metrics, 50 tables, and 30 relationships. It becomes unmaintainable. Nobody can certify it because nobody understands all of it.

Fix: Domain-oriented models. One Semantic View per business domain. Max 15–20 metrics per view. Cross-domain metrics reference other views explicitly.

What happens: Metric description says “SUM(amount_usd)” — which is the expression, not the definition. AI can’t distinguish this metric from any other sum.

Fix: Every description must answer: What’s included? What’s excluded? What are the boundary conditions? Who should use this vs. an alternative metric?

What happens: Semantic View doesn’t declare relationships. AI infers joins from column names (customer_id matches customer_id). Sometimes this produces Cartesian products on tables with composite keys.

Fix: Declare every relationship explicitly with join type (one-to-one, many-to-one, many-to-many). AI should never infer join paths.

What happens: Metric works correctly for “last month” but fails for “Q3” because fiscal calendar isn’t declared. AI defaults to calendar quarters.

Fix: Always declare fiscal year start month, supported time grains, and any custom period logic (e.g., 4–4–5 retail calendar).

What happens: Team certifies metrics before BI teams have validated them against existing dashboards. Inconsistencies emerge post-certification, eroding trust.

Fix: Certification requires sign-off from at least one consuming team. Run parallel: compute the metric from both the Semantic View and the existing BI tool. They must match within a defined tolerance before certification.

1. Version your semantic models in Git. Semantic Views are code. They need branches, pull requests, reviews, and rollback capability. Never edit production semantic views directly.

2. Test against closed books. For financial metrics, validate against the Finance team’s system of record after each month-end close. Automated CI tests should catch drift before it reaches production.

3. Deprecate gracefully. When a metric definition changes, don’t delete the old one. Mark it deprecated with a 30-day sunset window. Update the description to point to the replacement. Notify consuming teams.

4. Start narrow, go deep. It’s better to have 5 perfectly certified metrics than 50 draft metrics. Quality compounds trust. Quantity without governance compounds confusion.

5. Make AI agents ask, not guess. Configure confidence thresholds. When Cortex Context’s semantic resolution confidence is below 80% (e.g., ambiguous between two metrics), the agent should ask the user for clarification rather than choosing one silently.

Lesson Learned:The hardest conversation in semantic modeling isn’t about SQL. It’s getting the VP of Sales and the CFO to agree on one number. Plan for this. Budget calendar time for alignment meetings. The Semantic View will reflect whatever political resolution emerges — and that’s fine. The point is to make the resolution explicit and discoverable, not hidden in a spreadsheet somewhere.

Production Reality:After deploying semantic views for revenue metrics, one enterprise discovered 23 different calculated fields across Tableau workbooks that each defined “revenue” slightly differently. Seventeen of them were producing numbers that appeared in executive reports. The metadata connector surfaced these discrepancies in week one — before any AI agent was deployed. The governance value of connectors alone justified the project.

Architect Note:Don’t wait for perfect definitions to start building. Ship a “v1” Semantic View with your best understanding. Let consuming teams validate it. Iterate. Certification is the end of a process, not a prerequisite for starting. Draft views are useful — they just shouldn’t power production AI agents.

Next in Part 3: We activate the semantic layer for AI — building Cortex Agents, configuring Cortex Context resolution flows, integrating MCP for external AI systems, implementing DevOps for semantic assets, and deploying the full observability stack for production monitoring.

If this was useful — give it a clap 👏, share it with someone building on Snowflake, and subscribe to Snowflake Chronicles so you don’t miss the next one. Every piece is practitioner-first and production-focused. Let’s connect on LinkedIn.

Disclaimer: All views expressed are my own and do not reflect those of any current or former employer.

Follow Snowflake Chronicles for production-ready implementation guides, architecture deep dives, and practical lessons learned from enterprise data platforms.

Building a Trusted Semantic Layer with Snowflake Horizon Context was originally published in Towards AI on Medium, where people are continuing the conversation by highlighting and responding to this story.

── more in #artificial-intelligence 4 stories · sorted by recency
── more on @snowflake 3 stories trending now
sponsored brought to you by zahid.host 4,200+ EU-deployed projects
reading about agents? ship yours in a single git push.

Run your AI side-project on zahid.host

EU-based hosting, git-push deploys, automatic HTTPS, no cold starts. Free tier with a custom domain — perfect for shipping the agent you just read about.

$git push zahid main
Live at https://your-agent.zahid.host
Get free account → Pricing
from €0/mo · no card required
LIVE [news/building-a-trusted-s…] indexed:0 read:14min 2026-06-26 ·