Source: dev.to

Google Open Knowledge Format: Why Enterprise Agents Need a Knowledge Layer, Not Just More Tools

Google Cloud's Open Knowledge Format (OKF) proposes a simple yet powerful approach to enterprise knowledge management by combining markdown content with structured metadata. The format aims to make organizational knowledge accessible to both humans and AI agents, addressing the fragmentation that hinders enterprise AI adoption. A developer argues that enterprise agentic AI frameworks require a dedicated knowledge layer, beyond models and tools, to ensure agents operate with accurate business context.

12 min read · Published Jun 18, 2026

Most enterprise AI conversations still start in the wrong place.

They start with the model.

Which model should we use? Which framework should we adopt? Which vendor has the best agent platform? Which tools should we connect next?

These are fair questions. But in real enterprise architecture, they are not the hardest questions.

The harder question is this:

Can our AI systems actually understand how our business works?

That is why Google Cloud’s article on Open Knowledge Format caught my attention. The article talks about a simple but important idea: representing knowledge in a way that humans can read and machines can use. In OKF, that means markdown for the content and structured metadata for context.

At first glance, that may sound too simple.

But that simplicity is the point.

Enterprises do not need another place where knowledge goes to die. We already have enough portals, catalogs, wikis, dashboards, folders, and internal tools. What we need is a practical way to package knowledge so it can be reviewed, versioned, governed, searched, and reused by both people and AI agents.

That is where this idea becomes very relevant for agentic AI.

Most organizations already have the knowledge their AI agents need.

They have it in databases, dashboards, tickets, architecture notes, runbooks, Confluence pages, data catalogs, code comments, incident reports, old project documents, and the heads of experienced employees.

The issue is not that knowledge does not exist.

The issue is that it is fragmented.

Some of it is outdated. Some of it is duplicated. Some of it is tribal. Some of it is locked inside tools. Some of it is written for humans but not structured enough for AI systems to use reliably.

This becomes a serious problem when we move from AI assistants to AI agents.

An assistant can give a helpful answer. An agent does more. It plans, selects tools, queries systems, executes steps, generates outputs, and sometimes triggers workflows.

That means the cost of wrong context is much higher.

A data agent may know how to generate SQL. But does it know which table is the source of truth?

A finance agent may calculate revenue. But does it know whether the business means booked revenue, invoiced revenue, recognized revenue, or collected cash?

A support agent may summarize a customer case. But does it know what customer information must be masked before anything is shared externally?

A delivery agent may review project status. But does it understand governance rules, escalation paths, release gates, and dependency risks?

A cloud cost agent may recommend savings. But does it know which environments are production-critical and which ones are safe to shut down?

Without this context, agents do not become enterprise-ready. They become fast, confident, and risky.

One common mistake in agentic AI is assuming that more tool access means better capability.

Connect the database.

Connect the CRM.

Connect the ticketing system.

Connect the cloud APIs.

Connect the document repository.

Connect the workflow engine.

This improves reach, but not necessarily judgment.

An agent with many tools and weak context can still choose the wrong source, apply the wrong rule, query the wrong table, expose the wrong field, or automate the wrong step.

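That is why I believe every serious enterprise agentic AI framework needs three layers:

- A model layer that provides reasoning.
- A tool and orchestration layer that provides execution.
- A knowledge layer that provides enterprise context: definitions, rules, ownership, and caveats.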

Most teams are investing heavily in the first two. They are testing models, orchestration frameworks, prompts, tools, APIs, and agent workflows.

That work is needed.

But the third layer is where enterprise differentiation will come from.

The model can be changed.

The tools can be integrated.

But the organization’s internal knowledge — its definitions, operating rules, business logic, exceptions, ownership, architecture, and lessons learned — is unique.

That is the real asset.

What I like about the Open Knowledge Format idea is that it does not overcomplicate the problem.

It treats knowledge as something that should be readable, portable, structured, and maintainable.

Markdown makes it easy for humans to read and contribute. Structured metadata makes it easier for systems and agents to classify, retrieve, and use the knowledge. Version control makes review and audit possible.

This matters because traditional documentation is passive.

Someone writes it. Someone may read it. Eventually, it becomes stale.

Agentic AI needs active knowledge.

The knowledge has to be available at runtime. It should help the agent decide what to do, what not to do, which source to trust, which rule to apply, and when to escalate.

A database schema may say that a table has a column called `segment`.

That is useful, but not enough.

The agent also needs to know what `segment` actually means to the business.

This is the gap between data access and enterprise intelligence.

In our agentic AI framework, I would treat an OKF-like structure as the Enterprise Knowledge Layer.

This layer should sit between enterprise systems and agent execution.

The agent should not jump directly from a user request to a tool call. That is where many mistakes happen.

A better flow is:

User request
   ↓
Agent identifies intent and domain
   ↓
Agent retrieves relevant knowledge
   ↓
Agent checks source of truth, ownership, caveats, access rules, and usage guidance
   ↓
Agent plans the action
   ↓
Agent calls the right tool
   ↓
Agent produces the answer or executes the workflow
   ↓
Agent proposes a knowledge update if reusable learning is found

This changes the quality of execution.
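To make the ordering concrete, here is a minimal Python sketch of the "retrieve and check before acting" steps. It assumes intent and domain detection has already happened, and the in-memory `KNOWLEDGE` index, the `KnowledgeEntry` type, the draft entry, and `handle_request` are all hypothetical; a real framework would search an index built from approved knowledge files.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class KnowledgeEntry:
    id: str
    approval_status: str
    agent_usage: str
    guidance: str


# Hypothetical in-memory index keyed by domain; a real system would search
# approved knowledge files by intent, domain, and entity.
KNOWLEDGE = {
    "sales": KnowledgeEntry(
        id="sales.crm.customer_account",
        approval_status="approved",
        agent_usage="allowed_with_row_level_controls",
        guidance="Do not use for audited revenue reporting.",
    ),
    "support": KnowledgeEntry(
        id="support.kb.draft_runbook",  # hypothetical, not yet reviewed
        approval_status="draft",
        agent_usage="allowed_with_masking",
        guidance="Mask sensitive fields before sharing.",
    ),
}


def handle_request(domain: str, call_tool: Callable[[KnowledgeEntry], str]) -> str:
    """Retrieve and check knowledge before any tool is called."""
    entry: Optional[KnowledgeEntry] = KNOWLEDGE.get(domain)
    if entry is None:
        return f"escalate: no knowledge for domain '{domain}'"
    if entry.approval_status != "approved":
        return f"escalate: {entry.id} is not approved"
    if not entry.agent_usage.startswith("allowed"):
        return f"refuse: agent usage for {entry.id} is '{entry.agent_usage}'"
    # Only now does the agent plan and call a tool, with the entry as context.
    return call_tool(entry)
```

Every failure path returns an escalation or a refusal instead of a guess, which is the behavior the flow above asks for.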

Take a simple question:

“Show me revenue by customer segment.”

A weak agent will search for tables with names like `revenue`, `customer`, and `segment`, then generate SQL.

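A stronger enterprise agent will first check the knowledge layer: which revenue definition the business means, which table is the approved source of truth for it, and how `segment` is defined.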

Only after that should it query the database.

That is the difference between automation and governed intelligence.
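The difference between the weak and the stronger agent can be shown with a toy Python example. The table names and `approved_for` purposes are hypothetical; the point is that name matching picks whatever contains "revenue", while a knowledge lookup picks only a source approved for the purpose and returns nothing when that is missing or ambiguous.

```python
# Hypothetical catalog built from knowledge files: table -> approved purposes.
CATALOG = {
    "crm_revenue_snapshot": {"approved_for": set()},
    "finance_recognized_revenue": {"approved_for": {"revenue_reporting"}},
    "sales.customer_account": {"approved_for": {"account_segmentation"}},
}


def weak_pick(keyword: str):
    """Name matching: return the first table whose name contains the keyword."""
    for table in CATALOG:
        if keyword in table:
            return table
    return None


def strong_pick(purpose: str):
    """Knowledge lookup: return the single table approved for this purpose."""
    matches = [t for t, k in CATALOG.items() if purpose in k["approved_for"]]
    # Zero or several approved sources means the agent should escalate, not guess.
    return matches[0] if len(matches) == 1 else None
```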

For AWS SQL environments such as Amazon RDS, Aurora, and Redshift, the starting point is metadata extraction.

We can automatically extract technical metadata such as schemas, tables, columns, and data types.

AWS Glue Crawlers and the Glue Data Catalog can help discover and centralize metadata. Database-native sources like `information_schema` can provide table- and column-level structure.

But metadata is not knowledge.

A pipeline can discover that a table has a column called `revenue_amount`. It cannot automatically know whether that means booked revenue, recognized revenue, invoiced revenue, or pipeline value. That meaning has to come from finance, sales operations, data owners, or approved documentation.

So OKF generation should be semi-automated.

Technical metadata should be generated automatically. Business meaning should be reviewed and approved by the right domain owners.
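One way to split that work is sketched below in Python, assuming a PostgreSQL-compatible engine such as Aurora PostgreSQL and rows already fetched from `information_schema.columns` through whatever database driver you use. The `draft_knowledge_files` function and its `schema.table` id scheme are hypothetical: it fills in only technical metadata and leaves business meaning as a placeholder with `approval_status: draft`, so nothing reaches agents until an owner reviews it.

```python
from collections import defaultdict

# Query for a PostgreSQL-compatible engine; excludes system schemas.
COLUMNS_QUERY = """
SELECT table_schema, table_name, column_name, data_type
FROM information_schema.columns
WHERE table_schema NOT IN ('pg_catalog', 'information_schema')
ORDER BY table_schema, table_name, ordinal_position
"""


def draft_knowledge_files(rows, system, database):
    """Turn (schema, table, column, data_type) rows into draft knowledge files.

    Only technical metadata is filled in; business fields stay placeholders
    and approval_status stays 'draft' until a domain owner reviews the file.
    """
    tables = defaultdict(list)
    for schema, table, column, data_type in rows:
        tables[(schema, table)].append((column, data_type))

    drafts = {}
    for (schema, table), columns in tables.items():
        file_id = f"{schema}.{table}"  # hypothetical id scheme
        lines = [
            "---",
            f"id: {file_id}",
            "kind: table",
            f"system: {system}",
            f"database: {database}",
            f"schema: {schema}",
            f"table: {table}",
            "owner: TBD",
            "approval_status: draft",
            "---",
            "",
            "## Columns",
            "",
        ]
        lines += [f"- {name} ({data_type})" for name, data_type in columns]
        lines += ["", "## Business Meaning", "", "TODO: written and approved by the domain owner."]
        drafts[file_id] = "\n".join(lines) + "\n"
    return drafts
```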

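For every critical SQL table, the knowledge file should capture ownership and stewardship, classification and PII status, freshness, source system, business meaning, agent usage guidance, and caveats.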

A table-level knowledge file should not simply describe the table. It should tell an agent how to use that table safely.

Example:

---
id: sales.crm.customer_account
kind: table
system: aurora-postgresql
cloud: aws
database: crm_prod
schema: sales
table: customer_account
domain: sales
owner: revenue-operations
steward: data-platform-team
environment: production
classification: confidential
pii: true
freshness_sla: "15 minutes"
source_system: salesforce
agent_usage: allowed_with_row_level_controls
approval_status: approved
---


## Business Meaning

This table represents customer account records synchronized from Salesforce into the CRM production database.

It is the approved source for account ownership, account segment, customer lifecycle stage, and sales territory mapping.

## Agent Usage Guidance

Use this table for customer account analysis, sales ownership, account segmentation, and lifecycle stage reporting.

Do not use this table for audited revenue reporting, invoice reconciliation, or financial close reporting.

For revenue reporting, use the approved finance revenue table.

## Important Caveats

Some legacy accounts may have missing segment values. Agents must not infer missing segment values without confirmation.

This table contains confidential customer information. Agents must apply row-level access controls and masking rules where applicable.

This is much more useful than a raw schema.

The schema tells the agent what exists.

The knowledge file tells the agent how to use it.
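Because the format is plain text, an agent runtime needs only a small parser to read it. The sketch below is a deliberately simplified Python reader that handles flat `key: value` front matter and `## ` sections, which is enough for the example above; it is an assumption, not part of any OKF specification, and a production pipeline would use a proper YAML parser such as PyYAML's `yaml.safe_load`.

```python
def parse_knowledge_file(text):
    """Split a knowledge file into flat front matter and '## ' sections.

    Simplification: handles only 'key: value' lines, not nested YAML or lists.
    """
    lines = text.strip().splitlines()
    if not lines or lines[0] != "---":
        raise ValueError("missing front matter")
    end = lines.index("---", 1)  # raises ValueError if the block is not closed

    meta = {}
    for line in lines[1:end]:
        key, _, value = line.partition(":")
        value = value.strip().strip('"')
        if value in ("true", "false"):
            value = value == "true"
        meta[key.strip()] = value

    sections = {}
    current = None
    for line in lines[end + 1:]:
        if line.startswith("## "):
            current = line[3:].strip()
            sections[current] = []
        elif current is not None and line.strip():
            sections[current].append(line.strip())
    # Join each section's paragraphs into one string for retrieval.
    return meta, {name: " ".join(parts) for name, parts in sections.items()}
```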

NoSQL systems need even more care.

In DynamoDB, the table name and attributes rarely tell the full story. The real design is usually in the access patterns.

A DynamoDB table may store multiple entity types. It may use composite keys. It may depend on global secondary indexes. It may be optimized for specific queries and unsuitable for others.

If an agent does not understand this, it can misuse the table, trigger inefficient scans, produce incomplete answers, or misunderstand the business process.

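For DynamoDB, the knowledge file should capture the partition and sort keys, secondary indexes, billing mode, stream settings, access patterns, usage guidance, and caveats.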

The most important part is access pattern documentation.

For example:

Access pattern 1:
Get all events for an order
PK = order_id
SK = event_timestamp

Access pattern 2:
Get latest order status
PK = order_id
Sort descending by event_timestamp
Limit = 1

Access pattern 3:
Investigate failed payment events
Use event_type index if available

This tells the agent how the table is actually meant to be used.

Example:

---
id: commerce.dynamodb.order_events
kind: nosql_table
system: dynamodb
cloud: aws
table: order_events
domain: commerce
owner: order-platform-team
environment: production
partition_key: order_id
sort_key: event_timestamp
billing_mode: PAY_PER_REQUEST
stream_enabled: true
classification: confidential
pii: false
agent_usage: allowed_read_only
approval_status: approved
---


## Business Meaning

This table stores the event history of customer orders.

Each item represents an event in the order lifecycle, such as order created, payment completed, shipment initiated, shipment delivered, cancellation requested, or refund completed.

## Primary Access Pattern

Retrieve the event timeline for a specific order.

PK = order_id  
SK = event_timestamp

## Agent Usage Guidance

Agents should use this table to reconstruct order history, check operational status, and investigate order workflow issues.

Agents should not use this table as the financial source of truth for revenue, refunds, or payment settlement.

## Caveats

This table is append-only.

The latest operational event should not be treated as financial completion. For accounting status, agents must check the finance ledger.

This prevents an agent from treating NoSQL like a normal relational model.
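The access patterns in this file can drive the request the agent actually sends. Below is a Python sketch, assuming the front matter has already been parsed into a dict and that `order_id` is stored as a string, that builds parameters for the DynamoDB `Query` API. `build_query` is a hypothetical helper; `ScanIndexForward=False` with `Limit=1` is how a DynamoDB query returns the item with the highest sort key, which matches access pattern 2.

```python
def build_query(knowledge: dict, key_name: str, key_value: str, latest_only: bool = False) -> dict:
    """Build low-level DynamoDB Query parameters from a table's knowledge file.

    Only `table` and `partition_key` are read from `knowledge`. A key that is
    not the documented partition key is rejected instead of falling back to a Scan.
    """
    if key_name != knowledge["partition_key"]:
        raise ValueError(
            f"{key_name} is not a documented access pattern for {knowledge['table']}"
        )
    params = {
        "TableName": knowledge["table"],
        "KeyConditionExpression": "#pk = :pk",
        "ExpressionAttributeNames": {"#pk": key_name},
        # Assumes the partition key is stored as a string attribute.
        "ExpressionAttributeValues": {":pk": {"S": key_value}},
    }
    if latest_only:
        # Access pattern 2: newest sort key (event_timestamp) first, one item.
        params["ScanIndexForward"] = False
        params["Limit"] = 1
    return params
```

The resulting dict can be passed to boto3's low-level client, for example `boto3.client("dynamodb").query(**params)`. Rejecting undocumented keys is a deliberate choice: it stops the agent from silently switching to a full table scan.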

For document databases such as Amazon DocumentDB or MongoDB-compatible systems, the main challenge is flexible structure and nested sensitive data.

A support case document may contain customer messages. A customer profile may contain personal data. A workflow document may include internal comments, commercial terms, or escalation notes.

Agents need clear rules before reading, summarizing, or exposing this type of content.

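For document collections, the knowledge file should capture classification, PII status, sensitive fields (including nested ones), masking expectations, usage guidance, and caveats.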

Example:

---
id: support.documentdb.customer_cases
kind: document_collection
system: documentdb
cloud: aws
database: support_prod
collection: customer_cases
domain: customer-support
owner: support-platform-team
environment: production
classification: restricted
pii: true
agent_usage: allowed_with_masking
approval_status: approved
---


## Business Meaning

This collection stores customer support cases raised through web, email, account manager, and internal escalation channels.

It is used to view case history, identify recurring issues, and prepare escalation summaries.

## Agent Usage Guidance

Agents can use this collection for internal case summaries, issue classification, support briefings, and next-action recommendations.

Agents must not expose raw customer messages externally without masking sensitive information.

## Sensitive Fields

- customer_email
- phone_number
- messages.message
- account_id
- internal_notes

## Caveats

Free-text messages may contain sensitive personal or commercial information. Agents should summarize the issue and intent rather than copying raw text.

This is not just documentation. It is a guardrail.
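A guardrail like the `## Sensitive Fields` list only helps if something enforces it. Here is a minimal Python sketch that masks the listed dotted paths in a case document before an agent summarizes it, assuming `messages` is a list of objects with a `message` field; the `mask_fields` helper and the `[REDACTED]` marker are hypothetical. It does not detect personal data inside free text, which is why the caveat still asks agents to summarize instead of copying.

```python
import copy

MASK = "[REDACTED]"


def mask_fields(doc, sensitive_paths):
    """Return a copy of `doc` with each dotted path replaced by MASK.

    When a path reaches a list, the rest of the path is applied to every
    element, so 'messages.message' masks the text of every message.
    """
    masked = copy.deepcopy(doc)  # never mutate the source document
    for path in sensitive_paths:
        _mask(masked, path.split("."))
    return masked


def _mask(node, parts):
    if isinstance(node, list):
        for item in node:
            _mask(item, parts)
        return
    if not isinstance(node, dict) or parts[0] not in node:
        return  # field absent in this document: nothing to mask
    if len(parts) == 1:
        node[parts[0]] = MASK
    else:
        _mask(node[parts[0]], parts[1:])
```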

I would not make this a manual documentation exercise.

That will not scale.

I would build a pipeline that generates draft knowledge files automatically, then routes critical content for human review.

A practical AWS-based pipeline would look like this:

AWS data sources
   ↓
Metadata extraction
   ↓
Profiling and classification
   ↓
Business enrichment
   ↓
OKF draft generation
   ↓
Human review and approval
   ↓
Git-based version control
   ↓
Indexing into agent knowledge retrieval
   ↓
Agent execution
   ↓
Feedback loop

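The extraction layer would pull from sources such as RDS and Aurora schemas, Redshift catalogs, DynamoDB keys and indexes, the Glue Data Catalog, tags, IAM policies, Lake Formation rules, and existing documentation.

The enrichment layer would add business meaning, ownership, usage guidance, and caveats, reviewed by the right domain owners.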

The governance layer would make sure the knowledge is trusted before agents rely on it.

This is important.

If we automate everything without review, we risk creating wrong knowledge at scale.

If we manually write everything, we will never scale.

The practical answer is auto-generation with human governance.
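The routing rule between auto-generation and human governance can be made explicit. The Python sketch below encodes one possible policy, which is an assumption rather than part of OKF: technical fields (the hypothetical `columns`, `row_count`, and `last_profiled_at`) may refresh automatically, while first versions, any other field change, and new columns in sensitive tables go to the domain owner.

```python
# Hypothetical set of fields a pipeline may refresh without review.
TECHNICAL_FIELDS = {"columns", "row_count", "last_profiled_at"}


def needs_human_review(meta: dict, changed_fields) -> bool:
    """Decide whether a regenerated knowledge file must go to its owner."""
    if meta.get("approval_status") != "approved":
        return True  # first version always goes to the owner
    if set(changed_fields) - TECHNICAL_FIELDS:
        return True  # anything beyond technical metadata
    # New columns in sensitive tables may carry PII; let the owner confirm.
    sensitive = meta.get("pii") is True or meta.get("classification") in {"confidential", "restricted"}
    return sensitive and "columns" in changed_fields
```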

Enterprise agents need trust boundaries.

Every knowledge file should have ownership and lifecycle metadata.

At minimum:

owner: finance-operations
steward: data-platform-team
classification: confidential
approval_status: approved
agent_usage: allowed_internal_only
last_reviewed_at: "2026-06-18"
next_review_due: "2026-09-18"
confidence: high

This gives the framework accountability.

If an agent uses a revenue definition, we should know who approved it.

If an agent queries a customer table, we should know whether it contains PII.

If an agent summarizes a support case, we should know what masking rules apply.

Governance is not bureaucracy here.

Governance is what allows agentic AI to move from demo to production.
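An agent runtime can enforce this lifecycle metadata before trusting a file. The Python sketch below, with a hypothetical `is_usable` function, treats unapproved files and files past `next_review_due` as unusable; the field names match the example above.

```python
from datetime import date


def is_usable(meta: dict, today: date):
    """Return (usable, reason) based on a knowledge file's lifecycle fields."""
    if meta.get("approval_status") != "approved":
        return False, "not approved"
    due = meta.get("next_review_due")
    if due is None:
        return False, "no review date"
    if date.fromisoformat(due) < today:
        return False, "review overdue since " + due
    return True, "ok"
```

Blocking on an overdue review is a strict choice; a softer variant could let the agent proceed but lower `confidence` and tell the user the knowledge is due for review.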

The best part of this approach is that the knowledge layer can improve over time.

Agents should not only consume knowledge. They should help identify gaps in it.

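For example, an agent may notice that a definition is missing, outdated, or contradicted by the data it queries.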

But agents should not directly overwrite approved knowledge.

They should propose updates.

The workflow should be:

Agent identifies reusable learning
   ↓
Agent creates OKF update proposal
   ↓
Domain owner reviews
   ↓
Approved change is merged
   ↓
Knowledge index is refreshed
   ↓
Future agents use improved context

This turns agentic AI into a learning operating model, not just task automation.
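The propose, review, and merge steps can be sketched in Python with an in-memory store. `KnowledgeStore`, `Proposal`, and the owner check are hypothetical; in a Git-based setup the proposal would typically be a pull request and the owner review a required approval. Refreshing the knowledge index is left out.

```python
from dataclasses import dataclass


@dataclass
class KnowledgeFile:
    id: str
    owner: str
    body: str
    version: int = 1


@dataclass
class Proposal:
    file_id: str
    proposed_body: str
    reason: str
    status: str = "proposed"


class KnowledgeStore:
    """Approved files are read-only to agents; changes go through proposals."""

    def __init__(self):
        self.files = {}
        self.proposals = []

    def propose(self, file_id, new_body, reason):
        # Agents can only record a proposal; the approved file is untouched.
        proposal = Proposal(file_id, new_body, reason)
        self.proposals.append(proposal)
        return proposal

    def review(self, proposal, reviewer, approve):
        kf = self.files[proposal.file_id]
        if reviewer != kf.owner:
            raise PermissionError("only the domain owner can review")
        if approve:
            kf.body = proposal.proposed_body
            kf.version += 1
            proposal.status = "merged"
        else:
            proposal.status = "rejected"
```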

I would not start by documenting the whole enterprise.

That sounds ambitious, but it is usually a bad execution plan.

It becomes a documentation program, not an AI acceleration program.

I would start with one high-value domain where correctness matters.

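Good candidates are domains like finance, sales operations, or customer support, where a wrong definition or an exposed field has real consequences.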

For the first MVP, I would select 10 to 20 high-value datasets and generate knowledge files around them.

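The MVP should include the full loop on a small scale: metadata extraction, draft knowledge files, owner review, indexing for retrieval, and at least one agent that uses them.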

The goal is not to create perfect documentation.

The goal is to make agents more accurate, more governed, and more useful.

The wrong metric is “number of OKF files created.”

That only measures documentation volume.

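The right metrics measure agent behavior: how often agents choose the correct source, apply the correct rule, avoid exposing sensitive fields, and escalate when knowledge is missing.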

The question should always be:

Did the knowledge layer make the agent better?

If not, we are just creating another documentation repository.
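That question can be turned into a number. The Python sketch below, with hypothetical function names and evaluation data, measures how often an agent picks the approved source for a fixed set of questions, with and without the knowledge layer; similar counters could track rule violations or exposed sensitive fields.

```python
from typing import Callable, Sequence, Tuple


def source_accuracy(cases: Sequence[Tuple[str, str]], pick: Callable[[str], str]) -> float:
    """Fraction of evaluation questions where `pick` chose the approved source."""
    if not cases:
        raise ValueError("need at least one evaluation case")
    hits = sum(1 for question, approved in cases if pick(question) == approved)
    return hits / len(cases)


def knowledge_layer_lift(cases, without_knowledge, with_knowledge) -> float:
    """Accuracy gain from the knowledge layer on the same question set."""
    return source_accuracy(cases, with_knowledge) - source_accuracy(cases, without_knowledge)
```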

Google Open Knowledge Format is not interesting because markdown and YAML are new.

They are not.

It is interesting because it points to one of the most important problems in enterprise AI: how to make organizational knowledge usable by agents without locking it inside one platform.

In an AWS environment, we already have many of the raw signals: RDS schemas, Aurora metadata, Redshift catalogs, DynamoDB keys and indexes, Glue Data Catalog, S3 exports, CloudWatch metrics, tags, IAM policies, Lake Formation rules, and existing documentation.

The opportunity is to convert these scattered signals into a governed Enterprise Knowledge Layer.

That layer becomes the memory and context foundation for agentic AI.

My view is simple:

Models give agents reasoning power.

Tools give agents execution power.

Knowledge gives agents enterprise judgment.

Without that knowledge layer, agentic AI will remain impressive in demos and fragile in production.

With it, enterprises can build agents that do not just act fast, but act correctly, safely, and in alignment with how the business actually works.
