{"slug": "google-open-knowledge-format-why-enterprise-agents-need-a-knowledge-layer-not", "title": "Google Open Knowledge Format: Why Enterprise Agents Need a Knowledge Layer, Not Just More Tools", "summary": "Google Cloud's Open Knowledge Format (OKF) proposes a simple yet powerful approach to enterprise knowledge management by combining markdown content with structured metadata. The format aims to make organizational knowledge accessible to both humans and AI agents, addressing the fragmentation that hinders enterprise AI adoption. A developer argues that enterprise agentic AI frameworks require a dedicated knowledge layer, beyond models and tools, to ensure agents operate with accurate business context.", "body_md": "Most enterprise AI conversations still start in the wrong place.\n\nThey start with the model.\n\nWhich model should we use? Which framework should we adopt? Which vendor has the best agent platform? Which tools should we connect next?\n\nThese are fair questions. But in real enterprise architecture, they are not the hardest questions.\n\nThe harder question is this:\n\nCan our AI systems actually understand how our business works?\n\nThat is why Google Cloud’s article on Open Knowledge Format caught my attention. The article talks about a simple but important idea: representing knowledge in a way that humans can read and machines can use. In OKF, that means markdown for the content and structured metadata for context.\n\nAt first glance, that may sound too simple.\n\nBut that simplicity is the point.\n\nEnterprises do not need another place where knowledge goes to die. We already have enough portals, catalogs, wikis, dashboards, folders, and internal tools. 
What we need is a practical way to package knowledge so it can be reviewed, versioned, governed, searched, and reused by both people and AI agents.\n\nThat is where this idea becomes very relevant for agentic AI.\n\nMost organizations already have the knowledge their AI agents need.\n\nThey have it in databases, dashboards, tickets, architecture notes, runbooks, Confluence pages, data catalogs, code comments, incident reports, old project documents, and the heads of experienced employees.\n\nThe issue is not that knowledge does not exist.\n\nThe issue is that it is fragmented.\n\nSome of it is outdated. Some of it is duplicated. Some of it is tribal. Some of it is locked inside tools. Some of it is written for humans but not structured enough for AI systems to use reliably.\n\nThis becomes a serious problem when we move from AI assistants to AI agents.\n\nAn assistant can give a helpful answer. An agent does more. It plans, selects tools, queries systems, executes steps, generates outputs, and sometimes triggers workflows.\n\nThat means the cost of wrong context is much higher.\n\nA data agent may know how to generate SQL. But does it know which table is the source of truth?\n\nA finance agent may calculate revenue. But does it know whether the business means booked revenue, invoiced revenue, recognized revenue, or collected cash?\n\nA support agent may summarize a customer case. But does it know what customer information must be masked before anything is shared externally?\n\nA delivery agent may review project status. But does it understand governance rules, escalation paths, release gates, and dependency risks?\n\nA cloud cost agent may recommend savings. But does it know which environments are production-critical and which ones are safe to shut down?\n\nWithout this context, agents do not become enterprise-ready. 
They become fast, confident, and risky.\n\nOne common mistake in agentic AI is assuming that more tool access means better capability.\n\nConnect the database.\n\nConnect the CRM.\n\nConnect the ticketing system.\n\nConnect the cloud APIs.\n\nConnect the document repository.\n\nConnect the workflow engine.\n\nThis improves reach, but not necessarily judgment.\n\nAn agent with many tools and weak context can still choose the wrong source, apply the wrong rule, query the wrong table, expose the wrong field, or automate the wrong step.\n\nThat is why I believe every serious enterprise agentic AI framework needs three layers:\n\n- Models, which give agents reasoning power\n- Tools, which give agents execution power\n- Knowledge, which gives agents enterprise judgment\n\nMost teams are investing heavily in the first two. They are testing models, orchestration frameworks, prompts, tools, APIs, and agent workflows.\n\nThat work is needed.\n\nBut the third layer is where enterprise differentiation will come from.\n\nThe model can be changed.\n\nThe tools can be integrated.\n\nBut the organization’s internal knowledge is unique: its definitions, operating rules, business logic, exceptions, ownership, architecture, and lessons learned.\n\nThat is the real asset.\n\nWhat I like about the Open Knowledge Format idea is that it does not overcomplicate the problem.\n\nIt treats knowledge as something that should be readable, portable, structured, and maintainable.\n\nMarkdown makes it easy for humans to read and contribute. Structured metadata makes it easier for systems and agents to classify, retrieve, and use the knowledge. Version control makes review and audit possible.\n\nThis matters because traditional documentation is passive.\n\nSomeone writes it. Someone may read it. Eventually, it becomes stale.\n\nAgentic AI needs active knowledge.\n\nThe knowledge has to be available at runtime. 
It should help the agent decide what to do, what not to do, which source to trust, which rule to apply, and when to escalate.\n\nA database schema may say that a table has a column called `segment`.\n\nThat is useful, but not enough.\n\nThe agent also needs to know:\n\n- What does `segment` mean?\n\nThis is the gap between data access and enterprise intelligence.\n\nIn our agentic AI framework, I would treat an OKF-like structure as the Enterprise Knowledge Layer.\n\nThis layer should sit between enterprise systems and agent execution.\n\nThe agent should not jump directly from a user request to a tool call. That is where many mistakes happen.\n\nA better flow is:\n\n```\nUser request\n   ↓\nAgent identifies intent and domain\n   ↓\nAgent retrieves relevant knowledge\n   ↓\nAgent checks source of truth, ownership, caveats, access rules, and usage guidance\n   ↓\nAgent plans the action\n   ↓\nAgent calls the right tool\n   ↓\nAgent produces the answer or executes the workflow\n   ↓\nAgent proposes a knowledge update if reusable learning is found\n```\n\nThis changes the quality of execution.\n\nTake a simple question:\n\n“Show me revenue by customer segment.”\n\nA weak agent will search for tables with names like `revenue`, `customer`, and `segment`, then generate SQL.\n\nA stronger enterprise agent will first check the knowledge layer for the source of truth, ownership, caveats, access rules, and usage guidance.\n\nOnly after that should it query the database.\n\nThat is the difference between automation and governed intelligence.\n\nFor AWS SQL environments such as Amazon RDS, Aurora, and Redshift, the starting point is metadata extraction.\n\nWe can extract technical metadata automatically.\n\nAWS Glue Crawlers and the Glue Data Catalog can help discover and centralize metadata. Database-native sources like `information_schema` can provide table- and column-level structure.\n\nBut metadata is not knowledge.\n\nA pipeline can discover that a table has a column called `revenue_amount`. 
It cannot automatically know whether that means booked revenue, recognized revenue, invoiced revenue, or pipeline value. That meaning has to come from finance, sales operations, data owners, or approved documentation.\n\nSo OKF generation should be semi-automated.\n\nTechnical metadata should be generated automatically. Business meaning should be reviewed and approved by the right domain owners.\n\nFor every critical SQL table, the knowledge file should capture:\n\nA table-level knowledge file should not simply describe the table. It should tell an agent how to use that table safely.\n\nExample:\n\n```\n---\nid: sales.crm.customer_account\nkind: table\nsystem: aurora-postgresql\ncloud: aws\ndatabase: crm_prod\nschema: sales\ntable: customer_account\ndomain: sales\nowner: revenue-operations\nsteward: data-platform-team\nenvironment: production\nclassification: confidential\npii: true\nfreshness_sla: \"15 minutes\"\nsource_system: salesforce\nagent_usage: allowed_with_row_level_controls\napproval_status: approved\n---\n\n# customer_account\n\n## Business Meaning\n\nThis table represents customer account records synchronized from Salesforce into the CRM production database.\n\nIt is the approved source for account ownership, account segment, customer lifecycle stage, and sales territory mapping.\n\n## Agent Usage Guidance\n\nUse this table for customer account analysis, sales ownership, account segmentation, and lifecycle stage reporting.\n\nDo not use this table for audited revenue reporting, invoice reconciliation, or financial close reporting.\n\nFor revenue reporting, use the approved finance revenue table.\n\n## Important Caveats\n\nSome legacy accounts may have missing segment values. Agents must not infer missing segment values without confirmation.\n\nThis table contains confidential customer information. 
Agents must apply row-level access controls and masking rules where applicable.\n```\n\nThis is much more useful than a raw schema.\n\nThe schema tells the agent what exists.\n\nThe knowledge file tells the agent how to use it.\n\nNoSQL systems need even more care.\n\nIn DynamoDB, the table name and attributes rarely tell the full story. The real design is usually in the access patterns.\n\nA DynamoDB table may store multiple entity types. It may use composite keys. It may depend on global secondary indexes. It may be optimized for specific queries and unsuitable for others.\n\nIf an agent does not understand this, it can misuse the table, trigger inefficient scans, produce incomplete answers, or misunderstand the business process.\n\nFor DynamoDB, the knowledge file should capture:\n\nThe most important part is access pattern documentation.\n\nFor example:\n\n```\nAccess pattern 1:\nGet all events for an order\nPK = order_id\nSK = event_timestamp\n\nAccess pattern 2:\nGet latest order status\nPK = order_id\nSort descending by event_timestamp\nLimit = 1\n\nAccess pattern 3:\nInvestigate failed payment events\nUse event_type index if available\n```\n\nThis tells the agent how the table is actually meant to be used.\n\nExample:\n\n```\n---\nid: commerce.dynamodb.order_events\nkind: nosql_table\nsystem: dynamodb\ncloud: aws\ntable: order_events\ndomain: commerce\nowner: order-platform-team\nenvironment: production\npartition_key: order_id\nsort_key: event_timestamp\nbilling_mode: PAY_PER_REQUEST\nstream_enabled: true\nclassification: confidential\npii: false\nagent_usage: allowed_read_only\napproval_status: approved\n---\n\n# order_events\n\n## Business Meaning\n\nThis table stores the event history of customer orders.\n\nEach item represents an event in the order lifecycle, such as order created, payment completed, shipment initiated, shipment delivered, cancellation requested, or refund completed.\n\n## Primary Access Pattern\n\nRetrieve the event timeline for a 
specific order.\n\nPK = order_id  \nSK = event_timestamp\n\n## Agent Usage Guidance\n\nAgents should use this table to reconstruct order history, check operational status, and investigate order workflow issues.\n\nAgents should not use this table as the financial source of truth for revenue, refunds, or payment settlement.\n\n## Caveats\n\nThis table is append-only.\n\nThe latest operational event should not be treated as financial completion. For accounting status, agents must check the finance ledger.\n```\n\nThis prevents an agent from treating NoSQL like a normal relational model.\n\nFor document databases such as Amazon DocumentDB or MongoDB-compatible systems, the main challenge is flexible structure and nested sensitive data.\n\nA support case document may contain customer messages. A customer profile may contain personal data. A workflow document may include internal comments, commercial terms, or escalation notes.\n\nAgents need clear rules before reading, summarizing, or exposing this type of content.\n\nFor document collections, the knowledge file should capture:\n\nExample:\n\n```\n---\nid: support.documentdb.customer_cases\nkind: document_collection\nsystem: documentdb\ncloud: aws\ndatabase: support_prod\ncollection: customer_cases\ndomain: customer-support\nowner: support-platform-team\nenvironment: production\nclassification: restricted\npii: true\nagent_usage: allowed_with_masking\napproval_status: approved\n---\n\n# customer_cases\n\n## Business Meaning\n\nThis collection stores customer support cases raised through web, email, account manager, and internal escalation channels.\n\nIt is used to view case history, identify recurring issues, and prepare escalation summaries.\n\n## Agent Usage Guidance\n\nAgents can use this collection for internal case summaries, issue classification, support briefings, and next-action recommendations.\n\nAgents must not expose raw customer messages externally without masking sensitive information.\n\n## Sensitive 
Fields\n\n- customer_email\n- phone_number\n- messages.message\n- account_id\n- internal_notes\n\n## Caveats\n\nFree-text messages may contain sensitive personal or commercial information. Agents should summarize the issue and intent rather than copying raw text.\n```\n\nThis is not just documentation. It is a guardrail.\n\nI would not make this a manual documentation exercise.\n\nThat will not scale.\n\nI would build a pipeline that generates draft knowledge files automatically, then routes critical content for human review.\n\nA practical AWS-based pipeline would look like this:\n\n```\nAWS data sources\n   ↓\nMetadata extraction\n   ↓\nProfiling and classification\n   ↓\nBusiness enrichment\n   ↓\nOKF draft generation\n   ↓\nHuman review and approval\n   ↓\nGit-based version control\n   ↓\nIndexing into agent knowledge retrieval\n   ↓\nAgent execution\n   ↓\nFeedback loop\n```\n\nThe extraction layer would pull from:\n\nThe enrichment layer would add:\n\nThe governance layer would make sure the knowledge is trusted before agents rely on it.\n\nThis is important.\n\nIf we automate everything without review, we risk creating wrong knowledge at scale.\n\nIf we manually write everything, we will never scale.\n\nThe practical answer is auto-generation with human governance.\n\nEnterprise agents need trust boundaries.\n\nEvery knowledge file should have ownership and lifecycle metadata.\n\nAt minimum:\n\n```\nowner: finance-operations\nsteward: data-platform-team\nclassification: confidential\napproval_status: approved\nagent_usage: allowed_internal_only\nlast_reviewed_at: \"2026-06-18\"\nnext_review_due: \"2026-09-18\"\nconfidence: high\n```\n\nThis gives the framework accountability.\n\nIf an agent uses a revenue definition, we should know who approved it.\n\nIf an agent queries a customer table, we should know whether it contains PII.\n\nIf an agent summarizes a support case, we should know what masking rules apply.\n\nGovernance is not bureaucracy 
here.\n\nGovernance is what allows agentic AI to move from demo to production.\n\nThe best part of this approach is that the knowledge layer can improve over time.\n\nAgents should not only consume knowledge. They should help identify gaps in it.\n\nFor example:\n\nBut agents should not directly overwrite approved knowledge.\n\nThey should propose updates.\n\nThe workflow should be:\n\n```\nAgent identifies reusable learning\n   ↓\nAgent creates OKF update proposal\n   ↓\nDomain owner reviews\n   ↓\nApproved change is merged\n   ↓\nKnowledge index is refreshed\n   ↓\nFuture agents use improved context\n```\n\nThis turns agentic AI into a learning operating model, not just task automation.\n\nI would not start by documenting the whole enterprise.\n\nThat sounds ambitious, but it is usually a bad execution plan.\n\nIt becomes a documentation program, not an AI acceleration program.\n\nI would start with one high-value domain where correctness matters.\n\nGood candidates are:\n\nFor the first MVP, I would select 10 to 20 high-value datasets and generate knowledge files around them.\n\nThe MVP should include:\n\nThe goal is not to create perfect documentation.\n\nThe goal is to make agents more accurate, more governed, and more useful.\n\nThe wrong metric is “number of OKF files created.”\n\nThat only measures documentation volume.\n\nThe right metrics are:\n\nThe question should always be:\n\nDid the knowledge layer make the agent better?\n\nIf not, we are just creating another documentation repository.\n\nGoogle Open Knowledge Format is not interesting because markdown and YAML are new.\n\nThey are not.\n\nIt is interesting because it points to one of the most important problems in enterprise AI: how to make organizational knowledge usable by agents without locking it inside one platform.\n\nIn an AWS environment, we already have many of the raw signals: RDS schemas, Aurora metadata, Redshift catalogs, DynamoDB keys and indexes, Glue Data Catalog, S3 exports, 
CloudWatch metrics, tags, IAM policies, Lake Formation rules, and existing documentation.\n\nThe opportunity is to convert these scattered signals into a governed Enterprise Knowledge Layer.\n\nThat layer becomes the memory and context foundation for agentic AI.\n\nMy view is simple:\n\nModels give agents reasoning power.\n\nTools give agents execution power.\n\nKnowledge gives agents enterprise judgment.\n\nWithout that knowledge layer, agentic AI will remain impressive in demos and fragile in production.\n\nWith it, enterprises can build agents that do not just act fast, but act correctly, safely, and in alignment with how the business actually works.", "url": "https://wpnews.pro/news/google-open-knowledge-format-why-enterprise-agents-need-a-knowledge-layer-not", "canonical_source": "https://dev.to/aws-builders/google-open-knowledge-format-why-enterprise-agents-need-a-knowledge-layer-not-just-more-tools-je1", "published_at": "2026-06-18 06:41:06+00:00", "updated_at": "2026-06-18 06:51:28.254547+00:00", "lang": "en", "topics": ["artificial-intelligence", "ai-agents", "large-language-models", "ai-infrastructure", "developer-tools"], "entities": ["Google Cloud", "Open Knowledge Format", "OKF"], "alternates": {"html": "https://wpnews.pro/news/google-open-knowledge-format-why-enterprise-agents-need-a-knowledge-layer-not", "markdown": "https://wpnews.pro/news/google-open-knowledge-format-why-enterprise-agents-need-a-knowledge-layer-not.md", "text": "https://wpnews.pro/news/google-open-knowledge-format-why-enterprise-agents-need-a-knowledge-layer-not.txt", "jsonld": "https://wpnews.pro/news/google-open-knowledge-format-why-enterprise-agents-need-a-knowledge-layer-not.jsonld"}}