Open Knowledge Format: I Already Do This, and Now It Has a Name

wpnews.pro

TLDR

- Google Cloud shipped the Open Knowledge Format (OKF): a directory of Markdown files with YAML frontmatter that any agent can read and any human can edit. No SDK, no runtime, no database.
- I already do something like this in every coding repo. Markdown files the agent reads before it works, frontmatter on top, linked together. OKF is that pattern with a name and a tiny spec.
- The part I actually like is the frontmatter. One required field (`type`), a handful of optional ones, and now the files are queryable instead of just readable.
- This is semantic memory for agents, standardized. It’s the same idea as `CLAUDE.md` and `AGENTS.md`, pushed one step toward “portable between tools.”
- A format is worth more than another knowledge service. You can write OKF by hand today without installing anything, and that’s the whole point.

I already keep my knowledge in Markdown

I don’t write much code by hand anymore. I delegate to a coding agent and review the output. And the single thing that decides whether that goes well is what the agent knows about my project before it starts.

So every repo I work in has the same shape. A `CLAUDE.md` at the root. A `docs/` folder with architecture notes, conventions, gotchas, the stuff that lives in the heads of the people who’ve been here a while. The agent reads those files, then does the work. When it learns something new, it writes it back. Plain Markdown, checked into git, sitting next to the code it describes.

This works really well. It’s simple, it’s versioned, and I can read it myself in any editor. No vendor, no lock-in, no “open the catalog UI to see the metadata.”

And I’ve gone further than a bare `docs/` folder. On some projects I ran brainmaxxing: a Markdown vault the agent reads at session start and writes back to when it learns something. It ships a little learning loop: `/reflect` after a session, `/ruminate` over old conversations, `/meditate` to prune stale notes. And because the vault is also an Obsidian vault, I can open the agent’s memory and read it like my own notes. The OKF post names this exact species in passing: “Obsidian vaults wired to coding agents.” It worked really well. But it was bespoke. My folder names, my conventions, my skills: none of it was built to travel to your repo or a different agent.

The thing I never had was a format. My `docs/` folder and your `docs/` folder both look like Markdown, but they don’t agree on anything. What fields go on top of a file? What does `index.md` mean? Nothing was written down. It just worked because I was the only one reading it.

That’s the gap Open Knowledge Format is trying to fill, and after reading the spec I think it’s a good move.

What OKF actually is

OKF is a directory of Markdown files. Each file is one concept: a table, a dataset, a metric, a runbook, an API, whatever you want to capture. The file path is the concept’s identity.

sales/
├── index.md
├── datasets/
│   └── orders_db.md
├── tables/
│   ├── orders.md
│   └── customers.md
└── metrics/
    └── weekly_active_users.md

Every concept file has a small block of YAML frontmatter and a Markdown body. The frontmatter holds the few fields that need to be queryable; the body holds everything else.

---
type: BigQuery Table
title: Orders
description: One row per completed customer order.
resource: https://console.cloud.google.com/bigquery?...&t=orders
tags: [sales, revenue]
timestamp: 2026-05-28T14:30:00Z
---


| Column        | Type   | Description                              |
|---------------|--------|------------------------------------------|
| `order_id`    | STRING | Globally unique order identifier.        |
| `customer_id` | STRING | FK to [customers](/tables/customers.md). |


Joined with [customers](/tables/customers.md) on `customer_id`.

That’s it. Concepts link to each other with normal Markdown links, which turns the folder into a graph of relationships, not just a tree. Two filenames are reserved: `index.md` for an overview at each level, and `log.md` for a running history of changes. The whole v0.1 spec fits on one page.
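
To make the frontmatter-plus-links idea concrete, here’s a minimal sketch of what a consumer might do with one concept file: split off the frontmatter, then collect the Markdown links that point at other concepts. The function names `parse_concept` and `linked_concepts` are my own, not part of OKF, and the parser is a deliberate simplification that only understands flat `key: value` pairs and inline `[a, b]` lists. Anything beyond a toy would use a real YAML parser.

```python
import re

# Frontmatter is the block between the first two `---` lines of the file.
FRONTMATTER_RE = re.compile(r"\A---\n(.*?)\n---\n?(.*)\Z", re.DOTALL)
# Inline Markdown links whose target ends in .md, e.g. [customers](/tables/customers.md)
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)\s]+\.md)\)")


def parse_concept(text):
    """Split a concept file into (frontmatter dict, Markdown body).

    Simplified on purpose: flat `key: value` pairs and inline lists only.
    """
    match = FRONTMATTER_RE.match(text)
    if match is None:
        return {}, text
    raw, body = match.groups()
    meta = {}
    for line in raw.splitlines():
        if not line.strip() or ":" not in line:
            continue
        # Split on the first colon only, so URLs and timestamps stay intact.
        key, _, value = line.partition(":")
        value = value.strip()
        if value.startswith("[") and value.endswith("]"):
            value = [item.strip() for item in value[1:-1].split(",") if item.strip()]
        meta[key.strip()] = value
    return meta, body


def linked_concepts(body):
    """Return the sorted, de-duplicated .md link targets found in a body."""
    return sorted(set(LINK_RE.findall(body)))


EXAMPLE = """---
type: BigQuery Table
title: Orders
tags: [sales, revenue]
timestamp: 2026-05-28T14:30:00Z
---

Joined with [customers](/tables/customers.md) on `customer_id`.
"""

meta, body = parse_concept(EXAMPLE)
print(meta["type"], meta["tags"], linked_concepts(body))
```

Run that over every file in a bundle and you get nodes (paths) and edges (links), which is all the “graph” really is.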

If you’ve used Obsidian, or written a `CLAUDE.md`, or kept an `AGENTS.md`, the shape is immediately familiar. That’s deliberate. OKF isn’t inventing a new thing, it’s writing down the small set of conventions that make the existing thing interoperable.

Note

OKF v0.1 comes from the Google Cloud Data Cloud team, and the reference tooling is data-flavored: an agent that walks a BigQuery dataset and drafts a concept file per table, plus a static HTML visualizer. Ignore the BigQuery framing if it’s not your world. The format is just Markdown and frontmatter, and nothing about it is tied to a database or a cloud.

The frontmatter is the part I like

The body of these files I already write. Markdown is Markdown. The piece OKF adds that I didn’t have is the agreed-upon frontmatter.

OKF requires exactly one field on every concept: `type`. Everything else is optional: `title`, `description`, `resource`, `tags`, `timestamp`. That’s a good ratio. One thing you must do, a handful of things you can do, and the body is yours.
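
A rule this small is easy to check mechanically. Here’s a rough sketch of a linter for it, assuming the frontmatter has already been parsed into a dict. `check_frontmatter` and `extra_fields` are names I made up for illustration, not OKF tooling, and I’m only reporting fields outside the listed set, not claiming the spec forbids them.

```python
# The optional fields named in the post; `type` is the only required one.
KNOWN_OPTIONAL = {"title", "description", "resource", "tags", "timestamp"}


def check_frontmatter(meta):
    """Return a list of problems; an empty list means the concept passes."""
    problems = []
    type_value = meta.get("type")
    if not isinstance(type_value, str) or not type_value.strip():
        problems.append("missing required field: type")
    return problems


def extra_fields(meta):
    """Return field names that are neither `type` nor a listed optional field."""
    return sorted(set(meta) - KNOWN_OPTIONAL - {"type"})


good = {"type": "BigQuery Table", "title": "Orders", "owner": "data-team"}
bad = {"title": "Orders"}

print(check_frontmatter(good), extra_fields(good))
print(check_frontmatter(bad))
```

Note what the check doesn’t do: it never inspects which value `type` holds. The set of types is yours to define.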

Why does this matter? Because frontmatter makes the files queryable without parsing prose. An agent can filter “every concept where `type: metric`” or “everything tagged `revenue`” or “what changed since this `timestamp`” without reading the bodies first. It’s the same trick I use on this blog. Every post has frontmatter with `title`, `description`, `tags`, `pubDatetime`, and that’s what powers the tag pages and the related-posts logic. The body is for humans; the frontmatter is for the machine.
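
Those three filters fit in a few lines once the frontmatter is in memory. This is a sketch under assumptions: `query` is a hypothetical helper, the bundle is represented as a dict of path to parsed frontmatter, and apart from the `orders.md` fields taken from the example earlier, the sample values are invented for illustration.

```python
from datetime import datetime


def parse_ts(value):
    """Parse an ISO 8601 timestamp; older Pythons reject a trailing 'Z'."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def query(concepts, type_=None, tag=None, since=None):
    """Return sorted paths whose frontmatter matches every filter given."""
    results = []
    for path, meta in concepts.items():
        if type_ is not None and meta.get("type") != type_:
            continue
        if tag is not None and tag not in meta.get("tags", []):
            continue
        if since is not None:
            ts = meta.get("timestamp")
            # Concepts without a timestamp can't be shown to have changed.
            if ts is None or parse_ts(ts) <= parse_ts(since):
                continue
        results.append(path)
    return sorted(results)


# Already-parsed frontmatter, keyed by path. Only orders.md mirrors the
# example in this post; the other two entries are hypothetical.
CONCEPTS = {
    "tables/orders.md": {
        "type": "BigQuery Table",
        "tags": ["sales", "revenue"],
        "timestamp": "2026-05-28T14:30:00Z",
    },
    "tables/customers.md": {"type": "BigQuery Table", "tags": ["sales"]},
    "metrics/weekly_active_users.md": {
        "type": "metric",
        "tags": ["growth"],
        "timestamp": "2026-04-01T00:00:00Z",
    },
}

print(query(CONCEPTS, type_="metric"))
print(query(CONCEPTS, tag="revenue"))
print(query(CONCEPTS, since="2026-05-01T00:00:00Z"))
```

None of these calls touches a Markdown body, which is the point: the agent narrows the set first and only then spends context on prose.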

I keep coming back to one line from the OKF principles: it defines the interoperability surface, not the content model. It tells you the file must declare a `type`. It does not tell you what types exist or what your tables should say. That restraint is exactly right. The places I’ve seen “metadata standards” fail are the ones that try to model your whole domain for you.

This is semantic memory, standardized

If you read my post on the four types of memory for AI agents, OKF slots in cleanly. It’s semantic memory: the agent’s stable knowledge about your project that’s true independent of any single session. Facts, conventions, schemas, join paths. Loaded in so the agent doesn’t relearn your project every single time.

The academic version of semantic memory always reaches for vector databases and knowledge graphs. But in practice, the thing that actually ships is Markdown files. OKF agrees, and then goes one further: it says the Markdown should carry frontmatter and link to other Markdown, so the flat pile of facts becomes a navigable graph.

The two reserved filenames map onto patterns I already lean on:

- `index.md` is progressive disclosure. The agent reads the index for a folder, sees what’s there, and only opens the specific concept files it needs. That’s the same move I argued for in Stop Bloating Your CLAUDE.md: don’t dump everything into context, give the agent a lightweight map and let it pull the detail on demand.
- `log.md` is a slice of episodic memory. A chronological record of what changed and when, sitting right next to the knowledge it describes.
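
The index is cheap to generate from frontmatter you already have. Here’s a sketch that renders one folder’s `index.md` as a list of links with one-line summaries. Treat the layout as my assumption: the post only says `index.md` is an overview at each level, so the heading, the bullet shape, and the `render_index` name are all illustrative.

```python
def render_index(folder, concepts):
    """Render an index body: one linked line per concept file in a folder.

    `concepts` maps a filename to its already-parsed frontmatter dict.
    """
    lines = [f"# {folder}", ""]
    for filename in sorted(concepts):
        meta = concepts[filename]
        # Fall back to the filename when a concept has no title.
        title = meta.get("title", filename)
        description = meta.get("description", "")
        entry = f"- [{title}]({filename})"
        if description:
            entry += f": {description}"
        lines.append(entry)
    return "\n".join(lines) + "\n"


index_text = render_index(
    "tables",
    {
        "orders.md": {
            "title": "Orders",
            "description": "One row per completed customer order.",
        },
        "customers.md": {},
    },
)
print(index_text)
```

The agent reads those few lines, decides it only needs `orders.md`, and never pays for the rest of the folder.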

So OKF isn’t a new idea so much as the convergence point of three things I already do: a `CLAUDE.md`-style knowledge file, progressive disclosure through an index, and frontmatter for queryability. Seeing them written down as one small spec is what makes it click.

Tip

Andrej Karpathy framed the why of this whole pattern best in his LLM Wiki gist: “LLMs don’t get bored, don’t forget to update a cross-reference, and can touch 15 files in one pass.” The bookkeeping that makes humans abandon their personal wikis is exactly the work LLMs are good at. Let the agent maintain the knowledge base; you curate it.

A format beats another service

The OKF post makes one argument I fully agree with: the answer here is a format, not a platform.

A format is something anyone can produce without an SDK, anyone can consume without an integration, and that survives moving between tools and orgs because it’s just files in version control. A platform is something you log into, that owns your data, that you integrate against. Every catalog vendor today ships the second thing: its own API, its own schema, its own export format, and your knowledge is locked behind whichever surface created it.

OKF is the first thing. A bundle of OKF documents is just Markdown, just files, just YAML. Readable in any editor, renderable on GitHub, shippable as a tarball, mountable on any filesystem. The producer and the consumer are independently swappable: a human writes a bundle, an agent reads it, or an export pipeline writes it and a visualizer reads it. The file is the contract.

This is why I’m bullish even though the obvious objection is right there. Yes, it’s “another standard,” and yes, it comes from a giant cloud vendor. But the value of a knowledge format comes from how many parties speak it, not from who wrote it down first. And the cost of trying it is basically zero, because you don’t install anything. You rename a couple of conventions in a folder you probably already have.

What I’d actually do with it

I’m not going to rewrite my repos overnight, and you shouldn’t either. But here’s what’s worth doing today:

- Add frontmatter to the docs you already keep. Even just `type`, `title`, `description`, `tags`, `timestamp`. It costs nothing and makes the files queryable.
- Add an `index.md` per folder. Give the agent a map before it opens individual files. This pays off immediately, format or no format.
- Keep a `log.md` for the areas that change a lot. A running changelog the agent reads and appends to.
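
The first step is the easiest to script. A minimal sketch, assuming you want to prepend a frontmatter block to docs that don’t have one yet and leave the rest alone: `add_frontmatter` is a hypothetical helper, the `runbook` type is just an example value, and the function does no YAML quoting, so values containing characters like `:` or `#` would need more care.

```python
def add_frontmatter(text, type_, title=None, tags=None):
    """Prepend a frontmatter block unless the text already starts with one."""
    if text.startswith("---\n"):
        # Already has frontmatter: leave the file untouched.
        return text
    lines = ["---", f"type: {type_}"]
    if title:
        lines.append(f"title: {title}")
    if tags:
        lines.append("tags: [" + ", ".join(tags) + "]")
    lines.append("---")
    return "\n".join(lines) + "\n\n" + text


doc = "Deploys go through the staging environment first.\n"
updated = add_frontmatter(doc, "runbook", title="Deploy", tags=["ops"])
print(updated)
```

Calling it a second time on `updated` returns the text unchanged, so it’s safe to run across a whole `docs/` folder more than once.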

None of that requires “adopting OKF.” It’s just good hygiene for a Markdown knowledge base that an agent reads. The format is what lets your folder and my folder cooperate later, if the ecosystem of producers and consumers actually shows up. That’s the open question. v0.1 is a starting point, not a finished standard, and a format only earns its name if enough tools speak it.

Wrapping up

- OKF is a directory of Markdown files with YAML frontmatter, designed as portable knowledge for AI agents. No SDK, no runtime, lives in git.
- I already run this pattern in every repo. The new part is having an agreed format, especially the frontmatter, which makes the files queryable instead of just readable.
- It’s semantic memory for agents, with `index.md` for progressive disclosure and `log.md` for history. The same moves I use in my `CLAUDE.md` setup, written down.
- A format beats another knowledge service, because anyone can produce and consume it without an integration. The catch is adoption: a format is only worth as much as the number of tools that speak it.

If you keep an `AGENTS.md` or a `docs/` folder your agent reads before it works, you’re already 90% of the way to an OKF bundle. Throw some frontmatter on top and you’re there. Are you keeping your agent’s knowledge as plain files yet, or is it still locked in someone’s catalog?

source & further reading

- alexop.dev: original article
- How VS Code Copilot Chat Compacts Your Conversation
- The Four Types of Memory for AI Agents (and How Claude Code Implements Each)
- Build Your Own Eval Harness from Scratch with Bun and Claude -p