{"slug": "googles-specification-and-tooling-for-the-llm-wiki", "title": "Google's specification (and tooling) for the LLM wiki", "summary": "Google has released the Open Knowledge Format (OKF), a vendor-neutral specification for representing knowledge as plain markdown files with YAML frontmatter, along with tooling including a reference agent and visualizer. The format aims to enable human and AI collaboration on knowledge artifacts using standard software engineering workflows like version control and pull requests.", "body_md": "This repository is primarily about the Open Knowledge Format (OKF). OKF is a universal, vendor-neutral format for representing knowledge as plain markdown files with YAML frontmatter. It is not tied to any particular agent, framework, model provider, or serving system. The goal is simple:\n\n- Anyone can produce OKF: humans authoring by hand, agents built on any framework (Google ADK, LangChain, custom), export pipelines from existing catalogs (Dataplex, Unity Catalog, Collibra, …), or scripts walking a database.\n- Anyone can serve and consume OKF: a static file server, a knowledge-management UI (Obsidian, Notion, MkDocs), an LLM loading files into context, a search index, or a graph viewer like the one bundled in this repo.\n\nThe agent below is a proof of concept demonstrating one way to produce OKF bundles automatically. The format itself is the contribution; this agent and the visualizer exist to make the format tangible at both ends: production and consumption.\n\nSee OKF in practice: three ready-to-browse bundles produced by this agent, checked into `bundles/`:\n\n- `bundles/ga4/`: GA4 e-commerce dataset (viz.html)\n- `bundles/stackoverflow/`: Stack Overflow public dataset (viz.html)\n- `bundles/crypto_bitcoin/`: Bitcoin blocks/transactions (viz.html)\n\nOKF represents catalog knowledge as plain markdown files with YAML frontmatter, organized in a directory hierarchy. 
That choice unlocks a few properties that are hard to get from a service-owned metadata store:\n\n- **Human- and agent-readable.** No SDK or query language stands between a reader and the content. An engineer can `cat` a concept; an LLM can ingest it verbatim into context.\n- **Version-controllable out of the box.** Bundles live in git. Pull requests, line-by-line diffs, blame, and review workflows just work: knowledge curation becomes a normal software-engineering activity.\n- **Portable and lock-in free.** A bundle is a directory. Ship it as a tarball, host it in any repo, mount it from any filesystem, or sync it to any system that speaks files. No proprietary API stands between you and your metadata.\n- **Mixes structured and unstructured data deliberately.** Use frontmatter for the few fields you want to query, filter, or index on (`type`, `resource`, `tags`, `timestamp`); use the markdown body for the prose, schemas, and example queries that LLMs and humans actually read.\n- **Minimally opinionated, freely extensible.** A small set of required keys ensures interoperability, but bundles can carry arbitrary extra frontmatter keys and arbitrary body sections without breaking consumers.\n- **Composes with existing tooling.** Many knowledge tools (Notion, Obsidian, MkDocs, Hugo, Jekyll) already speak markdown plus YAML frontmatter, so bundles can be browsed, edited, or rendered without custom UI.\n- **Progressive disclosure built in.** Auto-generated `index.md` files let an agent or human navigate the hierarchy one level at a time instead of loading the entire bundle into context.\n- **Graph-shaped, not just tree-shaped.** Concepts link to each other via normal markdown links, expressing relationships richer than the parent/child implied by the directory layout.\n\nThe net effect is that reference agents, consumption agents, and humans collaborate on the same artifacts in the same way they already collaborate on source code.\n\n```\npython3.13 -m venv .venv\n.venv/bin/pip 
install --index-url https://pypi.org/simple/ -e .[dev]\n```\n\n- BigQuery: `gcloud auth application-default login` plus a project for billing (`gcloud config set project <id>`). Public datasets are readable, but the caller's project is billed for query bytes.\n- Gemini: set `GEMINI_API_KEY` (AI Studio) **or** use Vertex AI by setting `GOOGLE_GENAI_USE_VERTEXAI=true`, `GOOGLE_CLOUD_PROJECT=<id>`, and `GOOGLE_CLOUD_LOCATION=<region>`.\n\nThe reference agent runs in two passes. The **BQ pass** writes one OKF\ndoc per concept the source advertises, using BigQuery metadata alone.\nThe **web pass** then runs the LLM as its own crawler: it receives a\nlist of seed URLs (provided via `--web-seed` or `--web-seed-file`),\nfetches the seeds via the `fetch_url` tool, and decides which outbound\nlinks are worth following based on whether they look like authoritative\ndocumentation for the existing concepts. For each page it fetches, the\nagent chooses to (a) enrich one or more existing concept docs, (b) mint\na standalone `references/<slug>` doc, or (c) skip. A hard\n`--web-max-pages` cap and a same-domain allowed-hosts filter\n(configurable via `--web-allowed-host`) are enforced inside the tool,\nso the agent cannot exceed either limit. Use `--no-web` to skip the web pass.\n\nMinimum invocation: point at a BigQuery dataset and a bundle output\ndirectory. Seeds for the web pass are explicit; omit them (or pass\n`--no-web`) to run BQ-only:\n\n```\n.venv/bin/python -m reference_agent enrich \\\n    --source bq \\\n    --dataset <project>.<dataset> \\\n    --web-seed-file <path/to/seeds.txt> \\\n    --out ./bundles/<name>\n```\n\nIterate on a single concept by adding `--concept <type>/<name>` (e.g.\n`--concept tables/events_`); repeatable.\n\nEach sample pairs a **recipe** (`samples/<name>/`, with the seed URLs and\nexact `enrich` command) with the **produced bundle** (`bundles/<name>/`)\nthat the recipe generated. 
Open the recipe to reproduce; open the bundle\nto browse the result directly.\n\n- **GA4 Google Merchandise Store**: public e-commerce dataset, seeded with canonical GA4 BigQuery Export documentation URLs. · [recipe](/GoogleCloudPlatform/knowledge-catalog/blob/main/okf/samples/ga4_merch_store/README.md) · [bundle](/GoogleCloudPlatform/knowledge-catalog/blob/main/okf/bundles/ga4) · [viz.html](/GoogleCloudPlatform/knowledge-catalog/blob/main/okf/bundles/ga4/viz.html)\n- **Stack Overflow**: public dataset (mirror of the Stack Exchange Data Dump), seeded with the community's canonical schema references. Exercises multi-concept enrichment from cross-cutting docs pages. · [recipe](/GoogleCloudPlatform/knowledge-catalog/blob/main/okf/samples/stackoverflow/README.md) · [bundle](/GoogleCloudPlatform/knowledge-catalog/blob/main/okf/bundles/stackoverflow) · [viz.html](/GoogleCloudPlatform/knowledge-catalog/blob/main/okf/bundles/stackoverflow/viz.html)\n- **Bitcoin (crypto)**: public dataset (blocks, transactions, inputs, outputs) from the `bitcoin-etl` pipeline. Exercises cross-table foreign-key relationships in prose. · [recipe](/GoogleCloudPlatform/knowledge-catalog/blob/main/okf/samples/crypto_bitcoin/README.md) · [bundle](/GoogleCloudPlatform/knowledge-catalog/blob/main/okf/bundles/crypto_bitcoin) · [viz.html](/GoogleCloudPlatform/knowledge-catalog/blob/main/okf/bundles/crypto_bitcoin/viz.html)\n\nThe `visualize` subcommand renders any OKF bundle as a **self-contained\ninteractive HTML file**: one file, no backend, no install on the\nviewing side. Open it in any modern browser, share it as an artifact,\nhost it on a static file server, or commit it next to the bundle (as\nthis repo does).\n\nThe viewer is itself a proof-of-concept *consumer* of OKF, mirroring\nthe way the reference agent is a proof-of-concept *producer*. 
OKF\nbundles can be consumed by anything that reads markdown; this is just\none shape.\n\n- A **force-directed graph** of every concept in the bundle, with colored nodes by type (datasets, tables, references, …) and directed edges drawn from each cross-link in the markdown bodies.\n- A **detail panel** for the selected concept showing its frontmatter (description, resource link, tags) and its rendered markdown body, with internal `[…](/path/to/concept.md)` links rewired to navigate within the viewer instead of following the path.\n- A **\"Cited by\" backlinks** list under each concept (computed from the reverse of the link graph).\n- A **search box** (matches title, concept id, and tags), a **type filter**, and switchable graph layouts (cose / concentric / breadth-first / circle / grid).\n\n```\n.venv/bin/python -m reference_agent visualize --bundle ./bundles/<name>\n```\n\nThat writes `bundles/<name>/viz.html`. Flags:\n\n| Flag | Default | Description |\n|---|---|---|\n| `--bundle` | (required) | Bundle root directory. |\n| `--out` | `<bundle>/viz.html` | Output HTML path. |\n| `--name` | bundle directory name | Display name shown in the viewer header. |\n\nExample, writing the output somewhere else and overriding the header:\n\n```\n.venv/bin/python -m reference_agent visualize \\\n    --bundle ./bundles/crypto_bitcoin \\\n    --out /tmp/btc.html \\\n    --name \"Bitcoin OKF\"\n```\n\nThe HTML embeds the bundle as a JSON blob and uses\n[Cytoscape.js](https://js.cytoscape.org/) for the graph and\n[marked](https://marked.js.org/) for in-browser markdown rendering,\nboth loaded from a CDN. 
No data leaves the page; the bundle is parsed\nonce at generation time and serialized into the file.\n\n```\n.venv/bin/pytest\n```\n\n", "url": "https://wpnews.pro/news/googles-specification-and-tooling-for-the-llm-wiki", "canonical_source": "https://github.com/GoogleCloudPlatform/knowledge-catalog/tree/main/okf", "published_at": "2026-06-21 16:06:05+00:00", "updated_at": "2026-06-21 16:34:49.383962+00:00", "lang": "en", "topics": ["large-language-models", "ai-tools", "developer-tools"], "entities": ["Google", "Open Knowledge Format", "BigQuery", "Gemini", "Vertex AI", "Google ADK", "LangChain", "Dataplex"], "alternates": {"html": "https://wpnews.pro/news/googles-specification-and-tooling-for-the-llm-wiki", "markdown": "https://wpnews.pro/news/googles-specification-and-tooling-for-the-llm-wiki.md", "text": "https://wpnews.pro/news/googles-specification-and-tooling-for-the-llm-wiki.txt", "jsonld": "https://wpnews.pro/news/googles-specification-and-tooling-for-the-llm-wiki.jsonld"}}