CKP LLM: The Missing Layer Between Your AI Agent and Its Knowledge Base

A developer built CKP (Compiled Knowledge Pattern) to solve the problem of AI coding agents loading too many knowledge files and producing inaccurate answers. The system adds five structured fields to each knowledge file, including keywords that trigger loading and explicit dependency relationships, reducing context from 20 files to 2-4 per query. Testing on a NestJS codebase with 11 files showed token usage dropped from 4,800 to 1,618 tokens, a 66% reduction, by moving relationship computation from query time to write time.

Last week my AI coding agent gave me a confident, detailed answer that referenced the wrong project entirely. The problem was not the model. It was context: the agent had loaded 20 knowledge files and picked the wrong one to answer from. The signal was buried in noise. That bug led me to build CKP LLM (Compiled Knowledge Pattern).

Most developers who use AI coding agents build a knowledge base: a folder of Markdown files describing projects, architecture decisions, and recurring patterns. The agent reads them at startup and uses them as memory. It works. Until it doesn't.

Agents load everything, every time. Whether your question is about authentication or database schemas, the agent reads all 20 files before answering. Context fills up with noise. Answer quality drops, not because the LLM is bad, but because it is reading too much.

RAG solves this at scale, but for a personal or team knowledge base of 20–100 files, it is overkill. You need an embedding model, a vector store, and runtime computation on every query. Too much complexity for the problem size.

What if knowledge files could tell the agent when to load them, and what to bring along?

CKP adds five structured fields to each knowledge file. Here is what a file looks like with a CKP header:

CONCEPT: mentat
TLDR: Civic intelligence platform with ISTAT data, RAG chat and AI analytics for municipalities.
ANSWERS WHEN: mentat, civic, ISTAT, territory, NestJS, security, crime, RAG, analytics
SIMILAR HIGH: prescient:2025-05, civis:2025-05
SIMILAR MID: travelguidehub:2025-05
CONFIDENCE: high
VALIDATED: 2025-05

Your normal content here.

Nothing changes below the header: the body of the file stays exactly as it was.

TLDR is one sentence optimised for LLM reading. If this already answers the query, the full file is never loaded.
ANSWERS WHEN holds the keywords that trigger this file. The agent matches these against the query before loading anything.
SIMILAR HIGH lists files that must always load alongside this one: direct dependencies, shared APIs, same architecture. Encoded explicitly so the agent never has to infer it.
SIMILAR MID lists files that load only if the query domain also matches them. Conditional, not automatic.
VALIDATED is a timestamp per relationship, not per file. If a related file was updated after this date, that specific relationship is flagged as potentially stale.

The agent keeps a small index.md containing only the headers of all knowledge files, with no body content. This index is always in context. Everything else is loaded on demand.

When a query arrives, the agent matches its keywords against ANSWERS WHEN in the index, loads the matched file with its SIMILAR HIGH entries, and adds SIMILAR MID entries only if their own keywords also match. Result: 2–4 files loaded instead of 20.

Existing semantic search computes relationships at runtime: every query triggers embedding lookup, similarity calculation, retrieval. That computation runs hundreds of times a day. CKP moves that computation to write time. When you update a knowledge file, the LLM computes relationships once and stores them in the header. At query time, the agent reads pre-computed structure. No vector database. No embedding model at runtime. No infrastructure to maintain.

Why categorical tiers instead of decimal scores?

LLMs are significantly more consistent when classifying into categories than when assigning decimal scores. A score of 0.73 from one session may be 0.61 in another. With a fixed rubric and anchor examples, HIGH / MID / nothing produces stable, reproducible results across any LLM and any session.

HIGH: one concept requires understanding the other. Direct dependency. Max 3 entries.
MID: same domain, frequently relevant together. No direct dependency. Max 5 entries.
Nothing: loosely related. Not stored.

Tested on a real NestJS codebase (Mentat, a civic intelligence platform) using Claude Sonnet. 5 query types, 3 runs each, 30 total runs, two environments compared.
3 files: 564 tokens (No CKP) → 522 tokens (CKP) → 8% reduction (below break-even)
11 files: 4,800 tokens (No CKP) → 1,618 tokens (CKP) → 66.3% reduction
30 files (projected): ~13,000 tokens (No CKP) → ~1,900 tokens (CKP) → ~85% reduction

The break-even is around 5–6 files. Above that, savings grow quickly: the No-CKP context grows with every file added, while the CKP context grows far more slowly.

No CKP (11 files): 9 correct out of 15 → 60% accuracy
CKP (11 files): 15 correct out of 15 → 100% accuracy

The failures in No-CKP were not random. On an authentication query, the agent mixed information from geography, ISTAT, and frontend files and produced a vague answer. On an out-of-domain query about Stripe on a civic platform codebase, it hallucinated connections between Stripe and existing NestJS modules. CKP on that same query loaded nothing, declared out-of-domain, and answered honestly.

Reduced context is not just a cost saving. It is a hallucination risk reduction.

With Claude Sonnet at $3 per million input tokens (using the 11-file saving of 3,182 input tokens per query over a 30-day month):
1,000 queries per day: $286 saved per month
5,000 queries per day: $1,432 saved per month
10,000 queries per day: $2,864 saved per month

Add this block to your AGENT.md or GEMINI.md:

BOOT: runs unconditionally on every first message
1. Use current working directory as PROJECT ROOT (no find/ls).
2. Read PROJECT ROOT/memory-bank/index.md directly.
3. If exists → ROUTING. If not → INIT.

INIT: autonomous, zero questions to user
Analyse project from package.json, README.md, src/ structure.
Classify sibling directories as HIGH/MID.
Create memory-bank/projects/PROJECT ID.md with full CKP header.
Create memory-bank/index.md routing table.
Confirm with one line: CKP LLM initialised. Proceeding.

ROUTING
Match query keywords against ANSWERS WHEN.
Load matched file + SIMILAR HIGH (direct read).
Load SIMILAR MID only if their ANSWERS WHEN also match.
Declare in one line: CKP: loaded X/Y files - match: file via keyword

Never ask the user questions. Make assumptions, declare them, proceed.
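The ROUTING rule above is written for an LLM agent to follow, but its logic can be sketched in ordinary code. What follows is a minimal illustration in Python, not the author's implementation. The `Header` dataclass, the `route` function, the naive tokeniser, and the sample keywords for `prescient` and `travelguidehub` are all assumptions made for this example, and it only handles single-word keywords.

```python
from dataclasses import dataclass, field


@dataclass
class Header:
    """Hypothetical in-memory form of one index.md entry."""
    concept: str
    answers_when: set[str]
    similar_high: list[str] = field(default_factory=list)
    similar_mid: list[str] = field(default_factory=list)


def tokenize(query: str) -> set[str]:
    # Naive keyword extraction: lowercase, replace punctuation with spaces, split.
    cleaned = "".join(c.lower() if c.isalnum() else " " for c in query)
    return set(cleaned.split())


def matches(header: Header, words: set[str]) -> bool:
    # A file matches when any ANSWERS WHEN keyword appears in the query.
    return bool(words & {k.lower() for k in header.answers_when})


def route(query: str, index: dict[str, Header]) -> list[str]:
    """Return the concepts whose files should be loaded for this query."""
    words = tokenize(query)
    loaded: list[str] = []

    def add(name: str) -> None:
        if name in index and name not in loaded:
            loaded.append(name)

    for header in index.values():
        if not matches(header, words):
            continue
        add(header.concept)
        for name in header.similar_high:   # SIMILAR HIGH: always loaded
            add(name)
        for name in header.similar_mid:    # SIMILAR MID: only if it matches too
            if name in index and matches(index[name], words):
                add(name)
    return loaded


# Sample index; keywords for prescient and travelguidehub are invented here.
index = {
    "mentat": Header("mentat", {"mentat", "civic", "ISTAT", "NestJS", "RAG"},
                     similar_high=["prescient"], similar_mid=["travelguidehub"]),
    "prescient": Header("prescient", {"prescient", "forecast"}),
    "travelguidehub": Header("travelguidehub", {"travel", "guide"}),
}

print(route("How does the ISTAT import work?", index))  # ['mentat', 'prescient']
print(route("How do I integrate Stripe?", index))       # []
```

The second call returns an empty list, which corresponds to the out-of-domain case: nothing matches, so nothing is loaded.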
CKP builds on Andrej Karpathy's LLM Wiki pattern, which proposed compiling knowledge into structured files loaded directly into context. The gap in the original pattern: files are isolated. The agent knows what exists but has to infer which files relate to which, and how strongly. CKP makes those relationships explicit, pre-computed, and consistent. From a compiled wiki to a self-routing knowledge graph, stored entirely in plain text.

CKP is designed for developers and teams who keep a personal or team knowledge base in the 20–100 file range described above. It is not designed for millions of documents. RAG remains the right tool at that scale. CKP fills the gap between loading everything and building full RAG infrastructure.

Full pattern documentation and copy-paste AGENT.md rule: https://alessandro-marocchini.github.io/ckp-llm/

Add the header to your first knowledge file. Add the BOOT rule to your agent config. The rest is automatic.
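For anyone who wants tooling around the header, the sketch below shows one way to parse it and apply the per-relationship staleness rule described earlier. It is an illustration under assumptions, not part of the published pattern. The function names `parse_header`, `relationships`, and `stale_relationships`, the `last_updated` mapping, and the update months in the usage lines are hypothetical. The header is assumed to hold one `KEY: value` pair per line, with each SIMILAR entry written as `name:YYYY-MM` as in the mentat example.

```python
FIELDS = ("CONCEPT", "TLDR", "ANSWERS WHEN", "SIMILAR HIGH",
          "SIMILAR MID", "CONFIDENCE", "VALIDATED")


def parse_header(text: str) -> dict[str, str]:
    """Collect `KEY: value` lines for the known CKP fields."""
    header: dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() in FIELDS:
            header[key.strip()] = value.strip()
    return header


def relationships(value: str) -> dict[str, str]:
    """Split 'prescient:2025-05, civis:2025-05' into {name: validated_date}."""
    result: dict[str, str] = {}
    for item in value.split(","):
        name, sep, date = item.strip().rpartition(":")
        if sep:  # skip empty or malformed entries
            result[name] = date
    return result


def stale_relationships(header: dict[str, str],
                        last_updated: dict[str, str]) -> list[str]:
    """Names of related files updated after the relationship was validated."""
    stale = []
    for tier in ("SIMILAR HIGH", "SIMILAR MID"):
        for name, validated in relationships(header.get(tier, "")).items():
            # YYYY-MM strings sort chronologically, so string comparison works.
            if last_updated.get(name, "") > validated:
                stale.append(name)
    return stale


sample = """CONCEPT: mentat
ANSWERS WHEN: mentat, civic, ISTAT
SIMILAR HIGH: prescient:2025-05, civis:2025-05
SIMILAR MID: travelguidehub:2025-05
VALIDATED: 2025-05"""

header = parse_header(sample)
# Hypothetical last-update months for the related files.
print(stale_relationships(header, {"prescient": "2025-06", "civis": "2025-04"}))
# ['prescient']
```

Only `prescient` is flagged, because it is the only related file updated after its relationship was validated. Files with no known update date are treated as not stale.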