CKP LLM: The Missing Layer Between Your AI Agent and Its Knowledge Base

Last week my AI coding agent gave me a confident, detailed answer — referencing the wrong project entirely.

The problem was not the model. It was context: the agent had loaded 20 knowledge files and picked the wrong one to answer from. The signal was buried in noise.

That bug led me to build CKP LLM — Compiled Knowledge Pattern.

Most developers who use AI coding agents build a knowledge base: a folder of Markdown files describing projects, architecture decisions, and recurring patterns. The agent reads them at startup and uses them as memory.

It works. Until it doesn't.

Agents load everything, every time. Whether your question is about authentication or database schemas, the agent reads all 20 files before answering. Context fills up with noise. Answer quality drops — not because the LLM is bad, but because it is reading too much.

RAG solves this at scale, but for a personal or team knowledge base of 20–100 files, it is overkill. You need an embedding model, a vector store, and runtime computation on every query. Too much complexity for the problem size.

What if knowledge files could tell the agent when to load them, and what to bring along?

CKP adds five structured fields to each knowledge file. Here is what a file looks like with a CKP header:

CONCEPT:      mentat
TLDR:         Civic intelligence platform with ISTAT data, RAG chat and AI analytics for municipalities.
ANSWERS_WHEN: mentat, civic, ISTAT, territory, NestJS, security, crime, RAG, analytics
SIMILAR_HIGH: prescient:2025-05, civis:2025-05
SIMILAR_MID:  travelguidehub:2025-05
CONFIDENCE:   high
VALIDATED:    2025-05
Your normal content here. Nothing changes below the header.

Only the header is new; the body of the file stays exactly as it was.
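To make the header format concrete, here is a minimal parsing sketch. It assumes the header is simply the leading run of `KEY: value` lines using the field names from the example above, and that the first line that does not fit starts the body. The function name `parse_ckp_header` and the abridged sample text are illustrative, not part of CKP itself.

```python
# Field names are taken from the example header; the parsing rules are assumptions.
LIST_FIELDS = {"ANSWERS_WHEN", "SIMILAR_HIGH", "SIMILAR_MID"}
KNOWN_FIELDS = LIST_FIELDS | {"CONCEPT", "TLDR", "CONFIDENCE", "VALIDATED"}


def parse_ckp_header(text: str) -> tuple[dict, str]:
    """Split a knowledge file into (header fields, body text)."""
    header: dict = {}
    lines = text.splitlines()
    body_start = len(lines)
    for i, line in enumerate(lines):
        # Split at the first colon only, so "prescient:2025-05" stays intact.
        key, sep, value = line.partition(":")
        key = key.strip()
        if not sep or key not in KNOWN_FIELDS:
            body_start = i  # first non-header line: the body starts here
            break
        value = value.strip()
        if key in LIST_FIELDS:
            header[key] = [v.strip() for v in value.split(",") if v.strip()]
        else:
            header[key] = value
    return header, "\n".join(lines[body_start:])


SAMPLE = """\
CONCEPT:      mentat
TLDR:         Civic intelligence platform with ISTAT data.
ANSWERS_WHEN: mentat, civic, ISTAT
SIMILAR_HIGH: prescient:2025-05, civis:2025-05
Your normal content here.
"""

header, body = parse_ckp_header(SAMPLE)
print(header["SIMILAR_HIGH"])  # ['prescient:2025-05', 'civis:2025-05']
print(body)                    # Your normal content here.
```

Stopping at the first unknown key keeps the body untouched even when it contains its own `Label: text` lines.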

TLDR — one sentence optimised for LLM reading. If this already answers the query, the full file is never loaded.

ANSWERS_WHEN — keywords that trigger this file. The agent matches these against the query before anything.

SIMILAR_HIGH — files that must always load alongside this one. Direct dependencies, shared APIs, same architecture. Encoded explicitly so the agent never has to infer it.

SIMILAR_MID — files that load only if the query domain also matches them. Conditional, not automatic.

VALIDATED — a timestamp per relationship, not per file. If a related file was updated after this date, that specific relationship is flagged as potentially stale.
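The per-relationship timestamp can be checked mechanically. The sketch below assumes each relationship is written as `concept:YYYY-MM`, as in the example header, and that the last edit month of every file is available as a dictionary. Where that date comes from (git, file metadata, the header itself) is not specified in this article, so `last_updated` is a hypothetical input.

```python
def stale_relationships(similar: list[str], last_updated: dict[str, str]) -> list[str]:
    """Return related concepts whose file changed after the relationship was validated.

    similar:      entries such as "prescient:2025-05" (concept:validated-month)
    last_updated: concept -> "YYYY-MM" of its latest edit (hypothetical input)
    """
    stale = []
    for entry in similar:
        concept, _, validated = entry.partition(":")
        concept, validated = concept.strip(), validated.strip()
        updated = last_updated.get(concept)
        # Zero-padded "YYYY-MM" strings sort chronologically,
        # so a plain string comparison is sufficient.
        if updated is not None and updated > validated:
            stale.append(concept)
    return stale


flagged = stale_relationships(
    ["prescient:2025-05", "civis:2025-05"],
    {"prescient": "2025-07", "civis": "2025-04"},
)
print(flagged)  # ['prescient']
```

Only the relationship to `prescient` is flagged; the link to `civis` stays trusted because that file has not changed since validation.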

The agent keeps a small _index.md containing only the headers of all knowledge files — no body content. This index is always in context. Everything else is loaded on demand.
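One way such an index could be generated is to copy the header lines of each file and drop everything else. This is a sketch under assumptions: the exact layout of `_index.md` is not given here, so the `FILE:` line and the blank-line separator are invented for illustration, as are the function names.

```python
HEADER_KEYS = ("CONCEPT", "TLDR", "ANSWERS_WHEN", "SIMILAR_HIGH",
               "SIMILAR_MID", "CONFIDENCE", "VALIDATED")


def header_lines(text: str) -> list[str]:
    """Return the leading lines that start with a known CKP field name."""
    kept = []
    for line in text.splitlines():
        if ":" in line and line.split(":", 1)[0].strip() in HEADER_KEYS:
            kept.append(line)
        else:
            break  # first non-header line: the body starts here
    return kept


def build_index(files: dict[str, str]) -> str:
    """Concatenate the headers of all files (name -> content) into one index."""
    sections = []
    for name in sorted(files):
        # "FILE:" is a hypothetical marker so the agent knows which file to open.
        sections.append("\n".join([f"FILE: {name}", *header_lines(files[name])]))
    return "\n\n".join(sections) + "\n"


files = {
    "mentat.md": "CONCEPT: mentat\nTLDR: Civic platform.\nLong body text here.",
    "civis.md": "CONCEPT: civis\nTLDR: Citizen portal.\nAnother body.",
}
index_text = build_index(files)
print(index_text)
```

The printed index contains the headers of both files and none of the body text.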

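When a query arrives:

1. The agent matches the query against the ANSWERS_WHEN keywords in the index.
2. If the TLDR of a matched file already answers the query, no file is loaded.
3. Otherwise it loads the matched file together with its SIMILAR_HIGH files.
4. It loads SIMILAR_MID files only if their own ANSWERS_WHEN keywords also match.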

Result: 2–4 files loaded instead of 20.
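The routing rules described in this article (match on ANSWERS_WHEN, always bring SIMILAR_HIGH, bring SIMILAR_MID only when it matches too) can be sketched in a few lines. In practice the agent applies these rules itself from the prompt, so this code is only an illustration. It assumes whole-word, case-insensitive keyword matching and an index already parsed into dictionaries with the `:YYYY-MM` suffixes stripped; the keywords for every concept except `mentat` are made up.

```python
import re


def route(query: str, index: dict[str, dict]) -> list[str]:
    """Return the concepts to load for a query, in load order."""
    words = set(re.findall(r"[a-z0-9]+", query.lower()))

    def matches(concept: str) -> bool:
        keywords = index.get(concept, {}).get("ANSWERS_WHEN", [])
        return any(k.lower() in words for k in keywords)

    loaded: list[str] = []

    def load(concept: str) -> None:
        if concept in index and concept not in loaded:
            loaded.append(concept)

    for concept in index:
        if not matches(concept):
            continue
        load(concept)
        for high in index[concept].get("SIMILAR_HIGH", []):
            load(high)      # always loaded alongside the matched file
        for mid in index[concept].get("SIMILAR_MID", []):
            if matches(mid):
                load(mid)   # loaded only if its own keywords match
    return loaded


INDEX = {
    "mentat": {"ANSWERS_WHEN": ["mentat", "civic", "ISTAT"],
               "SIMILAR_HIGH": ["prescient", "civis"],
               "SIMILAR_MID": ["travelguidehub"]},
    "prescient": {"ANSWERS_WHEN": ["forecast"]},        # hypothetical keywords
    "civis": {"ANSWERS_WHEN": ["citizen"]},             # hypothetical keywords
    "travelguidehub": {"ANSWERS_WHEN": ["travel"]},     # hypothetical keywords
}

print(route("How does Mentat ingest ISTAT data?", INDEX))
# ['mentat', 'prescient', 'civis']
print(route("How do I integrate Stripe?", INDEX))
# []
```

An empty result is the out-of-domain case: nothing matches, so nothing is loaded. Multi-word keywords would need a different matching rule than the whole-word test used here.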

Existing semantic search computes relationships at runtime — every query triggers embedding lookup, similarity calculation, retrieval. That computation runs hundreds of times a day.

CKP moves that computation to write time. When you update a knowledge file, the LLM computes relationships once and stores them in the header. At query time, the agent reads pre-computed structure.

No vector database. No embedding model at runtime. No infrastructure to maintain.

Why categorical tiers instead of decimal scores?

LLMs are significantly more consistent when classifying into categories than when assigning decimal scores. A score of 0.73 from one session may be 0.61 in another. With a fixed rubric and anchor examples, HIGH / MID / nothing produces stable, reproducible results across any LLM and any session.

HIGH — one concept requires understanding the other. Direct dependency. Max 3 entries.

MID — same domain, frequently relevant together. No direct dependency. Max 5 entries.

Nothing — loosely related. Not stored.
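The entry limits in this rubric are easy to enforce when a header is written. The following is a small validation sketch; the caps come from the rubric above, while the rule that a concept may not appear in both tiers is an added assumption, and `check_tiers` is a hypothetical helper name.

```python
MAX_HIGH = 3  # "Max 3 entries" for HIGH
MAX_MID = 5   # "Max 5 entries" for MID


def check_tiers(high: list[str], mid: list[str]) -> list[str]:
    """Return a list of rubric violations; an empty list means the header passes."""
    problems = []
    if len(high) > MAX_HIGH:
        problems.append(f"SIMILAR_HIGH has {len(high)} entries, max is {MAX_HIGH}")
    if len(mid) > MAX_MID:
        problems.append(f"SIMILAR_MID has {len(mid)} entries, max is {MAX_MID}")
    # Assumption: a relationship belongs to exactly one tier.
    overlap = sorted(set(high) & set(mid))
    if overlap:
        problems.append(f"listed in both tiers: {', '.join(overlap)}")
    return problems


print(check_tiers(["prescient", "civis"], ["travelguidehub"]))  # []
print(check_tiers(["a", "b", "c", "d"], ["a"]))
# ['SIMILAR_HIGH has 4 entries, max is 3', 'listed in both tiers: a']
```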

Tested on a real NestJS codebase (Mentat, a civic intelligence platform) using Claude Sonnet: 5 query types, 3 runs each, in two environments (with and without CKP), for 30 runs in total.

3 files: 564 tokens No CKP → 522 tokens CKP → 7.4% reduction (below break-even)

11 files: 4,800 tokens No CKP → 1,618 tokens CKP → 66.3% reduction

30 files (projected): ~13,000 tokens No CKP → ~1,900 tokens CKP → ~85% reduction

The break-even is around 5–6 files. Above that, savings scale super-linearly.
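The reduction percentages follow directly from the token counts. The short calculation below recomputes them from the figures as printed in this article, so small rounding differences from the quoted percentages are possible; the 30-file row is the article's projection, not a measurement.

```python
# (files, tokens without CKP, tokens with CKP), copied from the figures above.
MEASUREMENTS = [(3, 564, 522), (11, 4800, 1618), (30, 13000, 1900)]


def reduction_pct(baseline: int, with_ckp: int) -> float:
    """Percentage of input tokens saved relative to the no-CKP baseline."""
    return 100 * (baseline - with_ckp) / baseline


for files, baseline, with_ckp in MEASUREMENTS:
    print(f"{files:>2} files: {reduction_pct(baseline, with_ckp):.1f}% fewer tokens")
# Prints:
#  3 files: 7.4% fewer tokens
# 11 files: 66.3% fewer tokens
# 30 files: 85.4% fewer tokens
```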

No CKP (11 files): 9 correct out of 15 → 60% accuracy

CKP (11 files): 15 correct out of 15 → 100% accuracy

The failures in No-CKP were not random. On an authentication query, the agent mixed information from geography, ISTAT, and frontend files and produced a vague answer. On an out-of-domain query about Stripe on a civic platform codebase, it hallucinated connections between Stripe and existing NestJS modules.

CKP on that same query loaded nothing, declared out-of-domain, and answered honestly.

Reduced context is not just a cost saving. It is a hallucination risk reduction.

With Claude Sonnet at $3 per million input tokens, the 11-file result (about 3,180 fewer tokens per query) translates to roughly:

1,000 queries per day: $286 saved per month

5,000 queries per day: $1,432 saved per month

10,000 queries per day: $2,865 saved per month
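These monthly figures can be reproduced from the 11-file benchmark. The calculation assumes a 30-day month and counts input tokens only; neither assumption is stated explicitly in the article, but together they match the quoted numbers to within a couple of dollars.

```python
PRICE_PER_MILLION = 3.00               # USD per million input tokens, from the article
TOKENS_SAVED_PER_QUERY = 4800 - 1618   # 11-file benchmark: 3,182 tokens
DAYS_PER_MONTH = 30                    # assumption: month length is not stated


def monthly_saving(queries_per_day: int) -> float:
    """Estimated USD saved per month on input tokens alone."""
    tokens = queries_per_day * DAYS_PER_MONTH * TOKENS_SAVED_PER_QUERY
    return tokens / 1_000_000 * PRICE_PER_MILLION


for qpd in (1_000, 5_000, 10_000):
    print(f"{qpd:>6,} queries/day: ${monthly_saving(qpd):,.2f} per month")
# Prints:
#  1,000 queries/day: $286.38 per month
#  5,000 queries/day: $1,431.90 per month
# 10,000 queries/day: $2,863.80 per month
```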

Add this block to your AGENT.md or GEMINI.md:

```
BOOT — runs unconditionally on every first message
1. Use current working directory as PROJECT_ROOT (no find/ls).
2. Read PROJECT_ROOT/memory-bank/_index.md directly.
3. If exists → ROUTING. If not → INIT.

INIT — autonomous, zero questions to user
Analyse project from package.json, README.md, src/ structure.
Classify sibling directories as HIGH/MID.
Create memory-bank/projects/[PROJECT_ID].md with full CKP header.
Create memory-bank/_index.md routing table.
Confirm with one line: [CKP LLM initialised. Proceeding.]

ROUTING
Match query keywords against ANSWERS_WHEN.
Load matched file + SIMILAR_HIGH (direct read).
Load SIMILAR_MID only if their ANSWERS_WHEN also match.
Declare in one line: [CKP: loaded X/Y files — match: file via keyword]
Never ask the user questions. Make assumptions, declare them, proceed.
```
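The BOOT step is a single branch on whether the index file exists. In practice the agent carries out this rule itself from the prompt; the sketch below only makes the decision explicit, using the `memory-bank/_index.md` path from the rule. The function name `boot` is illustrative.

```python
import tempfile
from pathlib import Path


def boot(project_root: Path) -> str:
    """Return the next phase: "ROUTING" if the index exists, otherwise "INIT"."""
    index = project_root / "memory-bank" / "_index.md"
    return "ROUTING" if index.is_file() else "INIT"


# Demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    first = boot(root)                      # no memory-bank yet
    (root / "memory-bank").mkdir()
    (root / "memory-bank" / "_index.md").write_text("", encoding="utf-8")
    second = boot(root)                     # index now present

print(first, second)  # INIT ROUTING
```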

CKP builds on Andrej Karpathy's LLM Wiki pattern, which proposed compiling knowledge into structured files loaded directly into context.

The gap in the original pattern: files are isolated. The agent knows what exists but has to infer which files relate to which, and how strongly. CKP makes those relationships explicit, pre-computed, and consistent.

From a compiled wiki to a self-routing knowledge graph, stored entirely in plain text.

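CKP is designed for developers and teams who keep a personal or team knowledge base of roughly 20–100 files and do not want to run an embedding model and a vector store for it.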

It is not designed for millions of documents. RAG remains the right tool at that scale. CKP fills the gap between loading everything and building full RAG infrastructure.

Full pattern documentation and copy-paste AGENT.md rule: https://alessandro-marocchini.github.io/ckp-llm/

Add the header to your first knowledge file. Add the BOOT rule to your agent config. The rest is automatic.

Source: original article on dev.to.