{"slug": "llm-wiki-v3-compartmentalized-wiki-architecture", "title": "LLM Wiki V3: Compartmentalized Wiki Architecture", "summary": "The article introduces LLM Wiki V3, a concept document that proposes a \"compartmentalized wiki architecture\" for scaling LLM-maintained knowledge systems. It argues that stateful LLM systems are built around stateless model calls, where the context window is a temporary working surface, not memory, and that scaling requires segmentation—multiple narrow, purpose-built foundations—rather than stacking complexity onto a single schema or pipeline. V3 builds on V1's portable pattern and V2's identified failure modes (e.g., knowledge staleness, schema overload) by applying segmentation as a design philosophy to ingestion, schemas, roles, and retrieval.", "body_md": "# LLM Wiki V3: Segmentation\n\n> Karpathy gave us the foundation.\n> Rohitg00 warned us what breaks.\n> V3 is how you structure it to scale.\n\nThis is a concept document in the same spirit as V1 and V2.\n\nV1 was intentionally vague. Build on it.\nV2 was intentionally open. Solve it.\nV3 is intentionally incomplete. Segment it.\n\n---\n\n## Start here: everything is stateless\n\nBefore anything else, name the constraint every builder is working inside but nobody says out loud.\n\nThere is no such thing as a stateful LLM by itself. Stateful LLM systems are built around a stateless model call.\n\nThe context window is not memory. It is a temporary working surface. Tokens at the top grow cold as the window fills. Instructions fade. Context drifts. What looks like continuity — in Claude, in GPT, in any model — is the provider actively managing that stateless surface behind the scenes. Pruning it. Compacting it. Injecting summaries to rewarm scope. Building reference structures so the important things stay warm long enough to be useful.\n\nA mini wiki inside the conversation.\n\nOnce you understand this, everything about wiki design changes. 
You stop trying to build systems that hold everything. You start building systems that keep the right things warm at the right moment.\n\nThat is what V3 is about.\n\n---\n\n## The chain so far\n\n### V1 — Karpathy's LLM Wiki\n\nV1 gave us the smallest complete loop for an LLM-maintained knowledge system.\n\n```\nraw sources → ingest → wiki pages → query → answers\n                ↑                      ↓\n              schema                  lint\n                ↑                      ↓\n              index ←————————————————— log\n```\n\nThe LLM is the compiler. Feed it raw sources. It compiles them once into structured wiki pages. Future queries read the compiled wiki instead of re-deriving knowledge from scratch every time. Knowledge compounds.\n\nKarpathy's wiki was specialized by design. He engaged with each piece of material personally — reading it, talking through it, deciding what mattered. That is a narrow, deliberate ingestion filter. The schema stays focused. The LLM never gets overwhelmed because the scope never gets out of hand.\n\nV1 is not a complete scaling architecture. It is a portable pattern.\n\n### V2 — LLM Wiki v2 (rohitg00)\n\nV2 looked at V1 running in production and asked what breaks as it scales.\n\nThe findings are real — knowledge goes stale, confidence is flat, indexes grow too large, automation has no guard rails, multi-agent work has no coordination layer, schemas accumulate complexity until they start to fail.\n\nV2 proposed mechanisms: confidence scoring, supersession, forgetting curves, consolidation tiers, hybrid search, knowledge graphs, crystallization.\n\nThose mechanisms are correct in theory. But V2 has an unspoken assumption baked into almost all of them — that one schema, one index, and one ingest pipeline can be made smart enough to handle everything. Stack enough complexity onto that foundation and the LLM does not get smarter. It gets overwhelmed. It starts hallucinating. 
The schema that was supposed to govern everything becomes the thing that breaks everything.\n\nV2 is not wrong. It just left the architectural answer open.\n\n---\n\n## The failure mode nobody named\n\nV2's failure modes are not data problems. They are segmentation failures.\n\nWhen a wiki grows, the instinct is to add more to the existing structure. More rules in the schema. More entries in the index. More mechanisms in the pipeline. You are stacking more weight onto the same foundation. At some point the foundation cannot hold it.\n\nThe answer is not a stronger foundation. The answer is more foundations — each one narrow, each one purpose-built, each one carrying only what it needs to carry.\n\nSegmentation. Not as a folder structure. Not as a tagging convention. As a design philosophy applied to everything: ingestion, schemas, roles, prompts, retrieval, lint. Every component stays narrow enough that the LLM executing it can do so reliably without drifting.\n\nEverything V2 warns about is a symptom of the same underlying problem. Too much stacked on too little structure. Segmentation is the reinforcement that lets you build up.\n\n---\n\n## Two contrasting examples\n\nThe clearest way to understand segmentation is to see it at both ends of the spectrum.\n\n**Karpathy's V1 — the restricted section.**\n\nIf you have ever read Harry Potter, you know the Hogwarts library has a restricted section. You do not browse it casually. You go there deliberately, with purpose, because what lives there requires careful handling.\n\nKarpathy's wiki is the restricted section. Every piece of material is personally curated. Every ingestion is deliberate. The schema is tight. The human is in the loop constantly. The result is a deeply indexed, highly queryable knowledge base for research that genuinely matters.\n\nThis is not a system you build for a thousand files. This is a system you build for the fifty documents that define your thinking on a subject. 
Expensive in human attention. Worth every bit of that cost for the right use case.\n\n**A broad file system wiki — the open stacks.**\n\nOn the other end: thousands of files. Scripts, research notes, markdown, configs, outputs, half-finished ideas. Nobody is curating this manually. Nobody should be.\n\nThe ingestion rule here is almost the opposite of Karpathy's. Do not read the whole file. Read just enough to understand what it is. The title probably tells you most of it. Read the first few lines. Write a three-to-five-sentence summary. Classify it. Tag it. That is it. Nothing more.\n\nThe goal is not deep indexing. The goal is findability. A good title and a clean summary are enough for a librarian to locate it later. Trying to do more at this stage wastes tokens and produces dirty data that poisons retrieval downstream.\n\nTwo completely different ingestion strategies. Both correct. Both segmented for their purpose. Both can coexist in the same system — the restricted section living inside the broader library, untouched and fully functional, nothing breaking because neither interferes with the other.\n\n---\n\n## The library\n\nA V3 wiki operates like a real library. Not metaphorically. Operationally.\n\nA real library has specialized roles. Nobody asks the librarian to also receive shipments, catalog new arrivals, and repair damaged books simultaneously. Those are separate jobs done by separate people with separate workflows. The library works because the roles are segmented.\n\nA V3 wiki has the same structure.\n\n**The ingestor** brings in new material. Its only job is classification — summary, tags, category. It reads as little as it needs to. It does not synthesize. It does not cross-reference. It files and moves on.\n\n**The librarian** handles retrieval. When a request comes in, it searches by title, by summary, by tags. It tracks what gets checked out frequently and what sits untouched. 
Over time those access patterns become signal — documents that get pulled repeatedly are candidates for deeper indexing or easier navigation. The librarian does not answer questions. It routes them.\n\n**The linter** keeps the collection healthy. It deduplicates. It flags outdated entries. It makes sure the same document did not get filed under three different titles. It runs on its own schedule, not as part of every query.\n\n---\n\n## Why the librarian matters\n\nWhen an agent team goes into the wiki to find their own research context, something subtle happens.\n\nIt can work. A well-scoped agent that reinforces its objective as it navigates can come back with good material. But there is a real risk. As the agent searches — reading, evaluating, following references, deciding what is relevant — its context fills. By the time it returns, it may have drifted from the original objective. It may bring back material that is tangentially related but not quite right. And that muddied context, delivered fresh to the waiting team, can bias the whole group before actual work begins. The team inherits the agent's drift.\n\nThe librarian pattern separates that risk out entirely.\n\nThe team states the objective and waits. The librarian handles retrieval in its own context — searching, filtering, scoping. The team receives clean pre-scoped material delivered fresh alongside the original objective. They start the actual work with both things warm in the window and nothing in between.\n\nThis is not just token economy. It is bias prevention. The team's first read of the research happens together, with the objective present, without a single agent's navigational drift already baked in.\n\nSometimes two librarians run in parallel — one for local knowledge, one for web retrieval — so neither gets cold on a large request. 
Segmentation applies to the librarians too.\n\n---\n\n## Cache rewarming: keeping scope alive in long sessions\n\nAs sessions grow longer and systems grow more complex, tokens go cold faster than the work gets done.\n\nYou can rewarm them.\n\nAnthropic does this in Claude Code with automatic compaction — injecting a structured summary back into context when the window fills, so the model does not lose the thread. The important context gets compressed and reinjected. Scope is restored.\n\nThe same pattern applies in wiki systems. After a certain number of turns or a token threshold, the librarian injects a brief recap — what was retrieved, why it was relevant, what the team is building toward. Key documents get flagged for re-reference. The team gets a lightweight reorientation before continuing.\n\nExplicit triggers that rewarm implicit context before it drifts. This is the same principle as good schema design applied at the session level: do not rely on implicit context staying warm on its own. Build the reinforcement in.\n\nFor simple wikis this is overkill. For complex multi-agent systems running long sessions, it is the difference between a team that finishes clean and one that quietly loses the thread halfway through.\n\n---\n\n## What this means for V2\n\nEverything V2 describes still applies. The mechanisms are real. The warnings are valid.\n\nConfidence scoring works when it is scoped to a specific ingestion pipeline with a defined standard. Crystallization works when the librarian knows which documents have been accessed enough to warrant deep indexing. Supersession works when the linter has a clean enough catalog to detect duplicates. Hybrid search works when the librarian is routing queries through the right filter before retrieval begins.\n\nNone of these fail because the mechanisms are wrong. 
They fail when you try to run all of them through one undifferentiated schema on one undifferentiated pile of data.\n\nSegment the system and V2's mechanisms work exactly as intended.\n\n---\n\n## Schema design: explicit triggers, implicit configuration\n\nSegmentation applies to schemas too. This is where configuration gets hard and where most systems quietly break.\n\nThe instinct when writing a schema is to put everything in it. Every behavior, every preference, every edge case. The schema becomes dense and the LLM is expected to hold all of it at once. That is the mistake.\n\nTwo kinds of instructions behave differently under context pressure.\n\n**Explicit instructions hold.** A defined trigger, a defined action, a defined expected output. The model does not interpret it. It executes it.\n\n**Implicit instructions drift.** Broad behavioral guidance goes cold as context fills. The model stops attending to it. The behavior disappears quietly, usually right when it matters most.\n\nThe solution is to keep schemas explicit and use explicit triggers to invoke implicit guidance at exactly the right moment.\n\nHere is what that looks like for a librarian role:\n\n```\nLIBRARIAN SCHEMA\n\n1. Request received.\n   → Read: dewey-decimal-system.md\n   [explicit: orientation to the filing system]\n\n2. Before searching.\n   → Read: how-to-gather-relevant-materials.md\n   [explicit: defined procedure, no judgment required]\n\n3. Search and retrieve. Leave results at the desk.\n   [Stack one: direct matches]\n\n4. After retrieval complete.\n   → Read: how-to-find-context-that-isnt-obvious.md\n   [implicit: judgment, thinking beyond direct matches]\n\n5. Search again. Leave results at the desk.\n   [Stack two: non-obvious candidates]\n\n6. Read stack one. 
Read stack two.\n   Add anything from stack two now verified relevant into stack one.\n   Return stack one to the team.\n   [explicit: mechanical, no interpretation]\n```\n\nStep one orients the librarian to the filing system before anything else. Step two is a defined procedure — how to search — with no judgment calls, purely explicit. Step four is the only implicit step: it invites the model to think beyond the obvious, at exactly the moment that judgment is needed and the structured work is already done. Step six is mechanical and explicit — compare, verify, return. No interpretation.\n\nOne implicit step. Everything else explicit. The implicit instruction arrives at the seam between structured work and judgment work — after the first pass is complete, before the second pass begins. That timing is not accidental. Implicit instructions that arrive too early get buried under orientation. Too late and the model's habits are already set.\n\nGetting trigger timing right is one of the harder design problems in segmented wiki systems. Implicit instructions invoked at the wrong moment introduce variance. The model starts interpreting instead of executing. That is where wiki corruption begins quietly — misclassified files, drifting summaries, a linter that stops catching what it should.\n\nThe working principle: explicit schema stays minimal and structural. Implicit charters carry judgment and culture. Explicit triggers connect the two at the right moment in the workflow. Never passively loaded. Always invoked on demand.\n\n---\n\n## What is left open\n\nThis is a concept, not a finished system.\n\nSome pieces are working. Early ingestion pipelines. A librarian that routes and tracks. Some segmentation between research teams and build teams. The restricted section coexisting with the broader collection. But the full architecture described here is what is being worked toward, not what has shipped.\n\nV2 gives the mechanism layer. V3 gives the segmentation layer. 
Neither is complete on its own.\n\nThe part that genuinely needs more work — from me and from anyone building in this space — is the configuration of segmentation itself. Segmentation works inside a segment. But how segments communicate, how handoffs are governed, what rules exist at the boundaries between them — that is the hardest part. That is where schemas and navigation get complex fast. That is the open problem worth investigating and discussing.\n\nIf you are building something in this space, reach out in the comments. If you have solved a piece of this, I want to hear it. If you have questions or ideas, same thing.\n\n---\n\n## The V3 principle\n\nV1 showed us the wiki as compiler.\n\nV2 showed us what breaks when the compiler tries to do too much.\n\nV3 shows us how to build a system where nothing ever has to do too much.\n\nSegment the ingestion. Segment the roles. Segment the retrieval. Keep every context narrow enough to stay warm. Protect your execution teams from bias and drift. Rewarm scope before it fades.\n\nA wiki is not just a knowledge store. It is an execution system. Treat it like one.\n\nEverything is stateless. Design around it.\n\n---\n\n*This document builds on [Andrej Karpathy's LLM Wiki](https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f) and [LLM Wiki v2](https://gist.github.com/rohitg00/2067ab416f7bbe447c1977edaaa681e2) by rohitg00. Everything in both documents still applies. 
This adds the segmentation layer that lets them scale.*", "url": "https://wpnews.pro/news/llm-wiki-v3-compartmentalized-wiki-architecture", "canonical_source": "https://gist.github.com/ahumanft/6c96385be6ca4af578cc9b20e0f79e66", "published_at": "2026-05-20 07:57:20+00:00", "updated_at": "2026-05-22 16:41:20.917840+00:00", "lang": "en", "topics": ["large-language-models", "artificial-intelligence", "developer-tools"], "entities": ["Karpathy", "Rohitg00", "Claude", "GPT"], "alternates": {"html": "https://wpnews.pro/news/llm-wiki-v3-compartmentalized-wiki-architecture", "markdown": "https://wpnews.pro/news/llm-wiki-v3-compartmentalized-wiki-architecture.md", "text": "https://wpnews.pro/news/llm-wiki-v3-compartmentalized-wiki-architecture.txt", "jsonld": "https://wpnews.pro/news/llm-wiki-v3-compartmentalized-wiki-architecture.jsonld"}}