Five primitives I exercised end-to-end on world-model-mcp's own repo

Five primitives demonstrated on the world-model-mcp project's own codebase, including a learned constraint that denies edits violating a rule (e.g., using `console.log` instead of `logger.debug`) after three violations, and a regression warning that flags edits to files with recorded bug fixes. The system uses a SQLite database to store constraints, facts, and audit data, with all outputs shown as verbatim database responses that can be reproduced by cloning the repository and running setup and demo scripts. The project aims to help AI coding agents maintain context across compactions, avoid repeating mistakes, and prevent hallucinating non-existent APIs.

I shipped four releases of world-model-mcp in twelve days (v0.6.1 to v0.7.2). The pitch is "AI coding agents lose context across compaction, repeat the same mistakes, and hallucinate APIs that do not exist." Before I write more about it I wanted to demonstrate the primitives on a real codebase, with real outputs, not screenshots someone has to take my word for.

The codebase is the project's own repo. I ran `python -m world_model_server.cli setup` (it auto-seeded 598 entities from the source), then ran `scripts/demo_seed.py`, which inserts the small set of constraints, facts, and a compaction audit row that real PostToolUse / record_correction hook activity would write organically over one to two weeks of development with Claude Code installed. Every output block below is verbatim from the actual SQLite database after running the actual command.

You can reproduce every output here by cloning the repo, running `python -m world_model_server.cli setup`, then `python scripts/demo_seed.py`. The script is idempotent and supports `--dry-run` and `--reset`.

Install: pip install world-model-mcp
Source: github.com/SaravananJaichandar/world-model-mcp

1. A learned constraint denying an edit at the PreToolUse boundary

When a developer corrects the agent (rewrites `console.log` to `logger.debug`), the PostToolUse hook records the diff and infers a rule. Once that rule's violation count crosses the hard threshold (severity=error, count ≥ 3), the next attempt is denied at PreToolUse before the tool runs.

The constraint as the graph stores it:

    {
      "rule_name": "no-console-log",
      "severity": "error",
      "violation_count": 5,
      "description": "Use logger.debug not console.log in TypeScript source. Production logs route through pino; console.log bypasses formatting and breaks downstream parsers.",
      "file_pattern": " .ts",
      "examples": {"incorrect": "console.log", "correct": "logger.debug"}
    }

The PreToolUse hook's actual JSON response when an edit containing `console.log` reaches it:

    {
      "hookSpecificOutput": {
        "hookEventName": "PreToolUse",
        "permissionDecision": "deny",
        "permissionDecisionReason": "Hard constraint violation: no-console-log (Use logger.debug not console.log in TypeScript source. Production logs route through pino; console.log bypasses formatting and breaks downstream parsers.). Violated 5 times previously."
      },
      "violations": {
        "rule": "no-console-log",
        "severity": "error",
        "violation_count": 5,
        "is_hard": true,
        "is_defer": false
      }
    }

Rules in CLAUDE.md or AGENTS.md are advisory and the model treats them as suggestions. Rules with a violation count and an enforcement boundary at the edit step are binding. Both have the same source (a developer correcting the agent) but very different effects.

2. A regression warning that flags edits to a file with a recorded bug fix

`get_related_bugs` walks decision traces and prior bug-fix facts. When `validate_change` runs on a file with a recorded fix, the related-bugs query surfaces the prior fix and flags the proposed change.

The project has a bug fix on file at `world_model_server/knowledge_graph.py:120-135` for content-hash backfill (the migration logic must run on every initialize, not just when the column is created). I proposed a refactor that removed the backfill loop and ran the related-bugs check:

    {
      "risk_score": 0.6,
      "bugs": {
        "bug_id": "12457e2a-5638-46ec-a9df-02fe13b9c104",
        "description": "Bug fix: NULL content hash backfill must run on every initialize to cover post-migration inserts. Earlier code only backfilled when the column was created, which left merge from rows un-hashed and broke dedup.",
        "fixed_at": "2026-05-10T10:17:51.737046",
        "critical_regions": {"file": "world_model_server/knowledge_graph.py", "lines": "120-135"}
      },
      "warnings": "Lines 120-135 preserve fix for 12457e2a-5638-46ec-a9df-02fe13b9c104: Bug fix: NULL content hash backfill must run on every initialize to cover post-migration inserts. Earlier code only backfilled when the column was created, which left merge from rows un-hashed and broke dedup."
    }

The risk score is 0.6 because the proposed change touched a critical region without re-implementing the fix. The warning text quotes the original bug description directly so the agent or the human can see why the region matters, not just that it does.

3. A contradiction resolved by confidence + source-count weighting

The temporal layer assigns each fact a confidence score, a source count, and a valid_at timestamp. When two facts about the same entity disagree, `find_contradictions` surfaces them with both sides' metadata, and `resolve_contradiction` picks a winner using the strategy you set.

Two facts both pointing at the same entity (http_transport_port):

    {
      "fact_a_id": "e4b2ff84-8c23-4de5-aa9e-8bbb045a4ed5",
      "fact_b_id": "7fe854f9-d64a-4304-b43a-7d1b126c6ebb",
      "fact_a_text": "HTTP transport listen port default is 8080",
      "fact_b_text": "HTTP transport listen port default is 8765",
      "similarity_score": 0.929,
      "both_valid": true,
      "reason": "same entity, similar text",
      "confidence_a": 0.7,
      "confidence_b": 0.95,
      "source_count_a": 1,
      "source_count_b": 3
    }

`resolve_contradiction` with strategy="auto" picks the strategy with the largest signal gap.
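
One way to read "largest signal gap" is as a comparison of normalized differences between the two facts. The sketch below is an assumption about how such a choice could be made, not the project's implementation: the `relative_gap` normalization, the function names, the underscore-style dictionary keys, and the `keep_highest_confidence` strategy name are all hypothetical. Only the four input values and the most-sources strategy come from the output above.

```python
def relative_gap(a: float, b: float) -> float:
    """Difference between two values, scaled by the larger one (0.0 if both are zero)."""
    largest = max(a, b)
    return abs(a - b) / largest if largest else 0.0


def pick_strategy(contradiction: dict) -> str:
    """Return the strategy whose signal separates the two facts the most.

    Hypothetical logic: each candidate strategy is scored by the relative
    gap of the signal it relies on, and the largest gap wins.
    """
    gaps = {
        # Hypothetical strategy name, not confirmed by the project.
        "keep_highest_confidence": relative_gap(
            contradiction["confidence_a"], contradiction["confidence_b"]
        ),
        "keep_most_sources": relative_gap(
            contradiction["source_count_a"], contradiction["source_count_b"]
        ),
    }
    return max(gaps, key=gaps.get)


# Values copied from the contradiction shown above.
example = {
    "confidence_a": 0.7,
    "confidence_b": 0.95,
    "source_count_a": 1,
    "source_count_b": 3,
}
print(pick_strategy(example))  # keep_most_sources
```

Under this normalization, confidence at 0.7 vs 0.95 gives a relative gap of about 0.26, while sources at 1 vs 3 give about 0.67, so the source count is the stronger signal.
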

Here the source count differs 3:1, so it picks keep_most_sources:

    {
      "strategy": "keep_most_sources",
      "winner_id": "7fe854f9-d64a-4304-b43a-7d1b126c6ebb",
      "loser_id": "e4b2ff84-8c23-4de5-aa9e-8bbb045a4ed5",
      "resolved_at": "2026-05-21T10:24:16.287368"
    }

The loser is updated in place:

    {
      "id": "e4b2ff84-8c23-4de5-aa9e-8bbb045a4ed5",
      "fact_text": "HTTP transport listen port default is 8080",
      "status": "superseded",
      "invalid_at": "2026-05-21T10:24:16.285184",
      "confidence": 0.7
    }

Queries that ask "what's true now?" silently skip the superseded fact. Queries that ask "what was true on 2026-05-18?" still see it. That's what the temporal layer earns.

4. The PostCompact injection bundle

v0.7.0 added a PostCompact hook that re-injects the top constraints and recent canonical facts after the agent's context is compacted. The bundle is small (configurable, default ~10 constraints + 10 facts) and prioritized.

The actual bundle returned by `get_injection_context(event_type="PostCompact", max_constraints=5, max_facts=5)`:

    python3 -m twine check dist/ before tagging. Catches PyPI metadata errors before the tag is pushed; saves a retraction. violated 5x

That bundle is what gets spliced into the agent's working context as additionalContext after a compaction event. The same query also runs on UserPromptSubmit, biased toward whatever the user just asked about.

The compaction audit log records what happened, queryable via the CLI:

    $ world-model audit-compactions --limit 5
    1 compaction audit rows
    2026-05-21T10:38:01.606771 session=demo-session-1 pre=84320 post=22150 facts_injected=10 constraints_injected=3 event=PostCompact

With pre=84320 and post=22150, the compaction dropped ~62k tokens of context. The injection put 10 facts + 3 constraints back. The audit row exists so a human can later answer "what did the agent see vs what did it lose."

5. A defer decision that pauses a headless agent

v0.7.0 added a defer enforcement tier between deny and warn.
Warning-severity violations with violation_count ≥ 5 return permissionDecision: "defer" when the client advertises support, so headless agents pause instead of silently passing or hard-blocking. Clients that do not advertise support fall back to ask automatically.

I have a check-twine-before-tag constraint with violation_count=5, severity=warning. When a Bash tool input matches it, the hook returns:

    {
      "hookSpecificOutput": {
        "hookEventName": "PreToolUse",
        "permissionDecision": "defer",
        "permissionDecisionReason": "Recurring warning-level violations (check-twine-before-tag). Headless agents should pause for confirmation."
      }
    }

Same payload, same constraint, but with supports_defer: false in the request, the hook falls back to ask:

    {
      "hookSpecificOutput": {
        "hookEventName": "PreToolUse",
        "permissionDecision": "ask",
        "permissionDecisionReason": "Recurring warning-level violations (check-twine-before-tag). Headless agents should pause for confirmation."
      }
    }

The defer tier exists because the binary deny / warn choice forces you to be either too strict or too permissive. Recurring warnings that don't rise to error level should pause for a human, not block, not pass.

What this means if you are building agents

The reason this works is not that the tool is clever. It is that the substrate (a temporal knowledge graph with facts, constraints, contradictions, and decision traces) captures the right shape of information. Plain markdown rules in CLAUDE.md cannot answer: A graph can.

The cost is one MCP server, ~2,000 lines of Python, and a SQLite database that sits at ~155 KB empty (mine grew to about 2 MB after running this exercise plus the auto-seed). The payoff is a memory layer that survives compaction, enforces at the edit boundary, and tracks evidence chains back to the source.
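
To make "enforces at the edit boundary" concrete, the thresholds described in sections 1 and 5 can be folded into a single decision function. This is a minimal sketch, not the project's code: the `Constraint` dataclass, the `decide` function, and the fall-through to "allow" are assumptions for illustration. Only the thresholds themselves (error severity with count ≥ 3 denies; warning severity with count ≥ 5 defers, or asks when the client does not support defer) come from the text.

```python
from dataclasses import dataclass


@dataclass
class Constraint:
    # Hypothetical shape; field names are illustrative.
    rule_name: str
    severity: str  # "error" or "warning"
    violation_count: int


def decide(constraint: Constraint, supports_defer: bool) -> str:
    """Map a matched constraint to a PreToolUse permissionDecision value."""
    # Hard tier: repeated error-severity violations block the tool call.
    if constraint.severity == "error" and constraint.violation_count >= 3:
        return "deny"
    # Defer tier: recurring warnings pause for a human; clients that do
    # not advertise defer support get "ask" instead.
    if constraint.severity == "warning" and constraint.violation_count >= 5:
        return "defer" if supports_defer else "ask"
    # Assumption: anything below both thresholds is allowed through.
    return "allow"


no_console_log = Constraint("no-console-log", "error", 5)
check_twine = Constraint("check-twine-before-tag", "warning", 5)

print(decide(no_console_log, supports_defer=True))  # deny
print(decide(check_twine, supports_defer=True))     # defer
print(decide(check_twine, supports_defer=False))    # ask
```

Keeping the tiers in one function makes the ordering explicit: the hard tier is checked first, so an error-severity rule never degrades to a pause.
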

If you are building anything with Claude Code, Cursor, or any harness that supports MCP + hooks:

    pip install world-model-mcp
    cd /your/project
    python -m world_model_server.cli setup

For Claude Managed Agents with self-hosted sandboxes (where Anthropic's built-in Memory primitive is not yet supported), v0.7.2 added streamable HTTP transport so the same 25 MCP tools also work behind an MCP tunnel.

Source: github.com/SaravananJaichandar/world-model-mcp

If world-model-mcp helped you, star the repo or open an issue with what worked or didn't. I read every one.