{"slug": "five-primitives-i-exercised-end-to-end-on-world-model-mcp-s-own-repo", "title": "Five primitives I exercised end-to-end on world-model-mcp's own repo", "summary": "Five primitives demonstrated on the world-model-mcp project's own codebase, including a learned constraint that denies edits violating a rule (e.g., using `console.log` instead of `logger.debug`) after three violations, and a regression warning that flags edits to files with recorded bug fixes. The system uses a SQLite database to store constraints, facts, and audit data, with all outputs shown as verbatim database responses that can be reproduced by cloning the repository and running setup and demo scripts. The project aims to help AI coding agents maintain context across compactions, avoid repeating mistakes, and prevent hallucinating non-existent APIs.", "body_md": "I shipped four releases of world-model-mcp in twelve days. v0.6.1 to v0.7.2. The pitch is \"AI coding agents lose context across compaction, repeat the same mistakes, and hallucinate APIs that do not exist.\" Before I write more about it I wanted to demonstrate the primitives on a real codebase, with real outputs, not screenshots someone has to take my word for.\nThe codebase is the project's own repo. I ran python -m world_model_server.cli setup (it auto-seeded 598 entities from the source), then ran scripts/demo_seed.py which inserts the small set of constraints, facts, and a compaction audit row that real PostToolUse / record_correction hook activity would write organically over one to two weeks of development with Claude Code installed.\nEvery output block below is verbatim from the actual SQLite database after running the actual command. You can reproduce every output here by cloning the repo, running python -m world_model_server.cli setup, then python scripts/demo_seed.py. The script is idempotent and supports --dry-run and --reset.\nInstall: pip install world-model-mcp. 
Source: github.com/SaravananJaichandar/world-model-mcp.\n1. A learned constraint denying an edit at the PreToolUse boundary\nWhen a developer corrects the agent (rewrites console.log to logger.debug), the PostToolUse hook records the diff and infers a rule. Once that rule's violation count crosses the hard-threshold (severity=error, count ≥ 3), the next attempt is denied at PreToolUse before the tool runs.\nThe constraint as the graph stores it:\n{\n\"rule_name\": \"no-console-log\",\n\"severity\": \"error\",\n\"violation_count\": 5,\n\"description\": \"Use logger.debug() not console.log() in TypeScript source. Production logs route through pino; console.log bypasses formatting and breaks downstream parsers.\",\n\"file_pattern\": \"*.ts\",\n\"examples\": [\n{\"incorrect\": \"console.log\", \"correct\": \"logger.debug\"}\n]\n}\nThe PreToolUse hook's actual JSON response when an edit containing console.log reaches it:\n{\n\"hookSpecificOutput\": {\n\"hookEventName\": \"PreToolUse\",\n\"permissionDecision\": \"deny\",\n\"permissionDecisionReason\": \"Hard constraint violation: no-console-log (Use logger.debug() not console.log() in TypeScript source. Production logs route through pino; console.log bypasses formatting and breaks downstream parsers.). Violated 5 times previously.\"\n},\n\"violations\": [\n{\n\"rule\": \"no-console-log\",\n\"severity\": \"error\",\n\"violation_count\": 5,\n\"is_hard\": true,\n\"is_defer\": false\n}\n]\n}\nRules in CLAUDE.md or AGENTS.md are advisory and the model treats them as suggestions. Rules with a violation count and an enforcement boundary at the edit step are binding. Both have the same source — a developer correcting the agent — but very different effect.\n2. A regression warning that flags edits to a file with a recorded bug fix\nget_related_bugs walks decision traces and prior bug-fix facts. 
When validate_change runs on a file with a recorded fix, the related-bugs query surfaces the prior fix and flags the proposed change.\nThe project has a bug fix on file world_model_server/knowledge_graph.py:120-135 for content-hash backfill (the migration logic must run on every initialize(), not just when the column is created). I proposed a refactor that removed the backfill loop and ran the related-bugs check:\n{\n\"risk_score\": 0.6,\n\"bugs\": [\n{\n\"bug_id\": \"12457e2a-5638-46ec-a9df-02fe13b9c104\",\n\"description\": \"Bug fix: NULL content_hash backfill must run on every initialize() to cover post-migration inserts. Earlier code only backfilled when the column was created, which left merge_from rows un-hashed and broke dedup.\",\n\"fixed_at\": \"2026-05-10T10:17:51.737046\",\n\"critical_regions\": [\n{\"file\": \"world_model_server/knowledge_graph.py\", \"lines\": \"120-135\"}\n]\n}\n],\n\"warnings\": [\n\"Lines 120-135 preserve fix for 12457e2a-5638-46ec-a9df-02fe13b9c104: Bug fix: NULL content_hash backfill must run on every initialize() to cover post-migration inserts. Earlier code only backfilled when the column was created, which left merge_from rows un-hashed and broke dedup.\"\n]\n}\nThe risk score is 0.6 because the proposed change touched a critical region without re-implementing the fix. The warning text quotes the original bug description directly so the agent (or the human) can see why the region matters, not just that it does.\n3. A contradiction resolved by confidence + source-count weighting\nThe temporal layer assigns each fact a confidence score, a source_count, and a valid_at timestamp. 
When two facts about the same entity disagree, find_contradictions surfaces them with both sides' metadata, and resolve_contradiction picks a winner using the strategy you set.\nTwo facts both pointing at the same entity (http_transport_port):\n{\n\"fact_a_id\": \"e4b2ff84-8c23-4de5-aa9e-8bbb045a4ed5\",\n\"fact_b_id\": \"7fe854f9-d64a-4304-b43a-7d1b126c6ebb\",\n\"fact_a_text\": \"HTTP transport listen port default is 8080\",\n\"fact_b_text\": \"HTTP transport listen port default is 8765\",\n\"similarity_score\": 0.929,\n\"both_valid\": true,\n\"reason\": \"same entity, similar text\",\n\"confidence_a\": 0.7,\n\"confidence_b\": 0.95,\n\"source_count_a\": 1,\n\"source_count_b\": 3\n}\nresolve_contradiction(strategy=\"auto\") picks the strategy with the largest signal gap. Here source count differs 3:1, so it picks keep_most_sources:\n{\n\"strategy\": \"keep_most_sources\",\n\"winner_id\": \"7fe854f9-d64a-4304-b43a-7d1b126c6ebb\",\n\"loser_id\": \"e4b2ff84-8c23-4de5-aa9e-8bbb045a4ed5\",\n\"resolved_at\": \"2026-05-21T10:24:16.287368\"\n}\nThe loser is updated in place:\n{\n\"id\": \"e4b2ff84-8c23-4de5-aa9e-8bbb045a4ed5\",\n\"fact_text\": \"HTTP transport listen port default is 8080\",\n\"status\": \"superseded\",\n\"invalid_at\": \"2026-05-21T10:24:16.285184\",\n\"confidence\": 0.7\n}\nQueries that ask \"what's true now?\" silently skip the superseded fact. Queries that ask \"what was true on 2026-05-18?\" still see it. That's what the temporal layer earns.\n4. The PostCompact injection bundle\nv0.7.0 added a PostCompact hook that re-injects the top constraints and recent canonical facts after the agent's context is compacted. The bundle is small (configurable, default ~10 constraints + 10 facts) and prioritized.\nThe actual bundle returned by get_injection_context(event_type=\"PostCompact\", max_constraints=5, max_facts=5):\npython3 -m twine check dist/*\nbefore tagging. Catches PyPI metadata errors before the tag is pushed; saves a retraction. 
(violated 5x)\nThat bundle is what gets spliced into the agent's working context as additionalContext after a compaction event. The same query also runs on UserPromptSubmit, biased toward whatever the user just asked about.\nThe compaction audit log records what happened, queryable via the CLI:\n$ world-model audit-compactions --limit 5\n1 compaction audit rows\n2026-05-21T10:38:01.606771 session=demo-session-1 pre=84320\npost=22150 facts_injected=10 constraints_injected=3 event=PostCompact\nWith pre=84320 and post=22150, the compaction dropped ~62k tokens of context. The injection put 10 facts + 3 constraints back. The audit row exists so a human can later answer \"what did the agent see vs. what did it lose?\"\n5. A defer decision that pauses a headless agent\nv0.7.0 added a defer enforcement tier between deny and warn. Warning-severity violations with violation_count ≥ 5 return permissionDecision: \"defer\" when the client advertises support, so headless agents pause instead of silently passing or hard-blocking. Clients that do not advertise support fall back to ask automatically.\nI have a check-twine-before-tag constraint with violation_count=5, severity=warning. When a Bash tool input matches it, the hook returns:\n{\n\"hookSpecificOutput\": {\n\"hookEventName\": \"PreToolUse\",\n\"permissionDecision\": \"defer\",\n\"permissionDecisionReason\": \"Recurring warning-level violations (check-twine-before-tag). Headless agents should pause for confirmation.\"\n}\n}\nSame payload, same constraint, but with supports_defer: false in the request, the hook falls back to ask:\n{\n\"hookSpecificOutput\": {\n\"hookEventName\": \"PreToolUse\",\n\"permissionDecision\": \"ask\",\n\"permissionDecisionReason\": \"Recurring warning-level violations (check-twine-before-tag). Headless agents should pause for confirmation.\"\n}\n}\nThe defer tier exists because the binary deny / warn choice forces you to either be too strict or too permissive. 
Recurring warnings that don't rise to error-level should pause for a human, not block, not pass.\nWhat this means if you are building agents\nThe reason this works is not that the tool is clever. It is that the substrate — a temporal knowledge graph with facts, constraints, contradictions, and decision traces — captures the right shape of information.\nPlain markdown rules in CLAUDE.md cannot answer:\nA graph can. The cost is one MCP server, ~2,000 lines of Python, and a SQLite database that sits at ~155 KB empty (mine grew to about 2 MB after running this exercise plus the auto-seed). The payoff is a memory layer that survives compaction, enforces at the edit boundary, and tracks evidence chains back to the source.\nIf you are building anything with Claude Code, Cursor, or any harness that supports MCP + hooks:\npip install world-model-mcp\ncd /your/project\npython -m world_model_server.cli setup\nFor Claude Managed Agents with self-hosted sandboxes (where Anthropic's built-in Memory primitive is not yet supported), v0.7.2 added streamable HTTP transport so the same 25 MCP tools also work behind an MCP tunnel.\nSource: github.com/SaravananJaichandar/world-model-mcp.\nIf world-model-mcp helped you, star the repo or open an issue with what worked or didn't. 
I read every one.", "url": "https://wpnews.pro/news/five-primitives-i-exercised-end-to-end-on-world-model-mcp-s-own-repo", "canonical_source": "https://dev.to/saravananj2294/five-primitives-i-exercised-end-to-end-on-world-model-mcps-own-repo-moo", "published_at": "2026-05-21 05:36:21+00:00", "updated_at": "2026-05-21 06:22:14.088182+00:00", "lang": "en", "topics": ["artificial-intelligence", "developer-tools", "open-source", "large-language-models", "products"], "entities": ["world-model-mcp", "Claude Code", "SaravananJaichandar", "GitHub"], "alternates": {"html": "https://wpnews.pro/news/five-primitives-i-exercised-end-to-end-on-world-model-mcp-s-own-repo", "markdown": "https://wpnews.pro/news/five-primitives-i-exercised-end-to-end-on-world-model-mcp-s-own-repo.md", "text": "https://wpnews.pro/news/five-primitives-i-exercised-end-to-end-on-world-model-mcp-s-own-repo.txt", "jsonld": "https://wpnews.pro/news/five-primitives-i-exercised-end-to-end-on-world-model-mcp-s-own-repo.jsonld"}}