Show HN: Memory layer for Claude Code (+10.2 pts on SWE-bench Verified benchmark)

A developer released World Model MCP, a memory layer for AI coding agents that uses a temporal knowledge graph to prevent repeated mistakes, achieving a +10.2 point improvement on the SWE-bench Verified benchmark across 49 instances. The tool validates code changes against learned constraints, re-injects context after compaction, and resolves contradictions, supporting Claude Code, Cursor, and other MCP-aware agents.

Enforcement, provenance, and harness-neutral memory for AI coding agents. A temporal knowledge graph that validates code changes against learned constraints at the edit boundary, re-injects relevant context after compaction, tracks contradictions with confidence-weighted resolution, and runs across Claude Code, Cursor, and pi.

Status: v0.9.1 -- 26 MCP tools, 19 CLI subcommands, 375 tests, a SWE-bench Verified repeat-mistake benchmark with a +10.2 pts paired delta across 49 instances (+15.0 pts within-domain, +6.9 pts cross-domain), and a 105-pair contradiction-resolution benchmark.

- v0.9 ships the empirical wedge proof: a locked, pre-registered methodology tested whether the persistent-knowledge layer measurably reduces repeated coding-agent mistakes on a public task corpus. The result confirms positive within-domain and cross-domain effects with zero observed regressions on out-of-domain tasks. Full per-task tables, a mechanistic analysis of the two cross-domain flips (sphinx-9461 is the cleanest case), and honest limitations are in `benchmarks/repeat-mistake/RESULTS.md`.
- v0.8.1 expanded the contradiction-resolution benchmark to 105 pairs across 19 categories.
- v0.8.0 added domain-aware confidence decay with per-evidence-type TTL, per-item provenance fields (`source_tool` and `confirmer`), slash command write operations, and a `confirmer` parameter on `resolve_contradiction`.
- The Antigravity adapter is held for the fourth consecutive release pending a `TransformCompactionHook` in the SDK; next re-verify 2026-07-24.
- v0.7.6 added the `/world-model` slash command and the `status-watch` TUI widget.
- v0.7.5 added the Codex CLI adapter.
- v0.7.0 introduced PostCompact auto-injection, the `defer` enforcement tier, confidence-weighted contradiction resolution, and a compaction audit log.

Contributions welcome.

mcp-name: io.github.SaravananJaichandar/world-model-mcp

If world-model-mcp helped you, star the repo or open an issue with what worked or didn't. I read every one and the feedback shapes what ships next.

World Model MCP creates a temporal knowledge graph of your codebase that learns from every coding session to:

- Prevent Hallucinations -- Validates API/function references against known entities before use
- Stop Repeated Mistakes -- Learns constraints from corrections, applies them in future sessions
- Reduce Regressions -- Tracks bug fixes and warns when changes touch critical regions
- Survive Compaction -- Re-injects top constraints and recent facts after the agent's context window resets
- Resolve Contradictions -- Picks a winner between conflicting facts using confidence, recency, or source count

Think of it as a long-term memory layer that runs alongside Claude Code, Cursor, or any MCP-aware coding agent.

- Repeat-mistake benchmark on SWE-bench Verified -- the central wedge proof. 50 SWE-bench Verified tasks across django, sympy, matplotlib, scikit-learn, and sphinx, run as a paired baseline-vs-treatment comparison. The methodology was locked at `benchmarks/repeat-mistake/DESIGN.md` on 2026-06-17, before the data existed, so the result cannot be accused of goalpost-moving.
- Headline results -- Subset 1 (within-domain: django + sympy): baseline 15/20 = 75.0 percent, treatment 18/20 = 90.0 percent, delta +15.0 pts with 4 FAIL-to-PASS flips and 1 regression. Subset 2 (cross-domain: matplotlib + scikit-learn + sphinx): baseline 18/29 = 62.1 percent, treatment 20/29 = 69.0 percent, delta +6.9 pts with 2 flips and zero regressions. Combined paired result across 49 instances: 33/49 to 38/49, delta +10.2 pts.
- Cross-domain transfer isolated cleanly -- the Subset 2 treatment arm loaded ONLY the 4 Subset 1 constraints (django and sympy directives), holding out the 11 Subset 2 constraints to test whether learning from one repo family generalizes to a different one. There were two cross-domain flips with plausible mechanistic explanations grounded in the loaded constraints. Sphinx-9461 is the strongest case: a sympy classmethod constraint transferred to a sphinx classmethod-wrapper unwrapping bug.
- Honest caveats embedded in RESULTS.md -- seven explicit limitations, including the single-trial design, constraint-failure overlap on Subset 1, the small cross-domain transfer rate, one instance dropped due to an upstream SWE-bench pip flag issue, and judge-model self-reference risk. They are stated verbatim rather than hidden in an appendix.
- Full reproducibility artifacts -- every progress JSONL, predictions JSON, results JSONL, classification JSONL, constraints JSON, and harness report JSON is committed in `benchmarks/repeat-mistake/`. Locked judge prompts are in `failure_classifier.py` and `learning_hook.py`. Total agent cost across both arms was approximately 90 USD on a Claude Code subscription.

- Contradiction-resolution benchmark expansion -- the v0.7.4 24-pair benchmark grew to 105 hand-curated pairs across 19 categories. Six new categories exercise the v0.8.0 schema specifically: `source_tool_corroboration`, `confirmer_overrides_pending`, `decay_advantage_session_vs_source`, `decay_advantage_stale_session`, `evidence_type_user_correction`, `settled_beats_higher_confidence`. Deterministic runner at `benchmarks/contradictions-200/run.py`; full per-strategy + per-category breakdown at `benchmarks/contradictions-200/RESULTS.md`.
- Honest framing on the numbers -- the new dataset is harder than v0.7.4's 24-pair set because the new categories deliberately test schema awareness (confirmer, evidence type, decay) rather than raw confidence ranking. Headline numbers: `keep_most_sources` 99.0%, `keep_higher_confidence` 81.0%, `auto` 77.1%, `keep_higher_confidence_decayed` 90.5% (on the 21 pairs where evidence type is present), overall 78.2% across all strategies. The original 24-pair v0.7.4 number (93.5%) is preserved unchanged at `benchmarks/contradictions/` and is not invalidated; it tested a different, smaller, easier corpus.
- The wedge benchmark is v0.9 -- "does the learning loop measurably reduce repeated coding-agent mistakes on a public task corpus?" The contradiction-resolution work in this release is internal schema-correctness validation. The empirical artifact that maps to the published essay framing (the learning loop is the durable layer) lands in v0.9 with a SWE-bench-style repeat-mistake benchmark.

- Domain-aware confidence decay -- new `world_model_server/decay.py` module with exponential half-life decay per evidence type. Half-lives: `source_code` 365d, `test` 180d, `session` 14d, `user_correction` 730d, `bug_fix` 365d. Decay applies on read (no background task), so the next `query_fact` call returns the time-corrected confidence. Settled facts (canonical status, or any fact with a non-NULL `confirmer`) never auto-transition. Synthesized facts that decay below 0.2 confidence and corroborated facts that decay below 0.1 confidence auto-supersede on read, surfacing rot to the next compaction injection.
- Per-item provenance fields on facts -- three additive columns (`source_tool TEXT`, `confirmer TEXT`, `last_decay_at TIMESTAMP`), all NULL-defaulted, no backfill. `source_tool` records which tool wrote the fact (e.g. `claude_code`, `codex`, `cursor`, `pi`, `user`). `confirmer` records who confirmed it, distinct from the asserter; NULL means pending, non-NULL means settled. Both are exposed on the `Fact` model and propagated through `create_fact`. Honors the public commitment to Patdolitse ([anthropics/claude-code#47023](https://github.com/anthropics/claude-code/issues/47023)) and ferhimedamine ([openai/codex#19195](https://github.com/openai/codex/issues/19195)).
- Slash command write operations -- two new subcommands. `/world-model resolve <id>` marks a contradiction as resolved (manual; for confidence-weighted picking use the `resolve_contradiction` MCP tool). `/world-model forget <id>` sets `invalid_at` on a fact (preserved in the audit log; current-only reads skip it from then on). Both are idempotent and report cleanly on unknown ids. Help text now lists both alongside the read-only subcommands shipped in v0.7.6.
- `resolve_contradiction` accepts `confirmer` -- when a `confirmer` argument is provided to the MCP tool or its underlying resolve function, the winning fact gets its `confirmer` column stamped with that value. This is the spec primitive that distinguishes "the asserter says X" from "X is confirmed by Y" per the working group sketch.
- Antigravity adapter held for the third consecutive release. The 2026-06-13 re-verification found `OnCompactionHook` declared as `InspectHook` in the SDK, with no `TransformCompactionHook` and no additional context return field. The load-bearing memory-injection contract still does not exist in the SDK. Next re-verify 2026-06-27.

In-agent `/world-model` slash command -- typed by the user inside the agent harness, it surfaces the world model state without leaving the chat. Read-only in v0.7.6 (`status`, `contradictions`, `recent`, `help`); write operations (`resolve`, `forget`) land in v0.8. Works across Claude Code, Cursor, Codex, and pi by intercepting `UserPromptSubmit` in the existing `inject_helper`. Returns `additionalContext` in the strict camelCase shape Codex enforces (`deny_unknown_fields`), so the same wire-up serves all four harnesses without a per-harness branch.

`world-model status-watch` TUI widget -- a terminal pane that runs alongside the agent and refreshes every 5 seconds. Shows constraints (total, severity=error, severity=warning), unresolved contradictions, facts (canonical / synthesized / superseded), and last compaction time. Built on the `rich` library already in the dependency tree; falls back to a plain-text one-shot dump when `rich` is not installed.

Antigravity CLI adapter intentionally NOT shipped in this release -- the re-verification on 2026-06-13 against google-antigravity/antigravity-sdk-python HEAD surfaced an architectural gap: `OnCompactionHook` is declared as an `InspectHook` (read-only, non-blocking) with no additional context return field and no `TransformCompactionHook` subclass. The load-bearing memory-injection contract does not exist in the SDK today. Targeting 2026-06-27 for the next re-verification; v0.7.6 ships without Antigravity rather than against a contract that cannot do the work.

Codex CLI adapter -- new `install-codex` CLI subcommand appends a `mcp_servers.world_model` block plus PreToolUse, PostToolUse, PostCompact, and SessionStart hooks to `~/.codex/config.toml`. The bundled snippet was verified against openai/codex@main at v0.138.0-alpha (the server name uses an underscore to dodge the tool-name hyphen-strip in `codex-rs/codex-mcp/src/mcp/mod.rs`; hook output sticks to camelCase with `deny_unknown_fields` compliance). Schema regression tests in `tests/test_v075_features.py` lock the contract down. See [adapters/codex/README.md](/SaravananJaichandar/world-model-mcp/blob/main/adapters/codex/README.md).

Dual-shape payload normalization in `hook_helper` and `inject_helper` -- both helpers now accept either Claude Code's payload shape (`event`, `project_dir`) or Codex's (`hook_event_name`, `cwd`), so the same Python code drives all four adapters (Claude Code, Cursor, pi, Codex).

Antigravity CLI adapter intentionally NOT shipped this release -- the Antigravity API surface is still settling (six 1.0.x releases in three weeks, the `url` field for HTTP MCP servers landed June 3, hook JSON event-name casing remains undocumented). Targeting June 25 for that adapter after the API stabilizes. Detailed reasoning is in the v0.7.5 RELEASE_NOTES entry.

AGENTS.md / `.agents/skills/` constraint reader -- world-model-mcp now reads declarative project conventions from `AGENTS.md`, `CLAUDE.md`, `GEMINI.md`, and `.md` files under `.agents/skills/`, and mixes them into PreToolUse enforcement alongside the SQLite-backed constraints. Supports structured blocks (`constraint` fences and YAML frontmatter) and heuristic imperative-sentence extraction for prose-style AGENTS.md files. New MCP tool: `get_agents_md_constraints`.
[anthropics/claude-code#6235](https://github.com/anthropics/claude-code/issues/6235) has 4,000+ thumbs-up for AGENTS.md as the cross-agent format.

Self-hosted Claude Managed Agents deployment guide -- [Anthropic's official position](https://claude.com/blog/claude-managed-agents-updates): "Memory is not yet supported in self-hosted sessions." world-model-mcp fills that gap. New guide at `docs/deployment/managed-agents-self-hosted.md`, with a [Modal quickstart](/SaravananJaichandar/world-model-mcp/blob/main/examples/managed-agents-self-hosted) you can deploy in under five minutes.

Reproducible contradiction-resolution benchmark -- 24-pair dataset at `benchmarks/contradictions/dataset.jsonl`, runner at `benchmarks/contradictions/run.py`, results at `benchmarks/contradictions/RESULTS.md`. Headline: 93.5% overall accuracy, 100% on `keep_higher_confidence` and `keep_most_sources`, with documented honest weaknesses on tie-handling and small confidence gaps. Re-run with `python benchmarks/contradictions/run.py`. A CI workflow guards against regressions.

`world-model demo` -- one command to see every primitive working. Initializes the knowledge graph, seeds reproducible demo data via `scripts/demo_seed.py`, then exercises each primitive (PreToolUse enforcement, contradiction detection, PostCompact injection, audit log) with real outputs. New users can see the value without writing any code.

Opt-in telemetry -- off by default, prompted once during `world-model setup`, inspectable with `world-model telemetry --status`, disabled with `world-model telemetry --disable`. No file paths, no code, no identifiers tied to a person. See Privacy and Security for the exact payload.

pi adapter -- new `adapters/pi/` package. world-model-mcp now plugs into [earendil-works/pi](https://github.com/earendil-works/pi) via pi's extension API (`tool_call` for PreToolUse, `context` for auto-injection, `session_compact` for the audit log). Install with `world-model install-pi`.

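The winner-picking strategies named in these notes (confidence, recency, source count) each reduce to comparing one field of two conflicting facts. The sketch below is a minimal illustration under stated assumptions, not the project's `resolve_contradiction` implementation: the `Fact` fields, the `pick_winner` helper, and the tie rule are all hypothetical, and the `auto` strategy is not modeled.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Fact:
    # Hypothetical shape for illustration; the real Fact model may differ.
    text: str
    confidence: float
    recorded_at: datetime
    source_count: int


# One comparison key per strategy name mentioned in the notes.
STRATEGY_KEYS = {
    "keep_higher_confidence": lambda f: f.confidence,
    "keep_most_recent": lambda f: f.recorded_at,
    "keep_most_sources": lambda f: f.source_count,
}


def pick_winner(a: Fact, b: Fact, strategy: str) -> Fact:
    """Return the fact that wins under the given strategy.

    Assumption: ties go to the first argument. A real resolver would
    need an explicit tie policy.
    """
    key = STRATEGY_KEYS[strategy]  # KeyError on an unknown strategy
    return b if key(b) > key(a) else a


old = Fact("User.findByEmail exists", 0.9, datetime(2026, 1, 1), 1)
new = Fact("User.findByEmail was removed", 0.6, datetime(2026, 5, 1), 3)

print(pick_winner(old, new, "keep_higher_confidence").text)
print(pick_winner(old, new, "keep_most_sources").text)
```

The two strategies disagree on this pair, which is the kind of case a benchmark over hand-curated pairs is meant to expose. Ties fall to the first fact here only for simplicity; the notes above list tie-handling as a documented weakness.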
PostCompact / UserPromptSubmit auto-injection -- when the agent's context is compacted, the hook automatically splices the top constraints and recent canonical facts back into the next turn. Configurable, fails open.

`defer` enforcement tier -- PreToolUse now classifies recurring warning-level violations as `defer`, which pauses headless agents (with graceful fallback to `ask` on older clients) instead of either hard-denying or silently passing through.

Confidence-weighted contradiction resolution -- the new `resolve_contradiction` tool picks a winner using `keep_higher_confidence`, `keep_most_recent`, `keep_most_sources`, or `auto`. The loser is marked superseded.

Compaction audit log -- every PostCompact event writes a row with pre/post token counts and what was re-injected. Query with the `audit-compactions` CLI or export to JSONL.

Cursor adapter -- harness-neutral hooks under `adapters/cursor/`. Same Python helpers, different manifest format.

Streamable HTTP transport (v0.7.2) -- set `WORLD_MODEL_TRANSPORT=http` so the same 25 MCP tools work behind an MCP tunnel for Claude Managed Agents with self-hosted sandboxes. See [docs/deployment/mcp-tunnel.md](/SaravananJaichandar/world-model-mcp/blob/main/docs/deployment/mcp-tunnel.md).

Download the latest `.mcpb` from [Releases](https://github.com/SaravananJaichandar/world-model-mcp/releases/latest) and drag it into Claude Desktop. It auto-installs hooks, MCP server config, and dependencies.

```bash
# 1. Install the package
pip install world-model-mcp

# 2. Setup in your project (auto-seeds the knowledge graph from existing code)
cd /path/to/your/project
python -m world_model_server.cli setup

# 3. Restart Claude Code
# Done. The world model is pre-populated and active.
```

You can also re-seed or seed manually at any time:

```bash
# Seed from existing codebase
world-model seed

# Re-seed with force (re-processes already seeded files)
world-model seed --force
```

For Claude Managed Agents with self-hosted sandboxes, or any deployment where the MCP server lives behind a firewall and the agent reaches it from Anthropic-side infrastructure, run world-model-mcp in HTTP mode.

```bash
pip install 'world-model-mcp[http]'
export WORLD_MODEL_TRANSPORT=http
export WORLD_MODEL_HTTP_PORT=8765
python -m world_model_server.server

# Or use the bundled image:
docker compose up -d   # Dockerfile.http + persistent volume

curl http://127.0.0.1:8765/healthz
# {"status":"ok","version":"0.7.2"}
```

Full walkthrough including Anthropic MCP tunnels setup: [docs/deployment/mcp-tunnel.md](/SaravananJaichandar/world-model-mcp/blob/main/docs/deployment/mcp-tunnel.md).

Stdio remains the default transport for Claude Code, Cursor, and `.mcpb` installs. Nothing changes for those flows.

To see every primitive working with real outputs from a real SQLite database before committing to a full install:

```bash
pip install world-model-mcp
mkdir -p /tmp/wm-test && cd /tmp/wm-test
world-model demo
```

The demo initializes a knowledge graph, seeds reproducible data, and exercises PreToolUse enforcement, contradiction detection, the PostCompact injection bundle, and the compaction audit log, printing the actual JSON outputs. Re-runs are idempotent.

For users of [earendil-works/pi](https://github.com/earendil-works/pi):

```bash
pip install world-model-mcp        # the Python helpers
world-model install-pi             # writes adapters/world-model-pi/
pi install local:./adapters/world-model-pi
```

The pi adapter wires the same `hook_helper` and `inject_helper` you'd use from Claude Code into pi's `tool_call`, `context`, and `session_compact` events. See [adapters/pi/README.md](/SaravananJaichandar/world-model-mcp/blob/main/adapters/pi/README.md).

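For the HTTP transport mode described above, a deployment script may want to confirm the server is up before pointing an agent at it. The sketch below relies only on what the `curl` example shows (a `/healthz` endpoint returning JSON with `"status":"ok"`); `parse_health` and `is_healthy` are hypothetical helper names, not part of the package.

```python
import json
import urllib.request


def parse_health(body: bytes) -> bool:
    """Return True only if the body is a JSON object with status == "ok"."""
    try:
        payload = json.loads(body)
    except ValueError:  # covers invalid JSON and undecodable bytes
        return False
    return isinstance(payload, dict) and payload.get("status") == "ok"


def is_healthy(base_url: str, timeout: float = 2.0) -> bool:
    """Query <base_url>/healthz; any network or HTTP error counts as unhealthy."""
    try:
        with urllib.request.urlopen(f"{base_url}/healthz", timeout=timeout) as resp:
            return parse_health(resp.read())
    except OSError:  # URLError and HTTPError are OSError subclasses
        return False
```

With the server started on the port from the example, `is_healthy("http://127.0.0.1:8765")` should return `True`; treating every failure as "unhealthy" keeps the check safe to call from a retry loop.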
For users of [OpenAI's Codex CLI](https://github.com/openai/codex):

```bash
pip install world-model-mcp                      # the Python helpers
python -m world_model_server.cli install-codex   # appends mcp_servers.world_model + hook blocks to ~/.codex/config.toml
# Restart codex; verify with:
codex mcp list
```

`--dry-run` prints what would be appended without writing; `--force` re-appends even if the adapter marker is already present. The bundled snippet uses `world_model` (underscore) as the MCP server name to dodge Codex's silent hyphen-strip in its tool-name sanitizer. Hook output is camelCase with `deny_unknown_fields` compliance against Codex's strict Rust schema; the contract is locked down by tests in `tests/test_v075_features.py`. See [adapters/codex/README.md](/SaravananJaichandar/world-model-mcp/blob/main/adapters/codex/README.md).

```
your-project/
├── .mcp.json                 # MCP server configuration
├── .claude/
│   ├── settings.json         # Hook configuration
│   ├── hooks/                # Compiled TypeScript hooks
│   └── world-model/          # SQLite databases (~155 KB)
```

Before:

```js
// Claude invents an API that doesn't exist
const user = await User.findByEmail(email); // This method doesn't exist
```

After:

```js
// Claude checks the world model first
const user = await User.findOne({ email }); // Verified to exist
```

Goal: Reduce non-existent API references by validating against the knowledge graph

Session 1: User corrects Claude

```js
// Claude writes:
console.log('debug info');

// User corrects to:
logger.debug('debug info');

// World model learns: "Use logger.debug, not console.log"
```

Session 2: Claude uses the learned pattern

```js
// Claude automatically writes:
logger.debug('debug info'); // No correction needed
```

Goal: Learned patterns persist across sessions and prevent repeat violations

```js
// Week 1: Bug fixed (null check added)
if (user && user.email) { ... }

// Week 2: Refactoring
// World model warns: "This line preserves a critical bug fix"
// Claude preserves the null check
// Result: Bug not re-introduced
```

Goal: Detect potential regressions before code execution

```
┌──────────────────────────────────────────────────────────┐
│  Claude Code + Hooks                                     │
│  Captures: file edits, tool calls, user corrections      │
└──────────────────────────────────────────────────────────┘
                             |
                             v
┌──────────────────────────────────────────────────────────┐
│  MCP Server (Python)                                     │
│  - 22 MCP tools for querying/recording/predicting        │
│  - LLM-powered entity extraction (Claude Haiku)          │
│  - External linter integration (ESLint, Pylint, Ruff)    │
└──────────────────────────────────────────────────────────┘
                             |
                             v
┌──────────────────────────────────────────────────────────┐
│  Knowledge Graph (SQLite + FTS5)                         │
│  - entities.db: APIs, functions, classes                 │
│  - facts.db: Temporal assertions with evidence           │
│  - relationships.db: Entity dependency graph             │
│  - constraints.db: Learned rules from corrections        │
│  - sessions.db: Session history and outcomes             │
│  - events.db: Activity log with reasoning chains         │
└──────────────────────────────────────────────────────────┘
```

- Temporal Facts: Every fact has `validAt` and `invalidAt` timestamps
  - "Function X existed from 2024-01-15 to 2024-03-20"
  - Query: "What was true on March 1st?"
- Evidence Chains: Every assertion traces back to source
  - Fact -> Session -> Event -> Source Code Location
- Constraint Learning: Pattern recognition from user corrections
  - Automatic rule type inference (linting, architecture, testing)
  - Severity detection (error, warning, info)
  - Example generation for future reference
- Dual Validation: Combines two validation sources
  - World model constraints (learned from user)
  - External linters (ESLint, Pylint, Ruff)

Twenty-two MCP tools available to Claude Code:

```python
# Check if APIs/functions exist before using them
result = query_fact(
    query="Does User.findByEmail exist?",
    entity_type="function"
)
# Returns: {exists: bool, confidence: float, facts: [...]}
```

```python
# Capture development activity with reasoning chains
record_event(
    event_type="file_edit",
    file_path="src/api/auth.ts",
    reasoning="Added JWT authentication middleware"
)
```

```python
# Pre-execution validation against constraints and linters
result = validate_change(
    file_path="src/api/auth.ts",
    proposed_content="..."
)
# Returns: {safe: bool, violations: [...], suggestions: [...]}
```

```python
# Retrieve project-specific rules for a file
constraints = get_constraints(
    file_path="src/**/*.ts",
    constraint_types=["linting", "architecture"]
)
```

```python
# Learn from user edits (HIGH PRIORITY)
record_correction(
    claude_action={...},
    user_correction={...},
    reasoning="Use logger.debug instead of console.log"
)
```

```python
# Regression risk assessment
result = get_related_bugs(
    file_path="src/api/auth.ts",
    change_description="refactoring authentication logic"
)
# Returns: {bugs: [...], risk_score: float, critical_regions: [...]}
```

```python
# Scan the codebase and populate the knowledge graph with entities and relationships
result = seed_project(
    project_dir=".",
    force=False
)
# Returns: {files_seeded: int, entities_created: int, relationships_created: int}
```

```python
# Pull GitHub PR review comments and convert team feedback into constraints
result = ingest_pr_reviews(
    repo="owner/repo",  # Auto-detected from git remote if omitted
    count=10
)
# Returns: {prs_scanned: int, constraints_created: int, constraints_updated: int}
```

- [QUICKSTART.md](/SaravananJaichandar/world-model-mcp/blob/main/QUICKSTART.md): 5-minute setup guide
- [CONTRIBUTING.md](/SaravananJaichandar/world-model-mcp/blob/main/CONTRIBUTING.md): Contribution guidelines
- [RELEASE_NOTES.md](/SaravananJaichandar/world-model-mcp/blob/main/RELEASE_NOTES.md): Version history and features

```bash
# Run tests
pytest

# With coverage
pytest --cov=world_model_server --cov-report=html
```

186 tests covering knowledge graph CRUD, FTS5 search, constraint management, bug tracking, auto-seeding, PR review ingestion, decision traces, outcome linkage, trajectory learning, prediction layer, memory health, contradiction detection, transcript pointers, project identity, and PreToolUse enforcement. See [tests/](/SaravananJaichandar/world-model-mcp/blob/main/tests) for details.

```bash
# Database location (default: ./.claude/world-model/)
export WORLD_MODEL_DB_PATH="/custom/path"

# Anthropic API key (optional - enables LLM extraction)
# IMPORTANT: Never commit this. Use a .env file (see .env.example)
export ANTHROPIC_API_KEY="your-api-key-here"

# Model selection
export WORLD_MODEL_EXTRACTION_MODEL="claude-3-haiku-20240307"    # Fast
export WORLD_MODEL_REASONING_MODEL="claude-3-5-sonnet-20241022"  # Accurate

# Debug mode
export WORLD_MODEL_DEBUG=1
```

Note: Create a `.env` file in your project root (see `.env.example`); it is automatically ignored by git.

Edit `.claude/settings.json` to customize which tools trigger world model hooks:

```json
{
  "hooks": {
    "PostToolUse": {
      "matcher": "Edit|Write|Bash",
      "hooks": [...]
    }
  }
}
```

Currently Supported:
- TypeScript / JavaScript
- Python

Coming Soon:
- Go, Rust, Java, C++

Extensible Architecture: Easy to add new language parsers (see [CONTRIBUTING.md](/SaravananJaichandar/world-model-mcp/blob/main/CONTRIBUTING.md))

Local-First: All knowledge graph data stays on your machine.

Optional LLM: Works without an API key (uses regex patterns as fallback).

Encrypted Storage: SQLite databases are local files (encrypt your disk for security).

v0.7.3 added anonymous usage telemetry. It is:

- Off by default. You have to explicitly opt in.
- Asked once during `world-model setup`, with a clear y/N prompt.
- Inspectable: `world-model telemetry --status` shows the exact JSON payload that would be sent.
- Disabled any time with `world-model telemetry --disable`, or globally with `WORLD_MODEL_TELEMETRY_DISABLE=1`.
- Skipped in non-TTY environments (CI, scripts) so it never blocks an automated setup.

What we send (only if you opt in):

| Field | Example | Why |
|---|---|---|
| `event` | `setup_completed`, `demo_run`, `hook_fired` | Which lifecycle step ran |
| `version` | `0.7.3` | Which release you're on |
| `install_id` | random UUID at `~/.world-model/install_id` | Distinguish installs without identifying users |
| `ts` | unix timestamp | When the event fired |

What we never send: file paths, file contents, rule names, hostnames, IP addresses, API keys, decision-trace text, fact text, or anything else that could identify a person or leak business logic. The full payload schema lives in `world_model_server/telemetry.py`.

Where it goes: opt-in events are posted to a dedicated private GitHub repo (`SaravananJaichandar/world-model-telemetry`) as plain issues. There is no third-party analytics service, no cookie, no fingerprint. The PAT embedded in the client is scoped to that one repo with Issues: write only.

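As a rough illustration of how small that payload is, the sketch below builds an event containing only the four fields from the table and refuses anything else. `build_payload` and `ALLOWED_FIELDS` are hypothetical names for this example; the real schema lives in `world_model_server/telemetry.py` and may differ.

```python
import time

# The four fields listed in the table above; nothing else is allowed.
ALLOWED_FIELDS = frozenset({"event", "version", "install_id", "ts"})


def build_payload(event, version, install_id, ts=None, **extra):
    """Build a telemetry event, rejecting any field outside the allow-list."""
    if extra:
        raise ValueError(f"unexpected telemetry fields: {sorted(extra)}")
    return {
        "event": event,
        "version": version,
        "install_id": install_id,
        # Unix timestamp in whole seconds; defaults to the current time.
        "ts": int(time.time() if ts is None else ts),
    }


payload = build_payload(
    "demo_run",
    "0.7.3",
    "00000000-0000-0000-0000-000000000000",  # placeholder install id
    ts=1700000000,
)
print(payload)
```

Rejecting unknown keyword arguments at construction time is one way to make "what we never send" a property of the code rather than a convention.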
- Entity extraction from code changes
- Constraint inference from corrections
- Never sends: Credentials, secrets, PII

- Never commit `.env` files
- Use `.env.example` as template
- Store API keys in environment variables or `.env` files only
- The `.gitignore` automatically excludes sensitive files

- Auto-seeding: knowledge graph populates from existing codebase on setup
- PR Review Intelligence: ingest GitHub review comments as constraints
- Relationship tracking: import and dependency graph between entities
- Multi-language support: Python, TypeScript/JavaScript, Solidity, Go, Rust
- CLI query command for knowledge graph lookups
- 40 tests, 8 MCP tools

- Module-level matching: query by module name finds the file and its contents
- Incremental re-seeding: only re-process files changed since last seed
- Fuzzy entity matching: approximate name search for typos and abbreviations
- Query caching: in-memory cache with TTL for repeated lookups
- Java support: complete multi-language coverage
- MCP server pipeline validation on real projects

- Outcome linkage: test failures linked to code changes with facts
- Trajectory learning: co-edit patterns tracked across sessions
- Decision trace capture: structured log of agent proposals and human corrections
- Cross-project entity search with project registry
- 5 new MCP tools (13 total), 104 tests

- Regression prediction, "what if" simulation, test failure prediction
- Multi-project knowledge transfer, memory health, fact TTL/decay
- `get_context_for_action` pre-edit bundle, constraint violation tracking, `find_contradictions`
- 20 MCP tools, 151 tests

- PreToolUse constraint enforcement hook: deny hard violations at the edit boundary
- Indexed transcript pointers: hydrate any fact back to source conversation
- Project identity decoupling: stable UUID across directory renames
- Content-hash deduplication for facts and constraints
- Auto-generate CLAUDE.md from the knowledge graph
- BetaAbstractMemoryTool subclass for Anthropic SDK integration
- Desktop Extension (`.mcpb`) packaging for Claude Desktop
- 22 MCP tools, 13 CLI subcommands, 186 tests

- PostCompact and UserPromptSubmit auto-injection: re-emit top constraints and recent facts after context loss
- `defer` enforcement tier in PreToolUse: pause headless agents on recurring warning-level violations, with graceful fallback to `ask`
- Confidence-weighted contradiction resolution: pick a winner using confidence, recency, or source count, with an `auto` strategy
- Compaction audit log: query and export what was remembered across each compaction boundary
- Cursor adapter package
- 25 MCP tools, 14 CLI subcommands, 220 tests

- HTTP transport mode for remote / MCP-tunnel deployment
- `/healthz` endpoint, `Dockerfile.http`, `docker-compose.yml`
- `docs/deployment/mcp-tunnel.md` walkthrough for Claude Managed Agents
- 236 tests

- `world-model demo` guided tour for first-time users
- Opt-in anonymous telemetry, off by default, inspectable
- pi-package adapter (`adapters/pi/`), `install-pi` CLI
- 17 CLI subcommands, 256 tests

- AGENTS.md / `.agents/skills/` constraint reader (new MCP tool: `get_agents_md_constraints`)
- Self-hosted Claude Managed Agents deployment guide + Modal quickstart
- Reproducible contradiction-resolution benchmark (24-pair dataset, CI workflow, RESULTS.md)
- 26 MCP tools, 17 CLI subcommands, 283 tests

- Codex CLI adapter (`install-codex`), shipped 2026-06-05

- In-agent `/world-model` slash command (read-only: status, contradictions, recent, help)
- `world-model status-watch` TUI status widget

- Decay + provenance schema: `source_tool`, `confirmer`, `last_decay_at` columns on facts. Per-evidence-type TTL with domain-aware half-lives (`source_code` 365d, `test` 180d, `session` 14d, `user_correction` 730d, `bug_fix` 365d).
- Slash command write operations (`/world-model resolve <id>`, `/world-model forget <id>`).
- `resolve_contradiction` accepts `confirmer` to stamp the winning fact as settled.

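The per-evidence-type half-lives listed above imply a simple read-time formula: confidence halves once for every half-life that has elapsed. The sketch below is a minimal illustration under that assumption and is not the project's `decay.py`; `HALF_LIFE_DAYS` and `decayed_confidence` are hypothetical names, and the evidence-type keys assume underscore spellings.

```python
from datetime import datetime, timedelta, timezone

# Half-lives in days, taken from the list above (key spellings assumed).
HALF_LIFE_DAYS = {
    "source_code": 365,
    "test": 180,
    "session": 14,
    "user_correction": 730,
    "bug_fix": 365,
}


def decayed_confidence(confidence, evidence_type, recorded_at, now):
    """Apply exponential half-life decay: c * 0.5 ** (age / half_life)."""
    age_days = max((now - recorded_at).total_seconds() / 86400.0, 0.0)
    half_life = HALF_LIFE_DAYS[evidence_type]
    return confidence * 0.5 ** (age_days / half_life)


now = datetime(2026, 6, 1, tzinfo=timezone.utc)
recorded = now - timedelta(days=14)

# A session-derived fact loses half its confidence after 14 days,
# while a user correction of the same age barely moves.
print(decayed_confidence(0.8, "session", recorded, now))  # 0.4
print(decayed_confidence(0.8, "user_correction", recorded, now))
```

Computing the value on read, from the stored confidence and timestamp, matches the "no background task" design described in the release notes: nothing needs to run between queries for the next lookup to see the time-corrected number.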
- Expanded contradiction-resolution benchmark: 24 to 105 pairs across 19 categories, including 6 new categories that test the v0.8.0 schema (decay, provenance, confirmer).
- Honest per-strategy + per-category RESULTS.md with the v0.7.4 number preserved as baseline.

- Repeat-mistake benchmark on AI coding tasks. The empirical test of the central wedge: does the learning loop measurably reduce repeated agent mistakes? Runs against a SWE-bench-style task corpus with Claude Code headless, and measures the delta in repeat-mistake rate with vs without world-model-mcp learning the constraint from the first attempt. This is the artifact the visibility plan has been reaching for; it maps directly to the [June 2026 essay](https://medium.com/@saravanan_2424/your-ai-model-is-temporary-your-learning-loop-should-not-be-874380e8ccf7) framing.
- `auto` strategy rewrite to fold in confirmer + decay awareness (should lift the v0.8.1 benchmark's `auto` score from 77.1% past 90%).
- Antigravity CLI adapter (held since 2026-06-13; the SDK lacks a `TransformCompactionHook` for the load-bearing memory-injection contract; re-verify 2026-06-27).
- MCP spec 2026-07-28 readiness (stateless transport, meta headers, InputRequiredResult).
- Cline adapter (lower urgency after they shipped global AGENTS rules in v3.86).

Contributions are welcome.
See [CONTRIBUTING.md](/SaravananJaichandar/world-model-mcp/blob/main/CONTRIBUTING.md) for:
- Development setup
- Coding standards
- Adding language support
- Writing tests
- Submitting PRs

Areas where help is needed:
- Language parsers (Go, Rust, Java, C++)
- Performance optimization
- Documentation improvements
- Real-world testing feedback

Project Size:
- ~4,800 lines of code
- 13 Python modules
- 3 TypeScript hook implementations

Storage Efficiency:
- Empty database: ~155 KB
- Per entity: ~500 bytes
- Per fact: ~800 bytes

[MIT License](/SaravananJaichandar/world-model-mcp/blob/main/LICENSE) - Free for commercial and personal use

Issues: [GitHub Issues](https://github.com/SaravananJaichandar/world-model-mcp/issues)

Discussions: [GitHub Discussions](https://github.com/SaravananJaichandar/world-model-mcp/discussions)