{"slug": "show-hn-memory-layer-for-claude-code-10-2-pts-on-swe-bench-verified-benchmark", "title": "Show HN: Memory layer for Claude Code (+10.2 pts on SWE-bench Verified benchmark)", "summary": "A developer released World Model MCP, a memory layer for AI coding agents that uses a temporal knowledge graph to prevent repeated mistakes, achieving a +10.2 point improvement on the SWE-bench Verified benchmark across 49 instances. The tool validates code changes against learned constraints, re-injects context after compaction, and resolves contradictions, supporting Claude Code, Cursor, and other MCP-aware agents.", "body_md": "**Enforcement, provenance, and harness-neutral memory for AI coding agents.** A temporal knowledge graph that validates code changes against learned constraints at the edit boundary, re-injects relevant context after compaction, tracks contradictions with confidence-weighted resolution, and runs across Claude Code, Cursor, and pi.\n\nStatus: v0.9.1 -- 26 MCP tools, 19 CLI subcommands, 375 tests, a SWE-bench Verified repeat-mistake benchmark with a +10.2 pt paired delta across 49 instances (+15.0 pts within-domain, +6.9 pts cross-domain), and a 105-pair contradiction-resolution benchmark. v0.9 ships the empirical wedge proof: a locked, pre-registered methodology tested whether the persistent-knowledge layer measurably reduces repeated coding-agent mistakes on a public task corpus. The result shows positive within-domain and cross-domain effects, with zero observed regressions on the cross-domain (out-of-domain) tasks. Full per-task tables, a mechanistic analysis of the two cross-domain flips (sphinx-9461 is the cleanest case), and honest limitations are in `benchmarks/repeat-mistake/RESULTS.md`. v0.8.1 expanded the contradiction-resolution benchmark to 105 pairs across 19 categories. 
v0.8.0 added domain-aware confidence decay with per-evidence-type TTL, per-item provenance fields (`source_tool` and `confirmer`), slash command write operations, and a `confirmer` parameter on `resolve_contradiction`. The Antigravity adapter is held for the fourth consecutive release pending a `TransformCompactionHook` in the SDK; next re-verify 2026-07-24. v0.7.6 added the `/world-model` slash command and the `status-watch` TUI widget. v0.7.5 added the Codex CLI adapter. v0.7.0 introduced PostCompact auto-injection, the `defer` enforcement tier, confidence-weighted contradiction resolution, and a compaction audit log. Contributions welcome.\n\nmcp-name: io.github.SaravananJaichandar/world-model-mcp\n\nIf world-model-mcp helped you, star the repo or open an issue with what worked or didn't. I read every one, and the feedback shapes what ships next.\n\nWorld Model MCP builds a **temporal knowledge graph** of your codebase that learns from every coding session to:\n\n- **Prevent Hallucinations** -- validates API/function references against known entities before use.\n- **Stop Repeated Mistakes** -- learns constraints from corrections and applies them in future sessions.\n- **Reduce Regressions** -- tracks bug fixes and warns when changes touch critical regions.\n- **Survive Compaction** -- re-injects top constraints and recent facts after the agent's context window resets.\n- **Resolve Contradictions** -- picks a winner between conflicting facts using confidence, recency, or source count.\n\nThink of it as a long-term memory layer that runs alongside Claude Code, Cursor, or any MCP-aware coding agent.\n\n- **Repeat-mistake benchmark on SWE-bench Verified** -- the central wedge proof. 50 SWE-bench Verified tasks across django, sympy, matplotlib, scikit-learn, and sphinx, run as a paired baseline-vs-treatment comparison. 
Methodology was locked in `benchmarks/repeat-mistake/DESIGN.md` on 2026-06-17 (before the data existed), so the result cannot be accused of goalpost-moving. One instance was later dropped (see the caveats below), leaving 49 paired instances.\n- **Headline results** -- Subset 1 (within-domain: django + sympy): baseline 15/20 = 75.0 percent, treatment 18/20 = 90.0 percent, delta +15.0 pts, with 4 FAIL-to-PASS flips and 1 regression. Subset 2 (cross-domain: matplotlib + scikit-learn + sphinx): baseline 18/29 = 62.1 percent, treatment 20/29 = 69.0 percent, delta +6.9 pts, with 2 flips and zero regressions. Combined paired result across 49 instances: 33/49 to 38/49, delta +10.2 pts.\n- **Cross-domain transfer isolated cleanly** -- the Subset 2 treatment arm loaded ONLY the 4 Subset 1 constraints (django and sympy directives), holding out the 11 Subset 2 constraints, to test whether learning from one repo family generalizes to a different one. Both cross-domain flips have plausible mechanistic explanations grounded in the loaded constraints. Sphinx-9461 is the strongest case: a sympy classmethod constraint transferred to a sphinx classmethod-wrapper unwrapping bug.\n- **Honest caveats embedded in RESULTS.md** -- seven explicit limitations, including the single-trial design, constraint-failure overlap on Subset 1, the small cross-domain transfer rate, one dropped instance due to an upstream SWE-bench pip flag issue, and judge-model self-reference risk. They are stated verbatim rather than hidden in an appendix.\n- **Full reproducibility artifacts** -- every progress JSONL, predictions JSON, results JSONL, classification JSONL, constraints JSON, and harness report JSON is committed in `benchmarks/repeat-mistake/`. The locked judge prompts live in `failure_classifier.py` and `learning_hook.py`. Total agent cost across both arms was approximately 90 USD on a Claude Code subscription.\n\n- **Contradiction-resolution benchmark expansion** -- the v0.7.4 24-pair benchmark grew to 105 hand-curated pairs across 19 categories. 
Six new categories exercise the v0.8.0 schema specifically: `source_tool_corroboration`, `confirmer_overrides_pending`, `decay_advantage_session_vs_source`, `decay_advantage_stale_session`, `evidence_type_user_correction`, and `settled_beats_higher_confidence`. The deterministic runner is at `benchmarks/contradictions-200/run.py`; the full per-strategy and per-category breakdown is at `benchmarks/contradictions-200/RESULTS.md`.\n- **Honest framing on the numbers** -- the new dataset is harder than v0.7.4's 24-pair set because the new categories deliberately test schema awareness (confirmer, evidence_type, decay) rather than raw confidence ranking. Headline numbers: `keep_most_sources` 99.0%, `keep_higher_confidence` 81.0%, `auto` 77.1%, `keep_higher_confidence_decayed` 90.5% (on the 21 pairs where evidence_type is present), and 78.2% overall across all strategies. The original v0.7.4 number (93.5% on 24 pairs) is preserved unchanged at `benchmarks/contradictions/` and is not invalidated; it tested a different (smaller, easier) corpus.\n- **The wedge benchmark is v0.9** -- \"does the learning loop measurably reduce repeated coding-agent mistakes on a public task corpus?\" The contradiction-resolution work in this release is internal schema-correctness validation. The empirical artifact that maps to the published essay's framing (the learning loop is the durable layer) lands in v0.9 with a SWE-bench-style repeat-mistake benchmark.\n\n- **Domain-aware confidence decay** -- a new `world_model_server/decay.py` module applies exponential half-life decay per `evidence_type`. Half-lives: source_code 365d, test 180d, session 14d, user_correction 730d, bug_fix 365d. Decay applies on read (no background task), so the next `query_fact` call returns the time-corrected confidence. Settled facts (`canonical` status, or any fact with `confirmer != NULL`) never auto-transition. 
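A minimal Python sketch of the read-time decay described in this bullet, assuming a simplified fact record: confidence is scaled by 0.5 ** (age / half-life) using the half-lives listed above, and the auto-supersede thresholds (0.2 for synthesized, 0.1 for corroborated facts) apply only to unsettled facts. `Fact`, `read_fact`, the `corroborated` status name, and the no-decay fallback for unknown evidence types are hypothetical; this is not the code in `world_model_server/decay.py`. The README does not say whether settled facts still lose confidence, so the sketch decays their confidence but never changes their status.

```python
from dataclasses import dataclass, replace
from datetime import datetime, timedelta, timezone
from typing import Optional

# Half-lives in days per evidence_type, as listed in this README.
HALF_LIFE_DAYS = {
    'source_code': 365.0,
    'test': 180.0,
    'session': 14.0,
    'user_correction': 730.0,
    'bug_fix': 365.0,
}

# Auto-supersede thresholds quoted in this README; the status names are assumptions.
SUPERSEDE_BELOW = {'synthesized': 0.2, 'corroborated': 0.1}


@dataclass(frozen=True)
class Fact:
    '''Hypothetical stand-in for a stored fact, not the project's Fact model.'''
    confidence: float
    evidence_type: str
    status: str
    recorded_at: datetime
    confirmer: Optional[str] = None


def decayed_confidence(fact: Fact, now: datetime) -> float:
    '''Exponential half-life decay: c0 * 0.5 ** (age_days / half_life_days).'''
    half_life = HALF_LIFE_DAYS.get(fact.evidence_type)
    if half_life is None:
        return fact.confidence  # assumption: unknown evidence types do not decay
    age_days = max((now - fact.recorded_at).total_seconds() / 86400.0, 0.0)
    return fact.confidence * 0.5 ** (age_days / half_life)


def read_fact(fact: Fact, now: datetime) -> Fact:
    '''Decay on read (no background task); settled facts never change status.'''
    confidence = decayed_confidence(fact, now)
    settled = fact.status == 'canonical' or fact.confirmer is not None
    threshold = SUPERSEDE_BELOW.get(fact.status)
    if not settled and threshold is not None and confidence < threshold:
        return replace(fact, confidence=confidence, status='superseded')
    return replace(fact, confidence=confidence)


if __name__ == '__main__':
    now = datetime(2026, 6, 24, tzinfo=timezone.utc)
    stale = Fact(0.8, 'session', 'synthesized', now - timedelta(days=42))
    # Three 14-day half-lives have elapsed, so confidence drops to about 0.1.
    print(read_fact(stale, now))
```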
Synthesized facts that decay below 0.2 confidence and corroborated facts that decay below 0.1 confidence auto-supersede on read, surfacing rot to the next compaction injection.\n- **Per-item provenance fields on facts** -- three additive columns (`source_tool TEXT`, `confirmer TEXT`, `last_decay_at TIMESTAMP`), all NULL-defaulted, with no backfill. `source_tool` records which tool wrote the fact (e.g. `claude_code`, `codex`, `cursor`, `pi`, `user`). `confirmer` records who confirmed it, distinct from the asserter; NULL means pending, non-NULL means settled. Both are exposed on the `Fact` model and propagated through `create_fact`. This honors the public commitment to Patdolitse ([anthropics/claude-code#47023](https://github.com/anthropics/claude-code/issues/47023)) and ferhimedamine ([openai/codex#19195](https://github.com/openai/codex/issues/19195)).\n- **Slash command write operations** -- two new subcommands. `/world-model resolve <id>` marks a contradiction as resolved (manually; for confidence-weighted picking, use the `resolve_contradiction` MCP tool). `/world-model forget <id>` sets `invalid_at` on a fact (preserved in the audit log; current-only reads skip it from then on). Both are idempotent and report cleanly on unknown ids. The help text now lists both alongside the read-only subcommands shipped in v0.7.6.\n- **`resolve_contradiction` accepts `confirmer`** -- when a `confirmer` argument is provided to the MCP tool or its underlying `resolve` function, the winning fact gets its `confirmer` column stamped with that value. This is the spec primitive that distinguishes \"the asserter says X\" from \"X is confirmed by Y\", per the working group sketch.\n- **Antigravity adapter held for the third consecutive release.** The 2026-06-13 re-verification found `OnCompactionHook` declared as an `InspectHook` in the SDK, with no `TransformCompactionHook` and no `additional_context` return field. 
The load-bearing memory-injection contract still does not exist in the SDK. Next re-verify 2026-06-27.\n\n- **In-agent `/world-model` slash command** -- typed by the user inside the agent harness, it surfaces the world model state without leaving the chat. Read-only in v0.7.6 (`status`, `contradictions`, `recent`, `help`); write operations (`resolve`, `forget`) land in v0.8. Works across Claude Code, Cursor, Codex, and pi by intercepting `UserPromptSubmit` in the existing `inject_helper`. It returns `additionalContext` in the strict camelCase shape Codex enforces (`deny_unknown_fields`), so the same wire-up serves all four harnesses without a per-harness branch.\n- **`world-model status-watch` TUI widget** -- a terminal pane that runs alongside the agent and refreshes every 5 seconds. It shows constraints (total, severity=error, severity=warning), unresolved contradictions, facts (canonical / synthesized / superseded), and the last compaction time. Built on the `rich` library already in the dependency tree; falls back to a plain-text one-shot dump when `rich` is not installed.\n- **Antigravity CLI adapter intentionally NOT shipped in this release** -- the 2026-06-13 re-verification against `google-antigravity/antigravity-sdk-python` HEAD surfaced an architectural gap: `OnCompactionHook` is declared as an `InspectHook` (read-only, non-blocking) with no `additional_context` return field and no `TransformCompactionHook` subclass. The load-bearing memory-injection contract does not exist in the SDK today. The next re-verification is targeted for 2026-06-27; v0.7.6 ships without Antigravity rather than against a contract that cannot do the work.\n\n- **Codex CLI adapter** -- a new `install-codex` CLI subcommand appends a `[mcp_servers.world_model]` block plus PreToolUse, PostToolUse, PostCompact, and SessionStart hooks to `~/.codex/config.toml`. 
The bundled snippet was verified against `openai/codex@main` at v0.138.0-alpha (the server name uses an underscore to dodge the tool-name hyphen-strip in `codex-rs/codex-mcp/src/mcp/mod.rs`; hook output sticks to camelCase for `deny_unknown_fields` compliance). Schema regression tests in `tests/test_v075_features.py` lock the contract down. See [adapters/codex/README.md](/SaravananJaichandar/world-model-mcp/blob/main/adapters/codex/README.md).\n- **Dual-shape payload normalization in `hook_helper` and `inject_helper`** -- both helpers now accept either Claude Code's payload shape (`event`, `project_dir`) or Codex's (`hook_event_name`, `cwd`), so the same Python code drives all four adapters (Claude Code, Cursor, pi, Codex).\n- **Antigravity CLI adapter intentionally NOT shipped this release** -- the Antigravity API surface is still settling (six 1.0.x releases in three weeks, the `url` field for HTTP MCP servers landed June 3, and hook JSON event-name casing remains undocumented). That adapter is targeted for June 25, after the API stabilizes. Detailed reasoning is in the v0.7.5 RELEASE_NOTES entry.\n\n- **AGENTS.md / `.agents/skills/` constraint reader** -- world-model-mcp now reads declarative project conventions from `AGENTS.md`, `CLAUDE.md`, `GEMINI.md`, and `.agents/skills/*.md` files and mixes them into PreToolUse enforcement alongside the SQLite-backed constraints. It supports structured `constraint` fence blocks, YAML frontmatter, and heuristic imperative-sentence extraction for prose-style AGENTS.md files. New MCP tool: `get_agents_md_constraints`. ([anthropics/claude-code#6235](https://github.com/anthropics/claude-code/issues/6235) has 4,000+ thumbs-up for AGENTS.md as the cross-agent format.)\n- **Self-hosted Claude Managed Agents deployment guide** -- Anthropic's [official position](https://claude.com/blog/claude-managed-agents-updates): *\"Memory is not yet supported in self-hosted sessions.\"* world-model-mcp fills that gap. 
New guide at `docs/deployment/managed-agents-self-hosted.md`, with a [Modal quickstart](/SaravananJaichandar/world-model-mcp/blob/main/examples/managed-agents-self-hosted) you can deploy in under five minutes.\n- **Reproducible contradiction-resolution benchmark** -- 24-pair dataset at `benchmarks/contradictions/dataset.jsonl`, runner at `benchmarks/contradictions/run.py`, results at `benchmarks/contradictions/RESULTS.md`. Headline: 93.5% overall accuracy and 100% on `keep_higher_confidence` and `keep_most_sources`, with honestly documented weaknesses on tie-handling and small confidence gaps. Re-run with `python benchmarks/contradictions/run.py`. A CI workflow guards against regressions.\n\n- **`world-model demo`** -- one command to see every primitive working. It initializes the knowledge graph, seeds reproducible demo data via `scripts/demo_seed.py`, then exercises each primitive (PreToolUse enforcement, contradiction detection, PostCompact injection, audit log) with real outputs. New users can see the value without writing any code.\n- **Opt-in telemetry** -- off by default, prompted once during `world-model setup`, inspectable with `world-model telemetry --status`, and disabled with `world-model telemetry --disable`. No file paths, no code, no identifiers tied to a person. See [Privacy and Security](#privacy-and-security) for the exact payload.\n- **pi adapter** -- a new `adapters/pi/` package. world-model-mcp now plugs into [earendil-works/pi](https://github.com/earendil-works/pi) via pi's extension API (`tool_call` -> PreToolUse, `context` -> auto-injection, `session_compact` -> audit log). Install with `world-model install-pi`.\n\n- **PostCompact / UserPromptSubmit auto-injection** -- when the agent's context is compacted, the hook automatically splices the top constraints and recent canonical facts back into the next turn. 
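One way such an injection bundle could be assembled, sketched in Python under stated assumptions: the `Constraint` and `FactRow` shapes, the severity-then-violation-count ranking, the top-5 limits, and the output text are all hypothetical. Only the overall behavior (top constraints plus recent canonical facts, failing open on errors) comes from this README.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, List, Sequence, Tuple

# Hypothetical ranking: errors before warnings before info.
SEVERITY_RANK = {'error': 0, 'warning': 1, 'info': 2}


@dataclass
class Constraint:
    '''Hypothetical constraint row.'''
    rule: str
    severity: str
    violations: int = 0


@dataclass
class FactRow:
    '''Hypothetical fact row.'''
    text: str
    status: str
    valid_at: datetime


def build_injection(constraints: Sequence[Constraint], facts: Sequence[FactRow],
                    max_constraints: int = 5, max_facts: int = 5) -> List[str]:
    '''Return the lines to splice into the next turn after compaction.'''
    top = sorted(
        constraints,
        key=lambda c: (SEVERITY_RANK.get(c.severity, len(SEVERITY_RANK)), -c.violations),
    )[:max_constraints]
    recent = sorted(
        (f for f in facts if f.status == 'canonical'),
        key=lambda f: f.valid_at,
        reverse=True,
    )[:max_facts]
    lines = ['Project constraints:'] + [f'- [{c.severity}] {c.rule}' for c in top]
    lines += ['Recent facts:'] + [f'- {f.text}' for f in recent]
    return lines


def safe_injection(load: Callable[[], Tuple[Sequence[Constraint], Sequence[FactRow]]]) -> List[str]:
    '''Fail open: any error yields an empty injection instead of blocking the turn.'''
    try:
        constraints, facts = load()
        return build_injection(constraints, facts)
    except Exception:
        return []
```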
Configurable; fails open.\n- **`defer` enforcement tier** -- PreToolUse now classifies recurring warning-level violations as `defer`, which pauses headless agents (with a graceful fallback to `ask` on older clients) instead of either hard-denying or silently passing through.\n- **Confidence-weighted contradiction resolution** -- the new `resolve_contradiction` tool picks a winner using `keep_higher_confidence`, `keep_most_recent`, `keep_most_sources`, or `auto`. The loser is marked superseded.\n- **Compaction audit log** -- every PostCompact event writes a row with pre/post token counts and what was re-injected. Query it with the `audit-compactions` CLI or export to JSONL.\n- **Cursor adapter** -- harness-neutral hooks under `adapters/cursor/`. Same Python helpers, different manifest format.\n- **Streamable HTTP transport (v0.7.2)** -- set `WORLD_MODEL_TRANSPORT=http` so the same 25 MCP tools work behind an MCP tunnel for Claude Managed Agents with self-hosted sandboxes. See [docs/deployment/mcp-tunnel.md](/SaravananJaichandar/world-model-mcp/blob/main/docs/deployment/mcp-tunnel.md).\n\nDownload the latest `.mcpb` from [Releases](https://github.com/SaravananJaichandar/world-model-mcp/releases/latest) and drag it into Claude Desktop. It auto-installs hooks, MCP server config, and dependencies.\n\n```\n# 1. Install the package\npip install world-model-mcp\n\n# 2. Set up in your project (auto-seeds the knowledge graph from existing code)\ncd /path/to/your/project\npython -m world_model_server.cli setup\n\n# 3. Restart Claude Code\n# Done! 
The world model is pre-populated and active\n```\n\nYou can also seed or re-seed manually at any time:\n\n```\n# Seed from existing codebase\nworld-model seed\n\n# Re-seed with force (re-processes already seeded files)\nworld-model seed --force\n```\n\nFor Claude Managed Agents with self-hosted sandboxes, or any deployment where the MCP server lives behind a firewall and the agent reaches it from Anthropic-side infrastructure, run world-model-mcp in HTTP mode:\n\n```\npip install 'world-model-mcp[http]'\n\nexport WORLD_MODEL_TRANSPORT=http\nexport WORLD_MODEL_HTTP_PORT=8765\npython -m world_model_server.server\n```\n\nOr use the bundled image:\n\n```\ndocker compose up -d                    # Dockerfile.http + persistent volume\ncurl http://127.0.0.1:8765/healthz      # {\"status\":\"ok\",\"version\":\"0.7.2\"}\n```\n\nFull walkthrough, including Anthropic MCP tunnel setup:\n[docs/deployment/mcp-tunnel.md](/SaravananJaichandar/world-model-mcp/blob/main/docs/deployment/mcp-tunnel.md).\n\nStdio remains the default transport for Claude Code, Cursor, and `.mcpb` installs. Nothing changes for those flows.\n\nTo see every primitive working with real outputs from a real SQLite database before committing to a full install:\n\n```\npip install world-model-mcp\nmkdir -p /tmp/wm-test && cd /tmp/wm-test\nworld-model demo\n```\n\nThe demo initializes a knowledge graph, seeds reproducible data, and exercises PreToolUse enforcement, contradiction detection, the PostCompact injection bundle, and the compaction audit log, showing the actual JSON outputs. 
Re-runs are idempotent.\n\nFor users of [earendil-works/pi](https://github.com/earendil-works/pi):\n\n```\npip install world-model-mcp           # the Python helpers\nworld-model install-pi                # writes adapters/world-model-pi/\npi install local:./adapters/world-model-pi\n```\n\nThe pi adapter wires the same `hook_helper` and `inject_helper` you'd use from Claude Code into pi's `tool_call`, `context`, and `session_compact` events. See [adapters/pi/README.md](/SaravananJaichandar/world-model-mcp/blob/main/adapters/pi/README.md).\n\nFor users of OpenAI's [Codex CLI](https://github.com/openai/codex):\n\n```\npip install world-model-mcp                # the Python helpers\npython -m world_model_server.cli install-codex\n# (appends [mcp_servers.world_model] + hook blocks to ~/.codex/config.toml)\n# Restart codex; verify with: codex mcp list\n```\n\n`--dry-run` prints what would be appended without writing; `--force` re-appends even if the adapter marker is already present. The bundled snippet uses `world_model` (underscore) as the MCP server name to dodge Codex's silent hyphen-strip in its tool-name sanitizer. Hook output is camelCase for `deny_unknown_fields` compliance against Codex's strict Rust schema; the contract is locked down by tests in `tests/test_v075_features.py`. 
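A minimal sketch of the dual-shape payload handling and the camelCase output rule described for `hook_helper` and `inject_helper`. The function names are hypothetical, and the `hookSpecificOutput` / `hookEventName` wrapper is an assumption modeled on Claude Code's hook output format rather than taken from this project; only `additionalContext` and the two input shapes come from this README.

```python
from typing import Any, Dict, Tuple


def normalize_payload(payload: Dict[str, Any]) -> Tuple[str, str]:
    '''Return (event_name, project_dir) from either hook payload shape.

    Claude Code shape per this README: keys event and project_dir.
    Codex shape per this README: keys hook_event_name and cwd.
    '''
    event = payload.get('event') or payload.get('hook_event_name')
    project_dir = payload.get('project_dir') or payload.get('cwd')
    if not event or not project_dir:
        raise ValueError('unrecognized hook payload shape')
    return event, project_dir


def build_hook_output(event: str, context: str) -> Dict[str, Any]:
    '''Emit camelCase keys only, so a deny_unknown_fields schema has nothing to reject.

    The hookSpecificOutput wrapper is an assumption, not this project's code.
    '''
    return {
        'hookSpecificOutput': {
            'hookEventName': event,
            'additionalContext': context,
        }
    }


if __name__ == '__main__':
    claude = {'event': 'UserPromptSubmit', 'project_dir': '/repo'}
    codex = {'hook_event_name': 'UserPromptSubmit', 'cwd': '/repo'}
    # Both shapes normalize to the same tuple, so one code path serves both harnesses.
    assert normalize_payload(claude) == normalize_payload(codex)
    event, _ = normalize_payload(codex)
    print(build_hook_output(event, 'Prefer logger.debug over console.log'))
```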
See [adapters/codex/README.md](/SaravananJaichandar/world-model-mcp/blob/main/adapters/codex/README.md).\n\n```\nyour-project/\n├── .mcp.json                    # MCP server configuration\n└── .claude/\n    ├── settings.json            # Hook configuration\n    ├── hooks/                   # Compiled TypeScript hooks\n    └── world-model/             # SQLite databases (~155 KB)\n```\n\nBefore:\n\n```js\n// Claude invents an API that doesn't exist\nconst user = await User.findByEmail(email); // This method doesn't exist\n```\n\nAfter:\n\n```js\n// Claude checks the world model first\nconst user = await User.findOne({ email }); // Verified to exist\n```\n\n**Goal**: Reduce non-existent API references by validating against the knowledge graph.\n\n**Session 1**: User corrects Claude\n\n```js\n// Claude writes:\nconsole.log('debug info');\n\n// User corrects to:\nlogger.debug('debug info');\n\n// World model learns: \"Use logger.debug() not console.log()\"\n```\n\n**Session 2**: Claude uses the learned pattern\n\n```js\n// Claude automatically writes:\nlogger.debug('debug info'); // No correction needed\n```\n\n**Goal**: Learned patterns persist across sessions and prevent repeat violations.\n\n```js\n// Week 1: Bug fixed (null check added)\nif (user && user.email) { ... 
}\n\n// Week 2: Refactoring\n// World model warns: \"This line preserves a critical bug fix\"\n// Claude preserves the null check\n\n// Result: Bug not re-introduced\n```\n\n**Goal**: Detect potential regressions before code execution.\n\n```\n┌──────────────────────────────────────────────────────────┐\n│ Claude Code + Hooks                                      │\n│ Captures: file edits, tool calls, user corrections       │\n└──────────────────────────────────────────────────────────┘\n                         |\n                         v\n┌──────────────────────────────────────────────────────────┐\n│ MCP Server (Python)                                      │\n│ - 22 MCP tools for querying/recording/predicting         │\n│ - LLM-powered entity extraction (Claude Haiku)           │\n│ - External linter integration (ESLint, Pylint, Ruff)     │\n└──────────────────────────────────────────────────────────┘\n                         |\n                         v\n┌──────────────────────────────────────────────────────────┐\n│ Knowledge Graph (SQLite + FTS5)                          │\n│ - entities.db: APIs, functions, classes                  │\n│ - facts.db: Temporal assertions with evidence            │\n│ - relationships.db: Entity dependency graph              │\n│ - constraints.db: Learned rules from corrections         │\n│ - sessions.db: Session history and outcomes              │\n│ - events.db: Activity log with reasoning chains          │\n└──────────────────────────────────────────────────────────┘\n```\n\n- **Temporal Facts**: Every fact has `validAt` and `invalidAt` timestamps\n  - \"Function X existed from 2024-01-15 to 2024-03-20\"\n  - Query: \"What was true on March 1st?\"\n- **Evidence Chains**: Every assertion traces back to its source\n  - Fact -> Session -> Event -> Source Code Location\n- **Constraint Learning**: Pattern recognition from user corrections\n  - Automatic rule type inference (linting, architecture, testing)\n  - Severity detection (error, 
warning, info)\n  - Example generation for future reference\n- **Dual Validation**: Combines two validation sources\n  - World model constraints (learned from user)\n  - External linters (ESLint, Pylint, Ruff)\n\nA selection of the MCP tools available to Claude Code:\n\n**`query_fact`** -- check whether APIs/functions exist before using them\n\n```python\nresult = query_fact(\n    query=\"Does User.findByEmail exist?\",\n    entity_type=\"function\"\n)\n# Returns: {exists: bool, confidence: float, facts: [...]}\n```\n\n**`record_event`** -- capture development activity with reasoning chains\n\n```python\nrecord_event(\n    event_type=\"file_edit\",\n    file_path=\"src/api/auth.ts\",\n    reasoning=\"Added JWT authentication middleware\"\n)\n```\n\n**`validate_change`** -- pre-execution validation against constraints and linters\n\n```python\nresult = validate_change(\n    file_path=\"src/api/auth.ts\",\n    proposed_content=\"...\"\n)\n# Returns: {safe: bool, violations: [...], suggestions: [...]}\n```\n\n**`get_constraints`** -- retrieve project-specific rules for a file (or glob)\n\n```python\nconstraints = get_constraints(\n    file_path=\"src/**/*.ts\",\n    constraint_types=[\"linting\", \"architecture\"]\n)\n```\n\n**`record_correction`** -- learn from user edits (HIGH PRIORITY)\n\n```python\nrecord_correction(\n    claude_action={...},\n    user_correction={...},\n    reasoning=\"Use logger.debug instead of console.log\"\n)\n```\n\n**`get_related_bugs`** -- regression risk assessment\n\n```python\nresult = get_related_bugs(\n    file_path=\"src/api/auth.ts\",\n    change_description=\"refactoring authentication logic\"\n)\n# Returns: {bugs: [...], risk_score: float, critical_regions: [...]}\n```\n\n**`seed_project`** -- scan the codebase and populate the knowledge graph with entities and relationships\n\n```python\nresult = seed_project(\n    project_dir=\".\",\n    force=False\n)\n# Returns: {files_seeded: int, entities_created: int, relationships_created: int}\n```\n\n**`ingest_pr_reviews`** -- pull GitHub PR review comments and convert team feedback into constraints\n\n```python\nresult = ingest_pr_reviews(\n    repo=\"owner/repo\",  # Auto-detected from git remote if omitted\n    count=10\n)\n# Returns: {prs_scanned: 
int, constraints_created: int, constraints_updated: int}\n```\n\n- [QUICKSTART.md](/SaravananJaichandar/world-model-mcp/blob/main/QUICKSTART.md) -- 5-minute setup guide\n- [CONTRIBUTING.md](/SaravananJaichandar/world-model-mcp/blob/main/CONTRIBUTING.md) -- contribution guidelines\n- [RELEASE_NOTES.md](/SaravananJaichandar/world-model-mcp/blob/main/RELEASE_NOTES.md) -- version history and features\n\n```\n# Run tests\npytest\n\n# With coverage\npytest --cov=world_model_server --cov-report=html\n```\n\n186 tests covering knowledge graph CRUD, FTS5 search, constraint management, bug tracking, auto-seeding, PR review ingestion, decision traces, outcome linkage, trajectory learning, prediction layer, memory health, contradiction detection, transcript pointers, project identity, and PreToolUse enforcement. See [tests/](/SaravananJaichandar/world-model-mcp/blob/main/tests) for details.\n\n```\n# Database location (default: ./.claude/world-model/)\nexport WORLD_MODEL_DB_PATH=\"/custom/path\"\n\n# Anthropic API key (optional - enables LLM extraction)\n# IMPORTANT: Never commit this! 
Use .env file (see .env.example)\nexport ANTHROPIC_API_KEY=\"your-api-key-here\"\n\n# Model selection\nexport WORLD_MODEL_EXTRACTION_MODEL=\"claude-3-haiku-20240307\"  # Fast\nexport WORLD_MODEL_REASONING_MODEL=\"claude-3-5-sonnet-20241022\"  # Accurate\n\n# Debug mode\nexport WORLD_MODEL_DEBUG=1\n```\n\n**Note**: Create a `.env` file in your project root (see `.env.example`); it is automatically ignored by git.\n\nEdit `.claude/settings.json` to customize which tools trigger world model hooks:\n\n```\n{\n  \"hooks\": {\n    \"PostToolUse\": [{\n      \"matcher\": \"Edit|Write|Bash\",\n      \"hooks\": [...]\n    }]\n  }\n}\n```\n\n**Currently Supported**:\n\n- TypeScript / JavaScript\n- Python\n\n**Coming Soon**:\n\n- Go, Rust, Java, C++\n\n**Extensible Architecture**: Easy to add new language parsers (see [CONTRIBUTING.md](/SaravananJaichandar/world-model-mcp/blob/main/CONTRIBUTING.md))\n\n- **Local-First**: All knowledge graph data stays on your machine.\n- **Optional LLM**: Works without an API key (falls back to regex patterns).\n- **Encrypted Storage**: The SQLite databases are plain local files; encrypt your disk if you need encryption at rest.\n\nv0.7.3 added anonymous usage telemetry. 
It is:\n\n- **Off by default.** You have to explicitly opt in.\n- **Asked once** during `world-model setup`, with a clear `y/N` prompt.\n- **Inspectable**: `world-model telemetry --status` shows the exact JSON payload that would be sent.\n- **Disable any time** with `world-model telemetry --disable`, or globally with `WORLD_MODEL_TELEMETRY_DISABLE=1`.\n- **Skipped in non-TTY environments** (CI, scripts), so it never blocks an automated setup.\n\n**What we send (only if you opt in):**\n\n| Field | Example | Why |\n|---|---|---|\n| `event` | `setup_completed`, `demo_run`, `hook_fired` | Which lifecycle step ran |\n| `version` | `0.7.3` | Which release you're on |\n| `install_id` | random UUID at `~/.world-model/install_id` | Distinguish installs without identifying users |\n| `ts` | unix timestamp | When the event fired |\n\n**What we never send:** file paths, file contents, rule names, hostnames, IP addresses, API keys, decision-trace text, fact text, or anything else that could identify a person or leak business logic. The full payload schema lives in `world_model_server/telemetry.py`.\n\n**Where it goes:** opt-in events are posted to a dedicated private GitHub repo (`SaravananJaichandar/world-model-telemetry`) as plain issues. There is no third-party analytics service, no cookie, no fingerprint. 
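For illustration, a Python sketch of a client that builds exactly the four-field payload from the table above and honors the opt-in and `WORLD_MODEL_TELEMETRY_DISABLE=1` rules. The function names and the choice to reject unknown event names are assumptions; the authoritative schema is in `world_model_server/telemetry.py`, and this sketch does not post anything.

```python
import os
import time
import uuid
from pathlib import Path
from typing import Dict, Optional, Union

# Event names and fields from the telemetry table; everything else is an assumption.
ALLOWED_EVENTS = {'setup_completed', 'demo_run', 'hook_fired'}


def load_install_id(path: Path) -> str:
    '''Read or create the random per-install UUID (README location: ~/.world-model/install_id).'''
    if path.exists():
        return path.read_text().strip()
    path.parent.mkdir(parents=True, exist_ok=True)
    install_id = str(uuid.uuid4())
    path.write_text(install_id)
    return install_id


def build_payload(event: str, version: str, install_id: str,
                  opted_in: bool) -> Optional[Dict[str, Union[str, int]]]:
    '''Return the four-field payload, or None when telemetry must not be sent.'''
    if not opted_in or os.environ.get('WORLD_MODEL_TELEMETRY_DISABLE') == '1':
        return None
    if event not in ALLOWED_EVENTS:
        raise ValueError(f'unexpected telemetry event: {event}')
    return {'event': event, 'version': version, 'install_id': install_id,
            'ts': int(time.time())}
```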
The PAT embedded in the client is scoped to that one repo with `Issues: write` only.\n\n- Entity extraction from code changes\n- Constraint inference from corrections\n- Never sends: credentials, secrets, PII\n\n- Never commit `.env` files\n- Use `.env.example` as a template\n- Store API keys in environment variables or `.env` files only\n- The `.gitignore` automatically excludes sensitive files\n\n- Auto-seeding: knowledge graph populates from existing codebase on setup\n- PR Review Intelligence: ingest GitHub review comments as constraints\n- Relationship tracking: import and dependency graph between entities\n- Multi-language support: Python, TypeScript/JavaScript, Solidity, Go, Rust\n- CLI query command for knowledge graph lookups\n- 40 tests, 8 MCP tools\n\n- Module-level matching: query by module name finds the file and its contents\n- Incremental re-seeding: only re-process files changed since last seed\n- Fuzzy entity matching: approximate name search for typos and abbreviations\n- Query caching: in-memory cache with TTL for repeated lookups\n- Java support: complete multi-language coverage\n- MCP server pipeline validation on real projects\n\n- Outcome linkage: test failures linked to code changes with facts\n- Trajectory learning: co-edit patterns tracked across sessions\n- Decision trace capture: structured log of agent proposals and human corrections\n- Cross-project entity search with project registry\n- 5 new MCP tools (13 total), 104 tests\n\n- Regression prediction, \"what if\" simulation, test failure prediction\n- Multi-project knowledge transfer, memory health, fact TTL/decay\n- `get_context_for_action` pre-edit bundle, constraint violation tracking, `find_contradictions`\n- 20 MCP tools, 151 tests\n\n- PreToolUse constraint enforcement hook: deny hard violations at the edit boundary\n- Indexed transcript pointers: hydrate any fact back to source conversation\n- Project identity decoupling: stable UUID across directory renames\n- 
Content-hash deduplication for facts and constraints\n- Auto-generate CLAUDE.md from the knowledge graph\n- BetaAbstractMemoryTool subclass for Anthropic SDK integration\n- Desktop Extension (.mcpb) packaging for Claude Desktop\n- 22 MCP tools, 13 CLI subcommands, 186 tests\n\n- PostCompact and UserPromptSubmit auto-injection: re-emit top constraints and recent facts after context loss\n- `defer` enforcement tier in PreToolUse: pause headless agents on recurring warning-level violations, with graceful fallback to `ask`\n- Confidence-weighted contradiction resolution: pick a winner using confidence, recency, or source count, with an `auto` strategy\n- Compaction audit log: query and export what was remembered across each compaction boundary\n- Cursor adapter package\n- 25 MCP tools, 14 CLI subcommands, 220 tests\n\n- HTTP transport mode for remote / MCP-tunnel deployment\n- `/healthz` endpoint, `Dockerfile.http`, `docker-compose.yml`\n- `docs/deployment/mcp-tunnel.md` walkthrough for Claude Managed Agents\n- 236 tests\n\n- `world-model demo` guided tour for first-time users\n- Opt-in anonymous telemetry, off by default, inspectable\n- pi-package adapter (`adapters/pi/`, `install-pi` CLI)\n- 17 CLI subcommands, 256 tests\n\n- AGENTS.md / `.agents/skills/` constraint reader (new MCP tool: `get_agents_md_constraints`)\n- Self-hosted Claude Managed Agents deployment guide + Modal quickstart\n- Reproducible contradiction-resolution benchmark (24-pair dataset, CI workflow, RESULTS.md)\n- 26 MCP tools, 17 CLI subcommands, 283 tests\n\n- Codex CLI adapter (`install-codex`, shipped 2026-06-05)\n\n- In-agent `/world-model` slash command (read-only: status, contradictions, recent, help)\n- `world-model status-watch` TUI status widget\n\n- Decay + provenance schema: `source_tool`, `confirmer`, `last_decay_at` columns on facts. 
Per-evidence-type TTL with domain-aware half-lives (source_code 365d, test 180d, session 14d, user_correction 730d, bug_fix 365d).\n- Slash command write operations (`/world-model resolve <id>`, `/world-model forget <id>`).\n- `resolve_contradiction` accepts `confirmer` to stamp the winning fact as settled.\n\n- Expanded contradiction-resolution benchmark: 24 → 105 pairs across 19 categories, including 6 new categories that test the v0.8.0 schema (decay, provenance, confirmer).\n- Honest per-strategy + per-category RESULTS.md with the v0.7.4 number preserved as the baseline.\n\n- **Repeat-mistake benchmark on AI coding tasks.** The empirical test of the central wedge: does the learning loop measurably reduce repeated agent mistakes? It runs against a SWE-bench-style task corpus with Claude Code headless and measures the change in repeat-mistake rate with vs. without world-model-mcp learning the constraint from the first attempt. This is the artifact the visibility plan has been reaching for; it maps directly to the [June 2026 essay](https://medium.com/@saravanan_2424/your-ai-model-is-temporary-your-learning-loop-should-not-be-874380e8ccf7) framing.\n- `auto` strategy rewrite to fold in `confirmer` and decay awareness (target: lift the v0.8.1 benchmark's `auto` score from 77.1% past 90%).\n- Antigravity CLI adapter (held since 2026-06-13; the SDK lacks a `TransformCompactionHook` for the load-bearing memory-injection contract; re-verify 2026-06-27).\n- MCP spec 2026-07-28 readiness (stateless transport, `_meta` headers, `InputRequiredResult`).\n- Cline adapter (lower urgency after they shipped global AGENTS rules in v3.86).\n\nContributions are welcome. 
See [CONTRIBUTING.md](/SaravananJaichandar/world-model-mcp/blob/main/CONTRIBUTING.md) for:\n\n- Development setup\n- Coding standards\n- Adding language support\n- Writing tests\n- Submitting PRs\n\n**Areas where help is needed**:\n\n- Language parsers (Go, Rust, Java, C++)\n- Performance optimization\n- Documentation improvements\n- Real-world testing feedback\n\n**Project Size**:\n\n- ~4,800 lines of code\n- 13 Python modules\n- 3 TypeScript hook implementations\n\n**Storage Efficiency**:\n\n- Empty database: ~155 KB\n- Per entity: ~500 bytes\n- Per fact: ~800 bytes\n\n[MIT License](/SaravananJaichandar/world-model-mcp/blob/main/LICENSE) - Free for commercial and personal use\n\n- **Issues**: [GitHub Issues](https://github.com/SaravananJaichandar/world-model-mcp/issues)\n- **Discussions**: [GitHub Discussions](https://github.com/SaravananJaichandar/world-model-mcp/discussions)", "url": "https://wpnews.pro/news/show-hn-memory-layer-for-claude-code-10-2-pts-on-swe-bench-verified-benchmark", "canonical_source": "https://github.com/SaravananJaichandar/world-model-mcp", "published_at": "2026-06-24 06:53:54+00:00", "updated_at": "2026-06-24 07:13:48.449947+00:00", "lang": "en", "topics": ["ai-agents", "developer-tools", "machine-learning", "large-language-models", "ai-tools"], "entities": ["World Model MCP", "Claude Code", "Cursor", "SWE-bench Verified", "django", "sympy", "matplotlib", "scikit-learn"], "alternates": {"html": "https://wpnews.pro/news/show-hn-memory-layer-for-claude-code-10-2-pts-on-swe-bench-verified-benchmark", "markdown": "https://wpnews.pro/news/show-hn-memory-layer-for-claude-code-10-2-pts-on-swe-bench-verified-benchmark.md", "text": "https://wpnews.pro/news/show-hn-memory-layer-for-claude-code-10-2-pts-on-swe-bench-verified-benchmark.txt", "jsonld": "https://wpnews.pro/news/show-hn-memory-layer-for-claude-code-10-2-pts-on-swe-bench-verified-benchmark.jsonld"}}