Show HN: Memory layer for Claude Code (+10.2 pts on SWE-bench Verified benchmark)

Enforcement, provenance, and harness-neutral memory for AI coding agents. A temporal knowledge graph that validates code changes against learned constraints at the edit boundary, re-injects relevant context after compaction, tracks contradictions with confidence-weighted resolution, and runs across Claude Code, Cursor, and pi.

Status: v0.9.1 -- 26 MCP tools, 19 CLI subcommands, 375 tests, SWE-bench Verified repeat-mistake benchmark with +10.2 pts paired delta across 49 instances (+15.0 pts within-domain, +6.9 pts cross-domain), 105-pair contradiction-resolution benchmark. v0.9 ships the empirical wedge proof: a locked, pre-registered methodology tested whether the persistent-knowledge layer measurably reduces repeated coding-agent mistakes on a public task corpus. Result confirms positive within-domain and cross-domain effects with zero observed regressions on out-of-domain tasks. Full per-task tables, mechanistic analysis of the two cross-domain flips (sphinx-9461 is the cleanest case), and honest limitations in `benchmarks/repeat-mistake/RESULTS.md`. v0.8.1 expanded the contradiction-resolution benchmark to 105 pairs across 19 categories. v0.8.0 added domain-aware confidence decay with per-evidence-type TTL, per-item provenance fields `source_tool` and `confirmer`, slash command write operations, and a `confirmer` parameter on `resolve_contradiction`. Antigravity adapter held for the fourth consecutive release pending a `TransformCompactionHook` in the SDK; next re-verify 2026-07-24. v0.7.6 added the `/world-model` slash command and `status-watch` TUI widget. v0.7.5 added the Codex CLI adapter. v0.7.0 introduced PostCompact auto-injection, the `defer` enforcement tier, confidence-weighted contradiction resolution, and a compaction audit log. Contributions welcome.

mcp-name: io.github.SaravananJaichandar/world-model-mcp

If world-model-mcp helped you, star the repo or open an issue with what worked or didn't. I read every one and the feedback shapes what ships next.

World Model MCP creates a temporal knowledge graph of your codebase that learns from every coding session to:

- **Prevent Hallucinations** -- Validates API/function references against known entities before use
- **Stop Repeated Mistakes** -- Learns constraints from corrections, applies them in future sessions
- **Reduce Regressions** -- Tracks bug fixes and warns when changes touch critical regions
- **Survive Compaction** -- Re-injects top constraints and recent facts after the agent's context window resets
- **Resolve Contradictions** -- Picks a winner between conflicting facts using confidence, recency, or source count

Think of it as a long-term memory layer that runs alongside Claude Code, Cursor, or any MCP-aware coding agent.

Repeat-mistake benchmark on SWE-bench Verified -- the central wedge proof. 50 SWE-bench Verified tasks across django, sympy, matplotlib, scikit-learn, and sphinx, run as a paired baseline-vs-treatment comparison. Methodology was locked at `benchmarks/repeat-mistake/DESIGN.md` on 2026-06-17 (before the data existed) so the result cannot be accused of goalpost-moving.

- Headline results -- Subset 1 (within-domain: django + sympy) baseline 15/20 = 75.0 percent, treatment 18/20 = 90.0 percent, delta +15.0 pts with 4 FAIL to PASS flips and 1 regression. Subset 2 (cross-domain: matplotlib + scikit-learn + sphinx) baseline 18/29 = 62.1 percent, treatment 20/29 = 69.0 percent, delta +6.9 pts with 2 flips and zero regressions. Combined paired result across 49 instances: 33/49 to 38/49, delta +10.2 pts.
- Cross-domain transfer isolated cleanly -- the Subset 2 treatment arm loaded ONLY the 4 Subset 1 constraints (django and sympy directives), holding out the 11 Subset 2 constraints to test whether learning from one repo family generalizes to a different one. Two cross-domain flips with plausible mechanistic explanations grounded in the loaded constraints. Sphinx-9461 is the strongest case: a sympy classmethod constraint transferred to a sphinx classmethod-wrapper unwrapping bug.
- Honest caveats embedded in RESULTS.md -- seven explicit limitations including single-trial design, constraint-failure overlap on Subset 1, the small cross-domain transfer rate, one dropped instance due to an upstream SWE-bench pip flag issue, and judge-model self-reference risk. Stated verbatim rather than hidden in an appendix.
- Full reproducibility artifacts -- every progress JSONL, predictions JSON, results JSONL, classification JSONL, constraints JSON, and harness report JSON committed in `benchmarks/repeat-mistake/`. Locked judge prompts in `failure_classifier.py` and `learning_hook.py`. Total agent cost across both arms was approximately 90 USD on a Claude Code subscription.
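
The headline deltas can be re-derived from the pass counts quoted above. The following is a minimal sketch of that arithmetic only; the `Arm` class and its field names are hypothetical and are not taken from the benchmark harness.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Arm:
    """Pass counts for one subset, baseline vs. treatment on the same tasks."""
    name: str
    baseline_pass: int
    treatment_pass: int
    total: int

    def delta_pts(self) -> float:
        """Treatment pass rate minus baseline pass rate, in percentage points."""
        return 100.0 * (self.treatment_pass - self.baseline_pass) / self.total


# Counts as reported in the headline results above.
subset1 = Arm("within-domain", baseline_pass=15, treatment_pass=18, total=20)
subset2 = Arm("cross-domain", baseline_pass=18, treatment_pass=20, total=29)
combined = Arm(
    "combined",
    baseline_pass=subset1.baseline_pass + subset2.baseline_pass,
    treatment_pass=subset1.treatment_pass + subset2.treatment_pass,
    total=subset1.total + subset2.total,
)

for arm in (subset1, subset2, combined):
    print(
        f"{arm.name}: {arm.baseline_pass}/{arm.total} -> "
        f"{arm.treatment_pass}/{arm.total}, delta {arm.delta_pts():+.1f} pts"
    )
```

The combined figure is computed over the pooled 49 instances, so it is not the simple average of the two subset deltas.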

Contradiction-resolution benchmark expansion -- the v0.7.4 24-pair benchmark grew to 105 hand-curated pairs across 19 categories. Six new categories exercise the v0.8.0 schema specifically: `source_tool_corroboration`, `confirmer_overrides_pending`, `decay_advantage_session_vs_source`, `decay_advantage_stale_session`, `evidence_type_user_correction`, `settled_beats_higher_confidence`. Deterministic runner at `benchmarks/contradictions-200/run.py`; full per-strategy + per-category breakdown at `benchmarks/contradictions-200/RESULTS.md`.

Honest framing on the numbers: the new dataset is harder than v0.7.4's 24-pair set because the new categories deliberately test schema awareness (confirmer, evidence_type, decay) rather than raw confidence ranking. Headline numbers: `keep_most_sources` 99.0%, `keep_higher_confidence` 81.0%, `auto` 77.1%, `keep_higher_confidence_decayed` 90.5% (on the 21 pairs where evidence_type is present), overall 78.2% across all strategies. The original 24-pair v0.7.4 93.5% number is preserved unchanged at `benchmarks/contradictions/` and is not invalidated; it tested a different (smaller, easier) corpus.

The wedge benchmark is v0.9: "does the learning loop measurably reduce repeated coding-agent mistakes on a public task corpus?" The contradiction-resolution work in this release is internal schema-correctness validation. The empirical artifact that maps to the published essay framing (the learning loop is the durable layer) lands in v0.9 with a SWE-bench-style repeat-mistake benchmark.

- Domain-aware confidence decay -- new `world_model_server/decay.py` module with exponential half-life decay per `evidence_type`. Half-lives: source_code 365d, test 180d, session 14d, user_correction 730d, bug_fix 365d. Decay applies on read (no background task), so the next `query_fact` call returns the time-corrected confidence. Settled facts (`canonical` status, or any fact with `confirmer != NULL`) never auto-transition. Synthesized facts that decay below 0.2 confidence and corroborated facts that decay below 0.1 confidence auto-supersede on read, surfacing rot to the next compaction injection.
- Per-item provenance fields on facts -- three additive columns (`source_tool TEXT`, `confirmer TEXT`, `last_decay_at TIMESTAMP`), all NULL-defaulted, no backfill. `source_tool` records which tool wrote the fact (e.g. `claude_code`, `codex`, `cursor`, `pi`, `user`). `confirmer` records who confirmed it, distinct from the asserter; NULL means pending, non-NULL means settled. Both are exposed on the `Fact` model and propagated through `create_fact`. Honors the public commitment to Patdolitse (anthropics/claude-code#47023) and ferhimedamine (openai/codex#19195).
- Slash command write operations -- two new subcommands. `/world-model resolve <id>` marks a contradiction as resolved (manual; for confidence-weighted picking use the `resolve_contradiction` MCP tool). `/world-model forget <id>` sets `invalid_at` on a fact (preserved in the audit log; current-only reads skip it from then on). Both are idempotent and report cleanly on unknown ids. Help text now lists both alongside the read-only subcommands shipped in v0.7.6.
- `resolve_contradiction` accepts `confirmer` -- when a `confirmer` argument is provided to the MCP tool or its underlying `resolve` function, the winning fact gets its `confirmer` column stamped with that value. This is the spec primitive that distinguishes "the asserter says X" from "X is confirmed by Y" per the working group sketch.
- Antigravity adapter held for the third consecutive release. The 2026-06-13 re-verification found `OnCompactionHook` declared as `InspectHook` in the SDK with no `TransformCompactionHook` and no `additional_context` return field. The load-bearing memory-injection contract still does not exist in the SDK. Next re-verify 2026-06-27.

- In-agent `/world-model` slash command -- typed by the user inside the agent harness, surfaces the world model state without leaving the chat. Read-only in v0.7.6 (`status`, `contradictions`, `recent`, `help`); write operations (`resolve`, `forget`) land in v0.8. Works across Claude Code, Cursor, Codex, and pi by intercepting `UserPromptSubmit` in the existing `inject_helper`. Returns `additionalContext` in the strict camelCase shape Codex enforces (`deny_unknown_fields`), so the same wire-up serves all four harnesses without a per-harness branch.
- `world-model status-watch` TUI widget -- terminal pane that runs alongside the agent and refreshes every 5 seconds. Shows constraints (total, severity=error, severity=warning), unresolved contradictions, facts (canonical / synthesized / superseded), and last compaction time. Built on the `rich` library already in the dependency tree; falls back to a plain-text one-shot dump when `rich` is not installed.
- Antigravity CLI adapter intentionally NOT shipped in this release -- the re-verification on 2026-06-13 against `google-antigravity/antigravity-sdk-python` HEAD surfaced an architectural gap: `OnCompactionHook` is declared as an `InspectHook` (read-only, non-blocking) with no `additional_context` return field and no `TransformCompactionHook` subclass. The load-bearing memory-injection contract does not exist in the SDK today. Targeting 2026-06-27 for the next re-verification; v0.7.6 ships without Antigravity rather than against a contract that cannot do the work.

- Codex CLI adapter -- new `install-codex` CLI subcommand appends a `[mcp_servers.world_model]` block plus PreToolUse, PostToolUse, PostCompact, and SessionStart hooks to `~/.codex/config.toml`. The bundled snippet was verified against `openai/codex@main` at v0.138.0-alpha (server name uses underscore to dodge the tool-name hyphen-strip in `codex-rs/codex-mcp/src/mcp/mod.rs`; hook output sticks to camelCase with `deny_unknown_fields` compliance). Schema regression tests in `tests/test_v075_features.py` lock the contract down. See adapters/codex/README.md.
- Dual-shape payload normalization in `hook_helper` and `inject_helper` -- both helpers now accept either Claude Code's payload shape (`event`, `project_dir`) or Codex's (`hook_event_name`, `cwd`), so the same Python code drives all four adapters (Claude Code, Cursor, pi, Codex).
- Antigravity CLI adapter intentionally NOT shipped this release -- the Antigravity API surface is still settling (six 1.0.x releases in three weeks, the `url` field for HTTP MCP servers landed June 3, hook JSON event-name casing remains undocumented). Targeting June 25 for that adapter after the API stabilizes. Detailed reasoning in the v0.7.5 RELEASE_NOTES entry.
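
The dual-shape normalization described above might look roughly like the sketch below. `normalize_payload` is a hypothetical name and the pass-through behaviour is an assumption; only the four key names (`event`, `project_dir`, `hook_event_name`, `cwd`) come from the notes.

```python
from typing import Any, Dict, Mapping

# Keys that carry the same information under the two payload shapes.
_SHAPE_KEYS = {"event", "hook_event_name", "project_dir", "cwd"}


def normalize_payload(payload: Mapping[str, Any]) -> Dict[str, Any]:
    """Map either payload shape onto one internal shape (hypothetical helper)."""
    event = payload.get("event") or payload.get("hook_event_name")
    project_dir = payload.get("project_dir") or payload.get("cwd")
    if not event or not project_dir:
        raise ValueError("payload is missing an event name or a project directory")
    # Assumption: all other fields pass through untouched for downstream code.
    passthrough = {k: v for k, v in payload.items() if k not in _SHAPE_KEYS}
    return {"event": event, "project_dir": project_dir, **passthrough}


first_shape = {"event": "PreToolUse", "project_dir": "/repo", "tool_name": "Edit"}
second_shape = {"hook_event_name": "PreToolUse", "cwd": "/repo", "tool_name": "Edit"}
print(normalize_payload(first_shape) == normalize_payload(second_shape))
```

Normalizing once at the boundary keeps the enforcement and injection logic free of per-harness branches.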

- AGENTS.md / `.agents/skills/` constraint reader -- world-model-mcp now reads declarative project conventions from `AGENTS.md`, `CLAUDE.md`, `GEMINI.md`, and `.agents/skills/*.md` files and mixes them into PreToolUse enforcement alongside the SQLite-backed constraints. Supports structured fence blocks (`constraint` fences and YAML frontmatter) and heuristic imperative-sentence extraction for prose-style AGENTS.md files. New MCP tool: `get_agents_md_constraints`. (anthropics/claude-code#6235 has 4,000+ thumbs-up for AGENTS.md as the cross-agent format.)
- Self-hosted Claude Managed Agents deployment guide -- Anthropic's official position: *"Memory is not yet supported in self-hosted sessions."* world-model-mcp fills that gap. New guide at `docs/deployment/managed-agents-self-hosted.md`, with a Modal quickstart you can deploy in under five minutes.
- Reproducible contradiction-resolution benchmark -- 24-pair dataset at `benchmarks/contradictions/dataset.jsonl`, runner at `benchmarks/contradictions/run.py`, results at `benchmarks/contradictions/RESULTS.md`. Headline: 93.5% overall accuracy, 100% on `keep_higher_confidence` and `keep_most_sources`, with documented honest weaknesses on tie-handling and small confidence gaps. Re-run with `python benchmarks/contradictions/run.py`. CI workflow guards regressions.

- `world-model demo` -- one command to see every primitive working. Initializes the knowledge graph, seeds reproducible demo data via `scripts/demo_seed.py`, then exercises each primitive (PreToolUse enforcement, contradiction detection, PostCompact injection, audit log) with real outputs. New users can see the value without writing any code.
- Opt-in telemetry -- off by default, prompted once during `world-model setup`, inspectable with `world-model telemetry --status`, disabled with `world-model telemetry --disable`. No file paths, no code, no identifiers tied to a person. See Privacy and Security for the exact payload.
- pi adapter -- new `adapters/pi/` package. world-model-mcp now plugs into earendil-works/pi via pi's extension API (`tool_call` -> PreToolUse, `context` -> auto-injection, `session_compact` -> audit log). Install with `world-model install-pi`.

- PostCompact / UserPromptSubmit auto-injection -- when the agent's context is compacted, the hook automatically splices the top constraints and recent canonical facts back into the next turn. Configurable, fails open.
- `defer` enforcement tier -- PreToolUse now classifies recurring warning-level violations as `defer`, which pauses headless agents (with graceful fallback to `ask` on older clients) instead of either hard-denying or silently passing through.
- Confidence-weighted contradiction resolution -- the new `resolve_contradiction` tool picks a winner using `keep_higher_confidence`, `keep_most_recent`, `keep_most_sources`, or `auto`. The loser is marked superseded.
- Compaction audit log -- every PostCompact event writes a row with pre/post token counts and what was re-injected. Query with the `audit-compactions` CLI or export to JSONL.
- Cursor adapter -- harness-neutral hooks under `adapters/cursor/`. Same Python helpers, different manifest format.
- Streamable HTTP transport (v0.7.2) -- set `WORLD_MODEL_TRANSPORT=http` so the same 25 MCP tools work behind an MCP tunnel for Claude Managed Agents with self-hosted sandboxes. See docs/deployment/mcp-tunnel.md.
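
To make the three explicit strategies concrete, here is a minimal sketch of winner selection. It is an illustration only: `FactSketch`, `pick_winner`, and the field names are hypothetical, the `auto` strategy is left out because its rules are not described here, and raising on a tie is a choice made for this sketch rather than the tool's documented behaviour.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Tuple


@dataclass(frozen=True)
class FactSketch:
    """Hypothetical stand-in for a stored fact; not the project's Fact model."""
    fact_id: str
    confidence: float
    recorded_at: datetime
    source_count: int


# One ranking key per named strategy; the larger key wins.
STRATEGY_KEYS = {
    "keep_higher_confidence": lambda f: f.confidence,
    "keep_most_recent": lambda f: f.recorded_at,
    "keep_most_sources": lambda f: f.source_count,
}


def pick_winner(a: FactSketch, b: FactSketch, strategy: str) -> Tuple[FactSketch, FactSketch]:
    """Return (winner, loser); the caller would then mark the loser superseded."""
    key = STRATEGY_KEYS[strategy]
    if key(a) == key(b):
        # Assumption for this sketch: ties are surfaced instead of broken silently.
        raise ValueError(f"{strategy} cannot separate {a.fact_id} and {b.fact_id}")
    return (a, b) if key(a) > key(b) else (b, a)


older = FactSketch("fact-a", confidence=0.9, recorded_at=datetime(2026, 1, 1), source_count=1)
newer = FactSketch("fact-b", confidence=0.6, recorded_at=datetime(2026, 3, 1), source_count=3)
for name in STRATEGY_KEYS:
    winner, _loser = pick_winner(older, newer, name)
    print(name, "->", winner.fact_id)
```

The same pair can resolve differently depending on the strategy, which is why the benchmark reports accuracy per strategy.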

Download the latest `.mcpb` from Releases and drag it into Claude Desktop. Auto-installs hooks, MCP server config, and dependencies.

pip install world-model-mcp

cd /path/to/your/project
python -m world_model_server.cli setup

You can also re-seed or seed manually at any time:

world-model seed

world-model seed --force

For Claude Managed Agents with self-hosted sandboxes, or any deployment where the MCP server lives behind a firewall and the agent reaches it from Anthropic-side infrastructure, run world-model-mcp in HTTP mode.

pip install 'world-model-mcp[http]'

export WORLD_MODEL_TRANSPORT=http
export WORLD_MODEL_HTTP_PORT=8765
python -m world_model_server.server

Or use the bundled image:

docker compose up -d                    # Dockerfile.http + persistent volume
curl http://127.0.0.1:8765/healthz      # {"status":"ok","version":"0.7.2"}

Full walkthrough including Anthropic MCP tunnels setup: docs/deployment/mcp-tunnel.md.

Stdio remains the default transport for Claude Code, Cursor, and `.mcpb` installs. Nothing changes for those flows.

To see every primitive working with real outputs from a real SQLite database before committing to a full install:

pip install world-model-mcp
mkdir -p /tmp/wm-test && cd /tmp/wm-test
world-model demo

The demo initializes a knowledge graph, seeds reproducible data, and exercises PreToolUse enforcement, contradiction detection, the PostCompact injection bundle, and the compaction audit log -- with the actual JSON outputs. Re-runs are idempotent.

For users of earendil-works/pi:

pip install world-model-mcp           # the Python helpers
world-model install-pi                # writes adapters/world-model-pi/
pi install local:./adapters/world-model-pi

The pi adapter wires the same `hook_helper` and `inject_helper` you'd use from Claude Code into pi's `tool_call`, `context`, and `session_compact` events. See adapters/pi/README.md.

For users of OpenAI's Codex CLI:

pip install world-model-mcp                # the Python helpers
python -m world_model_server.cli install-codex

`--dry-run` prints what would be appended without writing; `--force` re-appends even if the adapter marker is already present. The bundled snippet uses `world_model` (underscore) as the MCP server name to dodge Codex's silent hyphen-strip in its tool-name sanitizer. Hook output is camelCase with `deny_unknown_fields` compliance against Codex's strict Rust schema; the contract is locked down by tests in `tests/test_v075_features.py`. See adapters/codex/README.md.

your-project/
├── .mcp.json                    # MCP server configuration
├── .claude/
│   ├── settings.json           # Hook configuration
│   ├── hooks/                  # Compiled TypeScript hooks
│   └── world-model/            # SQLite databases (~155 KB)

Before:

// Claude invents an API that doesn't exist
const user = await User.findByEmail(email); // This method doesn't exist

After:

// Claude checks the world model first
const user = await User.findOne({ email }); // Verified to exist

Goal: Reduce non-existent API references by validating against the knowledge graph

Session 1: User corrects Claude

// Claude writes:
console.log('debug info');

// User corrects to:
logger.debug('debug info');

// World model learns: "Use logger.debug() not console.log()"

Session 2: Claude uses the learned pattern

// Claude automatically writes:
logger.debug('debug info'); // No correction needed

Goal: Learned patterns persist across sessions and prevent repeat violations

// Week 1: Bug fixed (null check added)
if (user && user.email) { ... }

// Week 2: Refactoring
// World model warns: "This line preserves a critical bug fix"
// Claude preserves the null check

// Result: Bug not re-introduced

Goal: Detect potential regressions before code execution

┌──────────────────────────────────────────────────────────┐
│ Claude Code + Hooks                                      │
│ Captures: file edits, tool calls, user corrections       │
└──────────────────────────────────────────────────────────┘
                         |
                         v
┌──────────────────────────────────────────────────────────┐
│ MCP Server (Python)                                      │
│ - 26 MCP tools for querying/recording/predicting         │
│ - LLM-powered entity extraction (Claude Haiku)           │
│ - External linter integration (ESLint, Pylint, Ruff)     │
└──────────────────────────────────────────────────────────┘
                         |
                         v
┌──────────────────────────────────────────────────────────┐
│ Knowledge Graph (SQLite + FTS5)                          │
│ - entities.db: APIs, functions, classes                  │
│ - facts.db: Temporal assertions with evidence            │
│ - relationships.db: Entity dependency graph              │
│ - constraints.db: Learned rules from corrections         │
│ - sessions.db: Session history and outcomes              │
│ - events.db: Activity log with reasoning chains          │
└──────────────────────────────────────────────────────────┘

Temporal Facts: Every fact has `validAt` and `invalidAt` timestamps
- "Function X existed from 2024-01-15 to 2024-03-20"
- Query: "What was true on March 1st?"

Evidence Chains: Every assertion traces back to source
- Fact -> Session -> Event -> Source Code Location

Constraint Learning: Pattern recognition from user corrections
- Automatic rule type inference (linting, architecture, testing)
- Severity detection (error, warning, info)
- Example generation for future reference

Dual Validation: Combines two validation sources
- World model constraints (learned from user)
- External linters (ESLint, Pylint, Ruff)
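
A point-in-time query such as "What was true on March 1st?" reduces to an interval check on the two timestamps. The sketch below uses an in-memory SQLite table to show the idea; the table layout, the snake_case column names, and `facts_true_on` are assumptions made for illustration and are not the project's actual `facts.db` schema.

```python
import sqlite3
from typing import List

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE facts (statement TEXT NOT NULL, valid_at TEXT NOT NULL, invalid_at TEXT)"
)
conn.executemany(
    "INSERT INTO facts VALUES (?, ?, ?)",
    [
        ("Function X exists", "2024-01-15", "2024-03-20"),
        ("Function Y exists", "2024-03-10", None),  # NULL invalid_at: still true
    ],
)


def facts_true_on(db: sqlite3.Connection, day: str) -> List[str]:
    """Return statements whose validity interval contains `day` (ISO date string)."""
    rows = db.execute(
        "SELECT statement FROM facts "
        "WHERE valid_at <= ? AND (invalid_at IS NULL OR invalid_at > ?) "
        "ORDER BY statement",
        (day, day),
    )
    return [row[0] for row in rows]


print(facts_true_on(conn, "2024-03-01"))  # ['Function X exists']
print(facts_true_on(conn, "2024-04-01"))  # ['Function Y exists']
```

ISO-8601 date strings sort lexicographically in date order, which is what lets plain string comparison work here.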

Twenty-six MCP tools are available to Claude Code. A selection:

Check if APIs/functions exist before using them

result = query_fact(
    query="Does User.findByEmail exist?",
    entity_type="function"
)

Capture development activity with reasoning chains

record_event(
    event_type="file_edit",
    file_path="src/api/auth.ts",
    reasoning="Added JWT authentication middleware"
)

Pre-execution validation against constraints and linters

result = validate_change(
    file_path="src/api/auth.ts",
    proposed_content="..."
)

Retrieve project-specific rules for a file

constraints = get_constraints(
    file_path="src/**/*.ts",
    constraint_types=["linting", "architecture"]
)

Learn from user edits (HIGH PRIORITY)

record_correction(
    claude_action={...},
    user_correction={...},
    reasoning="Use logger.debug instead of console.log"
)

Regression risk assessment

result = get_related_bugs(
    file_path="src/api/auth.ts",
    change_description="refactoring authentication logic"
)

Scan the codebase and populate the knowledge graph with entities and relationships

result = seed_project(
    project_dir=".",
    force=False
)

Pull GitHub PR review comments and convert team feedback into constraints

result = ingest_pr_reviews(
    repo="owner/repo",  # Auto-detected from git remote if omitted
    count=10
)

- QUICKSTART.md: 5-minute setup guide
- CONTRIBUTING.md: Contribution guidelines
- RELEASE_NOTES.md: Version history and features

pytest

pytest --cov=world_model_server --cov-report=html

186 tests covering knowledge graph CRUD, FTS5 search, constraint management, bug tracking, auto-seeding, PR review ingestion, decision traces, outcome linkage, trajectory learning, prediction layer, memory health, contradiction detection, transcript pointers, project identity, and PreToolUse enforcement. See tests/ for details.

export WORLD_MODEL_DB_PATH="/custom/path"

export ANTHROPIC_API_KEY="your-api-key-here"

export WORLD_MODEL_EXTRACTION_MODEL="claude-3-haiku-20240307"  # Fast
export WORLD_MODEL_REASONING_MODEL="claude-3-5-sonnet-20241022"  # Accurate

export WORLD_MODEL_DEBUG=1

Note: Create a `.env` file in your project root (see `.env.example`) - it's automatically ignored by git.

Edit `.claude/settings.json` to customize which tools trigger world model hooks:

{
  "hooks": {
    "PostToolUse": [{
      "matcher": "Edit|Write|Bash",
      "hooks": [...]
    }]
  }
}

Currently Supported:

TypeScript / JavaScript
Python

Coming Soon:

Go, Rust, Java, C++

Extensible Architecture: Easy to add new language parsers (see CONTRIBUTING.md)

- **Local-First**: All knowledge graph data stays on your machine.
- **Optional LLM**: Works without API key (uses regex patterns as fallback).
- **Encrypted Storage**: SQLite databases are local files (encrypt your disk for security).

v0.7.3 added anonymous usage telemetry. It is:

- Off by default. You have to explicitly opt in.
- Asked once during `world-model setup`, with a clear `y/N` prompt.
- Inspectable: `world-model telemetry --status` shows the exact JSON payload that would be sent.
- Disable any time with `world-model telemetry --disable`, or globally with `WORLD_MODEL_TELEMETRY_DISABLE=1`.
- Skipped in non-TTY environments (CI, scripts) so it never blocks an automated setup.

What we send (only if you opt in):

| Field | Example | Why |
|---|---|---|
| `event` | `setup_completed`, `demo_run`, `hook_fired` | Which lifecycle step ran |
| `version` | `0.7.3` | Which release you're on |
| `install_id` | random UUID at `~/.world-model/install_id` | Distinguish installs without identifying users |
| `ts` | unix timestamp | When the event fired |

What we never send: file paths, file contents, rule names, hostnames, IP addresses, API keys, decision-trace text, fact text, or anything else that could identify a person or leak business logic. The full payload schema lives in `world_model_server/telemetry.py`.
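
As a rough illustration of how small that payload is, the sketch below builds a dictionary with exactly the four fields from the table. `build_payload` is a hypothetical helper, and treating the three example event names as the full allowed set is an assumption; `world_model_server/telemetry.py` remains the authoritative schema.

```python
import json
import time
import uuid
from typing import Dict, Union

# Assumption: the three example events from the table are the only ones allowed.
ALLOWED_EVENTS = {"setup_completed", "demo_run", "hook_fired"}


def build_payload(event: str, version: str, install_id: str) -> Dict[str, Union[str, int]]:
    """Assemble the four documented fields and nothing else (hypothetical helper)."""
    if event not in ALLOWED_EVENTS:
        raise ValueError(f"unknown telemetry event: {event}")
    return {
        "event": event,
        "version": version,
        "install_id": install_id,
        "ts": int(time.time()),  # unix timestamp, whole seconds
    }


payload = build_payload("demo_run", "0.7.3", str(uuid.uuid4()))
print(json.dumps(payload, indent=2))
```

Building the payload from an explicit allow-list of fields, instead of filtering a larger object, makes it hard for a path or identifier to slip in by accident.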

Where it goes: opt-in events are posted to a dedicated private GitHub repo (`SaravananJaichandar/world-model-telemetry`) as plain issues. There is no third-party analytics service, no cookie, no fingerprint. The PAT embedded in the client is scoped to that one repo with `Issues: write` only.

Entity extraction from code changes
Constraint inference from corrections
Never sends: Credentials, secrets, PII
- Never commit `.env` files
- Use `.env.example` as template
- Store API keys in environment variables or `.env` files only
- The `.gitignore` automatically excludes sensitive files

Auto-seeding: knowledge graph populates from existing codebase on setup
PR Review Intelligence: ingest GitHub review comments as constraints
Relationship tracking: import and dependency graph between entities
Multi-language support: Python, TypeScript/JavaScript, Solidity, Go, Rust
CLI query command for knowledge graph lookups
40 tests, 8 MCP tools
Module-level matching: query by module name finds the file and its contents
Incremental re-seeding: only re-process files changed since last seed
Fuzzy entity matching: approximate name search for typos and abbreviations
Query caching: in-memory cache with TTL for repeated lookups
Java support: complete multi-language coverage
MCP server pipeline validation on real projects
Outcome linkage: test failures linked to code changes with facts
Trajectory learning: co-edit patterns tracked across sessions
Decision trace capture: structured log of agent proposals and human corrections
Cross-project entity search with project registry
5 new MCP tools (13 total), 104 tests
Regression prediction, "what if" simulation, test failure prediction
Multi-project knowledge transfer, memory health, fact TTL/decay
get_context_for_action pre-edit bundle, constraint violation tracking, find_contradictions
20 MCP tools, 151 tests
PreToolUse constraint enforcement hook: deny hard violations at the edit boundary
Indexed transcript pointers: hydrate any fact back to source conversation
Project identity decoupling: stable UUID across directory renames
Content-hash deduplication for facts and constraints
Auto-generate CLAUDE.md from the knowledge graph
BetaAbstractMemoryTool subclass for Anthropic SDK integration
Desktop Extension (.mcpb) packaging for Claude Desktop
22 MCP tools, 13 CLI subcommands, 186 tests
PostCompact and UserPromptSubmit auto-injection: re-emit top constraints and recent facts after context loss

`defer` enforcement tier in PreToolUse: pauses headless agents on recurring warning-level violations, with graceful fallback to `ask`
Confidence-weighted contradiction resolution: pick a winner using confidence, recency, or source count, with an `auto` strategy
Compaction audit log: query and export what was remembered across each compaction boundary

Cursor adapter package
25 MCP tools, 14 CLI subcommands, 220 tests
HTTP transport mode for remote / MCP-tunnel deployment
/healthz endpoint, Dockerfile.http, docker-compose.yml
docs/deployment/mcp-tunnel.md walkthrough for Claude Managed Agents
236 tests

`world-model demo` guided tour for first-time users
Opt-in anonymous telemetry, off by default, inspectable
pi-package adapter (`adapters/pi/`, `install-pi` CLI)
17 CLI subcommands, 256 tests

AGENTS.md / `.agents/skills/` constraint reader (new MCP tool: `get_agents_md_constraints`)
Self-hosted Claude Managed Agents deployment guide + Modal quickstart

Reproducible contradiction-resolution benchmark (24-pair dataset, CI workflow, RESULTS.md)
26 MCP tools, 17 CLI subcommands, 283 tests
Codex CLI adapter (`install-codex`, shipped 2026-06-05)

In-agent `/world-model` slash command (read-only: status, contradictions, recent, help)
`world-model status-watch` TUI status widget

Decay + provenance schema: `source_tool`, `confirmer`, `last_decay_at` columns on facts. Per-evidence-type TTL with domain-aware half-lives (source_code 365d, test 180d, session 14d, user_correction 730d, bug_fix 365d).
Slash command write operations (`/world-model resolve <id>`, `/world-model forget <id>`).
`resolve_contradiction` accepts `confirmer` to stamp the winning fact as settled.

Expanded contradiction-resolution benchmark: 24 → 105 pairs across 19 categories, including 6 new categories that test the v0.8.0 schema (decay, provenance, confirmer).
Honest per-strategy + per-category RESULTS.md with the v0.7.4 number preserved as baseline.

Repeat-mistake benchmark on AI coding tasks. The empirical test of the central wedge: does the learning loop measurably reduce repeated agent mistakes? Runs against a SWE-bench-style task corpus with Claude Code headless, measures delta in repeat-mistake rate with vs without world-model-mcp learning the constraint from the first attempt. This is the artifact the visibility plan has been reaching for; it maps directly to the June 2026 essay framing.
`auto` strategy rewrite to fold in `confirmer` and decay awareness (should lift the v0.8.1 benchmark's auto score from 77.1% past 90%).
Antigravity CLI adapter (held since 2026-06-13; SDK lacks a `TransformCompactionHook` for the load-bearing memory-injection contract; re-verify 2026-06-27).
MCP spec 2026-07-28 readiness (stateless transport, `_meta` headers, `InputRequiredResult`).
Cline adapter (lower urgency after they shipped global AGENTS rules in v3.86).

Contributions are welcome. See CONTRIBUTING.md for:

Development setup
Coding standards
Adding language support
Writing tests
Submitting PRs

Areas where help is needed:

Language parsers (Go, Rust, Java, C++)
Performance optimization
Documentation improvements
Real-world testing feedback

Project Size:

~4,800 lines of code
13 Python modules
3 TypeScript hook implementations

Storage Efficiency:

Empty database: ~155 KB
Per entity: ~500 bytes
Per fact: ~800 bytes

MIT License - Free for commercial and personal use

- **Issues**: GitHub Issues
- **Discussions**: GitHub Discussions

