{"slug": "compass-v1-1-0-we-shipped-a-memory-plugin-that-catches-its-own-consumption-drift", "title": "Compass v1.1.0 · we shipped a memory plugin that catches its own consumption drift", "summary": "Release of Compass v1.1.0, which fixes a critical failure where the memory plugin successfully recalled relevant files but agents failed to read the file bodies, leading to repeated mistakes. The update addresses this by embedding the first 800 characters of recalled file bodies directly in recall results and improving drift alerts to include actionable context from past lessons. Additionally, a new `recall_consumption.py` module audits whether agents actually open surfaced files, providing a direct signal for \"label vs. consumption\" drift.", "body_md": "# Compass v1.1.0 · the recall consumption fix\n\nWe shipped [nautilus-compass v1.1.0](https://github.com/chunxiaoxx/nautilus-compass) 12 hours after v1.0.0. v1.0.0 was the public stable cut. v1.1.0 fixes a class of failure that v1.0.0 surfaces but does not catch · which we caught in our own usage 5 hours after launch.\n\n## The bug we caught in production\n\nA sister Claude Code dialog was supposed to publish a long-form article to WeChat using a 6-step quality pipeline (audit-gate, xhs-cards-embed, specific account login flow). The pipeline was documented in cross-session memory · a file called `publisher_quality_pipeline_20260430.md`.\n\nCompass recall fired correctly · the file appeared in the agent's `UserPromptSubmit` hook output:\n\n```\n🟢 [3h old] memory/publisher_quality_pipeline_20260430.md\n       audit-gate / xhs-cards-embed / wxid · v6 必须先过 critic 6 维评分再发布\n```\n\n(The Chinese tail of the description says: v6 must pass the critic's 6-dimension scoring before publishing.)\n\nThe agent saw the title. Saw the 80-character description. Acted. **It did not Read the file body.** The actual rules (*how* to walk audit-gate, *which* wxid, *what* the xhs-cards-embed structure looks like) were in the body. 
None of them entered the agent's working context.\n\nThe agent then reproduced exactly the failure mode the file was written to prevent: ad-hoc `_tmp_publish_v8.cjs` scripts, no critic round, wrong login path.\n\nThe user's diagnosis was sharp (translated from Chinese):\n\n*\"Compass recalled it · I did not consume it · this is persona drift at the agent layer · not a failure of Compass itself\"*\n\nThat's half right. Recall surfaced the right file. The agent failed to consume. But the **shape of the recall response made the failure easy**: we returned title + 120-char description. Easy to skim. Easy to assume you have read it when you have only read the index.\n\nThis is structural. Not the agent's fault.\n\n## The three-layer fix in v1.1.0\n\n### v0 · embed body in top-3 hits\n\nTop-3 recall hits now embed the first 800 characters of post-frontmatter body in an indented `│` block:\n\n```\n🟢 score=0.84 · [3h old] memory/publisher_quality_pipeline_20260430.md\n       audit-gate / xhs-cards-embed / wxid · v6 必须先过 critic 6 维评分\n       │ # Publisher quality pipeline\n       │\n       │ Six-step pipeline mandatory before publishing to wechat:\n       │ 1. audit-gate · V6 critic checks against 6 dimensions ...\n       │ 2. xhs-cards-embed · embed cards into article body via ...\n       │ 3. wxid login flow · use wxid `chunxiaox` not openid_of_first_follower\n       │ ...\n       │ … (+1273 more · Read publisher_quality_pipeline_20260430.md for rest)\n```\n\nThe agent now has the rules in its working context. No additional `Read` tool call required. 
Tail hits 4..K stay header-only to keep the response bounded (~3KB total).\n\n### v1 · embed past-mistake body in anti-anchor alerts\n\nCompass's drift detector matches the current prompt against 35 negative anchors learned from prior mistakes (`\"我猜应该是这样 · 反正用户不查\"` (\"I guess it's probably like this · the user won't check anyway\"), `\"假装上次说定了的方案 · 用户应该忘了\"` (\"pretend the plan was settled last time · the user has probably forgotten\"), ...).\n\nUntil v1.1.0 the alert just said: *\"matched anti-anchor X with cos=0.625\"*. Same problem that v0 fixes: label visible, body invisible, agent shrugs.\n\nv1.1.0 alerts now embed the body of the most relevant past-lesson session. The match is two-tier: a 6-gram substring match against the anchor + lesson-type frontmatter (Tier 1, precise) · falling back to recent `drift!=green` sessions (Tier 2, the agent's own self-reported slip-ups). Every alert becomes actionable, not decorative.\n\n### v2 · detect \"recall fired but not consumed\"\n\nThe most direct signal: did the agent actually open any of the files recall surfaced?\n\n`recall_consumption.py` (new module) walks back through the live session jsonl file, finds the N most recent recall blocks, extracts memory file paths, then checks subsequent assistant turns for matching `Read` tool calls. If recall surfaced N paths and 0 got read, that is the failure signature.\n\nWired into:\n\n- `drift_check` MCP tool result: runs even when the BGE daemon is unreachable, since the audit is pure file traversal\n- `mid_session_hook` every 25 tool calls: only nags when ≥3 hits are unconsumed AND the consumption ratio is < 0.3 (real signal, not noise)\n\nTested on a 130MB / 32k-line session: 41 recall hits surfaced, 0 consumed. Smoking gun for \"label != consumption\" drift.\n\n## V7 v0.2 · the governance plan that scales without templates\n\nv1.0.0 shipped a thin V7 governance layer with three tools: `governance_dispatch` (fan-out router), `governance_audit` (cross-agent fake-closure scanner), `governance_lock_check` (L0 hash lock for the immutable core). 
13 MCP tools total.\n\nv0.1 dispatch worked but it was a fan-out router: given `channels=` it produced one bounty per channel via a static dict lookup ([dev.to, x, github]). A user asked the right question (translated from Chinese):\n\n*\"Every industry has its own task types; you can never cover them all.\"*\n\nRight. Templates cannot cover the long tail of industries. The platform side already solved this for *publishing* (channel adapters + anchor pack registry), so adding a new channel or vertical = data change, not code change.\n\nv1.1.0 brings the same idea to *decomposition*. The new `governance_plan` MCP tool reads two file-exported registries:\n\n- `_platform_registry/agents_capabilities.json`: what each executor declares it can do (id, outputs, optional domains, optional anchor packs)\n- `_platform_registry/anchor_packs_phases.json`: per-domain DAG of phases, each phase says `requires_capability` and `depends_on`\n\nFor each phase, V7 ranks executors by capability score (+10 capability match, +5 domain match, +3 anchor pack match), picks the highest, and emits a queue file with `depends_on_phase_ids` so the platform-side cron mints bounties in the right order.\n\nVerified on two domains:\n\n- `marketing/dev-tools` → 4 phases routed V5/V5/V5/Kairos\n- `caishen-finance/audit` → 5 phases · V6 wins for `numeric-audit` (V5 doesn't declare it · V5 takes write+publish)\n\nAdding `medical/literature-review` next: 1 row in `platform_anchor_packs` + 1 row in `platform_agents.metadata.capabilities[]`. Zero V7 source change. 
Zero MCP tool surface change.\n\n## What stayed unchanged · the eval headlines\n\nEval numbers are still the v1.0.0 locked numbers from 2026-05-08:\n\n| Metric | nautilus-compass | best public baseline |\n|---|---|---|\n| LongMemEval-S (n=500) | 56.6% | Zep 55-60% (different judge) |\n| EverMemBench-Dynamic Run 1 | 44.4% (n=500) | MemOS 42.55 |\n| EverMemBench-Dynamic Run 2 | 47.3% (n=497) | — |\n| Drift detector ROC AUC (held-out) | 0.83 | — |\n| Reproduction cost | $3.50 end-to-end | $50+ for GPT-4o-judge stacks |\n\nv1.1.0 doesn't move the eval numbers. It moves the *consumption* numbers: the ratio of recall hits whose body actually lands in the agent's working context. We do not have a clean benchmark for that yet (suggestions welcome) but in our own sessions it went from \"skim the title and proceed\" to \"rules-in-context by default.\"\n\n## Try it\n\n```\npip install nautilus-compass==1.1.0\n# or\nnpm install nautilus-compass@1.1.0\n```\n\nTwo papers on arXiv (drift detection + memory pipeline). 228 pytests all green. MIT (anchors CC0).\n\nRepo: [github.com/chunxiaoxx/nautilus-compass](https://github.com/chunxiaoxx/nautilus-compass)\n\nIn-browser drift demo (no install): [huggingface.co/spaces/chunxiaox/nautilus-compass](https://huggingface.co/spaces/chunxiaox/nautilus-compass)\n\n## Postscript · what we believe\n\nRecall != consumption · only reading the body counts as consumption · otherwise a hit is worth zero\n\nLong-running agents drift. They forget rules they read three sessions ago. They reproduce mistakes someone else already paid for. 
The fix is not a smarter model · it is making the rules unmissably present in the working context, then auditing whether they were actually consumed, then making the audit cheap enough to run every 25 tool calls.\n\nThat is what v1.1.0 ships.", "url": "https://wpnews.pro/news/compass-v1-1-0-we-shipped-a-memory-plugin-that-catches-its-own-consumption-drift", "canonical_source": "https://dev.to/chunxiaoxx/compass-v110-we-shipped-a-memory-plugin-that-catches-its-own-consumption-drift-4fa0", "published_at": "2026-05-21 18:00:52+00:00", "updated_at": "2026-05-21 18:02:16.882484+00:00", "lang": "en", "topics": ["developer-tools", "artificial-intelligence", "large-language-models", "products"], "entities": ["Compass", "Claude Code", "nautilus-compass", "WeChat"], "alternates": {"html": "https://wpnews.pro/news/compass-v1-1-0-we-shipped-a-memory-plugin-that-catches-its-own-consumption-drift", "markdown": "https://wpnews.pro/news/compass-v1-1-0-we-shipped-a-memory-plugin-that-catches-its-own-consumption-drift.md", "text": "https://wpnews.pro/news/compass-v1-1-0-we-shipped-a-memory-plugin-that-catches-its-own-consumption-drift.txt", "jsonld": "https://wpnews.pro/news/compass-v1-1-0-we-shipped-a-memory-plugin-that-catches-its-own-consumption-drift.jsonld"}}