{"slug": "your-ai-agent-forgets-mine-doesn-t-and-it-works-on-a-plane-in-a-hospital-with", "title": "Your AI agent forgets. Mine doesn't - and it works on a plane, in a hospital, with wifi off.", "summary": "A developer built velesdb-memory, an MCP server that gives AI agents three distinct memory structures—semantic, episodic, and procedural—based on cognitive science principles. Unlike vector-only retrieval, it supports multi-hop graph traversal to answer 'why' questions, and runs entirely offline as a single binary.", "body_md": "Six months ago you recommended switching your client's invoicing tool. Last week they asked why. You have no idea - the conversation happened in three meetings, a Slack thread, and a spreadsheet comparison no one archived. Your AI assistant is useless here too: it only knows what you paste into the prompt.\n\nThis is not a context-window problem. It is a memory architecture problem.\n\nMost \"persistent memory\" solutions for LLMs work by storing past exchanges as text chunks and retrieving them by cosine similarity. Ask \"what did we decide about the invoicing tool?\" and a chunk mentioning the decision floats to the top - if your query looks like the answer.\n\nIt breaks the moment you ask *why*. The reason the CFO pushed back on the original tool was buried in a budget meeting note that shares no words with \"invoicing decision\". Pure vector search is blind to it by construction.\n\nWhat you actually need is three distinct memory structures - the same ones cognitive science has described since the 1970s:\n\n```\n+-----------------+------------------------------+----------------------------------+\n| Type            | What it stores               | Answers                          |\n+-----------------+------------------------------+----------------------------------+\n| Semantic        | Facts, decisions             | What? Why? What is our position? |\n| Episodic        | Events with a timestamp      | When? Who said what?           
  |\n| Procedural      | Learned patterns + steps     | How do we usually handle this?   |\n+-----------------+------------------------------+----------------------------------+\n```\n\n`velesdb-memory` is an MCP server that exposes exactly these three subsystems - as five high-level tools your agent can call without knowing anything about vectors, graphs, or databases.\n\nIt is a single binary that speaks the [Model Context Protocol](https://modelcontextprotocol.io/) over stdio. Client and server run on the same machine. Memory never leaves your machine.\n\n```\n+------------------+        stdio/MCP        +-------------------+\n| Claude Code      |  ───────────────────►   | velesdb-memory    |\n| Cursor           |                         | (one binary)      |\n| Cline / Zed      |  ◄───────────────────   |                   |\n| Codex / opencode |                         | vector + graph    |\n+------------------+                         | + columnar store  |\n                                             +-------------------+\n                                                      │\n                                               ~/.velesdb-memory/\n                                               (stays on your disk)\n```\n\nFive tools, all JSON:\n\n| Tool | What it does |\n|---|---|\n| `remember` | store a fact, optionally tagged and linked to other memories |\n| `recall` | semantic search, with optional metadata filter |\n| `relate` | create a typed edge between two memories |\n| `forget` | delete a memory by id |\n| `why` | recall + multi-hop graph traversal (the differentiator) |\n\nThere is a sixth tool, `remember_extracted`, that passes raw text through a local LLM and builds the graph automatically - but you do not need it to understand the core idea.\n\nSofia advises companies on digital transformation. She runs three to five simultaneous engagements, each lasting six months. 
She needs her AI assistant to remember:\n\nLet us build her memory layer.\n\n```\n# build the binary (Rust toolchain required)\ncargo build --release -p velesdb-memory\n\n# or: cargo install velesdb-memory (when published on crates.io)\n```\n\nThe default build is dependency-free. For real semantic recall, build with Ollama support:\n\n```\ncargo build --release -p velesdb-memory --features ollama\nollama pull all-minilm\n```\n\nThen configure your client. For Claude Code:\n\n```\nclaude mcp add velesdb-memory \\\n  --env VELESDB_MEMORY_PATH=\"$HOME/.velesdb-memory\" \\\n  -- /path/to/velesdb-memory\n```\n\nFor Cursor (`~/.cursor/mcp.json`), Cline (`cline_mcp_settings.json`), or any other MCP client:\n\n```\n{\n  \"mcpServers\": {\n    \"velesdb-memory\": {\n      \"command\": \"/path/to/velesdb-memory\",\n      \"env\": { \"VELESDB_MEMORY_PATH\": \"/home/you/.velesdb-memory\" }\n    }\n  }\n}\n```\n\nZed uses a slightly different key (`context_servers`), Codex uses `codex mcp add` or a TOML config - full snippets in the [README](https://github.com/cyberlife-coder/VelesDB/tree/develop/crates/velesdb-memory).\n\nOnce configured, the agent discovers the tools automatically. 
No restarts, no plugins, no API keys.\n\nAt the end of a vendor selection meeting, Sofia's agent calls:\n\n```\n// remember - store a fact with metadata and a typed link to another memory\nremember {\n  \"fact\": \"We recommended Pennylane over Sage for Acme Corp invoicing because Sage lacks multi-currency support and Pennylane's API team offered a 6-month implementation guarantee.\",\n  \"metadata\": { \"project\": \"acme-corp\", \"type\": \"decision\", \"author\": \"sofia\" },\n  \"links\": [ { \"target\": 4820193847, \"relation\": \"follows_from\" } ]\n}\n→ { \"id\": 9876543210 }\n```\n\nThe returned `id` is stable and derived from the content - storing the same fact twice is idempotent.\n\n```\n// remember - the CFO meeting that triggered the re-evaluation\nremember {\n  \"fact\": \"CFO at Acme Corp: budget cap is 12k EUR per year. Sage renewal is 14.8k. This is the hard constraint that ruled out Sage.\",\n  \"metadata\": { \"project\": \"acme-corp\", \"type\": \"meeting\", \"date\": \"2026-01-15\" },\n  \"links\": [ { \"target\": 9876543210, \"relation\": \"motivated\" } ]\n}\n→ { \"id\": 4820193847 }\n\n// remember - a reusable vendor-selection procedure\nremember {\n  \"fact\": \"Vendor selection for SME finance tools - step 1: map hard constraints (budget, compliance, integration). Step 2: shortlist to 3. Step 3: run a 2-week pilot on live data. Step 4: present with a documented decision matrix.\",\n  \"metadata\": { \"type\": \"procedure\", \"domain\": \"vendor-selection\" }\n}\n→ { \"id\": 1122334455 }\n```\n\nAfter the client signed the contract:\n\n```\nrelate {\n  \"from\": 9876543210,\n  \"to\": 4820193847,\n  \"relation\": \"decided_in\"\n}\n→ { \"edge_id\": 7 }\n```\n\n`why` query: what changes everything\n\nSix months later, Acme Corp asks Sofia why they switched invoicing tools. 
She asks her agent:\n\n```\nwhy {\n  \"decision\": \"why did we switch from Sage to Pennylane\",\n  \"filter\": { \"project\": \"acme-corp\" },\n  \"max_hops\": 2\n}\n```\n\nThe response:\n\n```\n{\n  \"nodes\": [\n    { \"id\": 9876543210, \"hop\": 0, \"content\": \"We recommended Pennylane over Sage... multi-currency... 6-month implementation guarantee.\" },\n    { \"id\": 4820193847, \"hop\": 1, \"content\": \"CFO at Acme Corp: budget cap is 12k EUR... Sage renewal is 14.8k. This is the hard constraint that ruled out Sage.\" }\n  ],\n  \"edges\": [\n    { \"from\": 9876543210, \"to\": 4820193847, \"relation\": \"decided_in\" }\n  ]\n}\n```\n\nA plain `recall` query would have returned the decision text (hop 0, shares words with the query). It would likely have missed the CFO meeting note (hop 1) - that note is about \"budget cap\" and \"14.8k\", and shares little with \"why did we switch from Sage to Pennylane\" beyond the word \"Sage\".\n\nThe graph reaches it because the relation exists. That is the gap.\n\nThe `why` wedge is not a claim - it is measured. The repo ships three reproducible benchmarks with no LLM in the scoring loop (pure retrieval metrics on public datasets):\n\n**Multi-hop recall (graph engine) - HotpotQA, 3000 dev questions:**\n\n```\nvector only:   both bridge facts recalled  →  baseline\nvector + graph: both bridge facts recalled →  +7.2 percentage points on bridge questions\n```\n\nThe win replicates on 2WikiMultiHopQA (+3.1pp on bridged types).\n\n**Time-scoped recall (ColumnStore) - TimeQA (real Wikipedia bios):**\n\n```\nvector only:   gold-sentence recall  →  baseline\nvector + filter: year-range predicate →  +9.7 percentage points\n```\n\nA pure cosine score cannot distinguish \"she won the award in 1987\" from \"she won the award in 2003\". 
A numeric filter can.\n\n**The engines compound (tri-engine benchmark):**\n\nOn a task that requires both multi-hop traversal and time-scoped filtering:\n\n```\ngraph alone:       +7.2pp\ncolumnstore alone: +9.7pp\nboth together:     +29pp  (more than the sum)\n```\n\nRun any of these yourself:\n\n```\n# multi-hop benchmark\ncargo run --release -p velesdb-memory --example bench_multihop\n\n# time-scoped benchmark\ncargo run --release -p velesdb-memory --example timeqa\n```\n\nThe default binary has zero network dependencies. The memory store is a directory on your disk (`~/.velesdb-memory/`). The binary is around 9 MB.\n\nWith the default `hash` embedder, recall is keyword-style (deterministic, good for `why` because the graph does the heavy lifting). For real semantic recall, add Ollama - the model runs locally, so memory still never reaches the internet:\n\n```\nVELESDB_MEMORY_EMBEDDER=ollama \\\nVELESDB_MEMORY_OLLAMA_MODEL=all-minilm \\\n  /path/to/velesdb-memory\n```\n\nThis is not \"privacy-preserving mode\" - it is the only mode. There is no cloud path.\n\nIf you do not want to call `remember` and `relate` manually, the `remember_extracted` tool does it in one step. It sends raw text to a local LLM (via Ollama), extracts individual facts, wires the entity graph automatically, and stores everything:\n\n```\nremember_extracted {\n  \"text\": \"Met Yannick from the Acme procurement team. He confirmed the board approved the Pennylane migration. The CFO's concern about training cost has been resolved by the vendor's onboarding package.\"\n}\n→ { \"ids\": [11122233, 44455566, 77788899] }\n```\n\nThree facts stored, entity relationships auto-wired, all reachable by `why`. 
To enable it:\n\n```\ncargo build --release -p velesdb-memory --features extract\nVELESDB_MEMORY_EXTRACTOR=ollama \\\nVELESDB_MEMORY_EXTRACTOR_MODEL=qwen3:8b \\\n  /path/to/velesdb-memory\n```\n\nThe standard build does not include this - it keeps the default binary tiny and offline.\n\nIf you prefer to embed memory into your own application rather than use the MCP server, the same engine is available as a Python package:\n\n``` python\nimport time\n\nimport velesdb\nfrom sentence_transformers import SentenceTransformer\n\ndb = velesdb.Database(\"./sofia_memory\")\nmemory = db.agent_memory(384, snapshot_dir=\"./sofia_memory/snapshots\")  # 384-dim embeddings\n\n# load the embedder once; sentence-transformers, Ollama, or any embedder works\nmodel = SentenceTransformer(\"all-MiniLM-L6-v2\")\n\ndef embed(text):\n    return model.encode(text, normalize_embeddings=True).tolist()\n\n# store a fact\nmemory.semantic.store(\n    id=1,\n    content=\"Pennylane chosen over Sage: multi-currency support + budget fits 12k EUR cap\",\n    embedding=embed(\"Pennylane Sage invoicing decision\")\n)\n\n# query\nresults = memory.semantic.query(embed(\"why Pennylane\"), top_k=3)\nfor r in results:\n    print(f\"[{r['score']:.2f}] {r['content']}\")\n\n# episodic: the CFO meeting\nmemory.episodic.record(\n    event_id=2,\n    description=\"CFO confirmed: Sage renewal quote is 14.8k, over 12k cap\",\n    timestamp=int(time.time()) - 30 * 86400,  # 30 days ago\n    embedding=embed(\"CFO budget constraint Sage renewal\")\n)\n\n# procedural: a reusable pattern\nmemory.procedural.learn(\n    procedure_id=3,\n    name=\"SME vendor selection\",\n    steps=[\"map hard constraints\", \"shortlist to 3\", \"run 2-week pilot\", \"present decision matrix\"],\n    embedding=embed(\"vendor selection SME procedure\"),\n    confidence=0.9\n)\n\n# reinforce if the pattern worked well\nmemory.procedural.reinforce(procedure_id=3, success=True)\n\n# snapshot to survive restarts\nmemory.snapshot()\n```\n\nTo install the package and check the version:\n\n```\npip install velesdb\npython3 -c \"import velesdb; print(velesdb.__version__)\"\n# 3.4.0\n```\n\nThe same engine ships as an npm package with prebuilt platform binaries - no Rust toolchain needed at install time:\n\n```\nnpm install @wiscale/velesdb-memory-node\n```\n\nThe API is a single async class - no subsystems, no embeddings to manage yourself:\n\n``` js\nimport { MemoryService } from '@wiscale/velesdb-memory-node'\n\n// Open (or create) a persistent store. Sync factory, all methods are async.\nconst mem = MemoryService.open('./sofia_memory', 'hash')\n// Use 'ollama' as second arg for real semantic recall (requires Ollama running locally)\n\n// Store a fact - returns its id as a decimal string\nconst decisionId = await mem.remember(\n  'We recommended Pennylane over Sage: multi-currency support + 12k EUR budget cap',\n  [],\n  { project: 'acme-corp', type: 'decision' }\n)\n\n// Store the reason and link it\nconst reasonId = await mem.remember(\n  'CFO confirmed: Sage renewal quote is 14.8k EUR, over the 12k annual cap',\n  [],\n  { project: 'acme-corp', type: 'meeting', date: '2026-01-15' }\n)\n\n// Typed link: decision was motivated by the CFO meeting\nawait mem.relate(decisionId, reasonId, 'decided_in')\n\n// Plain recall - vector similarity\nconst hits = await mem.recall('why Pennylane', 3)\nhits.forEach(h => console.log(`[${h.score.toFixed(2)}] ${h.content}`))\n\n// why() - vector seed + multi-hop graph traversal\nconst { nodes, edges } = await mem.why('why did we switch from Sage to Pennylane', 2)\nnodes.forEach(n => console.log(`hop ${n.hop}: ${n.content}`))\n// hop 0: the decision  →  hop 1: the CFO meeting (few shared words - graph found it)\n```\n\nOne feature is exclusive to the Node.js binding: `recallWhere`, which combines vector search with ColumnStore range filters in a single call - no Python counterpart:\n\n``` js\n// Recall meetings dated on or after 2026-01-01 only\nconst recent = await mem.recallWhere(\n  'budget constraint',\n  [{ field: 'date', 
op: 'ge', value: '2026-01-01' }],\n  5\n)\n```\n\nvelesdb-memory is a single-process embedded library. It is not designed for concurrent access from multiple processes, nor for storing millions of memories on behalf of many users. It fits one agent, one user, one machine - which is exactly the shape the use cases above require.\n\nExtraction quality depends on the local model you point `remember_extracted` at. A smaller model extracts noisier facts than a larger one. The graph and the retrieval engine are solid; the extraction layer is as good as the model you bring.\n\n```\ngit clone https://github.com/cyberlife-coder/VelesDB\ncd VelesDB\ncargo build --release -p velesdb-memory\n./target/release/velesdb-memory --help\n```\n\nDocumentation and examples are at [velesdb.com](https://velesdb.com). If this was useful, a star on the [GitHub repo](https://github.com/cyberlife-coder/VelesDB) helps other developers find the project, and we are always looking for partners with local-first or sovereign data requirements - details on [velesdb.com](https://velesdb.com).\n\nWhich use case resonates most with you - knowledge work (consulting, research, legal), coding assistance, or something else entirely? 
Drop a comment below.", "url": "https://wpnews.pro/news/your-ai-agent-forgets-mine-doesn-t-and-it-works-on-a-plane-in-a-hospital-with", "canonical_source": "https://dev.to/wiscale-fr/your-ai-agent-forgets-mine-doesnt-and-it-works-on-a-plane-in-a-hospital-with-wifi-off-1ahp", "published_at": "2026-06-29 19:52:58+00:00", "updated_at": "2026-06-29 20:18:47.624844+00:00", "lang": "en", "topics": ["large-language-models", "ai-agents", "developer-tools", "artificial-intelligence", "machine-learning"], "entities": ["velesdb-memory", "Claude Code", "Cursor", "Cline", "Ollama", "MCP"], "alternates": {"html": "https://wpnews.pro/news/your-ai-agent-forgets-mine-doesn-t-and-it-works-on-a-plane-in-a-hospital-with", "markdown": "https://wpnews.pro/news/your-ai-agent-forgets-mine-doesn-t-and-it-works-on-a-plane-in-a-hospital-with.md", "text": "https://wpnews.pro/news/your-ai-agent-forgets-mine-doesn-t-and-it-works-on-a-plane-in-a-hospital-with.txt", "jsonld": "https://wpnews.pro/news/your-ai-agent-forgets-mine-doesn-t-and-it-works-on-a-plane-in-a-hospital-with.jsonld"}}