Your AI agent forgets. Mine doesn't - and it works on a plane, in a hospital, with wifi off.

A developer built velesdb-memory, an MCP server that gives AI agents three distinct memory structures (semantic, episodic, and procedural) based on cognitive science principles. Unlike vector-only retrieval, it supports multi-hop graph traversal to answer "why" questions, and it runs entirely offline as a single binary.

Six months ago you recommended switching your client's invoicing tool. Last week they asked why. You have no idea - the conversation happened in three meetings, a Slack thread, and a spreadsheet comparison no one archived. Your AI assistant is useless here too: it only knows what you paste into the prompt.

This is not a context-window problem. It is a memory architecture problem.

Most "persistent memory" solutions for LLMs work by storing past exchanges as text chunks and retrieving them by cosine similarity. Ask "what did we decide about the invoicing tool?" and a chunk mentioning the decision floats to the top - if your query looks like the answer.

It breaks the moment you ask *why*. The reason the CFO pushed back on the original tool was buried in a budget meeting note that shares no words with "invoicing decision". Pure vector search is blind to it by construction.

What you actually need is three distinct memory structures - the same ones cognitive science has described since the 1970s:

```
+-----------------+------------------------------+----------------------------------+
| Type            | What it stores               | Answers                          |
+-----------------+------------------------------+----------------------------------+
| Semantic        | Facts, decisions             | What? Why? What is our position? |
| Episodic        | Events with a timestamp      | When? Who said what?             |
| Procedural      | Learned patterns + steps     | How do we usually handle this?   |
+-----------------+------------------------------+----------------------------------+
```

velesdb-memory is an MCP server that exposes exactly these three subsystems - as five high-level tools your agent can call without knowing anything about vectors, graphs, or databases.

It is a single binary that speaks the [Model Context Protocol](https://modelcontextprotocol.io/) over stdio. Client and server run on the same machine. Memory never leaves your machine.

```
+------------------+       stdio/MCP        +-------------------+
| Claude Code      | ───────────────────►   | velesdb-memory    |
| Cursor           |                        | one binary        |
| Cline / Zed      | ◄───────────────────   |                   |
| Codex / opencode |                        | vector + graph    |
+------------------+                        | + columnar store  |
                                            +-------------------+
                                                      │
                                            ~/.velesdb-memory/
                                            stays on your disk
```

Five tools, all JSON:

| Tool | What it does |
|---|---|
| `remember` | store a fact, optionally tagged and linked to other memories |
| `recall` | semantic search, with optional metadata filter |
| `relate` | create a typed edge between two memories |
| `forget` | delete a memory by id |
| `why` | recall + multi-hop graph traversal (the differentiator) |

There is a sixth tool, `remember_extracted`, that passes raw text through a local LLM and builds the graph automatically - but you do not need it to understand the core idea.

Sofia advises companies on digital transformation. She runs three to five simultaneous engagements, each lasting six months. She needs her AI assistant to remember the decisions she made, the meetings behind them, and the procedures she reuses.

Let us build her memory layer.

```bash
# build the binary (Rust toolchain required)
cargo build --release -p velesdb-memory
# or: cargo install velesdb-memory (when published on crates.io)
```

The default build is dependency-free. For real semantic recall, build with Ollama support:

```bash
cargo build --release -p velesdb-memory --features ollama
ollama pull all-minilm
```

Then configure your client.
For Claude Code:

```bash
claude mcp add velesdb-memory \
  --env VELESDB_MEMORY_PATH="$HOME/.velesdb-memory" \
  -- /path/to/velesdb-memory
```

For Cursor (`~/.cursor/mcp.json`), Cline (`cline_mcp_settings.json`), or any other MCP client:

```json
{
  "mcpServers": {
    "velesdb-memory": {
      "command": "/path/to/velesdb-memory",
      "env": { "VELESDB_MEMORY_PATH": "/home/you/.velesdb-memory" }
    }
  }
}
```

Zed uses a slightly different key (`context_servers`), Codex uses `codex mcp add` or a TOML config - full snippets in the [README](https://github.com/cyberlife-coder/VelesDB/tree/develop/crates/velesdb-memory).

Once configured, the agent discovers the tools automatically. No restarts, no plugins, no API keys.

At the end of a vendor selection meeting, Sofia's agent calls:

```
// remember - store a fact with metadata and a typed link to another memory
remember {
  "fact": "We recommended Pennylane over Sage for Acme Corp invoicing because Sage lacks multi-currency support and Pennylane's API team offered a 6-month implementation guarantee.",
  "metadata": { "project": "acme-corp", "type": "decision", "author": "sofia" },
  "links": [{ "target": 4820193847, "relation": "follows_from" }]
}
→ { "id": 9876543210 }
```

The returned id is stable and derived from the content - storing the same fact twice is idempotent.

```
// remember - the CFO meeting that triggered the re-evaluation
remember {
  "fact": "CFO at Acme Corp: budget cap is 12k EUR per year. Sage renewal is 14.8k. This is the hard constraint that ruled out Sage.",
  "metadata": { "project": "acme-corp", "type": "meeting", "date": "2026-01-15" },
  "links": [{ "target": 9876543210, "relation": "motivated" }]
}
→ { "id": 4820193847 }
```

```
remember {
  "fact": "Vendor selection for SME finance tools - step 1: map hard constraints (budget, compliance, integration). Step 2: shortlist to 3. Step 3: run a 2-week pilot on live data. Step 4: present with a documented decision matrix.",
  "metadata": { "type": "procedure", "domain": "vendor-selection" }
}
→ { "id": 1122334455 }
```

After the client signed the contract:

```
relate {
  "from": 9876543210,
  "to": 4820193847,
  "relation": "decided_in"
}
→ { "edge_id": 7 }
```

The `why` query is what changes everything.

Six months later, Acme Corp asks Sofia why they switched invoicing tools. She asks her agent:

```
why {
  "decision": "why did we switch from Sage to Pennylane",
  "filter": { "project": "acme-corp" },
  "max_hops": 2
}
```

The response:

```
{
  "nodes": [
    {
      "id": 9876543210,
      "hop": 0,
      "content": "We recommended Pennylane over Sage... multi-currency... 6-month implementation guarantee."
    },
    {
      "id": 4820193847,
      "hop": 1,
      "content": "CFO at Acme Corp: budget cap is 12k EUR... Sage renewal is 14.8k. This is the hard constraint that ruled out Sage."
    }
  ],
  "edges": [
    { "from": 9876543210, "to": 4820193847, "relation": "decided_in" }
  ]
}
```

A plain `recall` query would have returned the decision text (hop 0, shares words with the query). It would not have returned the CFO meeting note (hop 1) - that note is about a "budget cap" and a "14.8k" renewal, and apart from the product name it has no words in common with "why did we switch from Sage to Pennylane". The graph reaches it because the relation exists. That is the gap.

The `why` wedge is not a claim - it is measured. The repo ships three reproducible benchmarks with no LLM in the scoring loop (pure retrieval metrics on public datasets):

Multi-hop recall (graph engine) - HotpotQA, 3000 dev questions:

```
vector only:    both bridge facts recalled → baseline
vector + graph: both bridge facts recalled → +7.2 percentage points on bridge questions
```

The win replicates on 2WikiMultiHopQA (+3.1pp on bridged types).

Time-scoped recall (ColumnStore) - TimeQA (real Wikipedia bios):

```
vector only:     gold-sentence recall → baseline
vector + filter: year-range predicate → +9.7 percentage points
```

A pure cosine score cannot distinguish "she won the award in 1987" from "she won the award in 2003". A numeric filter can.
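To make the two mechanisms above concrete, here is a toy Python sketch of seed-then-traverse retrieval with an optional year filter. It is not velesdb-memory's implementation: the `MEMORIES` and `EDGES` data, the word-overlap `overlap_score` standing in for vector similarity, and the `recall` and `why` helpers are all assumptions made for illustration.

```python
from collections import deque

# Toy store: id -> (text, year). Hypothetical data, not the real on-disk format.
MEMORIES = {
    1: ("We recommended Pennylane over Sage for invoicing", 2026),
    2: ("CFO: budget cap is 12k EUR per year, renewal quote is 14.8k", 2026),
    3: ("We reviewed Sage and Pennylane invoicing pricing", 2023),
}
# Typed edges: (from_id, to_id, relation).
EDGES = [(1, 2, "decided_in")]


def overlap_score(query, text):
    """Crude stand-in for cosine similarity: share of query words found in text."""
    q = set(query.lower().split())
    t = set(text.lower().replace(",", " ").replace(":", " ").split())
    return len(q & t) / len(q)


def recall(query, k=1, year_range=None):
    """Rank memories by score; year_range=(lo, hi) acts as a numeric pre-filter."""
    candidates = []
    for mem_id, (text, year) in MEMORIES.items():
        if year_range and not (year_range[0] <= year <= year_range[1]):
            continue
        candidates.append((overlap_score(query, text), mem_id))
    candidates.sort(key=lambda pair: (-pair[0], pair[1]))
    return [mem_id for score, mem_id in candidates[:k] if score > 0]


def why(query, max_hops=2, year_range=None):
    """Seed with recall, then expand breadth-first along edges up to max_hops."""
    neighbours = {}
    for src, dst, _relation in EDGES:
        neighbours.setdefault(src, []).append(dst)
        neighbours.setdefault(dst, []).append(src)  # traverse edges in both directions
    hops = {seed: 0 for seed in recall(query, k=1, year_range=year_range)}
    queue = deque(hops)
    while queue:
        node = queue.popleft()
        if hops[node] == max_hops:
            continue
        for nxt in neighbours.get(node, []):
            if nxt not in hops:
                hops[nxt] = hops[node] + 1
                queue.append(nxt)
    return hops  # memory id -> hop distance from the seed
```

With this toy data, `recall` on the Sage-to-Pennylane question never surfaces the budget note because its overlap score is zero, while `why` reaches it in one hop. Passing `year_range=(2026, 2026)` drops the older 2023 review that plain similarity ranks just as high as the decision.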
The engines compound (tri-engine benchmark). On a task that requires both multi-hop traversal and time-scoped filtering:

```
graph alone:       +7.2pp
columnstore alone: +9.7pp
both together:     +29pp (more than the sum)
```

Run any of these yourself:

```bash
# multi-hop benchmark
cargo run --release -p velesdb-memory --example bench_multihop

# time-scoped benchmark
cargo run --release -p velesdb-memory --example timeqa
```

The default binary has zero network dependencies. The memory store is a directory on your disk (`~/.velesdb-memory/`). The binary is around 9 MB.

With the default hash embedder, recall is keyword-style (deterministic, good for `why` because the graph does the heavy lifting). For real semantic recall, add Ollama - the model runs locally, so memory still never reaches the internet:

```bash
VELESDB_MEMORY_EMBEDDER=ollama \
VELESDB_MEMORY_OLLAMA_MODEL=all-minilm \
/path/to/velesdb-memory
```

This is not "privacy-preserving mode" - it is the only mode. There is no cloud path.

If you do not want to call `remember` and `relate` manually, the `remember_extracted` tool does it in one step. It sends raw text to a local LLM (via Ollama), extracts individual facts, wires the entity graph automatically, and stores everything:

```
remember_extracted {
  "text": "Met Yannick from the Acme procurement team. He confirmed the board approved the Pennylane migration. The CFO's concern about training cost has been resolved by the vendor's onboarding package."
}
→ { "ids": [11122233, 44455566, 77788899] }
```

Three facts stored, entity relationships auto-wired, all reachable by `why`.

To enable it:

```bash
cargo build --release -p velesdb-memory --features extract

VELESDB_MEMORY_EXTRACTOR=ollama \
VELESDB_MEMORY_EXTRACTOR_MODEL=qwen3:8b \
/path/to/velesdb-memory
```

The standard build does not include this - it keeps the default binary tiny and offline.
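For intuition on why a hash-based embedder behaves like keyword matching, here is a generic feature-hashing sketch in Python. It illustrates the general technique only and is an assumption, not the embedder that ships in velesdb-memory; `hash_embed`, `cosine`, and the 64-dimension default are hypothetical names and values.

```python
import hashlib
import math


def hash_embed(text, dim=64):
    """Map each token to one signed bucket, then L2-normalise the vector."""
    vec = [0.0] * dim
    for token in text.lower().split():
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        bucket = int.from_bytes(digest[:4], "big") % dim  # which dimension to touch
        sign = 1.0 if digest[4] % 2 == 0 else -1.0        # signed to soften collisions
        vec[bucket] += sign
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


def cosine(a, b):
    """Dot product; equals cosine similarity for unit-length inputs."""
    return sum(x * y for x, y in zip(a, b))
```

The output is deterministic and needs no model or network. Two texts only get a non-zero score when they share tokens or, occasionally, collide in a bucket, which is why a decision and the budget note behind it can look unrelated to such an embedder and why the graph edge matters.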
If you prefer to embed memory into your own application rather than use the MCP server, the same engine is available as a Python package:

```python
import time

import velesdb
import numpy as np

db = velesdb.Database("./sofia_memory")
memory = db.agent_memory(384, snapshot_dir="./sofia_memory/snapshots")  # 384-dim embeddings

# store a fact
def embed(text):
    # use sentence-transformers, Ollama, or any embedder
    from sentence_transformers import SentenceTransformer
    m = SentenceTransformer("all-MiniLM-L6-v2")
    return m.encode(text, normalize_embeddings=True).tolist()

memory.semantic.store(
    id=1,
    content="Pennylane chosen over Sage: multi-currency support + budget fits 12k EUR cap",
    embedding=embed("Pennylane Sage invoicing decision"),
)

# query
results = memory.semantic.query(embed("why Pennylane"), top_k=3)
for r in results:
    print(f"  {r['score']:.2f}  {r['content']}")

# episodic: the CFO meeting
memory.episodic.record(
    event_id=2,
    description="CFO confirmed: Sage renewal quote is 14.8k, over 12k cap",
    timestamp=int(time.time()) - 30 * 86400,  # 30 days ago
    embedding=embed("CFO budget constraint Sage renewal"),
)

# procedural: a reusable pattern
memory.procedural.learn(
    procedure_id=3,
    name="SME vendor selection",
    steps=["map hard constraints", "shortlist to 3", "run 2-week pilot", "present decision matrix"],
    embedding=embed("vendor selection SME procedure"),
    confidence=0.9,
)

# reinforce if the pattern worked well
memory.procedural.reinforce(procedure_id=3, success=True)

# snapshot to survive restarts
memory.snapshot()
```

```bash
pip install velesdb
python3 -c "import velesdb; print(velesdb.__version__)"
# 3.4.0
```

The same engine ships as an npm package with prebuilt platform binaries, so no Rust toolchain is needed at install time:

```bash
npm install @wiscale/velesdb-memory-node
```

The API is a single async class: no subsystems, no embeddings to manage yourself.

```js
import { MemoryService } from '@wiscale/velesdb-memory-node'

// Open or create a persistent store. Sync factory, all methods are async.
const mem = MemoryService.open('./sofia_memory', 'hash')
// Use 'ollama' as second arg for real semantic recall (requires Ollama running locally)

// Store a fact - returns its id as a decimal string
const decisionId = await mem.remember(
  'We recommended Pennylane over Sage: multi-currency support + 12k EUR budget cap',
  [],
  { project: 'acme-corp', type: 'decision' }
)

// Store the reason and link it
const reasonId = await mem.remember(
  'CFO confirmed: Sage renewal quote is 14.8k EUR, over the 12k annual cap',
  [],
  { project: 'acme-corp', type: 'meeting', date: '2026-01-15' }
)

// Typed link: decision was motivated by the CFO meeting
await mem.relate(decisionId, reasonId, 'decided_in')

// Plain recall - vector similarity
const hits = await mem.recall('why Pennylane', 3)
hits.forEach(h => console.log(`${h.score.toFixed(2)}  ${h.content}`))

// why - vector seed + multi-hop graph traversal
const { nodes, edges } = await mem.why('why did we switch from Sage to Pennylane', 2)
nodes.forEach(n => console.log(`hop ${n.hop}: ${n.content}`))
// hop 0: the decision → hop 1: the CFO meeting (barely any shared words - the graph found it)
```

One feature is exclusive to the Node.js binding: `recallWhere`, which combines vector search with ColumnStore range filters in a single call. There is no Python counterpart.

```js
// Recall only meetings dated on or after 2026-01-01
const recent = await mem.recallWhere(
  'budget constraint',
  [{ field: 'date', op: 'ge', value: '2026-01-01' }],
  5
)
```

velesdb-memory is a single-process embedded library. It is not designed for concurrent access from multiple processes, nor for storing millions of memories on behalf of many users. It fits one agent, one user, one machine - which is exactly the shape the use cases above require.

Extraction quality depends on the local model you point `remember_extracted` at. A smaller model extracts noisier facts than a larger one. The graph and the retrieval engine are solid; the extraction layer is as good as the model you bring.
```bash
git clone https://github.com/cyberlife-coder/VelesDB
cd VelesDB
cargo build --release -p velesdb-memory
./target/release/velesdb-memory --help
```

Documentation and examples are at [velesdb.com](https://velesdb.com).

If this was useful, a star on the [GitHub repo](https://github.com/cyberlife-coder/VelesDB) helps other developers find the project, and we are always looking for partners with local-first or sovereign data requirements - details on [velesdb.com](https://velesdb.com).

Which use case resonates most with you - knowledge work (consulting, research, legal), coding assistance, or something else entirely? Drop a comment below.