Your AI agent forgets. Mine doesn't - and it works on a plane, in a hospital, with wifi off.

wpnews.pro

Six months ago you recommended switching your client's invoicing tool. Last week they asked why. You have no idea - the conversation happened in three meetings, a Slack thread, and a spreadsheet comparison no one archived. Your AI assistant is useless here too: it only knows what you paste into the prompt.

This is not a context-window problem. It is a memory architecture problem.

Most "persistent memory" solutions for LLMs work by storing past exchanges as text chunks and retrieving them by cosine similarity. Ask "what did we decide about the invoicing tool?" and a chunk mentioning the decision floats to the top - if your query looks like the answer.

It breaks the moment you ask why. The reason the CFO pushed back on the original tool was buried in a budget meeting note that shares no words with "invoicing decision". Pure vector search is blind to it by construction.
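The blind spot can be illustrated with a toy bag-of-words cosine score. This is a deliberately simplified sketch: real embedding models are dense and capture some paraphrase, and the two note texts below are invented for illustration, not taken from any real store.

```python
import math
import re
from collections import Counter

def bag_of_words(text):
    """Lowercase the text and count its word tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

query = bag_of_words("invoicing decision")
# Hypothetical notes: one restates the query, one holds the actual reason.
decision_note = bag_of_words("Invoicing decision: move the client to the new tool")
budget_note = bag_of_words("CFO pushed back, annual budget cap exceeded by the renewal quote")

decision_score = cosine(query, decision_note)
budget_score = cosine(query, budget_note)
print(f"decision note: {decision_score:.2f}")
print(f"budget note:   {budget_score:.2f}")
```

The budget note scores zero because it shares no token with the query, even though it is the note that answers "why".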

What you actually need is three distinct memory structures - the same ones cognitive science has described since the 1970s:

+-----------------+------------------------------+----------------------------------+
| Type            | What it stores               | Answers                          |
+-----------------+------------------------------+----------------------------------+
| Semantic        | Facts, decisions             | What? Why? What is our position? |
| Episodic        | Events with a timestamp      | When? Who said what?             |
| Procedural      | Learned patterns + steps     | How do we usually handle this?   |
+-----------------+------------------------------+----------------------------------+

velesdb-memory is an MCP server that exposes exactly these three subsystems - as five high-level tools your agent can call without knowing anything about vectors, graphs, or databases.

It is a single binary that speaks the Model Context Protocol over stdio. Client and server run on the same machine. Memory never leaves your machine.

+------------------+        stdio/MCP        +-------------------+
| Claude Code      |  ───────────────────►   | velesdb-memory    |
| Cursor           |                         | (one binary)      |
| Cline / Zed      |  ◄───────────────────   |                   |
| Codex / opencode |                         | vector + graph    |
+------------------+                         | + columnar store  |
                                             +-------------------+
                                                      │
                                               ~/.velesdb-memory/
                                               (stays on your disk)

Five tools, all JSON:

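+----------+--------------------------------------------------------------+
| Tool     | What it does                                                 |
+----------+--------------------------------------------------------------+
| remember | store a fact, optionally tagged and linked to other memories |
| recall   | semantic search, with optional metadata filter               |
| relate   | create a typed edge between two memories                     |
| forget   | delete a memory by id                                        |
| why      | recall + multi-hop graph traversal (the differentiator)      |
+----------+--------------------------------------------------------------+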
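On the wire, an MCP client invokes a tool with a JSON-RPC 2.0 `tools/call` request, written to the server's stdin as one line of JSON. The sketch below only builds such a message. The `query` argument name for `recall` is an assumption (the tool's real schema is in the README), and a real client first performs the MCP `initialize` handshake, which is omitted here.

```python
import json

def tool_call(request_id, tool, arguments):
    """Serialize one MCP `tools/call` request as a single line of JSON."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    # json.dumps emits no raw newlines, which the stdio transport requires.
    return json.dumps(message)

# "query" is a hypothetical argument name used for illustration only.
line = tool_call(1, "recall", {"query": "invoicing tool decision"})
print(line)
```

In practice the MCP client (Claude Code, Cursor, and so on) builds these messages itself. The sketch is only meant to show how little sits between the agent and the store.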

There is a sixth tool, remember_extracted, that passes raw text through a local LLM and builds the graph automatically - but you do not need it to understand the core idea.

Sofia advises companies on digital transformation. She runs three to five simultaneous engagements, each lasting six months. She needs her AI assistant to remember the decisions she made, the meetings behind them, and the procedures she follows.

Let us build her memory layer.

cargo build --release -p velesdb-memory

The default build is dependency-free. For real semantic recall, build with Ollama support:

cargo build --release -p velesdb-memory --features ollama
ollama pull all-minilm

Then configure your client. For Claude Code:

claude mcp add velesdb-memory \
  --env VELESDB_MEMORY_PATH="$HOME/.velesdb-memory" \
  -- /path/to/velesdb-memory

For Cursor (~/.cursor/mcp.json), Cline (cline_mcp_settings.json), or any other MCP client:

{
  "mcpServers": {
    "velesdb-memory": {
      "command": "/path/to/velesdb-memory",
      "env": { "VELESDB_MEMORY_PATH": "/home/you/.velesdb-memory" }
    }
  }
}
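If the client already has other MCP servers configured, the entry has to be merged into the existing file rather than pasted over it. A minimal sketch of that merge, assuming the `mcpServers` layout shown above (`add_server` and `update_config_file` are illustrative helper names, not part of any tool):

```python
import json
from pathlib import Path

def add_server(config, name, command, env):
    """Return a copy of an MCP client config with one server entry added or replaced."""
    servers = dict(config.get("mcpServers", {}))
    servers[name] = {"command": command, "env": env}
    return {**config, "mcpServers": servers}

def update_config_file(path, name, command, env):
    """Read the config at `path` (if it exists), add the entry, and write it back."""
    path = Path(path)
    config = json.loads(path.read_text()) if path.exists() else {}
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(add_server(config, name, command, env), indent=2) + "\n")

# In-memory demonstration; "other-tool" is a made-up pre-existing entry.
existing = {"mcpServers": {"other-tool": {"command": "/usr/bin/other"}}}
merged = add_server(
    existing,
    "velesdb-memory",
    "/path/to/velesdb-memory",
    {"VELESDB_MEMORY_PATH": "/home/you/.velesdb-memory"},
)
print(json.dumps(merged, indent=2))
```

Calling `update_config_file` with the path to your client's config file would apply the same merge on disk.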

Zed uses a slightly different key (context_servers), Codex uses codex mcp add or a TOML config - full snippets in the README.

Once configured, the agent discovers the tools automatically. No restarts, no plugins, no API keys.

At the end of a vendor selection meeting, Sofia's agent calls:

// remember - store a fact with metadata and a typed link to another memory
remember {
  "fact": "We recommended Pennylane over Sage for Acme Corp invoicing because Sage lacks multi-currency support and Pennylane's API team offered a 6-month implementation guarantee.",
  "metadata": { "project": "acme-corp", "type": "decision", "author": "sofia" },
  "links": [ { "target": 4820193847, "relation": "follows_from" } ]
}
→ { "id": 9876543210 }

The returned id is stable and derived from the content - storing the same fact twice is idempotent.
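Content-derived ids are what make the call idempotent. The sketch below shows the general idea with a SHA-256 prefix. The hashing scheme is an assumption chosen for illustration; the article does not say how velesdb-memory actually derives its ids.

```python
import hashlib

def content_id(fact):
    """Derive a stable 64-bit id from the fact text (hypothetical scheme)."""
    digest = hashlib.sha256(fact.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")

store = {}

def remember(fact):
    """Store a fact under its content id; repeating the call changes nothing."""
    fact_id = content_id(fact)
    store[fact_id] = fact
    return fact_id

first = remember("Budget cap is 12k EUR per year.")
second = remember("Budget cap is 12k EUR per year.")
print(first == second, len(store))  # prints: True 1
```

The same text always maps to the same key, so a retry or a duplicated agent call cannot create a second copy of the memory.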

// remember - the CFO meeting that triggered the re-evaluation
remember {
  "fact": "CFO at Acme Corp: budget cap is 12k EUR per year. Sage renewal is 14.8k. This is the hard constraint that ruled out Sage.",
  "metadata": { "project": "acme-corp", "type": "meeting", "date": "2026-01-15" },
  "links": [ { "target": 9876543210, "relation": "motivated" } ]
}
→ { "id": 4820193847 }

// remember - a reusable procedure, stored without links
remember {
  "fact": "Vendor selection for SME finance tools - step 1: map hard constraints (budget, compliance, integration). Step 2: shortlist to 3. Step 3: run a 2-week pilot on live data. Step 4: present with a documented decision matrix.",
  "metadata": { "type": "procedure", "domain": "vendor-selection" }
}
→ { "id": 1122334455 }

After the client signed the contract:

relate {
  "from": 9876543210,
  "to": 4820193847,
  "relation": "decided_in"
}
→ { "edge_id": 7 }

The why query: what changes everything

Six months later, Acme Corp asks Sofia why they switched invoicing tools. She asks her agent:

why {
  "decision": "why did we switch from Sage to Pennylane",
  "filter": { "project": "acme-corp" },
  "max_hops": 2
}

The response:

{
  "nodes": [
    { "id": 9876543210, "hop": 0, "content": "We recommended Pennylane over Sage... multi-currency... 6-month implementation guarantee." },
    { "id": 4820193847, "hop": 1, "content": "CFO at Acme Corp: budget cap is 12k EUR... Sage renewal is 14.8k. This is the hard constraint that ruled out Sage." }
  ],
  "edges": [
    { "from": 9876543210, "to": 4820193847, "relation": "decided_in" }
  ]
}

A plain recall query would have returned the decision text (hop 0, which shares words with the query). It would most likely not have returned the CFO meeting note (hop 1): that note is about "budget cap" and "14.8k", and the only word it shares with "why did we switch from Sage to Pennylane" is "Sage".

The graph reaches it because the relation exists. That is the gap.
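The traversal half of why can be sketched as a breadth-first walk from the seed memories. This is a toy model under stated assumptions: the seed is taken as given (in the real tool it comes from the recall step), edges are followed in both directions, and the memory texts and ids are shortened placeholders.

```python
from collections import deque

memories = {
    1: "We recommended Pennylane over Sage for invoicing.",
    2: "CFO: budget cap is 12k EUR per year; the renewal quote is 14.8k.",
    3: "Unrelated note about office seating.",
}
edges = [(1, 2, "decided_in")]  # (from, to, relation)

def why(seed_ids, max_hops):
    """Expand breadth-first from the seeds, recording the hop distance of each memory."""
    neighbours = {}
    for src, dst, _relation in edges:
        # Assumption: traversal ignores edge direction.
        neighbours.setdefault(src, []).append(dst)
        neighbours.setdefault(dst, []).append(src)
    hops = {seed: 0 for seed in seed_ids}
    queue = deque(seed_ids)
    while queue:
        node = queue.popleft()
        if hops[node] == max_hops:
            continue  # do not expand beyond the hop budget
        for nxt in neighbours.get(node, []):
            if nxt not in hops:
                hops[nxt] = hops[node] + 1
                queue.append(nxt)
    ordered = sorted(hops.items(), key=lambda item: item[1])
    return [{"id": i, "hop": h, "content": memories[i]} for i, h in ordered]

result = why(seed_ids=[1], max_hops=2)
for node in result:
    print(node["hop"], node["content"])
```

Memory 2 is returned at hop 1 purely because an edge connects it to the seed. Memory 3 has no edge and never appears, whatever its text says.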

The why wedge is not a claim - it is measured. The repo ships three reproducible benchmarks with no LLM in the scoring loop (pure retrieval metrics on public datasets):

Multi-hop recall (graph engine) - HotpotQA, 3000 dev questions:

vector only:   both bridge facts recalled  →  baseline
vector + graph: both bridge facts recalled →  +7.2 percentage points on bridge questions

The win replicates on 2WikiMultiHopQA (+3.1pp on bridged types).

Time-scoped recall (ColumnStore) - TimeQA (real Wikipedia bios):

vector only:   gold-sentence recall  →  baseline
vector + filter: year-range predicate →  +9.7 percentage points

A pure cosine score cannot distinguish "she won the award in 1987" from "she won the award in 2003". A numeric filter can.
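A minimal sketch of filter-then-rank, assuming each record carries a numeric year alongside its text. The similarity scores are invented to mimic two near-identical sentences; they are not measured values.

```python
# Illustrative records; the "score" field stands in for a cosine similarity.
records = [
    {"text": "She won the award in 1987.", "year": 1987, "score": 0.91},
    {"text": "She won the award in 2003.", "year": 2003, "score": 0.92},
    {"text": "She moved to Paris in 2001.", "year": 2001, "score": 0.40},
]

def recall_in_range(candidates, start_year, end_year, top_k):
    """Keep candidates whose year lies in [start_year, end_year], then rank by score."""
    in_range = [r for r in candidates if start_year <= r["year"] <= end_year]
    return sorted(in_range, key=lambda r: r["score"], reverse=True)[:top_k]

best_unfiltered = max(records, key=lambda r: r["score"])
best_1980s = recall_in_range(records, 1980, 1989, top_k=1)[0]
print(best_unfiltered["text"])
print(best_1980s["text"])
```

Without the predicate, the 2003 sentence wins on a negligible score difference. With it, the wrong decade is removed before ranking even starts.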

The engines compound (tri-engine benchmark):

On a task that requires both multi-hop traversal and time-scoped filtering:

graph alone:       +7.2pp
columnstore alone: +9.7pp
both together:     +29pp  (more than the 16.9pp sum of the two)

Run any of these yourself:

cargo run --release -p velesdb-memory --example bench_multihop

cargo run --release -p velesdb-memory --example timeqa

The default binary has zero network dependencies. The memory store is a directory on your disk (~/.velesdb-memory/). The binary is around 9 MB.

With the default hash embedder, recall is keyword-style (deterministic, good for why because the graph does the heavy lifting). For real semantic recall, add Ollama - the model runs locally, so memory still never reaches the internet:

VELESDB_MEMORY_EMBEDDER=ollama \
VELESDB_MEMORY_OLLAMA_MODEL=all-minilm \
  /path/to/velesdb-memory

This is not "privacy-preserving mode" - it is the only mode. There is no cloud path.
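To see why a hash embedder gives keyword-style recall, here is a sketch of the general technique (feature hashing). It is not the project's implementation: the bucket function, tokenizer, and 384-dimension default are assumptions made for illustration.

```python
import hashlib
import math
import re

def hash_embed(text, dim=384):
    """Map each word to one of `dim` buckets via a hash, then L2-normalize the counts."""
    vector = [0.0] * dim
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        digest = hashlib.sha256(word.encode("utf-8")).digest()
        bucket = int.from_bytes(digest[:4], "big") % dim
        vector[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / norm for v in vector] if norm else vector

def dot(a, b):
    """Dot product; equals cosine similarity for normalized vectors."""
    return sum(x * y for x, y in zip(a, b))

# Same words in a different order produce the same vector.
same_words = dot(hash_embed("Sage renewal quote"), hash_embed("quote renewal Sage"))
# Synonyms land in unrelated buckets unless two words happen to collide.
synonyms = dot(hash_embed("budget cap"), hash_embed("spending limit"))
print(round(same_words, 3))
print(round(synonyms, 3))
```

No model is loaded and no network call is made, which is what keeps the default build deterministic and offline. The cost is that "budget cap" and "spending limit" look unrelated, which is the gap a real embedding model closes.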

If you do not want to call remember and relate manually, the remember_extracted tool does it in one step. It sends raw text to a local LLM (via Ollama), extracts individual facts, wires the entity graph automatically, and stores everything:

remember_extracted {
  "text": "Met Yannick from the Acme procurement team. He confirmed the board approved the Pennylane migration. The CFO's concern about training cost has been resolved by the vendor's onboarding package."
}
→ { "ids": [11122233, 44455566, 77788899] }

Three facts stored, entity relationships auto-wired, all reachable by why. To enable it:

cargo build --release -p velesdb-memory --features extract
VELESDB_MEMORY_EXTRACTOR=ollama \
VELESDB_MEMORY_EXTRACTOR_MODEL=qwen3:8b \
  /path/to/velesdb-memory

The standard build does not include this - it keeps the default binary tiny and offline.

If you prefer to embed memory into your own application rather than use the MCP server, the same engine is available as a Python package:

import time
from functools import lru_cache

import velesdb
from sentence_transformers import SentenceTransformer

db = velesdb.Database("./sofia_memory")
memory = db.agent_memory(384, snapshot_dir="./sofia_memory/snapshots")  # 384-dim embeddings

@lru_cache(maxsize=1)
def _model():
    # Load the embedding model once instead of on every embed() call
    return SentenceTransformer("all-MiniLM-L6-v2")

def embed(text):
    # all-MiniLM-L6-v2 produces 384-dim vectors, matching agent_memory(384, ...)
    return _model().encode(text, normalize_embeddings=True).tolist()

# Semantic memory: a fact or decision
memory.semantic.store(
    id=1,
    content="Pennylane chosen over Sage: multi-currency support + budget fits 12k EUR cap",
    embedding=embed("Pennylane Sage invoicing decision")
)

results = memory.semantic.query(embed("why Pennylane"), top_k=3)
for r in results:
    print(f"[{r['score']:.2f}] {r['content']}")

# Episodic memory: an event with a timestamp
memory.episodic.record(
    event_id=2,
    description="CFO confirmed: Sage renewal quote is 14.8k, over 12k cap",
    timestamp=int(time.time()) - 30 * 86400,  # 30 days ago
    embedding=embed("CFO budget constraint Sage renewal")
)

# Procedural memory: a learned pattern with its steps
memory.procedural.learn(
    procedure_id=3,
    name="SME vendor selection",
    steps=["map hard constraints", "shortlist to 3", "run 2-week pilot", "present decision matrix"],
    embedding=embed("vendor selection SME procedure"),
    confidence=0.9
)

# Record that the procedure worked, then persist a snapshot
memory.procedural.reinforce(procedure_id=3, success=True)

memory.snapshot()

pip install velesdb
python3 -c "import velesdb; print(velesdb.__version__)"

The same engine ships as an npm package with prebuilt platform binaries — no Rust toolchain needed at install time:

npm install @wiscale/velesdb-memory-node

The API is a single async class — no subsystems, no embeddings to manage yourself:

import { MemoryService } from '@wiscale/velesdb-memory-node'

// Open (or create) a persistent store. Sync factory, all methods are async.
const mem = MemoryService.open('./sofia_memory', 'hash')
// Use 'ollama' as second arg for real semantic recall (requires Ollama running locally)

// Store a fact — returns its id as a decimal string
const decisionId = await mem.remember(
  'We recommended Pennylane over Sage: multi-currency support + 12k EUR budget cap',
  [],
  { project: 'acme-corp', type: 'decision' }
)

// Store the reason and link it
const reasonId = await mem.remember(
  'CFO confirmed: Sage renewal quote is 14.8k EUR, over the 12k annual cap',
  [],
  { project: 'acme-corp', type: 'meeting', date: '2026-01-15' }
)

// Typed link: decision was motivated by the CFO meeting
await mem.relate(decisionId, reasonId, 'decided_in')

// Plain recall — vector similarity
const hits = await mem.recall('why Pennylane', 3)
hits.forEach(h => console.log(`[${h.score.toFixed(2)}] ${h.content}`))

// why() — vector seed + multi-hop graph traversal
const { nodes, edges } = await mem.why('why did we switch from Sage to Pennylane', 2)
nodes.forEach(n => console.log(`hop ${n.hop}: ${n.content}`))
// hop 0: the decision  →  hop 1: the CFO meeting (almost no shared words - the graph found it)

One feature is exclusive to the Node.js binding: recallWhere, which combines vector search with ColumnStore range filters in a single call - no Python counterpart:

// Recall only meetings dated on or after 2026-01-01 (a fixed cutoff, not a rolling window)
const recent = await mem.recallWhere(
  'budget constraint',
  [{ field: 'date', op: 'ge', value: '2026-01-01' }],
  5
)
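The `value` in the filter is a literal date string, so a rolling "last 90 days" window has to be computed before the call. A small sketch of that computation, assuming the date field holds ISO 8601 strings (YYYY-MM-DD), which sort chronologically when compared as text. `cutoff_iso` is an illustrative helper, not part of the package.

```python
from datetime import date, timedelta

def cutoff_iso(days, today):
    """ISO date string `days` before `today`, suitable as the value of a 'ge' filter."""
    return (today - timedelta(days=days)).isoformat()

# With a reference date of 2026-04-01, the 90-day cutoff is 2026-01-01.
cutoff = cutoff_iso(90, date(2026, 4, 1))
print(cutoff)  # prints: 2026-01-01

# ISO dates compare correctly as plain strings.
print("2026-01-15" >= cutoff)  # prints: True
```

Passing `date.today()` as the reference date yields a window that moves with the calendar instead of a hard-coded boundary.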

velesdb-memory is a single-process embedded library. It is not designed for concurrent access from multiple processes, nor for storing millions of memories on behalf of many users. It fits one agent, one user, one machine - which is exactly the shape the use cases above require.

Extraction quality depends on the local model you point remember_extracted at. A smaller model extracts noisier facts than a larger one. The graph and the retrieval engine are solid; the extraction layer is as good as the model you bring.

git clone https://github.com/cyberlife-coder/VelesDB
cd VelesDB
cargo build --release -p velesdb-memory
./target/release/velesdb-memory --help

Documentation and examples are at velesdb.com. If this was useful, a star on the GitHub repo helps other developers find the project, and we are always looking for partners with local-first or sovereign data requirements - details on velesdb.com.

Which use case resonates most with you - knowledge work (consulting, research, legal), coding assistance, or something else entirely? Drop a comment below.

source & further reading

dev.to - original article