Decisions, lessons and project facts live in one SQLite file you own. Fed back to Claude Code, Cursor, Codex and Zed through MCP. Offline, no API keys, no cloud, recall in ~35 ms.
pip install pmb-ai
100% on your machine · No API keys · No cloud, no telemetry
Apache-2.0, open source
Memory that doesn't wait to be asked #
Hooks inject the right memory before the model thinks and journal the agent's work afterward. There is no LLM call on the read path and no tool the agent has to remember to call.
1· Any agent records
2· Surfaced before it answers
Auto-recall on every prompt
Every message is classified in under a millisecond; the matching lessons, decisions and project overview are fetched for the agent before it reasons.
Sub-millisecond async writes
The MCP tool returns instantly. The write lands in SQLite first; the embedding and the LanceDB vector insert run on a background thread, never blocking the turn.
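The write path described above can be sketched in a few lines of Python. This is a minimal illustration, not PMB's actual code: the class name `AsyncMemoryWriter`, the `events` table, the `embed` callable and the dict standing in for the vector index are all assumptions made for the example.

```python
import queue
import sqlite3
import threading


class AsyncMemoryWriter:
    """Hypothetical writer: durable SQLite insert now, embedding later."""

    def __init__(self, db_path, embed, index):
        self._db = sqlite3.connect(db_path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS events "
            "(id INTEGER PRIMARY KEY, text TEXT NOT NULL)"
        )
        self._db.commit()
        self._embed = embed      # assumed callable: text -> vector
        self._index = index      # dict standing in for a vector store
        self._jobs = queue.Queue()
        worker = threading.Thread(target=self._drain, daemon=True)
        worker.start()

    def record(self, text):
        """Persist the text, queue the embedding job, return right away."""
        cur = self._db.execute("INSERT INTO events (text) VALUES (?)", (text,))
        self._db.commit()
        self._jobs.put((cur.lastrowid, text))
        return cur.lastrowid

    def _drain(self):
        """Background loop: embed queued texts and store the vectors."""
        while True:
            event_id, text = self._jobs.get()
            try:
                self._index[event_id] = self._embed(text)
            finally:
                self._jobs.task_done()

    def flush(self):
        """Block until every queued embedding job has finished."""
        self._jobs.join()
```

The caller only ever waits for the SQLite insert; the slower embedding step happens on the worker thread, and `flush()` exists purely so tests or shutdown code can wait for the queue to empty.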
Hybrid recall, ranked
BM25 + dense vectors + entity graph + optional rerank, fused with Reciprocal-Rank-Fusion. One call returns the right thing, ranked.
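For readers unfamiliar with Reciprocal Rank Fusion, here is a small sketch of the general technique. The memory ids and the three input rankings are made up, and the constant `k=60` is the value commonly used in the literature; PMB's own constant and weighting are not stated here, so treat both as assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of ids into one list, best first.

    Each id earns 1 / (k + rank) from every list it appears in;
    ids that rank well in several lists rise to the top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by id for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical per-retriever results for one query.
bm25 = ["m3", "m1", "m7"]
dense = ["m1", "m3", "m9"]
graph = ["m1", "m7"]

fused = reciprocal_rank_fusion([bm25, dense, graph])
print(fused)
```

Because RRF works on ranks rather than raw scores, the retrievers never need to be calibrated against each other, which is one reason it is a popular choice for hybrid search.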
Lessons that earn their place
Every rule is scored by whether the agent actually follows it. Useful ones get starred; ignored ones are flagged dead, so you prune what doesn't help.
Your memory, as a graph you can explore #
Every fact, decision, lesson, file and entity becomes a node, color-coded by type, sized by importance. Hover one to dim the rest, light up its neighbors, and read the full memory chunk.
Every decision, lesson and commit, newest first #
One lane per project, nodes color-coded by event type, connected by soft curves. The same journal that ships in the dashboard, written automatically as you work.
This is the actual dashboard #
A local web app served from your machine. The Map and Timeline above are live recreations; here is the real thing, rendering one project's memory.
The Map · 65,005 connections across 149 clusters, color-coded by kind
What changes when your agent remembers #
Outcomes, not features. This is what persistent memory actually does to your day.
Stop re-explaining your project
Every session starts already knowing your decisions, conventions and the bug you hit last Tuesday. No more pasting the same context into a fresh chat.
Switch tools without losing context
Claude Code, Cursor, Codex and Zed all read the same memory. Your context follows you, not your editor, so changing agents costs nothing.
Memory you can actually trust
PMB scores whether each lesson gets followed and flags the dead ones. It tells you when a memory isn't helping, so your context stays honest, not bloated.
Seven commands, then just talk to your agent #
No account, no keys, nothing leaves your machine. Inspect everything from the terminal, or open the dashboard.
35 ms hybrid recall
One command wires your agent to MCP #
Everything runs over stdio: the server is a child process of your agent. No network, no port, no token.
Claude Code
- Rules appended to your agent's config automatically
- Point several agents at one shared workspace
- Verify the wiring with pmb doctor
Bring your own model, or run it offline #
PMB never calls an LLM on the read path. The optional summarize and graph-extract passes run on whatever you point them at, including a fully local Ollama. Your memory stays yours.
Running in 60 seconds #
Three commands, no account, no config. Then just work the way you already do.
Install
One pip install. Pure Python, runs on macOS, Linux and Windows.
pip install pmb-ai
Connect your agent
Wires PMB into your agent over MCP. Swap in cursor, codex, zed, and more.
pmb connect claude-code
Just talk to it
Work as usual; PMB records and recalls automatically. Open the dashboard any time to explore.
pmb dashboard
Files on your disk, all the way down #
Every event lives in SQLite; vectors live in LanceDB next to it. Copy them anywhere with cp. No server to trust.
Fast, local, and honest about it #
Every number here is measured on PMB's own engine and reproducible from the repo. No cloud, no LLM in the read path, no per-query cost.
Retrieval quality (recall@k)
MRR 0.774 · nDCG@10 0.816
LoCoMo-10 · 997 questions · no LLM grader · cache off
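As a reminder of what these metrics measure, the sketch below gives the standard textbook definitions for a single query with binary relevance. It is not PMB's benchmark harness; the function names and the binary-relevance assumption are ours. MRR is the mean of the per-query reciprocal rank over all questions.

```python
import math


def recall_at_k(ranked, relevant, k):
    """Fraction of the relevant ids that appear in the top k results."""
    if not relevant:
        return 0.0
    hits = len(set(ranked[:k]) & set(relevant))
    return hits / len(relevant)


def reciprocal_rank(ranked, relevant):
    """1 / rank of the first relevant result, or 0.0 if none is found."""
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(ranked, relevant, k):
    """Binary-relevance nDCG: discounted gain divided by the ideal gain."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, doc_id in enumerate(ranked[:k], start=1)
        if doc_id in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg else 0.0
```

All three functions need only the ranked ids and a set of known-relevant ids, which is consistent with grading without an LLM.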
Recall latency vs memory size (p50 / p95)
Warm daemon, cache off, local CPU. Real ~100-memory workspace: p50 24 ms. Cached: ~0.15 ms.
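If you want to check p50 and p95 figures like these against your own workspace, a measurement loop could look roughly like the following. The `recall` callable passed to `time_calls` is a placeholder for whatever function you are timing; it is not a PMB API, and the nearest-rank percentile is one of several reasonable definitions.

```python
import math
import time


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def time_calls(recall, queries):
    """Call recall(query) once per query; return latencies in milliseconds."""
    latencies = []
    for query in queries:
        start = time.perf_counter()
        recall(query)
        latencies.append((time.perf_counter() - start) * 1000.0)
    return latencies
```

Run a few warm-up queries before measuring, since the figures above are quoted for a warm daemon with the cache off.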
It tells you when a memory isn't helping #
Every lesson carries a surface_id. PMB tracks whether the agent actually followed it, confirmed or auto-detected from activity. Rules that get ignored are flagged dead. The ones that earn their place are starred. No vanity metrics.
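One way to picture this bookkeeping is a per-lesson counter keyed by `surface_id`. The sketch below is illustrative only: the `LessonStats` class, the `classify` function and the thresholds (`min_surfaced`, `star_at`, `dead_at`) are invented for the example and are not PMB's actual scoring rules.

```python
from dataclasses import dataclass


@dataclass
class LessonStats:
    """Hypothetical per-lesson counters, keyed by surface_id."""
    surface_id: str
    surfaced: int = 0
    followed: int = 0

    def record(self, was_followed):
        """Log one time the lesson was surfaced and whether it was followed."""
        self.surfaced += 1
        if was_followed:
            self.followed += 1

    @property
    def follow_rate(self):
        return self.followed / self.surfaced if self.surfaced else 0.0


def classify(stats, min_surfaced=5, star_at=0.8, dead_at=0.2):
    """Label a lesson once there is enough evidence (thresholds are assumed)."""
    if stats.surfaced < min_surfaced:
        return "unrated"
    if stats.follow_rate >= star_at:
        return "starred"
    if stats.follow_rate <= dead_at:
        return "dead"
    return "active"
```

Requiring a minimum number of surfacings before labeling keeps a lesson from being flagged dead after a single ignored prompt.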
Built on boring, durable pieces #
No exotic infrastructure. Local files and well-worn libraries, the kind you can still open in five years.
It tends itself #
A year in, recall is still sharp. Memory decays, archives, and dedupes on its own, and never deletes anything behind your back.
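To make the idea concrete, here is a toy maintenance pass that decays scores with age, archives weak or duplicate memories, and deletes nothing. Everything in it is an assumption for illustration: the `Memory` fields, the 30-day half-life, the archive threshold and the text-normalization rule are not taken from PMB.

```python
from dataclasses import dataclass


@dataclass
class Memory:
    text: str
    importance: float
    age_days: float
    archived: bool = False


def decayed_score(memory, half_life_days=30.0):
    """Importance halves every half_life_days (exponential decay)."""
    return memory.importance * 0.5 ** (memory.age_days / half_life_days)


def tend(memories, archive_below=0.1, half_life_days=30.0):
    """Archive duplicates and faded memories; return the active ones.

    Nothing is removed from the input list, only flagged as archived.
    """
    seen = set()
    for memory in memories:
        # Normalize case and whitespace so near-identical texts collide.
        key = " ".join(memory.text.lower().split())
        duplicate = key in seen
        seen.add(key)
        if duplicate or decayed_score(memory, half_life_days) < archive_below:
            memory.archived = True
    return [m for m in memories if not m.archived]
```

Flagging instead of deleting means an archived memory can still be restored or inspected later.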
[Diagram: memory lifecycle, flowing left to right. Labels: Write, Active, Read, Decay, Compact, Archived, You, Daemon, one SQLite file.]
Straight answers #
Does my code or data ever leave my machine? #
No. Everything lives in a local SQLite file with vectors in LanceDB right next to it. There are no network calls on the read path, no account and no telemetry, ever. Unplug the internet and it still works.
How is this different from RAG or a vector database? #
Two ways. Recall is hybrid, BM25 plus dense vectors plus an entity graph, fused and ranked. And it's automatic: the right memory is injected before the model thinks. You don't build a pipeline or hope the agent remembers to call a tool.
Will it slow my agent down? #
No. Recall lands in about 35 ms and writes return in under a millisecond. The embedding and vector insert happen on a background thread, so the turn is never blocked.
Which agents and operating systems are supported? #
Any MCP-aware agent: Claude Code, Cursor, Codex, Zed, Windsurf and more, wired in with one command. PMB is pure Python and tested on macOS, Linux and Windows.
What if a memory is wrong or unhelpful? #
PMB scores whether each lesson actually gets followed and flags the dead ones so you can prune them. It's the rare tool that tells you when its own memory isn't earning its place.
Is it really free? #
Yes. Apache-2.0, open source, free forever. No paid tier, no seats, no telemetry. You own the file and the code.