Two Pools, One Record: The Architecture of a Memory Engine for AI Agents


Personal memory and project memory, joined by a pointer. Async writes, three read depths, time as a first-class citizen, and memory that grows skills. The blueprint, with the honest line between designed and built.

Part 2 of 2 on building memory for AI agents. Part 1 was the “why”: the seven problems. This part is the architecture that answers them. ~12 min read.

Part 1 ended with a claim: memory is seven problems wearing one name, and most of them are data-model, time, and governance problems rather than retrieval problems.

This part is the design that answers them, one by one. It is the memory layer of a desktop AI workspace I am building: one chat surface that handles general work, coding, connected apps, and questions over your own data (the analytics engine from my earlier series is the structured-data half of that same system). The memory engine is designed standalone: a local-first library with a CLI and an MCP server, usable from any agent that speaks the protocol, with the desktop app as just another consumer.

Before the architecture, the honesty box. I did this in my earlier series and readers told me it was the most useful part.

Where this stands, up front:

The analytics engine from my earlier series: ships today; the desktop shell now being built wires it in as its first engine.

The memory engine in this article: design complete, build under way. The design is grounded in a research pass over the published field (the memory products, the agent frameworks, and the academic line of work), and every load-bearing choice below cites the evidence that earned it.

Nothing in this article is speculative hand-waving, but it is a blueprint being executed, not a retrospective. I describe the design in the present tense throughout, the normal register for an architecture; the status ledger at the end is the record of what actually runs today.

TL;DR

Two pools, never merged: personal memory (about you) and project memory (about a codebase), joined by a pointer. Coding reads both; neither leaks into the other.

One typed record with five kinds, two timelines on every row, and provenance stamped from day one.

All model work is asynchronous, on the write path. The read path is pure search in three depths: an always-on profile (about 50 ms), on-demand recall, and a deep dive into the raw archive.

Forgetting is a reversible score, configurable per kind and volatility, with new memories quarantined until reinforced.

Memory enforces access but never decides it: it is handed a pass, honours it on every read, and drops memories whose sources you lost.

Project memory rides the repo itself (an op-log on a git ref), so a team gets shared memory with no server.

The payoff: memory does not just recall facts, it grows skills, and keeps only a reference to them.

Where memory sits in the desktop

The workspace composes a small number of engines behind stable contracts. The memory engine is one of them, and the diagram below is the map of how the pieces connect. Everything else in this article zooms into the memory box.

How to read it:

The desktop is a thin surface: it holds no source data and no enterprise credentials. The orchestrator picks tools per turn and calls every engine over the same protocol (MCP).

The memory engine is not a feature of the app. It is a standalone product the app embeds. The same gateway serves the desktop, a terminal CLI, and external coding agents. One core, many adapters: a lesson learned from watching a memory vendor sunset its separate server in favor of a single unified one.

The red box matters most: an identity layer signs an access pass (who is asking, what they may currently see), and every engine honours it. Memory consumes this pass; it never computes policy. More on that below, because it is the most opinionated boundary in the design.

Model access is a configurable provider layer: your own key, your own endpoint, or a bundled local model. Memory’s neutrality is structural, not a promise.

Two pools, one pointer

The spine of the whole design. Memory is not one undifferentiated blob; it is two pools with different owners, different privacy, and different lifecycles, and they never merge.

Personal memory is about you: your style, your preferences, your drafts, the way you like reports formatted. It is built from your chats and your connected-app work, it lives in a store you own, and it follows you.

Project memory is about a codebase: its decisions, conventions, and gotchas (“uses Snowflake,” “auth is fragile,” “tests live in /spec”). It is built when a coding agent works in the repo, and it belongs with the repo and the team who share it.

How to read it:

The two pools are joined by a pointer: a graph edge in your personal pool that says you work on repo X. A coding turn resolves both pools through the pointer and merges results at read time, with your own facts ranked above generic project facts.

Why not one store? Different owners (you versus the repo), different privacy (private versus team-shared), different lifecycles. When you leave a project, the project memory stays for the team and your personal memory leaves with you. A merged store can do neither cleanly. This was problem seven from Part 1, answered structurally.

A detail that took real thought: the project pool is keyed by the repo’s own identity, chosen deliberately so that a teammate cloning the same repo shares the pool while a fork starts clean instead of silently inheriting (and leaking back into) the upstream team’s memory.
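The pointer-and-merge idea can be sketched in a few lines of Python. This is a minimal illustration, not the engine's code: the `Pool` class, the `works_on` predicate spelling, and the strict "personal first" sort are all assumptions made for the example, and the scores stand in for whatever the search layer would return.

```python
from dataclasses import dataclass, field


@dataclass
class Pool:
    owner: str
    items: list = field(default_factory=list)  # (text, score) pairs
    edges: list = field(default_factory=list)  # (predicate, target) pairs


def resolve_project(personal, projects):
    """Follow the pointer: a personal-pool edge naming the repo the user works on."""
    for predicate, target in personal.edges:
        if predicate == "works_on" and target in projects:
            return projects[target]
    return None


def coding_recall(personal, projects, limit=5):
    """Merge at read time. The pools themselves are never combined or mutated."""
    project = resolve_project(personal, projects)
    hits = [("personal", text, score) for text, score in personal.items]
    if project is not None:
        hits += [("project", text, score) for text, score in project.items]
    # Assumed ranking rule: personal facts first, then descending score.
    hits.sort(key=lambda hit: (hit[0] != "personal", -hit[2]))
    return hits[:limit]
```

Because the merge happens only inside `coding_recall`, removing the edge is enough to detach a user from a project: the project pool is untouched and the personal pool leaves with its owner.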

One typed record

Everything in both pools is one record shape. A kind selector picks the class; everything downstream (decay, sharing, audit, access) is written once against the one shape.

In sketch form (the conceptual shape, not the schema):

a memory item =
  a kind         (one of five: fact, note, edge, procedure reference, persona)
  the content    plus a model-written one-line description, which is the retrieval key
  two timelines  (true-in-the-world from/until · learned/retired by the system)
  provenance     (who wrote it · which source it came from)
  signal         (how much it mattered at write time · whether use keeps reinforcing it)

The five kinds, and why five rather than one:

exact fact (What goes in it: stable exact attributes: timezone, “uses Vim”; How it is retrieved: exact key lookup, hash-deduplicated)

semantic note (What goes in it: long-tail prose preferences and notes; How it is retrieved: vector search, hybrid with keyword)

relationship edge (What goes in it: anything relational or true-over-time: works on, switched from, decided; How it is retrieved: graph traversal, plus the cross-pool pointer)

procedure reference (What goes in it: a pointer plus stats for a learned procedure (feeds skills); How it is retrieved: by tags and usage stats)

persona block (What goes in it: the always-on identity and tone block; How it is retrieved: never searched; injected)
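As a rough illustration, the conceptual shape and the five kinds above could be written as a Python dataclass. Every field name here is an assumption chosen for readability; the article is explicit that the sketch is conceptual and not the schema.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum
from typing import Optional


class Kind(Enum):
    FACT = "fact"
    NOTE = "note"
    EDGE = "edge"
    PROCEDURE_REF = "procedure_ref"
    PERSONA = "persona"


# Retrieval route per kind, taken from the list above.
RETRIEVAL = {
    Kind.FACT: "exact key lookup, hash-deduplicated",
    Kind.NOTE: "vector search, hybrid with keyword",
    Kind.EDGE: "graph traversal, plus the cross-pool pointer",
    Kind.PROCEDURE_REF: "by tags and usage stats",
    Kind.PERSONA: "never searched; injected",
}


@dataclass
class MemoryItem:
    kind: Kind
    content: str
    description: str                        # model-written one-liner, the retrieval key
    valid_from: Optional[datetime] = None   # world time: true from
    valid_until: Optional[datetime] = None  # world time: true until
    learned_at: Optional[datetime] = None   # system time: learned
    retired_at: Optional[datetime] = None   # system time: retired
    author: Optional[str] = None            # provenance: who wrote it
    source: Optional[str] = None            # provenance: None means unknown
    importance: float = 0.5                 # signal at write time (assumed 0..1 scale)
    reinforced: bool = False                # whether use keeps reinforcing it
```

One shape with a kind selector means decay, audit, and access code can take a `MemoryItem` and never branch on storage class.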

Three design notes that carry more weight than they look like they do:

The embedding key is the enriched one-line description, not the raw content. This comes straight from the agentic-memory research line, where embedding enriched text measurably improved recall.

The relation vocabulary is deliberately tiny: a handful of predicates (prefers, works on, decided, and a few peers), and each one exists because a feature reads it. I am not building an ontology. The field’s evidence is blunt here: one major memory product removed its heavy graph store entirely, and the schema-light systems kept winning. Heavy ontology belongs in my analytics engine’s schema graph, which is a different graph for a different job.

Provenance is not optional metadata. Who wrote a memory makes multi-author merge possible; which source it came from is what access revocation queries. Sharing and security are built on those two facts, and retrofitting them later would mean rewriting the store.

The write path: the model works the night shift

All model intelligence runs asynchronously, triggered at session end, on idle, near context-window pressure, or on an explicit “remember this.” The only synchronous writes are a cheap raw append (the safety net) and a tiny profile delta, so personalization stays fresh without the user ever waiting on a model.

The pipeline: extract typed salient facts, enrich (description, keywords, importance, world-time), embed the enriched text, route to one of the five kinds, stamp provenance, check for skill candidates (a repeated successful procedure), then reconcile.

Reconcile is where the design earns its keep, because it is asymmetric by realm:

Local (personal or private project): retrieve the nearest neighbors and make one decision per candidate: add, update, invalidate, or no-op. A conservative writer, because the benchmark evidence says over-retention hurts.

Shared (a team repo): append-or-invalidate only, never an in-place edit. Every write is an operation: add this fact, or invalidate that one by id. This single property is why multiple authors merge without conflicts, and it pairs with the storage model below.

A deterministic security scan (injection patterns, secrets, invisible Unicode) gates every write, and every operation lands in a journal. The journal is the audit trail and the cache invalidation signal at once.
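A deterministic write gate of this kind might look like the sketch below. The specific patterns are illustrative assumptions, not the engine's rule set: one prompt-injection phrase, two well-known secret shapes (an AWS access key id and a PEM private-key header), and Unicode format characters (category `Cf`, which covers zero-width spaces and joiners). Journaling rejected writes is also an assumption.

```python
import re
import unicodedata

# Illustrative patterns only; a real rule set would be much broader.
INJECTION = re.compile(r"ignore (all )?(previous|prior) instructions", re.IGNORECASE)
SECRET = re.compile(r"AKIA[0-9A-Z]{16}|-----BEGIN [A-Z ]*PRIVATE KEY-----")


def scan(text):
    """Return the list of finding labels for a candidate memory; empty means clean."""
    findings = []
    if INJECTION.search(text):
        findings.append("injection")
    if SECRET.search(text):
        findings.append("secret")
    # Coarse check: any Unicode format character counts as invisible.
    if any(unicodedata.category(ch) == "Cf" for ch in text):
        findings.append("invisible_unicode")
    return findings


def gated_write(text, store, journal):
    """Scan first, journal every outcome, and store only clean text."""
    findings = scan(text)
    journal.append({"op": "reject" if findings else "add", "findings": findings})
    if findings:
        return False
    store.append(text)
    return True
```

No model is involved, so the same input always produces the same verdict, which is what makes the journal usable as an audit trail.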

The read path: three depths, zero model calls

Reads are pure search. Three escalating depths:

HOT (When: every turn, automatic; Cost: zero model calls, ~50 ms; What it returns: a cached ~500-token profile (static + dynamic))

WARM (When: on demand, the recall tool; Cost: zero model calls, fast search; What it returns: ranked facts and edges relevant to the query)

DEEP (When: explicit, rare; Cost: heavier: graph walk + archive read; What it returns: a memory neighborhood, the original transcript, the as-of history)

HOT answers the “agent may not search” failure from Part 1 by removing the choice: the profile is injected every turn by a hook, deterministically. It sits at the stable prompt prefix, so it is prompt-cache friendly, and it is re-injected from the durable store after every compaction. Inside a project, the budget splits roughly half personal, half project summary.

WARM is hybrid retrieval: vector candidates, boosted by keyword match and graph proximity, reranked with decay applied, merged across the personal pool and the active project pool through the pointer, access-checked before ranking.

DEEP is the double-click: walk the graph around a memory, open the original transcript it came from, see what was believed when.
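The WARM ranking order can be made concrete with a small sketch. The weights, field names, and additive boost are assumptions for illustration; the only properties taken from the text are that the access check runs before ranking, that keyword and graph signals boost the vector score, and that decay is applied at rerank time.

```python
def warm_rank(candidates, allowed_sources, kw_weight=0.2, graph_weight=0.1, limit=3):
    """Rank recall candidates without any model call.

    Each candidate is a dict with: text, source, vector_sim, keyword_hits,
    graph_proximity, and decay (a multiplier in 0..1).
    """
    # Access check first, so hidden records never influence the ranking.
    visible = [c for c in candidates if c["source"] in allowed_sources]

    def score(c):
        boosted = (
            c["vector_sim"]
            + kw_weight * c["keyword_hits"]
            + graph_weight * c["graph_proximity"]
        )
        return boosted * c["decay"]

    return sorted(visible, key=score, reverse=True)[:limit]
```

In a coding turn, `candidates` would be the union of personal-pool and project-pool hits resolved through the pointer.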

One honesty-driven mechanism sits in front of all three: a corpus-size router. The Salesforce ConvoMem results showed naive context stuffing beats structured memory below roughly 150 conversations. So on a small corpus this engine deliberately does the dumb thing (recent turns, cheap scan) and engages the full machinery only as the corpus grows past the point where brute force stops winning. Building sophisticated machinery and then not using it when simple wins is, I have come to believe, what production-grade actually means.
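The router itself is almost trivially small, which is part of the point. In this sketch the threshold of 150 comes from the ConvoMem figure quoted above; the strategy names and the function name are hypothetical.

```python
SMALL_CORPUS_LIMIT = 150  # roughly where naive context stuffing stops winning


def choose_strategy(conversation_count, limit=SMALL_CORPUS_LIMIT):
    """Pick the read strategy from corpus size alone, with no model call."""
    if conversation_count < limit:
        return "recent_turns_scan"  # the deliberately dumb path
    return "full_retrieval"         # hybrid search, graph, decay
```

Keeping the limit as a parameter lets an evaluation harness move the crossover point once there is local evidence for a different value.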

Under the reads sits a three-store flow, and it is the same pattern git uses:

How to read it:

The op-log is the truth and the sync unit; the database is a disposable projection rebuilt by replaying it; the raw archive is local-only fuel for deep dives and for the curator. Reads never touch the log directly.

Git users will recognize the shape: the commit log is truth, the working tree is the fast view built from it. That resemblance is not cosmetic, and the sharing section below cashes it in.
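The "disposable projection" claim is easy to demonstrate: if the log is the truth, the read view is just a fold over it. The operation format below (`op`, `id`, `target`, `text`) is an assumed shape for the example, not the engine's wire format.

```python
def replay(oplog):
    """Rebuild the read projection from scratch by folding over the op-log."""
    projection = {}
    for op in oplog:
        if op["op"] == "add":
            projection[op["id"]] = {"text": op["text"], "active": True}
        elif op["op"] == "invalidate" and op["target"] in projection:
            # Invalidation flips a flag; the record and its history remain.
            projection[op["target"]]["active"] = False
    return projection


oplog = [
    {"op": "add", "id": "m1", "text": "uses Postgres"},
    {"op": "add", "id": "m2", "text": "tests live in /spec"},
    {"op": "invalidate", "target": "m1"},
]
view = replay(oplog)
```

Deleting `view` loses nothing, since calling `replay(oplog)` again yields an identical projection.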

Time and forgetting: the two unglamorous pillars

Time. Every record carries both timelines from Part 1: world time (true from, true until) and system time (learned at, retired at). On contradiction, the old fact is invalidated, never deleted: it keeps its history, the new fact takes over, and the store can answer both “what was true in March” and “what did the agent believe in March.” Hard deletion exists for exactly one purpose: an explicit “forget this” (and its regulatory cousins).
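A small sketch shows why two timelines answer two different questions. The `Row` shape and helper names are assumptions, and the model is simplified to one row per fact. In the example, a team moved from Postgres to Snowflake on 1 February, but the agent only learned of it on 10 April.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Row:
    text: str
    valid_from: date             # world time: true from
    valid_until: Optional[date]  # world time: true until (None = still true)
    learned_at: date             # system time: learned
    retired_at: Optional[date]   # system time: retired (None = still held)


def _covers(start, end, when):
    """Half-open interval check: start <= when < end, with an open end allowed."""
    return start <= when and (end is None or when < end)


def true_at(rows, when):
    """What was true in the world on this date, as the store knows it now."""
    return [r.text for r in rows if _covers(r.valid_from, r.valid_until, when)]


def believed_at(rows, when):
    """What the system held as current on this date."""
    return [r.text for r in rows if _covers(r.learned_at, r.retired_at, when)]


rows = [
    Row("uses Postgres", date(2023, 1, 1), date(2024, 2, 1), date(2023, 1, 5), date(2024, 4, 10)),
    Row("uses Snowflake", date(2024, 2, 1), None, date(2024, 4, 10), None),
]
march = date(2024, 3, 15)
```

With these rows, `true_at(rows, march)` returns the Snowflake fact while `believed_at(rows, march)` returns the Postgres one. A store that deleted on contradiction could not reproduce that gap.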

Forgetting is a score, not a delete. Each memory’s effective salience is its relevance multiplied by an exponential decay over age and last access, by its importance, and by a reinforcement term: memories that keep getting used resist decay.

The configuration surface is deliberately small: half-lives set per volatility (months for stable preferences like “writes British English,” weeks for situational facts like “busy with the Q3 launch”), a first-session quarantine, kinds that never time-decay, a reversible archive sweep at a floor, and one switch to turn the whole thing off per pool.
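One way to read the salience description as arithmetic is sketched below. The half-life form of the decay is standard, but everything else is an assumption: the logarithmic reinforcement term, the use of days since last access as the only age input, and the parameter names.

```python
import math


def effective_salience(relevance, importance, days_since_access,
                       half_life_days, use_count=0, decays=True):
    """Relevance x exponential decay x importance x reinforcement."""
    # Half-life decay: the multiplier halves every half_life_days.
    decay = 0.5 ** (days_since_access / half_life_days) if decays else 1.0
    # Assumed reinforcement shape: grows slowly with use, equals 1.0 when unused.
    reinforcement = 1.0 + math.log1p(use_count)
    return relevance * decay * importance * reinforcement


# Hypothetical half-lives: months for stable preferences, weeks for situational facts.
stable = effective_salience(0.8, 0.9, days_since_access=30, half_life_days=180)
situational = effective_salience(0.8, 0.9, days_since_access=30, half_life_days=14)
```

Because the result is a score and nothing is deleted, a memory that sinks below the archive floor can come back simply by being used again. Passing `decays=False` models the kinds that never time-decay.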

Three details I consider load-bearing:

First-session quarantine. Onboarding brain-dumps and inherited defaults are down-weighted until reinforced by real use. Day-one enthusiasm should not masquerade as long-term signal.

Skills never time-decay. A procedure that worked fifty times does not become false by sitting unused. Skills retire on evidence (unused, and success rate dropped), and retirement is proposed to the user, never applied silently.

The honesty note: these half-lives are design defaults, not measured truths. As Part 1 showed, no public benchmark scores forgetting, so an in-house evaluation harness ships as a build item, not a someday: accuracy, latency, and token cost as a Pareto triple, with explicit credit for abstention, contradiction handling, and correct forgetting. The first gate is deliberately tool-free: bootstrap the engine on my own history, review 50 extractions by hand, require at least 45 correct and zero fabricated.

The design gives the desktop a native memory panel to surface all of this: the profile in plain language (editable, edits write back through the gateway), an interactive graph of your memory with an as-of time slider (drag to see what the store believed at any date, powered by the bitemporal model), a decay view with a “what will be forgotten soon” preview, and a consent inbox for anything proposed to be shared. Memory you cannot inspect is memory you will not trust.

The boundary I refuse to cross: memory does not do access control

This is the most opinionated decision in the design, and I think the most defensible.

The memory engine does not implement permissions. It is handed a signed access pass by the identity layer (who is asking, what they may currently see), verifies it locally, and honours it. Two mechanical rules fall out:

No laundering on write. A fact derived from a source a person could not see never enters a pool wider than that source. Unknown provenance fails closed: it stays personal-only, never shareable.

Re-check on every recall. Every record carries the source it came from. When a grant is revoked, everything derived from that source is quarantined in one pass at the revocation event, and recall re-checks cheaply from then on. The property this buys is worth stating plainly: memory never outlives a grant. Disconnect a data source and the memories distilled from it stop surfacing.
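The revoke-then-recheck mechanics can be sketched as follows. Signature verification of the pass is omitted, and the `Record` shape, the function names, and the representation of the pass as a plain set of granted sources are all assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Record:
    text: str
    source: Optional[str]  # provenance; None means unknown
    quarantined: bool = False


def on_revoke(records, source):
    """One pass at the revocation event: quarantine everything derived from the source."""
    count = 0
    for record in records:
        if record.source == source and not record.quarantined:
            record.quarantined = True
            count += 1
    return count


def recall(records, granted_sources):
    """Cheap per-read check: skip quarantined rows and rows outside the pass."""
    return [
        r.text
        for r in records
        if not r.quarantined and r.source in granted_sources
    ]
```

The memory layer only compares a record's source against what it was handed. Deciding which sources belong in `granted_sources` stays with the identity layer.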

The same structural humility applies to the learning itself. Where the model runs is tiered by data sensitivity, not by task difficulty: raw, undistilled content (transcripts, mailboxes) is processed by a small local model on infrastructure the user controls, full stop. A larger cloud model is used only for already-cleaned, consented material. Extraction is a narrow task; a small model is good enough exactly where it matters. The field has already produced a cautionary tale here: a recent tool marketed as local-first was found uploading file contents and shell history to its cloud. Trust in a memory product is structural or it is nothing.

And one flag I keep open because intellectual honesty demands it: per-item visibility checks cannot stop a model from blending several individually-allowed memories into a conclusion the person was never cleared for. That is a synthesis-governance problem above the memory layer, and anyone who claims their memory store solves it is selling something.

Memory grows skills

Here is where memory stops being a notebook. Most memory products remember facts. The more valuable thing to remember is how you do things.

The engine watches the trace of what you and your agents actually did. When it sees a repeated, successful procedure, it distills it into a candidate skill: a plain-markdown skill file (the emerging cross-agent SKILL.md format), proposed to the user as a diff, never auto-saved. Memory keeps a procedure reference: the pointer plus success statistics. The skill body lives in a skill folder as a real file; memory holds the reference, never the body.

How to read it:

Skills follow the two-pool spine: a personal habit graduates into your personal skill folder; a repo pattern graduates into the project’s folder and travels with the repo to the whole team.

Maturity is a ladder (candidate, verified, honed, tuned), and the top rung is mechanical: an offline skill optimizer that proposes bounded text edits accepted only if a held-out score strictly improves, landing as a reviewed pull request affecting new sessions only. Published work on exactly this loop reports gains around twenty points on coding-agent harnesses, which is why the optimizer is a port in the design with two interchangeable implementations behind it.

The same no-laundering rule applies to graduation: a skill distilled from private context never auto-promotes into a shared folder. And precedence on a name clash (personal beats project beats organization) applies to how-to skills only; policy is never an overridable skill.
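The name-clash precedence is a simple ordered lookup. In this sketch the folder layout and paths are hypothetical; only the order (personal, then project, then organization) comes from the text. Policy is deliberately absent from the lookup, since it is not a skill and cannot be shadowed.

```python
PRECEDENCE = ("personal", "project", "organization")


def resolve_skill(name, folders):
    """Return (scope, path) for the highest-precedence skill with this name, or None."""
    for scope in PRECEDENCE:
        if name in folders.get(scope, {}):
            return scope, folders[scope][name]
    return None


folders = {
    "project": {"release": "project/release/SKILL.md"},
    "personal": {"release": "me/release/SKILL.md"},
    "organization": {"deploy": "org/deploy/SKILL.md"},
}
```

Here `resolve_skill("release", folders)` picks the personal copy, while `resolve_skill("deploy", folders)` falls through to the organization folder.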

Candor about the competitive landscape, because pretending otherwise would be silly: the capture-a-pattern-grow-a-skill loop is rapidly becoming table stakes; large open-source agent ecosystems ship versions of it today. So the interesting engineering is not the loop. It is governing it: local-first distillation, no laundering, evidence-based retirement, auditable history, and one engine growing skills across every surface instead of a separate learner per tool.

Sharing without a server: memory rides the repo

The last piece, and my favorite, because it converts infrastructure into a property of something teams already have.

Personal memory never syncs to anyone else. But project memory is only useful if the team shares it, and standing up a memory server for a five-person repo is friction nobody wants. The answer falls out of the storage model: the project pool’s op-log lives on a dedicated git ref in the repo itself and travels on push and pull, exactly like the code. (Precedent: the git-bug project has run an operation-based store over git refs for years, serverless.)

Writes are per-author: each contributor appends only to their own partition file, so git never sees a merge conflict.

Reading is the merge: recall loads the union of every author’s partition and searches across all of it. A departed teammate’s still-true facts remain; wrong ones get invalidated by a newer op on the corrector’s own partition; unused ones fade by decay.

Access control at this level is the repo’s own permission, deterministically: whoever the git host lets fetch the repo gets the pool. The signed access pass takes over in organization-scale deployments, where an identity layer actually exists.
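The per-author write and union read described above can be sketched in memory. Real partitions would be files on a git ref; here a dict of lists stands in for them, and the operation format and the `author:n` id scheme are assumptions for the example.

```python
def append_op(partitions, author, op):
    """Each author appends only to their own partition, so writes never collide."""
    partitions.setdefault(author, []).append(op)


def load_union(partitions):
    """Reading is the merge: take every author's ops, then drop invalidated ids."""
    ops = [op for author in sorted(partitions) for op in partitions[author]]
    invalidated = {op["target"] for op in ops if op["op"] == "invalidate"}
    return {
        op["id"]: op["text"]
        for op in ops
        if op["op"] == "add" and op["id"] not in invalidated
    }


partitions = {}
append_op(partitions, "alice", {"op": "add", "id": "alice:1", "text": "auth is fragile"})
append_op(partitions, "bob", {"op": "add", "id": "bob:1", "text": "uses Snowflake"})
# Bob corrects Alice's fact from his own partition, without editing hers.
append_op(partitions, "bob", {"op": "invalidate", "target": "alice:1"})
```

Because invalidation is collected as a set of ids before filtering, the result does not depend on the order in which partitions are read, which is the property that lets git carry the files without a merge step.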

One caveat I will not hide: revocation here is git-grade, exactly like the code itself. A removed collaborator stops receiving new operations but keeps what they already pulled. Grant-grade drop-on-revoke is precisely what the identity layer adds at organization scale, and pretending the repo level already has it would be the kind of dishonesty this series exists to avoid.

Onboarding to a codebase stops being “read the wiki and bother the senior engineer.” Clone the repo, and the team’s accumulated memory of why arrives with it.

The status ledger: designed versus building

The same honesty as Part 1 of my earlier series, in one table.

The analytics engine (earlier series): shipping; its desktop integration is the next milestone

The memory blueprint (record, pools, paths, decay, access model): design complete, grounded in the published evidence, frozen as the build contract

The spine: record + store + gateway + journal, round-tripping a fact: in build now (the walking skeleton, with CI and tests from day one)

Bootstrap: distilling my own coding history into day-one memory: next, with the hand-graded 50-extraction gate (at least 45 correct, zero fabricated)

Bitemporal contradiction handling, decay, the evaluation harness: designed, sequenced behind the spine

Skills graduation, repo-ref sharing, the memory panel: designed, sequenced further out

Organization-scale deployment (team synthesis, signed-pass enforcement end to end): designed as placeholders; the foundations exist from day one and activate later without a rewrite

That last row reflects the build philosophy in one line: architect for the team and the org, build for the solo user first, and make every later stage a matter of filling fields and pointing ports at new backends rather than rewriting foundations.

Key takeaways

Two pools, one pointer. Personal and project memory have different owners, privacy, and lifecycles. Keep them separate, join them with a graph edge, and read both at coding time.

One typed record, five kinds, two timelines, provenance from day one. Every downstream mechanism is written once. Retrofitting bitemporality or provenance later is the expensive path.

Model on the write path, never the read path. Async curation writes; deterministic search serves. The always-on profile removes the “agent may not search” failure; the corpus-size router keeps the machinery honest on small data.

Forgetting is a reversible, configurable score, and the evaluation harness for it has to be built in-house, because public benchmarks reward hoarding.

Memory honours access, never computes it. No laundering on write, re-gate on recall, memory never outlives a grant in pass-enforced deployments, and raw content never leaves infrastructure the user controls for distillation.

Facts are table stakes; skills are the payoff. Repeated successful procedures graduate into portable skill files, proposed never auto-saved, governed by the same provenance rules as everything else.

The repo is the team’s memory server. An append-only op-log on a git ref gives a team shared project memory with zero infrastructure.

This closes the two-part memory series. With the analytics series (the four parts on conversational analytics) and this pair, the written record now covers two of the engines behind the workspace: the one that answers questions about your data, and the one that remembers. The connective tissue between them, the desktop surface that composes the engines and routes every request, gets its own write-up next.

Further reading on the mechanisms borrowed here: Zep’s bitemporal memory graph (invalidate-don’t-delete), the A-MEM paper (embed enriched descriptions), Salesforce’s ConvoMem (when naive beats structured), the SkillOpt paper (validation-gated skill optimization), and git-bug (operation logs on git refs, the sharing precedent).

Part 1 of this pair covered the why: the measurable cliff, the seven problems, and why a bigger context window is the wrong layer.

source & further reading

pub.towardsai.net (original article)