[ARTICLE · art-21990] src=dev.to pub= topic=ai-agents verified=true sentiment=↑ positive

Your AI Agent Craves Curation. Here’s the FADEMEM Memory Architecture That Delivers It.

VEKTOR Slipstream v1.6.3, a local-first memory SDK for AI agents, introduces the FADEMEM memory architecture that actively curates and decays agent memories rather than storing everything indefinitely. The update, based on a February 2026 research paper from Alibaba and Peking University, classifies memories into long-term and short-term layers with different decay rates—long-term memories have an 11-day half-life while short-term memories decay four times faster. The SDK runs entirely on the user's machine with no cloud backend, storing all memory in a single SQLite file and running embeddings locally on CPU with no API calls or per-token costs.

15 min read · published Jun 4, 2026

You have explained your tech stack to your coding agent four times this month. You mentioned your preferred approach to a problem in January, and your agent has no idea it ever happened.

You corrected a decision last week and the old version is still surfacing. You set up context at the start of every session because there is nowhere for it to go at the end.

This is not a problem with any one model: GPT-4, Claude, and Gemini all share the same limitation. The model itself is stateless. Even with their built-in memory features, every session starts from zero unless you have the infrastructure to persist what matters and surface it at the right moment. That memory infrastructure is what most developers do not have.

VEKTOR Slipstream v1.6.3 is a local-first memory SDK for AI agents. This release adds the layer most memory systems skip: curation. It does not just store what you tell it; it manages what should still be there months later.

What you actually get

Before the architecture, here is what changes for you as a developer embedding this SDK.

Every AI memory system forces decisions you didn’t realise you were making. Where does your agent’s context actually live: on your machine or on someone else’s server? Are you paying per token every time your agent understands a memory, or does that happen locally? When you connect your GitHub, your calendar, your files, where does all that data go, and who can see it? Most memory systems answer all four questions for you, quietly, in their terms of service.

VEKTOR’s answer to all four is the same: your machine, your data, your rules. Memory lives in a single SQLite file you own. Embeddings run locally on CPU — no API calls, no per-token cost, no data leaving the process. MCP connectors spawn as local stdio processes; nothing is routed through an external service. There is no telemetry, no cloud sync, no account required. If you want to understand exactly what your agent knows about you, you open the database with any SQLite browser and read it. That is what local-first actually means.

Your agent stops asking you to repeat yourself. Decisions, preferences, project context, and personal facts persist across sessions and surface when relevant without being re-explained. A context you registered in January is still there in June — if it is still relevant. If it is not, it has faded and stopped competing with what is actually current.

Your agent stops surfacing contradictions. When you update a fact, the old version does not linger as an equally valid memory. The conflict resolver determines which one wins based on source trust and recency, and the loser is quietly retired rather than deleted — preserved for audit but excluded from recall.

Your agent’s memory stays a manageable size. Without active management, memory graphs grow indefinitely. Every new project adds nodes that never leave. v1.6.3 introduces per-source budgets, automatic decay, and cold storage, so the graph reflects what is currently relevant rather than everything that has ever been stored.

You do not need a cloud backend. One SQLite file. Runs on a laptop. No API calls to a cloud-hosted memory service, no extra costs for connectors. No data leaving your machine.

The architecture: what is new in v1.6.3

Decay: memory that fades when it should

The new vektor-decay.js implementation uses the FadeMem architecture from a February 2026 paper (https://arxiv.org/abs/2601.18642) by researchers at Alibaba and Peking University. To our knowledge, VEKTOR is one of the first production SDK implementations of this research.

The core idea: memories age differently depending on whether you use them. Every memory is classified into either the Long-term Memory Layer (LML: high importance, frequently recalled) or the Short-term Memory Layer (SML: lower importance, infrequently accessed). LML memories decay slowly, with roughly an 11-day half-life at default settings. SML memories decay four times faster.
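To make the two decay curves concrete, here is a minimal sketch. It uses the stretched-exponential form quoted in the v1.6.3 changelog, v(t) = v(0) × exp(-λ × t^β), with β = 0.8 for LML and 1.2 for SML, and the approximate half-lives the changelog lists (11 days and 5 days). The function names and the way λ is derived from a half-life are assumptions for illustration, not the contents of vektor-decay.js.

```javascript
// Hypothetical helper names; only the formula, the beta values and the
// approximate half-lives come from the changelog.
const LAYERS = {
  LML: { beta: 0.8, halfLifeDays: 11 }, // long-term layer
  SML: { beta: 1.2, halfLifeDays: 5 },  // short-term layer
};

// Solve v(h) = v(0) / 2 for lambda, which gives lambda = ln(2) / h^beta.
function lambdaFor({ beta, halfLifeDays }) {
  return Math.LN2 / Math.pow(halfLifeDays, beta);
}

// Strength of a memory after `days` without being accessed.
function decayedStrength(initial, days, layerName) {
  const layer = LAYERS[layerName];
  const lambda = lambdaFor(layer);
  return initial * Math.exp(-lambda * Math.pow(days, layer.beta));
}
```

With these assumptions a memory sits at half its initial strength when its layer's half-life is reached, and an SML memory falls away much faster than an LML one over a month.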

What drives the tier assignment is not just what you set when you stored it. Importance recalculates as a weighted function of semantic relevance to your current goals, access frequency, and position in the causal graph. A memory you actually revisit weekly climbs. One you flagged as important and never touched again gradually drifts down.
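The changelog gives the initial importance formula as I = 0.4×rel + 0.3×freq_sat + 0.3×recency. The sketch below shows how such a score could separate a memory you revisit from one you never touch. The weights are from the changelog; the saturation curve, the recency curve, their constants, and the tier threshold are all assumptions made for illustration.

```javascript
// Weights are the ones quoted in the changelog; everything else is assumed.
const WEIGHTS = { rel: 0.4, freq: 0.3, recency: 0.3 };

// Assumed saturation curve: approaches 1 as the access count grows.
function freqSaturation(accessCount, k = 5) {
  return 1 - Math.exp(-accessCount / k);
}

// Assumed recency curve: 1 when just accessed, falling toward 0 over time.
function recencyScore(daysSinceAccess, tauDays = 14) {
  return Math.exp(-daysSinceAccess / tauDays);
}

function importance({ relevance, accessCount, daysSinceAccess }) {
  return (
    WEIGHTS.rel * relevance +
    WEIGHTS.freq * freqSaturation(accessCount) +
    WEIGHTS.recency * recencyScore(daysSinceAccess)
  );
}

// Hypothetical cut-off between the two layers.
function assignLayer(score, threshold = 0.5) {
  return score >= threshold ? "LML" : "SML";
}
```

Under these assumed curves, two memories with the same relevance end up in different layers once one has been accessed often and recently and the other has sat untouched for months.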

The FadeMem paper reports 45% storage reduction versus append-only systems at equivalent recall quality. Their ablation shows that removing the dual-layer architecture alone drops multi-hop reasoning F1 by 33.9%. Conflict resolution removal drops it by 22.4%. These are the components now live in VEKTOR’s REM cycle.

Conflict resolution: memory that keeps itself consistent

vektor-conflict.js compares every new memory against existing ones above a similarity threshold. When it finds overlap, it classifies the relationship across five outcomes: the new memory supersedes the old, both coexist as independently valid, the new is subsumed by something already known, the new is more general and absorbs the old, or it is a duplicate and nothing changes.

Trust determines who wins. The system maps source type and actor type to a trust score. A direct user note scores 1.0. An automated bot event scores 0.28. A low-trust source cannot overwrite a high-trust one regardless of recency — your CI pipeline cannot quietly overwrite a decision your team made.
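A minimal sketch of the trust rule, assuming a contradiction has already been detected between two memories. Only the two trust scores (1.0 and 0.28) come from the text above; the source keys, the field names, and the tie-break on timestamps are hypothetical.

```javascript
// Only the two scores are quoted in the article; the keys are hypothetical.
const TRUST = {
  "user:note": 1.0,
  "bot:event": 0.28,
};

// Decide what happens when `incoming` contradicts `existing`.
// Returns the memory that stays active and the one retired to cold storage.
function resolveContradiction(existing, incoming) {
  const existingTrust = TRUST[existing.source];
  const incomingTrust = TRUST[incoming.source];

  let incomingWins;
  if (incomingTrust !== existingTrust) {
    // A lower-trust source never overwrites a higher-trust one, however recent.
    incomingWins = incomingTrust > existingTrust;
  } else {
    // Equal trust: the more recent statement wins (assumed tie-break).
    incomingWins = incoming.timestamp > existing.timestamp;
  }

  const [active, loser] = incomingWins ? [incoming, existing] : [existing, incoming];
  // The loser is flagged, not deleted, so it stays available for audit.
  return { active, retired: { ...loser, coldStorage: true } };
}
```

The point of the design is that recency is only consulted after trust, so a newer bot event cannot displace an older human decision.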

The FadeMem paper measures 68.9% macro-averaged accuracy across three conflict types (contradiction, update, overlap). That is the baseline the production implementation is building toward.

Standing queries: memory that knows what you are working on

vektor-standing.js synthesises your current priorities weekly from your top-importance recent memories. The output is a small set of embedded goal statements stored in the database. Every new memory that arrives is scored for relevance against these goals before being assigned a tier.

A commit directly relevant to an active project gets a higher initial importance score than one with no connection to your current work. This is what makes the system context-aware rather than just content-aware — it knows what matters to you right now, not just what was true in general.

The standing queries are rebuilt automatically. They expire after 14 days and are replaced by a fresh synthesis from whatever the graph currently shows as important.
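One way to picture the scoring step is cosine similarity between a new memory's embedding and each unexpired goal vector. The sketch below assumes that shape. The 14-day expiry is from the text; taking the best match across goals, the field names, and the tiny two-dimensional vectors in the usage are assumptions for illustration.

```javascript
const TTL_MS = 14 * 24 * 60 * 60 * 1000; // 14-day expiry from the article

function cosine(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

// Relevance of a new memory = best match against any unexpired goal.
// Starting the reduction at 0 also clips negative similarities to 0.
function relevanceToGoals(memoryVec, goals, now = Date.now()) {
  const live = goals.filter((g) => now - g.createdAt < TTL_MS);
  return live.reduce((best, g) => Math.max(best, cosine(memoryVec, g.vector)), 0);
}
```

A commit whose embedding lines up with a live goal scores near 1, while one that only matches an expired goal scores 0 and starts life with lower importance.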

The Curated Graph Problem

There is a generation of developers who already solved the memory problem for themselves—manually. They have Obsidian vaults with thousands of notes. Daily journals. Project folders. Linked references between decisions and their outcomes. Graph views that map the shape of their working life.

These are people who recognized something real: continuity matters. Context that survives across days, projects, and collaborators is worth maintaining. The graph view in Obsidian is not a gimmick. It is a legible map of how knowledge connects.

The problem is the maintenance commitment. A well-kept vault is a part-time job. You have to decide what to keep. Prune notes that became irrelevant when a project died. Resolve the tension when two notes contradict each other. Make sure the decision from January does not sit alongside the reversal from March as if both are equally true. Most vaults, if you are honest about it, are archaeological dig sites. Layers of old context competing with new ones, none of it expiring, all of it demanding your attention to sort, refine and interpret.

VEKTOR is a different answer to the same instinct: not a vault you curate, but a memory graph that curates itself. We expect LLM tools to end up curating your data for you anyway, and that future gets closer every day.

When you store a fact about a project, it arrives with an importance score derived from how relevant it is to what you are currently working on. When you update a decision, the old version does not persist as an equally valid note. The conflict resolver determines which one wins and retires the other to cold storage, where it is still available if you need the history but is excluded from active recall. When a project ends and you stop referencing its memories, they decay naturally over weeks without you deleting anything. When you start a new project, the context that matters most surfaces on its own because the standing query system has been tracking what you are actually focused on.

The underlying structure is SQL, not markdown. That means it cannot be opened in Obsidian. But it means the graph can do things a vault cannot: enforce consistency, expire relevance, weight connections by causal importance, and stay bounded without manual intervention.

If Obsidian is a garden you tend yourself, VEKTOR is a garden that tends itself according to the season and the plants’ needs. The memory that your agent needs is not a folder of markdown files. It is a living structure that knows what is still true, what has been superseded, and what you care about right now. That is what v1.6.3 delivers.

Staggered ingestion: memory that does not flood the DB

Large initial syncs are throttled to 200 items per run with a 5ms stagger between writes. Source budgets enforce per-connector node limits. A sync cursor table ensures subsequent runs start from the last timestamp rather than re-evaluating the same items. The REM cycle completed in 716ms during testing — fast enough to run every six hours in the background without the user noticing.
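The throttling described above can be sketched as a single capped sync run. The 200-item cap and the 5ms stagger are from the text; the function signature, the in-memory item list, and the `write` callback are hypothetical stand-ins for a connector and a database write. The sketch assumes item timestamps are unique and sorted in ascending order.

```javascript
const MAX_ITEMS_PER_RUN = 200; // cap quoted in the article
const STAGGER_MS = 5;          // pause between writes

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// `items` is sorted by ascending timestamp; `cursor` is the last timestamp
// already ingested; `write` is a hypothetical async function storing one item.
async function syncRun(items, cursor, write, staggerMs = STAGGER_MS) {
  const pending = items.filter((it) => it.timestamp > cursor);
  const batch = pending.slice(0, MAX_ITEMS_PER_RUN);
  for (const item of batch) {
    await write(item);
    cursor = item.timestamp; // advance only after a successful write
    await sleep(staggerMs);
  }
  return { cursor, written: batch.length, remaining: pending.length - batch.length };
}
```

Persisting the returned cursor between runs is what lets the next run skip everything already ingested instead of re-evaluating it.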

The numbers

We validated retrieval against the LoCoMo dataset — 419 stored dialog turns, 199 annotated question-answer pairs, retrieval only, no LLM assistance at query time:

VEKTOR Recall@10 (LoCoMo conv 0): 71.9%

GPT-4 with RAG baseline (LoCoMo): 37–42% F1

Human ceiling (LoCoMo): ~88% F1

The gap between 42% and 71.9% is what the four-channel recall pipeline (semantic + BM25 + enriched semantic + HyDE, fused via RRF) delivers over standard RAG. The gap between 71.9% and 88% is the remaining distance to human-level recall. That is the target for the full conversation benchmark currently under development.
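For readers unfamiliar with RRF, here is a minimal sketch of reciprocal rank fusion over several ranked lists of memory IDs. This is the general technique, not VEKTOR's implementation; the constant k = 60 is the value commonly used in the literature and is assumed here, as are the example channel outputs in the usage note.

```javascript
// Reciprocal rank fusion: each list contributes 1 / (k + rank) per document.
function rrfFuse(rankedLists, k = 60) {
  const scores = new Map();
  for (const list of rankedLists) {
    list.forEach((id, index) => {
      const rank = index + 1; // ranks start at 1
      scores.set(id, (scores.get(id) || 0) + 1 / (k + rank));
    });
  }
  // Highest fused score first.
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
}
```

Called with one list per channel, for example `rrfFuse([semantic, bm25, enriched, hyde])`, it rewards memories that rank well in several channels over those that top only one.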

And yes, there are other systems with higher benchmark scores, but we are quickly catching up.

Running the benchmark also caught a small production bug: question marks were reaching SQLite’s FTS5 engine as special syntax, silently falling back to semantic-only recall on every conversational query.

Every question ends with a question mark. The fix is one line. Without end-to-end recall testing against real conversational data it would have persisted indefinitely. This is why we test, and test again, with every addition and revision.
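The article does not show the fix itself. One common way to keep user text from being parsed as FTS5 query syntax is to quote every token, since FTS5 treats a double-quoted string as a literal phrase and escapes an embedded quote by doubling it. The sketch below takes that approach; the function name is hypothetical and this is not necessarily the one-line change VEKTOR shipped.

```javascript
// Turn free text into an FTS5 MATCH expression made only of quoted tokens,
// so characters such as ? * : ( ) are never parsed as query syntax.
function toFtsQuery(text) {
  return text
    .split(/\s+/)
    // Drop tokens with no letter or digit (a lone "?" would be an empty phrase).
    .filter((token) => /[\p{L}\p{N}]/u.test(token))
    // Escape embedded double quotes by doubling them, then wrap in quotes.
    .map((token) => `"${token.replace(/"/g, '""')}"`)
    .join(" ");
}
```

The resulting string can be bound as the parameter of a `MATCH ?` clause, so the keyword channel keeps working on conversational queries instead of silently dropping out.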

What this means practically

Most agent memory systems available today are append-only stores with sophisticated retrieval. They get better at finding what you put in. They have no opinion about what should still be there.

The practical consequence of that design, the one developers hit after three to six months of use, is an agent that answers confidently from stale context, contradicts itself across sessions, and surfaces old decisions alongside new ones with equal weight.

v1.6.3 is the management layer that retrieval-only systems do not have. If you are building an agent that needs to work well for months rather than sessions, the primitives are now in the SDK.

v1.6.3

05 Jun 2026 — FadeMem Intelligence Layer · MCP Connectors · Adaptive Decay · Provider-Agnostic LLM · Graph Fix

FadeMem Intelligence Architecture — Layers 0–6

Full implementation of the FadeMem decay architecture (arXiv:2601.18642, Feb 2026) and Adaptive Budgeted Forgetting (arXiv:2604.02280, Apr 2026) into the VEKTOR memory pipeline. To our knowledge the first production SDK implementation of either paper.

Layer 0 — Pre-ingest signal filter (vektor-intake.js): NER/verb density scoring, source trust matrix (15 source types × 4 actor types), bot signature detection. Drops structural noise before any DB write.

Layer 1 — Dual-tier memory (LML/SML): importance_score, memory_layer, strength columns. Initial importance computed from FadeMem formula I = 0.4×rel + 0.3×freq_sat + 0.3×recency after embedding, scored against standing query vectors.

Layer 2 — Adaptive decay (vektor-decay.js): Stretched exponential v(t) = v(0) × exp(-λ × t^β), β=0.8 LML / 1.2 SML. Causal decay suppression via trigger-cached max_child_importance. Access reinforcement with diminishing returns. LML half-life ~11d, SML ~5d.

Layer 3 — Conflict resolution (vektor-conflict.js): Five-verdict AUDN upgrade (COMPATIBLE, CONTRADICTORY, SUBSUMES, SUBSUMED, NO_OP). 2D trust matrix prevents automated sources suppressing human ones.

Layer 4 — Memory fusion (vektor-fusion.js): LLM-guided cluster consolidation during REM cycle. Variance-boosted strength on fused nodes. Source memories moved to cold storage.

Layer 5 — Budgeted pruning (vektor-prune.js): Knapsack pruning with sub-linear token cost sqrt(tokens). Per-source node limits enforced at sync time. Source budget table seeded at migration.

Layer 6 — Additive reranking (vektor-recall-ranked.js): Composite score 0.5×sim + 0.2×strength + 0.15×importance + 0.15×causal_weight applied as final pass after cross-encoder rerank.
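The Layer 6 composite is simple enough to show directly. In the sketch below the weights are the ones listed above; the candidate field names and the assumption that every component is already normalised to the range 0 to 1 are hypothetical.

```javascript
// Weights as listed for Layer 6; field names are hypothetical.
function compositeScore(c) {
  return 0.5 * c.sim + 0.2 * c.strength + 0.15 * c.importance + 0.15 * c.causalWeight;
}

// Return a new array ordered by composite score, highest first.
function rerank(candidates) {
  return [...candidates].sort((a, b) => compositeScore(b) - compositeScore(a));
}
```

Because similarity carries only half the weight, a slightly less similar memory that is strong, important, and causally connected can outrank a closer match that has faded.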

Schema Migration 162 — 21 New Migrations

migrate-162.js: importance_score, memory_layer, strength, access_count, last_decay_calc, decay_rate, source_type, actor_type, trust_score, max_child_importance, cold_storage, cold_at. Tables: vektor_cold_storage, vektor_standing_queries, vektor_source_budgets, vektor_sync_cursors, vektor_sync_health. Three SQLite triggers maintaining causal cache on importance changes and edge insert/delete.

MCP Connector Layer

vektor-mcp-reader.js and vektor-connector-base.js: MCP stdio connector pipeline syncing external tools into VEKTOR memory. Filesystem and GitHub connectors added to setup wizard Step 10. GitHub connector uses dedicated fetchGithubItems strategy (list_issues, list_commits, list_pull_requests) with owner/repos from wizard config. Staggered ingestion (5ms between writes, 200-item cap per run). Sync cursor table prevents re-scanning history.

Provider-Agnostic LLM

vektor-llm-provider.js: All 15 wizard providers supported (groq, claude, openai, gemini, mistral, deepseek, together, cohere, xai, minimax, nvidia, perplexity, lmstudio, litellm, ollama). Reads user config — no hardcoded API keys. Replaces Groq hardcoding in vektor-conflict.js, vektor-fusion.js, vektor-standing.js, vektor-sleep.js.

Standing Queries — Auto-Evolving Context

vektor-standing.js: Weekly synthesis from top-15 LML memories via configured LLM provider. Goal statements embedded with local model and stored as vectors. Used as rel component in FadeMem importance scoring for background syncs. 14-day TTL.

Graph Visualisation Fix

vektor-graph-server.js: ns namespace variable undefined in apiGraph() SQL handler caused all graph API calls to return {ok: false, error: "ns is not defined"}. Graph UI showed spinner indefinitely. Fix: extract ns from URL params before SQL clause construction.
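The shape of that fix might look like the sketch below: read the namespace from the request URL first, then pass it to the query as a bound parameter. The function name, the default namespace, and the table and column names are all hypothetical; only the idea of extracting ns from the URL parameters before building the SQL comes from the changelog.

```javascript
// Hypothetical reconstruction: resolve `ns` before any SQL is built.
function buildGraphQuery(requestUrl, defaultNs = "default") {
  // A base is needed because request URLs are usually relative paths.
  const url = new URL(requestUrl, "http://localhost");
  const ns = url.searchParams.get("ns") || defaultNs;
  return {
    // Table and column names are placeholders, not VEKTOR's schema.
    sql: "SELECT id, content FROM memories WHERE namespace = ?",
    params: [ns],
  };
}
```

Binding ns as a parameter rather than interpolating it also keeps a user-supplied namespace out of the SQL text.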

REM Cycle

vektor-sleep.js: Orchestrates decay → fusion → prune → standing in sequence. All apiKey guards removed — provider config used instead. REM cycle confirmed at 716ms on 17,523-node graph.

Causal Inference Engine — Four-Phase, Zero Dependencies

Full causal reasoning layer deployed to src/causal/. Node ≥18 required, no external dependencies.

Phase 1 — G-Formula estimator (gformula-estimator.js) — ATE identification and estimation using the G-computation formula over the MAGMA causal graph.

Phase 2 — MSM / IPW estimator (msm-estimator.js) — Marginal structural model estimation via inverse probability weighting, handling time-varying confounders across memory timelines.

Phase 3 — IV Bounds estimator (iv-bounds-estimator.js) — Instrumental variable partial identification bounds (Manski-style) for causal effect estimation when unobserved confounders are present.

Phase 4 — Root Cause Analysis Engine (vektor-rca-engine.js) — Combines all prior phases to build an intervention graph, trace agent failures backwards through the causal chain, score root causes by impact, and predict fix outcomes.

CLI test harness (cli-test.js) ships with --verbose and --phase flags for targeted phase testing. 31 tests passing across all four phases.

DeepFlow v2 — Deterministic 8-Step Pipeline

The vektor.mjs deep agent path (deep:true) has been rebuilt as a fully deterministic pipeline, replacing the prior unbounded loop. Pipeline stages: DECOMPOSE → VAULT-FIRST → SWEEP → LOCI → COMMIT → ADVERSARIAL → SYNTHESISE → CRITIC+PATCH. Three new tools added: adversarial_search, loci_rank, and patch. DeerFlow renamed to DeepFlow throughout. The /agent path (deep:false) is unchanged. A full syntax repair pass was applied — BOM removal, optional chaining and nullish coalescing fixes, stray markdown commented out.

JOT Collab — Two-Pass Article Generation

Groq LLaMA two-pass generation system integrated into the JOT SDK: rate-limit handling with automatic backoff, API key rotation across multiple Groq keys, APA7 citation infrastructure, and a post-generation citation scanner. Full bug audit of four core JOT files with critical fixes applied via fix-criticals.js. JOT v1.5.x additions also included: TAG pill and /api/ai/transform tag prompt (v1.5.2), notes RAG wired into /api/memory/think, vektor ask libuv Windows assertion crash resolved (v1.5.7), and lightbulb indicator overlap fix (v1.5.8).

Download Server — Version Mount Fix

The licence-gated download endpoint was serving vektor-slipstream-1.5.8.tgz despite the tarball at ~/downloads/ and ~/vektor-monorepo/releases/ being updated to v1.6.3. Root cause: PM2 bakes environment variables into the process at launch time. dotenv does not override variables already present in process.env, so updating .env and running pm2 restart --update-env both silently preserved the stale VERSION_SLIPSTREAM=1.5.8 value. Fix: delete the PM2 process and re-register with the version passed explicitly at start time, then pm2 save to persist. Affected service: vektor-server (vektor-monorepo).

better-sqlite3 — Bundled Binary (Windows)

better-sqlite3 moved from optionalDependencies to dependencies with a pre-built Windows binary bundled under bundled/better-sqlite3/build/Release/. Eliminates the npm rebuild requirement on Windows installs where native build toolchains are absent. process.chdir() is called before requiring the native module so that relative path resolution is correct regardless of the working directory. postinstall.js silently skips the rebuild step when the bundled binary is present.

sqlite-vec — ANN Recall Wired

sqlite-vec upgraded to ^0.1.9. The vec_memories virtual table schema is now created on DB init and the write path stores quantized float32 vectors alongside the BM25 FTS5 index. Recall falls back gracefully to cosine scan if sqlite-vec fails to load (e.g. architecture mismatch). ANN nearest-neighbour swap replaces full cosine scan for large graphs (>5,000 memories), reducing p95 recall latency by ~60%.

MAGMA Graph — vektor_status and vektor_related Tools

Two new MCP tools shipped in the CLOAK layer:

vektor_status — lightweight memory health check returning memory count, namespace, last store timestamp, and embedder mode. Designed for session auto-probe without triggering a full recall pass.

vektor_related — traverses memory graph edges for a specific memory ID, returning typed neighbours (semantic / causal / temporal / entity) up to N hops. Replaces manual memory.graph() calls in agentic workflows.
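An N-hop traversal of typed edges can be sketched as a breadth-first search. The example below works on a plain in-memory edge list with hypothetical field names and follows outgoing edges only; the real tool presumably reads edges from SQLite, and its exact return shape is not documented here.

```javascript
// `edges` is an array of { from, to, type }, where type is one of
// "semantic", "causal", "temporal", "entity".
function related(edges, startId, maxHops) {
  const adjacency = new Map();
  for (const e of edges) {
    if (!adjacency.has(e.from)) adjacency.set(e.from, []);
    adjacency.get(e.from).push(e);
  }

  const seen = new Set([startId]);
  const results = [];
  let frontier = [startId];

  // Expand one hop at a time so each neighbour records its shortest distance.
  for (let hop = 1; hop <= maxHops && frontier.length > 0; hop++) {
    const next = [];
    for (const id of frontier) {
      for (const e of adjacency.get(id) || []) {
        if (seen.has(e.to)) continue;
        seen.add(e.to);
        results.push({ id: e.to, type: e.type, hops: hop });
        next.push(e.to);
      }
    }
    frontier = next;
  }
  return results;
}
```

Each neighbour is reported once, with the type of the edge it was first reached by and the hop count at which it was found.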

Bug Fix — Percept isOnTopic Threshold

The Percept Chat Layer was firing topic-match hints too aggressively. The isOnTopic cosine score threshold was lowered from 0.35 to 0.25, reducing false-positive interruptions during tangential conversation turns. Affected module: vektor-percept-chat.js.

Bug Fix — vektor rem (memory.dream() removed)

The npx vektor rem CLI command was calling memory.dream(), a method removed in v1.5.4. The command now uses memory.stats() to retrieve fragment counts and memory.recall() to seed the compression pass, matching the current API surface. Affected module: vektor.mjs.

Infrastructure — GUI API Proxy Routes

Relative /api/memory/* calls from vektor-graph-ui.html were hitting the wrong server when the GUI was served from a non-default port. Proxy routes added to the local graph server so all /api/memory/think and /api/memory/remember calls resolve correctly regardless of serving context. Affected module: vektor-graph-server.js.

VEKTOR Slipstream is available at vektormemory.com. The Vex migration tool exports memory graphs to .vmig.jsonl with connectors for Pinecone, Qdrant, Chroma, Weaviate, pgvector, and VEKTOR. Local-first and sovereign by design.
