Neonmem 0.9.7 is out.

Neonmem 0.9.7 introduces a two-level importer that separates folders and files into a searchable knowledge pool and agent chats into typed memories, using IBM Granite-30M embeddings via ONNX Runtime for offline, grounded recall. The update also adds persistent tags, deduplication, and optional AES-256-GCM encryption, all running locally without cloud dependencies.

1. A two-level importer: two kinds of "stuff," treated differently

The big change. Your project doesn't come in one shape, so the importer no longer flattens it into one pile:

- Folders & files → a searchable knowledge pool. Your docs, code and notes are vectorised into a lossless, deduplicated facts pool: the same fact stated three ways becomes one fact, with every source kept. Nothing is summarised away.
- Agent chats → typed memories. Point Neonmem at a Claude or other agent transcript and it pulls out only what's worth keeping (the decisions, dead-ends and rules) as clean, typed memories. A decision is stored as a decision; a dead-end stays a warning. Process narration ("I read the file…", "please check…") is dropped.
- Links become knowledge. If a chat references a file on disk, that file is pulled into the pool automatically, with a memory that points back to it.

The result is labelled honestly in the UI: Facts loaded (the pool) and Memories created (the kept decisions).

2. Grounded, offline recall

0.9.7 replaces the old embedder with IBM Granite-30M, run as a fused fp16 ONNX graph through ONNX Runtime:

- Database-class retrieval quality on any CPU: no GPU, no PyTorch, no API key, no cloud.
- Every prompt walks memory in order (reflexes → short-term → long-term → facts pool) and answers from what you actually imported, or honestly says it doesn't know.

This is the headline behaviour: ask "what is ARC?" and you get your definition from your docs, not the textbook expansion the model would otherwise guess. A memory that's occasionally wrong is worse than no memory at all, so the rule is: answer from the user's sources, or abstain. Never invent.

3. Tags that stick

Tag an import with a topic (e.g. "Specific API") and Neonmem mints one clean, canonical memory for it, linked back to the source, even when your docs never write the term verbatim, as long as they clearly describe it. If the corpus genuinely has nothing on a tag, it's left out rather than faked.

4. Clean by construction

Memories follow one golden rule: a single concise statement ("ARC: your provisioning platform") linked to the full source, not a messy pile of raw chunks. Chat capture deduplicates through the same facts layer, so re-importing a conversation never doubles up.

5. One durable cartridge

- The importer keeps the full source corpus inside the cartridge (content-addressed + compressed): one file replaces the scattered docs and transcripts, and the facts are always rebuildable from ground truth.
- Opt-in AES-256-GCM encryption at rest: your whole corpus as a private vault.
- Imported knowledge is long-term and survives reopening the project.

Built on open, permissively licensed components

- Embeddings: IBM Granite-30M (Apache-2.0) via ONNX Runtime (MIT).
- Vector search: FAISS (MIT).
- Agent integration: the Model Context Protocol.

Full attributions ship with every download. No third-party LLM, nothing phones home.

Get it

Windows (signed installer + portable) and Linux (AppImage); macOS on the way. Local, private, and free for personal use.

→ https://neonmem.com

Import a project, then ask it the one thing your assistant always gets confidently wrong about your codebase. That question is the whole test.
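
For the curious: the "answer from your sources, or abstain" rule from section 2 can be sketched in a few lines of Python. This is an illustration only, not Neonmem's implementation. The tier names come from the recall order above, but `Hit`, `keyword_tier`, `recall`, `answer`, the toy keyword match (standing in for real vector search) and the `docs/arc.md` path are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple

# Recall order as described in the announcement; the identifiers are assumptions.
TIER_ORDER = ("reflexes", "short_term", "long_term", "facts_pool")

ABSTAIN = "I don't know: nothing in the imported sources covers this."


@dataclass(frozen=True)
class Hit:
    tier: str    # which memory tier produced the hit
    text: str    # the stored statement
    source: str  # where the statement came from


SearchFn = Callable[[str], Optional[Hit]]


def keyword_tier(tier: str, entries: Dict[str, Tuple[str, str]]) -> SearchFn:
    """Build a toy tier: a substring match stands in for embedding search."""
    def search(query: str) -> Optional[Hit]:
        q = query.lower()
        for keyword, (text, source) in entries.items():
            if keyword.lower() in q:
                return Hit(tier, text, source)
        return None
    return search


def recall(query: str, tiers: Dict[str, SearchFn]) -> Optional[Hit]:
    """Walk the tiers in order and return the first hit, or None."""
    for name in TIER_ORDER:
        search = tiers.get(name)
        if search is None:
            continue  # this tier is not populated
        hit = search(query)
        if hit is not None:
            return hit
    return None


def answer(query: str, tiers: Dict[str, SearchFn]) -> str:
    """Answer from an imported source, or abstain. Never invent."""
    hit = recall(query, tiers)
    if hit is None:
        return ABSTAIN
    return f"{hit.text} (source: {hit.source}, tier: {hit.tier})"


if __name__ == "__main__":
    tiers = {
        "facts_pool": keyword_tier(
            "facts_pool",
            {"arc": ("ARC: your provisioning platform", "docs/arc.md")},
        ),
    }
    print(answer("what is ARC?", tiers))
    print(answer("what is XYZ?", tiers))
```

The design point is that abstaining is an ordinary return value rather than an error: when no tier has a hit, the caller gets an explicit "I don't know" instead of a guess.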