# Neonmem 0.9.7 is out.

> Source: <https://dev.to/neonmem_dev/neonmem-097-is-out-59cp>
> Published: 2026-06-24 16:34:22+00:00

##
1. A two-level importer — two kinds of "stuff," treated differently

The big change. Your project doesn't come in one shape, so the importer no longer flattens

it into one pile:

-
**Folders & files → a searchable knowledge pool.** Your docs, code and notes are
vectorised into a lossless, deduplicated facts pool — the same fact stated three ways
becomes one fact, every source kept. Nothing is summarised away.
-
**Agent chats → typed memories.** Point Neonmem at a Claude (or other agent) transcript
and it pulls out only what's worth keeping — the **decisions, dead-ends and rules** — as
clean, typed memories. A decision is stored as a decision; a dead-end stays a warning.
The process-narration ("I read the file…", "please check…") is dropped.
-
**Links become knowledge.** If a chat references a file on disk, that file is pulled into
the pool automatically, with a memory that points back to it.

The result is labelled honestly in the UI: **Facts loaded** (the pool) and

**Memories created** (the kept decisions).

##
2. Grounded, offline recall

0.9.7 replaces the old embedder with **IBM Granite-30M**, run as a fused **fp16 ONNX** graph

through **ONNX Runtime**:

- Database-class retrieval quality on
**any CPU** — no GPU, no PyTorch, no API key, no cloud.
- Every prompt walks memory in order —
**reflexes → short-term → long-term → facts pool** —
and answers from what you actually imported, or **honestly says it doesn't know**.

This is the headline behaviour: ask *"what is ARC?"* and you get **your** definition from

**your** docs — not the textbook expansion the model would otherwise guess. A memory that's

occasionally wrong is worse than no memory at all, so the rule is: answer from the user's

sources, or abstain. Never invent.

##
3. Tags that stick

Tag an import with a topic (e.g. `Specific API`

) and Neonmem mints **one clean, canonical**

memory for it, linked back to the source — *even when your docs never write the term*

verbatim, as long as they clearly describe it. If the corpus genuinely has nothing on a

tag, it's left out rather than faked.

##
4. Clean by construction

Memories follow one **golden rule**: a single concise statement (`ARC — your provisioning`

platform

) linked to the full source, not a messy pile of raw chunks. Chat capture

deduplicates through the same facts layer, so re-importing a conversation never doubles up.

##
5. One durable cartridge

- The importer keeps the
**full source corpus** inside the cartridge (content-addressed +
compressed) — one file replaces the scattered docs and transcripts, and the facts are
always rebuildable from ground truth.
-
**Opt-in AES-256-GCM encryption at rest** — your whole corpus as a private vault.
- Imported knowledge is long-term and
**survives reopening** the project.

##
Built on (all open, permissively licensed)

Embeddings: **IBM Granite-30M** (Apache-2.0) via **ONNX Runtime** (MIT). Vector search:

**FAISS** (MIT). Agent integration: the **Model Context Protocol**. Full attributions ship

with every download. No third-party LLM, nothing phones home.

##
Get it

Windows (signed installer + portable) and Linux (AppImage); macOS on the way. Local,

private, and free for personal use.

→ [neonmem.com](https://neonmem.com)

*Import a project, then ask it the one thing your assistant always gets confidently wrong*

about your codebase. That question is the whole test.