Neonmem 0.9.7 is out.


Source: dev.to · Published: 2026-06-24T16:34Z · Topic: developer-tools


Neonmem 0.9.7 introduces a two-level importer that separates folders and files into a searchable knowledge pool and agent chats into typed memories, using IBM Granite-30M embeddings via ONNX Runtime for offline, grounded recall. The update also adds persistent tags, deduplication, and optional AES-256-GCM encryption, all running locally without cloud dependencies.



A two-level importer — two kinds of "stuff," treated differently

The big change. Your project doesn't come in one shape, so the importer no longer flattens it into one pile:

Folders & files → a searchable knowledge pool. Your docs, code and notes are vectorised into a lossless, deduplicated facts pool: the same fact stated three ways becomes one fact, with every source kept. Nothing is summarised away.

Agent chats → typed memories. Point Neonmem at a Claude (or other agent) transcript and it pulls out only what's worth keeping (the decisions, dead-ends and rules) as clean, typed memories. A decision is stored as a decision; a dead-end stays a warning. The process narration ("I read the file…", "please check…") is dropped.

Links become knowledge. If a chat references a file on disk, that file is pulled into the pool automatically, with a memory that points back to it.

The result is labelled honestly in the UI: Facts loaded (the pool) and Memories created (the kept decisions).
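One way to picture the typed-memory side is a small record with an explicit kind. The sketch below is illustrative only: the `MemoryKind` and `Memory` names, their fields, the `keep_line` helper, the narration check, and the sample transcript are all assumptions, not Neonmem's actual schema or extraction logic.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional, Tuple


class MemoryKind(Enum):
    DECISION = "decision"
    DEAD_END = "dead-end"
    RULE = "rule"


@dataclass(frozen=True)
class Memory:
    kind: MemoryKind
    statement: str
    source: str  # transcript or file the memory points back to


# Hypothetical markers of process narration; a real extractor would need
# something far more careful than a prefix match.
NARRATION_PREFIXES = ("i read the file", "please check")


def keep_line(kind: Optional[MemoryKind], text: str, source: str) -> Optional[Memory]:
    """Return a typed memory, or None when the line is untyped or narration."""
    cleaned = text.strip()
    if kind is None or cleaned.lower().startswith(NARRATION_PREFIXES):
        return None
    return Memory(kind, cleaned, source)


# Made-up transcript lines, pre-labelled for the sake of the example.
transcript: List[Tuple[Optional[MemoryKind], str]] = [
    (MemoryKind.DECISION, "Use SQLite for the local cache."),
    (None, "I read the file and it looks fine."),
    (MemoryKind.DEAD_END, "Tried mmap-based loading; it broke on Windows."),
]

memories: List[Memory] = []
for kind, text in transcript:
    memory = keep_line(kind, text, "chat-001.jsonl")
    if memory is not None:
        memories.append(memory)

for memory in memories:
    print(memory.kind.value, "|", memory.statement)
```

Keeping the kind as an enum rather than free text means a dead-end can never silently turn into a decision later on, which matches the article's "a dead-end stays a warning".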


Grounded, offline recall

0.9.7 replaces the old embedder with IBM Granite-30M, run as a fused fp16 ONNX graph through ONNX Runtime:

Database-class retrieval quality on any CPU — no GPU, no PyTorch, no API key, no cloud.
Every prompt walks memory in order — reflexes → short-term → long-term → facts pool — and answers from what you actually imported, or honestly says it doesn't know.

This is the headline behaviour: ask "what is ARC?" and you get your definition from your docs, not the textbook expansion the model would otherwise guess. A memory that's occasionally wrong is worse than no memory at all, so the rule is: answer from the user's sources, or abstain. Never invent.
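The "walk memory in order, or abstain" rule can be sketched as a short loop. This is a minimal illustration under stated assumptions: the `recall` function, the tier lookups, and the abstention message are hypothetical, and exact-match dictionary lookups stand in for the embedding search Neonmem actually performs. Only the tier order and the ARC example come from the article.

```python
from typing import Callable, Dict, List, Optional, Tuple

# A tier is a name plus a lookup that returns an answer or None.
Tier = Tuple[str, Callable[[str], Optional[str]]]


def recall(query: str, tiers: List[Tier]) -> Tuple[str, Optional[str]]:
    """Walk the tiers in order; return (answer, tier name), or abstain."""
    for name, lookup in tiers:
        answer = lookup(query)
        if answer is not None:
            return answer, name
    # No tier had grounded content, so decline instead of guessing.
    return "I don't know.", None


# Toy stand-ins: only the facts pool holds anything in this example.
facts_pool: Dict[str, str] = {"what is arc?": "ARC is your provisioning platform."}

tiers: List[Tier] = [
    ("reflexes", lambda q: None),
    ("short-term", lambda q: None),
    ("long-term", lambda q: None),
    ("facts pool", lambda q: facts_pool.get(q.lower())),
]

print(recall("What is ARC?", tiers))
print(recall("What is XYZ?", tiers))
```

The second call falls through every tier and returns the abstention, which is the behaviour the rule asks for: no imported source, no answer.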


Tags that stick

Tag an import with a topic (e.g. Specific API) and Neonmem mints one clean, canonical memory for it, linked back to the source, even when your docs never write the term verbatim, as long as they clearly describe it. If the corpus genuinely has nothing on a tag, it's left out rather than faked.


Clean by construction

Memories follow one golden rule: a single concise statement (for example, "ARC: your provisioning platform") linked to the full source, not a messy pile of raw chunks. Chat capture deduplicates through the same facts layer, so re-importing a conversation never doubles up.
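To make the deduplication idea concrete, here is a minimal sketch of a pool that merges near-identical facts and keeps every source. Everything in it is an assumption for illustration: the `FactsPool` and `Fact` names, the 0.95 cosine threshold, the file names, and the hand-written two-dimensional vectors, which stand in for real embeddings. It is not Neonmem's implementation.

```python
import math
from dataclasses import dataclass, field
from typing import List, Sequence, Set


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


@dataclass
class Fact:
    text: str
    vector: Sequence[float]
    sources: Set[str] = field(default_factory=set)


class FactsPool:
    def __init__(self, threshold: float = 0.95) -> None:
        self.threshold = threshold  # assumed cut-off for "same fact"
        self.facts: List[Fact] = []

    def add(self, text: str, vector: Sequence[float], source: str) -> Fact:
        """Merge into an existing fact when similar enough, else store a new one."""
        for fact in self.facts:
            if cosine(fact.vector, vector) >= self.threshold:
                fact.sources.add(source)  # keep every source of the same fact
                return fact
        fact = Fact(text, vector, {source})
        self.facts.append(fact)
        return fact


pool = FactsPool()
pool.add("ARC is the provisioning platform.", [1.0, 0.0], "docs/arc.md")
pool.add("Our provisioning platform is called ARC.", [0.99, 0.05], "notes.txt")
pool.add("ARC is the provisioning platform.", [1.0, 0.0], "docs/arc.md")  # re-import

print(len(pool.facts), sorted(pool.facts[0].sources))
```

Because a repeated import lands on the existing fact and sources are held in a set, importing the same material twice leaves the pool unchanged.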


One durable cartridge

The importer keeps the full source corpus inside the cartridge (content-addressed + compressed) — one file replaces the scattered docs and transcripts, and the facts are always rebuildable from ground truth.

Opt-in AES-256-GCM encryption at rest — your whole corpus as a private vault.

Imported knowledge is long-term and survives reopening the project.
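"Content-addressed + compressed" storage can be sketched with the standard library alone. The example below is a rough illustration, assuming SHA-256 as the address and zlib as the compressor; the article does not say which hash, codec, or file layout the cartridge uses, and the `ContentStore` class is hypothetical. Encryption is left out of the sketch.

```python
import hashlib
import zlib
from typing import Dict


class ContentStore:
    """In-memory stand-in for a cartridge's source store."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        """Store data under the hash of its content and return that address."""
        key = hashlib.sha256(data).hexdigest()
        if key not in self._blobs:  # identical content is stored only once
            self._blobs[key] = zlib.compress(data)
        return key

    def get(self, key: str) -> bytes:
        """Decompress a blob and check that it still matches its address."""
        data = zlib.decompress(self._blobs[key])
        if hashlib.sha256(data).hexdigest() != key:
            raise ValueError("stored blob does not match its address")
        return data

    def __len__(self) -> int:
        return len(self._blobs)


store = ContentStore()
doc = b"ARC is the provisioning platform.\n" * 20
first = store.put(doc)
second = store.put(doc)  # same bytes, same address, no second copy

print(first == second, len(store), store.get(first) == doc)
```

Addressing by content hash is what makes facts rebuildable from ground truth in a design like this: the stored bytes can always be verified against their own key before they are re-read.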


Built on (all open, permissively licensed)

Embeddings: IBM Granite-30M (Apache-2.0) via ONNX Runtime (MIT). Vector search: FAISS (MIT). Agent integration: the Model Context Protocol. Full attributions ship with every download. No third-party LLM, nothing phones home.


Get it

Windows (signed installer + portable) and Linux (AppImage); macOS on the way. Local, private, and free for personal use.

→ neonmem.com

Import a project, then ask it the one thing your assistant always gets confidently wrong about your codebase. That question is the whole test.


Read original on dev.to → dev.to/neonmem_dev/neonmem-097-is-out-59cp
