cd /news/ai-tools/show-hn-mnemo-local-first-ai-memory-… · home topics ai-tools article
[ARTICLE · art-20809] src=github.com pub= topic=ai-tools verified=true sentiment=↑ positive

Show HN: Mnemo – local-first AI memory layer for any LLM (Rust, SQLite,petgraph)

Zayd Mulani released Mnemo, an open-source, local-first AI memory layer that builds a persistent knowledge graph from conversations using SQLite and petgraph. The sidecar service extracts named entities and relationships via any LLM, stores them locally, and injects relevant context into future prompts in under 50 milliseconds. Mnemo operates as a single static binary with zero cloud dependency, supporting Ollama, OpenAI, Anthropic, or any OpenAI-compatible API.

read6 min publishedJun 3, 2026

Local-first AI memory layer for any LLM. Persistent knowledge graph, entity extraction, semantic retrieval — no cloud required.

Most LLMs forget everything the moment a conversation ends. mnemo fixes that.

mnemo is a sidecar service that watches every conversation you feed it, extracts named entities and relationships using an LLM, builds a persistent knowledge graph in SQLite, and injects relevant context back into future prompts — automatically, in under 50ms. It works with Ollama (fully local, free), OpenAI, Anthropic, or any OpenAI-compatible API. It ships as a single static binary with zero cloud dependency.

  your app
     │
     ▼
  POST /ingest ──► entity extraction (LLM) ──► knowledge graph (SQLite + petgraph)
                                                        │
  POST /retrieve ◄── scoring + ranking ◄── graph traversal + full-text search
     │
     ▼
  context_prompt  ──► inject into your LLM prompt
  • You POST raw text to /ingest

(a conversation turn, a document, a note). - mnemo sends it to your configured LLM and extracts entities (people, tools, places, concepts) and the relationships between them.

  • Entities are deduplicated by name+type, aliases are merged, and everything is written to SQLite. The in-memory petgraph is updated atomically.
  • On POST /retrieve

, mnemo runs a 6-stage pipeline: full-text chunk search → entity name search → graph expansion (BFS over the knowledge graph) → relation filter → score+rank → assemble acontext_prompt

string. - You inject context_prompt

into your LLM's system prompt. Done.

git clone https://github.com/zaydmulani09/mnemo
cd mnemo
docker compose up -d

docker exec mnemo-ollama ollama pull llama3

curl http://localhost:8080/health
cargo install --path crates/mnemo-api

export MNEMO_LLM_BASE_URL=http://localhost:11434/v1
mnemo-api

export MNEMO_LLM_BASE_URL=https://api.openai.com/v1
export MNEMO_LLM_API_KEY=sk-...
export MNEMO_LLM_MODEL=gpt-4o-mini
export MNEMO_LLM_PROVIDER=openai
mnemo-api
pip install mnemo-sdk
python
from mnemo import MnemoClient

client = MnemoClient()  # server at http://localhost:8080

client.ingest("I'm building a Rust vector database called vecdb")

print(client.get_context("what am I working on?"))

All endpoints accept and return application/json

. Base URL: http://localhost:8080

.

Method Path Description Request body Response
GET
/health
Server + DB + LLM status HealthResponse
POST
/ingest
Store text, extract entities IngestRequest
IngestResponse
POST
/retrieve
Retrieve ranked memory context RetrievalQuery
RetrievalResult
GET
/entities
List entities (paginated) ?limit&offset
Entity[]
GET
/entities/:id
Get entity by UUID Entity
DELETE
/entities/:id
Delete entity (cascades) {"deleted":true}
GET
/entities/:id/neighbors
Knowledge graph neighbors ?depth (max 5)
GraphNode[]
GET
/chunks
List memory chunks (paginated) ?limit&offset&session_id
MemoryChunk[]
GET
/chunks/:id
Get chunk by UUID MemoryChunk
DELETE
/chunks/:id
Delete chunk {"deleted":true}
POST
/search
Full-text search entities + chunks {"query","limit"}
{"entities","chunks"}
DELETE
/wipe
Delete all memory (irreversible) header: X-Confirm-Wipe: true
{"wiped":true}
GET
/stats
Entity/chunk/graph counts + uptime StatsResponse

Key request/response types:

// IngestRequest
{
  "content": "string",         // required — text to store
  "source":  "string",         // required — e.g. "chat", "email", "cli"
  "session_id": "string|null", // optional — group related chunks
  "metadata": {}               // optional — arbitrary JSON
}

// RetrievalQuery
{
  "text": "string",            // required — query text
  "session_id": "string|null", // optional — filter by session
  "max_chunks": 10,            // default 10
  "max_entities": 20,          // default 20
  "min_confidence": 0.5,       // default 0.5
  "include_graph": true,       // default true — expand via knowledge graph
  "graph_depth": 2             // default 2 — BFS depth for graph expansion
}

Full endpoint documentation with curl examples: docs/api.md

Variable Default Description
MNEMO_DB_PATH
mnemo.db
SQLite database file path
MNEMO_PORT
8080
API server port
MNEMO_LLM_BASE_URL
http://localhost:11434/v1
OpenAI-compatible LLM base URL
MNEMO_LLM_MODEL
llama3
Model name for entity extraction
MNEMO_LLM_API_KEY
ollama
API key (any value works for Ollama)
MNEMO_LLM_PROVIDER
ollama
Provider type: ollama , openai , anthropic , custom

Pass --config path/to/config.toml

to mnemo-api

. See mnemo.example.toml

:

db_path = "mnemo.db"
port = 8080

[llm]
provider = "ollama"
base_url = "http://localhost:11434/v1"
model = "llama3"
api_key = "ollama"
timeout_secs = 30
max_retries = 3
max_tokens = 2048
temperature = 0.1

Environment variables take precedence over TOML values. The active config source is reported in GET /health

config_source

.

Install:

cargo install --path crates/mnemo-cli

Usage:

mnemo ingest "I use Neovim and prefer dark mode"

mnemo search "what editor do I use?"

mnemo entities

mnemo entity <uuid> --neighbors

mnemo chunks

mnemo health

mnemo stats

mnemo wipe

mnemo wipe --yes

mnemo --server http://192.168.1.10:8080 stats

Install:

pip install mnemo-sdk

See sdk/python/README.md for the full API reference.

Async example:

import asyncio
from mnemo import AsyncMnemoClient

async def main():
    async with AsyncMnemoClient() as client:
        await client.ingest(
            "Alice is a principal engineer at Stripe working on payment infrastructure.",
            session_id="session-001",
        )
        context = await client.get_context(
            "what does Alice work on?",
            session_id="session-001",
        )
        print(context)

asyncio.run(main())

A working standalone example: examples/basic_usage.py

Four Rust crates wired together:

Crate Type Role
mnemo-core
lib Entity extraction, graph ops, retrieval engine, DB layer
mnemo-api
bin Axum REST API — thin handler layer over mnemo-core
mnemo-cli
bin CLI tool using blocking reqwest against the API
mnemo-bench
bin Performance benchmarks (12 suites)

Full architecture documentation: docs/architecture.md

Benchmarked on Apple M2, SQLite WAL mode, in-memory petgraph. Debug build numbers — release build (--release

) is 3–5× faster.

Operation Avg latency Throughput
Entity insert (SQLite) ~0.12 ms ~8,300 ops/s
Entity lookup by ID ~0.08 ms ~12,500 ops/s
Chunk insert ~0.14 ms ~7,100 ops/s
Full-text chunk search ~0.28 ms ~3,500 ops/s
Graph neighbor (depth=1) ~0.21 ms ~4,700 ops/s
Graph neighbor (depth=2) ~0.89 ms ~1,100 ops/s
Full retrieval pipeline ~4.2 ms ~238 ops/s

Run cargo run -p mnemo-bench

to benchmark on your hardware.

cargo test --workspace          # run all 122 tests
make coverage                  # HTML coverage report (requires cargo-llvm-cov)
make coverage-summary          # summary to stdout
cd sdk/python && pytest tests/ -v
cargo run -p mnemo-bench                    # all 12 benchmarks
cargo run -p mnemo-bench -- --filter graph  # graph benchmarks only
cargo run -p mnemo-bench -- --json out.json # save results to JSON

Current test counts: 122 Rust tests · 21 Python tests · 12 benchmarks

PRs welcome. Please run make fmt && make lint

before submitting. Open an issue first for large changes.

See CONTRIBUTING.md for full setup instructions, code style guide, and how to add a new LLM provider.

MIT — see LICENSE

── more in #ai-tools 4 stories · sorted by recency
sponsored brought to you by zahid.host 4,200+ EU-deployed projects
reading about agents? ship yours in a single git push.

Run your AI side-project on zahid.host

EU-based hosting, git-push deploys, automatic HTTPS, no cold starts. Free tier with a custom domain — perfect for shipping the agent you just read about.

$git push zahid main
Live at https://your-agent.zahid.host
Get free account → Pricing
from €0/mo · no card required
LIVE [news/show-hn-mnemo-local-…] indexed:0 read:6min 2026-06-03 ·