Show HN: Mnemo – local-first AI memory layer for any LLM (Rust, SQLite, petgraph)

Zayd Mulani released Mnemo, an open-source, local-first AI memory layer that builds a persistent knowledge graph from conversations using SQLite and petgraph. The sidecar service extracts named entities and relationships via any LLM, stores them locally, and injects relevant context into future prompts in under 50 milliseconds. Mnemo operates as a single static binary with zero cloud dependency, supporting Ollama, OpenAI, Anthropic, or any OpenAI-compatible API.
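The ingest, retrieve, inject loop described above can be sketched from the client side with nothing but the Python standard library. This is a minimal sketch, not Mnemo's SDK: the helper names (`build_ingest_request`, `build_retrieval_query`, `post_json`, `inject_context`, `round_trip`) are hypothetical, the field names are assumed to be snake_case versions of the request types listed in the project's README, and the `context_prompt` key on the `/retrieve` response is an assumption.

```python
import json
from urllib import request

BASE_URL = "http://localhost:8080"  # default base URL given in the README


def build_ingest_request(content, source="chat", session_id=None):
    """Body for POST /ingest (field names assumed from the README's IngestRequest)."""
    return {
        "content": content,
        "source": source,
        "session_id": session_id,
        "metadata": {},
    }


def build_retrieval_query(text, session_id=None, graph_depth=2):
    """Body for POST /retrieve, filled with the defaults the README documents."""
    return {
        "text": text,
        "session_id": session_id,
        "max_chunks": 10,
        "max_entities": 20,
        "min_confidence": 0.5,
        "include_graph": True,
        "graph_depth": graph_depth,
    }


def post_json(path, body, base_url=BASE_URL, timeout=5.0):
    """POST a JSON body and decode the JSON response."""
    req = request.Request(
        base_url + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def inject_context(system_prompt, context_prompt):
    """Put retrieved memory ahead of the system prompt (layout is hypothetical)."""
    if not context_prompt:
        return system_prompt
    return f"{context_prompt}\n\n{system_prompt}"


def round_trip(memory, question, system_prompt="You are a helpful assistant."):
    """Store one memory, then fetch context for a question (needs a running server)."""
    post_json("/ingest", build_ingest_request(memory))
    result = post_json("/retrieve", build_retrieval_query(question))
    # Assumption: the response exposes the assembled prompt as "context_prompt".
    return inject_context(system_prompt, result.get("context_prompt", ""))
```

With a server running locally, `round_trip("I'm building a Rust vector database called vecdb", "what am I working on?")` would return a system prompt with the retrieved memory placed in front of it. Keeping payload construction separate from the HTTP call makes the request shapes easy to check without a server.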

Local-first AI memory layer for any LLM. Persistent knowledge graph, entity extraction, semantic retrieval; no cloud required.

Most LLMs forget everything the moment a conversation ends. mnemo fixes that.

mnemo is a sidecar service that watches every conversation you feed it, extracts named entities and relationships using an LLM, builds a persistent knowledge graph in SQLite, and injects relevant context back into future prompts, automatically, in under 50ms.

It works with Ollama (fully local, free), OpenAI, Anthropic, or any OpenAI-compatible API. It ships as a single static binary with zero cloud dependency.

```
your app
    │
    ▼
POST /ingest ──► entity extraction (LLM) ──► knowledge graph (SQLite + petgraph)
                                                      │
POST /retrieve ◄── scoring + ranking ◄── graph traversal + full-text search
    │
    ▼
context_prompt ──► inject into your LLM prompt
```

- You POST raw text to `/ingest` (a conversation turn, a document, a note).
- mnemo sends it to your configured LLM and extracts entities (people, tools, places, concepts) and the relationships between them.
- Entities are deduplicated by name+type, aliases are merged, and everything is written to SQLite. The in-memory petgraph is updated atomically.
- On `POST /retrieve`, mnemo runs a 6-stage pipeline: full-text chunk search → entity name search → graph expansion (BFS over the knowledge graph) → relation filter → score+rank → assemble a `context_prompt` string.
- You inject `context_prompt` into your LLM's system prompt. Done.

```bash
git clone https://github.com/zaydmulani09/mnemo
cd mnemo
docker compose up -d

# Pull the llama3 model the first time (~4 GB)
docker exec mnemo-ollama ollama pull llama3

# Verify everything is healthy
curl http://localhost:8080/health
```

```bash
cargo install --path crates/mnemo-api

# With Ollama
export MNEMO_LLM_BASE_URL=http://localhost:11434/v1
mnemo-api

# With OpenAI
export MNEMO_LLM_BASE_URL=https://api.openai.com/v1
export MNEMO_LLM_API_KEY=sk-...
export MNEMO_LLM_MODEL=gpt-4o-mini
export MNEMO_LLM_PROVIDER=openai
mnemo-api
```

```bash
pip install mnemo-sdk
```

```python
from mnemo import MnemoClient

client = MnemoClient()  # server at http://localhost:8080

# Store a memory
client.ingest("I'm building a Rust vector database called vecdb")

# Get context for injection into your next LLM prompt
print(client.get_context("what am I working on?"))
```

All endpoints accept and return `application/json`. Base URL: `http://localhost:8080`.

| Method | Path | Description | Request body | Response |
|---|---|---|---|---|
| GET | `/health` | Server + DB + LLM status | (none) | `HealthResponse` |
| POST | `/ingest` | Store text, extract entities | `IngestRequest` | `IngestResponse` |
| POST | `/retrieve` | Retrieve ranked memory context | `RetrievalQuery` | `RetrievalResult` |
| GET | `/entities` | List entities (paginated) | `?limit&offset` | `Entity` |
| GET | `/entities/:id` | Get entity by UUID | (none) | `Entity` |
| DELETE | `/entities/:id` | Delete entity (cascades) | (none) | `{"deleted":true}` |
| GET | `/entities/:id/neighbors` | Knowledge graph neighbors | `?depth` (max 5) | `GraphNode` |
| GET | `/chunks` | List memory chunks (paginated) | `?limit&offset&session_id` | `MemoryChunk` |
| GET | `/chunks/:id` | Get chunk by UUID | (none) | `MemoryChunk` |
| DELETE | `/chunks/:id` | Delete chunk | (none) | `{"deleted":true}` |
| POST | `/search` | Full-text search (entities + chunks) | `{"query","limit"}` | `{"entities","chunks"}` |
| DELETE | `/wipe` | Delete all memory (irreversible) | header: `X-Confirm-Wipe: true` | `{"wiped":true}` |
| GET | `/stats` | Entity/chunk/graph counts + uptime | (none) | `StatsResponse` |

Key request/response types:

```json
// IngestRequest
{
  "content": "string",          // required: text to store
  "source": "string",           // required: e.g. "chat", "email", "cli"
  "session_id": "string|null",  // optional: group related chunks
  "metadata": {}                // optional: arbitrary JSON
}

// RetrievalQuery
{
  "text": "string",             // required: query text
  "session_id": "string|null",  // optional: filter by session
  "max_chunks": 10,             // default 10
  "max_entities": 20,           // default 20
  "min_confidence": 0.5,        // default 0.5
  "include_graph": true,        // default true; expand via knowledge graph
  "graph_depth": 2              // default 2; BFS depth for graph expansion
}
```

Full endpoint documentation with curl examples: docs/api.md

| Variable | Default | Description |
|---|---|---|
| `MNEMO_DB_PATH` | `mnemo.db` | SQLite database file path |
| `MNEMO_PORT` | `8080` | API server port |
| `MNEMO_LLM_BASE_URL` | `http://localhost:11434/v1` | OpenAI-compatible LLM base URL |
| `MNEMO_LLM_MODEL` | `llama3` | Model name for entity extraction |
| `MNEMO_LLM_API_KEY` | `ollama` | API key (any value works for Ollama) |
| `MNEMO_LLM_PROVIDER` | `ollama` | Provider type: `ollama`, `openai`, `anthropic`, `custom` |

Pass `--config path/to/config.toml` to `mnemo-api`. See `mnemo.example.toml`:

```toml
db_path = "mnemo.db"
port = 8080

[llm]
provider = "ollama"
base_url = "http://localhost:11434/v1"
model = "llama3"
api_key = "ollama"
timeout_secs = 30
max_retries = 3
max_tokens = 2048
temperature = 0.1
```

Environment variables take precedence over TOML values. The active config source is reported in `GET /health` → `config_source`.

Install:

```bash
cargo install --path crates/mnemo-cli
```

Usage:

```bash
# Store a memory
mnemo ingest "I use Neovim and prefer dark mode"

# Retrieve relevant context
mnemo search "what editor do I use?"

# List all extracted entities
mnemo entities

# Show entity detail + graph neighbors
mnemo entity <uuid> --neighbors

# List memory chunks
mnemo chunks

# Server health
mnemo health

# Memory statistics
mnemo stats

# Delete everything (prompts for confirmation)
mnemo wipe

# Skip confirmation prompt
mnemo wipe --yes

# Point at a non-default server
mnemo --server http://192.168.1.10:8080 stats
```

Install:

```bash
pip install mnemo-sdk
```

See sdk/python/README.md for the full API reference.

Async example:

```python
import asyncio
from mnemo import AsyncMnemoClient

async def main():
    async with AsyncMnemoClient() as client:
        await client.ingest(
            "Alice is a principal engineer at Stripe working on payment infrastructure.",
            session_id="session-001",
        )
        context = await client.get_context(
            "what does Alice work on?",
            session_id="session-001",
        )
        print(context)

asyncio.run(main())
```

A working standalone example: examples/basic_usage.py

Four Rust crates wired together:

| Crate | Type | Role |
|---|---|---|
| `mnemo-core` | lib | Entity extraction, graph ops, retrieval engine, DB layer |
| `mnemo-api` | bin | Axum REST API: thin handler layer over mnemo-core |
| `mnemo-cli` | bin | CLI tool using blocking reqwest against the API |
| `mnemo-bench` | bin | Performance benchmarks (12 suites) |

Full architecture documentation: docs/architecture.md

Benchmarked on Apple M2, SQLite WAL mode, in-memory petgraph. Debug build numbers; a release build (`--release`) is 3–5× faster.

| Operation | Avg latency | Throughput |
|---|---|---|
| Entity insert (SQLite) | ~0.12 ms | ~8,300 ops/s |
| Entity lookup by ID | ~0.08 ms | ~12,500 ops/s |
| Chunk insert | ~0.14 ms | ~7,100 ops/s |
| Full-text chunk search | ~0.28 ms | ~3,500 ops/s |
| Graph neighbor (depth=1) | ~0.21 ms | ~4,700 ops/s |
| Graph neighbor (depth=2) | ~0.89 ms | ~1,100 ops/s |
| Full retrieval pipeline | ~4.2 ms | ~238 ops/s |

Run `cargo run -p mnemo-bench` to benchmark on your hardware.
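The throughput column in the benchmark table appears to be the reciprocal of the average latency, which is what a single caller issuing one operation at a time would see. That reading is an inference from the numbers, not something the README states. A short sketch that redoes the arithmetic and compares the full pipeline against the 50 ms target quoted earlier (the names `LATENCY_MS`, `ops_per_second` and `BUDGET_MS` are illustrative):

```python
# Average latencies in milliseconds, copied from the benchmark table.
LATENCY_MS = {
    "entity insert": 0.12,
    "entity lookup by id": 0.08,
    "chunk insert": 0.14,
    "full-text chunk search": 0.28,
    "graph neighbor depth=1": 0.21,
    "graph neighbor depth=2": 0.89,
    "full retrieval pipeline": 4.2,
}


def ops_per_second(avg_latency_ms):
    """Sequential throughput implied by an average per-call latency."""
    return 1000.0 / avg_latency_ms


for name, ms in LATENCY_MS.items():
    print(f"{name:<26} {ms:>5.2f} ms -> ~{ops_per_second(ms):,.0f} ops/s")

# The project advertises context injection in under 50 ms.
BUDGET_MS = 50.0
headroom = BUDGET_MS / LATENCY_MS["full retrieval pipeline"]
print(f"retrieval uses 1/{headroom:.1f} of the {BUDGET_MS:.0f} ms budget")
```

These figures cover the local retrieval path only. Ingest also calls the configured LLM for entity extraction, and that call is not part of any row in the table.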
```bash
cargo test --workspace                        # run all 122 tests
make coverage                                 # HTML coverage report (requires cargo-llvm-cov)
make coverage-summary                         # summary to stdout
cd sdk/python && pytest tests/ -v
cargo run -p mnemo-bench                      # all 12 benchmarks
cargo run -p mnemo-bench -- --filter graph    # graph benchmarks only
cargo run -p mnemo-bench -- --json out.json   # save results to JSON
```

Current test counts: 122 Rust tests · 21 Python tests · 12 benchmarks

PRs welcome. Please run `make fmt && make lint` before submitting. Open an issue first for large changes. See CONTRIBUTING.md for full setup instructions, code style guide, and how to add a new LLM provider.

MIT. See LICENSE.