{"slug": "show-hn-mnemo-local-first-ai-memory-layer-for-any-llm-rust-sqlite-petgraph", "title": "Show HN: Mnemo – local-first AI memory layer for any LLM (Rust, SQLite,petgraph)", "summary": "Zayd Mulani released Mnemo, an open-source, local-first AI memory layer that builds a persistent knowledge graph from conversations using SQLite and petgraph. The sidecar service extracts named entities and relationships via any LLM, stores them locally, and injects relevant context into future prompts in under 50 milliseconds. Mnemo operates as a single static binary with zero cloud dependency, supporting Ollama, OpenAI, Anthropic, or any OpenAI-compatible API.", "body_md": "Local-first AI memory layer for any LLM. Persistent knowledge graph, entity extraction, semantic retrieval — no cloud required.\n\nMost LLMs forget everything the moment a conversation ends. mnemo fixes that.\n\nmnemo is a sidecar service that watches every conversation you feed it, extracts named entities and relationships using an LLM, builds a persistent knowledge graph in SQLite, and injects relevant context back into future prompts — automatically, in under 50ms. It works with **Ollama** (fully local, free), OpenAI, Anthropic, or any OpenAI-compatible API. It ships as a single static binary with zero cloud dependency.\n\n```\n  your app\n     │\n     ▼\n  POST /ingest ──► entity extraction (LLM) ──► knowledge graph (SQLite + petgraph)\n                                                        │\n  POST /retrieve ◄── scoring + ranking ◄── graph traversal + full-text search\n     │\n     ▼\n  context_prompt  ──► inject into your LLM prompt\n```\n\n- You POST raw text to `/ingest` (a conversation turn, a document, a note).\n- mnemo sends it to your configured LLM and extracts entities (people, tools, places, concepts) and the relationships between them.\n- Entities are deduplicated by name+type, aliases are merged, and everything is written to SQLite. 
The in-memory petgraph is updated atomically.\n- On POST `/retrieve`, mnemo runs a 6-stage pipeline: full-text chunk search → entity name search → graph expansion (BFS over the knowledge graph) → relation filter → score+rank → assemble a `context_prompt` string.\n- You inject `context_prompt` into your LLM's system prompt. Done.\n\n```\ngit clone https://github.com/zaydmulani09/mnemo\ncd mnemo\ndocker compose up -d\n\n# Pull the llama3 model the first time (~4 GB)\ndocker exec mnemo-ollama ollama pull llama3\n\n# Verify everything is healthy\ncurl http://localhost:8080/health\n```\n\n```\ncargo install --path crates/mnemo-api\n\n# With Ollama\nexport MNEMO_LLM_BASE_URL=http://localhost:11434/v1\nmnemo-api\n\n# With OpenAI\nexport MNEMO_LLM_BASE_URL=https://api.openai.com/v1\nexport MNEMO_LLM_API_KEY=sk-...\nexport MNEMO_LLM_MODEL=gpt-4o-mini\nexport MNEMO_LLM_PROVIDER=openai\nmnemo-api\n```\n\n```\npip install mnemo-sdk\n```\n\n``` python\nfrom mnemo import MnemoClient\n\nclient = MnemoClient()  # server at http://localhost:8080\n\n# Store a memory\nclient.ingest(\"I'm building a Rust vector database called vecdb\")\n\n# Get context for injection into your next LLM prompt\nprint(client.get_context(\"what am I working on?\"))\n```\n\nAll endpoints accept and return `application/json`. 
Base URL: `http://localhost:8080`.\n\n| Method | Path | Description | Request body | Response |\n|---|---|---|---|---|\n| `GET` | `/health` | Server + DB + LLM status | — | `HealthResponse` |\n| `POST` | `/ingest` | Store text, extract entities | `IngestRequest` | `IngestResponse` |\n| `POST` | `/retrieve` | Retrieve ranked memory context | `RetrievalQuery` | `RetrievalResult` |\n| `GET` | `/entities` | List entities (paginated) | `?limit&offset` | `Entity[]` |\n| `GET` | `/entities/:id` | Get entity by UUID | — | `Entity` |\n| `DELETE` | `/entities/:id` | Delete entity (cascades) | — | `{\"deleted\":true}` |\n| `GET` | `/entities/:id/neighbors` | Knowledge graph neighbors | `?depth` (max 5) | `GraphNode[]` |\n| `GET` | `/chunks` | List memory chunks (paginated) | `?limit&offset&session_id` | `MemoryChunk[]` |\n| `GET` | `/chunks/:id` | Get chunk by UUID | — | `MemoryChunk` |\n| `DELETE` | `/chunks/:id` | Delete chunk | — | `{\"deleted\":true}` |\n| `POST` | `/search` | Full-text search entities + chunks | `{\"query\",\"limit\"}` | `{\"entities\",\"chunks\"}` |\n| `DELETE` | `/wipe` | Delete all memory (irreversible) | header: `X-Confirm-Wipe: true` | `{\"wiped\":true}` |\n| `GET` | `/stats` | Entity/chunk/graph counts + uptime | — | `StatsResponse` |\n\n**Key request/response types:**\n\n```\n// IngestRequest\n{\n  \"content\": \"string\",         // required — text to store\n  \"source\":  \"string\",         // required — e.g. 
\"chat\", \"email\", \"cli\"\n  \"session_id\": \"string|null\", // optional — group related chunks\n  \"metadata\": {}               // optional — arbitrary JSON\n}\n\n// RetrievalQuery\n{\n  \"text\": \"string\",            // required — query text\n  \"session_id\": \"string|null\", // optional — filter by session\n  \"max_chunks\": 10,            // default 10\n  \"max_entities\": 20,          // default 20\n  \"min_confidence\": 0.5,       // default 0.5\n  \"include_graph\": true,       // default true — expand via knowledge graph\n  \"graph_depth\": 2             // default 2 — BFS depth for graph expansion\n}\n```\n\nFull endpoint documentation with curl examples: `docs/api.md`\n\n| Variable | Default | Description |\n|---|---|---|\n| `MNEMO_DB_PATH` | `mnemo.db` | SQLite database file path |\n| `MNEMO_PORT` | `8080` | API server port |\n| `MNEMO_LLM_BASE_URL` | `http://localhost:11434/v1` | OpenAI-compatible LLM base URL |\n| `MNEMO_LLM_MODEL` | `llama3` | Model name for entity extraction |\n| `MNEMO_LLM_API_KEY` | `ollama` | API key (any value works for Ollama) |\n| `MNEMO_LLM_PROVIDER` | `ollama` | Provider type: `ollama`, `openai`, `anthropic`, `custom` |\n\nPass `--config path/to/config.toml` to `mnemo-api`. See `mnemo.example.toml`:\n\n```\ndb_path = \"mnemo.db\"\nport = 8080\n\n[llm]\nprovider = \"ollama\"\nbase_url = \"http://localhost:11434/v1\"\nmodel = \"llama3\"\napi_key = \"ollama\"\ntimeout_secs = 30\nmax_retries = 3\nmax_tokens = 2048\ntemperature = 0.1\n```\n\nEnvironment variables take precedence over TOML values. 
The active config source is reported in `GET /health` → `config_source`.\n\nInstall:\n\n```\ncargo install --path crates/mnemo-cli\n```\n\nUsage:\n\n```\n# Store a memory\nmnemo ingest \"I use Neovim and prefer dark mode\"\n\n# Retrieve relevant context\nmnemo search \"what editor do I use?\"\n\n# List all extracted entities\nmnemo entities\n\n# Show entity detail + graph neighbors\nmnemo entity <uuid> --neighbors\n\n# List memory chunks\nmnemo chunks\n\n# Server health\nmnemo health\n\n# Memory statistics\nmnemo stats\n\n# Delete everything (prompts for confirmation)\nmnemo wipe\n\n# Skip confirmation prompt\nmnemo wipe --yes\n\n# Point at a non-default server\nmnemo --server http://192.168.1.10:8080 stats\n```\n\nInstall:\n\n```\npip install mnemo-sdk\n```\n\nSee [sdk/python/README.md](/zaydmulani09/mnemo/blob/main/sdk/python/README.md) for the full API reference.\n\n**Async example:**\n\n``` python\nimport asyncio\nfrom mnemo import AsyncMnemoClient\n\nasync def main():\n    async with AsyncMnemoClient() as client:\n        await client.ingest(\n            \"Alice is a principal engineer at Stripe working on payment infrastructure.\",\n            session_id=\"session-001\",\n        )\n        context = await client.get_context(\n            \"what does Alice work on?\",\n            session_id=\"session-001\",\n        )\n        print(context)\n\nasyncio.run(main())\n```\n\nA working standalone example: `examples/basic_usage.py`\n\nFour Rust crates wired together:\n\n| Crate | Type | Role |\n|---|---|---|\n| `mnemo-core` | lib | Entity extraction, graph ops, retrieval engine, DB layer |\n| `mnemo-api` | bin | Axum REST API — thin handler layer over mnemo-core |\n| `mnemo-cli` | bin | CLI tool using blocking reqwest against the API |\n| `mnemo-bench` | bin | Performance benchmarks (12 suites) |\n\nFull architecture documentation: `docs/architecture.md`\n\nBenchmarked on Apple M2, SQLite WAL mode, in-memory petgraph. 
Numbers are from a debug build; a release build (`--release`) is 3–5× faster.\n\n| Operation | Avg latency | Throughput |\n|---|---|---|\n| Entity insert (SQLite) | ~0.12 ms | ~8,300 ops/s |\n| Entity lookup by ID | ~0.08 ms | ~12,500 ops/s |\n| Chunk insert | ~0.14 ms | ~7,100 ops/s |\n| Full-text chunk search | ~0.28 ms | ~3,500 ops/s |\n| Graph neighbor (depth=1) | ~0.21 ms | ~4,700 ops/s |\n| Graph neighbor (depth=2) | ~0.89 ms | ~1,100 ops/s |\n| Full retrieval pipeline | ~4.2 ms | ~238 ops/s |\n\nRun `cargo run -p mnemo-bench` to benchmark on your hardware.\n\n```\ncargo test --workspace          # run all 122 tests\nmake coverage                  # HTML coverage report (requires cargo-llvm-cov)\nmake coverage-summary          # summary to stdout\ncd sdk/python && pytest tests/ -v\ncargo run -p mnemo-bench                    # all 12 benchmarks\ncargo run -p mnemo-bench -- --filter graph  # graph benchmarks only\ncargo run -p mnemo-bench -- --json out.json # save results to JSON\n```\n\nCurrent test counts: **122 Rust tests** · **21 Python tests** · **12 benchmarks**\n\nPRs welcome. 
Please run `make fmt && make lint` before submitting. Open an issue first for large changes.\n\nSee [CONTRIBUTING.md](/zaydmulani09/mnemo/blob/main/CONTRIBUTING.md) for full setup instructions, code style guide, and how to add a new LLM provider.\n\nMIT — see [LICENSE](/zaydmulani09/mnemo/blob/main/LICENSE)", "url": "https://wpnews.pro/news/show-hn-mnemo-local-first-ai-memory-layer-for-any-llm-rust-sqlite-petgraph", "canonical_source": "https://github.com/zaydmulani09/mnemo", "published_at": "2026-06-03 20:32:10+00:00", "updated_at": "2026-06-03 20:49:02.594459+00:00", "lang": "en", "topics": ["ai-tools", "ai-infrastructure", "large-language-models", "ai-products", "ai-agents"], "entities": ["Mnemo", "Ollama", "OpenAI", "Anthropic", "SQLite", "petgraph"], "alternates": {"html": "https://wpnews.pro/news/show-hn-mnemo-local-first-ai-memory-layer-for-any-llm-rust-sqlite-petgraph", "markdown": "https://wpnews.pro/news/show-hn-mnemo-local-first-ai-memory-layer-for-any-llm-rust-sqlite-petgraph.md", "text": "https://wpnews.pro/news/show-hn-mnemo-local-first-ai-memory-layer-for-any-llm-rust-sqlite-petgraph.txt", "jsonld": "https://wpnews.pro/news/show-hn-mnemo-local-first-ai-memory-layer-for-any-llm-rust-sqlite-petgraph.jsonld"}}