Show HN: Mnemo – local-first AI memory layer for any LLM (Rust, SQLite, petgraph)

Zayd Mulani released Mnemo, an open-source, local-first AI memory layer that builds a persistent knowledge graph from conversations using SQLite and petgraph. The sidecar service extracts named entities and relationships via any LLM, stores them locally, and injects relevant context into future prompts in under 50 milliseconds. Mnemo operates as a single static binary with zero cloud dependency, supporting Ollama, OpenAI, Anthropic, or any OpenAI-compatible API.

Local-first AI memory layer for any LLM. Persistent knowledge graph, entity extraction, semantic retrieval: no cloud required.

Most LLMs forget everything the moment a conversation ends. mnemo fixes that.

mnemo is a sidecar service that watches every conversation you feed it, extracts named entities and relationships using an LLM, builds a persistent knowledge graph in SQLite, and injects relevant context back into future prompts automatically, in under 50ms.

It works with Ollama (fully local, free), OpenAI, Anthropic, or any OpenAI-compatible API. It ships as a single static binary with zero cloud dependency.

```
your app
    │
    ▼
POST /ingest ──► entity extraction (LLM) ──► knowledge graph (SQLite + petgraph)
                                                   │
POST /retrieve ◄── scoring + ranking ◄── graph traversal + full-text search
    │
    ▼
context_prompt ──► inject into your LLM prompt
```

- You POST raw text to `/ingest` (a conversation turn, a document, a note).
- mnemo sends it to your configured LLM and extracts entities (people, tools, places, concepts) and the relationships between them.
- Entities are deduplicated by name+type, aliases are merged, and everything is written to SQLite. The in-memory petgraph is updated atomically.
- On `POST /retrieve`, mnemo runs a 6-stage pipeline: full-text chunk search → entity name search → graph expansion (BFS over the knowledge graph) → relation filter → score+rank → assemble a `context_prompt` string.
- You inject `context_prompt` into your LLM's system prompt. Done.

```bash
git clone https://github.com/zaydmulani09/mnemo
cd mnemo
docker compose up -d

# Pull the llama3 model the first time (~4 GB)
docker exec mnemo-ollama ollama pull llama3

# Verify everything is healthy
curl http://localhost:8080/health
```

```bash
cargo install --path crates/mnemo-api

# With Ollama
export MNEMO_LLM_BASE_URL=http://localhost:11434/v1
mnemo-api

# With OpenAI
export MNEMO_LLM_BASE_URL=https://api.openai.com/v1
export MNEMO_LLM_API_KEY=sk-...
export MNEMO_LLM_MODEL=gpt-4o-mini
export MNEMO_LLM_PROVIDER=openai
mnemo-api
```

```bash
pip install mnemo-sdk
```

```python
from mnemo import MnemoClient

client = MnemoClient()  # server at http://localhost:8080

# Store a memory
client.ingest("I'm building a Rust vector database called vecdb")

# Get context for injection into your next LLM prompt
print(client.get_context("what am I working on?"))
```

All endpoints accept and return `application/json`. Base URL: `http://localhost:8080`.

| Method | Path | Description | Request body | Response |
|---|---|---|---|---|
| GET | `/health` | Server + DB + LLM status | — | `HealthResponse` |
| POST | `/ingest` | Store text, extract entities | `IngestRequest` | `IngestResponse` |
| POST | `/retrieve` | Retrieve ranked memory context | `RetrievalQuery` | `RetrievalResult` |
| GET | `/entities` | List entities (paginated) | `?limit&offset` | `Entity` |
| GET | `/entities/:id` | Get entity by UUID | — | `Entity` |
| DELETE | `/entities/:id` | Delete entity (cascades) | — | `{"deleted":true}` |
| GET | `/entities/:id/neighbors` | Knowledge graph neighbors | `?depth` (max 5) | `GraphNode` |
| GET | `/chunks` | List memory chunks (paginated) | `?limit&offset&session_id` | `MemoryChunk` |
| GET | `/chunks/:id` | Get chunk by UUID | — | `MemoryChunk` |
| DELETE | `/chunks/:id` | Delete chunk | — | `{"deleted":true}` |
| POST | `/search` | Full-text search (entities + chunks) | `{"query","limit"}` | `{"entities","chunks"}` |
| DELETE | `/wipe` | Delete all memory (irreversible) | header: `X-Confirm-Wipe: true` | `{"wiped":true}` |
| GET | `/stats` | Entity/chunk/graph counts + uptime | — | `StatsResponse` |

Key request/response types:

```jsonc
// IngestRequest
{
  "content": "string",         // required — text to store
  "source": "string",          // required — e.g. "chat", "email", "cli"
  "session_id": "string|null", // optional — group related chunks
  "metadata": {}               // optional — arbitrary JSON
}

// RetrievalQuery
{
  "text": "string",            // required — query text
  "session_id": "string|null", // optional — filter by session
  "max_chunks": 10,            // default 10
  "max_entities": 20,          // default 20
  "min_confidence": 0.5,       // default 0.5
  "include_graph": true,       // default true — expand via knowledge graph
  "graph_depth": 2             // default 2 — BFS depth for graph expansion
}
```

Full endpoint documentation with curl examples: `docs/api.md`

| Variable | Default | Description |
|---|---|---|
| `MNEMO_DB_PATH` | `mnemo.db` | SQLite database file path |
| `MNEMO_PORT` | `8080` | API server port |
| `MNEMO_LLM_BASE_URL` | `http://localhost:11434/v1` | OpenAI-compatible LLM base URL |
| `MNEMO_LLM_MODEL` | `llama3` | Model name for entity extraction |
| `MNEMO_LLM_API_KEY` | `ollama` | API key (any value works for Ollama) |
| `MNEMO_LLM_PROVIDER` | `ollama` | Provider type: `ollama`, `openai`, `anthropic`, `custom` |

Pass `--config path/to/config.toml` to `mnemo-api`. See `mnemo.example.toml`:

```toml
db_path = "mnemo.db"
port = 8080

[llm]
provider = "ollama"
base_url = "http://localhost:11434/v1"
model = "llama3"
api_key = "ollama"
timeout_secs = 30
max_retries = 3
max_tokens = 2048
temperature = 0.1
```

Environment variables take precedence over TOML values. The active config source is reported in `GET /health` → `config_source`.

Install:

```bash
cargo install --path crates/mnemo-cli
```

Usage:

```bash
# Store a memory
mnemo ingest "I use Neovim and prefer dark mode"

# Retrieve relevant context
mnemo search "what editor do I use?"

# List all extracted entities
mnemo entities

# Show entity detail + graph neighbors
mnemo entity
```
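
If you would rather call the HTTP API directly than go through the SDK or the CLI, the `IngestRequest` and `RetrievalQuery` types listed earlier map onto plain JSON bodies. The following is a minimal sketch using only the Python standard library. The helper names (`build_ingest_request`, `build_retrieval_query`, `post_json`) are hypothetical and not part of mnemo, the underscore field names are assumed from the type listing, and the shape of the responses is not confirmed here.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080"  # default base URL from the API reference


def build_ingest_request(content, source, session_id=None, metadata=None):
    """Build an IngestRequest body; content and source are the required fields."""
    if not content or not source:
        raise ValueError("content and source are required")
    body = {"content": content, "source": source}
    if session_id is not None:
        body["session_id"] = session_id
    if metadata is not None:
        body["metadata"] = metadata
    return body


def build_retrieval_query(text, session_id=None, **overrides):
    """Build a RetrievalQuery body, starting from the documented defaults."""
    body = {
        "text": text,
        "session_id": session_id,
        "max_chunks": 10,
        "max_entities": 20,
        "min_confidence": 0.5,
        "include_graph": True,
        "graph_depth": 2,
    }
    # Reject field names that are not in the documented RetrievalQuery type.
    unknown = set(overrides) - set(body)
    if unknown:
        raise ValueError(f"unknown RetrievalQuery fields: {sorted(unknown)}")
    body.update(overrides)
    return body


def post_json(path, body, base_url=BASE_URL, timeout=30):
    """POST a JSON body and decode the JSON response (needs a running server)."""
    request = urllib.request.Request(
        base_url + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With a server running, `post_json("/ingest", build_ingest_request("I use Neovim", "chat"))` stores a memory, and `post_json("/retrieve", build_retrieval_query("what editor do I use?", graph_depth=1))` should return the ranked result from which the context string is taken. Keeping the body builders separate from the network call makes the defaults easy to check without a server.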