Commonplace: Self-hosted, privacy-tiered memory for your AI agents

Commonplace is a self-hosted, privacy-tiered memory system for AI agents. It uses a two-tier Graphiti knowledge graph that runs entirely on local hardware by default: a personal tier that can optionally use hosted models for non-confidential data, and a client-confidential tier that never leaves the machine.

It is a self-hosted, two-tier [Graphiti](https://github.com/getzep/graphiti) knowledge graph that MCP clients (for example Claude Code and Pi) read from and write to over a private [Tailscale](https://tailscale.com) network. It's offline-first: by default every part, including the LLM that extracts your graph, runs on your own hardware, so nothing leaves the box.

It runs on a single always-on Linux host with Docker and a consumer NVIDIA GPU. Your laptops and other devices are pure clients; they host nothing.

Knowledge-graph ingestion uses an LLM to extract entities and relationships from text. That extraction is the one step where your data would be exposed to a model, so by default commonplace does it locally, on your GPU, for both tiers. The two tiers split memory by confidentiality and by whether you're allowed to trade locality for quality:

| Tier | Graph | Extraction default | Where it runs | Use for |
|---|---|---|---|---|
| personal | `commonplace_personal` | `mistral:7b-instruct-q4_0` (local) | the host's GPU | your own notes, projects, life; optionally a hosted model for quality |
| client-confidential | `commonplace_client` | `mistral:7b-instruct-q4_0` (local) | the host's GPU | confidential / NDA material that must never leave the machine |

The personal tier is local by default but may be pointed at a hosted model (e.g. Claude Haiku) for higher-quality graphs on non-confidential data. Opt in via `.env` (see "Hosted upgrade?" under [Setup](#setup)). The client tier is always local; that's the whole point of it.

Retrieval is cheap and private on both tiers.
Search is embeddings + BM25 + graph traversal, with no LLM in the query path. The GPU only ever does slow, asynchronous background extraction, so query latency is never affected. Slow local extraction is therefore fine.

Both tiers share one embedder (Ollama `nomic-embed-text`, 768-dim) and one FalkorDB holding two separate graphs, so the two memories stay isolated but the infrastructure stays simple.

flowchart TB
CC "Claude Code