Commonplace: Self-hosted, privacy-tiered memory for your AI agents

Commonplace launches a self-hosted, privacy-tiered memory system for AI agents, using a two-tier Graphiti knowledge graph that runs entirely on local hardware by default, with a personal tier optionally using hosted models for non-confidential data and a client-confidential tier that never leaves the machine.

A self-hosted, two-tier [Graphiti](https://github.com/getzep/graphiti) knowledge graph that MCP clients (for example Claude Code and Pi) read from and write to over a private [Tailscale](https://tailscale.com) network. It's offline-first: by default every part, including the LLM that extracts your graph, runs on your own hardware, so nothing leaves the box.

It runs on a single always-on Linux host with Docker and a consumer NVIDIA GPU. Your laptops and other devices are pure clients; they host nothing.

Knowledge-graph ingestion uses an LLM to extract entities and relationships from text. That extraction is where your data would be exposed to a model, so by default commonplace does it locally, on your GPU, for both tiers. The two tiers split memory by confidentiality and by whether you're allowed to trade locality for quality:

| Tier | Graph | Extraction default | Where it runs | Use for |
|---|---|---|---|---|
| personal | `commonplace_personal` | `mistral:7b-instruct-q4_0` (local) | the host's GPU | your own notes, projects, life; optionally a hosted model for quality |
| client-confidential | `commonplace_client` | `mistral:7b-instruct-q4_0` (local) | the host's GPU | confidential / NDA material that must never leave the machine |

The personal tier is local by default but may be pointed at a hosted model (e.g. Claude Haiku) for higher-quality graphs on non-confidential data; opt in via `.env` (see "Hosted upgrade?" under Setup). The client tier is always local; that's the whole point of it.

Retrieval is cheap and private on both tiers. Search is embeddings + BM25 + graph traversal with no LLM in the query path. The GPU only ever does slow, asynchronous background extraction, so query latency is never affected and slow local extraction is fine.

Both tiers share one embedder (Ollama `nomic-embed-text`, 768-dim) and one FalkorDB holding two separate graphs, so the two memories stay isolated but the infrastructure stays simple.
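To make the LLM-free query path concrete, here is a minimal sketch of reciprocal rank fusion (RRF), one common way to merge ranked lists from embedding search, BM25 and graph traversal. It is an illustration only, not Graphiti's code: the function name, the node ids and the constant `k=60` are assumptions chosen for the example.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several best-first lists of ids into one best-first list.

    Each list contributes 1 / (k + rank) to an id's score, so ids that
    rank well in several lists rise to the top. No model call is involved.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, item in enumerate(ranking, start=1):
            scores[item] += 1.0 / (k + rank)
    # Highest score first; ties broken by id so the order is deterministic.
    return sorted(scores, key=lambda item: (-scores[item], item))


# Hypothetical result lists from the three retrievers (ids are made up).
vector_hits = ["n3", "n1", "n7"]
bm25_hits = ["n1", "n9", "n3"]
graph_hits = ["n1", "n3"]

fused = reciprocal_rank_fusion([vector_hits, bm25_hits, graph_hits])
print(fused)
```

Because the fusion is plain arithmetic over ranks, query latency depends on the index lookups rather than on the GPU, which is why slow background extraction does not affect reads.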
    flowchart TB
        CC["Claude Code<br/>(client)"]
        PI["Pi<br/>(client)"]
        TS{{"Tailscale<br/>MagicDNS · tailnet-only"}}
        ANT["Anthropic API<br/>Claude Haiku 4.5 · hosted"]

        CC --> TS
        PI --> TS

        subgraph HOST["your server · Docker"]
            direction TB
            GW["<b>gateway</b> :8000 / :8001<br/>per-tier auth · logging · metrics"]
            MP["<b>mcp-personal</b><br/>personal tier · internal"]
            MC["<b>mcp-client</b><br/>client-confidential · internal"]
            GW --> MP
            GW --> MC
            OL["<b>Ollama</b> :11434<br/>nomic-embed-text · mistral:7b<br/>local GPU"]
            subgraph FALKOR["FalkorDB :6379 · browser UI :3000"]
                direction LR
                GP["commonplace_personal"]
                GC["commonplace_client"]
            end
            MP -->|store| GP
            MC -->|store| GC
            MP -. embed .-> OL
            MC -. embed .-> OL
            MC -->|extract · local| OL
        end

        TS -->|Bearer token| GW
        MP -->|extract · hosted| ANT

        classDef ext fill:#fff3e0,stroke:#e67e22,color:#111;
        classDef tier fill:#e8f0fe,stroke:#4285f4,color:#111;
        class ANT ext
        class MP,MC tier

- One FalkorDB, two graphs, selected per-instance by `FALKORDB_DATABASE` (`commonplace_personal` vs `commonplace_client`).
- Two Graphiti MCP instances (`commonplace-mcp:local`, built from `zepai/knowledge-graph-mcp:standalone`; see `Dockerfile`), HTTP transport, served at path `/mcp/` (trailing slash).
- One shared Ollama embedder (`nomic-embed-text`, 768-dim) used by both instances. Do not mix embedders: vectors from different embedders are not comparable.
- A gateway (Caddy) fronts both tiers: it owns the host ports, requires a per-tier bearer token (so a client with only the client token can't reach the personal tier), and emits access logs (audit) + Prometheus metrics. The MCP containers themselves are internal-only.

Replace `your-server.your-tailnet.ts.net` with your host's Tailscale MagicDNS name throughout (run `tailscale status` on the host to find it).
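The per-tier token rule that the gateway enforces can be sketched in a few lines. This is not the Caddy configuration, only a hedged illustration of the logic: the token values are placeholders, `is_authorized` is a hypothetical helper, and the port-to-tier mapping (8000 personal, 8001 client) follows the endpoint table.

```python
import hmac

# Placeholder tokens for illustration; real values would come from
# PERSONAL_TOKEN / CLIENT_TOKEN in .env.
TIER_TOKENS = {
    8000: ("personal", "personal-secret"),
    8001: ("client", "client-secret"),
}


def is_authorized(port, authorization_header):
    """Return True only if the header carries the token for that port's tier."""
    if port not in TIER_TOKENS:
        return False
    _tier, token = TIER_TOKENS[port]
    supplied = (authorization_header or "").encode()
    expected = f"Bearer {token}".encode()
    # Constant-time comparison avoids leaking how much of the token matched.
    return hmac.compare_digest(supplied, expected)


# A client holding only the client token cannot reach the personal tier.
print(is_authorized(8001, "Bearer client-secret"))   # client tier, own token
print(is_authorized(8000, "Bearer client-secret"))   # personal tier, wrong token
print(is_authorized(8000, None))                     # no header at all
```

Keeping one token per port is what lets a machine that handles confidential work be configured with the client endpoint alone.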
| Tier | Host endpoint (tailnet) | Internal port | Graph (`FALKORDB_DATABASE`) | LLM | `SEMAPHORE_LIMIT` |
|---|---|---|---|---|---|
| personal | `http://your-server.your-tailnet.ts.net:8000/mcp/` | 8000 | `commonplace_personal` | `mistral:7b…` (local, default) | 1 |
| client | `http://your-server.your-tailnet.ts.net:8001/mcp/` | 8000 | `commonplace_client` | `mistral:7b-instruct-q4_0` | 1 |
| FalkorDB | `127.0.0.1:6379` (host-local only) | 6379 | both graphs | n/a | n/a |
| FalkorDB UI | `http://your-server.your-tailnet.ts.net:3000` | 3000 | browse either graph | n/a | n/a |
| Metrics | `127.0.0.1:9180/metrics` (host-local only) | 9180 | gateway (Prometheus) | n/a | n/a |

The personal/client endpoints require `Authorization: Bearer <tier-token>` (set `PERSONAL_TOKEN` / `CLIENT_TOKEN` in `.env`). A request without the right token gets `401`.

On the host:

- Docker with Compose v2.
- [Ollama](https://ollama.com) running on the host, serving the shared embedder and the local extraction model. The MCP containers reach it over HTTP; the GPU is used by Ollama, not by the containers, so no GPU passthrough into Docker is required. A consumer NVIDIA GPU with ~8 GB VRAM runs `mistral:7b-instruct-q4_0` comfortably; CPU-only works but local extraction is slow.
- [Tailscale](https://tailscale.com): the MCP endpoints are served over the tailnet, not the public internet.
- No API keys required. Both tiers extract locally by default. An Anthropic API key is needed only if you opt the personal tier into a hosted model (see "Hosted upgrade?" below).

On each client (laptop, etc.): Tailscale, plus an MCP-capable client (Claude Code, Pi, …).

Run on the host, from a clone of this repo (e.g. `~/commonplace`):

    # 1. Pull the models Ollama will serve
    ollama pull nomic-embed-text
    ollama pull mistral:7b-instruct-q4_0

    # 2. Configure secrets
    cp .env.example .env
    # edit .env and set:
    #   FALKORDB_PASSWORD               (openssl rand -hex 24)
    #   PERSONAL_TOKEN / CLIENT_TOKEN   gateway bearer tokens (openssl rand -hex 32 each)
    #   (no ANTHROPIC_API_KEY needed; extraction is local by default)

    # 3. Build the local image and start the stack
    docker compose up -d
    docker compose ps   # all services should report healthy

Then point a client at the two endpoints; see Client configuration.

Hosted upgrade? Everything is local by default. To point the personal tier at a hosted model for higher-quality graphs (non-confidential data only), set in `.env`: `PERSONAL_LLM_PROVIDER=anthropic`, `PERSONAL_LLM_MODEL=claude-haiku-4-5`, `PERSONAL_SEMAPHORE_LIMIT=5`, and `ANTHROPIC_API_KEY=…`. The client tier stays local regardless.

Upgrading from a pre-gateway deploy? Add `PERSONAL_TOKEN` / `CLIENT_TOKEN` to `.env`, then `docker compose up -d --build --force-recreate` (the MCP tiers move behind the gateway and the ontology change needs a recreate). Re-add each client with its `Authorization: Bearer` header; existing token-less clients will start getting `401`.

These are the landmines specific to the current (2026) Graphiti MCP server. Several contradict older docs.

1. There is no `openai_generic` provider string. To use Ollama you set `provider: "openai"` and point `api_url` at a non-OpenAI URL; the server then auto-selects its `OpenAIGenericClient` internally. That generic client is what avoids OpenAI's beta `responses.parse` (which Ollama does not implement). Setting `provider: "openai_generic"` is invalid.
2. There is no `small_model` setting. The MCP server has a single `llm.model`. On the `openai` path it uses that same model for the "small" slot too. The infamous `gpt-4.1-mini` is only a fallback used when `model` is `None`; pinning `llm.model` is enough to never hit it.
3. `instructor` is not used there, and `json_schema` structured output is always on for the local path and cannot be disabled; retries are built-in (tenacity, 4 attempts). There is no config knob for either. If a small local model produces invalid JSON, the only lever is a more capable model.
4. Ollama must be reachable from inside the containers.
   Ollama runs on the host, so each MCP service needs `extra_hosts: "host.docker.internal:host-gateway"` and an `api_url` of `http://host.docker.internal:11434/v1`. Ollama must listen on `0.0.0.0:11434`; if your install binds only to `127.0.0.1` (the usual Ollama default), set `OLLAMA_HOST=0.0.0.0`.
5. `FALKORDB_DATABASE` selects the graph; `group_id` does not. Two graphs in one FalkorDB = two instances with the same `FALKORDB_URI` and different `FALKORDB_DATABASE`. `group_id` only namespaces nodes within a graph.
6. FalkorDB host/port are parsed from `FALKORDB_URI`; `FALKORDB_HOST` / `FALKORDB_PORT` are ignored. The only env overrides read are `FALKORDB_URI` and `FALKORDB_PASSWORD`.
7. FalkorDB password is set via `REDIS_ARGS=--requirepass …`, an env var, not by overriding the container command (that would stop the FalkorDB module from loading).
8. Use the `:standalone` image, not `:latest`. `zepai/knowledge-graph-mcp:latest` bundles its own FalkorDB; `:standalone` expects an external one, which is required to share a single FalkorDB across two instances.
9. The MCP path has a trailing slash: `/mcp/` (FastMCP default; not configurable).
10. Anthropic model id: use the bare alias `claude-haiku-4-5`, not `claude-haiku-4-5-latest`. There is no `-latest` form of this alias; the Anthropic API 404s on it (`not_found_error: model`). The bare alias resolves to the current dated snapshot (`claude-haiku-4-5-20251001`).
11. The Anthropic provider needs an explicit numeric `llm.temperature`. graphiti passes `temperature=config.temperature`; with none set it sends `null` and the API 400s (`temperature: Input should be a valid number`), so every personal-tier episode queues but never processes. The OpenAI/Ollama generic client tolerates `null`, so this bites only the Anthropic tier. Set e.g. `temperature: 0.0`.
12. The `:standalone` image ships WITHOUT the `anthropic` SDK. `provider: anthropic` then fails at startup with "Anthropic client not available in current graphiti-core version" (the factory's `HAS_ANTHROPIC` is `False` because `import anthropic` raises). The bundled `Dockerfile` adds it (`uv pip install anthropic`).
13. graphiti-core builds a default OpenAI reranker at init that demands `OPENAI_API_KEY`, even though the search path uses `NODE_HYBRID_SEARCH_RRF` (no cross-encoder). Give each tier a dummy `OPENAI_API_KEY` so it can construct; point `OPENAI_BASE_URL` at Ollama so even an accidental call stays on-box. In practice it is never called.
14. FastMCP rejects non-localhost `Host` headers with HTTP 421 "Invalid Host header". It auto-enables DNS-rebinding protection with a localhost-only allow-list at construction and passes that object explicitly into its pydantic Settings, so the `FASTMCP_…` env vars cannot override it (init kwargs beat env). The bundled `patch_transport_security.py` (run in the `Dockerfile`) disables the protection, which is safe on a tailnet, where the network is the trust boundary and clients are agents, not browsers. To tighten, set explicit `allowed_hosts` instead.
15. The container env var for the OpenAI-compatible base URL is `OPENAI_API_URL` (graphiti's config expansion), not `OPENAI_BASE_URL`. Note the reranker (#13) is the opposite: it reads the OpenAI SDK's `OPENAI_BASE_URL`. Two different names for two different clients.

Run on the host, from the repo directory (e.g. `~/commonplace`).

Redeploy in one command: `scripts/commonplace` wraps the pull → rebuild → recreate flow (symlink it onto your `PATH`, e.g. `ln -sf "$PWD/scripts/commonplace" ~/.local/bin/commonplace`):

    commonplace update           # sync repo, rebuild image, recreate config-sensitive services
    commonplace update --reset   # same, but hard-reset to origin/main (after a force-push)
    commonplace status           # service health + graph counts

The underlying compose commands, if you'd rather run them by hand:

    # Bring the stack up (after .env is filled in)
    docker compose up -d

    # Status / health
    docker compose ps
    docker compose logs -f mcp-personal   # or mcp-client, falkordb

    # Restart one instance after a config change (config/*.yaml does not hot-reload)
    docker compose up -d --force-recreate mcp-client

    # Rebuild the local image after editing the Dockerfile or patch_transport_security.py
    docker compose up -d --build

    # Stop / start (data persists in the falkordb_data volume)
    docker compose stop
    docker compose start

    # Tear down (KEEP data)
    docker compose down

    # Tear down AND delete the graphs
    docker compose down -v

Quick MCP health check from a client, over the tailnet (or LAN). Without a token you get `401` (auth working); with the right tier token you get `307`:

    curl -s -o /dev/null -w "%{http_code}\n" -H "Authorization: Bearer $PERSONAL_TOKEN" \
      http://your-server.your-tailnet.ts.net:8000/mcp/
    curl -s -o /dev/null -w "%{http_code}\n" -H "Authorization: Bearer $CLIENT_TOKEN" \
      http://your-server.your-tailnet.ts.net:8001/mcp/

Is anyone actually using it? (run on the host)

    ./scripts/graph_stats.sh    # writes landing per tier
    ./scripts/mcp_activity.sh   # reads/writes per tier (from the gateway log)

FalkorDB persists to the `falkordb_data` volume, mounted at its actual data dir (`/var/lib/falkordb/data`), with AOF enabled (`--appendonly yes`), so writes are durable to ~1s and survive container recreates. Back up / restore the whole data dir (RDB + AOF) with the scripts:

    ./scripts/backup.sh                                      # -> ./backups/falkordb-<stamp>.tar.gz
    ./scripts/restore.sh ./backups/falkordb-<stamp>.tar.gz   # overwrites live data (prompts to confirm)

Both read `FALKORDB_PASSWORD` from `.env`.
`backup.sh` asks the server for its data dir, so it keeps working even if the path changes.

Earlier revisions mounted the volume at `/data` while FalkorDB wrote to `/var/lib/falkordb/data` on the ephemeral container layer, so data was lost on every `--force-recreate`. The mount path is now fixed; redeploy with `commonplace update` to apply it.

Default: MagicDNS + port. The gateway binds `:8000` / `:8001` on the host and is reached over the tailnet at `http://your-server.your-tailnet.ts.net:8000/mcp/` and `:8001/mcp/`. This is tailnet-reachable and LAN-reachable but not public; do not port-forward these on your router.

Auth. Every request needs `Authorization: Bearer <tier-token>`; the gateway 401s otherwise. Separate `PERSONAL_TOKEN` / `CLIENT_TOKEN` give each client only the tiers it should touch.

FalkorDB `:6379` and metrics `:9180` bind to `127.0.0.1` only: host-local, never on the tailnet.

Keep the host single-homed. The host's primary interface should hold exactly one IPv4. If a second address appears (e.g. a static IP plus a DHCP lease), Tailscale can advertise two WireGuard endpoints and the tunnel flaps, which black-holes TCP over MagicDNS while the LAN and `tailscale ping` still appear to work (disco pings roam across endpoints; real TCP does not). On Ubuntu this most often comes from cloud-init re-enabling DHCP; disable its network management (`echo 'network: {config: disabled}' | sudo tee /etc/cloud/cloud.cfg.d/99-disable-network-config.cfg`). Symptom to watch for: `ip -brief addr show <iface>` listing more than one address on your LAN subnet.

HTTPS upgrade (optional). To serve the MCP endpoints as tailnet-only HTTPS names instead of raw ports:

    tailscale serve --bg --https=8443 http://localhost:8000   # personal
    tailscale serve --bg --https=8444 http://localhost:8001   # client

then point clients at `https://your-server.your-tailnet.ts.net:8443/mcp/` etc. MagicDNS:port is the simpler default and is what the client config below uses.
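The single-homed symptom can be checked mechanically. The sketch below parses one line of `ip -brief addr show` output and counts its IPv4 addresses; the helper names and the sample lines are made up for illustration, and it counts every IPv4 on the interface rather than only those on your LAN subnet, which is a simplifying assumption.

```python
import ipaddress


def ipv4_addresses(brief_line):
    """Return the IPv4 interface addresses on one `ip -brief addr show` line.

    The line is expected to look like: "<iface> <state> <addr/prefix> ...".
    Fields that are not addresses are ignored.
    """
    found = []
    for field in brief_line.split()[2:]:
        try:
            iface = ipaddress.ip_interface(field)
        except ValueError:
            continue  # not an address (e.g. a keyword), skip it
        if iface.version == 4:
            found.append(iface)
    return found


def is_single_homed(brief_line):
    """True when the interface holds exactly one IPv4 address."""
    return len(ipv4_addresses(brief_line)) == 1


# Made-up sample lines: a healthy host, then a static IP plus a DHCP lease.
healthy = "eth0 UP 192.168.1.10/24 fe80::1/64"
flapping = "eth0 UP 192.168.1.10/24 192.168.1.57/24 fe80::1/64"

print(is_single_homed(healthy))
print(is_single_homed(flapping))
```

On a real host the input line would come from running `ip -brief addr show <iface>`; a `False` result is the cue to look for cloud-init or another DHCP client adding a second lease.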
Replace `your-server.your-tailnet.ts.net` with your host's Tailscale MagicDNS name (`tailscale status`). The identical ports/paths are also served on the host's LAN IP, which is a handy fallback if MagicDNS is ever unreachable.

For Claude Code, pass the per-tier bearer token with `--header`. Give a client only the tiers it should reach (e.g. omit the personal server on a machine that handles confidential work):

    claude mcp add --scope user --transport http commonplace-personal http://your-server.your-tailnet.ts.net:8000/mcp/ \
      --header "Authorization: Bearer $PERSONAL_TOKEN"
    claude mcp add --scope user --transport http commonplace-client http://your-server.your-tailnet.ts.net:8001/mcp/ \
      --header "Authorization: Bearer $CLIENT_TOKEN"
    claude mcp list   # both should report ✓ Connected

New servers load on the next Claude Code start.

Pi has no native MCP; add the community bridge, then a global `mcp.json`:

    pi install npm:@spences10/pi-mcp   # records the bridge in settings.json

Each server entry must include `"type": "http"`; a `url`-only entry triggers an OAuth handshake this server doesn't support. The extension lazy-connects by default; set `MY_PI_MCP_EAGER_CONNECT=1` to connect and discover tools at startup.

    {
      "mcpServers": {
        "commonplace-personal": {
          "type": "http",
          "url": "http://your-server.your-tailnet.ts.net:8000/mcp/",
          "headers": { "Authorization": "Bearer YOUR_PERSONAL_TOKEN" }
        },
        "commonplace-client": {
          "type": "http",
          "url": "http://your-server.your-tailnet.ts.net:8001/mcp/",
          "headers": { "Authorization": "Bearer YOUR_CLIENT_TOKEN" }
        }
      }
    }

Any device on the tailnet can use the same two endpoints; there is nothing per-client on the server. To add one:

- Join the device to the tailnet (`tailscale up`) and confirm it can reach the host (`tailscale ping your-server`).
- For Claude Code, run the two `claude mcp add … /mcp/` commands above (user scope).
- For any MCP client, add both servers with `"type": "http"` pointing at `:8000/mcp/` and `:8001/mcp/`.
- Nothing to change on the host: graphs and auth are shared; reads/writes from the new client land in the same two graphs.
- For HTTPS, expose via `tailscale serve` (above) and use the `https://…` URLs instead.

Two things turn this from a memory store into a memory system agents use well:

- Per-tier ontology. Each tier defines `graphiti.entity_types` in its config (personal: Preference, Project, Person, Decision, …; client: Engagement, Stakeholder, Requirement, Risk, …). These type descriptions constrain extraction, which is the single biggest lever on graph quality, and they help the weak local model the most.
- An agent protocol. `docs/memory-protocol.md` is the contract for any client (Claude Code, Pi): search before answering, write durable facts, never cross tiers (no confidential data on the hosted personal tier), and cite what you used. Install it as a skill or system prompt; without it, agents rarely call memory and the graph stays empty.

Is it actually being used? `scripts/graph_stats.sh` shows whether writes are landing; `scripts/mcp_activity.sh` and the Prometheus endpoint on `:9180` show whether agents are reading. Seed an existing corpus with `scripts/ingest_markdown.py`, pull token-budgeted context with `scripts/recall.py`, gate retrieval quality with `eval/run_eval.py`, and review resolved contradictions with `scripts/contradictions.sh`.

See [docs/ROADMAP.md](/itsmeduncan/commonplace/blob/main/docs/ROADMAP.md) for what's shipped vs. still open (a local reranker remains the notable deferral).
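To illustrate what "token-budgeted context" can mean, here is a hypothetical greedy packer. It is not the contents of `scripts/recall.py`: the function name, the whitespace-based token estimate and the sample facts are all assumptions made for the sketch.

```python
def pack_context(facts, budget_tokens, count_tokens=None):
    """Greedily keep the highest-ranked facts that fit within budget_tokens.

    `facts` is assumed to arrive best-first (for example, from a ranked
    search). A fact that does not fit is skipped, so a shorter fact further
    down the list can still use the remaining budget.
    """
    if count_tokens is None:
        # Crude stand-in for a real tokenizer: one token per whitespace word.
        count_tokens = lambda text: len(text.split())
    selected, used = [], 0
    for fact in facts:
        cost = count_tokens(fact)
        if used + cost > budget_tokens:
            continue
        selected.append(fact)
        used += cost
    return selected, used


# Made-up facts, ranked best-first.
ranked_facts = [
    "Alice prefers dark mode",
    "The Q3 migration was postponed to October after the vendor audit",
    "Project Atlas uses FalkorDB",
]

context, tokens_used = pack_context(ranked_facts, budget_tokens=8)
print(context, tokens_used)
```

Skipping rather than stopping at the first oversized fact trades strict rank order for fuller use of the budget; a stricter variant would `break` instead of `continue`.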
    commonplace/
    ├── docker-compose.yml            # FalkorDB + 2 MCP instances + gateway, restart: unless-stopped
    ├── Dockerfile                    # commonplace-mcp:local: standalone image (digest-pinned) + patch
    ├── patch_transport_security.py   # build-time: allow remote Host headers (disable DNS-rebind guard)
    ├── gateway/
    │   └── Caddyfile                 # per-tier bearer auth + access logging + Prometheus metrics
    ├── config/
    │   ├── personal.yaml             # instance A: local extraction by default (optionally Anthropic Haiku) + personal ontology
    │   └── client.yaml               # instance B: local Ollama extraction + confidential ontology
    ├── scripts/
    │   ├── commonplace               # operate CLI: `commonplace update` redeploys the stack
    │   ├── graph_stats.sh            # write counts per tier · mcp_activity.sh: read counts (gateway log)
    │   ├── recall.py                 # token-budgeted recall · contradictions.sh: superseded facts
    │   ├── backup.sh / restore.sh    # FalkorDB dump + restore
    │   └── ingest_markdown.py        # load a markdown corpus (notes/docs) into a tier
    ├── eval/
    │   ├── queries.yaml              # retrieval eval cases (question → expected facts)
    │   └── run_eval.py               # scores recall against a tier
    ├── docs/
    │   ├── memory-protocol.md        # how agents should read/write memory (tier safety, cite-back)
    │   └── ROADMAP.md                # hardening & maturity plan
    ├── .env.example                  # template; copy to .env on the host (gitignored)
    ├── .dockerignore                 # keeps .env and other secrets out of the build context
    ├── CLAUDE.md                     # guidance for Claude Code working in this repo
    ├── LICENSE                       # MIT
    └── README.md

Secrets live only in `.env` on the host and are never committed. The repo is the source of truth: edit a clone, push to your fork, `git pull` on the host, `docker compose up -d`.

[MIT](/itsmeduncan/commonplace/blob/main/LICENSE).