# Commonplace: Self-hosted, privacy-tiered memory for your AI agents

> Source: <https://github.com/itsmeduncan/commonplace>
> Published: 2026-06-30 22:46:21+00:00

A self-hosted, two-tier [Graphiti](https://github.com/getzep/graphiti) knowledge graph that MCP
clients (for example **Claude Code** and **Pi**) read from and write to over a private
[Tailscale](https://tailscale.com) network. **It's offline-first: by default every part — including
the LLM that extracts your graph — runs on your own hardware, so nothing leaves the box.**

It runs on a single always-on Linux host with Docker and a consumer NVIDIA GPU. Your laptops and other devices are pure clients — they host nothing.

Knowledge-graph ingestion uses an LLM to extract entities and relationships from text. That
extraction is where your data would be exposed to a model — so by default `commonplace`

does it
**locally**, on your GPU, for both tiers. The two tiers split memory by confidentiality and by
whether you're allowed to trade locality for quality:

| Tier | Graph | Extraction (default) | Where it runs | Use for |
|---|---|---|---|---|
personal |
`commonplace_personal` |
`mistral:7b-instruct-q4_0` (local) |
the host's GPU | your own notes, projects, life — optionally a hosted model for quality |
client-confidential |
`commonplace_client` |
`mistral:7b-instruct-q4_0` (local) |
the host's GPU | confidential / NDA material that must never leave the machine |

The **personal** tier is local by default but may be pointed at a hosted model (e.g. Claude Haiku)
for higher-quality graphs on **non-confidential** data — opt in via `.env`

(see *Hosted upgrade?*
under [Setup](#setup)). The **client** tier is always local; that's the whole point of it.

**Retrieval is cheap and private on both tiers.** Search is embeddings + BM25 + graph traversal
with **no LLM in the query path**. The GPU only ever does slow, asynchronous *background*
extraction — query latency is never affected. Slow local extraction is therefore fine.

Both tiers share **one embedder** (Ollama `nomic-embed-text`

, 768-dim) and **one FalkorDB**
holding two separate graphs, so the two memories stay isolated but the infrastructure stays simple.

```
flowchart TB
    CC["Claude Code<br/>(client)"]
    PI["Pi<br/>(client)"]
    TS{{"Tailscale<br/>MagicDNS · tailnet-only"}}
    ANT["Anthropic API<br/>Claude Haiku 4.5 · hosted"]

    CC --> TS
    PI --> TS

    subgraph HOST["your server — Docker"]
        direction TB
        GW["<b>gateway</b> :8000 / :8001<br/>per-tier auth · logging · metrics"]
        MP["<b>mcp-personal</b><br/>personal tier · internal"]
        MC["<b>mcp-client</b><br/>client-confidential · internal"]
        GW --> MP
        GW --> MC
        OL["<b>Ollama</b> :11434<br/>nomic-embed-text · mistral:7b<br/>local GPU"]
        subgraph FALKOR["FalkorDB :6379 · browser UI :3000"]
            direction LR
            GP[("commonplace_personal")]
            GC[("commonplace_client")]
        end
        MP -->|store| GP
        MC -->|store| GC
        MP -. embed .-> OL
        MC -. embed .-> OL
        MC -->|extract · local| OL
    end

    TS -->|Bearer token| GW
    MP -->|extract · hosted| ANT

    classDef ext fill:#fff3e0,stroke:#e67e22,color:#111;
    classDef tier fill:#e8f0fe,stroke:#4285f4,color:#111;
    class ANT ext
    class MP,MC tier
```

**One FalkorDB**, two graphs selected per-instance by`FALKORDB_DATABASE`

(`commonplace_personal`

vs`commonplace_client`

).**Two Graphiti MCP instances**(`commonplace-mcp:local`

, built from`zepai/knowledge-graph-mcp:standalone`

— see`Dockerfile`

), HTTP transport, served at path(trailing slash).`/mcp/`

**One shared Ollama embedder**(`nomic-embed-text`

, 768-dim) used by*both*instances. Do not mix embedders — vectors from different embedders are not comparable.**A gateway**(Caddy) fronts both tiers: it owns the host ports, requires a** per-tier bearer token**(so a client with only the client token can't reach the personal tier), and emits access logs (audit) + Prometheus metrics. The MCP containers themselves are internal-only.

Replace

`your-server.your-tailnet.ts.net`

with your host's Tailscale MagicDNS name throughout (run`tailscale status`

on the host to find it).

| Tier | Host endpoint (tailnet) | Internal port | Graph (`FALKORDB_DATABASE` ) |
LLM | `SEMAPHORE_LIMIT` |
|---|---|---|---|---|---|
| personal | `http://your-server.your-tailnet.ts.net:8000/mcp/` |
8000 | `commonplace_personal` |
`mistral:7b…` (local, default) |
1 |
| client | `http://your-server.your-tailnet.ts.net:8001/mcp/` |
8000 | `commonplace_client` |
`mistral:7b-instruct-q4_0` |
1 |
| FalkorDB | `127.0.0.1:6379` (host-local only) |
6379 | both graphs | — | — |
| FalkorDB UI | `http://your-server.your-tailnet.ts.net:3000` |
3000 | browse either graph | — | — |
| Metrics | `127.0.0.1:9180/metrics` (host-local only) |
9180 | gateway (Prometheus) | — | — |

The personal/client endpoints require `Authorization: Bearer <tier-token>`

(set `PERSONAL_TOKEN`

/
`CLIENT_TOKEN`

in `.env`

). A request without the right token gets `401`

.

On the **host**:

**Docker** with Compose v2.running on the host, serving the shared embedder and the local extraction model. The MCP containers reach it over HTTP — the GPU is used by Ollama, not by the containers, so no GPU passthrough into Docker is required. A consumer NVIDIA GPU with ~8 GB VRAM runs[Ollama](https://ollama.com)`mistral:7b-instruct-q4_0`

comfortably; CPU-only works but local extraction is slow.— the MCP endpoints are served over the tailnet, not the public internet.[Tailscale](https://tailscale.com)**No API keys required.** Both tiers extract locally by default. An**Anthropic API key** is needed*only*if you opt the personal tier into a hosted model (see*Hosted upgrade?*below).

On each **client** (laptop, etc.): Tailscale, plus an MCP-capable client (Claude Code, Pi, …).

Run on the host, from a clone of this repo (e.g. `~/commonplace`

):

```
# 1. Pull the models Ollama will serve
ollama pull nomic-embed-text
ollama pull mistral:7b-instruct-q4_0

# 2. Configure secrets
cp .env.example .env
#    edit .env and set:
#      FALKORDB_PASSWORD          (openssl rand -hex 24)
#      PERSONAL_TOKEN / CLIENT_TOKEN   gateway bearer tokens (openssl rand -hex 32 each)
#    (no ANTHROPIC_API_KEY needed — extraction is local by default)

# 3. Build the local image and start the stack
docker compose up -d
docker compose ps        # all services should report healthy
```

Then point a client at the two endpoints — see [Client configuration](#client-configuration).

Hosted upgrade?Everything is local by default. To point thepersonaltier at a hosted model for higher-quality graphs (non-confidential data only), set in`.env`

:`PERSONAL_LLM_PROVIDER=anthropic`

,`PERSONAL_LLM_MODEL=claude-haiku-4-5`

,`PERSONAL_SEMAPHORE_LIMIT=5`

, and`ANTHROPIC_API_KEY=…`

. The client tier stays local regardless.

Upgrading from a pre-gateway deploy?Add`PERSONAL_TOKEN`

/`CLIENT_TOKEN`

to`.env`

, then`docker compose up -d --build --force-recreate`

(the MCP tiers move behind the gateway and the ontology change needs a recreate). Re-add each client with its`Authorization: Bearer`

header — existing token-less clients will start getting`401`

.

These are the landmines specific to the current (2026) Graphiti MCP server. Several contradict older docs.

**There is no** To use Ollama you set`openai_generic`

provider string.`provider: "openai"`

and point`api_url`

at a non-OpenAI URL; the server then auto-selects its`OpenAIGenericClient`

internally. That generic client is what avoids OpenAI's beta`responses.parse()`

(which Ollama does not implement). Setting`provider: "openai_generic"`

is invalid.**There is no** The MCP server has a single`small_model`

setting.`llm.model`

. On the openai path it uses that same model for the "small" slot too. The infamous`gpt-4.1-mini`

is only a fallback used when`model`

is`None`

— pinning`llm.model`

is enough to never hit it., and`json_schema`

structured output is always on for the local path and cannot be disabled— retries are built-in (tenacity, 4 attempts). There is no config knob for either. If a small local model produces invalid JSON, the only lever is a more capable model.`instructor`

is not used there**Ollama must be reachable from inside the containers.** Ollama runs on the*host*, so each MCP service needs`extra_hosts: ["host.docker.internal:host-gateway"]`

and an`api_url`

of`http://host.docker.internal:11434/v1`

. Ollama must listen on`0.0.0.0:11434`

(it does by default).Two graphs in one FalkorDB = two instances with the same`FALKORDB_DATABASE`

selects the graph;`group_id`

does not.`FALKORDB_URI`

and different`FALKORDB_DATABASE`

.`group_id`

only namespaces nodes*within*a graph.**FalkorDB host/port are parsed from**—`FALKORDB_URI`

`FALKORDB_HOST`

/`FALKORDB_PORT`

are ignored. The only env overrides read are`FALKORDB_URI`

and`FALKORDB_PASSWORD`

.**FalkorDB password is set via**, an env var —`REDIS_ARGS=--requirepass …`

*not*by overriding the container`command`

(that would stop the FalkorDB module from loading).**Use the**`:standalone`

image, not`:latest`

.`zepai/knowledge-graph-mcp:latest`

bundles its own FalkorDB;`:standalone`

expects an external one — required to share a single FalkorDB across two instances.**The MCP path has a trailing slash:**(FastMCP default; not configurable).`/mcp/`

**Anthropic model id: use the bare alias** The`claude-haiku-4-5`

, not`claude-haiku-4-5-latest`

.`-latest`

suffix is an OpenAI-ism; the Anthropic API 404s on it (`not_found_error: model`

). The bare alias resolves to the current dated snapshot (`claude-haiku-4-5-20251001`

).**The Anthropic provider needs an explicit numeric** graphiti passes`llm.temperature`

.`temperature=config.temperature`

; with none set it sends`null`

and the API 400s (`temperature: Input should be a valid number`

), so every personal-tier episode queues but never processes. The OpenAI/Ollama generic client tolerates`null`

, so this bites only the Anthropic tier. Set e.g.`temperature: 0.0`

.**The**`:standalone`

image ships WITHOUT the`anthropic`

SDK.`provider: anthropic`

then fails at startup — "Anthropic client not available in current graphiti-core version" (the factory's`HAS_ANTHROPIC`

is False because`import anthropic`

raises). The bundled`Dockerfile`

adds it (`uv pip install anthropic`

).**graphiti-core builds a default OpenAI reranker at init** that demands`OPENAI_API_KEY`

even though the search path uses`NODE_HYBRID_SEARCH_RRF`

(no cross-encoder). Give each tier a dummy`OPENAI_API_KEY`

so it can construct; point`OPENAI_BASE_URL`

at Ollama so even an accidental call stays on-box. In practice it is never called.**FastMCP rejects non-localhost Host headers with HTTP 421 "Invalid Host header".** It auto-enables DNS-rebinding protection with a localhost-only allow-list at construction and passes that object explicitly into its pydantic Settings, so the`FASTMCP_…`

env vars cannot override it (init kwargs beat env). The bundled`patch_transport_security.py`

(run in the Dockerfile) disables the protection — safe on a tailnet, where the network is the trust boundary and clients are agents, not browsers. To tighten, set explicit`allowed_hosts`

instead.**The container env var for the OpenAI-compatible base URL is**(graphiti's config expansion), not`OPENAI_API_URL`

`OPENAI_BASE_URL`

. Note the reranker (#13) is the opposite — it reads the OpenAI SDK's`OPENAI_BASE_URL`

. Two different names for two different clients.

Run on the host, from the repo directory (e.g. `~/commonplace`

).

**Redeploy in one command** — `scripts/commonplace`

wraps the pull → rebuild → recreate flow
(symlink it onto your `PATH`

, e.g. `ln -sf "$PWD/scripts/commonplace" ~/.local/bin/commonplace`

):

```
commonplace update           # sync repo, rebuild image, recreate config-sensitive services
commonplace update --reset   # same, but hard-reset to origin/main (after a force-push)
commonplace status           # service health + graph counts
```

The underlying compose commands, if you'd rather run them by hand:

```
# Bring the stack up (after .env is filled in)
docker compose up -d

# Status / health
docker compose ps
docker compose logs -f mcp-personal     # or mcp-client, falkordb

# Restart one instance after a config change (config/*.yaml does not hot-reload)
docker compose up -d --force-recreate mcp-client

# Rebuild the local image after editing the Dockerfile or patch_transport_security.py
docker compose up -d --build

# Stop / start (data persists in the falkordb_data volume)
docker compose stop
docker compose start

# Tear down (KEEP data)
docker compose down
# Tear down AND delete the graphs
docker compose down -v
```

Quick MCP health check (from a client, over the tailnet or LAN). Without a token you get `401`

(auth working); with the right tier token you get `307`

:

```
curl -s -o /dev/null -w "%{http_code}\n" -H "Authorization: Bearer $PERSONAL_TOKEN" \
  http://your-server.your-tailnet.ts.net:8000/mcp/
curl -s -o /dev/null -w "%{http_code}\n" -H "Authorization: Bearer $CLIENT_TOKEN" \
  http://your-server.your-tailnet.ts.net:8001/mcp/

# Is anyone actually using it? (run on the host)
./scripts/graph_stats.sh        # writes landing per tier
./scripts/mcp_activity.sh       # reads/writes per tier from the gateway log
```

FalkorDB persists to the `falkordb_data`

volume — mounted at its actual data dir
(`/var/lib/falkordb/data`

), with **AOF enabled** (`--appendonly yes`

), so writes are durable to ~1s
and survive container recreates. Back up / restore the whole data dir (RDB + AOF) with the scripts:

``` php
./scripts/backup.sh                                       # -> ./backups/falkordb-<stamp>.tar.gz
./scripts/restore.sh ./backups/falkordb-<stamp>.tar.gz    # overwrites live data (prompts to confirm)
```

Both read `FALKORDB_PASSWORD`

from `.env`

. `backup.sh`

asks the server for its data dir, so it keeps
working even if the path changes.

Earlier revisions mounted the volume at

`/data`

while FalkorDB wrote to`/var/lib/falkordb/data`

on the ephemeral container layer — so data was lost on every`--force-recreate`

. The mount path is now fixed; redeploy with`commonplace update`

to apply it.

**Default: MagicDNS + port.** The gateway binds`:8000`

/`:8001`

on the host and is reached over the tailnet at`http://your-server.your-tailnet.ts.net:8000/mcp/`

and`:8001/mcp/`

. This is tailnet-reachable (and LAN-reachable) but**not** public — do not port-forward these on your router.**Auth.** Every request needs`Authorization: Bearer <tier-token>`

; the gateway 401s otherwise. Separate`PERSONAL_TOKEN`

/`CLIENT_TOKEN`

give each client only the tiers it should touch.**FalkorDB**(host-local) — never on the tailnet.`:6379`

and metrics`:9180`

bind to`127.0.0.1`

only**Keep the host single-homed.** The host's primary interface should hold exactly one IPv4. If a second address appears (e.g. a static IP*plus*a DHCP lease), Tailscale can advertise two WireGuard endpoints and the tunnel flaps, which**black-holes TCP over MagicDNS while the LAN and**(disco pings roam across endpoints; real TCP does not). On Ubuntu this most often comes from cloud-init re-enabling DHCP — disable its network management (`tailscale ping`

still appear to work`echo 'network: {config: disabled}' | sudo tee /etc/cloud/cloud.cfg.d/99-disable-network-config.cfg`

). Symptom to watch for:`ip -brief addr show <iface>`

listing more than one address on your LAN subnet.**HTTPS upgrade (optional).** To serve the MCP endpoints as tailnet-only HTTPS names instead of raw ports:then point clients at

```
tailscale serve --bg --https=8443 http://localhost:8000   # personal
tailscale serve --bg --https=8444 http://localhost:8001   # client
```

`https://your-server.your-tailnet.ts.net:8443/mcp/`

etc. MagicDNS:port is the simpler default and is what the client config below uses.

Replace

`your-server.your-tailnet.ts.net`

with your host's Tailscale MagicDNS name (`tailscale status`

). The identical ports/paths are also served on the host's LAN IP, which is a handy fallback if MagicDNS is ever unreachable.

Pass the per-tier bearer token with `--header`

. Give a client only the tiers it should reach (e.g.
omit the personal server on a machine that handles confidential work):

```
claude mcp add --scope user --transport http commonplace-personal http://your-server.your-tailnet.ts.net:8000/mcp/ \
  --header "Authorization: Bearer $PERSONAL_TOKEN"
claude mcp add --scope user --transport http commonplace-client   http://your-server.your-tailnet.ts.net:8001/mcp/ \
  --header "Authorization: Bearer $CLIENT_TOKEN"
claude mcp list   # both should report ✓ Connected
```

(New servers load on the next Claude Code start.)

Pi has no native MCP — add the community bridge, then a global `mcp.json`

:

```
pi install npm:@spences10/pi-mcp     # records the bridge in settings.json
```

Each server entry must include `"type": "http"`

; a `url`

-only entry triggers an OAuth handshake
this server doesn't support. The extension lazy-connects by default — set
`MY_PI_MCP_EAGER_CONNECT=1`

to connect and discover tools at startup.

```
{
  "mcpServers": {
    "commonplace-personal": {
      "type": "http",
      "url": "http://your-server.your-tailnet.ts.net:8000/mcp/",
      "headers": { "Authorization": "Bearer YOUR_PERSONAL_TOKEN" }
    },
    "commonplace-client": {
      "type": "http",
      "url": "http://your-server.your-tailnet.ts.net:8001/mcp/",
      "headers": { "Authorization": "Bearer YOUR_CLIENT_TOKEN" }
    }
  }
}
```

Any device on the tailnet can use the same two endpoints — there is nothing per-client on the server. To add one:

- Join the device to the tailnet (
`tailscale up`

) and confirm it can reach the host (`tailscale ping your-server`

). - For Claude Code, run the two
`claude mcp add … /mcp/`

commands above (user scope). - For any MCP client, add both servers with
`"type": "http"`

pointing at`:8000/mcp/`

and`:8001/mcp/`

. - Nothing to change on the host — graphs and auth are shared; reads/writes from the new client land in the same two graphs.
- For HTTPS, expose via
`tailscale serve`

(above) and use the`https://…`

URLs instead.

Two things turn this from a memory *store* into a memory *system agents use well*:

**Per-tier ontology.** Each tier defines`graphiti.entity_types`

in its config (personal: Preference, Project, Person, Decision, …; client: Engagement, Stakeholder, Requirement, Risk, …). These type descriptions constrain extraction — the single biggest lever on graph quality, and they help the weak local model the most.**An agent protocol.** is the contract for any client (Claude Code, Pi): search before answering, write durable facts,`docs/memory-protocol.md`

**never cross tiers**(no confidential data on the hosted personal tier), and cite what you used. Install it as a skill or system prompt — without it, agents rarely call memory and the graph stays empty.

**Is it actually being used?** `scripts/graph_stats.sh`

shows whether *writes* are landing;
`scripts/mcp_activity.sh`

(and the Prometheus endpoint on `:9180`

) show whether agents are *reading*.
Seed an existing corpus with `scripts/ingest_markdown.py`

, pull token-budgeted context with
`scripts/recall.py`

, gate retrieval quality with `eval/run_eval.py`

, and review resolved
contradictions with `scripts/contradictions.sh`

. See [ docs/ROADMAP.md](/itsmeduncan/commonplace/blob/main/docs/ROADMAP.md) for what's
shipped vs. still open (a local reranker remains the notable deferral).

```
commonplace/
├── docker-compose.yml           # FalkorDB + 2 MCP instances + gateway, restart: unless-stopped
├── Dockerfile                   # commonplace-mcp:local — standalone image (digest-pinned) + patch
├── patch_transport_security.py  # build-time: allow remote Host headers (disable DNS-rebind guard)
├── gateway/
│   └── Caddyfile                # per-tier bearer auth + access logging + Prometheus metrics
├── config/
│   ├── personal.yaml            # instance A — Anthropic Haiku extraction + personal ontology
│   └── client.yaml              # instance B — local Ollama extraction + confidential ontology
├── scripts/
│   ├── commonplace              # operate CLI: `commonplace update` redeploys the stack
│   ├── graph_stats.sh           # write counts per tier   · mcp_activity.sh  # read counts (gateway log)
│   ├── recall.py                # token-budgeted recall    · contradictions.sh # superseded facts
│   ├── backup.sh / restore.sh   # FalkorDB dump + restore
│   └── ingest_markdown.py       # load a markdown corpus (notes/docs) into a tier
├── eval/
│   ├── queries.yaml             # retrieval eval cases (question → expected facts)
│   └── run_eval.py              # scores recall against a tier
├── docs/
│   ├── memory-protocol.md       # how agents should read/write memory (tier safety, cite-back)
│   └── ROADMAP.md               # hardening & maturity plan
├── .env.example                 # template; copy to .env on the host (gitignored)
├── .dockerignore                # keeps .env and other secrets out of the build context
├── CLAUDE.md                    # guidance for Claude Code working in this repo
├── LICENSE                      # MIT
└── README.md
```

Secrets live only in `.env`

on the host and are never committed. The repo is the source of
truth: edit a clone, push to your fork, `git pull`

on the host, `docker compose up -d`

.

[MIT](/itsmeduncan/commonplace/blob/main/LICENSE).