Source: github.com

Commonplace: Self-hosted, privacy-tiered memory for your AI agents

Commonplace launches a self-hosted, privacy-tiered memory system for AI agents, using a two-tier Graphiti knowledge graph that runs entirely on local hardware by default, with a personal tier optionally using hosted models for non-confidential data and a client-confidential tier that never leaves the machine.

Published Jun 30, 2026 · 14 min read

A self-hosted, two-tier Graphiti knowledge graph that MCP clients (for example Claude Code and Pi) read from and write to over a private Tailscale network. It's offline-first: by default every part, including the LLM that extracts your graph, runs on your own hardware, so nothing leaves the box.

It runs on a single always-on Linux host with Docker and a consumer NVIDIA GPU. Your laptops and other devices are pure clients; they host nothing.

Knowledge-graph ingestion uses an LLM to extract entities and relationships from text. That extraction is where your data would be exposed to a model, so by default `commonplace` does it locally, on your GPU, for both tiers. The two tiers split memory by confidentiality and by whether you're allowed to trade locality for quality:

| Tier | Graph | Extraction (default) | Where it runs | Use for |
|---|---|---|---|---|
| personal | `commonplace_personal` | `mistral:7b-instruct-q4_0` (local) | the host's GPU | your own notes, projects, life; optionally a hosted model for quality |
| client-confidential | `commonplace_client` | `mistral:7b-instruct-q4_0` (local) | the host's GPU | confidential / NDA material that must never leave the machine |

The personal tier is local by default but may be pointed at a hosted model (e.g. Claude Haiku) for higher-quality graphs on non-confidential data; opt in via `.env` (see *Hosted upgrade?* under Setup). The client tier is always local; that's the whole point of it.

Retrieval is cheap and private on both tiers. Search is embeddings + BM25 + graph traversal with no LLM in the query path. The GPU only does slow, asynchronous background extraction, so query latency is not affected and slow local extraction is acceptable.
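
The search recipe named later in this document, `NODE_HYBRID_SEARCH_RRF`, suggests the ranked lists are merged with reciprocal rank fusion. The following is a minimal sketch of RRF in general, not Graphiti's implementation; the constant `k=60` and the sample node ids are assumptions for illustration.

```python
from collections import defaultdict


def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked id lists with reciprocal rank fusion."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, item in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank) for every item it contains.
            scores[item] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for a stable order.
    return sorted(scores, key=lambda item: (-scores[item], item))


# Hypothetical result lists from the three retrievers.
embedding_hits = ["n3", "n1", "n7"]
bm25_hits = ["n1", "n3", "n9"]
graph_hits = ["n1", "n7"]

fused = rrf_fuse([embedding_hits, bm25_hits, graph_hits])
print(fused)
```

No model call is involved: fusion is arithmetic over ranks, which is consistent with the claim that the GPU stays out of the query path.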

Both tiers share one embedder (Ollama `nomic-embed-text`, 768-dim) and one FalkorDB holding two separate graphs, so the two memories stay isolated but the infrastructure stays simple.

flowchart TB
    CC["Claude Code<br/>(client)"]
    PI["Pi<br/>(client)"]
    TS{{"Tailscale<br/>MagicDNS · tailnet-only"}}
    ANT["Anthropic API<br/>Claude Haiku 4.5 · hosted"]

    CC --> TS
    PI --> TS

    subgraph HOST["your server - Docker"]
        direction TB
        GW["<b>gateway</b> :8000 / :8001<br/>per-tier auth · logging · metrics"]
        MP["<b>mcp-personal</b><br/>personal tier · internal"]
        MC["<b>mcp-client</b><br/>client-confidential · internal"]
        GW --> MP
        GW --> MC
        OL["<b>Ollama</b> :11434<br/>nomic-embed-text · mistral:7b<br/>local GPU"]
        subgraph FALKOR["FalkorDB :6379 · browser UI :3000"]
            direction LR
            GP[("commonplace_personal")]
            GC[("commonplace_client")]
        end
        MP -->|store| GP
        MC -->|store| GC
        MP -. embed .-> OL
        MC -. embed .-> OL
        MC -->|extract · local| OL
    end

    TS -->|Bearer token| GW
    MP -->|extract · hosted| ANT

    classDef ext fill:#fff3e0,stroke:#e67e22,color:#111;
    classDef tier fill:#e8f0fe,stroke:#4285f4,color:#111;
    class ANT ext
    class MP,MC tier

- **One FalkorDB, two graphs**, selected per-instance by `FALKORDB_DATABASE` (`commonplace_personal` vs `commonplace_client`).
- **Two Graphiti MCP instances** (`commonplace-mcp:local`, built from `zepai/knowledge-graph-mcp:standalone`; see `Dockerfile`), HTTP transport, served at path `/mcp/` (trailing slash).
- **One shared Ollama embedder** (`nomic-embed-text`, 768-dim) used by both instances. Do not mix embedders: vectors from different embedders are not comparable.
- **A gateway** (Caddy) fronts both tiers: it owns the host ports, requires a **per-tier bearer token** (so a client with only the client token can't reach the personal tier), and emits access logs (audit) + Prometheus metrics. The MCP containers themselves are internal-only.

Replace `your-server.your-tailnet.ts.net` with your host's Tailscale MagicDNS name throughout (run `tailscale status` on the host to find it).

| Tier | Host endpoint (tailnet) | Internal port | Graph (`FALKORDB_DATABASE`) | LLM | `SEMAPHORE_LIMIT` |
|---|---|---|---|---|---|
| personal | `http://your-server.your-tailnet.ts.net:8000/mcp/` | 8000 | `commonplace_personal` | `mistral:7b…` (local, default) | 1 |
| client | `http://your-server.your-tailnet.ts.net:8001/mcp/` | 8000 | `commonplace_client` | `mistral:7b-instruct-q4_0` | 1 |
| FalkorDB | `127.0.0.1:6379` (host-local only) | 6379 | both graphs | - | - |
| FalkorDB UI | `http://your-server.your-tailnet.ts.net:3000` | 3000 | browse either graph | - | - |
| Metrics | `127.0.0.1:9180/metrics` (host-local only) | 9180 | gateway (Prometheus) | - | - |

The personal/client endpoints require `Authorization: Bearer <tier-token>` (set `PERSONAL_TOKEN` / `CLIENT_TOKEN` in `.env`). A request without the right token gets `401`.
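
To make the per-tier rule concrete, here is a minimal sketch of the check a gateway performs before forwarding a request. It is not the project's Caddyfile; the function name, the `TIER_TOKENS` mapping, and the placeholder token values are all hypothetical.

```python
import hmac
from typing import Optional

# Placeholder values; the real tokens live in .env on the host.
TIER_TOKENS = {"personal": "personal-secret", "client": "client-secret"}


def is_authorized(tier: str, authorization: Optional[str]) -> bool:
    """Return True only if the header carries the bearer token for this tier."""
    expected = TIER_TOKENS.get(tier)
    if expected is None or not authorization:
        return False
    scheme, _, presented = authorization.partition(" ")
    if scheme != "Bearer":
        return False
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(presented.encode(), expected.encode())


print(is_authorized("personal", "Bearer personal-secret"))
print(is_authorized("personal", "Bearer client-secret"))
```

The second call shows the isolation property: a valid client-tier token is still rejected on the personal tier, which in the real gateway corresponds to a `401`.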

On the host:

- **Docker** with Compose v2.
- **Ollama** running on the host, serving the shared embedder and the local extraction model. The MCP containers reach it over HTTP; the GPU is used by Ollama, not by the containers, so no GPU passthrough into Docker is required. A consumer NVIDIA GPU with ~8 GB VRAM runs `mistral:7b-instruct-q4_0` comfortably; CPU-only works but local extraction is slow.
- **Tailscale**: the MCP endpoints are served over the tailnet, not the public internet.
- **No API keys required.** Both tiers extract locally by default. An Anthropic API key is needed only if you opt the personal tier into a hosted model (see *Hosted upgrade?* below).

On each client (laptop, etc.): Tailscale, plus an MCP-capable client (Claude Code, Pi, …).

Run on the host, from a clone of this repo (e.g. `~/commonplace`):

ollama pull nomic-embed-text
ollama pull mistral:7b-instruct-q4_0

cp .env.example .env

docker compose up -d
docker compose ps        # all services should report healthy

Then point a client at the two endpoints; see Client configuration.

**Hosted upgrade?** Everything is local by default. To point the personal tier at a hosted model for higher-quality graphs (non-confidential data only), set in `.env`: `PERSONAL_LLM_PROVIDER=anthropic`, `PERSONAL_LLM_MODEL=claude-haiku-4-5`, `PERSONAL_SEMAPHORE_LIMIT=5`, and `ANTHROPIC_API_KEY=…`. The client tier stays local regardless.

**Upgrading from a pre-gateway deploy?** Add `PERSONAL_TOKEN` / `CLIENT_TOKEN` to `.env`, then `docker compose up -d --build --force-recreate` (the MCP tiers move behind the gateway and the ontology change needs a recreate). Re-add each client with its `Authorization: Bearer` header; existing token-less clients will start getting `401`.

These are the landmines specific to the current (2026) Graphiti MCP server. Several contradict older docs.

1. **There is no `openai_generic` provider string.** To use Ollama you set `provider: "openai"` and point `api_url` at a non-OpenAI URL; the server then auto-selects its `OpenAIGenericClient` internally. That generic client is what avoids OpenAI's beta `responses.parse()` (which Ollama does not implement). Setting `provider: "openai_generic"` is invalid.
2. **There is no `small_model` setting.** The MCP server has a single `llm.model`. On the openai path it uses that same model for the "small" slot too. The infamous `gpt-4.1-mini` is only a fallback used when `model` is `None`; pinning `llm.model` is enough to never hit it.
3. **`instructor` is not used on the local path, and `json_schema` structured output is always on there and cannot be disabled.** Retries are built in (tenacity, 4 attempts). There is no config knob for either. If a small local model produces invalid JSON, the only lever is a more capable model.
4. **Ollama must be reachable from inside the containers.** Ollama runs on the host, so each MCP service needs `extra_hosts: ["host.docker.internal:host-gateway"]` and an `api_url` of `http://host.docker.internal:11434/v1`. Ollama must listen on `0.0.0.0:11434`; a stock install binds `127.0.0.1:11434` only, so set `OLLAMA_HOST=0.0.0.0` if it is not already.
5. **`FALKORDB_DATABASE` selects the graph; `group_id` does not.** Two graphs in one FalkorDB = two instances with the same `FALKORDB_URI` and different `FALKORDB_DATABASE`. `group_id` only namespaces nodes within a graph.
6. **FalkorDB host/port are parsed from `FALKORDB_URI`;** `FALKORDB_HOST` / `FALKORDB_PORT` are ignored. The only env overrides read are `FALKORDB_URI` and `FALKORDB_PASSWORD`.
7. **FalkorDB password is set via `REDIS_ARGS=--requirepass …`,** an env var, not by overriding the container `command` (that would stop the FalkorDB module from loading).
8. **Use the `:standalone` image, not `:latest`.** `zepai/knowledge-graph-mcp:latest` bundles its own FalkorDB; `:standalone` expects an external one, which is required to share a single FalkorDB across two instances.
9. **The MCP path has a trailing slash:** `/mcp/` (FastMCP default; not configurable).
10. **Anthropic model id: use the bare alias `claude-haiku-4-5`, not `claude-haiku-4-5-latest`.** The `-latest` suffix is an OpenAI-ism; the Anthropic API 404s on it (`not_found_error: model`). The bare alias resolves to the current dated snapshot (`claude-haiku-4-5-20251001`).
11. **The Anthropic provider needs an explicit numeric `llm.temperature`.** graphiti passes `temperature=config.temperature`; with none set it sends `null` and the API 400s (`temperature: Input should be a valid number`), so every personal-tier episode queues but never processes. The OpenAI/Ollama generic client tolerates `null`, so this bites only the Anthropic tier. Set e.g. `temperature: 0.0`.
12. **The `:standalone` image ships WITHOUT the `anthropic` SDK.** `provider: anthropic` then fails at startup with "Anthropic client not available in current graphiti-core version" (the factory's `HAS_ANTHROPIC` is False because `import anthropic` raises). The bundled `Dockerfile` adds it (`uv pip install anthropic`).
13. **graphiti-core builds a default OpenAI reranker at init that demands `OPENAI_API_KEY`** even though the search path uses `NODE_HYBRID_SEARCH_RRF` (no cross-encoder). Give each tier a dummy `OPENAI_API_KEY` so it can construct; point `OPENAI_BASE_URL` at Ollama so even an accidental call stays on-box. In practice it is never called.
14. **FastMCP rejects non-localhost Host headers with HTTP 421 "Invalid Host header".** It auto-enables DNS-rebinding protection with a localhost-only allow-list at construction and passes that object explicitly into its pydantic Settings, so the `FASTMCP_…` env vars cannot override it (init kwargs beat env). The bundled `patch_transport_security.py` (run in the Dockerfile) disables the protection. That is safe on a tailnet, where the network is the trust boundary and clients are agents, not browsers. To tighten, set explicit `allowed_hosts` instead.
15. **The container env var for the OpenAI-compatible base URL is `OPENAI_API_URL`** (graphiti's config expansion), not `OPENAI_BASE_URL`. Note the reranker (#13) is the opposite: it reads the OpenAI SDK's `OPENAI_BASE_URL`. Two different names for two different clients.
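
Several of these landmines can be caught before a deploy. The sketch below lints a hypothetical `{provider, model, temperature}` mapping against the rules stated above; the function name and the flat dictionary shape are assumptions, and the real YAML layout in `config/*.yaml` may differ.

```python
from numbers import Real


def lint_llm_config(llm: dict) -> list[str]:
    """Return warnings for a hypothetical {provider, model, temperature} mapping."""
    problems = []
    provider = llm.get("provider")
    model = llm.get("model")
    temperature = llm.get("temperature")

    if provider == "openai_generic":
        problems.append('use provider "openai" with a non-OpenAI api_url')
    if not model:
        problems.append("pin llm.model to avoid the fallback model")
    if provider == "anthropic":
        if isinstance(model, str) and model.endswith("-latest"):
            problems.append("drop the -latest suffix from the Anthropic model id")
        # bool counts as a number in Python, so exclude it explicitly.
        if isinstance(temperature, bool) or not isinstance(temperature, Real):
            problems.append("set a numeric llm.temperature, e.g. 0.0")
    return problems


bad = lint_llm_config({"provider": "anthropic", "model": "claude-haiku-4-5-latest"})
good = lint_llm_config(
    {"provider": "anthropic", "model": "claude-haiku-4-5", "temperature": 0.0}
)
print(bad)
print(good)
```

The first mapping trips both Anthropic rules (the `-latest` suffix and the missing temperature); the second passes cleanly.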

Run on the host, from the repo directory (e.g. `~/commonplace`).

Redeploy in one command: `scripts/commonplace` wraps the pull → rebuild → recreate flow (symlink it onto your `PATH`, e.g. `ln -sf "$PWD/scripts/commonplace" ~/.local/bin/commonplace`):

commonplace update           # sync repo, rebuild image, recreate config-sensitive services
commonplace update --reset   # same, but hard-reset to origin/main (after a force-push)
commonplace status           # service health + graph counts

The underlying compose commands, if you'd rather run them by hand:

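docker compose up -d                                # start (or converge) the stack

docker compose ps                                   # service status
docker compose logs -f mcp-personal                 # or mcp-client, falkordb

docker compose up -d --force-recreate mcp-client    # recreate one service, e.g. after a config change

docker compose up -d --build                        # rebuild images before starting

docker compose stop                                 # stop containers without removing them
docker compose start

docker compose down                                 # remove containers, keep volumes
docker compose down -v                              # also remove named volumes: this deletes graph data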

Quick MCP health check (from a client, over the tailnet or LAN). Without a token you get `401` (auth working); with the right tier token you get `307`:

curl -s -o /dev/null -w "%{http_code}\n" -H "Authorization: Bearer $PERSONAL_TOKEN" \
  http://your-server.your-tailnet.ts.net:8000/mcp/
curl -s -o /dev/null -w "%{http_code}\n" -H "Authorization: Bearer $CLIENT_TOKEN" \
  http://your-server.your-tailnet.ts.net:8001/mcp/
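
The same check can be scripted from Python using only the standard library. This is a sketch rather than part of the repo: `NoRedirect`, `classify`, and `probe` are hypothetical names, and the mapping of `401` and `307` to verdicts simply restates the behaviour described above.

```python
import urllib.error
import urllib.request


class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Surface 3xx responses instead of following them, like curl without -L."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None


def classify(status: int) -> str:
    """Map the status codes described above to a short verdict."""
    if status == 401:
        return "auth enforced, token missing or wrong"
    if status == 307:
        return "token accepted"
    return f"unexpected status {status}"


def probe(url: str, token: str, timeout: float = 5.0) -> str:
    """Send one GET with a bearer token and classify the response."""
    opener = urllib.request.build_opener(NoRedirect)
    req = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
    try:
        with opener.open(req, timeout=timeout) as resp:
            return classify(resp.status)
    except urllib.error.HTTPError as err:
        # With redirects disabled, 3xx and 4xx both arrive here.
        return classify(err.code)
```

Call it as `probe("http://your-server.your-tailnet.ts.net:8000/mcp/", os.environ["PERSONAL_TOKEN"])`. Connection failures raise `urllib.error.URLError` and are deliberately left uncaught so that an unreachable host is not mistaken for an auth result.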

./scripts/graph_stats.sh        # writes landing per tier
./scripts/mcp_activity.sh       # reads/writes per tier from the gateway log

FalkorDB persists to the `falkordb_data` volume, mounted at its actual data dir (`/var/lib/falkordb/data`), with AOF enabled (`--appendonly yes`), so writes are durable to ~1s and survive container recreates. Back up / restore the whole data dir (RDB + AOF) with the scripts:

./scripts/backup.sh                                       # -> ./backups/falkordb-<stamp>.tar.gz
./scripts/restore.sh ./backups/falkordb-<stamp>.tar.gz    # overwrites live data (prompts to confirm)

Both read `FALKORDB_PASSWORD` from `.env`. `backup.sh` asks the server for its data dir, so it keeps working even if the path changes.

Earlier revisions mounted the volume at `/data` while FalkorDB wrote to `/var/lib/falkordb/data` on the ephemeral container layer, so data was lost on every `--force-recreate`. The mount path is now fixed; redeploy with `commonplace update` to apply it.

- **Default: MagicDNS + port.** The gateway binds `:8000` / `:8001` on the host and is reached over the tailnet at `http://your-server.your-tailnet.ts.net:8000/mcp/` and `:8001/mcp/`. This is tailnet-reachable (and LAN-reachable) but not public; do not port-forward these on your router.
- **Auth.** Every request needs `Authorization: Bearer <tier-token>`; the gateway 401s otherwise. Separate `PERSONAL_TOKEN` / `CLIENT_TOKEN` give each client only the tiers it should touch.
- **FalkorDB `:6379` and metrics `:9180` bind to `127.0.0.1` only** (host-local), never on the tailnet.
- **Keep the host single-homed.** The host's primary interface should hold exactly one IPv4. If a second address appears (e.g. a static IP plus a DHCP lease), Tailscale can advertise two WireGuard endpoints and the tunnel flaps, which black-holes TCP over MagicDNS while the LAN and `tailscale ping` still appear to work (disco pings roam across endpoints; real TCP does not). On Ubuntu this most often comes from cloud-init re-enabling DHCP; disable its network management (`echo 'network: {config: disabled}' | sudo tee /etc/cloud/cloud.cfg.d/99-disable-network-config.cfg`). Symptom to watch for: `ip -brief addr show <iface>` listing more than one address on your LAN subnet.
- **HTTPS upgrade (optional).** To serve the MCP endpoints as tailnet-only HTTPS names instead of raw ports:

tailscale serve --bg --https=8443 http://localhost:8000   # personal
tailscale serve --bg --https=8444 http://localhost:8001   # client

Then point clients at `https://your-server.your-tailnet.ts.net:8443/mcp/` etc. MagicDNS:port is the simpler default and is what the client config below uses.
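
The single-homed symptom described in this section can be checked mechanically. Below is a small sketch that parses one line of `ip -brief addr show <iface>` output and reports any subnet holding more than one IPv4 address; the function name and the sample addresses are made up for illustration.

```python
import ipaddress
from collections import defaultdict


def duplicate_lan_addresses(ip_brief_line: str) -> dict[str, list[str]]:
    """Group an interface's IPv4 addresses by subnet; keep subnets with 2+ addresses."""
    by_subnet: dict[str, list[str]] = defaultdict(list)
    # Fields are: interface name, state, then zero or more address/prefix entries.
    for field in ip_brief_line.split()[2:]:
        try:
            iface = ipaddress.ip_interface(field)
        except ValueError:
            continue  # not an address
        if iface.version == 4:
            by_subnet[str(iface.network)].append(str(iface.ip))
    return {net: ips for net, ips in by_subnet.items() if len(ips) > 1}


# Sample line with made-up addresses: a static IP plus a DHCP lease.
sample = "eth0  UP  192.168.1.10/24 192.168.1.57/24 fe80::1/64"
print(duplicate_lan_addresses(sample))
```

An empty result means the interface holds at most one IPv4 per subnet. A non-empty result is the condition that, per the text above, can make the Tailscale tunnel flap.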

Replace `your-server.your-tailnet.ts.net` with your host's Tailscale MagicDNS name (`tailscale status`). The identical ports/paths are also served on the host's LAN IP, which is a handy fallback if MagicDNS is ever unreachable.

Pass the per-tier bearer token with `--header`. Give a client only the tiers it should reach (e.g. omit the personal server on a machine that handles confidential work):

claude mcp add --scope user --transport http commonplace-personal http://your-server.your-tailnet.ts.net:8000/mcp/ \
  --header "Authorization: Bearer $PERSONAL_TOKEN"
claude mcp add --scope user --transport http commonplace-client   http://your-server.your-tailnet.ts.net:8001/mcp/ \
  --header "Authorization: Bearer $CLIENT_TOKEN"
claude mcp list   # both should report ✓ Connected

(New servers load on the next Claude Code start.)

Pi has no native MCP support. Add the community bridge, then a global `mcp.json`:

pi install npm:@spences10/pi-mcp     # records the bridge in settings.json

Each server entry must include `"type": "http"`; a `url`-only entry triggers an OAuth handshake this server doesn't support. The extension lazy-connects by default; set `MY_PI_MCP_EAGER_CONNECT=1` to connect and discover tools at startup.

{
  "mcpServers": {
    "commonplace-personal": {
      "type": "http",
      "url": "http://your-server.your-tailnet.ts.net:8000/mcp/",
      "headers": { "Authorization": "Bearer YOUR_PERSONAL_TOKEN" }
    },
    "commonplace-client": {
      "type": "http",
      "url": "http://your-server.your-tailnet.ts.net:8001/mcp/",
      "headers": { "Authorization": "Bearer YOUR_CLIENT_TOKEN" }
    }
  }
}
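
If you manage several machines, generating this file avoids pasting tokens by hand and makes it easy to leave a tier out. A minimal sketch, assuming the entry shape shown in the JSON above; `build_mcp_config` is a hypothetical helper, not something shipped in the repo.

```python
import json


def build_mcp_config(host, personal_token=None, client_token=None):
    """Build an mcpServers mapping, skipping any tier without a token."""
    tiers = {
        "commonplace-personal": (8000, personal_token),
        "commonplace-client": (8001, client_token),
    }
    servers = {}
    for name, (port, token) in tiers.items():
        if token is None:
            continue  # this machine should not reach that tier
        servers[name] = {
            "type": "http",  # required; a url-only entry triggers OAuth
            "url": f"http://{host}:{port}/mcp/",
            "headers": {"Authorization": f"Bearer {token}"},
        }
    return {"mcpServers": servers}


# A machine used for confidential work gets the client tier only.
config = build_mcp_config(
    "your-server.your-tailnet.ts.net", client_token="YOUR_CLIENT_TOKEN"
)
print(json.dumps(config, indent=2))
```

Omitting a token drops that server entirely, which matches the advice to give each client only the tiers it should reach.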

Any device on the tailnet can use the same two endpoints; there is nothing per-client on the server. To add one:

- Join the device to the tailnet (`tailscale up`) and confirm it can reach the host (`tailscale ping your-server`).
- For Claude Code, run the two `claude mcp add … /mcp/` commands above (user scope).
- For any MCP client, add both servers with `"type": "http"` pointing at `:8000/mcp/` and `:8001/mcp/`.
- Nothing to change on the host: graphs and auth are shared; reads/writes from the new client land in the same two graphs.
- For HTTPS, expose via `tailscale serve` (above) and use the `https://…` URLs instead.

Two things turn this from a memory store into a memory system agents use well:

- **Per-tier ontology.** Each tier defines `graphiti.entity_types` in its config (personal: Preference, Project, Person, Decision, …; client: Engagement, Stakeholder, Requirement, Risk, …). These type descriptions constrain extraction, which is the single biggest lever on graph quality, and they help the weak local model the most.
- **An agent protocol.** `docs/memory-protocol.md` is the contract for any client (Claude Code, Pi): search before answering, write durable facts, never cross tiers (no confidential data on the hosted personal tier), and cite what you used. Install it as a skill or system prompt; without it, agents rarely call memory and the graph stays empty.

**Is it actually being used?** `scripts/graph_stats.sh` shows whether writes are landing; `scripts/mcp_activity.sh` (and the Prometheus endpoint on `:9180`) show whether agents are reading. Seed an existing corpus with `scripts/ingest_markdown.py`, pull token-budgeted context with `scripts/recall.py`, gate retrieval quality with `eval/run_eval.py`, and review resolved contradictions with `scripts/contradictions.sh`. See docs/ROADMAP.md for what's shipped vs. still open (a local reranker remains the notable deferral).
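
"Token-budgeted" recall can be pictured as greedy packing over an already ranked list. The sketch below is not the repo's `scripts/recall.py`; the function name, the sample facts, and the rough estimate of 4 characters per token are all assumptions.

```python
def pack_within_budget(facts: list[str], max_tokens: int) -> list[str]:
    """Keep ranked facts, best first, until the estimated token budget is spent."""

    def estimate_tokens(text: str) -> int:
        # Crude assumption: about 4 characters per token, at least 1.
        return max(1, len(text) // 4)

    kept, used = [], 0
    for fact in facts:
        cost = estimate_tokens(fact)
        if used + cost > max_tokens:
            break  # stop at the first fact that does not fit, preserving rank order
        kept.append(fact)
        used += cost
    return kept


# Hypothetical search results, most relevant first.
ranked_facts = [
    "The client tier must always extract locally.",
    "Both tiers share one 768-dim embedder.",
    "FalkorDB persists to the falkordb_data volume.",
]
print(pack_within_budget(ranked_facts, max_tokens=20))
```

Stopping at the first fact that does not fit, rather than skipping it and trying smaller ones, keeps the context strictly in rank order; a real implementation might make a different trade-off.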

commonplace/
├── docker-compose.yml           # FalkorDB + 2 MCP instances + gateway, restart: unless-stopped
├── Dockerfile                   # commonplace-mcp:local: standalone image (digest-pinned) + patch
├── patch_transport_security.py  # build-time: allow remote Host headers (disable DNS-rebind guard)
├── gateway/
│   └── Caddyfile                # per-tier bearer auth + access logging + Prometheus metrics
├── config/
│   ├── personal.yaml            # instance A: Anthropic Haiku extraction + personal ontology
│   └── client.yaml              # instance B: local Ollama extraction + confidential ontology
├── scripts/
│   ├── commonplace              # operate CLI: `commonplace update` redeploys the stack
│   ├── graph_stats.sh           # write counts per tier   · mcp_activity.sh  # read counts (gateway log)
│   ├── recall.py                # token-budgeted recall    · contradictions.sh # superseded facts
│   ├── backup.sh / restore.sh   # FalkorDB dump + restore
│   └── ingest_markdown.py       # load a markdown corpus (notes/docs) into a tier
├── eval/
│   ├── queries.yaml             # retrieval eval cases (question → expected facts)
│   └── run_eval.py              # scores recall against a tier
├── docs/
│   ├── memory-protocol.md       # how agents should read/write memory (tier safety, cite-back)
│   └── ROADMAP.md               # hardening & maturity plan
├── .env.example                 # template; copy to .env on the host (gitignored)
├── .dockerignore                # keeps .env and other secrets out of the build context
├── CLAUDE.md                    # guidance for Claude Code working in this repo
├── LICENSE                      # MIT
└── README.md

Secrets live only in `.env` on the host and are never committed. The repo is the source of truth: edit a clone, push to your fork, `git pull` on the host, `docker compose up -d`.

MIT.
