# Claude-tinderbox: Search your Claude.ai conversation history locally via MCP

> Source: <https://github.com/luckyrmp/tinderbox-archive>
> Published: 2026-06-06 03:39:45+00:00

A personal claude.ai conversation archive — schema, ingest, embeddings, hybrid retrieval, and an MCP server that lets any Claude session search your own past conversations.

**Status:** working end-to-end. Used daily by the author. Not packaged for general consumption; see **Caveats** below.

You export your conversations from claude.ai, drop the ZIP into a watched directory, and within ~15 minutes your full archive is searchable from any Claude session via two MCP tools:

- `tinderbox_search(query, limit=10)`: hybrid semantic + full-text retrieval over every message and artifact
- `tinderbox_get_conversation(export_id, max_messages=50)`: pull the full thread of any conversation surfaced by a search
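An MCP client such as Claude Code calls these tools by sending JSON-RPC 2.0 `tools/call` messages over stdio, one JSON object per line. The sketch below builds such messages by hand. The argument names come from the tool signatures above; the query text and the `EXPORT_ID_FROM_SEARCH` placeholder are illustrative, and this is not the project's own server or client code.

```python
import json


def tools_call_request(request_id: int, name: str, arguments: dict) -> str:
    """Build one JSON-RPC 2.0 `tools/call` message, newline-terminated for stdio."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    # json.dumps without indent emits a single line (newlines inside strings are
    # escaped), which matches the one-message-per-line stdio framing.
    return json.dumps(message) + "\n"


search = tools_call_request(1, "tinderbox_search", {"query": "hnsw index tuning", "limit": 10})
# The export_id would come from a previous search result; this value is a placeholder.
fetch = tools_call_request(
    2, "tinderbox_get_conversation", {"export_id": "EXPORT_ID_FROM_SEARCH", "max_messages": 50}
)
```

A typical session is one `tinderbox_search` call followed by `tinderbox_get_conversation` on whichever hit looks relevant.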

Everything is local-ish — you bring your own Supabase free-tier project, your own Ollama install for embeddings, and a Mac (this is targeted at Apple Silicon).

The author's current archive: 676 conversations, 10,653 messages, 172 artifacts, and 10,731 `mxbai-embed-large` vectors. Hybrid retrieval scores 68.7% top-1 / 88.7% top-10 on a frozen 150-query QA set that Haiku generated from the corpus itself (re-run weekly via launchd).
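Top-1 and top-10 here are hit rates: the share of QA queries whose target shows up in the first k results. A minimal sketch of that metric, assuming each generated query has exactly one expected conversation ID (the helper name and the one-target assumption are illustrative, not taken from `qa/`):

```python
from typing import Sequence


def hit_rate_at_k(expected: Sequence[str], ranked: Sequence[Sequence[str]], k: int) -> float:
    """Fraction of queries whose expected ID appears in the top k ranked results."""
    if len(expected) != len(ranked):
        raise ValueError("need exactly one ranked result list per query")
    if not expected:
        return 0.0
    hits = sum(1 for want, results in zip(expected, ranked) if want in results[:k])
    return hits / len(expected)


expected = ["a", "b", "c", "d"]
ranked = [["a", "x"], ["x", "b"], ["y", "z"], ["c", "d"]]
top1 = hit_rate_at_k(expected, ranked, 1)  # only "a" is ranked first
top2 = hit_rate_at_k(expected, ranked, 2)  # "a", "b" and "d" are in their top 2
```

Freezing the query set is what makes week-over-week numbers comparable: the same queries are re-scored after every ingest or retrieval change.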

Tinderbox stores statements, not facts. Every retrieval response renders the provenance inline: never `"X is true"`, always `"on [date], in [conversation], [participant] said [content]"`. The corpus answers *what was said when, by whom*; never *what is true*.
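To make the provenance rule concrete, here is a sketch of rendering one search hit as an attributed statement. The `Hit` fields and the exact wording are assumptions modeled on the template above, not the project's actual result schema or formatter.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Hit:
    # Hypothetical shape of one search hit; the real rows may carry more fields.
    created_at: datetime
    conversation_title: str
    sender: str
    content: str


def render_statement(hit: Hit) -> str:
    """Render a hit as 'on [date], in [conversation], [participant] said [content]'."""
    return (
        f'on {hit.created_at:%Y-%m-%d}, in "{hit.conversation_title}", '
        f'{hit.sender} said "{hit.content}"'
    )


hit = Hit(datetime(2026, 4, 12, 9, 30), "Schema design", "assistant",
          "Use vector(1024) with an HNSW index.")
line = render_statement(hit)
# line == 'on 2026-04-12, in "Schema design", assistant said "Use vector(1024) with an HNSW index."'
```

Because there is no code path that emits bare content, a consumer cannot mistake a quoted past statement for a current fact.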

That's [design principle #1 from the schema doc](/luckyrmp/tinderbox-archive/blob/main/docs/STAGE_1_SCHEMA_PROPOSAL.md). Memorial archive, not extraction pipeline. Forward-linked when superseded, never backward-edited.

For context-window reasons it's also genuinely useful: a Claude session can look up its own past reasoning instead of re-deriving it.

Postgres (Supabase) holds 12 tables under a `tinderbox` schema: schema versioning, ingest runs, conversations, messages, artifacts, attachments, embeddings (`vector(1024)` + HNSW index), enrichment, named_instances, query log, and a frozen QA test set. A Python parser stream-reads claude.ai export ZIPs and upserts everything idempotently. An embed worker batches messages and artifacts through Ollama (`mxbai-embed-large`, 1024-dim) and writes the vectors back. A server-side Postgres function (`tinderbox.hybrid_search`) ranks results by `(1 - cosine_distance) + 0.5 * ts_rank_cd`. A small from-scratch JSON-RPC 2.0 MCP server exposes the two tools over stdio. Three launchd daemons run the whole thing on a schedule: an inbox watcher (every 15 min), a QA eval (Sundays 03:00), and a staleness alerter (daily 09:00, with cooldown + debounce).
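The ranking formula adds a semantic term and a down-weighted lexical term. A Python sketch of just that arithmetic, assuming the candidate set and each candidate's `ts_rank_cd` value are already computed by Postgres. The `Candidate` type and `hybrid_rank` helper are hypothetical; how `tinderbox.hybrid_search` actually gathers candidates is not described here.

```python
import math
from dataclasses import dataclass

TEXT_WEIGHT = 0.5  # weight on the full-text term, per the formula above


def cosine_distance(a: list[float], b: list[float]) -> float:
    """Same quantity as pgvector's `<=>` operator: 1 - cosine similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        return 1.0  # treat a zero vector as orthogonal instead of dividing by zero
    return 1.0 - dot / norm


@dataclass
class Candidate:
    message_id: str
    embedding: list[float]
    ts_rank_cd: float  # computed in Postgres; 0.0 when the text query doesn't match


def hybrid_rank(query_vec: list[float], candidates: list[Candidate], limit: int = 10):
    """Score each candidate as (1 - cosine_distance) + 0.5 * ts_rank_cd, best first."""
    scored = [
        (c.message_id, 1.0 - cosine_distance(query_vec, c.embedding) + TEXT_WEIGHT * c.ts_rank_cd)
        for c in candidates
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:limit]


hits = hybrid_rank([1.0, 0.0], [
    Candidate("a", [1.0, 0.0], 0.0),  # exact semantic match, no keyword match
    Candidate("b", [0.0, 1.0], 1.0),  # orthogonal embedding, strong keyword match
    Candidate("c", [1.0, 1.0], 0.4),  # partial match on both signals
])
```

With the 0.5 weight, a strong keyword match alone (`b`, score 0.5) ranks below a message that is semantically close and also mentions the terms (`c`, about 0.91).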

- **macOS** with Apple Silicon, Python 3.14 (3.12+ probably works; the author runs 3.14.3_1)
- **Supabase free-tier project**: $0/month at this scale. Optional [$4/mo IPv4 add-on](https://supabase.com/pricing) for proper RLS scoping (stage 5b).
- **Ollama** running locally with `mxbai-embed-large` pulled (`ollama pull mxbai-embed-large`)
- A claude.ai data export ZIP (Settings → Account → Export Data)

```
# 1. Clone
git clone https://github.com/luckyrmp/tinderbox-archive.git ~/tinderbox && cd ~/tinderbox

# 2. Create your Supabase project, get URL + service-role key
# 3. Render config + plists for your $HOME / $USER
./parser/scripts/setup.sh

# 4. Create your env file (path is configurable via TINDERBOX_ENV_FILE)
cp .env.example ~/.tinderbox.env
# … and fill in SUPABASE_URL, SUPABASE_SERVICE_KEY, etc.

# 5. Apply the migrations to your Supabase project
# (each migration file is plain SQL — run them in order via the Supabase
# SQL editor, or via psql, or via your tool of choice)
ls migrations/

# 6. Pull the embedding model
ollama pull mxbai-embed-large

# 7. Set up the venv (the project uses a .pth bridge to share deps from
# other venvs on the author's machine; you'll likely want to install
# fresh — see parser/pyproject.toml for the deps list)
python3 -m venv parser/venv
parser/venv/bin/pip install supabase python-dotenv click httpx pydantic anthropic

# 8. Drop your export ZIP into the inbox and watch it ingest
mkdir -p inbox
mv ~/Downloads/data-*.zip inbox/
parser/venv/bin/python -m tinderbox.cli scan-inbox

# 9. Embed everything
parser/venv/bin/python -m tinderbox.cli embed

# 10. Try a search
parser/venv/bin/python -m tinderbox.cli search "your test query"

# 11. Wire to Claude Code / Desktop — see docs/MCP_INSTALL.md
```
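The 15-minute inbox watcher is only safe if rescanning is cheap and re-dropping the same ZIP is harmless. One way to pick out genuinely new exports is sketched below, assuming ingested ZIPs are remembered by SHA-256 content hash. The hash-tracking approach and the `pending_exports` helper are hypothetical; the README says only that ingest upserts idempotently.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so a large export never loads fully into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def pending_exports(inbox: Path, seen_hashes: set[str]) -> list[Path]:
    """Return export ZIPs in `inbox` whose content hash has not been ingested yet."""
    pending = []
    for zip_path in sorted(inbox.glob("data-*.zip")):
        if sha256_of(zip_path) not in seen_hashes:
            pending.append(zip_path)
    return pending
```

Hashing content rather than filenames means a renamed copy of an old export is still skipped, while a fresh export with a recycled name is still picked up.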

Activate the daemons (optional but recommended):

```
launchctl load ~/Library/LaunchAgents/com.$USER.tinderbox.scan.plist
launchctl load ~/Library/LaunchAgents/com.$USER.tinderbox.qa.plist
launchctl load ~/Library/LaunchAgents/com.$USER.tinderbox.staleness.plist
```

Repository layout:

```
.
├── migrations/                    # Numbered SQL — apply to your Supabase project in order
├── parser/
│   ├── tinderbox/                 # Python package
│   │   ├── parser/                # ZIP → typed records (streaming JSON, content-block parsing, artifact versioning)
│   │   ├── ingest/                # Records → DB (upsert, retry, tombstone sweep, mass-tombstone canary)
│   │   ├── embed/                 # mxbai-embed-large via Ollama, batched, idempotent, per-row fallback
│   │   ├── search/                # Hybrid retrieval + query logging
│   │   ├── qa/                    # Frozen-query-set eval (Haiku-generated, scheduled)
│   │   ├── mcp/                   # Minimal JSON-RPC 2.0 MCP server (no SDK dep)
│   │   ├── staleness.py           # Daily check w/ cooldown + debounce
│   │   ├── cli.py                 # tinderbox <command>
│   │   └── ...
│   ├── tests/
│   ├── scripts/                   # Setup, MCP launcher, surgical recovery
│   └── launchd/templates/         # Plist templates filled by setup.sh
├── docs/
│   ├── STAGE_1_SCHEMA_PROPOSAL.md       # Design principles + table-by-table rationale
│   ├── STAGE_1_COMPLETION_REPORT.md     # Bugs found + fixed during ingest
│   ├── STAGE_2_COMPLETION_REPORT.md     # Embed + hybrid retrieval shipped
│   ├── STAGE_5_COMPLETION_REPORT.md     # MCP server + IPv4 add-on
│   ├── ACCEPTED_ADVISORIES.md           # Supabase advisor findings + accepted/applied/deferred
│   ├── MCP_INSTALL.md                   # Claude Code / Desktop config snippets
│   └── STAGE_2_HANDOFF.md               # Inter-session handback (historical)
└── README.md (this file)
```
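`staleness.py` is listed as a daily check with cooldown + debounce. One plausible reading of those two terms is sketched below with hypothetical names and thresholds (not the project's code): debounce means the stale condition (for example, no new export ingested recently) must hold for several consecutive checks before an alert fires, and cooldown suppresses repeat alerts for a while after one fires.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class AlertState:
    consecutive_stale: int = 0
    last_alert_at: Optional[datetime] = None


def should_alert(
    state: AlertState,
    is_stale: bool,
    now: datetime,
    debounce_checks: int = 2,               # hypothetical threshold
    cooldown: timedelta = timedelta(days=3),  # hypothetical threshold
) -> bool:
    """Fold one check result into `state`; return True if an alert should fire now."""
    if not is_stale:
        state.consecutive_stale = 0  # a healthy check resets the debounce counter
        return False
    state.consecutive_stale += 1
    if state.consecutive_stale < debounce_checks:
        return False  # debounce: wait until the condition persists
    if state.last_alert_at is not None and now - state.last_alert_at < cooldown:
        return False  # cooldown: already alerted recently
    state.last_alert_at = now
    return True


state = AlertState()
start = datetime(2026, 4, 1, 9)
checks = [True, True, True, True, True, False]  # five stale days, then recovery
results = [should_alert(state, stale, start + timedelta(days=i)) for i, stale in enumerate(checks)]
```

With daily checks this fires on the second stale day, stays quiet for the three-day cooldown, fires once more, and resets as soon as a check comes back healthy.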

## Caveats

- **Hardcoded paths.** The author runs everything under `~/tinderbox/`. `parser/scripts/setup.sh` renders the launchd plists for your `$HOME`/`$USER`, but the Python defaults still assume that root. `TINDERBOX_*` env vars override every default; set them in your env file.
- **No tests for end-to-end MCP from a real client.** The smoke test in `parser/tests/test_mcp_smoke.py` spawns the server as a subprocess and exchanges a minimal set of protocol messages. Real validation is "does Claude Code surface the tool" (verified) and "does the tool return useful results" (eyeballed).
- **`service_role` auth.** The MCP server still authenticates as `service_role` from stage 1, so RLS is bypassed. This is documented in `docs/STAGE_5_COMPLETION_REPORT.md` and is fine until you start differentiating privacy classes; at that point the stage 5b auth swap to a direct `tinderbox_owner` connection becomes urgent.
- **No SQLite option.** Supabase only. The free tier easily handles this scale; if you want 100% local, you'll need to translate the schema and rewrite the DB layer (~5-6 hrs of work; the author chose not to).
- **macOS only.** launchd schedules, paths, and the `.pth` venv bridge are macOS conventions. Linux would need systemd units and a different venv approach. Not difficult, just not done.
- **Author's archive shape baked into a few decisions.** The 5 MB `RAW_CONTENT_BYTE_LIMIT` was chosen because the author's largest message is 57 MB. The stratified sampler in `qa/sample.py` uses bucket sizes (long ≥20 msgs, short 3-10 msgs) tuned to the author's distribution. Both are easy to retune.
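The hardcoded-paths caveat boils down to "every default is overridable from the environment". A sketch of that pattern follows. Only `TINDERBOX_ENV_FILE` is named in this README; `TINDERBOX_ROOT` and `TINDERBOX_INBOX` are hypothetical names used for illustration, so check the actual `TINDERBOX_*` variables in the source before relying on them.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_paths(env: Optional[Mapping[str, str]] = None) -> dict[str, Path]:
    """Resolve tinderbox paths, letting TINDERBOX_* variables override each default."""
    env = os.environ if env is None else env
    # Hypothetical variable: the default root matches the author's ~/tinderbox layout.
    root = Path(env.get("TINDERBOX_ROOT", "~/tinderbox")).expanduser()
    return {
        "root": root,
        # Hypothetical variable: defaults to the inbox/ directory used in the quick start.
        "inbox": Path(env.get("TINDERBOX_INBOX", root / "inbox")).expanduser(),
        # Named in the README; default matches the quick start's ~/.tinderbox.env.
        "env_file": Path(env.get("TINDERBOX_ENV_FILE", "~/.tinderbox.env")).expanduser(),
    }


relocated = resolve_paths({"TINDERBOX_ROOT": "/srv/tinderbox"})
```

Deriving the inbox from the root means relocating the whole install needs one variable, while each path can still be pointed elsewhere individually.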

Pick one. The author is open to anything that lets people fork and adapt without obligation.

Built collaboratively across many Claude sessions over a few days in April 2026. Schema design → parser → embed → search → QA → MCP server, mostly autonomous, with the human (Lucky) stepping in at architecture decisions and pushing back when something didn't smell right.

The system can now query the very conversations that built it.
