{"slug": "econd-brain-mcp", "title": "Second-Brain-MCP", "summary": "Second-brain-mcp is a self-maintaining personal knowledge database that uses MCP, DuckDB, and biological memory models to automatically link, compress, and index saved papers, notes, and figures. The tool fetches full-text articles, OCRs figures with Claude Vision, and applies an Ebbinghaus forgetting curve to compress stale notes by 60–90 percent while keeping all content searchable by semantic query or figure content.", "body_md": "**A self-maintaining personal knowledge database — powered by MCP, DuckDB, and biological memory models.**\n\nFor anyone who saves more papers, notes, and figures than they could ever re-read. second-brain turns everything you capture into a database that maintains itself — auto-linking related notes, compressing what you stop reading, and keeping every figure searchable by its content. What you saved a year ago is still one query away, at a fraction of the token cost.\n\n| Problem | Solution |\n|---|---|\n| 📄 You save dozens of papers but can never find the right figure | `search_figures(\"UMAP melanocyte\")` — returns the exact panel, across every paper you've saved |\n| 📑 arXiv gives you the abstract; you need the full paper | Auto-upgrades `/abs/` → `/html/` — fetches the complete paper with all sections, not just the abstract |\n| 🗂 Notes pile up; older ones never get cleaned up | Vault Sleep: low-access notes compress automatically every Sunday while you sleep (60–90% token reduction) |\n| 🔗 New notes stay isolated; you forget what's connected | Auto-wikilinks: every saved note is automatically linked to semantically related notes already in your vault |\n| 🔎 Semantic search needs a cloud API or Docker stack | Self-hosted `nomic-embed-text` via llama-server; BM25 fallback when offline |\n| 🔒 Every AI memory tool locks you into their format | Pure Markdown vault — sync with Google Drive, iCloud, or git; switch agents anytime |\n| 🖼 Figure context is lost when you 
read a paper | Every figure is downloaded, OCR'd by Claude Vision, and stored in DuckDB — searchable by gene name, p-value, axis label |\n\n```\nsave_article(\"https://arxiv.org/abs/2405.01234\")\n  ↓\n• /abs/ auto-upgraded to /html/ — full paper, not just abstract\n• Full text converted to Markdown\n• All figures downloaded + OCR'd by Claude Vision\n• Semantic embeddings computed\n• Auto-linked to related notes already in your vault   ← auto-wikilinks\n• Stored in 30-resources/ — queryable immediately\n\nsearch_figures(\"UMAP cluster batch correction\")\n  ↓\n• Returns the exact figure from the exact paper\n• Works across your entire saved literature library\n```\n\n```mermaid\nflowchart LR\n    subgraph input[\"📥 Any Content Source\"]\n        A1[\"arXiv / PubMed paper\"]\n        A2[\"Web article / blog\"]\n        A3[\"Local PDF / DOCX\"]\n        A4[\"Personal note\"]\n    end\n\n    subgraph core[\"⚙️ second-brain-mcp\"]\n        B1[\"Markdown note<br/>30-resources/\"]\n        B2[\"Figure OCR<br/>+ VLM description\"]\n        B3[\"Semantic embedding<br/>+ auto-wikilinks\"]\n        B4[\"Ebbinghaus score<br/>ranking\"]\n        B5[\"PNG snapshots<br/>60–90% token reduction\"]\n    end\n\n    subgraph query[\"🔍 Queryable Knowledge\"]\n        C1[\"search_figures<br/>'UMAP melanocyte'\"]\n        C2[\"search_notes<br/>'batch correction scRNA'\"]\n        C3[\"get_context<br/>top-20 relevant notes\"]\n    end\n\n    input --> core\n    B1 --> B2\n    B1 --> B3\n    B3 --> B4\n    B4 --> B5\n    B2 --> C1\n    B3 --> C2\n    B4 --> C3\n```\n\n**Eight things most self-hosted memory tools can't do — combined in one:**\n\n| Most memory tools… | second-brain |\n|---|---|\n| Save a link or PDF, then leave you to read and tag it | 🔬 One command builds the database — `save_article` fetches any URL/PDF, converts to Markdown, downloads & OCRs every figure with Claude Vision, then semantic-indexes it |\n| Store the arXiv abstract you pasted | 📑 Full text, not abstracts — `/abs/` URLs 
auto-upgrade to `/html/` for the complete paper: methods, results, discussion |\n| Leave new notes isolated until you tag them | 🔗 The knowledge graph builds itself — every note is auto-linked to semantically related notes already in your vault |\n| Cost the same whether a note is read daily or never | 🧠 Memory that forgets like a brain — Ebbinghaus score ranks by recency × frequency; stale notes compress while you sleep |\n| Search documents, not what's inside the figures | 🖼 Figure-level search across your whole library — `search_figures(\"p < 0.001\")` returns the exact panel from the exact paper |\n| Forget your project decisions between sessions | 📋 The AI learns your rules — hot notes auto-extract constraints into `memory/rules.md`, injected at every session start |\n| Grow more expensive as the vault grows | 📉 Token cost shrinks with age — PNG snapshots replace old text at 60–90% compression; frequently-read papers stay full-fidelity |\n| Lock you into their database format | 🔓 Zero lock-in — pure Markdown, any MCP agent, sync via any cloud drive or git |\n\nEvery project you work on can be resumed in a new session with full context — no re-explaining, no lost progress.\n\n```mermaid\nflowchart LR\n    A[\"🟢 Session Start<br/>get_context()\"] --> B[\"AI receives:<br/>• goals.md — current priorities<br/>• Top-20 recent notes<br/>• Extracted rules\"]\n    B --> C[\"Work on project<br/>new_note / search / read\"]\n    C --> D[\"🔴 Before ending session<br/>update_goals(...)\"]\n    D --> E[\"New session<br/>get_context() again\"]\n    E --> B\n```\n\n**End of session** — tell the agent to save state:\n\n```\nUpdate goals: currently working on the scRNA batch correction pipeline.\nCompleted: harmony integration. 
Blocked on: choosing n_components for PCA.\nNext session: start from the PCA parameter sweep in 20-areas/research/harmony-notes.md\n```\n\nThe agent calls `update_goals()` and optionally `new_note(\"project\", ...)` for detailed progress.\n\n**Start of next session** — just say:\n\n```\nGet context and continue where we left off.\n```\n\nThe agent calls `get_context()` and immediately sees:\n\n- `goals.md` with the state you saved\n- The harmony-notes.md surfaced at the top (recently accessed, high Ebbinghaus score)\n- Rules auto-extracted from that note, e.g.:\n\n```\nRULE: use n_components=30 for this dataset — tested 20/30/50, 30 minimises batch effect without losing resolution\nRULE: exclude sample CRC_04 — library size outlier confirmed by QC\n```\n\nThese rules live in `memory/rules.md` and are injected at every `get_context()` call — the AI carries your hard-won decisions forward automatically, without you having to repeat them.\n\n| What | Where | Always in context? |\n|---|---|---|\n| Current priorities / blocked items | `memory/goals.md` | ✅ every session |\n| Project progress notes | `10-projects/` or `20-areas/` | ✅ if recently accessed |\n| Decisions and rationale | `decisions/` | via `get_decisions()` |\n| Extracted rules from notes | `memory/rules.md` | ✅ every session |\n| Saved papers and figures | `30-resources/` | via `search_notes/figures` |\n\nThis works across any project: bioinformatics analysis, coding, writing, research. 
Save state with one sentence at the end of a session; resume instantly at the start of the next.\n\n```\n# Resume a project from last session\nget_context()  # → goals + recent notes + rules loaded automatically\n\n# Find a specific figure panel across all saved papers\nsearch_figures(\"p < 0.001 UMAP cluster\")\n\n# Semantic search across all notes\nsearch_notes(\"single cell integration batch correction\")\n\n# Decision records for a specific project\nget_decisions(\"MyProject\")\n```\n\n| Biological Brain | This System |\n|---|---|\n| Hippocampal consolidation during sleep | Vault Sleep: weekly LLM-compression of old low-access notes |\n| Ebbinghaus forgetting curve | Score-based ranking: `access_count / ln(age_days)` |\n| Visual long-term memory | PNG snapshots — resolution degrades gracefully with age |\n| Associative recall | Semantic search + auto-generated `[[wikilinks]]` |\n| Sleep-dependent consolidation | launchd cron, runs Sunday 02:00 while you sleep |\n\nMemory that gets cheaper over time — unlike flat-file systems where old notes cost the same forever.\n\n```\nNote age →   fresh (0–3 mo)   3–6 months     6–12 months    1 year+\n             ──────────────   ──────────     ───────────    ───────\ntoken cost:  ██████████████   ██████         ████           ██\n             ~1,000 tokens    ~400 tokens    ~256 tokens    ~100 tokens\n                              ▼ 60%          ▼ 74%          ▼ 90%\n```\n\nTier assigned by score × age (adaptive). 
Frequently-accessed notes stay full-text regardless of age.\n\nMeasured on Apple Silicon MacBook (20-rep average, BM25-only mode).\n\n```\nVault    BM25-only p50          Hybrid BM25+semantic p50\n──────   ─────────────────      ────────────────────────\n10 n     ████░░░░░   21 ms      ████████████   37 ms\n50 n     ██████░░░   25 ms      █████████████  39 ms\n100 n    ███████░░   27 ms      ██████████████ 45 ms\n```\n\n| Vault Size | BM25 p50 | Hybrid p50 | Recall@1 | Recall@5 | MRR |\n|---|---|---|---|---|---|\n| 10 notes | 21 ms | 37 ms | 30% | 60% | 0.42 |\n| 50 notes | 25 ms | 39 ms | 70% | 90% | 0.78 |\n| 100 notes | 27 ms | 45 ms | 70% | 80% | 0.73 |\n\nHybrid mode adds roughly 14–18 ms for embedding lookup. Both modes scale sub-linearly with vault size.\n\nRecall figures at this scale (10–100 notes) carry high sample variance — a single ambiguous query shifts Recall@1 by 10%. Treat them as directional, not as benchmarks against large corpora; the takeaway is that hybrid consistently beats BM25-only on relevance for a fixed query set.\n\n```\n┌─────────────────────────────────────────────────────┐\n│                    AI Agent Layer                    │\n│         Claude Code · Gemini CLI · Any MCP           │\n└──────────────────────┬──────────────────────────────┘\n                       │ MCP Protocol (19 tools)\n┌──────────────────────▼──────────────────────────────┐\n│               Layer 2 — MCP Server                   │\n│                    server.py                         │\n│   get_context · search_notes · save_article · …      │\n└──────┬───────────────┬────────────────┬─────────────┘\n       │               │                │\n┌──────▼──────┐ ┌──────▼──────┐ ┌──────▼──────┐\n│  vault_sleep│ │  vault_db   │ │  figures    │\n│  compress   │ │  DuckDB FTS │ │  PNG snap   │\n│  Phase 3–9  │ │  + semantic │ │  OCR · VLM  │\n└──────┬──────┘ └──────┬──────┘ └─────────────┘\n       │               │\n┌──────▼───────────────▼──────────────────────────────┐\n│    
           Layer 0 — Markdown Vault               │\n│   00-inbox · 10-projects · 20-areas · 30-resources   │\n│   40-archive · decisions · memory · templates        │\n│         (syncs via Google Drive / iCloud / git)      │\n└─────────────────────────────────────────────────────┘\n```\n\n```\nEvery Sunday 02:00 (launchd, no interaction needed)\n        │\n        ▼\n sync_index + embeddings\n        │\n        ▼  age > 90d AND Ebbinghaus score ≤ 0.5\n ┌──────────────────────────────────────┐\n │         Adaptive Tier Selection      │\n │  score > 1.5  →  text  (keep full)  │  ← frequently-read: never compressed\n │  score > 0.8  →  large  ~400 tokens │\n │  score > 0.3  →  base   ~256 tokens │\n │  otherwise    →  small  ~100 tokens │\n └────────────────┬─────────────────────┘\n                  │\n  Gemini CLI → Claude CLI → naive   (auto-fallback, no LLM required)\n                  │\n    compressed → vault  /  original → 40-archive/  /  snapshot → .png\n```\n\n| Tool | Description |\n|---|---|\n| `get_context` | Session start: goals + top-20 Ebbinghaus-ranked notes + auto-rules |\n| `save_article` | Fetch URL/PDF → Markdown + auto-extract figures |\n| `search_notes` | Hybrid BM25 + semantic search across all notes |\n| `search_figures` | Search figure OCR text / VLM descriptions |\n| `extract_figures_for` | Manually trigger figure extraction for a saved article |\n| `read_note` | Read note + record access (updates Ebbinghaus score) |\n| `read_note_as_image` | Return PNG snapshot for token-efficient reading |\n| `new_note` | Create note with correct template and folder by type |\n| `get_decisions` | List ADR decision records, optionally filtered by project |\n| `update_goals` | Update `memory/goals.md` |\n| `sync_index` | Rebuild DuckDB index from vault files |\n| `index_stats` | Show note counts by type |\n| `vault_sleep` | Compress old low-activity notes (dry_run=True by default) |\n| `sleep_status` | Show compression candidates without acting |\n| `snapshot_note_tool` | Render note 
to PNG at chosen resolution tier |\n| `extract_rules_tool` | Extract L3 rules from frequently-accessed notes |\n| `consolidate_tool` | Merge semantically similar notes into one abstract note |\n| `update_links_tool` | Refresh auto-generated `[[wikilinks]]` |\n| `prune_archive_tool` | Delete archived originals that have a PNG snapshot |\n\n```\ntests/test_figures.py      19 passed   (OCR, snapshots, VLM)\ntests/test_server.py       13 passed   (MCP tools, path safety)\ntests/test_vault_db.py     39 passed   (FTS, semantic search, embeddings)\ntests/test_vault_sleep.py  44 passed   (compression, consolidation, rules, prune)\n────────────────────────────────────────\n115 passed in 3.37s\n```\n\n| Dependency | Required | Notes |\n|---|---|---|\n| Python 3.11+ | ✅ | |\n| [Playwright](https://playwright.dev/) | … | … |\n| [llama-server](https://github.com/ggerganov/llama.cpp) | … | … |\n| [nomic-embed-text-v1.5.Q8_0.gguf](https://huggingface.co/nomic-ai/nomic-embed-text-v1.5-GGUF) | … | … |\n| `ANTHROPIC_API_KEY` | … | … |\n\n```\npip install mcp-second-brain\nplaywright install chromium\nmkdir -p ~/second-brain/{00-inbox,10-projects,20-areas,30-resources,40-archive,decisions,memory,templates}\n```\n\nOption A: **Claude Code (CLI)**\n\n```\nclaude mcp add --scope user second-brain \\\n  --env SECOND_BRAIN_PATH=~/second-brain \\\n  -- python -m mcp_second_brain\n```\n\nOption B: **Claude Desktop** — add to `~/Library/Application Support/Claude/claude_desktop_config.json`:\n\n```\n{\n  \"mcpServers\": {\n    \"second-brain\": {\n      \"command\": \"python\",\n      \"args\": [\"-m\", \"mcp_second_brain\"],\n      \"env\": { \"SECOND_BRAIN_PATH\": \"/path/to/your/vault\" }\n    }\n  }\n}\n```\n\nIn Claude Code or Claude Desktop, tell the agent:\n\n```\nRun sync_index to build the initial index.\n```\n\n```\ngit clone https://github.com/ddmanyes/second-brain-mcp\ncd second-brain-mcp\nuv sync\nuv run playwright install chromium\n```\n\nThen register with Claude Code:\n\n```\nclaude mcp add --scope user second-brain \\\n  --env 
SECOND_BRAIN_PATH=~/second-brain \\\n  -- uv run --project /path/to/second-brain-mcp python server.py\n```\n\n| Variable | Default | Description |\n|---|---|---|\n| `SECOND_BRAIN_PATH` | `~/second-brain` | Path to your vault directory |\n| `EMBED_URL` | `http://localhost:11435/v1/embeddings` | Embedding server endpoint |\n| `EMBED_MODEL` | `nomic-embed-text` | Embedding model name |\n| `EMBED_PORT` | `11435` | llama-server port |\n\n```\n# Embedding server — always on, restarts on crash\ncp examples/launchd/com.yourname.llama-embed.plist ~/Library/LaunchAgents/\n# Edit paths inside the file, then:\nlaunchctl load ~/Library/LaunchAgents/com.yourname.llama-embed.plist\n\n# Weekly vault maintenance — every Sunday 02:00\ncp examples/launchd/com.yourname.vault-sleep.plist ~/Library/LaunchAgents/\nlaunchctl load ~/Library/LaunchAgents/com.yourname.vault-sleep.plist\n```\n\n| Symptom | Likely cause | Fix |\n|---|---|---|\n| Semantic search silently falls back to BM25 | llama-server not running on `EMBED_PORT` | Start the embedding server (see …) `curl localhost:11435/v1/embeddings` |\n| `read_note_as_image` / snapshots fail | … | `uv run playwright install chromium` |\n| `vault_sleep` never compresses anything | … `ANTHROPIC_API_KEY` → naive fallback, or no eligible notes | … `ANTHROPIC_API_KEY`; remember only notes >90 days old with Ebbinghaus score ≤ 0.5 are candidates (`sleep_status` shows them) |\n| … | … | … `sync_index` once after install (and after bulk file changes) |\n| … | `SECOND_BRAIN_PATH` unset or wrong | … `env` block; defaults to `~/second-brain` |\n\n```\nvault/\n├── 00-inbox/          # Unprocessed captures — clear daily\n├── 10-projects/       # Active projects\n├── 20-areas/\n│   ├── research/      # Ongoing research domains\n│   ├── coding/        # Dev tools and workflows\n│   └── consolidated/  # Auto-merged similar notes (Phase 8)\n├── 30-resources/      # ← Papers and articles (save_article writes here)\n├── 40-archive/        # Compressed originals (auto-managed by 
vault_sleep)\n├── decisions/         # Architecture Decision Records (ADR format)\n├── memory/\n│   ├── goals.md       # Current priorities — injected at every session start\n│   ├── index.md       # Vault map\n│   └── rules.md       # Auto-extracted L3 rules — injected at every session start\n└── templates/         # Note templates (note, decision, project, research)\n```\n\n```\nuv run pytest tests/ -v\nuv run python benchmark.py --quick --markdown   # search latency + accuracy report\n```\n\n| Paper | Where Used |\n|---|---|\n| [Experience Compression Spectrum: Unifying Memory, Skills, and Rules in LLM Agents (2026)](https://arxiv.org/abs/2604.15877) | … |\n| [DeepSeek-OCR: Contexts Optical Compression (2025)](https://arxiv.org/abs/2510.18234) | … |\n| [MemOCR: Layout-Aware Visual Memory for Efficient Long-Horizon Reasoning (2026)](https://arxiv.org/abs/2601.21468) | … |\n| [Active Context Compression: Autonomous Memory Management in LLM Agents (2026)](https://arxiv.org/abs/2601.07190) | … |\n| [SimpleMem: Efficient Lifelong Memory for LLM Agents (2026)](https://arxiv.org/abs/2601.02553) | … |\n| [Memory for Autonomous LLM Agents: Mechanisms, Evaluation, and Emerging Frontiers (2026)](https://arxiv.org/abs/2603.07670) | … |\n\n- Ebbinghaus, H. (1885). *Über das Gedächtnis*. Forgetting curve; basis for `access_count / ln(age_days + 1)`\n- [Stickgold, R. (2005).](https://www.nature.com/articles/nature04286) Sleep-dependent memory consolidation. *Nature*, 437, 1272–1278.\n\n[MarkItDown](https://github.com/microsoft/markitdown) · [DuckDB](https://duckdb.org) · [llama.cpp](https://github.com/ggerganov/llama.cpp) · [nomic-embed-text](https://huggingface.co/nomic-ai/nomic-embed-text-v1.5-GGUF) · [FastMCP](https://github.com/jlowin/fastmcp) · [Playwright](https://playwright.dev) · [Anthropic Claude API](https://docs.anthropic.com)\n\nPRs and Issues welcome. Please open an issue first to discuss significant changes.\n\nMIT License — © 2026 Chan Chi Ru. 
See [LICENSE](https://github.com/ddmanyes/second-brain-mcp/blob/master/LICENSE).", "url": "https://wpnews.pro/news/econd-brain-mcp", "canonical_source": "https://github.com/ddmanyes/second-brain-mcp", "published_at": "2026-05-29 07:23:13+00:00", "updated_at": "2026-05-29 07:47:36.577532+00:00", "lang": "en", "topics": ["ai-tools", "ai-products", "ai-infrastructure", "machine-learning", "large-language-models"], "entities": ["MCP", "DuckDB", "llama-server", "nomic-embed-text", "Google Drive", "iCloud", "arXiv", "BM25"], "alternates": {"html": "https://wpnews.pro/news/econd-brain-mcp", "markdown": "https://wpnews.pro/news/econd-brain-mcp.md", "text": "https://wpnews.pro/news/econd-brain-mcp.txt", "jsonld": "https://wpnews.pro/news/econd-brain-mcp.jsonld"}}