Ucp-Local – Offline RAG for Claude Desktop, Cursor, and LM Studio


A local-first MCP server that grounds LLMs in your own files.

UCP indexes folders on your machine (notes, code, conversation exports) and exposes them to any MCP-compatible client (Claude Desktop, Cursor, LM Studio, and other local-agent runtimes) as a single tool: `search_local_context`. Hybrid retrieval (BM25 + vector), tree-sitter-aware code chunking, full citations, content-hash embedding cache. Single binary. No telemetry. No cloud.

Paired with a local model in LM Studio (or Ollama via `ucp-local ask`), the whole stack (indexing, embeddings, retrieval, and the chat model) runs fully offline. Works on a plane, in an air-gapped facility, or anywhere a cloud LLM isn't an option.

Conversation memory — make every past Claude chat searchable across every future session.

Air-gap RAG — local Ollama + local index, zero network traffic.

Quick start — install, index, ask, in under a minute.

| If you are… | UCP gives you… |
| --- | --- |
| A Claude / Cursor / LM Studio power user | A searchable archive of every past AI conversation, callable from any future session as the `search_local_context` tool. |
| A software engineer | Code + private docs + sibling repos + past Claude chats unified under one MCP tool, surfaced inside Cursor or Claude Code alongside their native indexers. |
| A researcher, writer, or academic | A PDF + notes corpus you can ask grounded questions against, with line-level citations, without anything leaving the machine. |
| In a privacy-regulated workflow (legal, medical, defense, NDA-bound IP) | A single Rust binary with zero telemetry and zero cloud. Pair with LM Studio for a fully offline, end-to-end RAG stack. |
| A solo founder or consultant | Per-folder client isolation via `folder_filter`, so there is no risk of leaking client A's context into client B's session. |
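The per-folder isolation described above comes down to a path check applied to each search hit. The following is a minimal sketch of that idea in Python, not UCP's actual Rust code; the `within_folder` helper and the sample paths are hypothetical.

```python
from pathlib import PurePosixPath


def within_folder(chunk_path: str, folder_filter: str) -> bool:
    """Return True if chunk_path is folder_filter itself or lives beneath it."""
    path = PurePosixPath(chunk_path)
    root = PurePosixPath(folder_filter)
    # Compare whole path components, not raw string prefixes.
    return path == root or root in path.parents


# Hypothetical search hits from three different folders.
hits = [
    "/clients/a/notes/kickoff.md",
    "/clients/b/contract.md",
    "/clients/a-archive/old.md",
]
visible = [p for p in hits if within_folder(p, "/clients/a")]
```

Comparing path components instead of string prefixes matters here: a plain `startswith("/clients/a")` would also admit `/clients/a-archive`, which is exactly the kind of leak the filter is meant to prevent.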

For the full audience analysis, a competitive comparison, and the two wedges UCP is explicitly built to win on, see POSITIONING.md.

v0.1, headless. Track scope in ROADMAP.md.

What ships:

- Hybrid search: SQLite FTS5 (BM25) ⨉ `sqlite-vec` (ANN), merged via reciprocal-rank fusion.
- Tree-sitter chunking for Rust, Python, TypeScript/JavaScript. Heading-aware Markdown. Sentence-bounded prose fallback.
- Conversation memory: ingest your Claude `conversations.json` export and search across past chats.
- PII masking on by default: email, OpenAI `sk-`, AWS keys, GitHub PATs, JWT.
- Content-hash embedding cache: re-indexing unchanged content makes zero Ollama calls.
- Filesystem watcher: edit a file, the index updates in ~500ms.
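Of the features listed above, the rank-fusion step is the easiest to show in isolation. Below is a minimal Python sketch of reciprocal-rank fusion over two ranked lists. It illustrates the general technique only, not UCP's Rust implementation: the `rrf_merge` name, the sample chunk IDs, and the constant `k = 60` (the value commonly used in the RRF literature) are all assumptions.

```python
from collections import defaultdict


def rrf_merge(rankings, k=60):
    """Fuse several ranked lists of IDs into one list using reciprocal-rank fusion."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank) for every ID it contains.
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for a stable order.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical chunk IDs returned by the keyword and vector searches.
bm25_hits = ["a.md:10", "b.rs:42", "c.py:7"]
vector_hits = ["b.rs:42", "d.md:3", "a.md:10"]
merged = rrf_merge([bm25_hits, vector_hits])
```

Because RRF uses only ranks, it never has to reconcile BM25 scores with vector distances, which live on unrelated scales. A chunk that both retrievers rank well (`b.rs:42` here) rises above one that only a single retriever favors.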

What's not in v0.1:

- Desktop UI / tray (deferred; it was in the original spec and is now in ROADMAP tier 2+).
- OS hotkey injector and HTTP proxy interceptor (cut from the original spec).
- OpenAI / Anthropic embedding providers (Ollama only for now).
- Cursor and ChatGPT export formats (Claude only; others later).

UCP needs three things on your machine: Rust (to build), Ollama (to embed and optionally chat), and Poppler (for robust PDF text extraction — recommended).

macOS (Homebrew):

brew install ollama poppler
ollama serve &              # or use the menu-bar app
ollama pull nomic-embed-text

Debian / Ubuntu (apt):

sudo apt install poppler-utils
curl -fsSL https://ollama.com/install.sh | sh
ollama pull nomic-embed-text

Fedora (dnf):

sudo dnf install poppler-utils
curl -fsSL https://ollama.com/install.sh | sh
ollama pull nomic-embed-text

Windows (Chocolatey):

choco install poppler ollama   # or install each manually
ollama pull nomic-embed-text

Rust (stable, edition 2024) is needed only to build from source. If you install a pre-built UCP binary, skip the Rust install.

Poppler is optional but recommended. Without it, UCP uses only the bundled `pdf-extract` for PDFs, which struggles with PDFs whose body fonts lack a ToUnicode CMap (you'll see headings extract but body text go missing). With `pdftotext` from Poppler on `PATH`, UCP falls back to it automatically.
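The fallback behavior can be pictured roughly as follows. This is a hedged Python sketch, not UCP's code: the source does not say how UCP decides that the primary extraction failed, so the `min_chars` threshold is an assumption, as are the `extract_pdf_text` name and the injectable `primary`, `which`, and `run` parameters. The `pdftotext <file> -` invocation (write text to stdout) and the `-layout` flag are real Poppler options.

```python
import shutil
import subprocess


def extract_pdf_text(path, primary, min_chars=200,
                     which=shutil.which, run=subprocess.run):
    """Try the primary extractor, then fall back to Poppler's pdftotext if available."""
    text = primary(path)
    # Assumption: a near-empty result means the body text was lost.
    if len(text.strip()) >= min_chars:
        return text
    exe = which("pdftotext")
    if exe is None:
        # Poppler is not installed; keep whatever the primary extractor produced.
        return text
    result = run([exe, "-layout", path, "-"],
                 capture_output=True, text=True, check=False)
    if result.returncode == 0 and result.stdout.strip():
        return result.stdout
    return text
```

Passing `which` and `run` as parameters keeps the function testable without Poppler installed; in normal use the defaults resolve `pdftotext` from `PATH`.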

Note on the name. The crate is published as `ucp-local` on crates.io; the bare `ucp` name was taken. The binary on your `PATH` is also `ucp-local` (that's what you type on the command line), and the library is imported as `use ucp_local::...`.

From crates.io:

cargo install ucp-local

From source:

git clone <repo-url> ucp-local
cd ucp-local
cargo build --release
cargo install --path .   # optional, to put `ucp-local` on your PATH

ucp-local index ~/Documents/notes

ucp-local index ~/Documents/notes ~/code/my-project ~/research

ucp-local watch ~/code/my-project

ucp-local clear

ucp-local clear ~/Documents/notes

ucp-local clear --hard --yes

ucp-local ingest-conversations ~/Downloads/claude-export/conversations.json

ucp-local status

ucp-local serve

ucp-local search "your query here"
ucp-local search "rate limiting" --folder ~/code/my-project --limit 10

ucp-local ask "what does the rate limiter do when a token bucket runs out?"
ucp-local ask "summarize my Q3 plan" --model qwen2.5

UCP speaks MCP over stdio, so any client that launches MCP servers can use it. Same `serve` command, different config file per client.

Add to `~/Library/Application Support/Claude/claude_desktop_config.json` on macOS (`%APPDATA%\Claude\claude_desktop_config.json` on Windows):

{
  "mcpServers": {
    "ucp-local": {
      "command": "/full/path/to/ucp-local",
      "args": ["serve"]
    }
  }
}
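If the config file already lists other MCP servers, the entry above has to be merged in rather than pasted over them. Here is a small Python sketch of that merge, assuming only the `mcpServers` layout shown above; the `add_mcp_server` helper is hypothetical, and the demo writes to a temporary file instead of a real client config.

```python
import json
import tempfile
from pathlib import Path


def add_mcp_server(config_path, name, command, args):
    """Add or update one server entry, leaving other mcpServers entries untouched."""
    path = Path(config_path)
    raw = path.read_text(encoding="utf-8") if path.exists() else ""
    config = json.loads(raw) if raw.strip() else {}
    config.setdefault("mcpServers", {})[name] = {
        "command": command,
        "args": list(args),
    }
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config


# Demo against a throwaway file that already contains another server.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "claude_desktop_config.json"
    demo.write_text('{"mcpServers": {"other": {"command": "x", "args": []}}}',
                    encoding="utf-8")
    result = add_mcp_server(demo, "ucp-local", "/full/path/to/ucp-local", ["serve"])
```

Replace `/full/path/to/ucp-local` with the absolute path of your installed binary, as in the JSON above.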

Restart Claude Desktop. The `search_local_context` tool will be available: ask something grounded in your indexed files and it'll cite them inline.

Cursor reads MCP servers from `~/.cursor/mcp.json` (or per-project `.cursor/mcp.json`):

{
  "mcpServers": {
    "ucp-local": {
      "command": "/full/path/to/ucp-local",
      "args": ["serve"]
    }
  }
}

Reload Cursor. The chat sidebar will surface `search_local_context` as a tool, useful for grounding the agent in repos and docs that Cursor's own `@codebase` indexer can't reach (private notes, conversation history, sibling repos).

LM Studio 0.3.17+ supports MCP. Open the chat settings, find the MCP servers section, and add:

{
  "mcpServers": {
    "ucp-local": {
      "command": "/full/path/to/ucp-local",
      "args": ["serve"]
    }
  }
}

Pair UCP with any local model you've downloaded in LM Studio (Llama, Qwen, Mistral, etc.). Now your indexing, embeddings, retrieval, and chat model all run on the same machine, with no cloud and no network, and the LLM can still call `search_local_context` to ground its answers in your files.

Any client following the MCP spec (Zed, Continue.dev, Goose, custom Agent SDK apps, etc.) takes the same `command` / `args` shape. If your client expects a JSON-RPC stdio server, point it at `ucp-local serve` and you're done.
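For a custom client, it may help to see what goes over stdin. The sketch below builds the three newline-delimited JSON-RPC 2.0 messages an MCP client typically sends: the `initialize` request, the `notifications/initialized` notification, and a `tools/call` request. Those method names come from the MCP specification; the `query` argument name, the client name, and the `jsonrpc` helper are assumptions, since the source does not document the tool's input schema. A real client would read that schema from `tools/list`.

```python
import json


def jsonrpc(method, params=None, msg_id=None):
    """Serialize one JSON-RPC 2.0 message; omit msg_id to send a notification."""
    message = {"jsonrpc": "2.0", "method": method}
    if params is not None:
        message["params"] = params
    if msg_id is not None:
        message["id"] = msg_id
    return json.dumps(message)


handshake = jsonrpc("initialize", {
    "protocolVersion": "2024-11-05",
    "capabilities": {},
    "clientInfo": {"name": "example-client", "version": "0.0.1"},  # hypothetical
}, msg_id=1)
initialized = jsonrpc("notifications/initialized")
call = jsonrpc("tools/call", {
    "name": "search_local_context",
    "arguments": {"query": "rate limiting"},  # argument name is an assumption
}, msg_id=2)

# MCP's stdio transport sends one JSON message per line.
stdin_payload = "\n".join([handshake, initialized, call]) + "\n"
```

A client would write `stdin_payload` to the stdin of a spawned `ucp-local serve` process and read the matching responses, one JSON object per line, from its stdout.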

`~/.config/ucp/config.toml` (or the platform equivalent; `ucp-local status` prints the resolved path). All fields optional; defaults shown:

[ollama]
host = "http://localhost:11434"
embedding_model = "nomic-embed-text"

[chunking]
max_tokens = 512
overlap_sentences = 1
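To make the two `[chunking]` settings concrete, here is a rough Python sketch of sentence-bounded chunking with overlap. It is an illustration under stated assumptions, not UCP's chunker: tokens are approximated by whitespace-separated words, sentences are split on `.`, `!`, and `?`, and the `chunk_prose` name is hypothetical.

```python
import re


def n_tokens(sentences):
    """Approximate token count: whitespace-separated words (an assumption)."""
    return sum(len(s.split()) for s in sentences)


def chunk_prose(text, max_tokens=512, overlap_sentences=1):
    """Pack whole sentences into chunks, repeating the last few in the next chunk."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks, current = [], []
    for sentence in sentences:
        if current and n_tokens(current + [sentence]) > max_tokens:
            chunks.append(" ".join(current))
            tail = current[-overlap_sentences:] if overlap_sentences > 0 else []
            # Drop the overlap if it would push the next chunk over budget.
            current = tail if n_tokens(tail + [sentence]) <= max_tokens else []
        current.append(sentence)
    if current:
        chunks.append(" ".join(current))
    return chunks


sample = "One two three. Four five six. Seven eight nine. Ten eleven twelve."
chunks = chunk_prose(sample, max_tokens=6, overlap_sentences=1)
```

With a budget of six words, each chunk holds two sentences and begins with the last sentence of the previous chunk. That repetition is what `overlap_sentences = 1` buys: a query matching text near a chunk boundary still retrieves its surrounding context.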

By extension: `md`, `markdown`, `txt`, `rs`, `py`, `ts`, `tsx`, `js`, `jsx`, `mjs`, `go`, `pdf`.

PDFs: text is extracted via `pdf-extract` and chunked as prose. Works well for digitally generated PDFs (papers, docs, exported notes). Falls down on scanned image-only PDFs, which need OCR (v0.2+). Citation line numbers reference the extracted plaintext, not PDF page numbers; page-aware citations are on the v0.2 list.

Skipped directories: `.git`, `.idea`, `.vscode`, `target`, `node_modules`, `__pycache__`, `.venv`, `venv`, `dist`, `build`, `.next`, `.nuxt`, `coverage`, `.pytest_cache`, `.mypy_cache`. Dotfiles are skipped.
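Taken together, the extension list and the skip rules describe a filtered directory walk. A minimal Python sketch of such a walk follows; it mirrors the rules stated here but is not UCP's indexer, and the `indexable_files` name is hypothetical.

```python
import os

SKIP_DIRS = {
    ".git", ".idea", ".vscode", "target", "node_modules", "__pycache__",
    ".venv", "venv", "dist", "build", ".next", ".nuxt", "coverage",
    ".pytest_cache", ".mypy_cache",
}
EXTENSIONS = {
    "md", "markdown", "txt", "rs", "py", "ts", "tsx", "js", "jsx",
    "mjs", "go", "pdf",
}


def indexable_files(root):
    """Yield paths under root that pass the directory, dotfile, and extension rules."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Pruning dirnames in place stops os.walk from descending into them.
        dirnames[:] = sorted(d for d in dirnames if d not in SKIP_DIRS)
        for name in sorted(filenames):
            if name.startswith("."):
                continue  # dotfiles are skipped
            ext = name.rsplit(".", 1)[-1].lower() if "." in name else ""
            if ext in EXTENSIONS:
                yield os.path.join(dirpath, name)
```

Pruning directories during the walk, rather than filtering paths afterwards, means large trees such as `node_modules` are never traversed at all.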

| Module | Role |
| --- | --- |
| `ingestion` | Masking + per-format chunkers (prose / markdown / code via tree-sitter) + dispatcher |
| `storage` | `rusqlite` + `sqlite-vec` + FTS5; hybrid search via RRF |
| `embeddings` | `OllamaClient` + content-hash cache via `EmbeddingCache::hash` |
| `indexer` | Walk + read + chunk + embed + insert; single-file and bulk-chunk paths |
| `watcher` | `notify`-based debounced re-index |
| `mcp` | JSON-RPC 2.0 stdio server, one tool: `search_local_context` |
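The content-hash cache in the `embeddings` module is what lets re-indexing skip Ollama for unchanged text. The idea can be sketched in a few lines of Python. This is a stand-in, not the Rust `EmbeddingCache`: the `HashKeyedCache` class, the choice of SHA-256, the in-memory dict, and the fake embedding function are all assumptions made for illustration.

```python
import hashlib


class HashKeyedCache:
    """Cache embeddings under a hash of the chunk text (illustrative stand-in)."""

    def __init__(self, embed_fn):
        self._embed_fn = embed_fn
        self._vectors = {}
        self.misses = 0  # number of times the embedder was actually called

    @staticmethod
    def key(text):
        # Assumption: SHA-256 of the UTF-8 bytes; the real hash may differ.
        return hashlib.sha256(text.encode("utf-8")).hexdigest()

    def get(self, text):
        k = self.key(text)
        if k not in self._vectors:
            self.misses += 1
            self._vectors[k] = self._embed_fn(text)
        return self._vectors[k]


# Fake embedder standing in for a call to Ollama.
cache = HashKeyedCache(lambda text: [float(len(text))])
first = cache.get("fn main() {}")
second = cache.get("fn main() {}")   # unchanged content: served from the cache
third = cache.get("fn main() { run(); }")
```

Keying on content rather than on file path or modification time means a file that is touched, moved, or re-saved without changes costs no new embedding calls.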

See CLAUDE.md for the developer-facing architecture summary, and Universal Context Pipeline Specification.md for the original (now narrower in scope) design doc.

cargo test                    # full test suite
cargo test --lib ingestion    # one module
cargo run -- index <path>     # iterate against the dev build
RUST_LOG=ucp_local=info cargo run -- watch <path>   # verbose

Release history and notes live in CHANGELOG.md. The current published version is 0.1.0 (crates.io).

Licensed under Apache-2.0.

