
Ucp-Local – Offline RAG for Claude Desktop, Cursor, and LM Studio

Ucp-Local, an open-source offline RAG server, launched as a single-binary MCP tool that indexes local files for use with Claude Desktop, Cursor, and LM Studio. The tool provides hybrid retrieval, tree-sitter code chunking, and full offline operation via Ollama, targeting privacy-sensitive workflows and air-gapped environments.

Published Jun 17, 2026 · 7 min read

A local-first MCP server that grounds LLMs in your own files.

UCP indexes folders on your machine (notes, code, conversation exports) and exposes them to any MCP-compatible client (Claude Desktop, Cursor, LM Studio, and other local-agent runtimes) as a single tool: `search_local_context`. It offers hybrid retrieval (BM25 + vector), tree-sitter-aware code chunking, full citations, and a content-hash embedding cache. Single binary. No telemetry. No cloud.

Paired with a local model in LM Studio (or Ollama via `ucp-local ask`), the whole stack (indexing, embeddings, retrieval, and the chat model) runs fully offline. It works on a plane, in an air-gapped facility, or anywhere a cloud LLM isn't an option.

Conversation memory — make every past Claude chat searchable across every future session.

Air-gap RAG — local Ollama + local index, zero network traffic.

Quick start — install, index, ask, in under a minute.

| If you are… | UCP gives you… |
| --- | --- |
| A Claude / Cursor / LM Studio power user | A searchable archive of every past AI conversation, callable from any future session as the search_local_context tool. |
| A software engineer | Code + private docs + sibling repos + past Claude chats unified under one MCP tool, surfaced inside Cursor or Claude Code alongside their native indexers. |
| A researcher, writer, or academic | A PDF + notes corpus you can ask grounded questions against, with line-level citations, without anything leaving the machine. |
| In a privacy-regulated workflow (legal, medical, defense, NDA-bound IP) | A single Rust binary with zero telemetry and zero cloud. Pair with LM Studio for a fully offline, end-to-end RAG stack. |
| A solo founder or consultant | Per-folder client isolation via folder_filter, with no risk of leaking client A's context into client B's session. |

Full audience analysis, competitive comparison, and the two wedges UCP is explicitly built to win on: see POSITIONING.md.

v0.1, headless. Track scope in ROADMAP.md.

What ships:

  • Hybrid search: SQLite FTS5 (BM25) ⨉ sqlite-vec (ANN) merged via reciprocal-rank fusion.
  • Tree-sitter chunking for Rust, Python, TypeScript/JavaScript. Heading-aware Markdown. Sentence-bounded prose fallback.
  • Conversation memory: ingest your Claude conversations.json export and search across past chats.
  • PII masking on by default: email, OpenAI `sk-`, AWS keys, GitHub PATs, JWT.
  • Content-hash embedding cache: re-indexing unchanged content makes zero Ollama calls.
  • Filesystem watcher: edit a file, the index updates in ~500ms.
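
To make the hybrid-search bullet concrete, here is a minimal sketch of reciprocal-rank fusion over two ranked lists. It is not UCP's code: the function name, the chunk IDs, and the constant `k=60` (a commonly used default, not a documented UCP setting) are all assumptions for illustration.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Merge ranked lists of chunk IDs; k=60 is a conventional constant, not a UCP setting."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank) for every item it contains.
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by chunk ID for determinism.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))

bm25_hits = ["c3", "c1", "c7"]     # hypothetical keyword (BM25) ranking
vector_hits = ["c1", "c9", "c3"]   # hypothetical vector ranking
fused = reciprocal_rank_fusion([bm25_hits, vector_hits])
```

With these inputs `c1` and `c3` rise to the top because both retrievers returned them, which is the point of fusing by rank rather than by raw score: BM25 and vector distances are on different scales and cannot be added directly.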

What's not in v0.1:

  • Desktop UI / tray (deferred — was in original spec, now in ROADMAP tier 2+).
  • OS hotkey injector and HTTP proxy interceptor (cut from the original spec).
  • OpenAI / Anthropic embedding providers (Ollama only for now).
  • Cursor and ChatGPT export formats (Claude only; others later).

UCP needs three things on your machine: Rust (to build), Ollama (to embed and optionally chat), and Poppler (for robust PDF text extraction — recommended).

# macOS (Homebrew)
brew install ollama poppler
ollama serve &              # or use the menu-bar app
ollama pull nomic-embed-text

# Debian / Ubuntu
sudo apt install poppler-utils
curl -fsSL https://ollama.com/install.sh | sh
ollama pull nomic-embed-text

# Fedora
sudo dnf install poppler-utils
curl -fsSL https://ollama.com/install.sh | sh
ollama pull nomic-embed-text

# Windows (Chocolatey)
choco install poppler ollama   # or install each manually
ollama pull nomic-embed-text

Rust (stable, edition 2024) is needed only to build from source. If you install a pre-built UCP binary, skip the Rust install.

Poppler is optional but recommended. Without it, UCP only uses the bundled `pdf-extract` for PDFs, which struggles with PDFs whose body fonts lack a ToUnicode CMap (you'll see headings extract but body text go missing). With `pdftotext` from Poppler on PATH, UCP falls back to it automatically.

Note on the name. The crate is published as `ucp-local` on crates.io; the bare `ucp` name was taken. The binary on your PATH is also `ucp-local` (that's what you type on the command line), and the library is imported as `use ucp_local::...`.

# Install from crates.io
cargo install ucp-local

# Or build from source
git clone <repo-url> ucp-local
cd ucp-local
cargo build --release
cargo install --path .   # optional, to put `ucp-local` on your PATH

ucp-local index ~/Documents/notes

ucp-local index ~/Documents/notes ~/code/my-project ~/research

ucp-local watch ~/code/my-project

ucp-local clear

ucp-local clear ~/Documents/notes

ucp-local clear --hard --yes

ucp-local ingest-conversations ~/Downloads/claude-export/conversations.json

ucp-local status

ucp-local serve

ucp-local search "your query here"
ucp-local search "rate limiting" --folder ~/code/my-project --limit 10

ucp-local ask "what does the rate limiter do when a token bucket runs out?"
ucp-local ask "summarize my Q3 plan" --model qwen2.5

UCP speaks MCP over stdio, so any client that launches MCP servers can use it. Same `serve` command, different config file per client.

For Claude Desktop, add this to `~/Library/Application Support/Claude/claude_desktop_config.json` on macOS (`%APPDATA%\Claude\claude_desktop_config.json` on Windows):

{
  "mcpServers": {
    "ucp-local": {
      "command": "/full/path/to/ucp-local",
      "args": ["serve"]
    }
  }
}

Restart Claude Desktop. The `search_local_context` tool will be available; ask something grounded in your indexed files and it'll cite them inline.
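
If the config file already lists other MCP servers, the entry has to be merged in rather than pasted over them. The sketch below shows one way to do that on the JSON text; the helper name `add_mcp_server` and the `other-tool` entry are hypothetical, and reading or writing the actual file is left out.

```python
import json

def add_mcp_server(config_text, name, command, args):
    """Return config JSON text with one MCP server entry added or replaced."""
    config = json.loads(config_text) if config_text.strip() else {}
    # Keep any servers that are already configured.
    servers = config.setdefault("mcpServers", {})
    servers[name] = {"command": command, "args": list(args)}
    return json.dumps(config, indent=2)

# Hypothetical existing config with one unrelated server.
existing = '{"mcpServers": {"other-tool": {"command": "/usr/bin/other", "args": []}}}'
updated = add_mcp_server(existing, "ucp-local", "/full/path/to/ucp-local", ["serve"])
print(updated)
```

The same merge applies to the Cursor and LM Studio files, since all three use the `mcpServers` shape shown in this article.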

Cursor reads MCP servers from `~/.cursor/mcp.json` (or per-project `.cursor/mcp.json`):

{
  "mcpServers": {
    "ucp-local": {
      "command": "/full/path/to/ucp-local",
      "args": ["serve"]
    }
  }
}

Reload Cursor. The chat sidebar will surface `search_local_context` as a tool, useful for grounding the agent in repos and docs Cursor's own `@codebase` indexer can't reach (private notes, conversation history, sibling repos).

LM Studio 0.3.17+ supports MCP. Open the chat settings, find the MCP servers section, and add:

{
  "mcpServers": {
    "ucp-local": {
      "command": "/full/path/to/ucp-local",
      "args": ["serve"]
    }
  }
}

Pair UCP with any local model you've downloaded in LM Studio (Llama, Qwen, Mistral, etc.). Now your indexing, embeddings, retrieval, and chat model all run on the same machine, with no cloud and no network, and the LLM can still call `search_local_context` to ground its answers in your files.

Any client following the MCP spec (Zed, Continue.dev, Goose, custom Agent SDK apps, etc.) takes the same `command` + `args` shape. If your client expects a JSON-RPC stdio server, point it at `ucp-local serve` and you're done.
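
For readers writing their own client, this is roughly what a tool invocation looks like on the wire. MCP's stdio transport carries newline-delimited JSON-RPC 2.0 messages, and tools are invoked with the `tools/call` method. The `query` argument name below is an assumption (the article does not document the tool's input schema), and a real client must complete the MCP `initialize` handshake before sending this.

```python
import json

def tools_call_request(request_id, tool_name, arguments):
    """Build one newline-terminated JSON-RPC 2.0 message for MCP's tools/call method."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    # json.dumps emits no raw newlines, so the message stays on a single line.
    return json.dumps(message) + "\n"

# "query" is a hypothetical argument name; check the tool's schema via tools/list.
line = tools_call_request(1, "search_local_context", {"query": "rate limiting"})
```

A client would write `line` to the stdin of the `ucp-local serve` process and read the matching response, identified by the same `id`, from its stdout.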

Configuration lives in `~/.config/ucp/config.toml` (or the platform equivalent; `ucp-local status` prints the resolved path). All fields are optional; defaults shown:

[ollama]
host = "http://localhost:11434"
embedding_model = "nomic-embed-text"

[chunking]
max_tokens = 512
overlap_sentences = 1
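
To show how `max_tokens` and `overlap_sentences` could interact, here is a rough sketch of sentence-bounded chunking with overlap. It is an illustration under stated assumptions, not UCP's chunker: whitespace-separated words stand in for tokens, sentences are split with a simple regex, and the function name is made up.

```python
import re

def chunk_prose(text, max_tokens=512, overlap_sentences=1):
    """Greedy sentence-bounded chunking; whitespace words stand in for tokens."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks, current = [], []
    for sentence in sentences:
        size = sum(len(s.split()) for s in current) + len(sentence.split())
        if current and size > max_tokens:
            chunks.append(" ".join(current))
            # Carry the last N sentences into the next chunk as overlap.
            current = current[-overlap_sentences:] if overlap_sentences > 0 else []
        current.append(sentence)
    if current:
        chunks.append(" ".join(current))
    return chunks

sample = "One two three. Four five six. Seven eight nine. Ten eleven twelve."
chunks = chunk_prose(sample, max_tokens=6, overlap_sentences=1)
```

With a budget of six words, each chunk holds two sentences and repeats the last sentence of the previous chunk, so a query that straddles a chunk boundary still finds its context in one piece.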

Supported file types, by extension: `md`, `markdown`, `txt`, `rs`, `py`, `ts`, `tsx`, `js`, `jsx`, `mjs`, `go`, `pdf`.

PDFs: text is extracted via `pdf-extract` and chunked as prose. This works well for digitally generated PDFs (papers, docs, exported notes) but falls down on scanned image-only PDFs, which need OCR (v0.2+). Citation line numbers reference the extracted plaintext, not PDF page numbers; page-aware citations are on the v0.2 list.

Skipped directories: `.git`, `.idea`, `.vscode`, `target`, `node_modules`, `__pycache__`, `.venv`, `venv`, `dist`, `build`, `.next`, `.nuxt`, `coverage`, `.pytest_cache`, `.mypy_cache`. Dotfiles are skipped.
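
The extension and skip rules above amount to a filtered directory walk. The sketch below applies them with Python's `os.walk`; it only illustrates the rules as listed, and details such as exact-case extension matching are assumptions rather than confirmed UCP behavior.

```python
import os
import tempfile

EXTENSIONS = {"md", "markdown", "txt", "rs", "py", "ts", "tsx", "js", "jsx", "mjs", "go", "pdf"}
SKIP_DIRS = {".git", ".idea", ".vscode", "target", "node_modules", "__pycache__", ".venv",
             "venv", "dist", "build", ".next", ".nuxt", "coverage", ".pytest_cache", ".mypy_cache"}

def indexable_files(root):
    """Return sorted relative paths under root that pass the extension and skip rules."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk never descends into skipped directories.
        dirnames[:] = [d for d in dirnames if d not in SKIP_DIRS]
        for name in filenames:
            if name.startswith("."):
                continue  # dotfiles are skipped
            ext = name.rsplit(".", 1)[-1] if "." in name else ""
            if ext in EXTENSIONS:
                found.append(os.path.relpath(os.path.join(dirpath, name), root))
    return sorted(found)

# Build a throwaway tree to exercise the rules.
with tempfile.TemporaryDirectory() as root:
    for rel in ["notes.md", ".env", "src/main.rs", "node_modules/pkg/index.js", "logo.png"]:
        path = os.path.join(root, rel)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        open(path, "w").close()
    files = indexable_files(root)
```

Only `notes.md` and `src/main.rs` survive: `.env` is a dotfile, `logo.png` has an unsupported extension, and everything under `node_modules` is pruned before it is visited.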

| Module | Role |
| --- | --- |
| ingestion | Masking + per-format chunkers (prose / markdown / code via tree-sitter) + dispatcher |
| storage | rusqlite + sqlite-vec + FTS5; hybrid search via RRF |
| embeddings | OllamaClient + content-hash cache via EmbeddingCache::hash |
| indexer | Walk + read + chunk + embed + insert; single-file and bulk-chunk paths |
| watcher | `notify`-based debounced re-index |
| mcp | JSON-RPC 2.0 stdio server, one tool: search_local_context |
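
As a rough picture of the masking step in the ingestion module, the sketch below replaces a few well-known secret shapes with labels before text is indexed. The patterns are illustrative assumptions (for example `AKIA` plus 16 characters for AWS access key IDs and `ghp_` plus 36 characters for classic GitHub PATs); UCP's actual rules are not described in this article and may differ.

```python
import re

# Illustrative patterns only; UCP's real masking rules may differ.
PATTERNS = [
    ("JWT", re.compile(r"\beyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+")),
    ("EMAIL", re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")),
    ("OPENAI_KEY", re.compile(r"\bsk-[A-Za-z0-9_-]{20,}")),
    ("AWS_KEY", re.compile(r"\bAKIA[0-9A-Z]{16}\b")),
    ("GITHUB_PAT", re.compile(r"\bghp_[A-Za-z0-9]{36}\b")),
]

def mask_pii(text):
    """Replace every match of each pattern with a bracketed label."""
    for label, pattern in PATTERNS:
        text = pattern.sub(f"[{label}]", text)
    return text

# Made-up sample values, not real credentials.
masked = mask_pii("mail bob@example.com, key AKIAABCDEFGHIJKLMNOP")
```

Masking before chunking and embedding means the secret never reaches the index or the embedding model, so it cannot resurface in a later search result.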

See CLAUDE.md for the developer-facing architecture summary, and Universal Context Pipeline Specification.md for the original (now narrower in scope) design doc.
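
The content-hash cache in the embeddings module can be pictured as a map from a hash of the chunk text to its vector. This is a minimal in-memory sketch, not the real `EmbeddingCache`: SHA-256 as the hash, the class name, and the `fake_embed` stand-in for an Ollama call are all assumptions.

```python
import hashlib

class EmbeddingCacheSketch:
    """Maps SHA-256 of chunk text to its embedding; embed_fn is called only on a miss."""

    def __init__(self, embed_fn):
        self.embed_fn = embed_fn
        self.store = {}
        self.misses = 0

    def get(self, text):
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if key not in self.store:
            self.misses += 1
            self.store[key] = self.embed_fn(text)
        return self.store[key]

def fake_embed(text):
    # Stand-in for an Ollama embedding call; returns a tiny deterministic vector.
    return [float(len(text)), float(text.count(" "))]

cache = EmbeddingCacheSketch(fake_embed)
first = cache.get("fn main() {}")
second = cache.get("fn main() {}")   # unchanged content: served from the cache
```

Because the key depends only on content, renaming or re-indexing a file whose chunks are unchanged triggers no new embedding calls, which matches the "zero Ollama calls" behavior described earlier.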

cargo test                    # full test suite
cargo test --lib ingestion    # one module
cargo run -- index <path>     # iterate against the dev build
RUST_LOG=ucp_local=info cargo run -- watch <path>   # verbose

Release history and notes live in CHANGELOG.md. The current published version is 0.1.0 (crates.io).

Licensed under Apache-2.0.
