{"slug": "ucp-local-offline-rag-for-claude-desktop-cursor-and-lm-studio", "title": "Ucp-Local – Offline RAG for Claude Desktop, Cursor, and LM Studio", "summary": "Ucp-Local, an open-source offline RAG server, launched as a single-binary MCP tool that indexes local files for use with Claude Desktop, Cursor, and LM Studio. The tool provides hybrid retrieval, tree-sitter code chunking, and full offline operation via Ollama, targeting privacy-sensitive workflows and air-gapped environments.", "body_md": "A local-first MCP server that grounds LLMs in your own files.\n\nUCP indexes folders on your machine — notes, code, conversation exports — and exposes them to any MCP-compatible client (Claude Desktop, Cursor, LM Studio, and other local-agent runtimes) as a single tool: `search_local_context`. Hybrid retrieval (BM25 + vector), tree-sitter-aware code chunking, full citations, content-hash embedding cache. Single binary. No telemetry. No cloud.\n\nPaired with a local model in LM Studio (or Ollama via `ucp-local ask`), the whole stack — indexing, embeddings, retrieval, and the chat model — runs fully offline. Works on a plane, in an air-gapped facility, or anywhere a cloud LLM isn't an option.\n\n**Conversation memory — make every past Claude chat searchable across every future session.**\n\n**Air-gap RAG — local Ollama + local index, zero network traffic.**\n\n**Quick start — install, index, ask, in under a minute.**\n\n| If you are… | UCP gives you… |\n|---|---|\n| A Claude / Cursor / LM Studio power user | A searchable archive of every past AI conversation, callable from any future session as the `search_local_context` tool. |\n| A software engineer | Code + private docs + sibling repos + past Claude chats unified under one MCP tool — surfaced inside Cursor or Claude Code alongside their native indexers. 
|\n| A researcher, writer, or academic | A PDF + notes corpus you can ask grounded questions against, with line-level citations, without anything leaving the machine. |\n| In a privacy-regulated workflow (legal, medical, defense, NDA-bound IP) | A single Rust binary with zero telemetry and zero cloud. Pair with LM Studio for a fully offline, end-to-end RAG stack. |\n| A solo founder or consultant | Per-folder client isolation via `folder_filter` — no risk of leaking client A's context into client B's session. |\n\nFull audience analysis, competitive comparison, and the two wedges UCP is explicitly built to win on: see [POSITIONING.md](/akshay2211/universal-context-pipeline/blob/master/POSITIONING.md).\n\nv0.1, headless. Track scope in [ROADMAP.md](/akshay2211/universal-context-pipeline/blob/master/ROADMAP.md).\n\nWhat ships:\n\n- Hybrid search: SQLite FTS5 (BM25) ⨉ `sqlite-vec` (ANN) merged via reciprocal-rank fusion.\n- Tree-sitter chunking for Rust, Python, TypeScript/JavaScript. Heading-aware Markdown. Sentence-bounded prose fallback.\n- Conversation memory: ingest your Claude `conversations.json` export and search across past chats.\n- PII masking on by default — email, OpenAI `sk-`, AWS keys, GitHub PATs, JWT. 
- Content-hash embedding cache: re-indexing unchanged content makes zero Ollama calls.\n- Filesystem watcher: edit a file, the index updates in ~500ms.\n\nWhat's not in v0.1:\n\n- Desktop UI / tray (deferred — was in original spec, now in ROADMAP tier 2+).\n- OS hotkey injector and HTTP proxy interceptor (cut from the original spec).\n- OpenAI / Anthropic embedding providers (Ollama only for now).\n- Cursor and ChatGPT export formats (Claude only; others later).\n\nUCP needs three things on your machine: Rust (to build), Ollama (to embed and optionally chat), and Poppler (for robust PDF text extraction — recommended).\n\n```\n# macOS (Homebrew)\nbrew install ollama poppler\nollama serve &              # or use the menu-bar app\nollama pull nomic-embed-text\n# Optional, for `ucp-local ask`:  ollama pull llama3.2\n\n# Debian/Ubuntu (apt)\nsudo apt install poppler-utils\ncurl -fsSL https://ollama.com/install.sh | sh\nollama pull nomic-embed-text\n# Optional, for `ucp-local ask`:  ollama pull llama3.2\n\n# Fedora (dnf)\nsudo dnf install poppler-utils\ncurl -fsSL https://ollama.com/install.sh | sh\nollama pull nomic-embed-text\n\n# Windows (Chocolatey)\nchoco install poppler ollama   # or install each manually\nollama pull nomic-embed-text\n```\n\nRust (stable, edition 2024) is needed only to build from source. If you install a pre-built UCP binary, skip the Rust install.\n\nPoppler is optional but recommended. Without it, UCP uses only the bundled `pdf-extract` for PDFs, which struggles with PDFs whose body fonts lack a ToUnicode CMap (you'll see headings extract but body text go missing). With `pdftotext` from Poppler on PATH, UCP falls back to it automatically.\n\nNote on the name. The crate is published as `ucp-local` on crates.io — the bare `ucp` name was taken. 
The binary on your `PATH` is also `ucp-local` (that's what you type on the command line), and the library is imported as `use ucp_local::...`.\n\n```\n# Install from crates.io\ncargo install ucp-local\n# Puts the `ucp-local` binary on your PATH\n\n# Or build from source\ngit clone <repo-url> ucp-local\ncd ucp-local\ncargo build --release\n# Binary at target/release/ucp-local\ncargo install --path .   # optional, to put `ucp-local` on your PATH\n\n# Index one folder\nucp-local index ~/Documents/notes\n\n# Index multiple folders into the same store\nucp-local index ~/Documents/notes ~/code/my-project ~/research\n\n# Watch a folder and re-index on changes (initial pass runs first)\nucp-local watch ~/code/my-project\n\n# Clear the index — soft (keeps the embedding cache so re-index is fast)\nucp-local clear\n\n# Clear only one folder's chunks\nucp-local clear ~/Documents/notes\n\n# Hard reset — also wipes the embedding cache, forces re-embed on next index\nucp-local clear --hard --yes\n\n# Ingest a Claude conversations.json export\nucp-local ingest-conversations ~/Downloads/claude-export/conversations.json\n\n# Show config + index status\nucp-local status\n\n# Run the MCP server over stdio (this is what MCP clients launch)\nucp-local serve\n\n# Search the index from the terminal (no LLM) — best for debugging \"did indexing actually capture this?\"\nucp-local search \"your query here\"\nucp-local search \"rate limiting\" --folder ~/code/my-project --limit 10\n\n# Ask a question — runs search internally, then a local chat model answers with citations\nucp-local ask \"what does the rate limiter do when a token bucket runs out?\"\nucp-local ask \"summarize my Q3 plan\" --model qwen2.5\n```\n\nUCP speaks MCP over stdio, so any client that launches MCP servers can use it. 
Same `serve` command, different config file per client.\n\nAdd to `~/Library/Application Support/Claude/claude_desktop_config.json` on macOS (`%APPDATA%\\Claude\\claude_desktop_config.json` on Windows):\n\n```\n{\n  \"mcpServers\": {\n    \"ucp-local\": {\n      \"command\": \"/full/path/to/ucp-local\",\n      \"args\": [\"serve\"]\n    }\n  }\n}\n```\n\nRestart Claude Desktop. The `search_local_context` tool will be available — ask something grounded in your indexed files and it'll cite them inline.\n\nCursor reads MCP servers from `~/.cursor/mcp.json` (or per-project `.cursor/mcp.json`):\n\n```\n{\n  \"mcpServers\": {\n    \"ucp-local\": {\n      \"command\": \"/full/path/to/ucp-local\",\n      \"args\": [\"serve\"]\n    }\n  }\n}\n```\n\nReload Cursor. The chat sidebar will surface `search_local_context` as a tool — useful for grounding the agent in repos and docs Cursor's own `@codebase` indexer can't reach (private notes, conversation history, sibling repos).\n\nLM Studio 0.3.17+ supports MCP. Open the chat settings, find the **MCP servers** section, and add:\n\n```\n{\n  \"mcpServers\": {\n    \"ucp-local\": {\n      \"command\": \"/full/path/to/ucp-local\",\n      \"args\": [\"serve\"]\n    }\n  }\n}\n```\n\nPair UCP with any local model you've downloaded in LM Studio (Llama, Qwen, Mistral, etc.). Now your indexing, embeddings, retrieval, and chat model all run on the same machine — no cloud, no network — and the LLM can still call `search_local_context` to ground its answers in your files.\n\nAny client following the MCP spec (Zed, Continue.dev, Goose, custom Agent SDK apps, etc.) takes the same `command` + `args` shape. If your client expects a JSON-RPC stdio server, point it at `ucp-local serve` and you're done.\n\n`~/.config/ucp/config.toml` (or the platform equivalent — `ucp-local status` prints the resolved path). 
All fields optional; defaults shown:\n\n```\n[ollama]\nhost = \"http://localhost:11434\"\nembedding_model = \"nomic-embed-text\"\n\n[chunking]\nmax_tokens = 512\noverlap_sentences = 1\n```\n\nBy extension: `md`, `markdown`, `txt`, `rs`, `py`, `ts`, `tsx`, `js`, `jsx`, `mjs`, `go`, `pdf`.\n\nPDFs: text is extracted via `pdf-extract` and chunked as prose. Works well for digitally generated PDFs (papers, docs, exported notes). Falls down on scanned image-only PDFs — those need OCR (v0.2+). Citation line numbers reference the extracted plaintext, not PDF page numbers; page-aware citations are on the v0.2 list.\n\nSkipped directories: `.git`, `.idea`, `.vscode`, `target`, `node_modules`, `__pycache__`, `.venv`, `venv`, `dist`, `build`, `.next`, `.nuxt`, `coverage`, `.pytest_cache`, `.mypy_cache`. Dotfiles are skipped.\n\n| Module | Role |\n|---|---|\n| `ingestion` | Masking + per-format chunkers (prose / markdown / code via tree-sitter) + dispatcher |\n| `storage` | `rusqlite` + `sqlite-vec` + FTS5; hybrid search via RRF |\n| `embeddings` | `OllamaClient` + content-hash cache via `EmbeddingCache::hash` |\n| `indexer` | Walk + read + chunk + embed + insert; single-file and bulk-chunk paths |\n| `watcher` | `notify`-based debounced re-index |\n| `mcp` | JSON-RPC 2.0 stdio server, one tool: `search_local_context` |\n\nSee [CLAUDE.md](/akshay2211/universal-context-pipeline/blob/master/CLAUDE.md) for the developer-facing architecture summary, and [Universal Context Pipeline Specification.md](/akshay2211/universal-context-pipeline/blob/master/Universal%20Context%20Pipeline%20Specification.md) for the original (now narrower in scope) design doc.\n\n```\ncargo test                    # full test suite\ncargo test --lib ingestion    # one module\ncargo run -- index <path>     # iterate against the dev build\nRUST_LOG=ucp_local=info cargo run -- watch <path>   # 
verbose\n```\n\nRelease history and notes live in [CHANGELOG.md](/akshay2211/universal-context-pipeline/blob/master/CHANGELOG.md). The current published version is **0.1.0** ([crates.io](https://crates.io/crates/ucp-local)).\n\nUnder Apache-2.0.", "url": "https://wpnews.pro/news/ucp-local-offline-rag-for-claude-desktop-cursor-and-lm-studio", "canonical_source": "https://github.com/akshay2211/universal-context-pipeline", "published_at": "2026-06-17 08:08:05+00:00", "updated_at": "2026-06-17 08:23:04.869408+00:00", "lang": "en", "topics": ["ai-tools", "ai-infrastructure", "artificial-intelligence", "large-language-models", "developer-tools"], "entities": ["Ucp-Local", "Claude Desktop", "Cursor", "LM Studio", "Ollama", "Anthropic", "OpenAI"], "alternates": {"html": "https://wpnews.pro/news/ucp-local-offline-rag-for-claude-desktop-cursor-and-lm-studio", "markdown": "https://wpnews.pro/news/ucp-local-offline-rag-for-claude-desktop-cursor-and-lm-studio.md", "text": "https://wpnews.pro/news/ucp-local-offline-rag-for-claude-desktop-cursor-and-lm-studio.txt", "jsonld": "https://wpnews.pro/news/ucp-local-offline-rag-for-claude-desktop-cursor-and-lm-studio.jsonld"}}