Build a Private AI Search on Your Device: Local RAG in the Browser

Utilora has built a fully client-side Retrieval-Augmented Generation (RAG) system called Personal RAG that runs entirely within a web browser, enabling private AI search over local files without any server, API, or cost. The system combines browser-based document parsing, local machine learning embeddings, and the Origin Private File System (OPFS) for vector storage, with Google's Comlink library managing background Web Worker communication. This architecture ensures user data never leaves the device, providing true privacy, offline access, and zero backend infrastructure.

How many times have you wanted to search your private PDFs, notes, or code files using AI, but hesitated?

We all want the power of AI search, but uploading sensitive documents to external servers is a big privacy risk. What if you could build a complete search engine that runs 100% inside your browser? No servers, no APIs, and no cost.

At Utilora (https://utilora.app), we built exactly this. We call it Personal RAG. Here is how we made it work, and how you can do it too.

Retrieval-Augmented Generation (RAG) usually requires a backend database, Python servers, and API keys. To make it run entirely on the client side, we combined three modern browser technologies.

Here is the exact flow of how a document is processed:

Your File ➔ Client Parser ➔ Chunking ➔ Local ML Embedding ➔ OPFS Storage

You cannot store millions of embedding values in normal browser storage like LocalStorage. It is synchronous, which makes it slow, and it is typically capped at about 5 MB.

Instead, we use the Origin Private File System (OPFS). It gives web apps a private, highly optimized filesystem.
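To put rough numbers on that storage limit, here is a back-of-the-envelope sketch. The embedding size (384) and the chunk count (10,000) are assumptions chosen for illustration, not figures from Personal RAG, and `estimateIndexBytes`, `fitsInLocalStorage`, and `packVectors` are hypothetical helper names.

```js
// Assumed numbers for illustration only; the real values depend on
// the embedding model and on how many chunks your files produce.
const DIMENSIONS = 384;      // assumed floats per embedding
const CHUNK_COUNT = 10_000;  // assumed number of indexed chunks

// The roughly 5 MB LocalStorage cap mentioned above.
const LOCAL_STORAGE_LIMIT_BYTES = 5 * 1024 * 1024;

// Raw size of the index if every number is stored as a 32-bit float.
function estimateIndexBytes(chunkCount, dimensions) {
  return chunkCount * dimensions * Float32Array.BYTES_PER_ELEMENT;
}

// Optimistic check: ignores the extra overhead of encoding binary
// data as strings, which LocalStorage would require.
function fitsInLocalStorage(bytes) {
  return bytes <= LOCAL_STORAGE_LIMIT_BYTES;
}

// Pack an array of equal-length vectors into one contiguous buffer,
// a compact layout that a file-based store can hold directly.
function packVectors(vectors, dimensions) {
  const packed = new Float32Array(vectors.length * dimensions);
  vectors.forEach((vector, i) => packed.set(vector, i * dimensions));
  return packed;
}

const indexBytes = estimateIndexBytes(CHUNK_COUNT, DIMENSIONS);
console.log(`${(indexBytes / 1024 / 1024).toFixed(1)} MB`); // prints "14.6 MB"
console.log(fitsInLocalStorage(indexBytes));                // prints false
```

Under these assumptions, even a modest index is about three times larger than the LocalStorage cap, which is why a file-based store is the better fit.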
Here is a simple look at how we write vector indexes to OPFS:

```js
// Access the private root directory
const root = await navigator.storage.getDirectory();

// Create or access our index file
const fileHandle = await root.getFileHandle("vector-index.db", { create: true });

// Open a writable stream to the file
const writable = await fileHandle.createWritable();

// Serialize the index to bytes, write it, and close the stream to commit
await writable.write(new TextEncoder().encode(JSON.stringify(myVectorData)));
await writable.close();
```

We use Comlink by Google to easily communicate with a background Web Worker:

```js
// In your main component
import * as Comlink from "comlink";

// Start the indexer as a module worker and wrap it in a Comlink proxy
const worker = new Worker(new URL("./rag-indexer.worker.ts", import.meta.url), { type: "module" });
const localIndexer = Comlink.wrap(worker);

// Run indexer in the background
await localIndexer.processAndEmbedFile(myUploadedFile);
```

Building with zero backend constraints completely changes how you think about software:

• True Privacy: Privacy is not just a policy written on a page. It is hardcoded into the architecture. Since there is no backend, we cannot see your files even if we wanted to.
• Completely Free: You do not pay for API keys, vector databases, or server hosting. The user's computer does all the work.
• Instant Offline Access: Once the page loads, you can turn off your internet and it still works.

If you want to see this in action, come check it out on Utilora (https://utilora.com), our free, open collection of local web utilities. Drag in a PDF, let it index, and ask questions. Your data never leaves your screen.

Have you built anything using local browser models? Let's chat in the comments below.