Building a Local-Only RAG System with Ollama and TypeScript

A developer built a fully local Retrieval-Augmented Generation (RAG) system using Ollama and TypeScript, requiring no API keys or third-party calls. The 200-line command-line tool indexes `.md` and `.txt` files into a SQLite vector store using `sqlite-vec`, then answers natural-language questions via local embedding and language models. The system keeps all data on the user's machine, and the author argues that SQLite beats Chroma or Qdrant for collections under a million chunks.

Most RAG tutorials send your private documents to OpenAI. Here's how to keep them on your laptop.

This post walks through a complete Retrieval-Augmented Generation pipeline that runs entirely on your machine. No API keys, no third-party calls, no monthly bill. Two hundred lines of TypeScript and a single binary.

A command-line tool that:

- Indexes `.md` or `.txt` files into a local vector store.
- Answers natural-language questions using local embedding and language models.

By the end, you'll be able to point it at your engineering wiki, your personal notes, or your codebase, and ask questions in natural language without anything leaving your machine.

- `@xenova/transformers`
- `sqlite-vec`

Why SQLite over Chroma or Qdrant? For collections under a million chunks, SQLite is faster, simpler to deploy, and doesn't need a daemon. Your "vector database" is one file.

```bash
ollama pull nomic-embed-text   # the embedding model
ollama pull qwen2.5:7b         # the answer model
pnpm add better-sqlite3 sqlite-vec
```

```typescript
import fs from "node:fs";
import path from "node:path";

// Split text into chunks of roughly `size` characters, carrying the last
// `overlap` characters of each chunk into the next one for context.
function chunk(text: string, size = 800, overlap = 100): string[] {
  // Break on whitespace that follows sentence-ending punctuation.
  const sentences = text.split(/(?<=[.!?])\s+/);
  const chunks: string[] = [];
  let buffer = "";
  for (const s of sentences) {
    if ((buffer + " " + s).length > size && buffer) {
      // Adding this sentence would exceed the limit: flush the buffer.
      chunks.push(buffer.trim());
      buffer = buffer.slice(-overlap) + " " + s;
    } else {
      buffer = buffer ? buffer + " " + s : s;
    }
  }
  // Flush whatever is left after the last sentence.
  if (buffer) chunks.push(buffer.trim());
  return chunks;
}

async function embed(text: string): Promise
```