Building a Local-Only RAG System with Ollama and TypeScript


Source: dev.to · published 2026-05-25T14:47Z · topic: large-language-models


A developer built a fully local Retrieval-Augmented Generation (RAG) system using Ollama and TypeScript, requiring no API keys or third-party calls. The 200-line command-line tool indexes `.md` and `.txt` files into a SQLite vector store using `sqlite-vec`, then answers natural language questions via local embedding and language models. The system keeps all data on the user's machine, with SQLite outperforming Chroma or Qdrant for collections under a million chunks.

4 min read · published May 25, 2026

Most RAG tutorials send your private documents to OpenAI. Here's how to keep them on your laptop.

This post walks through a complete Retrieval-Augmented Generation pipeline that runs entirely on your machine. No API keys, no third-party calls, no monthly bill. Two hundred lines of TypeScript and a single binary.

A command-line tool that indexes `.md` or `.txt` files into a local vector store and answers natural-language questions about them.

By the end, you'll be able to point it at your engineering wiki, your personal notes, or your codebase, and ask questions in natural language without anything leaving your machine.

Packages mentioned: `@xenova/transformers`, `sqlite-vec`.

Why SQLite over Chroma or Qdrant? For collections under a million chunks, SQLite is faster, simpler to deploy, and doesn't need a daemon. Your "vector database" is one file.

ollama pull nomic-embed-text       # the embedding model
ollama pull qwen2.5:7b             # the answer model
pnpm add better-sqlite3 sqlite-vec
// Chunking and embedding (TypeScript)
import fs from "node:fs";
import path from "node:path";

function chunk(text: string, size = 800, overlap = 100): string[] {
  const sentences = text.split(/(?<=[.!?])\s+/);
  const chunks: string[] = [];
  let buffer = "";
  for (const s of sentences) {
    if ((buffer + " " + s).length > size && buffer) {
      chunks.push(buffer.trim());
      buffer = buffer.slice(-overlap) + " " + s;
    } else {
      buffer = buffer ? buffer + " " + s : s;
    }
  }
  if (buffer) chunks.push(buffer.trim());
  return chunks;
}

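// Embed one piece of text with the local Ollama server.
async function embed(text: string): Promise<number[]> {
  const r = await fetch("http://localhost:11434/api/embeddings", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model: "nomic-embed-text", prompt: text }),
  });
  if (!r.ok) throw new Error(`Ollama embeddings request failed: ${r.status}`);
  const json = (await r.json()) as { embedding: number[] };
  return json.embedding;
}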

`nomic-embed-text` returns 768-dimensional vectors. It is fast enough that you can re-index a thousand-document corpus in a few minutes.
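To see how the sentence-based chunker behaves, the sketch below repeats the `chunk` function so it runs on its own and applies it to a tiny sample. The sample text and the small `size` and `overlap` values are assumptions chosen only to make the overlap visible.

```typescript
// Same logic as the chunk() function above, repeated so this runs standalone.
function chunk(text: string, size = 800, overlap = 100): string[] {
  const sentences = text.split(/(?<=[.!?])\s+/);
  const chunks: string[] = [];
  let buffer = "";
  for (const s of sentences) {
    if ((buffer + " " + s).length > size && buffer) {
      chunks.push(buffer.trim());
      // Carry the last `overlap` characters into the next chunk.
      buffer = buffer.slice(-overlap) + " " + s;
    } else {
      buffer = buffer ? buffer + " " + s : s;
    }
  }
  if (buffer) chunks.push(buffer.trim());
  return chunks;
}

// Illustrative input: four short sentences, chunked at 25 characters.
const sample = "Alpha one. Beta two. Gamma three. Delta four.";
const pieces = chunk(sample, 25, 10);
// pieces[0] is "Alpha one. Beta two."
// pieces[1] is "Beta two. Gamma three."
// pieces[2] is "mma three. Delta four."
```

The third piece starts mid-word because the overlap is counted in characters, not sentences. With the default 800/100 settings this matters less, but it is worth knowing when you tune the parameters.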

import Database from "better-sqlite3";
import * as sqliteVec from "sqlite-vec";

const db = new Database("rag.db");
sqliteVec.load(db);

db.exec(`
  CREATE TABLE IF NOT EXISTS chunks (
    id INTEGER PRIMARY KEY,
    source TEXT NOT NULL,
    content TEXT NOT NULL
  );
  CREATE VIRTUAL TABLE IF NOT EXISTS vec_chunks USING vec0(
    id INTEGER PRIMARY KEY,
    embedding FLOAT[768]
  );
`);

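// Prepare the statements once and reuse them for every chunk.
const insertChunk = db.prepare(
  "INSERT INTO chunks (source, content) VALUES (?, ?)"
);
const insertVec = db.prepare(
  "INSERT INTO vec_chunks (id, embedding) VALUES (?, ?)"
);

async function indexFile(filePath: string) {
  const text = fs.readFileSync(filePath, "utf8");
  for (const piece of chunk(text)) {
    // Embed first so a failed request does not leave a chunk row without a vector.
    const vec = await embed(piece);
    const result = insertChunk.run(filePath, piece);
    // BigInt makes the driver bind the id as an integer, which vec0 primary
    // keys require. The vector itself is passed as JSON text.
    insertVec.run(BigInt(result.lastInsertRowid), JSON.stringify(vec));
  }
}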
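The indexing code passes each embedding to SQLite as JSON text. sqlite-vec also documents a compact binary form, where a vector is the raw bytes of its float32 values. The sketch below shows that packing in plain TypeScript. The helper names `packVector` and `unpackVector` are hypothetical, and whether the binary form is worth it for your corpus is not measured here.

```typescript
// Hypothetical helper: pack a number[] embedding into raw float32 bytes
// (4 bytes per dimension, in the platform's byte order).
function packVector(vec: number[]): Uint8Array {
  const f32 = Float32Array.from(vec);
  return new Uint8Array(f32.buffer, f32.byteOffset, f32.byteLength);
}

// Hypothetical helper: read packed float32 bytes back into a number[].
function unpackVector(bytes: Uint8Array): number[] {
  // Copy first so the float view starts at offset 0 and is 4-byte aligned.
  const copy = bytes.slice();
  return Array.from(new Float32Array(copy.buffer, 0, copy.byteLength / 4));
}

// A made-up 768-dimensional vector, matching the FLOAT[768] column size.
const packed = packVector(new Array<number>(768).fill(0.5));
const restored = unpackVector(packed);
// packed.byteLength is 768 * 4 = 3072
```

Assuming your SQLite driver binds typed arrays as BLOBs, the packed bytes could stand in for `JSON.stringify(vec)` in the insert, at roughly a third to a quarter of the size of the JSON text.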
// Retrieve the k chunks closest to the query embedding.
async function search(query: string, k = 4) {
  const queryVec = await embed(query);
  const rows = db.prepare(`
    SELECT chunks.source, chunks.content, vec_chunks.distance
    FROM vec_chunks
    JOIN chunks ON chunks.id = vec_chunks.id
    WHERE vec_chunks.embedding MATCH ?
    ORDER BY distance
    LIMIT ?
  `).all(JSON.stringify(queryVec), k) as Array<{
    source: string;
    content: string;
    distance: number;
  }>;
  return rows;
}

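`MATCH` triggers sqlite-vec's nearest-neighbour search. Note that a `vec0` column uses L2 (Euclidean) distance unless it is declared with `distance_metric=cosine`, so the schema above ranks by L2 distance rather than cosine similarity. Lookups take under a millisecond on small corpora.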
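The `distance` column returned by the query is a vector distance, where smaller means more similar. As a rough sketch of what such a metric computes, here are L2 and cosine distance in plain TypeScript. The function names and the two-dimensional vectors are made up for illustration; this is not sqlite-vec's implementation.

```typescript
// Dot product of two equal-length vectors.
function dot(a: number[], b: number[]): number {
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += (a[i] ?? 0) * (b[i] ?? 0);
  return sum;
}

// Euclidean (L2) distance: smaller means closer.
function l2Distance(a: number[], b: number[]): number {
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const d = (a[i] ?? 0) - (b[i] ?? 0);
    sum += d * d;
  }
  return Math.sqrt(sum);
}

// Cosine distance = 1 - cosine similarity: smaller means closer.
function cosineDistance(a: number[], b: number[]): number {
  return 1 - dot(a, b) / (Math.sqrt(dot(a, a)) * Math.sqrt(dot(b, b)));
}

// Illustrative unit-length vectors.
const queryVec = [1, 0];
const nearVec = [0.6, 0.8];
const farVec = [0, 1];

const l2Near = l2Distance(queryVec, nearVec);
const l2Far = l2Distance(queryVec, farVec);
const cosNear = cosineDistance(queryVec, nearVec);
const cosFar = cosineDistance(queryVec, farVec);
```

For unit-length vectors the squared L2 distance equals twice the cosine distance, so both metrics put neighbours in the same order. If your embeddings are not normalised, the two can disagree, and the choice of metric starts to matter.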

async function ask(question: string) {
  const matches = await search(question, 4);

  const context = matches
    .map((m, i) => `[${i + 1}] ${m.source}\n${m.content}`)
    .join("\n\n---\n\n");

  const prompt = `Answer the question using only the context provided.
If the answer is not in the context, say so.
Cite sources by their number in square brackets.

CONTEXT:
${context}

QUESTION: ${question}

ANSWER:`;

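  const r = await fetch("http://localhost:11434/v1/chat/completions", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "qwen2.5:7b",
      messages: [{ role: "user", content: prompt }],
      stream: false,
    }),
  });
  if (!r.ok) throw new Error(`Ollama chat request failed: ${r.status}`);
  const json = (await r.json()) as {
    choices: { message: { content: string } }[];
  };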
  return {
    answer: json.choices[0].message.content,
    sources: matches.map((m) => m.source),
  };
}
// Index every .md and .txt file in a folder
const files = fs
  .readdirSync("./notes")
  .filter((f) => /\.(md|txt)$/i.test(f))
  .map((f) => path.join("./notes", f));
for (const f of files) await indexFile(f);

// Ask
const result = await ask("What did we decide about the auth refactor?");
console.log(result.answer);
console.log("Sources:", result.sources);
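The prompt asks the model to cite sources by number in square brackets. A small helper can map those numbers back to file paths, so you print only the sources the answer actually used. This is a sketch: `citedSources` is a hypothetical name, the sample answer and paths are invented, and it assumes the model follows the citation format, which a 7B model may not always do.

```typescript
// Hypothetical helper: collect the sources referenced as [n] in an answer.
// `sources` is the array returned by ask(), in the same order as the prompt.
function citedSources(answer: string, sources: string[]): string[] {
  const cited: string[] = [];
  const pattern = /\[(\d+)\]/g;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(answer)) !== null) {
    // Prompt numbering starts at 1; out-of-range numbers are ignored.
    const source = sources[Number(match[1]) - 1];
    if (source !== undefined && cited.indexOf(source) === -1) {
      cited.push(source);
    }
  }
  return cited;
}

// Invented example data.
const sampleAnswer = "We chose OAuth [2]. Rollout is staged [1][2]. See [7].";
const sampleSources = ["notes/a.md", "notes/b.md"];
const used = citedSources(sampleAnswer, sampleSources);
// used is ["notes/b.md", "notes/a.md"]
```

Sources come back in order of first citation, without duplicates. The `[7]` in the sample is dropped because only two chunks were supplied.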

Total runtime, indexing 500 markdown files: about three minutes on an M2 MacBook. Per-question latency: under two seconds.

If your team's documentation has grown past the point where anyone reads it cover to cover (about a hundred pages), local RAG turns that wiki back into something useful. The same applies to other private document collections, including legal material: every legal-tech startup right now is building a cloud version of this. Yours runs on your laptop.

The previous post in this series covered function calling. Combining function calling with RAG gives you a local agent that can read your documents and take actions. For a request like "draft an email to legal summarising what our MSA says about data residency", the agent reads the MSA chunks, composes a draft, and calls the email tool.

That's a real assistant. And nothing leaves your machine.

Next post: streaming Ollama responses through Server-Sent Events in Next.js, the production pattern for live UIs.


Read original on dev.to → dev.to/pavelespitia/building-a-local-only-rag-sy…
