{"slug": "building-a-private-rag-system-lessons-from-a-local-first-ai-journal", "title": "Building a Private RAG System: Lessons from a Local-First AI Journal", "summary": "The article details the technical architecture of DiaryGPT, a private, local-first AI journaling application that processes user data entirely on-device by default using Ollama for embeddings and language models. It explains how the system uses Retrieval-Augmented Generation (RAG) to perform semantic search on encrypted diary entries stored in SQLite, retrieving only the most relevant excerpts for AI analysis without sending the full diary to external services. Additionally, the companion mode features a hardcoded crisis detection system that bypasses the LLM entirely to provide reliable emergency resources.", "body_md": "*Most AI apps quietly send your data to the cloud. DiaryGPT does the opposite — and this is the full technical story.*\n\n## The Problem With AI + Private Data\n\nWhen you write in a journal, you write the things you'd never say out loud. The last thing you want is that text sitting on someone else's server, used to train a model, or exposed in a breach.\n\nBut AI is genuinely useful for journaling. It can find patterns you miss, reflect things back to you, and ask questions a blank page never would. The tension is real: **you want AI insight without sacrificing privacy.**\n\nMost apps solve this by asking you to trust a privacy policy. 
I wanted a technical guarantee.\n\nSo I built DiaryGPT — an AI-powered personal journal where, by default, **zero data leaves your machine.** Here's exactly how it works.\n\n## What DiaryGPT Does\n\nBefore the architecture, here's what the app gives you:\n\n- **AI mood analysis** on every entry: mood, themes, a reflective response, and a follow-up question\n- **RAG-powered chat**: ask \"when was I most anxious?\" and get answers grounded in your actual entries\n- **Semantic search**: find entries by meaning, not keywords (\"times I felt lonely\" finds entries with \"isolated\", \"disconnected\", \"blue\")\n- **Weekly reflection**: an AI summary of your emotional arc across the week\n- **Personalized journaling prompts**: generated from your recent writing patterns\n- **Writing streaks and memories**: \"on this day last year you wrote…\"\n- **AI companion mode**: CBT/DBT-grounded reflection with built-in crisis detection (not a replacement for a licensed therapist)\n- **Mood check-ins**: 1–10 logging with a history chart\n- **Voice dictation and voice chat**: speak entries, hear responses read back\n- **Full AES-256-GCM encryption** at rest: every diary entry, chat message, and note\n\n## The Privacy Architecture\n\nDiaryGPT has two modes, selectable in Settings.\n\n### 🟢 Local Mode (Default)\n\nEverything runs on your machine. The AI model, the search, the analysis — all local via [Ollama](https://ollama.com/).\n\n```\nYour diary entry\n      ↓\nOllama (nomic-embed-text) → converts to numbers → saved in SQLite\n      ↓\nOllama (llama3.2 / qwen2.5) → analyzes mood → saved encrypted\n\nZero data leaves your machine.\n```\n\n### 🟡 Cloud Mode (Opt-in)\n\nCloud mode is for users who want higher reasoning quality and are comfortable with their data transiting a provider's API. You bring your own API key — Groq, OpenAI, Anthropic, or Gemini. 
The key is stored locally.\n\n```\nYour diary entry\n      ↓\nOllama (embeddings) → still local, nothing sent\n      ↓\nTop 5 relevant excerpts → your provider's API → answer streams back\n\nOnly a small slice of your diary transits. Never the full thing.\n```\n\n## The RAG Pipeline — How the AI \"Remembers\" Your Life\n\nRAG stands for **Retrieval-Augmented Generation**. It's the technique that makes the AI feel like it actually knows you, without sending everything you've ever written to a language model on every request.\n\n### What Is an Embedding?\n\nEvery diary entry gets converted into a vector: a list of numbers (768 of them for nomic-embed-text) that works like GPS coordinates for meaning. The values below are illustrative and truncated:\n\n```\n\"I felt anxious today\"    → [0.21, 0.83, 0.12, 0.74, ...]\n\"I was really stressed\"   → [0.22, 0.81, 0.14, 0.71, ...]  ← very similar\n\"I love hiking\"           → [0.91, 0.12, 0.67, 0.23, ...]  ← very different\n```\n\nSimilar meaning produces vectors that point in similar directions, and cosine similarity measures exactly that. This is what makes semantic search work: you search by concept, not exact words.\n\n### Phase 1 — Writing an Entry\n\n```\nYou write: \"Today was rough. 
Felt anxious about the deadline.\"\n                    ↓\n       Ollama (nomic-embed-text)\n       converts text → [0.21, 0.83, 0.12, 0.74, ...]\n                    ↓\n       Saved in SQLite / PostgreSQL:\n         entry text    → AES-256-GCM encrypted\n         embedding     → stored raw (math requires it)\n         mood/themes   → analyzed by LLM, stored encrypted\n```\n\nThis happens asynchronously: the entry saves immediately, and the analysis runs in the background.\n\n### Phase 2 — Asking a Question\n\n```\nYou ask: \"When did I feel anxious about work?\"\n                    ↓\n       Ollama converts question → numbers\n                    ↓\n       Cosine similarity search runs in YOUR database\n       (sqlite-vec or pgvector — pure math, no external call)\n       entry A: 0.91 match ✓\n       entry B: 0.87 match ✓\n       entry C: 0.79 match ✓\n       entry D: 0.31 match ✗  (skipped)\n                    ↓\n       Top 5 entries decrypted in memory\n                    ↓\n       LLM receives: system prompt + diary excerpts + your question\n                    ↓\n       Streams answer token by token (SSE)\n```\n\n**The key insight: embeddings find what to read. The LLM decides what to say about it.**\n\nThe LLM never sees your full diary — only the 5 most relevant entries. Cosine similarity runs entirely on your server. Nothing goes to an external service unless you've opted into cloud mode.\n\n## The Companion Pipeline — Safety First\n\nThe companion mode is built around one rule: **if someone is in crisis, the LLM never runs.**\n\n```\nYou type a message\n        ↓\nCrisis detection (keyword matching, server-side)\n\"suicide\", \"hurt myself\", \"want to die\", etc.\n        ↓\n    CRISIS?          SAFE?\n      ↓                 ↓\nHardcoded response   LLM runs with CBT/DBT prompt\n988 + Crisis Text    Acknowledges → reflects → one question\nLine + findahelpline\nLLM never called     Saves encrypted to companion_messages\n```\n\nThe crisis response is hardcoded. It cannot be hallucinated, modified, or bypassed by a clever prompt. 
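A minimal sketch of that ordering in plain JavaScript follows. The phrase list, the exact response wording, and the names `detectCrisis` and `handleCompanionMessage` are illustrative assumptions rather than DiaryGPT's actual code. What matters is the control flow: on a match the function returns fixed text and the model callback is never invoked. Normalizing the text and padding it with spaces keeps matches to whole words, but like any keyword list it will miss oblique phrasing.

```javascript
// Hypothetical crisis gate: keyword matching that runs BEFORE any LLM call.
// Phrase list, response wording, and function names are illustrative only.
const CRISIS_PHRASES = ['suicide', 'kill myself', 'hurt myself', 'want to die', 'end my life'];

// Fixed text, written by a human and never generated by a model.
const CRISIS_RESPONSE = [
  'It sounds like you are carrying something very heavy right now, and you do not have to carry it alone.',
  'In the US, call or text 988 (Suicide & Crisis Lifeline), or text HOME to 741741 (Crisis Text Line).',
  'Elsewhere, https://findahelpline.com lists free local services.',
].join(' ');

function detectCrisis(text) {
  // Lowercase, collapse punctuation and whitespace runs into single spaces,
  // then pad with spaces so phrases only match as whole words.
  const normalized = ` ${text.toLowerCase().replace(/[^a-z0-9']+/g, ' ').trim()} `;
  return CRISIS_PHRASES.some((phrase) => normalized.includes(` ${phrase} `));
}

// callLLM stands in for the provider layer; on a match it is never invoked.
async function handleCompanionMessage(text, callLLM) {
  if (detectCrisis(text)) {
    return { crisis: true, reply: CRISIS_RESPONSE };
  }
  return { crisis: false, reply: await callLLM(text) };
}
```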
The companion banner — *\"This is an AI companion, not a licensed therapist\"* — is also hardcoded in the UI, never AI-generated.\n\nThe companion system uses a distinct system prompt built around CBT thought-reframing, DBT skills, and reflective listening. Sessions are saved and resumable.\n\nA real limitation worth naming: keyword detection catches explicit phrases like \"I want to die\" but will miss oblique crisis language like \"I just want it to stop\" or \"everyone would be better off without me.\" A small local classifier as a second layer is on the roadmap: the keyword filter stays as the fast, auditable first line, and the classifier becomes the safety net for implicit signals.\n\n## The Encryption Layer\n\nEvery piece of user content goes through AES-256-GCM encryption before it reaches the database.\n\n```\n// Every diary entry, chat message, and companion note goes through this\nencrypt(plaintext)    // before DB insert\ndecrypt(ciphertext)   // after DB read, before sending to the LLM or browser\n```\n\nThe encryption key is yours: a 64-character hex string (32 bytes, the key size AES-256 requires) that you generate, for example with `openssl rand -hex 32`, and store in your `.env`. Without it, the encrypted fields are unreadable. The server never transmits the key.\n\nThe one exception: **embedding vectors are stored unencrypted.** Cosine similarity requires the raw numbers. The chunk text that generated the embedding is stored separately, encrypted. 
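A minimal sketch of that split using Node's built-in `crypto` module follows. The `ENCRYPTION_KEY` variable, the `iv:tag:ciphertext` base64 storage format, and the helper names are assumptions for illustration. Each call draws a fresh 12-byte IV, since reusing an IV under the same GCM key breaks its guarantees, and the authentication tag makes `decrypt` throw on tampered data instead of returning garbage.

```javascript
// Sketch: seal chunk text with AES-256-GCM, keep the embedding as raw float32.
// ENCRYPTION_KEY, the iv:tag:ciphertext format, and the helper names are
// illustrative assumptions, not DiaryGPT's actual schema.
const crypto = require('crypto');

// 64 hex chars decode to 32 bytes, the key size AES-256 requires.
// Falls back to a random per-process key so the sketch runs standalone (demo only).
const KEY = Buffer.from(process.env.ENCRYPTION_KEY || crypto.randomBytes(32).toString('hex'), 'hex');

function encrypt(plaintext) {
  const iv = crypto.randomBytes(12); // 96-bit nonce, fresh for every call
  const cipher = crypto.createCipheriv('aes-256-gcm', KEY, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext, 'utf8'), cipher.final()]);
  const tag = cipher.getAuthTag(); // 16-byte authentication tag
  return [iv, tag, ciphertext].map((buf) => buf.toString('base64')).join(':');
}

function decrypt(payload) {
  const [iv, tag, ciphertext] = payload.split(':').map((part) => Buffer.from(part, 'base64'));
  const decipher = crypto.createDecipheriv('aes-256-gcm', KEY, iv);
  decipher.setAuthTag(tag); // final() throws if ciphertext or tag were altered
  return Buffer.concat([decipher.update(ciphertext), decipher.final()]).toString('utf8');
}

// Hypothetical row shape: encrypted text plus an unencrypted vector that
// sqlite-vec or pgvector can run cosine similarity over.
function buildChunkRow(text, embedding) {
  return {
    text_enc: encrypt(text),
    embedding: Buffer.from(new Float32Array(embedding).buffer),
  };
}
```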
The security boundary lives at the source text, not the derived vector.\n\n## The Technical Stack\n\n```\nRuntime        Node.js + Express\nFrontend       Vanilla JS SPA (no build step, no framework)\nAuth           JWT + Argon2id password hashing\nEncryption     AES-256-GCM (Node.js crypto module)\nStorage        SQLite (local default) or PostgreSQL (multi-device)\nVector search  sqlite-vec (local) or pgvector (Postgres)\nEmbeddings     Ollama nomic-embed-text (local default)\nLLM            Ollama (local default) / Groq / OpenAI / Gemini / Anthropic\nStreaming      SSE (Server-Sent Events) over POST with ReadableStream\nVoice          Browser SpeechRecognition API (free) or Whisper (premium)\n```\n\nThe frontend is deliberately no-framework: no React, no build pipeline, no `node_modules` in the browser. It loads instantly and works offline (except for cloud LLM calls).\n\n## LLM Provider Architecture\n\nThe LLM layer is a thin factory that routes every call to whichever provider is active:\n\n```js\n// services/llm.js\n// Provider modules (imports omitted); each exports analyzeEntry, generateText, and streamChat.\nconst PROVIDERS = { ollama, anthropic, openai, gemini, groq };\n\n// getConfig() is read on every call, so a provider switch applies without a restart.\nexport const streamChat = (history, message, context, onDelta) =>\n  PROVIDERS[getConfig().provider].streamChat(history, message, context, onDelta);\n```\n\nSwitching providers happens at runtime, with no restart needed. Every provider implements the same three-function contract:\n\n```\nanalyzeEntry(text)                              // → { mood, themes, reflection, followUpQuestion }\ngenerateText(systemPrompt, userMessage)         // → string\nstreamChat(history, message, context, onDelta)  // → full string, streams via onDelta\n```\n\nGroq uses the OpenAI SDK pointed at `https://api.groq.com/openai/v1`. Ollama uses the same SDK pointed at `http://localhost:11434/v1`. Identical interface, completely different privacy properties.\n\n## What I Learned\n\n**1. 
Embeddings and LLMs are completely separate concerns.** The model that converts text to numbers has nothing to do with the model that generates answers. You can run Ollama for embeddings and Groq for chat simultaneously. The two are easy to conflate.\n\n**2. 7B–8B models are good enough for structured diary tasks.** Mood detection, theme extraction, journaling prompts — a well-prompted `qwen2.5:7b` handles all of these reliably. The quality gap versus 70B models only shows up in long-form weekly summaries. Use Ollama's JSON mode (`format: \"json\"`) for structured output; without it, small models will eventually return malformed JSON and break your pipeline silently.\n\n**3. Cosine similarity belongs in your database, not a dedicated vector database.** For a personal app with thousands (not millions) of entries, `sqlite-vec` and `pgvector` are more than sufficient. No Pinecone, no Weaviate, no extra infra. The math is simple and fast.\n\n**4. SSE over POST is the right call for streaming.** The standard advice is to use `EventSource`, but `EventSource` is GET-only. Chat requires POST (to send the message body). The fix is `fetch` + `ReadableStream` on the client, which gives full control over the stream lifecycle and avoids awkward query-string payloads.\n\n**5. Crisis detection must run before the LLM, not inside it.** You cannot rely on an LLM to consistently detect crisis language and respond safely. Keyword matching before the LLM call is not elegant, but it is reliable and auditable. An LLM should never be the first line of defense for someone in crisis — it should never even get the message.\n\n**6. The hardest engineering decisions in a privacy-first app are about what not to do.** No analytics. No telemetry. No \"anonymized\" usage data. Every one of those is a useful product feature you give up — and giving them up is the point.\n\n## Try It\n\nDiaryGPT is open source. 
Self-host it, read every line, verify the privacy claims.\n\n🔗 **GitHub:** [https://github.com/rahul70-code/diarygpt](https://github.com/rahul70-code/diarygpt)\n\nYour diary is yours. The AI should work for you, not harvest from you.\n\n*Stack: Node.js · Ollama · SQLite · AES-256-GCM · Vanilla JS*\n\n*Tags: #LLM #RAG #Privacy #LocalFirst #OpenSource*", "url": "https://wpnews.pro/news/building-a-private-rag-system-lessons-from-a-local-first-ai-journal", "canonical_source": "https://dev.to/rahul_talreja_946a8621542/building-a-private-rag-system-lessons-from-a-local-first-ai-journal-2dol", "published_at": "2026-05-23 10:19:15+00:00", "updated_at": "2026-05-23 10:31:52.528874+00:00", "lang": "en", "topics": ["artificial-intelligence", "machine-learning", "large-language-models", "open-source", "data"], "entities": ["DiaryGPT", "Ollama", "Groq", "OpenAI", "Anthropic", "Gemini", "SQLite"], "alternates": {"html": "https://wpnews.pro/news/building-a-private-rag-system-lessons-from-a-local-first-ai-journal", "markdown": "https://wpnews.pro/news/building-a-private-rag-system-lessons-from-a-local-first-ai-journal.md", "text": "https://wpnews.pro/news/building-a-private-rag-system-lessons-from-a-local-first-ai-journal.txt", "jsonld": "https://wpnews.pro/news/building-a-private-rag-system-lessons-from-a-local-first-ai-journal.jsonld"}}