How to Chat with 10 Years of Your Own Medical Records: A Quantified-Self RAG Tutorial

A developer built a Quantified-Self RAG (Retrieval-Augmented Generation) system that ingests a decade of personal medical records from messy PDF scans using Unstructured.io, Sentence-Transformers, and Qdrant. The pipeline performs Hybrid Search (BM25 + Vector) to navigate complex medical terminology and inconsistent layouts, enabling users to query their health history conversationally. The system uses layout-aware partitioning and chunking to preserve table structures and headers that standard PDF parsers would lose.

Have you ever stared at a stack of yellowing medical reports and thought, "I wish I could just ask my computer when my cholesterol started creeping up"? We live in the era of the Quantified-Self, yet our most critical data, our medical records, often sits rotting in "dirty" PDF scans or messy outpatient summaries.

Today, we are going to fix that. We're building a Quantified-Self RAG (Retrieval-Augmented Generation) system designed to ingest a decade of personal health history using Unstructured.io, Sentence-Transformers, and Qdrant. By the end of this guide, you'll have a pipeline capable of performing Hybrid Search (BM25 + Vector) to navigate complex medical terminology and messy layouts. Let's turn those pixels into actionable health insights!

Medical PDFs are a nightmare. They contain tables, handwritten signatures, and inconsistent headers. A simple PyPDF2 `extract_text()` call won't cut it. We need a Layout-Aware approach.

```mermaid
graph TD
    A[Messy PDF Scans] --> B[Unstructured.io Partitioning]
    B --> C[Layout-Aware Chunking]
    C --> D{Hybrid Encoding}
    D --> E[Dense Vector: Sentence-Transformers]
    D --> F[Sparse Vector: BM25/SPLADE]
    E --> G[Qdrant Vector Store]
    F --> G[Qdrant Vector Store]
    H[User Query] --> I[FastAPI Search Endpoint]
    I --> G
    G --> J[Contextual Answer]
```

Before we dive into the code, make sure you have the stack ready: Unstructured.io for parsing, Sentence-Transformers for embeddings, Qdrant for storage and search, and FastAPI for the query endpoint.

Standard parsers lose the context of tables. Unstructured.io treats the document as a series of elements (Title, NarrativeText, Table, etc.).
```python
from unstructured.partition.pdf import partition_pdf

def extract_medical_data(file_path):
    # This uses layout detection to identify tables and headers
    elements = partition_pdf(
        filename=file_path,
        strategy="hi_res",            # layout-detection model under the hood (e.g. Detectron2)
        infer_table_structure=True,
        chunking_strategy="by_title",
        max_characters=1000,
        new_after_n_chars=800,
    )

    chunks = []
    for element in elements:
        metadata = element.metadata.to_dict()
        chunks.append({
            "text": element.text,
            # e.g. 'Table'; chunked narrative text typically comes back as 'CompositeElement'
            "type": element.category,
            "page": metadata.get("page_number"),
        })
    return chunks

# Example: Process a 2014 Blood Test Scan
data_chunks = extract_medical_data("report_2014.pdf")
```

Medical queries often require both exact keyword matches (e.g., "HbA1c") and semantic meaning (e.g., "blood sugar levels"). Qdrant's Hybrid Search combines the best of both worlds.

```python
from qdrant_client import QdrantClient
from qdrant_client.http import models

client = QdrantClient(":memory:")  # Or your cloud/docker instance

# Create a collection with both Dense and Sparse vectors
client.recreate_collection(
    collection_name="medical_records",
    vectors_config=models.VectorParams(
        size=384,  # For 'all-MiniLM-L6-v2'
        distance=models.Distance.COSINE,
    ),
    sparse_vectors_config={
        "text-sparse": models.SparseVectorParams(
            index=models.SparseIndexParams(on_disk=True),
        )
    },
)
```

We'll use Sentence-Transformers for the dense embeddings. For the sparse part, we can use a simple BM25-like approach or Qdrant's built-in sparse capabilities.

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer('all-MiniLM-L6-v2')

def prepare_points(chunks):
    points = []
    for i, chunk in enumerate(chunks):
        # Dense embedding only; the "text-sparse" slot is not filled in this snippet
        vector = model.encode(chunk["text"]).tolist()
        points.append(models.PointStruct(id=i, vector=vector, payload=chunk))
    return points

client.upsert(
    collection_name="medical_records",
    points=prepare_points(data_chunks),
)
```

Building a medical RAG isn't just about indexing; it's about accuracy and privacy.
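On the accuracy side, note that the ingestion snippet above stores only dense vectors, so the `text-sparse` slot in the collection stays empty. As a rough sketch of what a "simple BM25-like approach" could look like, the hypothetical helper below turns a chunk into the index/value pairs a sparse vector needs. The tokenizer regex, the hashing trick, the `2**20` vocabulary size, the `avg_len` default, and the BM25 constants `k1` and `b` are all assumptions for illustration, and the IDF half of BM25 is left out entirely.

```python
import re
import zlib
from collections import Counter

VOCAB_SIZE = 2**20  # assumed size of the hashed vocabulary

def tokenize(text):
    # Lowercase alphanumeric tokens, so "HbA1c" becomes "hba1c"
    return re.findall(r"[a-z0-9]+", text.lower())

def sparse_encode(text, avg_len=100.0, k1=1.2, b=0.75):
    """Return (indices, values) with BM25-style saturated term frequencies."""
    tokens = tokenize(text)
    if not tokens:
        return [], []
    counts = Counter(tokens)
    length_norm = 1 - b + b * len(tokens) / avg_len
    weights = {}
    for token, tf in counts.items():
        # crc32 is stable across runs, unlike Python's built-in hash()
        index = zlib.crc32(token.encode("utf-8")) % VOCAB_SIZE
        weight = tf * (k1 + 1) / (tf + k1 * length_norm)
        # Colliding tokens share an index, so their weights are summed
        weights[index] = weights.get(index, 0.0) + weight
    indices = sorted(weights)
    return indices, [weights[i] for i in indices]

indices, values = sparse_encode("HbA1c 6.1 % (prior HbA1c 5.8 %)")
```

With the Qdrant client, pairs like these map onto `models.SparseVector(indices=..., values=...)`. Because the token-to-index mapping is deterministic, the same helper can encode the query at search time.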
If you are looking for production-ready patterns, such as Self-Querying Retrievers (filtering by year/doctor automatically) or Advanced Re-ranking for medical accuracy, I highly recommend exploring the resources at the WellAlly Blog. They have fantastic deep dives into scaling LLM applications for sensitive data.

Now, let's wrap this in a clean API to query our decade of data.

```python
from fastapi import FastAPI

app = FastAPI()

@app.get("/query")
async def ask_health_history(q: str):
    # 1. Embed the query
    query_vector = model.encode(q).tolist()

    # 2. Search Qdrant (this snippet queries the dense vector only;
    #    the sparse side of the hybrid setup is not used here)
    search_result = client.search(
        collection_name="medical_records",
        query_vector=query_vector,
        limit=3,
        with_payload=True,
    )

    # 3. Format the context for the LLM
    context = "\n".join(res.payload["text"] for res in search_result)

    return {
        "query": q,
        "context_found": context,
        "sources": [res.payload["page"] for res in search_result],
    }

# Run with: uvicorn main:app --reload
```

By using Layout-aware OCR, we ensure that a value in a "Cholesterol" table row isn't just a random number; it's tied to its header. By using Hybrid Search, we ensure that searching for "high sugar" finds "Hyperglycemia" (Semantic) while searching for "Tylenol" finds exactly "Tylenol" (Keyword).

Personal health data is the ultimate frontier for RAG. You've now built a system that doesn't just store data; it remembers your history.

What's next? Are you working on Quantified-Self projects? What's your biggest struggle with messy PDFs? Let's chat in the comments below! 👇

If you enjoyed this tutorial, don't forget to check out WellAlly (https://www.wellally.tech/blog) for more high-level architectural insights.
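One loose end: the `/query` endpoint above searches the dense vector only. If a keyword (sparse) result list is also available, the two rankings still have to be merged. One common option is Reciprocal Rank Fusion (RRF), sketched here in plain Python. The function name, the sample IDs, and the constant `k=60` are illustrative assumptions, not part of the pipeline above.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60, limit=3):
    """Merge several ranked ID lists; each hit adds 1 / (k + rank)."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, point_id in enumerate(ranking, start=1):
            scores[point_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for determinism
    ordered = sorted(scores, key=lambda pid: (-scores[pid], pid))
    return ordered[:limit]

dense_hits = [12, 7, 31]   # hypothetical IDs from the vector search
sparse_hits = [7, 44, 12]  # hypothetical IDs from the keyword search
fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
print(fused)  # [7, 12, 44]
```

Chunks that rank well in both lists rise to the top, while a chunk found by only one retriever can still make the cut. Recent Qdrant releases also offer server-side fusion through the Query API, which may be preferable once both vector types are indexed.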