How to Chat with 10 Years of Your Own Medical Records: A Quantified-Self RAG Tutorial


[ARTICLE · art-23692] src=dev.to ↗ pub=2026-06-07T00:29Z topic=artificial-intelligence verified=true sentiment=↑ positive

A developer built a Quantified-Self RAG (Retrieval-Augmented Generation) system that ingests a decade of personal medical records from messy PDF scans using Unstructured.io, Sentence-Transformers, and Qdrant. The pipeline performs Hybrid Search (BM25 + Vector) to navigate complex medical terminology and inconsistent layouts, enabling users to query their health history conversationally. The system uses layout-aware partitioning and chunking to preserve table structures and headers that standard PDF parsers would lose.

3 min read · 17 views · published Jun 7, 2026

Have you ever stared at a stack of yellowing medical reports and thought, "I wish I could just ask my computer when my cholesterol started creeping up?"

We live in the era of the Quantified-Self, yet our most critical data—medical records—often sits rotting in "dirty" PDF scans or messy outpatient summaries. Today, we are going to fix that. We're building a Quantified-Self RAG (Retrieval-Augmented Generation) system designed to ingest a decade of personal health history using Unstructured.io, Sentence-Transformers, and Qdrant.

By the end of this guide, you'll have a pipeline capable of performing Hybrid Search (BM25 + Vector) to navigate through complex medical terminology and messy layouts. Let's turn those pixels into actionable health insights!

Medical PDFs are a nightmare. They contain tables, handwritten signatures, and inconsistent headers. A plain PyPDF2 extract_text() call won't cut it, because it returns a flat stream of text with no notion of tables or headings. We need a layout-aware approach.

graph TD
    A[Messy PDF Scans] --> B[Unstructured.io Partitioning]
    B --> C[Layout-Aware Chunking]
    C --> D{Hybrid Encoding}
    D --> E[Dense Vector: Sentence-Transformers]
    D --> F[Sparse Vector: BM25/SPLADE]
    E --> G[Qdrant Vector Store]
    F --> G[Qdrant Vector Store]
    H[User Query] --> I[FastAPI Search Endpoint]
    I --> G
    G --> J[Contextual Answer]

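Before we dive into the code, ensure you have the following stack ready: Unstructured.io for partitioning, Sentence-Transformers for dense embeddings, Qdrant for storage and search, and FastAPI for the query endpoint.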

Standard parsers lose the context of tables. Unstructured.io treats the document as a series of elements (Title, NarrativeText, Table, etc.).

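from unstructured.partition.pdf import partition_pdf

def extract_medical_data(file_path):
    """Partition one PDF into layout-aware chunks with type and page metadata."""
    elements = partition_pdf(
        filename=file_path,
        strategy="hi_res",  # Runs a layout-detection model; the default model depends on the installed version
        infer_table_structure=True,  # Keep table structure (also exposed as HTML in the metadata)
        chunking_strategy="by_title",  # Start a new chunk at each detected title
        max_characters=1000,  # Hard cap on chunk size
        new_after_n_chars=800,  # Soft cap: start a new chunk once this is reached
    )

    chunks = []
    for element in elements:
        metadata = element.metadata.to_dict()
        chunks.append({
            "text": element.text,
            "type": element.category,  # After chunking: typically 'CompositeElement' or 'Table'
            "page": metadata.get("page_number"),
        })
    return chunks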
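With infer_table_structure=True, Unstructured also attaches an HTML rendering of each detected table to the element's metadata (the text_as_html field). The sketch below shows one possible way to turn such HTML into "header: value" lines so every number stays tied to its column. The sample table and the helper names are made up for illustration, and the sketch assumes a simple table whose first row is the header.

```python
from html.parser import HTMLParser


class _TableRows(HTMLParser):
    """Collect the cell text of every <tr> into a list of rows."""

    def __init__(self):
        super().__init__()
        self.rows, self._row, self._cell = [], None, None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that sits inside a table cell
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def table_html_to_lines(table_html: str) -> list[str]:
    """Pair each body cell with its column header (first row = header)."""
    parser = _TableRows()
    parser.feed(table_html)
    if not parser.rows:
        return []
    header, *body = parser.rows
    return ["; ".join(f"{h}: {v}" for h, v in zip(header, row)) for row in body]


# Made-up sample in the shape of a lab report table
sample = (
    "<table><tr><th>Test</th><th>Result</th><th>Unit</th></tr>"
    "<tr><td>LDL Cholesterol</td><td>131</td><td>mg/dL</td></tr></table>"
)
lines = table_html_to_lines(sample)
```

Lines produced this way could be stored as the chunk text for Table elements, so that a retrieved "131" arrives together with the test name and unit it belongs to.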

Medical queries often require exact keyword matches (e.g., "HbA1c") and semantic meaning (e.g., "blood sugar levels"). Qdrant's Hybrid Search combines the best of both worlds.

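from qdrant_client import QdrantClient, models

client = QdrantClient(":memory:")  # Or your cloud/docker instance

# recreate_collection is deprecated in recent qdrant-client releases,
# so drop any existing collection explicitly before creating it.
if client.collection_exists("medical_records"):
    client.delete_collection("medical_records")

client.create_collection(
    collection_name="medical_records",
    # Dense vectors: one 384-dimensional embedding per chunk
    vectors_config=models.VectorParams(
        size=384,  # For 'all-MiniLM-L6-v2'
        distance=models.Distance.COSINE
    ),
    # Sparse vectors: keyword-style weights stored under the name "text-sparse"
    sparse_vectors_config={
        "text-sparse": models.SparseVectorParams(
            index=models.SparseIndexParams(
                on_disk=True,
            )
        )
    }
)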
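With a dense and a sparse vector space in the same collection, the two result lists still have to be merged at query time. One common way to do that is reciprocal rank fusion (RRF), which Qdrant can also apply on the server side. The sketch below is a plain-Python illustration of the idea, not Qdrant's implementation; the function name, the chunk IDs, and the constant k=60 are assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked ID lists; each hit contributes 1 / (k + rank)."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for a deterministic order
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


dense_hits = ["chunk-7", "chunk-2", "chunk-9"]   # semantic matches (made up)
sparse_hits = ["chunk-2", "chunk-4", "chunk-7"]  # keyword matches (made up)
fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
```

In this example chunk-2 ends up first because both retrievers rank it near the top, while chunks found by only one retriever fall to the end of the list.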

We’ll use Sentence-Transformers for the dense embeddings. For the sparse part, we can use a simple BM25-like approach or Qdrant’s built-in sparse capabilities. The snippet below computes only the dense vector for each chunk.

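from sentence_transformers import SentenceTransformer

model = SentenceTransformer('all-MiniLM-L6-v2')  # Produces 384-dimensional embeddings

def prepare_points(chunks, start_id=0):
    # Encode all chunk texts in one batch instead of one call per chunk
    vectors = model.encode([chunk["text"] for chunk in chunks]).tolist()
    points = []
    for i, (chunk, vector) in enumerate(zip(chunks, vectors)):
        points.append(
            models.PointStruct(
                id=start_id + i,  # Pass a different start_id per file so IDs don't collide
                vector=vector,
                payload=chunk
            )
        )
    return points

def index_chunks(chunks, start_id=0):
    # Without this upsert the collection stays empty and every search returns nothing
    client.upsert(
        collection_name="medical_records",
        points=prepare_points(chunks, start_id),
    )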
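The code above fills only the dense side of the collection. For the "BM25-like approach" mentioned earlier, a sparse vector can be sketched in plain Python as a list of term indices plus BM25 weights. This is an illustrative assumption rather than the author's implementation: the tokenizer, the crc32 term hashing, and the parameters k1=1.5 and b=0.75 are all choices made for this sketch.

```python
import math
import re
import zlib
from collections import Counter

TOKEN_RE = re.compile(r"[a-z0-9]+")


def tokenize(text: str) -> list[str]:
    """Lowercase and split on anything that is not a letter or digit."""
    return TOKEN_RE.findall(text.lower())


def bm25_sparse_vectors(texts, k1=1.5, b=0.75):
    """Return one (indices, values) pair per text using BM25 term weights."""
    docs = [Counter(tokenize(t)) for t in texts]
    n_docs = len(docs)
    avg_len = sum(sum(d.values()) for d in docs) / max(n_docs, 1)
    doc_freq = Counter(term for d in docs for term in d)

    vectors = []
    for d in docs:
        doc_len = sum(d.values())
        weights: dict[int, float] = {}
        for term, tf in d.items():
            # Rare terms get a higher inverse document frequency
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            # Term frequency, saturated by k1 and normalized by document length
            norm = tf * (k1 + 1) / (tf + k1 * (1 - b + b * doc_len / avg_len))
            # crc32 gives a stable unsigned 32-bit index per term (collisions are possible)
            idx = zlib.crc32(term.encode("utf-8"))
            weights[idx] = weights.get(idx, 0.0) + idf * norm
        indices = sorted(weights)
        vectors.append((indices, [weights[i] for i in indices]))
    return vectors
```

Each pair has the shape that Qdrant's models.SparseVector(indices=..., values=...) expects. At query time the same tokenizer and hashing would be applied to the question, so that an exact term such as "HbA1c" lands on the same index as in the stored chunks.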

Building a medical RAG isn't just about indexing; it's about accuracy and privacy. If you are looking for production-ready patterns, such as Self-Querying Retrievers (filtering by year/doctor automatically) or Advanced Re-ranking for medical accuracy, I highly recommend exploring the resources at WellAlly Blog. They have fantastic deep dives into scaling LLM applications for sensitive data.

Now, let's wrap this in a clean API to query our decade of data.

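from fastapi import FastAPI

app = FastAPI()

@app.get("/query")
def ask_health_history(q: str):
    # Plain def: FastAPI runs it in a threadpool, so the blocking encode and
    # search calls don't stall the event loop.
    query_vector = model.encode(q).tolist()

    # Dense-only retrieval via the Query API (client.search is deprecated in
    # recent qdrant-client releases).
    search_result = client.query_points(
        collection_name="medical_records",
        query=query_vector,
        limit=3,
        with_payload=True
    ).points

    context = "\n".join([res.payload["text"] for res in search_result])

    return {
        "query": q,
        "context_found": context,
        "sources": [res.payload["page"] for res in search_result]
    }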
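The endpoint returns the retrieved context but stops short of the generation step. One minimal way to hand that context to an LLM is to assemble a prompt that numbers each passage and records its page, so the model can cite where a value came from. The prompt wording and the build_prompt name are assumptions; the call to the model itself is left out because the article does not name one.

```python
def build_prompt(question: str, hits: list[dict]) -> str:
    """Format retrieved chunks (dicts with 'text' and 'page') into a grounded prompt."""
    passages = [
        f"[{n}] (page {hit.get('page', '?')}) {hit['text']}"
        for n, hit in enumerate(hits, start=1)
    ]
    return (
        "Answer using only the numbered excerpts from my medical records below. "
        "Cite excerpt numbers, and say so if the answer is not in them.\n\n"
        + "\n".join(passages)
        + f"\n\nQuestion: {question}"
    )


prompt = build_prompt(
    "When did my LDL first exceed 130?",
    [{"text": "LDL Cholesterol: 131 mg/dL", "page": 2}],  # made-up hit
)
```

Inside the endpoint, the hits list could be built from the payloads of the search results, which already carry the text and page fields.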

By using Layout-aware OCR, we ensure that a value in a "Cholesterol" table row isn't just a random number—it's tied to its header. By using Hybrid Search, we ensure that searching for "high sugar" finds "Hyperglycemia" (Semantic) while searching for "Tylenol" finds exactly "Tylenol" (Keyword).

Personal health data is the ultimate frontier for RAG. You've now built a system that doesn't just store data—it remembers your history.

What's next?

Are you working on Quantified-Self projects? What’s your biggest struggle with messy PDFs? Let’s chat in the comments below! 👇

If you enjoyed this tutorial, don't forget to check out WellAlly for more high-level architectural insights!

Read original on dev.to → dev.to/beck_moulton/how-to-chat-with-10-years-of…
