# How to Chat with 10 Years of Your Own Medical Records: A Quantified-Self RAG Tutorial

> Source: <https://dev.to/beck_moulton/how-to-chat-with-10-years-of-your-own-medical-records-a-quantified-self-rag-tutorial-1k60>
> Published: 2026-06-07 00:29:00+00:00

Have you ever stared at a stack of yellowing medical reports and thought, *"I wish I could just ask my computer when my cholesterol started creeping up?"*

We live in the era of the **Quantified-Self**, yet our most critical data—medical records—often sits rotting in "dirty" PDF scans or messy outpatient summaries. Today, we are going to fix that. We're building a **Quantified-Self RAG (Retrieval-Augmented Generation)** system designed to ingest a decade of personal health history using **Unstructured.io**, **Sentence-Transformers**, and **Qdrant**.

By the end of this guide, you'll have a pipeline capable of performing **Hybrid Search (BM25 + Vector)** to navigate through complex medical terminology and messy layouts. Let's turn those pixels into actionable health insights!

Medical PDFs are a nightmare. They contain tables, handwritten signatures, and inconsistent headers. Plain text extraction, such as calling `extract_text()` on each page with `PyPDF2`, won't cut it. We need a **Layout-Aware** approach.

``` mermaid
graph TD
    A[Messy PDF Scans] --> B[Unstructured.io Partitioning]
    B --> C[Layout-Aware Chunking]
    C --> D{Hybrid Encoding}
    D --> E[Dense Vector: Sentence-Transformers]
    D --> F[Sparse Vector: BM25/SPLADE]
    E --> G[Qdrant Vector Store]
    F --> G[Qdrant Vector Store]
    H[User Query] --> I[FastAPI Search Endpoint]
    I --> G
    G --> J[Contextual Answer]
```

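Before we dive into the code, ensure you have the following stack ready:

- **Unstructured.io** (`unstructured`) for layout-aware PDF partitioning
- **Sentence-Transformers** (`sentence-transformers`) for dense embeddings
- **Qdrant** (`qdrant-client`) as the vector store
- **FastAPI** and `uvicorn` for the search endpoint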

Standard parsers lose the context of tables. **Unstructured.io** treats the document as a series of elements (Title, NarrativeText, Table, etc.).

``` python
from unstructured.partition.pdf import partition_pdf

def extract_medical_data(file_path):
    # This uses layout detection to identify tables and headers
    elements = partition_pdf(
        filename=file_path,
        strategy="hi_res", # Runs a layout-detection model on each page (which model depends on your unstructured version)
        infer_table_structure=True,
        chunking_strategy="by_title",
        max_characters=1000,
        new_after_n_chars=800,
    )

    chunks = []
    for element in elements:
        metadata = element.metadata.to_dict()
        chunks.append({
            "text": element.text,
            "type": element.category, # e.g., 'Table' or 'CompositeElement' once chunked by title
            "page": metadata.get("page_number")
        })
    return chunks

# Example: Process a 2014 Blood Test Scan
# data_chunks = extract_medical_data("report_2014.pdf")
```
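Tables are where most lab values live, so it can help to spell each row out with its column headers before embedding. The sketch below assumes the table's HTML is available (unstructured exposes it as `element.metadata.text_as_html` when `infer_table_structure=True`) and that the first row holds the headers. `TableRows`, `table_to_sentences`, and the sample values are hypothetical and made up for illustration.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the cell text of every <tr> in an HTML table."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # row currently being read
        self._cell = None   # text fragments of the cell being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Collapse internal whitespace before storing the cell
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None
            self._cell = None


def table_to_sentences(table_html):
    """Turn each data row into 'header: value; header: value' text."""
    parser = TableRows()
    parser.feed(table_html)
    if len(parser.rows) < 2:
        return []
    header, *body = parser.rows  # assumption: first row is the header row
    return ["; ".join(f"{h}: {v}" for h, v in zip(header, row)) for row in body]


# Made-up sample table, not real patient data
sample_html = (
    "<table><tr><th>Test</th><th>Result</th><th>Unit</th></tr>"
    "<tr><td>LDL Cholesterol</td><td>162</td><td>mg/dL</td></tr>"
    "<tr><td>HbA1c</td><td>5.9</td><td>%</td></tr></table>"
)
for sentence in table_to_sentences(sample_html):
    print(sentence)
```

Each resulting sentence could replace or accompany the raw `element.text` of a `Table` chunk, so the number stays attached to its test name and unit.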

Medical queries often need both exact keyword matches (e.g., "HbA1c") and semantic matches (e.g., "blood sugar levels"). Qdrant's **Hybrid Search** covers both by letting a collection hold a dense and a sparse vector for each chunk.

``` python
from qdrant_client import QdrantClient
from qdrant_client.http import models

client = QdrantClient(":memory:") # Or your cloud/docker instance

# Create a collection with both Dense and Sparse vectors
client.recreate_collection(
    collection_name="medical_records",
    vectors_config=models.VectorParams(
        size=384, # For 'all-MiniLM-L6-v2'
        distance=models.Distance.COSINE
    ),
    sparse_vectors_config={
        "text-sparse": models.SparseVectorParams(
            index=models.SparseIndexParams(
                on_disk=True,
            )
        )
    }
)
```

We’ll use `Sentence-Transformers` for the dense embeddings. For the sparse part, we can use a simple BM25-like approach or Qdrant’s built-in sparse capabilities.

``` python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer('all-MiniLM-L6-v2')

def prepare_points(chunks):
    points = []
    for i, chunk in enumerate(chunks):
        # Dense embedding only; the "text-sparse" vector is not populated here
        vector = model.encode(chunk["text"]).tolist()
        points.append(
            models.PointStruct(
                id=i,
                vector=vector,
                payload=chunk
            )
        )
    return points

# client.upsert(collection_name="medical_records", points=prepare_points(data_chunks))
```
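For the sparse side, here is a minimal sketch of the "simple BM25-like approach" in pure Python. It is an illustration under stated assumptions, not Qdrant's own implementation: `BM25Encoder`, `tokenize`, `sparse_dot`, the `k1`/`b` defaults, and the sample corpus are all hypothetical. Documents get BM25 term weights and queries get a weight of 1.0 per known term, so the dot product of the two equals the BM25 score.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase alphanumeric tokens, so "HbA1c" becomes "hba1c"
    return re.findall(r"[a-z0-9]+", text.lower())


class BM25Encoder:
    """Hypothetical helper: BM25 term weights as (indices, values) pairs."""

    def __init__(self, corpus, k1=1.5, b=0.75):
        docs = [tokenize(t) for t in corpus]
        self.k1, self.b = k1, b
        self.n_docs = len(docs)
        # Fall back to 1.0 so an empty corpus cannot cause division by zero
        self.avg_len = (sum(len(d) for d in docs) / max(self.n_docs, 1)) or 1.0
        self.doc_freq = Counter(term for d in docs for term in set(d))
        # Each known term gets a stable integer id to use as a sparse index
        self.vocab = {term: i for i, term in enumerate(sorted(self.doc_freq))}

    def idf(self, term):
        df = self.doc_freq[term]
        return math.log(1 + (self.n_docs - df + 0.5) / (df + 0.5))

    def encode_document(self, text):
        tokens = tokenize(text)
        counts = Counter(t for t in tokens if t in self.vocab)
        norm = self.k1 * (1 - self.b + self.b * len(tokens) / self.avg_len)
        indices, values = [], []
        for term, tf in sorted(counts.items()):
            indices.append(self.vocab[term])
            values.append(self.idf(term) * tf * (self.k1 + 1) / (tf + norm))
        return indices, values

    def encode_query(self, text):
        terms = sorted({t for t in tokenize(text) if t in self.vocab})
        return [self.vocab[t] for t in terms], [1.0] * len(terms)


def sparse_dot(a, b):
    """Dot product of two (indices, values) sparse vectors."""
    weights = dict(zip(*b))
    return sum(v * weights.get(i, 0.0) for i, v in zip(*a))


# Made-up snippets standing in for extracted chunks
corpus = [
    "HbA1c 5.9 % fasting glucose 101 mg/dL",
    "LDL cholesterol 162 mg/dL",
    "Patient reports mild headache",
]
encoder = BM25Encoder(corpus)
query = encoder.encode_query("HbA1c")
for text in corpus:
    print(round(sparse_dot(query, encoder.encode_document(text)), 3), text)
```

The `(indices, values)` pairs have the shape that `models.SparseVector(indices=..., values=...)` expects, so they could be stored under the `text-sparse` name defined in the collection above.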

Building a medical RAG isn't just about indexing; it's about accuracy and privacy. If you are looking for production-ready patterns, such as **Self-Querying Retrievers** (filtering by year/doctor automatically) or **Advanced Re-ranking** for medical accuracy, I highly recommend exploring the resources at **[WellAlly Blog](https://www.wellally.tech/blog)**. They have fantastic deep dives into scaling LLM applications for sensitive data.

Now, let's wrap this in a clean API to query our decade of data.

``` python
from fastapi import FastAPI

app = FastAPI()

@app.get("/query")
async def ask_health_history(q: str):
    # 1. Embed the query
    query_vector = model.encode(q).tolist()

    # 2. Search Qdrant (dense vector only as written; "text-sparse" is not queried)
    search_result = client.search(
        collection_name="medical_records",
        query_vector=query_vector,
        limit=3,
        with_payload=True
    )

    # 3. Format the context for the LLM
    context = "\n".join([res.payload["text"] for res in search_result])

    return {
        "query": q,
        "context_found": context,
        "sources": [res.payload["page"] for res in search_result]
    }

# Run with: uvicorn main:app --reload
```
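The endpoint returns raw context, so the last step is handing it to an LLM. As a rough sketch, the payloads could be turned into a grounded prompt like this. `build_prompt`, its wording, and the `max_chars` budget are hypothetical choices; the `text`, `type`, and `page` keys are the ones stored in the payload earlier.

```python
def build_prompt(question, hits, max_chars=4000):
    """Assemble a prompt from retrieved payload dicts, citing page numbers."""
    blocks, used = [], 0
    for n, hit in enumerate(hits, start=1):
        block = (
            f"[{n}] (page {hit.get('page', '?')}, {hit.get('type', 'text')})\n"
            f"{hit['text']}"
        )
        # Stop adding excerpts once the character budget is spent
        if used + len(block) > max_chars:
            break
        blocks.append(block)
        used += len(block)
    context = "\n\n".join(blocks) if blocks else "(no matching records)"
    return (
        "Answer using only the numbered excerpts from my medical records. "
        "Cite excerpt numbers, and say so if the excerpts do not contain "
        "the answer.\n\n"
        f"Excerpts:\n{context}\n\nQuestion: {question}"
    )


# Made-up payloads shaped like the chunks indexed above
sample_hits = [
    {"text": "LDL Cholesterol 162 mg/dL", "type": "Table", "page": 2},
    {"text": "Advised dietary changes.", "type": "CompositeElement", "page": 3},
]
prompt = build_prompt("When did my cholesterol start creeping up?", sample_hits)
print(prompt)
```

Inside the endpoint, `hits` would be `[res.payload for res in search_result]`. Numbering the excerpts makes it easier to check an answer against the page it came from.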

By using **Layout-aware OCR**, we ensure that a value in a "Cholesterol" table row isn't just a random number—it's tied to its header. By using **Hybrid Search**, we ensure that searching for "high sugar" finds "Hyperglycemia" (Semantic) while searching for "Tylenol" finds exactly "Tylenol" (Keyword).
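One common way to merge a semantic result list with a keyword result list is reciprocal rank fusion (RRF). The sketch below is a client-side illustration with hypothetical point ids. `reciprocal_rank_fusion` and the `k=60` constant are assumptions for this example, and Qdrant's Query API can also perform fusion server-side.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60, limit=3):
    """Merge ranked id lists; each hit contributes 1 / (k + rank)."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, point_id in enumerate(ranking, start=1):
            scores[point_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for a stable order
    ordered = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [point_id for point_id, _ in ordered[:limit]]


dense_hits = [12, 7, 3, 40]   # hypothetical ids ranked by the dense search
sparse_hits = [7, 40, 99]     # hypothetical ids ranked by the sparse search
print(reciprocal_rank_fusion([dense_hits, sparse_hits]))  # [7, 40, 12]
```

Points that appear in both lists (7 and 40 here) rise above points that rank first in only one of them, which is the behavior you want when a query mixes a drug name with a vaguer symptom description.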

Personal health data is the ultimate frontier for RAG. You've now built a system that doesn't just store data—it *remembers* your history.

**What's next?**

Are you working on Quantified-Self projects? What’s your biggest struggle with messy PDFs? Let’s chat in the comments below! 👇

*If you enjoyed this tutorial, don't forget to check out **[WellAlly](https://www.wellally.tech/blog)** for more high-level architectural insights!*
