Source: dev.to

How to Chat with 10 Years of Your Own Medical Records: A Quantified-Self RAG Tutorial

A developer built a Quantified-Self RAG (Retrieval-Augmented Generation) system that ingests a decade of personal medical records from messy PDF scans using Unstructured.io, Sentence-Transformers, and Qdrant. The pipeline performs Hybrid Search (BM25 + Vector) to navigate complex medical terminology and inconsistent layouts, enabling users to query their health history conversationally. The system uses layout-aware partitioning and chunking to preserve table structures and headers that standard PDF parsers would lose.

3 min read · published Jun 7, 2026

Have you ever stared at a stack of yellowing medical reports and thought, "I wish I could just ask my computer when my cholesterol started creeping up?"

We live in the era of the Quantified-Self, yet our most critical data—medical records—often sits rotting in "dirty" PDF scans or messy outpatient summaries. Today, we are going to fix that. We're building a Quantified-Self RAG (Retrieval-Augmented Generation) system designed to ingest a decade of personal health history using Unstructured.io, Sentence-Transformers, and Qdrant.

By the end of this guide, you'll have a pipeline capable of performing Hybrid Search (BM25 + Vector) to navigate through complex medical terminology and messy layouts. Let's turn those pixels into actionable health insights!

Medical PDFs are a nightmare. They contain tables, handwritten signatures, and inconsistent headers. A simple PyPDF2 extract_text() call won't cut it. We need a Layout-Aware approach.

graph TD
    A[Messy PDF Scans] --> B[Unstructured.io Partitioning]
    B --> C[Layout-Aware Chunking]
    C --> D{Hybrid Encoding}
    D --> E[Dense Vector: Sentence-Transformers]
    D --> F[Sparse Vector: BM25/SPLADE]
    E --> G[Qdrant Vector Store]
    F --> G[Qdrant Vector Store]
    H[User Query] --> I[FastAPI Search Endpoint]
    I --> G
    G --> J[Contextual Answer]

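Before we dive into the code, ensure you have the following stack ready: Unstructured.io for PDF partitioning, Sentence-Transformers for dense embeddings, Qdrant as the vector store, and FastAPI for the query endpoint.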

Standard parsers lose the context of tables. Unstructured.io treats the document as a series of elements (Title, NarrativeText, Table, etc.).

from unstructured.partition.pdf import partition_pdf

def extract_medical_data(file_path):
    elements = partition_pdf(
        filename=file_path,
        strategy="hi_res", # Runs a layout-detection model to locate tables and headers
        infer_table_structure=True,
        chunking_strategy="by_title",
        max_characters=1000,
        new_after_n_chars=800,
    )

    chunks = []
    for element in elements:
        metadata = element.metadata.to_dict()
        chunks.append({
            "text": element.text,
            "type": element.category, # e.g., 'CompositeElement' or 'Table' once chunking is applied
            "page": metadata.get("page_number")
        })
    return chunks
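
With infer_table_structure=True, Unstructured also exposes an HTML rendering of each Table element in element.metadata.text_as_html. The sketch below is one possible way to turn that HTML into text where every value stays next to its column header before embedding. It uses only the standard library, assumes a simple table whose first row is the header, and the helper names (TableRows, linearize_table) and the sample markup are hypothetical illustrations, not part of the original pipeline.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the cell text of each <tr> in an HTML table."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row currently being parsed
        self._cell = None  # text fragments of the cell currently being parsed

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def linearize_table(table_html):
    """Turn each data row into 'Header: value' pairs so values keep their labels."""
    parser = TableRows()
    parser.feed(table_html)
    if not parser.rows:
        return []
    header, *body = parser.rows  # assumption: the first row is the header
    return [
        " | ".join(f"{h}: {v}" for h, v in zip(header, row))
        for row in body
    ]


# Made-up sample markup for illustration only, not real lab data
sample_html = (
    "<table>"
    "<tr><th>Test</th><th>Result</th><th>Unit</th></tr>"
    "<tr><td>LDL Cholesterol</td><td>162</td><td>mg/dL</td></tr>"
    "<tr><td>HbA1c</td><td>5.9</td><td>%</td></tr>"
    "</table>"
)
lines = linearize_table(sample_html)
```

Each string in lines could replace the raw text of a Table chunk, so the embedding sees "Result: 162" together with the test name rather than an isolated number.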

Medical queries often require exact keyword matches (e.g., "HbA1c") and semantic meaning (e.g., "blood sugar levels"). Qdrant's Hybrid Search combines the best of both worlds.

from qdrant_client import QdrantClient
from qdrant_client.http import models

client = QdrantClient(":memory:") # Or your cloud/docker instance

client.recreate_collection(
    collection_name="medical_records",
    vectors_config=models.VectorParams(
        size=384, # For 'all-MiniLM-L6-v2'
        distance=models.Distance.COSINE
    ),
    sparse_vectors_config={
        "text-sparse": models.SparseVectorParams(
            index=models.SparseIndexParams(
                on_disk=True,
            )
        )
    }
)

We’ll use Sentence-Transformers for the dense embeddings. For the sparse part, we can use a simple BM25-like approach or Qdrant’s built-in sparse capabilities.

from sentence_transformers import SentenceTransformer

model = SentenceTransformer('all-MiniLM-L6-v2')

def prepare_points(chunks):
    points = []
    for i, chunk in enumerate(chunks):
        vector = model.encode(chunk["text"]).tolist()
        points.append(
            models.PointStruct(
                id=i,
                vector=vector,
                payload=chunk
            )
        )
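    return points

def index_chunks(chunks):
    # Upload the points so the search endpoint has data to query
    client.upsert(
        collection_name="medical_records",
        points=prepare_points(chunks)
    )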
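
The code above only produces dense vectors, so here is a minimal sketch of what the "simple BM25-like approach" for the sparse side might look like. Qdrant sparse vectors are pairs of integer indices and float values, so the encoder returns exactly that shape. The class and function names (Bm25SparseEncoder, sparse_dot), the tokenizer, and the k1 and b defaults are assumptions for illustration, and the sample sentences are made up. A maintained sparse encoder is likely a better fit for real use.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercased alphanumeric tokens; keeps terms like "hba1c" intact
    return re.findall(r"[a-z0-9]+", text.lower())


class Bm25SparseEncoder:
    """Hypothetical helper: BM25 document weights as (indices, values) pairs."""

    def __init__(self, texts, k1=1.2, b=0.75):
        self.k1, self.b = k1, b
        docs = [tokenize(t) for t in texts]
        self.avg_len = (sum(len(d) for d in docs) / max(len(docs), 1)) or 1.0
        self.vocab = {}       # token -> integer index
        doc_freq = Counter()  # token -> number of documents containing it
        for doc in docs:
            for tok in dict.fromkeys(doc):  # unique tokens, stable order
                doc_freq[tok] += 1
                self.vocab.setdefault(tok, len(self.vocab))
        n = len(docs)
        self.idf = {
            tok: math.log(1 + (n - df + 0.5) / (df + 0.5))
            for tok, df in doc_freq.items()
        }

    def encode_document(self, text):
        tokens = tokenize(text)
        counts = Counter(t for t in tokens if t in self.vocab)
        # Length normalisation term from the BM25 formula
        norm = self.k1 * (1 - self.b + self.b * len(tokens) / self.avg_len)
        pairs = sorted(
            (self.vocab[t], self.idf[t] * tf * (self.k1 + 1) / (tf + norm))
            for t, tf in counts.items()
        )
        return [i for i, _ in pairs], [v for _, v in pairs]

    def encode_query(self, text):
        # Each known query term gets weight 1.0, so the dot product with a
        # document vector equals that document's BM25 score for the query.
        idx = sorted({self.vocab[t] for t in tokenize(text) if t in self.vocab})
        return idx, [1.0] * len(idx)


def sparse_dot(a, b):
    """Dot product of two (indices, values) sparse vectors."""
    weights = dict(zip(*b))
    return sum(v * weights.get(i, 0.0) for i, v in zip(*a))


# Made-up sample chunks for illustration only
corpus = ["HbA1c 6.1 percent", "Tylenol 500 mg as needed", "blood sugar elevated"]
encoder = Bm25SparseEncoder(corpus)
query = encoder.encode_query("Tylenol")
scores = [sparse_dot(query, encoder.encode_document(t)) for t in corpus]
```

Only the chunk that literally contains "Tylenol" gets a non-zero score, which is the exact-match behaviour the dense embedding alone does not guarantee.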

Building a medical RAG isn't just about indexing; it's about accuracy and privacy. If you are looking for production-ready patterns, such as Self-Querying Retrievers (filtering by year/doctor automatically) or Advanced Re-ranking for medical accuracy, I highly recommend exploring the resources at the WellAlly Blog. They have fantastic deep dives into scaling LLM applications for sensitive data.

Now, let's wrap this in a clean API to query our decade of data.

from fastapi import FastAPI

app = FastAPI()

@app.get("/query")
async def ask_health_history(q: str):
    query_vector = model.encode(q).tolist()

    # Dense-only retrieval; query_points supersedes the deprecated client.search
    search_result = client.query_points(
        collection_name="medical_records",
        query=query_vector,
        limit=3,
        with_payload=True
    ).points

    context = "\n".join([res.payload["text"] for res in search_result])

    return {
        "query": q,
        "context_found": context,
        "sources": [res.payload["page"] for res in search_result]
    }
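
The endpoint above queries only the dense vector. To get the hybrid behaviour, the dense ranking has to be merged with a keyword (sparse) ranking. One common way to do that is Reciprocal Rank Fusion (RRF), which Qdrant can also apply server-side through its query API. The sketch below shows the idea client-side in plain Python. The function name, the k=60 constant, and the example point IDs are assumptions for illustration, not the author's implementation.

```python
def reciprocal_rank_fusion(rankings, k=60, limit=3):
    """Merge several ranked ID lists; items ranked high in any list rise to the top."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank) for every ID it returns
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)[:limit]


# Hypothetical point IDs returned by a dense search and a sparse search
dense_hits = [7, 2, 9]
sparse_hits = [2, 4, 7]
fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
```

Points 2 and 7 appear in both lists, so they outrank points that only one retriever found. RRF uses ranks rather than raw scores, which sidesteps the fact that cosine similarities and BM25 weights live on different scales.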

By using layout-aware partitioning, we keep a value in a "Cholesterol" table row tied to its header instead of leaving it as a random number. Hybrid Search, once the sparse vectors are indexed and queried alongside the dense ones, lets a search for "high sugar" find "Hyperglycemia" (semantic) while a search for "Tylenol" finds exactly "Tylenol" (keyword).

Personal health data is the ultimate frontier for RAG. You've now built a system that doesn't just store data—it remembers your history.

What's next?

Are you working on Quantified-Self projects? What’s your biggest struggle with messy PDFs? Let’s chat in the comments below! 👇

If you enjoyed this tutorial, don't forget to check out WellAlly for more high-level architectural insights!
