{"slug": "how-to-chat-with-10-years-of-your-own-medical-records-a-quantified-self-rag", "title": "How to Chat with 10 Years of Your Own Medical Records: A Quantified-Self RAG Tutorial", "summary": "A developer built a Quantified-Self RAG (Retrieval-Augmented Generation) system that ingests a decade of personal medical records from messy PDF scans using Unstructured.io, Sentence-Transformers, and Qdrant. The pipeline performs Hybrid Search (BM25 + Vector) to navigate complex medical terminology and inconsistent layouts, enabling users to query their health history conversationally. The system uses layout-aware partitioning and chunking to preserve table structures and headers that standard PDF parsers would lose.", "body_md": "Have you ever stared at a stack of yellowing medical reports and thought, *\"I wish I could just ask my computer when my cholesterol started creeping up?\"*\n\nWe live in the era of the **Quantified-Self**, yet our most critical data, our medical records, often sits rotting in \"dirty\" PDF scans or messy outpatient summaries. Today, we are going to fix that. We're building a **Quantified-Self RAG (Retrieval-Augmented Generation)** system designed to ingest a decade of personal health history using **Unstructured.io**, **Sentence-Transformers**, and **Qdrant**.\n\nBy the end of this guide, you'll have a pipeline capable of performing **Hybrid Search (BM25 + Vector)** to navigate through complex medical terminology and messy layouts. Let's turn those pixels into actionable health insights!\n\nMedical PDFs are a nightmare. They contain tables, handwritten signatures, and inconsistent headers. A simple `PyPDF2.extract_text()` won't cut it. 
We need a **Layout-Aware** approach.\n\n``` mermaid\ngraph TD\n    A[Messy PDF Scans] --> B[Unstructured.io Partitioning]\n    B --> C[Layout-Aware Chunking]\n    C --> D{Hybrid Encoding}\n    D --> E[Dense Vector: Sentence-Transformers]\n    D --> F[Sparse Vector: BM25/SPLADE]\n    E --> G[Qdrant Vector Store]\n    F --> G[Qdrant Vector Store]\n    H[User Query] --> I[FastAPI Search Endpoint]\n    I --> G\n    G --> J[Contextual Answer]\n```\n\nBefore we dive into the code, ensure you have the following stack ready:\n\nStandard parsers lose the context of tables. **Unstructured.io** treats the document as a series of elements (Title, NarrativeText, Table, etc.).\n\n``` python\nfrom unstructured.partition.pdf import partition_pdf\n\ndef extract_medical_data(file_path):\n    # This uses layout detection to identify tables and headers\n    elements = partition_pdf(\n        filename=file_path,\n        strategy=\"hi_res\", # Runs a layout-detection model under the hood\n        infer_table_structure=True,\n        chunking_strategy=\"by_title\",\n        max_characters=1000,\n        new_after_n_chars=800,\n    )\n\n    chunks = []\n    for element in elements:\n        metadata = element.metadata.to_dict()\n        chunks.append({\n            \"text\": element.text,\n            \"type\": element.category, # e.g., 'Table' or 'CompositeElement' once chunking is applied\n            \"page\": metadata.get(\"page_number\")\n        })\n    return chunks\n\n# Example: Process a 2014 Blood Test Scan\n# data_chunks = extract_medical_data(\"report_2014.pdf\")\n```\n\nMedical queries often require both exact keyword matches (e.g., \"HbA1c\") and semantic matches (e.g., \"blood sugar levels\"). 
Qdrant's **Hybrid Search** covers both by storing a dense and a sparse vector for each chunk.\n\n``` python\nfrom qdrant_client import QdrantClient\nfrom qdrant_client.http import models\n\nclient = QdrantClient(\":memory:\") # Or your cloud/docker instance\n\n# Create a collection with both Dense and Sparse vectors\nclient.recreate_collection(\n    collection_name=\"medical_records\",\n    vectors_config=models.VectorParams(\n        size=384, # For 'all-MiniLM-L6-v2'\n        distance=models.Distance.COSINE\n    ),\n    sparse_vectors_config={\n        \"text-sparse\": models.SparseVectorParams(\n            index=models.SparseIndexParams(\n                on_disk=True,\n            )\n        )\n    }\n)\n```\n\nWe’ll use `Sentence-Transformers` for the dense embeddings. For the sparse part, we can use a simple BM25-like approach or Qdrant’s built-in sparse capabilities. Note that the snippet below only populates the dense vector; the `text-sparse` vector stays empty until a sparse encoder is added.\n\n``` python\nfrom sentence_transformers import SentenceTransformer\n\nmodel = SentenceTransformer('all-MiniLM-L6-v2')\n\ndef prepare_points(chunks):\n    points = []\n    for i, chunk in enumerate(chunks):\n        vector = model.encode(chunk[\"text\"]).tolist()\n        points.append(\n            models.PointStruct(\n                id=i,\n                vector=vector,\n                payload=chunk\n            )\n        )\n    return points\n\n# client.upsert(collection_name=\"medical_records\", points=prepare_points(data_chunks))\n```\n\nBuilding a medical RAG isn't just about indexing; it's about accuracy and privacy. If you are looking for production-ready patterns, such as **Self-Querying Retrievers** (filtering by year/doctor automatically) or **Advanced Re-ranking** for medical accuracy, I highly recommend exploring the resources at **WellAlly Blog**. 
They have fantastic deep dives into scaling LLM applications for sensitive data.\n\nNow, let's wrap this in a clean API to query our decade of data.\n\n``` python\nfrom fastapi import FastAPI\n\napp = FastAPI()\n\n# Reuses `model` and `client` from the snippets above\n@app.get(\"/query\")\nasync def ask_health_history(q: str):\n    # 1. Embed the query\n    query_vector = model.encode(q).tolist()\n\n    # 2. Dense vector search in Qdrant (the sparse vector is not queried here)\n    search_result = client.search(\n        collection_name=\"medical_records\",\n        query_vector=query_vector,\n        limit=3,\n        with_payload=True\n    )\n\n    # 3. Format the context for the LLM\n    context = \"\\n\".join([res.payload[\"text\"] for res in search_result])\n\n    return {\n        \"query\": q,\n        \"context_found\": context,\n        \"sources\": [res.payload[\"page\"] for res in search_result]\n    }\n\n# Run with: uvicorn main:app --reload\n```\n\nBy using **Layout-aware OCR**, we ensure that a value in a \"Cholesterol\" table row isn't just a random number; it's tied to its header. By using **Hybrid Search**, we ensure that searching for \"high sugar\" finds \"Hyperglycemia\" (Semantic) while searching for \"Tylenol\" finds exactly \"Tylenol\" (Keyword).\n\nPersonal health data is the ultimate frontier for RAG. You've now built a system that doesn't just store data; it *remembers* your history.\n\n**What's next?**\n\nAre you working on Quantified-Self projects? What’s your biggest struggle with messy PDFs? Let’s chat in the comments below! 
👇\n\n*If you enjoyed this tutorial, don't forget to check out **[WellAlly](https://www.wellally.tech/blog)** for more high-level architectural insights!*", "url": "https://wpnews.pro/news/how-to-chat-with-10-years-of-your-own-medical-records-a-quantified-self-rag", "canonical_source": "https://dev.to/beck_moulton/how-to-chat-with-10-years-of-your-own-medical-records-a-quantified-self-rag-tutorial-1k60", "published_at": "2026-06-07 00:29:00+00:00", "updated_at": "2026-06-07 01:11:59.707308+00:00", "lang": "en", "topics": ["artificial-intelligence", "machine-learning", "large-language-models", "natural-language-processing", "ai-tools"], "entities": ["Unstructured.io", "Sentence-Transformers", "Qdrant", "PyPDF2", "FastAPI", "BM25", "SPLADE"], "alternates": {"html": "https://wpnews.pro/news/how-to-chat-with-10-years-of-your-own-medical-records-a-quantified-self-rag", "markdown": "https://wpnews.pro/news/how-to-chat-with-10-years-of-your-own-medical-records-a-quantified-self-rag.md", "text": "https://wpnews.pro/news/how-to-chat-with-10-years-of-your-own-medical-records-a-quantified-self-rag.txt", "jsonld": "https://wpnews.pro/news/how-to-chat-with-10-years-of-your-own-medical-records-a-quantified-self-rag.jsonld"}}