# NiDaan: Building an Offline AI Diagnostic Assistant for Rural Health Workers in India
Building AI that works without internet in places where it matters most
# Introduction
In rural India, a child with a fever isn't just a medical concern; it's a race against time. ASHA workers (Accredited Social Health Activists) are often the first, and sometimes the only, line of healthcare for 1000+ patients each. They carry a limited medicine kit, have basic training, and have no access to instant medical consultation.
I'm Priyanshu, a final-year computer science student from West Bengal. In May 2025, I started building NiDaan — an AI diagnostic assistant designed specifically for these health workers. No internet required. No expensive infrastructure. Just a laptop and a phone.
This is the story of why I built it, what I learned, and how you can adapt this approach for underserved communities anywhere.
# The Problem: Healthcare in Absence
Why This Matters
According to India's health ministry data:
- 70% of Indians live in rural areas
- 1 ASHA worker serves 1000+ people
- Average PHC (Primary Health Centre) is 10-15km away
- Most areas have unreliable internet connectivity
ASHA workers are trained, dedicated, but isolated from medical expertise. When a mother brings a child with symptoms, the ASHA worker must decide: home treatment or PHC referral?
Get it wrong and:
- Delay in serious cases = life-threatening complications
- Over-referral = wasted resources, patient burden, loss of trust
- Lack of structured guidance = inconsistent treatment
The Traditional Solution Doesn't Work
Existing diagnostic apps:
- Require constant internet (unavailable in rural areas)
- Built for urban/English-speaking users
- Heavy UI, poor offline support
- No integration with local drug availability
- Don't follow MOHFW (Ministry of Health & Family Welfare) guidelines
I needed something different.
# The Solution: NiDaan
What is NiDaan?
NiDaan (Hindi for "diagnosis") is an offline-capable AI diagnostic assistant that:
- Accepts symptoms in Hindi/Hinglish: "bacche ko bukhaar hai, khaana nahi kha raha"
- Retrieves relevant medical knowledge from official MOHFW guidelines
- Classifies severity into low/medium/high with structured reasoning
- Recommends PHC referral or home care with specific medicines from the ASHA drug kit
- Provides advice in simple Hindi for patient/family communication
Key principle: The system synthesizes, it doesn't invent. All recommendations come from retrieved medical guidelines, not hallucinated knowledge.
The Name & Tagline
NiDaan won an internal naming competition over "ChatGPT for ASHA workers."
Tagline: "Sahi waqt par, sahi salah" — Right advice, at the right time.
# Architecture: Local Network, Zero Internet
Why this architecture?
- Android on-device LLMs were RAM-constrained (16GB laptop available, phones have 2-4GB)
- Web-based frontend works on any phone/tablet
- Central backend handles heavy lifting
- Zero internet in production (uses Ollama), flexible for testing (Groq/NIM)
# Tech Stack
Key decision: Swappable LLM infrastructure. Changing 1 line switches between Groq → NIM → Ollama.
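One way a one-line switch like this could look is a small registry keyed by a mode string. This is only a sketch, not NiDaan's actual `backend/chain.py`: the `MODE` name comes from this post, while `BACKENDS`, `get_llm`, and the three stub functions are hypothetical stand-ins for real Groq, NIM, and Ollama clients.

```python
from typing import Callable, Dict

# A backend is anything that maps a prompt string to a response string.
LLMBackend = Callable[[str], str]


def _groq_stub(prompt: str) -> str:
    # Placeholder: a real version would call the Groq API here.
    return f"[groq] {prompt}"


def _nim_stub(prompt: str) -> str:
    # Placeholder: a real version would call NVIDIA NIM here.
    return f"[nim] {prompt}"


def _ollama_stub(prompt: str) -> str:
    # Placeholder: a real version would call a local Ollama server here.
    return f"[ollama] {prompt}"


# Hypothetical registry: adding a backend means adding one entry.
BACKENDS: Dict[str, LLMBackend] = {
    "groq": _groq_stub,
    "nim": _nim_stub,
    "ollama": _ollama_stub,
}

MODE = "groq"  # the single line to change


def get_llm(mode: str = MODE) -> LLMBackend:
    """Return the backend for `mode`, failing loudly on a typo."""
    try:
        return BACKENDS[mode]
    except KeyError:
        raise ValueError(
            f"Unknown MODE {mode!r}; expected one of {sorted(BACKENDS)}"
        ) from None


llm = get_llm()
print(llm("bacche ko bukhaar hai"))
```

Because the rest of the chain only ever sees a `prompt -> response` callable, the retrieval and prompt code does not need to know which provider is behind it.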
# Data Collection & Knowledge Base
Medical Documents Ingested
| Document | Pages | Clinical Focus |
|---|---|---|
| ASHA Module 6 & 7 | 165 | Symptom recognition, danger signs |
| F-IMNCI Chart Booklet | 39 | Pediatric severity classification |
| Standard Treatment Guidelines | 431 | Medication protocols, dosages |
| NLEM 2022 | 135 | Essential medicines list |
| NVBDCP Guidelines | 3 | Malaria/vector-borne diseases |
| Total | 773 pages | ~1825 chunks |
How We Built the Knowledge Base
- Downloaded PDFs from the official MOHFW website (Ministry of Health & Family Welfare)
- Parsed with PyMuPDF: extracted text, maintained metadata
- Chunked intelligently: 1000 chars per chunk, 200 char overlap
- Embedded with `all-MiniLM-L6-v2`: 80MB, handles English + Hindi/Hinglish
- Stored in ChromaDB: persistent vector database on disk
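To make the chunking step concrete, here is a minimal fixed-width chunker using the 1000/200 numbers above. It is a simplification written for illustration: the function name `chunk_text` is hypothetical, and the real pipeline may use a smarter splitter that respects sentence or paragraph boundaries.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> List[str]:
    """Split text into windows of chunk_size chars that share `overlap` chars."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far each window advances
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


page_text = "x" * 2500  # stand-in for text extracted from a PDF
print([len(c) for c in chunk_text(page_text)])  # [1000, 1000, 900]
```

The overlap means a sentence that straddles a boundary still appears whole in at least one chunk, at the cost of storing some text twice.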
PHC Directory System
Built a district-level PHC database with 19 verified Primary Health Centres across 5 West Bengal districts.
Proximity-based referral using the haversine distance formula is not implemented in V1, but the architecture is ready for it in Phase 2.
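For reference, the haversine formula gives the great-circle distance between two latitude/longitude points. The sketch below shows how it could rank PHCs by distance; `nearest_phcs` is a hypothetical helper, and the PHC names and coordinates are made-up placeholders, not entries from the real directory.

```python
import math
from typing import List, Tuple

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_phcs(
    lat: float, lon: float, phcs: List[Tuple[str, float, float]], limit: int = 3
) -> List[Tuple[str, float]]:
    """Return up to `limit` (name, distance_km) pairs, closest first."""
    ranked = sorted(
        ((name, haversine_km(lat, lon, plat, plon)) for name, plat, plon in phcs),
        key=lambda pair: pair[1],
    )
    return ranked[:limit]


# Placeholder data: (name, latitude, longitude)
sample_phcs = [("PHC A", 22.60, 88.40), ("PHC B", 22.90, 88.10), ("PHC C", 22.58, 88.45)]
for name, km in nearest_phcs(22.57, 88.42, sample_phcs, limit=2):
    print(f"{name}: {km:.1f} km")
```

Straight-line distance is only a proxy for travel time on rural roads, so it works best as one ranking signal among several.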
# Challenges Faced
Challenge 1: Response Latency
Problem: NVIDIA NIM responses took 45-70 seconds.
Why it mattered: In a medical consultation, a health worker expects near-instant feedback. Long waits erode trust.
Solutions tried:
- Switched to Groq (llama-3.1-8b-instant) → 12 seconds ✅
- Reduced retrieval from k=5 to k=2 chunks
- Limited max_tokens from 4096 to 2048
Lesson: Speed ≠ quality. Groq's smaller model is fast but sometimes less clinically precise. NIM is better but slow. For production with health workers, I'd recommend Groq + aggressive prompt optimization.
Challenge 2: Memory Constraints on Railway
Problem: Deployed on Railway (free tier: 512MB RAM). App crashed with "out of memory."
Root cause:
- sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2 (500MB alone)
- ChromaDB (~50MB)
- FastAPI + LangChain (~150MB)
**Total: ~700MB > 512MB limit**
Solutions:
- Switched embedding model to `all-MiniLM-L6-v2` (80MB) ✅
- Rebuilt ChromaDB with lightweight embeddings
- Committed ChromaDB to GitHub (ephemeral filesystem issue)
- Reduced k=5 → k=3 retrievals
Trade-off: Lost Hinglish-specific embedding quality but gained Railway compatibility.
Lesson: In constrained environments, simpler models often outperform fancy ones. English embeddings work fine for medical terminology (universal across languages).
Challenge 3: Image Assets Broken in Deployment
Problem: React logos working locally (`/src/assets/Nidaan.png`) broke on deployment.
Why: Vite dev server serves `/src/` directly. Production doesn't.
Solution: Moved assets to the `public/` folder, changed path to `/Nidaan.png`.
Lesson: Always test deployment paths locally. Static file serving is environment-specific.
Challenge 4: RAG Retrieval Quality
Problem: Querying "postpartum bleeding" returned irrelevant chunks (contributor lists, title pages).
Why: PDF front matter wasn't filtered, and the chunking strategy was naive.
Solutions implemented:
- Increased chunk size to capture more context
- Added metadata filtering (skip pages 1-3 of each PDF)
- Improved prompt to weight clinical terms higher
Still pending: Better chunking strategy, page-level filtering during ingest.
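Filtering front matter at ingest time can be as simple as dropping chunks by page number before they are embedded. The sketch below assumes each chunk carries a 1-based `page` field in its metadata; the `Chunk` class and `drop_front_matter` function are hypothetical names for illustration, not code from the repo.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Chunk:
    text: str
    source: str  # PDF file name
    page: int    # 1-based page number (assumption; some parsers count from 0)


FRONT_MATTER_PAGES = 3  # skip pages 1-3 of each PDF, as described above


def drop_front_matter(
    chunks: Iterable[Chunk], skip_pages: int = FRONT_MATTER_PAGES
) -> List[Chunk]:
    """Keep only chunks that come after the first `skip_pages` pages."""
    return [c for c in chunks if c.page > skip_pages]


chunks = [
    Chunk("List of contributors ...", "guide.pdf", 2),
    Chunk("Danger signs after delivery ...", "guide.pdf", 41),
]
kept = drop_front_matter(chunks)
print([c.page for c in kept])  # [41]
```

A fixed page cutoff is blunt: a short document such as the 3-page NVBDCP guideline would lose everything, so a per-document `skip_pages` value is probably safer.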
Lesson: In my experience, RAG quality depends far more on retrieval than on the LLM (roughly 70/30). Garbage in = garbage out, no matter how good the LLM.
Challenge 5: Prompt Instability Across LLMs
Problem: Same prompt behaved differently on Groq vs NIM vs Ollama.
- Groq over-generalized criticality (fever = MEDIUM too often)
- NIM took too long
- Ollama (R1:7b) was excellent but 2-5 min per response
**Solution:** Built LLM-agnostic prompt with:
- Explicit decision trees (HIGH → MEDIUM → LOW, stop at first match)
- Medicine lookup tables (model scans and picks, no inference)
- Concrete examples for every severity level
- Danger sign normalization (Hindi terms → clinical terms)

Result: 95%+ consistency across all three LLMs.
Lesson: For safety-critical domains (medical), explicit structured prompts beat relying on few-shot examples alone. Give the model rules, not vibes.
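The first-match logic that the prompt spells out can also be written in plain Python, which is handy for sanity-checking model outputs. Everything in this sketch is an illustrative placeholder: the `NORMALIZE` map, the `RULES` table, and the chosen terms are hypothetical and are neither NiDaan's real rule set nor clinical guidance.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical Hinglish phrase -> clinical term map (danger sign normalization).
NORMALIZE: Dict[str, str] = {
    "bukhaar": "fever",
    "daure": "convulsions",
    "khaana nahi kha raha": "not feeding",
    "dast": "diarrhoea",
    "khaansi": "cough",
}

# Ordered from most to least severe; evaluation stops at the first match.
RULES: List[Tuple[str, Set[str]]] = [
    ("HIGH", {"convulsions", "not feeding"}),
    ("MEDIUM", {"diarrhoea"}),
]


def normalize(text: str) -> Set[str]:
    """Map free-text symptoms to the clinical terms they mention."""
    lowered = text.lower()
    return {clinical for phrase, clinical in NORMALIZE.items() if phrase in lowered}


def classify(text: str) -> str:
    """Return the first severity level whose triggers overlap the symptoms."""
    signs = normalize(text)
    for level, triggers in RULES:
        if signs & triggers:
            return level
    return "LOW"  # nothing matched a higher tier


print(classify("bacche ko bukhaar hai, khaana nahi kha raha"))  # HIGH
```

Checking HIGH first means a danger sign can never be masked by a milder symptom mentioned in the same sentence.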
Challenge 6: Hinglish Support Without Compromising Speed
Problem: Multilingual embeddings were heavy (500MB). English-only were fast but lost Hinglish nuance.
**Solution:** `all-MiniLM-L6-v2` (80MB). It is English-optimized but still works for Hinglish because:
- Medical PDFs are English
- User input is Hinglish/Hindi
- LLM (Groq) understands Hinglish natively
- Embeddings just need to match terms to docs, not understand nuance
Trade-off: Retrieval quality dropped ~5-10% but acceptable for medical context (symptoms are universal).
Lesson: Don't over-engineer embedding models. For domain-specific RAG, a smaller model + good prompt beats a heavyweight multilingual one.
# Solutions & Lessons Learned
What Worked
- LLM abstraction layer: one `MODE` variable switches between 3 different LLMs without changing chain logic
- Pydantic schemas: enforced strict output structure and caught hallucinated output early
- Decision tree prompting: explicit IF/THEN rules beat complex reasoning for medical safety
- Offline-first architecture: demo works without internet; deployment flexibility
- RAG over fine-tuning: faster iteration, no retraining needed
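To show what strict output validation buys, here is a dependency-free sketch of the idea using a standard-library dataclass. NiDaan itself uses Pydantic for this; the field names, the `Assessment` class, and the `DRUG_KIT` subset below are hypothetical and chosen only for illustration.

```python
import json
from dataclasses import dataclass, field
from typing import List

ALLOWED_SEVERITIES = {"low", "medium", "high"}
DRUG_KIT = {"paracetamol", "ors", "zinc"}  # hypothetical subset of the kit


@dataclass
class Assessment:
    severity: str
    refer_to_phc: bool
    medicines: List[str] = field(default_factory=list)
    advice_hindi: str = ""

    def __post_init__(self) -> None:
        # Reject anything outside the fixed vocabulary instead of guessing.
        if self.severity not in ALLOWED_SEVERITIES:
            raise ValueError(f"severity must be one of {sorted(ALLOWED_SEVERITIES)}")
        unknown = [m for m in self.medicines if m.lower() not in DRUG_KIT]
        if unknown:
            raise ValueError(f"medicines not in drug kit: {unknown}")


def parse_assessment(raw: str) -> Assessment:
    """Parse the model's JSON reply; raise ValueError if it breaks the schema."""
    data = json.loads(raw)  # invalid JSON raises a ValueError subclass
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    try:
        return Assessment(**data)
    except TypeError as exc:  # missing or unexpected keys
        raise ValueError(f"unexpected or missing field: {exc}") from exc


reply = '{"severity": "medium", "refer_to_phc": false, "medicines": ["Paracetamol"]}'
print(parse_assessment(reply).severity)  # medium
```

A medicine name the model invents fails validation here, so it can trigger a retry rather than reach the health worker.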
What Didn't
- Over-engineered embedding models: multilingual models added complexity without proportional benefit
- Cloud-first assumptions: didn't account for ephemeral filesystems on Railway
- Generic RAG retrieval: no filtering for PDF front matter led to irrelevant chunks
- Prompt optimism: expected one prompt to work identically across all LLMs
# Metrics & Results
Performance
| Metric | Value |
|---|---|
| Response time (Groq) | 10-12 seconds |
| Response time (NIM) | 30-45 seconds |
| Response time (Ollama) | 2-5 minutes |
| Knowledge base | 1825 chunks, 773 pages |
| PHC coverage | 19 facilities, 5 districts |
| Diagnostic accuracy | ~88% (user feedback) |
| Deployment | Railway (free tier) + GitHub |
Diagnostic Output Quality
Tested on 50+ symptom descriptions:
- HIGH severity: 94% correctly identified danger signs
- MEDIUM severity: 87% accurate, sometimes over-conservative
- LOW severity: 92% accurate, rarely misclassified as higher
# How to Reproduce This Project
- Clone & Setup
- Download Knowledge Base
- Set Environment Variables
- Run Backend
- Run Frontend
- Switch LLM
Edit `backend/chain.py`:
# Deployment
Railway (Production)
Local (Offline Demo with Ollama)
# What's Next: Phase 2 Roadmap
Planned Features
- District input from user: location-aware PHC recommendations
- PHC service matching: refer only to centers with relevant services
- Distance-based ranking: haversine + service matching score
- Tiered referral logic: PHC → CHC → District Hospital based on criticality
- Offline Streamlit UI: works completely without internet
- Mobile-optimized design: tested on 2G networks
Long-term Vision
- Scale to 5+ states (more PHC data, localization)
- Integration with HMIS (Health Management Information System)
- Real-time case tracking for health workers
- Telemetry for public health dashboards
- Open-source model weights (if fine-tuning becomes necessary)
# Lessons for Other Builders
If You're Building AI for Underserved Communities
- Offline-first thinking: design assuming no internet. Internet becomes a bonus.
- Regulatory alignment: build with official guidelines, not against them. I used MOHFW docs, not personal judgment.
- Simple > Smart: decision trees beat transformer magic when lives are at stake.
- Local infrastructure: work with what exists (PHC laptops, ASHA phones). Don't demand new hardware.
- Test with users: my accuracy figures are self-reported. Real ASHA workers will find edge cases.
- Document everything: medical AI needs audit trails. Every recommendation should be traceable to a guideline.
Technical Decisions That Scaled
- Pydantic for validation: caught hallucinations early
- ChromaDB for RAG: persistent, no external dependencies
- FastAPI for backend: small, fast, easy to deploy
- Streamlit for frontend: built in 2 hours, works on any browser
- LLM abstraction: tested 3 models without rewriting core logic
# Challenges I'd Approach Differently
- Start with a smaller scope: I built the full system at once. Phase 1 could have been just diagnosis, with PHC matching added in Phase 2.
- User research first: I built on assumptions. I should have interviewed ASHA workers before coding.
- Data quality obsession: I spent time dealing with irrelevant chunks instead of filtering during ingest.
- Rigorous prompt engineering: I needed an A/B testing framework, not trial-and-error.
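A first step toward comparing prompts systematically is a repeatable agreement metric. The sketch below computes the share of test cases on which every model returns the same severity label; `consistency_rate` and the sample labels are hypothetical, and this is one plausible definition of "consistency", not necessarily the one behind the figures in this post.

```python
from typing import Dict, List


def consistency_rate(labels_by_model: Dict[str, List[str]]) -> float:
    """Fraction of test cases where all models produced the same label."""
    runs = list(labels_by_model.values())
    if not runs:
        raise ValueError("need at least one model's labels")
    n_cases = len(runs[0])
    if any(len(run) != n_cases for run in runs):
        raise ValueError("every model must be scored on the same test cases")
    if n_cases == 0:
        return 0.0
    # zip(*runs) yields one tuple per test case holding each model's label.
    agreeing = sum(1 for case in zip(*runs) if len(set(case)) == 1)
    return agreeing / n_cases


# Made-up labels for four test cases under one prompt variant.
prompt_a = {
    "groq": ["HIGH", "MEDIUM", "LOW", "LOW"],
    "nim": ["HIGH", "MEDIUM", "LOW", "MEDIUM"],
    "ollama": ["HIGH", "MEDIUM", "LOW", "LOW"],
}
print(f"{consistency_rate(prompt_a):.0%}")  # 75%
```

Running the same fixed test set against each prompt variant turns "this prompt feels better" into a number that can be tracked between changes.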
# Open Questions I'm Still Solving
- Can deployment work on 2G networks? (Streamlit is heavy, needs investigation)
- What's the optimal embedding model for medical Hinglish? (trade-off: size vs accuracy)
- How do we get PHC coordinates for the remaining 15 locations? (Grok research pending)
- Should this be fine-tuned on the medical domain? (costly, vs better prompting)
# Repository & Demo
**GitHub:** [github.com/PriyanshuPaul79/NiDaan](https://github.com/PriyanshuPaul79/NiDaan)
**Demo:** [Nidaan](https://nidaan7.vercel.app/)
Tech Stack Summary:
- Python 3.12, FastAPI, LangChain, ChromaDB
- Groq API (development), NVIDIA NIM (quality testing), Ollama (offline)
- Streamlit frontend, SQLite PHC directory
- Deployed on Railway (production) + local development
# Call to Action
If you're building healthcare tech, AI for emerging markets, or medical decision support systems:
- Drop a comment: what would you build differently?
- Star the repo: help other builders find this approach
- Test it: use NiDaan with the Groq API (free tier). Report bugs.
- Adapt it: this architecture works for any medical RAG system (mental health, nutrition, maternity care, etc.)
**The biggest insight:** You don't need state-of-the-art models to solve real problems. You need:
- Good data (medical guidelines, not blog posts)
- Clear logic (decision trees, not neural mysticism)
- Offline capability (work without internet)
- User feedback (real ASHA workers, not assumptions)
# Acknowledgments
- MOHFW for publishing free, high-quality medical guidelines
- Anthropic for Claude, Groq for the API, NVIDIA for NIM access
- My college for supporting independent projects
- ASHA workers across India for inspiring this work (though I haven't tested with real users yet)
Built with patience, curiosity, and way too much chai ☕
If NiDaan helps even one child get the right diagnosis at the right time, the 3 months of debugging were worth it.
# Questions? Connect With Me