Source: letsdatascience.com · Topic: large-language-models

LLMs Extract Drug Discontinuations From Estonian EHRs

Researchers combined prescription records with free-text anamneses from a 10% sample of the Estonian population (2012-2019) and applied Llama-3.1-70B and GPT-4o to extract drug discontinuation events and reasons for statins and antidiabetic medications, demonstrating LLM utility for pharmacoepidemiology in a low-resource language.

3 min read · Published Jun 17, 2026

Per a JMIR preprint by Suvalov et al., researchers combined prescription records with free-text anamneses from a 10% sample of the Estonian population (2012-2019) to identify drug discontinuation events and their reasons. The study applied Llama-3.1-70B and GPT-4o to extract discontinuation phrases, map them to a clinician-developed taxonomy, and label whether the patient or the clinician initiated the discontinuation; performance was evaluated on 100 randomly selected cases per drug group (statins and antidiabetic medications). The work is a practical application of LLMs to pharmacoepidemiology in a low-resource language, highlighting both potential gains for large-scale adherence research and the need for careful validation on clinical free text.

What happened

Per a JMIR preprint by Suvalov et al., the authors merged prescription data with free-text clinical anamneses from a 10% sample of the Estonian population covering 2012-2019. The study targeted discontinuations for statins and antidiabetic medications and applied two large language models, Llama-3.1-70B and GPT-4o, to:

  • extract discontinuation phrases
  • classify reasons using a clinician-developed taxonomy
  • identify whether the patient or clinician initiated the discontinuation

Performance was measured on 100 randomly chosen cases per drug group, as reported in the preprint.
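
The three extraction tasks listed above can be pictured as one structured record per clinical note. The following is a minimal sketch, not the authors' pipeline: the prompt wording, the `Discontinuation` record, and the reason and initiator labels are all hypothetical placeholders, since the preprint's clinician-developed taxonomy is not reproduced in this article. No model is called here; the sketch only shows how a prompt might be assembled and how a JSON reply might be validated.

```python
import json
from dataclasses import dataclass
from typing import Optional

# Hypothetical labels for illustration only; the preprint's actual
# clinician-developed taxonomy is not given in this article.
REASONS = {"side_effect", "inefficacy", "access_barrier", "other", "unknown"}
INITIATORS = {"patient", "clinician", "unknown"}


@dataclass
class Discontinuation:
    phrase: Optional[str]  # text span that signals the discontinuation, if any
    reason: str            # one label from REASONS
    initiator: str         # one label from INITIATORS


def build_prompt(note: str, drug_group: str) -> str:
    """Assemble an instruction asking for a JSON answer (wording is an assumption)."""
    return (
        f"Read the clinical note and decide whether a {drug_group} was discontinued.\n"
        "Answer with JSON using the keys phrase, reason, initiator.\n"
        f"Allowed reasons: {sorted(REASONS)}\n"
        f"Allowed initiators: {sorted(INITIATORS)}\n"
        f"Note: {note}"
    )


def parse_response(raw: str) -> Discontinuation:
    """Validate a model reply, falling back to 'unknown' for anything off-schema."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        data = {}
    if not isinstance(data, dict):
        data = {}
    phrase = data.get("phrase")
    reason = data.get("reason")
    initiator = data.get("initiator")
    return Discontinuation(
        phrase=phrase if isinstance(phrase, str) and phrase else None,
        reason=reason if isinstance(reason, str) and reason in REASONS else "unknown",
        initiator=(
            initiator
            if isinstance(initiator, str) and initiator in INITIATORS
            else "unknown"
        ),
    )
```

Mapping off-schema replies to "unknown" rather than raising keeps a batch run going and makes malformed outputs countable afterwards.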

Technical details

The preprint describes using Llama-3.1-70B and GPT-4o for information extraction and classification from Estonian-language clinical notes. The authors developed a taxonomy of discontinuation reasons with clinician input and applied the models to link free-text evidence to structured prescription records. The manuscript presents validation on a sample of 100 randomly selected cases per drug group; exact performance metrics are reported in the preprint.
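
How the free-text evidence is tied to prescription records is not detailed in this article. One plausible approach, sketched below purely as an assumption, is to take each patient's last prescription in a drug group and collect the notes written within a fixed window after it. The record types, the function name, and the 180-day default are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Dict, List


@dataclass
class Prescription:
    patient_id: str
    drug_group: str
    issued: date


@dataclass
class Note:
    patient_id: str
    written: date
    text: str


def notes_after_last_prescription(
    prescriptions: List[Prescription],
    notes: List[Note],
    drug_group: str,
    window_days: int = 180,  # arbitrary illustrative window, not from the preprint
) -> Dict[str, List[Note]]:
    """Group notes written within window_days after each patient's last prescription."""
    # Find the most recent prescription date per patient for this drug group.
    last_issued: Dict[str, date] = {}
    for p in prescriptions:
        if p.drug_group != drug_group:
            continue
        if p.patient_id not in last_issued or p.issued > last_issued[p.patient_id]:
            last_issued[p.patient_id] = p.issued

    # Keep only notes that fall inside the window following that date.
    linked: Dict[str, List[Note]] = {}
    for n in notes:
        start = last_issued.get(n.patient_id)
        if start is None:
            continue
        if start <= n.written <= start + timedelta(days=window_days):
            linked.setdefault(n.patient_id, []).append(n)
    return linked
```

The notes selected this way would be the candidates passed to the language model; patients with no note in the window simply have no free-text evidence to classify.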

Context and significance

Applying LLMs to extract clinically relevant events from free text addresses a long-standing barrier in pharmacoepidemiology: important discontinuation rationale is frequently recorded only in narrative notes. Systems that successfully pair prescriptions with extracted reasons can enable higher-fidelity signal detection for side effects, inefficacy, or access barriers. A concurrent Harvard / Brigham and Women's Hospital preprint (arXiv 2506.11137) addresses the same problem on English EHR datasets, demonstrating that LLM-based medication status extraction scales without human annotation, which reinforces the broader applicability of this approach.

What to watch

Watch for the peer-reviewed JMIR publication with full performance metrics and error analysis, for replication in other languages or EHR systems, and for whether the authors publish the taxonomy, annotation guidelines, or evaluation code to enable reproducibility. External replication and transparent error breakdowns (false positives versus false negatives, initiator misclassification) will determine practical utility for downstream clinical research.
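
An error breakdown of the kind described here could be tabulated as follows. This is an illustrative sketch, not the authors' evaluation code: the tuple layout and the function name are assumptions, and initiator mismatches are counted only among correctly detected discontinuations.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Tuple

# Each case: (gold_event, predicted_event, gold_initiator, predicted_initiator)
Case = Tuple[bool, bool, Optional[str], Optional[str]]


def error_breakdown(cases: Iterable[Case]) -> Dict[str, float]:
    """Count detection errors and initiator mismatches, then derive precision and recall."""
    counts = Counter()
    for gold_event, pred_event, gold_init, pred_init in cases:
        if gold_event and pred_event:
            counts["tp"] += 1
            # Initiator labels are only comparable when the event itself was found.
            if gold_init != pred_init:
                counts["initiator_mismatch"] += 1
        elif pred_event:
            counts["fp"] += 1  # model reported a discontinuation that did not occur
        elif gold_event:
            counts["fn"] += 1  # model missed a real discontinuation
        else:
            counts["tn"] += 1

    tp, fp, fn = counts["tp"], counts["fp"], counts["fn"]
    return {
        "tp": tp,
        "fp": fp,
        "fn": fn,
        "tn": counts["tn"],
        "initiator_mismatch": counts["initiator_mismatch"],
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }
```

Reporting the raw counts alongside precision and recall lets readers see whether errors lean toward spurious detections or missed events.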

Scoring Rationale

A solid niche preprint demonstrating LLM application to pharmacoepidemiology in a low-resource (Estonian) language, using population-scale prescription and free-text EHR data. Relevant to clinical NLP and pharmacoepidemiology practitioners but limited by single-country scope, small evaluation set (100 cases per drug group), and preprint status pending peer review.
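
To see why 100 cases per drug group is a limiting factor, consider the uncertainty around any proportion measured on that many cases. The sketch below computes a Wilson score interval; the count of 85 correct cases in the usage note is a hypothetical figure chosen for illustration, not a result from the preprint.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 gives about 95% coverage)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half_width, centre + half_width


# Hypothetical example: 85 of 100 cases labelled correctly.
low, high = wilson_interval(85, 100)
```

With these hypothetical numbers the interval runs from roughly 0.77 to 0.91, a spread of about 14 percentage points, which is why differences between models or drug groups measured on 100 cases should be read cautiously.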
