# Researchers detect hidden self-harm histories with ML

> Source: <https://letsdatascience.com/news/researchers-detect-hidden-self-harm-histories-with-ml-04eced15>
> Published: 2026-06-06 04:50:22.671853+00:00

The University of New Mexico School of Medicine led a study analyzing electronic health records for more than **1.3 million** patients served by the Veterans Health Administration (VHA), according to a UNM press release. The researchers reported that diagnosis codes identified a self-harm history in **1.85%** of patients, while their machine learning method estimated documented self-harm in **7.9%** of patients, more than four times as many, per the study published in the Journal of Medical Internet Research (reported by News-Medical and UNM). The team combined a novel machine learning approach with expert chart review and statistical calibration, the UNM release states. The study also found that among veterans with a diagnosis code for self-harm, only **22.6%** had self-harm or a history of self-harm listed on the VHA problem list. "For research and planning, if we only count what is easy to see in diagnosis codes, we may substantially underestimate the need for mental health services," said Christophe Lambert, PhD, per the UNM release.

### What happened

The University of New Mexico School of Medicine led a study that analyzed electronic health records for more than **1.3 million** patients seen in the Veterans Health Administration, according to a UNM press release and reporting in News-Medical. The researchers reported that standard diagnosis codes identified self-harm history in **1.85%** of patients, while their calibrated machine learning method estimated documented self-harm in **7.9%** of patients, a gap of more than fourfold, per the study published in the Journal of Medical Internet Research. The study also reported that among veterans with a diagnosis code for self-harm, **22.6%** had self-harm or a history of self-harm listed on the VHA problem list, the UNM release states.
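The "more than fourfold" figure follows directly from the two reported percentages. A minimal sketch of that arithmetic, using only the rates quoted above (the variable names are illustrative, not from the study):

```python
# Prevalence figures as reported in the UNM release and News-Medical coverage.
coded_prevalence_pct = 1.85   # self-harm history identified via diagnosis codes
ml_prevalence_pct = 7.9       # calibrated machine learning estimate

# Ratio of the ML estimate to the code-based figure.
fold_difference = ml_prevalence_pct / coded_prevalence_pct

# Absolute gap, in percentage points.
point_gap = ml_prevalence_pct - coded_prevalence_pct

print(f"Fold difference: {fold_difference:.2f}x")
print(f"Gap: {point_gap:.2f} percentage points")
```

The ratio comes out at roughly 4.3, consistent with the "more than fourfold" description.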

### Technical details

Editorial analysis - technical context: The authors applied a previously developed machine learning approach and then adjusted results with expert chart review and statistical calibration, reporting a higher detected prevalence than diagnosis codes alone. Public reporting does not provide model architecture, feature sets, or performance metrics beyond the calibrated prevalence estimates; the UNM release and News-Medical article do not disclose whether natural language processing, structured-data features, or hybrid methods were used. For practitioners, combining automated detection with human review and calibration is a common pattern in EHR phenotyping to control for label noise and documentation bias.
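The coverage does not say which calibration procedure the authors used. As an illustration only, one standard way to correct a classifier's raw positive rate is the Rogan-Gladen estimator, with sensitivity and specificity estimated from expert chart review. The sketch below assumes that estimator; the function name and all input values are hypothetical and are not taken from the study.

```python
def rogan_gladen(apparent_prevalence: float,
                 sensitivity: float,
                 specificity: float) -> float:
    """Correct an apparent (classifier-flagged) prevalence for misclassification.

    Assumption: sensitivity and specificity come from a chart-reviewed sample.
    This is a generic textbook estimator, not the study's documented method.
    """
    denominator = sensitivity + specificity - 1.0
    if denominator <= 0:
        raise ValueError("Classifier must perform better than chance.")
    estimate = (apparent_prevalence + specificity - 1.0) / denominator
    # Clip to the valid probability range; sampling noise can push it outside.
    return min(1.0, max(0.0, estimate))


# Hypothetical inputs for illustration only (not values from the study).
flagged_rate = 0.10   # share of patients the model flags
sens = 0.90           # sensitivity estimated from chart review
spec = 0.95           # specificity estimated from chart review

calibrated = rogan_gladen(flagged_rate, sens, spec)
print(f"Calibrated prevalence: {calibrated:.4f}")
```

With these made-up inputs the calibrated prevalence falls below the flagged rate, because some flagged patients are expected to be false positives. The direction and size of the correction depend entirely on the error rates measured in chart review.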

### Context and significance

The paper highlights a recurring measurement problem in clinical informatics: diagnostic coding and problem lists underrepresent clinically documented conditions. Accurate ascertainment of past self-harm matters because prior self-harm is a strong predictor of future suicide risk and influences resource planning, quality measurement, and risk stratification. Studies that rely solely on billing codes will likely undercount prevalence, which can bias research estimates and operational planning, a point the authors and the UNM release emphasize.

### What to watch

Observers should look for:

- Peer-reviewed details in the Journal of Medical Internet Research article about model inputs and performance metrics.
- Replication or external validation in non-VHA systems to assess portability across different EHR vendors and documentation cultures.
- How health systems balance automated detection with privacy, governance, and clinical workflow integration.

The press coverage cited does not include full technical reproducibility materials from UNM; readers should consult the published article for code or data availability statements.

### Bottom line

The reported results quantify a substantial visibility gap between documentation found in diagnosis codes and what a calibrated ML approach detects in chart records, with implications for researchers and health systems that use EHR data for surveillance, risk prediction, and planning.

## Scoring Rationale

This is a notable, practice-relevant application demonstrating measurement gaps in EHR-derived phenotypes and a pragmatic ML-plus-chart-review methodology. It is important for clinical informatics and risk-prediction work but is not a frontier-model breakthrough.
