Study Advances GI Cancer Risk Prediction with ML

wpnews.pro


Source: letsdatascience.com · Published June 4, 2026 · Topic: machine learning


A peer-reviewed study published in JMIR Medical Informatics evaluated machine learning methods for predicting gastrointestinal (GI) cancer risk in a South Korean prospective cohort of 7,652 participants, where only 2% developed cancer over 14 years. Researchers introduced a patient-centered undersampling technique (PCUSTe) to address severe class imbalance, achieving a sensitivity of 0.77 and AUC of 0.77 with an incrementally trained stochastic gradient descent model. The findings advance noninvasive risk stratification tools for earlier detection and targeted screening of GI cancers, a major health burden in South Korea.


A peer-reviewed study published in JMIR Medical Informatics evaluates machine learning methods for predicting gastrointestinal (GI) cancer risk in a South Korean prospective cohort, where GI cancers are a major health burden. Analyzing 7,652 participants with 156 incident GI cancer cases (about 2%) over 14 years of follow-up, the authors tackle the severe class imbalance that makes rare-disease prediction difficult. They introduce a patient-centered undersampling technique (PCUSTe) modeled on frequency-matched case-control design and benchmark it against SMOTE, ADASYN, and hybrid resampling. An incrementally trained stochastic gradient descent model on PCUSTe data reached a sensitivity of 0.77 and an AUC of 0.77, while logistic regression without resampling produced balanced results (sensitivity 0.70, specificity 0.71, AUC 0.75). The authors frame the models as tools for earlier risk stratification and targeted screening.
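The article does not say which features, loss, or hyperparameters the stochastic gradient descent model used. As a rough illustration of what "incrementally trained" means, here is a minimal pure-Python logistic model updated by SGD one chunk of data at a time. The class name `OnlineLogisticSGD`, the learning rate, and the synthetic data are assumptions for illustration, not the authors' setup; scikit-learn's `SGDClassifier.partial_fit` follows the same batch-by-batch pattern.

```python
import math
import random


class OnlineLogisticSGD:
    """Hypothetical sketch: logistic regression fitted by plain SGD on log loss, one batch at a time."""

    def __init__(self, n_features, learning_rate=0.1):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = learning_rate

    def predict_proba(self, x):
        z = self.b + sum(wi * xi for wi, xi in zip(self.w, x))
        # Numerically stable logistic function (avoids overflow in math.exp).
        if z >= 0:
            return 1.0 / (1.0 + math.exp(-z))
        e = math.exp(z)
        return e / (1.0 + e)

    def partial_fit(self, batch):
        """Update on one batch of (features, label) pairs; earlier batches are not revisited."""
        for x, y in batch:
            error = self.predict_proba(x) - y  # gradient of log loss with respect to z
            self.w = [wi - self.lr * error * xi for wi, xi in zip(self.w, x)]
            self.b -= self.lr * error
        return self


# Synthetic stream (not study data): one feature, label 1 when the feature is positive.
rng = random.Random(42)
stream = [([x], int(x > 0)) for x in (rng.uniform(-3, 3) for _ in range(400))]

model = OnlineLogisticSGD(n_features=1)
for start in range(0, len(stream), 50):  # data arrives in chunks of 50
    model.partial_fit(stream[start:start + 50])

print(round(model.predict_proba([2.0]), 3), round(model.predict_proba([-2.0]), 3))
```

Because each call to `partial_fit` only touches the new chunk, the model can keep learning as records arrive without refitting on the full history.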

What the study did

In a paper published in JMIR Medical Informatics (2026), researchers Daina Baublyte, Jeonghee Lee, Madhawa Gunathilake, and Jeongseon Kim evaluated machine learning approaches for predicting gastrointestinal (GI) cancer risk in a South Korean prospective cohort. GI cancers are a significant health concern in South Korea, and the team focused on noninvasive and minimally invasive predictors tied to modifiable behavioral and metabolic risk factors.

The data challenge

The cohort included 7,652 individuals, of whom only 156 (about 2%) developed a GI cancer over a 14-year follow-up. According to the study, this rarity creates severe class imbalance that pushes standard models toward the majority 'healthy' class at the expense of clinical sensitivity, the metric that matters most for catching true cases early.
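To see why accuracy is misleading at this prevalence, a minimal illustration using only the reported cohort counts: a hypothetical model that labels everyone "healthy" (not a model the study trained) looks excellent on accuracy while catching no cases at all.

```python
# Counts reported for the cohort: 7,652 participants, 156 incident GI cancers.
N_TOTAL = 7652
N_CASES = 156
N_CONTROLS = N_TOTAL - N_CASES  # 7,496 participants without GI cancer


def majority_class_metrics(n_cases, n_controls):
    """Metrics for a trivial (hypothetical) model that predicts 'healthy' for everyone."""
    tp, fn = 0, n_cases      # every true case is missed
    tn, fp = n_controls, 0   # every control is correctly called healthy
    return {
        "accuracy": (tp + tn) / (n_cases + n_controls),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "prevalence": n_cases / (n_cases + n_controls),
    }


for name, value in majority_class_metrics(N_CASES, N_CONTROLS).items():
    print(f"{name}: {value:.3f}")
```

This prints an accuracy of 0.980 alongside a sensitivity of 0.000, which is exactly the failure mode the authors set out to avoid.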

Method

To address imbalance while preserving population structure, the authors developed a patient-centered undersampling technique (PCUSTe) based on the logic of frequency-matched case-control studies. They compared it against widely used resampling methods, including SMOTE, ADASYN, and SMOTE with edited nearest neighbors, across six classifiers in both batch and incremental forms, and applied probability correction to account for the shift introduced by resampling. Models were evaluated on a held-out test set using thresholds tied to the observed cumulative incidence.

Results

The study reports that an incrementally trained stochastic gradient descent model on PCUSTe data delivered the strongest overall performance, with a sensitivity of 0.77 (95% CI 0.64-0.89), specificity of 0.65, and AUC of 0.77 (95% CI 0.70-0.84). Logistic regression, by contrast, achieved balanced performance without any resampling (sensitivity 0.70, specificity 0.71, AUC 0.75). The authors note that PCUSTe mainly improved sensitivity in more complex models, and that in some cases adjusting the decision threshold alone matched or beat resampling.
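The reported metrics can be computed from predicted risks as follows. This is a minimal sketch with made-up scores; the threshold rule (flag anyone whose corrected risk is at or above the observed cumulative incidence, 156/7,652) is one plausible reading of "thresholds tied to the observed cumulative incidence", not the study's confirmed procedure, and the function names are hypothetical.

```python
def confusion_counts(scores, labels, threshold):
    """Return (tp, fn, tn, fp) when predicting 'case' for score >= threshold."""
    tp = fn = tn = fp = 0
    for score, label in zip(scores, labels):
        predicted = score >= threshold
        if label == 1 and predicted:
            tp += 1
        elif label == 1:
            fn += 1
        elif predicted:
            fp += 1
        else:
            tn += 1
    return tp, fn, tn, fp


def sensitivity_specificity(scores, labels, threshold):
    tp, fn, tn, fp = confusion_counts(scores, labels, threshold)
    return tp / (tp + fn), tn / (tn + fp)


def auc(scores, labels):
    """Chance that a random case scores above a random control (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Observed cumulative incidence in the cohort: 156 cases / 7,652 participants.
INCIDENCE = 156 / 7652

# Made-up predicted risks (assumed already corrected to the population scale).
labels = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
scores = [0.030, 0.025, 0.010, 0.028, 0.015, 0.012, 0.008, 0.005, 0.004, 0.002]

for threshold in (0.5, INCIDENCE):
    sens, spec = sensitivity_specificity(scores, labels, threshold)
    print(f"threshold={threshold:.4f} sensitivity={sens:.2f} specificity={spec:.2f}")
print(f"AUC={auc(scores, labels):.2f}")
```

With the default 0.5 cutoff the toy model flags no one; at the incidence-based cutoff it catches two of three cases at the cost of one false positive. AUC does not depend on the threshold, which is why moving the cutoff alone can shift sensitivity and specificity without changing AUC.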

Why this matters

The authors conclude that combining epidemiological principles, such as covariate frequency matching and incidence-based thresholds, can improve minority-class detection and support personalized risk stratification and targeted screening for rare cancers.

Editorial analysis

Class imbalance is a recurring obstacle whenever machine learning is applied to rare clinical outcomes, and this work illustrates a broader pattern in which domain knowledge, not only algorithmic complexity, drives gains. Because this is an early-stage modeling study on a single cohort, external validation would be a typical next step before any clinical use.

Scoring rationale

Applied ML research addressing a significant regional disease burden; useful for clinicians and researchers but not a foundational model or industry-shaking result.


source & further reading

letsdatascience.com: original article


Read original on letsdatascience.com → letsdatascience.com/news/study-advances-gi-cance…

