Explainable ML Achieves Near-Perfect Alzheimer's Three-Class Detection

An arXiv preprint (arXiv:2606.03995) reports an explainable machine learning model that achieved near-perfect three-class classification of Alzheimer's disease, mild cognitive impairment, and normal cognition using eight routine clinical features from 1,641 ADNI baseline subjects. The XGBoost classifier, tuned with Optuna and trained with SMOTE oversampling to address class imbalance, attained a macro AUC of 0.982 and a Cohen's kappa of 0.909 on a held-out test set of 247 subjects, with SHAP analysis identifying CDR Global and CDR-SB as key discriminators. The findings suggest that a compact, explainable pipeline using standard cognitive assessments can match or exceed the performance of imaging-based approaches on the ADNI dataset, though external validation on non-ADNI cohorts remains necessary before clinical deployment.
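The pipeline uses SMOTE (Synthetic Minority Over-sampling Technique) to address class imbalance. Its core idea is to create synthetic minority-class points by interpolating between a minority sample and one of its k nearest minority-class neighbours. The stdlib sketch below is a simplified illustration, not the preprint's code: the toy points, the value of k, and the "balance every class up to the largest one" target are assumptions. That target mirrors imbalanced-learn's default behaviour, but the article does not say which settings the authors used. In practice oversampling is applied to training folds only, so synthetic points never leak into validation or test data; applying `synthetic_counts` to the full cohort counts below is purely illustrative.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Return n_synthetic new points interpolated between minority samples.

    Simplified SMOTE: pick a random minority point, pick one of its k nearest
    minority neighbours (Euclidean distance), and place a new point a random
    fraction of the way along the segment between them.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    k = min(k, len(minority) - 1)
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        x = minority[i]
        # k nearest minority neighbours of x, excluding x itself
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(x, minority[j]),
        )[:k]
        nn = minority[rng.choice(neighbours)]
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(x, nn)])
    return synthetic


def synthetic_counts(class_counts):
    """Synthetic samples needed per class so every class matches the largest one."""
    target = max(class_counts.values())
    return {label: target - n for label, n in class_counts.items()}


# Cohort counts as reported in the preprint (608 NC, 767 MCI, 266 AD).
print(synthetic_counts({"NC": 608, "MCI": 767, "AD": 266}))
# {'NC': 159, 'MCI': 0, 'AD': 501}
```

Plain SMOTE interpolates every feature, so a binary feature such as sex can come out fractional; imbalanced-learn's SMOTENC variant exists for data that mixes categorical and continuous features.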

An arXiv preprint (arXiv:2606.03995) presents an explainable machine learning approach that distinguishes normal cognition (NC), mild cognitive impairment (MCI), and Alzheimer's disease (AD) using routine clinical features from the Alzheimer's Disease Neuroimaging Initiative (ADNI). The authors trained an XGBoost classifier on 1,641 baseline subjects using eight features: MMSE, CDR Global, CDR-SB, MoCA, FAQ, age, sex, and education, per the preprint. The model used Optuna for hyperparameter tuning, SMOTE for class imbalance, and SHAP for feature-level explainability. With five-fold cross-validation the preprint reports a mean macro AUC of 0.983 and an accuracy of 0.944; on a held-out test set (n = 247) it reports a macro AUC of 0.982 (95% CI: 0.965-0.995) and a Cohen's kappa of 0.909. The preprint states that future work will add speech biomarkers for multimodal detection.

What happened

The arXiv preprint arXiv:2606.03995, submitted 14 Apr 2026, reports an explainable machine learning pipeline that performs three-class classification of cognitive status using ADNI baseline data. The paper reports 1,641 baseline subjects (608 NC, 767 MCI, 266 AD) and evaluates an XGBoost model with SHAP explanations, per the preprint.

Technical details

According to the preprint, the model used eight clinical features: MMSE, CDR Global, CDR Sum of Boxes (CDR-SB), MoCA, FAQ, age, sex, and education. Hyperparameters were optimized with Optuna (50 trials), and SMOTE addressed class imbalance, the preprint states.

Performance metrics reported by the authors include a mean five-fold cross-validation macro AUC of 0.983 (SD 0.007) and an accuracy of 0.944 (SD 0.006). On the held-out test set they report a macro AUC of 0.982 (95% CI: 0.965-0.995), accuracy of 0.943, macro F1 of 0.927, balanced accuracy of 0.932, and Cohen's kappa of 0.909, all as stated in the preprint.
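The reported metrics measure different things: accuracy ignores class imbalance, balanced accuracy averages per-class recall, macro F1 averages per-class F1, Cohen's kappa corrects observed agreement for agreement expected by chance, and macro AUC averages one-vs-rest ROC AUCs over the three classes. The sketch below computes each from scratch on made-up toy labels (0 = NC, 1 = MCI, 2 = AD); the numbers are illustrative and are not the preprint's results. Treating "macro AUC" as the unweighted mean of one-vs-rest AUCs is an assumption, since the article does not say which multiclass AUC variant the authors used.

```python
def confusion_matrix(y_true, y_pred, n_classes):
    """cm[t][p] counts samples whose true class is t and predicted class is p."""
    cm = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def summary_metrics(cm):
    k = len(cm)
    n = sum(map(sum, cm))
    diag = [cm[i][i] for i in range(k)]
    support = [sum(cm[i]) for i in range(k)]                         # true count per class
    predicted = [sum(cm[r][c] for r in range(k)) for c in range(k)]  # predicted count per class
    recall = [d / s if s else 0.0 for d, s in zip(diag, support)]
    precision = [d / q if q else 0.0 for d, q in zip(diag, predicted)]
    f1 = [2 * p * r / (p + r) if p + r else 0.0 for p, r in zip(precision, recall)]
    p_o = sum(diag) / n                                              # observed agreement
    p_e = sum(s * q for s, q in zip(support, predicted)) / n ** 2    # chance agreement
    return {
        "accuracy": p_o,
        "balanced_accuracy": sum(recall) / k,
        "macro_f1": sum(f1) / k,
        "cohen_kappa": (p_o - p_e) / (1 - p_e),
    }


def binary_auc(pos_scores, neg_scores):
    """ROC AUC as P(positive score > negative score), counting ties as 1/2."""
    wins = 0.0
    for p in pos_scores:
        for q in neg_scores:
            wins += 1.0 if p > q else 0.5 if p == q else 0.0
    return wins / (len(pos_scores) * len(neg_scores))


def macro_auc_ovr(y_true, proba):
    """Unweighted mean of one-vs-rest AUCs; proba[i][c] is sample i's score for class c."""
    k = len(proba[0])
    aucs = []
    for c in range(k):
        pos = [row[c] for t, row in zip(y_true, proba) if t == c]
        neg = [row[c] for t, row in zip(y_true, proba) if t != c]
        aucs.append(binary_auc(pos, neg))
    return sum(aucs) / k


if __name__ == "__main__":
    # Toy labels only (hypothetical), not data from the preprint.
    y_true = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2]
    y_pred = [0, 0, 0, 1, 1, 1, 1, 2, 2, 2]
    print(summary_metrics(confusion_matrix(y_true, y_pred, 3)))
```

On these toy labels accuracy is 0.8 while Cohen's kappa is about 0.70, because kappa subtracts the agreement a classifier would get by chance given the class and prediction frequencies. That gap is why kappa and balanced accuracy are useful companions to accuracy on an imbalanced cohort like this one.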
The authors used SHAP values to assess feature importance, finding CDR Global dominant in separating NC and MCI and a combination of CDR-SB and MMSE driving AD classification, per the preprint.

Editorial analysis

Models achieving very high performance on ADNI are common in the literature, but industry observers note that such results frequently reflect dataset characteristics, preprocessing choices, and temporal or cohort biases rather than guaranteed clinical generalizability. For practitioners, pairing XGBoost with SHAP is a pragmatic explainability pattern that aids feature-level interpretation but does not replace prospective external validation.

Context and significance

For the clinical-ML community, the key reported contribution is a compact, explainable pipeline that uses routine cognitive assessments rather than imaging or molecular biomarkers. Industry context: compact models that rely on standard clinical scales lower data-collection barriers, which can accelerate initial validation in resource-constrained settings, but they also amplify the need for cross-cohort testing because measurement protocols and patient mixes vary across clinics.

What to watch

The preprint states that future work will extend the framework with speech biomarkers for multimodal detection. Practitioners should look for external validation on non-ADNI cohorts, prospective performance, and robustness checks against assessment timing, rater variability, and demographic shifts.

Scoring Rationale

The paper reports very strong, explainable performance on a widely used research dataset (ADNI), which is notable for modelers and clinical ML teams. Impact is limited by evaluation on a single dataset and the need for external, prospective validation.
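One simple starting point for the robustness checks against demographic shifts and cohort differences mentioned above is to break test-set performance down by subgroup (for example by site, sex, or age band) and flag groups that fall well below the overall score. The sketch below is hypothetical: the group labels, toy predictions, and the 0.05 gap threshold are made up for illustration and do not come from the preprint. It uses balanced accuracy (mean per-class recall) so that subgroups with different class mixes are compared fairly.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Mean per-class recall over the classes present in y_true."""
    hits, totals = defaultdict(int), defaultdict(int)
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        hits[t] += int(t == p)
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


def subgroup_report(groups, y_true, y_pred, max_gap=0.05):
    """Balanced accuracy per subgroup, flagging groups more than max_gap below overall.

    max_gap is a hypothetical tolerance, not a value from the preprint.
    """
    overall = balanced_accuracy(y_true, y_pred)
    by_group = defaultdict(lambda: ([], []))
    for g, t, p in zip(groups, y_true, y_pred):
        by_group[g][0].append(t)
        by_group[g][1].append(p)
    report = {}
    for g, (ts, ps) in sorted(by_group.items()):
        score = balanced_accuracy(ts, ps)
        report[g] = (score, overall - score > max_gap)  # (score, flagged?)
    return overall, report


# Hypothetical toy data: two sites, labels 0 = NC, 1 = MCI, 2 = AD.
groups = ["site_A"] * 4 + ["site_B"] * 4
y_true = [0, 1, 2, 0, 0, 1, 2, 1]
y_pred = [0, 1, 2, 0, 0, 2, 2, 0]
overall, report = subgroup_report(groups, y_true, y_pred)
print(round(overall, 3), report)
```

Here site_B is flagged because its MCI cases are misclassified, a pattern an aggregate score can hide. A per-subgroup table like this does not replace external validation on a non-ADNI cohort, but it is cheap to produce from any held-out split.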