
Real-World Generalizability of Alzheimer's Volumetric MRI Machine-Learning Models: External Validation with British Data.

June 24, 2026

Authors

Pereira HR, Diogo VS, Fonseca JM, Ferreira HA, Prata DP

Affiliations (7)

  • Institute of Biophysics and Biomedical Engineering, Faculty of Sciences, University of Lisbon, Lisbon, Portugal.
  • UNINOVA-CTS, NOVA School of Science and Technology, Lisbon, Portugal.
  • CIS-Iscte, University Institute of Lisbon, Lisbon, Portugal.
  • Instituto de Fisiologia, Faculdade de Medicina da Universidade de Lisboa, Lisbon, Portugal.
  • Laboratório de Instrumentação, Engenharia Biomedica e da Física das Radiações (LIBPhys-UNL), School of Science and Technology, Universidade Nova de Lisboa, Lisbon, Portugal.
  • Institute of Biophysics and Biomedical Engineering, Faculty of Sciences, University of Lisbon, Lisbon, Portugal.
  • Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, United Kingdom.

Abstract

Assessing the generalizability and performance of machine-learning models in clinical settings is crucial. In this study, we aimed to test our models' performance on an external, real-world clinical validation dataset and to evaluate the impact of magnetic field strength and brain volume normalization on classification. We validated two previously published models trained on public datasets (Alzheimer's disease [AD], mild cognitive impairment [MCI] and cognitively normal [CN] subjects) on a real-world clinical dataset from UK memory clinics (SLaM-BRC: 255 non-AD [subjects without cognitive complaints], 281 MCI and 711 AD). Our 'CN vs. AD' model showed similar performance when tested at different magnetic field strengths (1.5T vs. 3.0T: 87.1% of 93 subjects had the same class assignment). The volume-normalized 'CN vs. AD' model (initially 87.7% balanced accuracy [BAC]) performed worse in the SLaM-BRC cohort (81.5% BAC), owing to misclassification of non-AD subjects with evidence of hippocampal atrophy. The non-normalized 'CN vs. MCI vs. AD' model's performance (initially 55.3% BAC) remained similar in the SLaM-BRC cohort (54.6% BAC), denoting robustness. Volumes normalized to the estimated total intracranial volume led to a smaller difference in performance between internal and external datasets than non-normalized volumes. These findings suggest that dataset and disease/diagnostic heterogeneities, magnetic field strength, and brain volume normalization may affect models' performance.
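
The quantities reported in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the function names and toy labels are hypothetical, balanced accuracy is taken in its usual sense as the mean of per-class recall, and the normalization shown is a simple proportional division by estimated total intracranial volume, which is an assumption because the abstract does not state the exact method.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so each class counts equally regardless of size."""
    total = defaultdict(int)
    correct = defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


def agreement_rate(labels_a, labels_b):
    """Fraction of subjects assigned the same class in two runs (e.g. 1.5T vs. 3.0T)."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both label lists must cover the same subjects")
    same = sum(a == b for a, b in zip(labels_a, labels_b))
    return same / len(labels_a)


def normalize_by_etiv(volume_mm3, etiv_mm3):
    """Proportional normalization (assumed): regional volume as a fraction of eTIV."""
    return volume_mm3 / etiv_mm3


if __name__ == "__main__":
    # Toy labels, not study data: CN recall is 3/4 and AD recall is 2/2.
    y_true = ["CN", "CN", "CN", "CN", "AD", "AD"]
    y_pred = ["CN", "CN", "CN", "AD", "AD", "AD"]
    print(balanced_accuracy(y_true, y_pred))
    print(agreement_rate(y_true, y_pred))
```

Because balanced accuracy averages recall over classes, it is less sensitive than plain accuracy to class imbalance such as the 255 non-AD versus 711 AD subjects in SLaM-BRC.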

Topics

Journal Article
