Region-wise stacking ensembles for estimating brain-age using structural MRI.
Authors
Affiliations (3)
- Institute of Neuroscience and Medicine (INM-7: Brain and Behaviour), Research Centre, Jülich, Germany; Institute of Systems Neuroscience, Heinrich Heine University Düsseldorf, Düsseldorf, Germany.
- Institute of Neuroscience and Medicine (INM-7: Brain and Behaviour), Research Centre, Jülich, Germany; Institute of Systems Neuroscience, Heinrich Heine University Düsseldorf, Düsseldorf, Germany; Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), Sankt Augustin, Germany.
- Institute of Neuroscience and Medicine (INM-7: Brain and Behaviour), Research Centre, Jülich, Germany; Institute of Systems Neuroscience, Heinrich Heine University Düsseldorf, Düsseldorf, Germany.
Abstract
Predictive modeling using structural magnetic resonance imaging (MRI) data is a prominent approach to studying brain aging. Machine learning frameworks have been employed to improve predictions and to explore both healthy aging and aging accelerated by disease. The high dimensionality of MRI data poses challenges for building generalizable and interpretable models, as well as for data privacy. Common practices are resampling or averaging voxels within predefined parcels, which reduces anatomical specificity and biological interpretability. In effect, such naive fusion by averaging can result in information loss and reduced accuracy. We present a conceptually novel two-level stacking ensemble (SE) approach. The first level comprises regional models that predict an individual's age from voxel-wise information; their outputs are fused by a second-level model that yields the final prediction. Eight data fusion scenarios were explored using gray matter volume (GMV) estimates from four large datasets. Performance, measured using mean absolute error (MAE), R², correlation, and prediction bias, showed that the SE outperformed the region-wise averages. The best performance was achieved when first-level regional predictions were generated as out-of-sample predictions on the application site, with second-level models trained on independent and site-specific data (MAE = 4.75 vs. baseline regional mean GMV MAE = 5.68). Performance improved as more datasets were used for training. First-level predictions showed an improved and more robust aging signal, providing new biological insights and enhanced data privacy. Overall, the SE improves accuracy compared to the baseline while preserving or enhancing data privacy. Finally, we show the utility of our SE model in a clinical cohort, revealing accelerated aging in cognitively impaired patients and in patients with Alzheimer's disease.
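The two-level idea can be sketched on toy data as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: both levels use closed-form ridge regression written in NumPy (the abstract does not name the learners), K-fold out-of-fold predictions within one synthetic dataset stand in for the out-of-sample first-level predictions described above, and the simulated GMV values are deliberately constructed so that voxels within a region carry age effects of opposite sign, which makes regional averaging lose signal. All function names (`ridge_fit`, `fit_stacking`, `simulate`, etc.) are hypothetical, and "prediction bias" is assumed here to mean the correlation between chronological age and the brain-age delta.

```python
import numpy as np


def ridge_fit(X, y, alpha=1.0):
    """Closed-form ridge regression; the intercept is not penalized."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    A = Xc.T @ Xc + alpha * np.eye(X.shape[1])
    w = np.linalg.solve(A, Xc.T @ (y - y_mean))
    return w, y_mean - x_mean @ w


def ridge_predict(model, X):
    w, b = model
    return X @ w + b


def first_level_oof(X_regions, y, k, alpha, rng):
    """Out-of-fold age predictions from each regional model (one column per region)."""
    n = len(y)
    oof = np.zeros((n, len(X_regions)))
    for test_idx in np.array_split(rng.permutation(n), k):
        train_mask = np.ones(n, dtype=bool)
        train_mask[test_idx] = False
        for r, Xr in enumerate(X_regions):
            model = ridge_fit(Xr[train_mask], y[train_mask], alpha)
            oof[test_idx, r] = ridge_predict(model, Xr[test_idx])
    return oof


def fit_stacking(X_regions, y, k=5, alpha1=10.0, alpha2=1.0, seed=0):
    """Level 1: one voxel-wise model per region. Level 2: a model on regional predictions."""
    rng = np.random.default_rng(seed)
    oof = first_level_oof(X_regions, y, k, alpha1, rng)
    level2 = ridge_fit(oof, y, alpha2)  # trained only on out-of-fold regional predictions
    level1 = [ridge_fit(Xr, y, alpha1) for Xr in X_regions]  # refit on all training data
    return level1, level2


def predict_stacking(models, X_regions):
    level1, level2 = models
    Z = np.column_stack([ridge_predict(m, Xr) for m, Xr in zip(level1, X_regions)])
    return ridge_predict(level2, Z)


def metrics(age, pred):
    delta = pred - age  # brain-age delta
    return {
        "MAE": np.mean(np.abs(delta)),
        "R2": 1 - np.sum(delta**2) / np.sum((age - age.mean()) ** 2),
        "r": np.corrcoef(age, pred)[0, 1],
        # Assumption: prediction bias = correlation between age and the delta.
        "bias": np.corrcoef(age, delta)[0, 1],
    }


def simulate(n, n_regions=8, n_voxels=50, seed=0):
    """Toy GMV data: voxel age slopes have mixed signs, so regional means partly cancel."""
    rng = np.random.default_rng(seed)
    age = rng.uniform(20, 80, n)
    slopes = rng.normal(0.0, 0.02, (n_regions, n_voxels))
    X_regions = [0.5 + np.outer(age, slopes[r]) + rng.normal(0.0, 0.3, (n, n_voxels))
                 for r in range(n_regions)]
    return age, X_regions


age, X_regions = simulate(600)
train, test = np.arange(400), np.arange(400, 600)

models = fit_stacking([Xr[train] for Xr in X_regions], age[train])
pred_se = predict_stacking(models, [Xr[test] for Xr in X_regions])

# Baseline: a single (near-OLS) model on regional mean GMV.
mean_gmv = np.column_stack([Xr.mean(axis=1) for Xr in X_regions])
base = ridge_fit(mean_gmv[train], age[train], alpha=1e-6)
pred_base = ridge_predict(base, mean_gmv[test])

se_scores, base_scores = metrics(age[test], pred_se), metrics(age[test], pred_base)
print("SE      :", {name: round(float(v), 3) for name, v in se_scores.items()})
print("Baseline:", {name: round(float(v), 3) for name, v in base_scores.items()})
```

The second level is trained on out-of-fold regional predictions rather than in-sample fits so that it sees regional outputs with the same kind of error they will have on unseen subjects; training it on in-sample fits would tend to overweight regions whose first-level models overfit. Because the toy data are built to favor voxel-wise models, the gap between the two printed score sets says nothing about the size of the effect on real GMV data.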