Feature selection leads to divergent neurobiological interpretations of brain-based machine learning biomarkers.

April 15, 2026

DOI: 10.1038/s41562-026-02447-y PMID: 41986741

Authors

Adkinson BD, Rosenblatt M, Sun H, Dadashkarimi J, Tejavibulya L, Horien C, Westwater ML, Rodriguez RX, Noble S, Scheinost D

Affiliations (13)

Yale School of Medicine, New Haven, CT, USA.
Department of Biomedical Engineering, Yale University, New Haven, CT, USA.
Yale School of Medicine, New Haven, CT, USA.
Department of Radiology, Perelman School of Medicine, Philadelphia, PA, USA.
Department of Psychiatry, University of Pennsylvania, Philadelphia, PA, USA.
Department of Radiology & Biomedical Imaging, Yale School of Medicine, New Haven, CT, USA.
Department of Psychiatry, University of Oxford, Warneford Hospital, Oxford, UK.
Department of Bioengineering, Northeastern University, Boston, MA, USA.
Department of Psychology, Northeastern University, Boston, MA, USA.
Institute for Cognitive & Behavioral Health, Northeastern University, Boston, MA, USA.
Department of Statistics & Data Science, Yale University, New Haven, CT, USA.
Child Study Center, Yale School of Medicine, New Haven, CT, USA.
Wu Tsai Institute, Yale University, New Haven, CT, USA.

Abstract

A central objective in human neuroimaging is to understand the neurobiology underlying cognition and mental health. Machine learning models trained on neuroimaging data are increasingly used as tools for predicting behavioural phenotypes, enhancing precision medicine and improving generalizability compared with traditional MRI studies. However, the high dimensionality of brain connectivity data makes model interpretation challenging. Prevailing practices rely on selecting features and, implicitly, interpreting identified feature networks as uniquely representative of a given phenotype while overlooking others. Despite its widespread use, how univariate feature selection balances the trade-off between simplification for optimizing modelling and oversimplification that misrepresents true neurobiology remains understudied. Here, using four large-scale neuroimaging datasets spanning over 12,000 participants and 13 outcomes, we demonstrate that edges discarded by feature selection can achieve significant prediction accuracies while yielding different neurobiological interpretations. These results are observed across cognitive, developmental and psychiatric phenotypes, extend to both functional connectivity (functional MRI) and structural (diffusion tensor imaging) connectomes, and remain evident in external validation. They suggest that focusing on only the top features may simplify the neurobiological bases of brain-behaviour associations. Such interpretations present only the tip of the iceberg when certain disregarded features may be just as meaningful, potentially contributing to ongoing issues surrounding reproducibility within the field. More broadly, our results reinforce that subtle brain-wide signals should not be ignored.
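
The comparison described in the abstract, predicting from the edges kept by univariate feature selection versus the edges it discards, can be illustrated with a minimal sketch. This is not the authors' pipeline. It assumes synthetic data with a weak signal spread across all features, Pearson correlation as the univariate ranking, a closed-form ridge regression as the model, and a single train/test split. The function names (`univariate_rank`, `ridge_fit_predict`, `compare_selected_vs_discarded`) and every parameter value are hypothetical choices made for illustration.

```python
import numpy as np


def univariate_rank(X, y):
    """Return feature indices sorted by |Pearson r| with y, strongest first."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum()) + 1e-12
    r = (Xc * yc[:, None]).sum(axis=0) / denom
    return np.argsort(-np.abs(r))


def ridge_fit_predict(X_train, y_train, X_test, alpha=1.0):
    """Fit ridge regression in closed form and predict on X_test."""
    mu, y_mu = X_train.mean(axis=0), y_train.mean()
    Xc = X_train - mu
    gram = Xc.T @ Xc + alpha * np.eye(Xc.shape[1])
    w = np.linalg.solve(gram, Xc.T @ (y_train - y_mu))
    return (X_test - mu) @ w + y_mu


def compare_selected_vs_discarded(X, y, top_fraction=0.1, alpha=1.0):
    """Compare held-out accuracy of top-ranked features against the rest.

    Assumption: the first half of the rows is the training set and the
    second half is the test set. Ranking uses training rows only, so the
    test set does not leak into feature selection.
    """
    n_train = X.shape[0] // 2
    X_tr, X_te = X[:n_train], X[n_train:]
    y_tr, y_te = y[:n_train], y[n_train:]

    order = univariate_rank(X_tr, y_tr)
    k = max(1, int(top_fraction * X.shape[1]))
    selected, discarded = order[:k], order[k:]

    scores = {}
    for name, idx in (("selected", selected), ("discarded", discarded)):
        pred = ridge_fit_predict(X_tr[:, idx], y_tr, X_te[:, idx], alpha)
        # Accuracy here is the correlation between predicted and observed.
        scores[name] = float(np.corrcoef(pred, y_te)[0, 1])
    return scores, selected, discarded


# Synthetic stand-in for connectome edges: every feature carries a small,
# equal share of the signal (an assumption, not a property of real data).
rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 200))
y = X @ np.full(200, 0.1) + rng.standard_normal(1000)

scores, selected, discarded = compare_selected_vs_discarded(
    X, y, top_fraction=0.1, alpha=10.0
)
print(scores)
```

Because the synthetic signal is spread evenly by construction, the discarded features retaining predictive accuracy here is built into the simulation and is not evidence for the paper's claim. The sketch only shows the shape of the comparison: rank on training data, split features into kept and discarded sets, and evaluate each set on held-out participants.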

Topics

Journal Article
