How causal inference tools can support debiasing of machine learning models for meaningful brain-based predictions

November 4, 2025 · medRxiv preprint

Authors

Komeyer, V., Eickhoff, S. B., Rathkopf, C., Grefkes, C., Patil, K. R., Raimondo, F.

Affiliations (1)

  • Research Center Juelich

Abstract

Machine learning (ML) offers transformative opportunities for neurobiomedicine, yet predictive models often exploit confounding-driven associations rather than genuine biological mechanisms, undermining generalizability and neurobiomedical validity. Current practice commonly defines confounders heuristically (e.g., age, sex) or correlationally, risking confusion with colliders or mediators. To address this, we propose a pragmatically integrable, causally informed three-step framework for confounder selection and adjustment, aimed at supporting debiased, meaningful neurobiomedical supervised ML (SML) models. Step 1 involves a domain-knowledge-driven causal analysis of a specific research question, formalized in a directed acyclic graph (DAG). Step 2 applies graph-theoretic rules to the DAG to identify valid deconfounding variables; it additionally provides strategies for unmeasured variables, including a discussion of their theoretical and practical strengths and limitations. Step 3 integrates the causal justification with empirical associations, ensuring that only statistically relevant confounders are adjusted for. We illustrate the framework's practical application using a UK Biobank-based brain-behaviour prediction example and demonstrate the substantial impact of confounding on predictive models, underscoring the necessity of proper deconfounding. Despite the popularity of linear feature residualization, its reliance on linearity assumptions and its adjustment of only the features (or the target) limit its effectiveness. As a potential solution, we introduce double machine learning, originally developed for causal inference, and discuss its adaptability to associative SML. Importantly, causally informed, deconfounded SML models should not be causally interpreted without further justification. Nevertheless, they are essential for producing robust, generalizable, and neurobiomedically meaningful predictive insights.
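
To make the contrast between deconfounding strategies concrete, here is a minimal, self-contained Python sketch (not code from the paper) on synthetic data. It compares a naive predictive model, linear feature residualization, and a cross-fitted, double-ML-style nonlinear residualization, adapted to the associative SML setting the abstract describes. The confounder is made deliberately nonlinear so the limitation of linear residualization shows by construction; all variable names (age, X, y), effect sizes, and scikit-learn model choices are illustrative assumptions.

```python
# Illustrative sketch, assuming a single measured confounder (age) that
# acts nonlinearly on both "brain features" X and "behavioural target" y.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import KFold, cross_val_score

rng = np.random.default_rng(0)
n, p = 1000, 20

age = rng.normal(size=n)
conf = age ** 2                                       # nonlinear confounding pathway
X = 0.8 * conf[:, None] + rng.normal(size=(n, p))     # "brain features"
y = 0.5 * X[:, 0] + 1.5 * conf + rng.normal(size=n)   # "behavioural target"
C = age[:, None]                                      # measured confounder

def cv_r2(X_, y_):
    """5-fold cross-validated R^2 of a ridge prediction model."""
    return cross_val_score(Ridge(), X_, y_, cv=5, scoring="r2").mean()

# 1) Naive model: scores highly, but largely by re-predicting the confounder.
print("naive R^2:", round(cv_r2(X, y), 3))

# 2) Linear residualization of features and target on the confounder.
# Because the confounding here is quadratic in age, the linear fit removes
# almost nothing and the apparent performance stays inflated.
X_lin = X - LinearRegression().fit(C, X).predict(C)
y_lin = y - LinearRegression().fit(C, y).predict(C)
print("linear residualization R^2:", round(cv_r2(X_lin, y_lin), 3))

# 3) DML-style deconfounding: flexible nuisance models (random forests)
# estimate E[X|C] and E[y|C] with cross-fitting, so every residual is
# computed out-of-fold and the prediction model sees only variance that
# the confounder cannot explain.
X_dml, y_dml = np.empty_like(X), np.empty_like(y)
for tr, te in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    mx = RandomForestRegressor(n_estimators=200, random_state=0).fit(C[tr], X[tr])
    my = RandomForestRegressor(n_estimators=200, random_state=0).fit(C[tr], y[tr])
    X_dml[te] = X[te] - mx.predict(C[te])
    y_dml[te] = y[te] - my.predict(C[te])
print("cross-fitted residualization R^2:", round(cv_r2(X_dml, y_dml), 3))
```

On this toy data the naive and linearly residualized models both remain inflated because they exploit the age-squared pathway, while the cross-fitted residuals retain roughly only the genuine X-to-y signal; the exact numbers are incidental. Note that, as the abstract stresses, such a deconfounded model is still associative and should not be causally interpreted without further justification.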

Topics

health systems and quality improvement
