
Investigating the Data Addition Dilemma in Longitudinal TBI MRI

Authors

Titikhsha, A., Akhtar, M., Mollah, A. M.

Affiliations

  • Carnegie Mellon University

Abstract

Clinical machine learning (CML) in brain MRI analysis often assumes that "more data = better performance." However, when added samples come from a different distribution than the training set, accuracy can decline, a phenomenon known as the Data Addition Dilemma. Here, we present the first systematic study of this dilemma in longitudinal traumatic brain injury (TBI) MRI, where acute baseline scans (session 1, S1) and follow-up scans (session 2, S2) exhibit pronounced distributional shifts. We make three key contributions. First, we quantify how intra-subject shifts (S1 → S2) and inter-subject variability jointly affect classifier performance in a 14-subject (28-scan) cohort spanning mild to severe TBI. Second, using a lightweight logistic-regression model on five-component PCA features, we compare four training schemes: (1) an intra-session upper bound (S1 → S1); (2) a cross-session out-of-distribution (OOD) test (S1 → S2); (3) pooled training (S1+S2 → S1, S2); and (4) LOSO-IPA, which augments training with one unlabeled S2 scan per patient. Third, we derive actionable deployment insights: naive pooling can impair accuracy, pooled training trades baseline performance for robustness, and LOSO-IPA recovers near-intra-session accuracy. Accordingly, we recommend anchoring each subject with an unlabeled follow-up scan and applying diagonal CORrelation ALignment (CORAL) covariance adjustment prior to inference. These findings clarify when additional data help versus hinder CML in medical imaging and establish a minimally invasive framework for reliable longitudinal severity assessment in TBI.
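The abstract names five-component PCA features and a diagonal CORAL adjustment but does not describe how they are implemented. The following is a minimal sketch of one plausible reading in Python with NumPy, under stated assumptions: each scan is already reduced to a feature vector (the random arrays are synthetic placeholders, not study data), PCA is fit on S1 only, and diagonal CORAL maps each S2 feature's mean and standard deviation onto the S1 reference. The helper names fit_pca, pca_transform, and diagonal_coral, the alignment direction, and the choice to match means as well as variances are all hypothetical, not the authors' method.

```python
import numpy as np


def fit_pca(X, n_components=5):
    """Fit PCA on rows of X (scans x features) using an SVD of centered data."""
    mean = X.mean(axis=0)
    # Rows of vt are principal directions, sorted by decreasing singular value.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]


def pca_transform(X, mean, components):
    """Project rows of X onto previously fitted principal directions."""
    return (X - mean) @ components.T


def diagonal_coral(X_shifted, X_ref, eps=1e-8):
    """Diagonal CORAL (hypothetical helper): keep only covariance diagonals,
    so alignment reduces to per-feature standardize-then-rescale.

    Assumption: means are matched as well as variances, and the shifted
    (S2) features are mapped onto the reference (S1) statistics.
    """
    mu_s, sd_s = X_shifted.mean(axis=0), X_shifted.std(axis=0)
    mu_r, sd_r = X_ref.mean(axis=0), X_ref.std(axis=0)
    return (X_shifted - mu_s) / (sd_s + eps) * sd_r + mu_r


# Synthetic placeholders (NOT study data): 14 scans per session, 50 features.
rng = np.random.default_rng(0)
X_s1 = rng.normal(size=(14, 50))
X_s2 = 2.0 * rng.normal(size=(14, 50)) + 1.0  # simulated session shift

# Fit PCA on S1 only, then project both sessions into the same 5-D space.
mean, comps = fit_pca(X_s1, n_components=5)
Z_s1 = pca_transform(X_s1, mean, comps)
Z_s2 = pca_transform(X_s2, mean, comps)

# Align S2 projections to S1 statistics before inference.
Z_s2_aligned = diagonal_coral(Z_s2, Z_s1)
```

A logistic-regression classifier (for example, scikit-learn's LogisticRegression) would then be trained on Z_s1 and evaluated on Z_s2_aligned. Keeping only the covariance diagonal avoids the matrix square roots of full CORAL and needs fewer statistics to estimate, which is one plausible motivation when each session contributes only 14 scans.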

Topics

health informatics
