Forecasting Alzheimer's Disease Progression with Deep Multimodal Learning: Integration of 3D MRI and Tabular Clinical Records via a Large Vision-Language Model
Authors
Affiliations (1)
- Oregon Health & Science University
Abstract
Background: Accurate forecasting of Alzheimer's Disease (AD) progression is critical for personalized patient management and clinical trial stratification. However, current predictive models often struggle to integrate high-dimensional neuroimaging with longitudinal clinical data. We introduce AD-LLaVA-3D, a multimodal framework designed to bridge this gap by adapting large vision-language models for volumetric and temporal forecasting.
Methods: We leveraged the LLaVA-NeXT-Video architecture to treat 3D MRI volumes as temporal sequences, enabling the model to process volumetric imaging alongside longitudinal Tabular Clinical Records (TCR). The model was trained on the Alzheimer's Disease Neuroimaging Initiative (ADNI) cohort (n=764) and evaluated using a patient-level split. We assessed its ability to forecast a suite of future clinical indicators (e.g., CDR-SB, MMSE) against traditional machine learning baselines (Lasso, Random Forest, Gradient Boosting) and specialized deep learning models (ResNet-3D, Med-Flamingo).
Results: AD-LLaVA-3D achieved the highest predictive accuracy on the ADNI test set, with a coefficient of determination (R²) of 0.68 for the CDR-SB score, compared with R² = 0.66 for the best-performing baseline. In an independent external validation on the Open Access Series of Imaging Studies (OASIS) cohort (n=76), our model generalized well (R² = 0.82, MSE = 0.54), whereas the comparison models degraded substantially (R² < 0.60).
Conclusions: This study presents the first application of a video-based multimodal architecture to AD progression forecasting. By integrating 3D MRI with tabular clinical records, AD-LLaVA-3D offers a robust, generalizable tool for monitoring disease trajectories, advancing predictive capabilities beyond current unimodal or static methods.
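The evaluation protocol described in the abstract (a patient-level split scored with R² and MSE) can be illustrated with a minimal sketch. This is not the authors' code: the record layout, the `patient_id` key, the 20% test fraction, and the function names are assumptions made for illustration only.

```python
import random
from collections import defaultdict


def patient_level_split(records, test_fraction=0.2, seed=0):
    """Split visit records so every patient lands in exactly one partition.

    `records` is a list of dicts with a hypothetical "patient_id" key.
    Splitting by patient rather than by visit keeps a patient's scans
    from appearing in both the training and the test set.
    """
    by_patient = defaultdict(list)
    for rec in records:
        by_patient[rec["patient_id"]].append(rec)

    patient_ids = sorted(by_patient)
    random.Random(seed).shuffle(patient_ids)

    # Assumed convention: at least one patient is always held out.
    n_test = max(1, round(len(patient_ids) * test_fraction))
    test = [r for pid in patient_ids[:n_test] for r in by_patient[pid]]
    train = [r for pid in patient_ids[n_test:] for r in by_patient[pid]]
    return train, test


def mse(y_true, y_pred):
    """Mean squared error between observed and predicted scores."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```

Under this definition a model that always predicts the mean of the observed scores gets R² = 0, so the reported values of 0.68 and 0.82 can be read as the share of variance in future CDR-SB explained beyond that trivial predictor.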
Highlights
- First-in-Class Architecture: We introduce the first application of video-based Large Vision-Language Models (LVLMs) to interpret 3D volumetric MRI as a temporal sequence, capturing longitudinal neurodegeneration more effectively than static 3D-CNNs.
- Robust External Validation: The model achieved superior predictive accuracy (R² = 0.82) on an independent external cohort (OASIS), demonstrating exceptional generalization beyond the training population (ADNI).
- Data-Efficient Multimodal Integration: We developed a novel prompting strategy that integrates sparse Tabular Clinical Records (TCR) without artificial imputation, allowing the model to leverage incomplete real-world medical history.
- Clinical Trial Enrichment: By accurately forecasting future cognitive scores (CDR-SB, MMSE), AD-LLaVA-3D serves as a precise screening tool to identify "rapid progressors" for clinical trials, potentially reducing failure rates in drug development.
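The idea of passing sparse tabular records to a language model without imputation can be sketched as follows. The actual prompt template is not given in this section, so the field names, visit structure, wording, and the `build_tcr_prompt` helper are all hypothetical; the sketch only shows the general principle of serializing observed values and omitting missing ones instead of filling them in.

```python
import math


def is_missing(value):
    """Treat None and NaN as 'not recorded'."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def build_tcr_prompt(visits, target="CDR-SB", horizon_months=12):
    """Serialize longitudinal clinical records into a text prompt.

    Hypothetical format: each visit becomes one line listing only the
    measurements that were actually observed. Missing entries are left
    out rather than replaced with imputed values.
    """
    lines = []
    for visit in visits:
        observed = [
            f"{name}={value}"
            for name, value in visit["measurements"].items()
            if not is_missing(value)
        ]
        if observed:
            lines.append(f"Month {visit['month']}: " + ", ".join(observed))

    history = "\n".join(lines) if lines else "No clinical measurements recorded."
    return (
        f"Clinical history:\n{history}\n"
        f"Predict {target} at {horizon_months} months after the last visit."
    )


# Illustrative input with gaps in both visits (values are made up).
example_visits = [
    {"month": 0, "measurements": {"MMSE": 28, "CDR-SB": None}},
    {"month": 6, "measurements": {"MMSE": float("nan"), "CDR-SB": 1.5}},
]
example_prompt = build_tcr_prompt(example_visits)
```

Because absent measurements never reach the prompt, the model sees only what was recorded at each visit, which is one plausible reading of integrating records "without artificial imputation".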