Multi-modal deep learning for predicting functional outcomes in intracerebral hemorrhage using 3D CT and clinical data.
Authors
Affiliations (6)
- School of Software, Henan University, Kaifeng 475004, China. Electronic address: [email protected].
- Department of Neurosurgery, The Second Affiliated Hospital of Fujian Medical University, Quanzhou 362000, China. Electronic address: [email protected].
- School of Computer Science, Nanjing University, Nanjing 210093, China.
- School of Software, Henan University, Kaifeng 475004, China; School of Software, Nanchang University, Nanchang 330031, China. Electronic address: [email protected].
- Beijing Institute for Brain Research, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing 102206, China; Chinese Institute for Brain Research, Beijing 102206, China. Electronic address: [email protected].
- Department of Psychiatry, Ji'an Third People's Hospital, Ji'an 343000, China. Electronic address: [email protected].
Abstract
This study aims to construct and validate a novel multimodal deep learning framework that integrates three-dimensional (3D) computed tomography (CT) images and clinical text information from the early admission phase to accurately predict functional outcomes at 90 days post-onset in patients with intracerebral hemorrhage (ICH). This retrospective study includes 508 ICH patients from two medical centers, divided into a training-validation cohort (n = 391) and an external test cohort (n = 117). We propose a multimodal model utilizing a 3D convolutional neural network (CNN) to extract imaging features and a pre-trained language model (BioClinicalBERT) to encode semantic information from clinical text. The features from these two modalities are deeply integrated via a dual-stream attention mechanism. Functional outcomes are assessed using the modified Rankin Scale (mRS), with a good prognosis defined as mRS < 3 and a poor prognosis as mRS ≥ 3. Model performance is evaluated through comparisons with single-modality baseline models and validated on an independent external dataset. Gradient-weighted class activation mapping (Grad-CAM) is employed for visualization to enhance model interpretability. In internal cross-validation, the multimodal model achieved an accuracy of 0.867 ± 0.027 and an AUC of 0.899 ± 0.015; in comparison, the image-only model obtained an accuracy of 0.696 ± 0.063 and an AUC of 0.718 ± 0.077. On the external test set, the proposed multimodal model reached an accuracy of 0.821 (95% CI: 0.752-0.889) and an AUC of 0.846 (95% CI: 0.769-0.919), whereas the image-only model achieved an accuracy of 0.675 (95% CI: 0.581-0.761, P < 0.05) and an AUC of 0.737 (95% CI: 0.637-0.821, P < 0.05). The proposed framework effectively integrates 3D CT imaging and clinical text data to accurately predict long-term functional outcomes in ICH patients.
The model demonstrates robust performance and interpretability, highlighting its significant potential as an early clinical risk stratification tool. It may provide technical support for advancing personalized precision medicine.
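The abstract does not specify the exact form of the dual-stream attention mechanism that fuses the 3D CNN imaging features with the BioClinicalBERT text features. As a minimal sketch, one plausible interpretation is bidirectional cross-attention, where each modality's token features attend to the other modality's tokens before pooling; the function names, feature dimensions, and pooling choice below are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(query, key_value):
    # Scaled dot-product attention: each query token attends to all
    # key/value tokens and returns a context vector per query token.
    d = query.shape[-1]
    scores = query @ key_value.T / np.sqrt(d)       # (n_q, n_kv)
    weights = softmax(scores, axis=-1)               # rows sum to 1
    return weights @ key_value                       # (n_q, d)

def dual_stream_fusion(img_feats, txt_feats):
    # Stream 1: imaging tokens (e.g., 3D CNN feature-map patches)
    # attend to clinical-text tokens (e.g., BioClinicalBERT outputs).
    img_ctx = cross_attention(img_feats, txt_feats)
    # Stream 2: text tokens attend to imaging tokens.
    txt_ctx = cross_attention(txt_feats, img_feats)
    # Mean-pool each stream and concatenate into a joint vector that
    # a classifier head could map to good/poor prognosis (mRS cutoff).
    return np.concatenate([img_ctx.mean(axis=0), txt_ctx.mean(axis=0)])

# Illustrative shapes: 10 imaging tokens and 5 text tokens, dim 64.
rng = np.random.default_rng(0)
fused = dual_stream_fusion(rng.standard_normal((10, 64)),
                           rng.standard_normal((5, 64)))
```

In this sketch the fused vector has twice the per-modality feature dimension (here 128), and each stream lets one modality reweight the other's evidence, which is one common way such "dual-stream" fusion is realized in practice.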