
Towards a CMR Foundation Model for Multi-Task Cardiac Image Analysis.

Authors

Jacob AJ, Borgohain I, Chitiboi T, Sharma P, Comaniciu D, Rueckert D

Affiliations (4)

  • Digital Technology and Innovation, Siemens Healthineers, Princeton, NJ, USA; AI in Healthcare and Medicine, TUM University Hospital, Technical University of Munich (TUM), Germany. Electronic address: [email protected].
  • Digital Technology and Innovation, Siemens Healthineers, Princeton, NJ, USA.
  • Digital Technology and Innovation, Siemens Healthineers, Princeton, NJ, USA; Digital Technology and Innovation, Siemens Healthineers AG, Hamburg, Germany.
  • AI in Healthcare and Medicine, TUM University Hospital, Technical University of Munich (TUM), Germany; Department of Computing, Imperial College London, UK.

Abstract

Cardiac magnetic resonance (CMR) is a complex imaging modality requiring a broad variety of image processing tasks for comprehensive assessment of the study. Recently, foundation models (FM) have shown promise for automated image analysis in natural images (NI). In this study, a CMR-specific vision FM was developed and then finetuned in a supervised manner for 9 different imaging tasks typical of a CMR workflow, including classification, segmentation, landmark localization, and pathology detection. A ViT-S/8 model was trained in a self-supervised manner using DINO on 36 million CMR images from 27,524 subjects from three sources (UK Biobank and two clinical centers). The model was then finetuned for 9 tasks: classification (sequence, cine view), segmentation (cine SAX, cine LAX, LGE SAX, Mapping SAX), landmark localization, and pathology detection (LGE, cardiac disease), using data from various sources (public datasets and 3 clinical datasets). The results were compared against metrics from state-of-the-art methods on the same tasks. A comparable baseline model was also trained on the same datasets for direct comparison. Additionally, the effect of pretraining strategy, as well as generalization and few-shot performance (training on few labeled samples), were explored for the pretrained model and compared to the baseline. The proposed model obtained performance similar to, or moderately better than, results reported in the literature on most tasks (except disease detection), without any task-specific optimization of methodology. The proposed model outperformed the baseline in most cases, with an average increase of 6.8 percentage points (pp) for cine view classification, and 0.1 to 1.8 pp for segmentation tasks. The proposed method also obtained generally lower standard deviations in the metrics. Improvements of 3.7 and 6.6 pp for hyperenhancement detection from LGE and 14 pp for disease detection were observed. Ablation studies highlighted the importance of pretraining strategy and architecture, and the impact of domain shifts from pretraining to finetuning. Moreover, the CMR-pretrained model achieved better generalization and few-shot performance than the baseline. A vision FM specialized for medical imaging can improve accuracy and robustness over FMs pretrained on natural images. Self-supervised pretraining offers a resource-efficient, unified framework for CMR assessment, with the potential to accelerate the development of deep learning-based solutions for image analysis tasks, even when few annotated data are available.
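The paper does not include code; as a rough illustration of the finetuning pattern the abstract describes (a self-supervised ViT-S/8 backbone with a lightweight task-specific head), a minimal PyTorch sketch follows. The publicly released natural-image DINO ViT-S/8 weights, the 5-class cine-view head, and the hyperparameters are placeholders standing in for the authors' CMR-pretrained model and setup, which are not public.

```python
# Minimal sketch (not the authors' code): supervised finetuning of a
# DINO-pretrained ViT-S/8 backbone for a downstream task such as cine
# view classification. The public natural-image DINO weights are used
# as a stand-in for the CMR-pretrained backbone described in the paper.
import torch
import torch.nn as nn


class ViTTaskModel(nn.Module):
    def __init__(self, num_classes: int):
        super().__init__()
        # Official DINO ViT-S/8 backbone from the facebookresearch/dino hub entry.
        self.backbone = torch.hub.load("facebookresearch/dino:main", "dino_vits8")
        # Linear task head on the CLS-token features (embed_dim = 384 for ViT-S).
        self.head = nn.Linear(self.backbone.embed_dim, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = self.backbone(x)  # CLS-token features, shape (B, 384)
        return self.head(feats)


# Example finetuning step on dummy data (replace with CMR cine frames and labels).
model = ViTTaskModel(num_classes=5)  # e.g. 5 hypothetical cine views
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
criterion = nn.CrossEntropyLoss()

images = torch.randn(2, 3, 224, 224)  # DINO ViT-S/8 expects 3-channel 224x224 input
labels = torch.tensor([0, 3])

optimizer.zero_grad()
loss = criterion(model(images), labels)
loss.backward()
optimizer.step()
```

The same pattern extends to the other tasks in the paper by swapping the head (e.g. a decoder for segmentation or a heatmap regressor for landmark localization) while keeping the pretrained backbone.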

Topics

Journal Article
