Automated characterization of abdominal MRI exams using deep learning.
Authors
Affiliations (6)
- Department of Bioengineering, University of Pennsylvania, Philadelphia, PA, USA. [email protected].
- Department of Radiology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA.
- Leonard Davis Institute of Health Economics, University of Pennsylvania, Philadelphia, PA, USA.
- Department of Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA.
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA.
- Department of Radiology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA. [email protected].
Abstract
Advances in magnetic resonance imaging (MRI) have revolutionized disease detection and treatment planning. However, the growing volume and complexity of MRI data, along with heterogeneity in imaging protocols, scanner technology, and labeling practices, create a need for standardized tools that automatically identify and characterize key imaging attributes. Such tools are essential for large-scale, multi-institutional studies that rely on harmonized data to train robust machine learning models. In this study, we developed convolutional neural networks (CNNs) to automatically classify three core attributes of abdominal MRI: pulse sequence type, imaging orientation, and contrast enhancement status. Three distinct CNNs with similar backbone architectures were trained to classify single image slices into one of 12 pulse sequences, 4 orientations, or 2 contrast classes. The models achieved high classification accuracies of 99.51%, 99.87%, and 99.99% for pulse sequence, orientation, and contrast, respectively. We applied Grad-CAM to visualize image regions influencing pulse sequence predictions and to highlight relevant anatomical features. To enhance performance, we implemented a majority voting approach to aggregate slice-level predictions, achieving 100% accuracy at the volume level for all tasks. External validation on the Duke Liver Dataset demonstrated strong generalizability; after adjusting for class-label mismatch, volume-level accuracies exceeded 96.9% across all classification tasks.
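The volume-level aggregation described in the abstract can be sketched as a simple majority vote over per-slice class predictions. This is a minimal illustration, not the authors' implementation; the function name and example labels are hypothetical.

```python
from collections import Counter

def volume_label(slice_predictions):
    """Aggregate per-slice class predictions into one volume-level
    label by majority vote (hypothetical helper; the paper's exact
    tie-breaking and implementation details are not given)."""
    counts = Counter(slice_predictions)
    label, _ = counts.most_common(1)[0]
    return label

# Example: hypothetical slice-level pulse-sequence predictions for one volume.
# The most frequent class ("T2") becomes the volume-level prediction.
print(volume_label(["T2", "T2", "T1", "T2"]))
```

Because a single mislabeled slice is outvoted by the rest of the volume, this aggregation step is what lifts the slice-level accuracies (99.51-99.99%) to 100% at the volume level.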