Multi-center evaluation of radiomics and deep learning to stratify malignancy risk of IPMNs.
Authors
Affiliations (17)
- Machine & Hybrid Intelligence Lab, Department of Radiology, Northwestern University, Chicago, United States.
- Department of Biomedical Engineering, University of Wisconsin-Madison, Madison, United States.
- Machine & Hybrid Intelligence Lab, Department of Radiology, Northwestern University, Chicago, United States.
- Department of Radiology, Bakirkoy Dr. Sadi Konuk Research And Training Hospital, Istanbul, Turkey.
- Department of Preventive Medicine (Biostatistics), Northwestern University, Chicago, United States.
- Department of Biomedical Informatics, Stony Brook University Hospital, Stony Brook, United States.
- University of Catania, Catania, Italy.
- Department of Internal Medicine, Istanbul University, Istanbul, Turkey.
- Department of Radiology, Istanbul University, Istanbul, Turkey.
- Division of Gastroenterology and Hepatology, New York University, New York, United States.
- Nvidia (United States), Bethesda, United States.
- Department of Radiology, Columbia University, New York, United States.
- Department of Radiology and Nuclear Medicine, Erasmus MC, Rotterdam, Netherlands.
- Departments of Gastroenterology and Hepatology, Erasmus MC, Rotterdam, Netherlands.
- Department of Radiology, Mayo Clinic, Jacksonville, United States.
- Departments of Radiology, Biomedical Engineering, Medical Physics, University of Wisconsin-Madison, Madison, United States.
- William S. Middleton Memorial Veterans Hospital, Madison, United States.
Abstract
Distinguishing high-risk intraductal papillary mucinous neoplasms (IPMNs) from low-risk lesions remains a clinical challenge, and the limited specificity of current methods often leads to unnecessary procedures. While radiomics and deep learning (DL) have been explored for pancreatic cancer, cyst-level malignancy risk stratification of IPMNs remains unexplored. Our multi-institutional study assessed the feasibility of AI for predicting IPMN dysplasia grade from cyst-level image features, using 359 T2-weighted (T2W) MRI images from seven centers. We developed and compared 2D and 3D radiomics-only, DL-only, and radiomics-DL fusion models, using expert radiologist scoring as a baseline reference. Model performance was evaluated on held-out test data. The radiomics-DL fusion model showed the highest discriminatory ability on the test set, with an AUC of 69.2%, outperforming the radiomics-only model (AUC of 66.5%). Expert accuracy varied widely, from 37.4% to 66.7%, and inter-rater agreement also varied, with weighted Cohen's kappa coefficients of 0.33-0.67. The fusion model, which combines DL with radiomics features from routine T2W MRI, shows promise for objective, cyst-level risk stratification of IPMNs in a multi-center cohort, outperforming radiomics-only models and nearly matching expert radiologists using only T2W and T1-weighted (T1W) sequences. While performance requires improvement for standalone clinical use, this approach offers a scalable, non-invasive method to potentially improve diagnostic accuracy and reduce unnecessary surgical interventions.
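For readers unfamiliar with the two metrics reported in the abstract, the following is a minimal Python sketch of how an AUC and a weighted Cohen's kappa can be computed. It is not the authors' evaluation code. The function names `roc_auc` and `weighted_kappa` are hypothetical, the toy labels, scores, and reader grades are invented for illustration, and the choice between linear and quadratic kappa weighting is an assumption, since the abstract does not state which was used.

```python
from collections import Counter


def roc_auc(labels, scores):
    """AUC as the probability that a positive case outscores a negative one.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def weighted_kappa(rater_a, rater_b, n_categories, weights="linear"):
    """Weighted Cohen's kappa for two raters on an ordinal scale 0..n_categories-1.

    `weights` is "linear" or "quadratic"; which one the study used is
    not stated in the abstract, so this parameter is an assumption.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must score the same non-empty set of cases")
    if n_categories < 2:
        raise ValueError("need at least two categories")
    if weights not in ("linear", "quadratic"):
        raise ValueError("weights must be 'linear' or 'quadratic'")
    n = len(rater_a)
    power = 1 if weights == "linear" else 2

    def disagreement(i, j):
        # 0 for identical grades, 1 for grades at opposite ends of the scale.
        return (abs(i - j) / (n_categories - 1)) ** power

    pair_counts = Counter(zip(rater_a, rater_b))
    marg_a, marg_b = Counter(rater_a), Counter(rater_b)

    # Observed weighted disagreement across the scored cases.
    observed = sum(disagreement(i, j) * c / n
                   for (i, j), c in pair_counts.items())
    # Weighted disagreement expected by chance from the raters' marginals.
    expected = sum(disagreement(i, j) * marg_a[i] * marg_b[j] / n ** 2
                   for i in range(n_categories)
                   for j in range(n_categories))
    if expected == 0:
        raise ValueError("kappa is undefined when both raters use one grade")
    return 1.0 - observed / expected


if __name__ == "__main__":
    # Invented toy data: 1 = high-risk, 0 = low-risk; scores are model outputs.
    labels = [0, 0, 1, 1]
    scores = [0.1, 0.4, 0.35, 0.8]
    print("AUC:", roc_auc(labels, scores))

    # Invented ordinal grades from two hypothetical readers (0, 1, 2).
    reader_1 = [0, 1, 2, 1, 0]
    reader_2 = [0, 2, 2, 1, 1]
    print("weighted kappa:", weighted_kappa(reader_1, reader_2, 3))
```

With this convention an AUC of 0.692 corresponds to the 69.2% quoted above, and a kappa of 1 means perfect agreement while 0 means agreement no better than chance.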