Deep learning ensemble models for CT-based differentiation of malignant and benign sacral bone tumors: development and evaluation.
Authors
Affiliations (7)
- Department of Radiology, Peking University People's Hospital, Beijing, P. R. China.
- Department of Radiology, Peking University Third Hospital, Beijing, P. R. China.
- Intelligent Manufacturing Research Institute, Visual 3D Medical Science and Technology Development, Beijing, P. R. China.
- Department of Radiology, Shanxi Provincial People's Hospital, Taiyuan, China.
- Department of Radiology, Memorial Sloan-Kettering Cancer Center, New York, NY, USA.
- Department of Radiology, The First Affiliated Hospital of Chongqing Medical University, Chongqing, China.
- Department of Radiology, Peking University People's Hospital, Beijing, P. R. China. [email protected].
Abstract
Radiologists often struggle to differentiate benign from malignant sacral bone lesions because their imaging characteristics are similar. This study aimed to develop an ensemble deep learning (DL) model that preoperatively distinguishes benign from malignant sacral tumors on noncontrast computed tomography (CT) images. Preoperative sacral CT scans from 569 patients with confirmed sacral lesions were analyzed. Data from Center 1 were used for model development and internal testing via fivefold cross-validation, and data from Centers 2 and 3 were used for external testing. Various ensemble models combining human interpretation and DL were developed. The diagnostic performance of the models and radiologists was assessed using precision, recall, accuracy, area under the curve (AUC), F1 score, and confusion matrices. Furthermore, the clinical benefit of radiologists' interpretations supported by the DL model was evaluated. The ensemble model integrating 3D-DenseNet121 with human interpretation showed the most robust performance. On the internal and external test sets, respectively, it achieved AUCs of 0.9139 and 0.8713, F1 scores of 0.9054 and 0.8571, precision of 0.9041 and 0.8824, recall of 0.9136 and 0.8333, and accuracy of 0.8630 and 0.8182. In the external test cohort, all radiologists improved in AUC, accuracy, sensitivity, and specificity when supported by the model. Notably, junior radiologists demonstrated significant improvements compared with senior radiologists. The potential clinical application of the DL model lies in its capacity to considerably enhance radiologists' diagnostic efficiency.
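As an illustration of how the reported metrics relate to one another, the following minimal sketch derives precision, recall, specificity, accuracy, and F1 from a binary confusion matrix (malignant as the positive class). The function names and the example counts are hypothetical and are not taken from the study; only the final check uses the external-test precision and recall quoted above.

```python
from dataclasses import dataclass


@dataclass
class ConfusionMatrix:
    """Binary confusion matrix with malignant as the positive class."""
    tp: int  # malignant predicted malignant
    fp: int  # benign predicted malignant
    fn: int  # malignant predicted benign
    tn: int  # benign predicted benign


def safe_div(num: float, den: float) -> float:
    """Return num / den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def f1_from_precision_recall(precision: float, recall: float) -> float:
    """F1 is the harmonic mean of precision and recall."""
    return safe_div(2 * precision * recall, precision + recall)


def classification_metrics(cm: ConfusionMatrix) -> dict:
    """Compute the standard metrics from the four confusion-matrix cells."""
    precision = safe_div(cm.tp, cm.tp + cm.fp)
    recall = safe_div(cm.tp, cm.tp + cm.fn)  # also called sensitivity
    specificity = safe_div(cm.tn, cm.tn + cm.fp)
    accuracy = safe_div(cm.tp + cm.tn, cm.tp + cm.fp + cm.fn + cm.tn)
    return {
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "accuracy": accuracy,
        "f1": f1_from_precision_recall(precision, recall),
    }


# Hypothetical counts, for illustration only (not study data).
example = ConfusionMatrix(tp=40, fp=10, fn=5, tn=45)
metrics = classification_metrics(example)

# Consistency check on the reported external-test values:
# precision 0.8824 and recall 0.8333 give F1 of about 0.8571.
external_f1 = f1_from_precision_recall(0.8824, 0.8333)
```

The external-test F1 score of 0.8571 is consistent with the reported precision and recall. AUC is not covered here because it requires per-case scores and cannot be computed from a single confusion matrix.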
This study presents the first ensemble deep learning model integrating 3D-DenseNet121 with radiologists' interpretation for preoperative differentiation of sacral tumors on noncontrast CT that improved diagnostic performance across all experience levels, particularly for junior radiologists.
- First artificial intelligence-radiologist ensemble for noncontrast computed tomography (NCCT)-based sacral tumor classification.
- Boosts all radiologists' performance, with the greatest gains for juniors, potentially reducing referrals.
- Enables reliable NCCT diagnosis, overcoming contrast/magnetic resonance imaging dependency in musculoskeletal oncology.
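The abstract does not state how the 3D-DenseNet121 output and the radiologist's interpretation are combined. One common option is late fusion by weighted averaging of two malignancy probabilities, sketched below. This is an assumption for illustration, not the authors' method: the function names, the equal default weight, and the 0.5 decision threshold are all hypothetical.

```python
def fuse_probabilities(dl_prob: float, reader_prob: float,
                       dl_weight: float = 0.5) -> float:
    """Weighted average of two malignancy probabilities (late fusion).

    dl_prob:     model-estimated probability of malignancy, in [0, 1]
    reader_prob: radiologist's confidence of malignancy, in [0, 1]
    dl_weight:   hypothetical weight given to the model, in [0, 1]
    """
    for name, value in (("dl_prob", dl_prob),
                        ("reader_prob", reader_prob),
                        ("dl_weight", dl_weight)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    return dl_weight * dl_prob + (1.0 - dl_weight) * reader_prob


def classify(prob: float, threshold: float = 0.5) -> str:
    """Map a fused probability to a label using an assumed threshold."""
    return "malignant" if prob >= threshold else "benign"


# Example: the model is fairly confident, the reader leans benign.
fused = fuse_probabilities(dl_prob=0.9, reader_prob=0.4)
label = classify(fused)
```

In practice the weight and threshold would be tuned on the development data, for example within the cross-validation folds, rather than fixed by hand.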