Deep Learning for Differentiating Benign From Malignant Bile Duct Dilation on MRCP: Development and Prospective Evaluation of an Xception-Logistic Regression Ensemble Model.

December 5, 2025

Authors

Liu J, Li L, Zhang J, Yang C, Huang X, Shu Y, He X, Shu J

Affiliations (3)

  • Department of Radiology, The Affiliated Hospital, Southwest Medical University, Luzhou, Sichuan, China.
  • Precision Imaging and Intelligent Analysis Key Laboratory of Luzhou, Southwest Medical University, Luzhou, Sichuan, China.
  • Department of Oncology, The Affiliated Hospital, Southwest Medical University, Luzhou, Sichuan, China.

Abstract

Background: Accurate differentiation of benign from malignant bile duct dilatation (BDD) is needed to guide management. Conventional imaging evaluation is subjective, whereas deep learning (DL) offers potential for automated, objective assessment.

Purpose: To construct and evaluate DL models and ensemble strategies based on magnetic resonance cholangiopancreatography (MRCP) images for differentiating benign from malignant BDD.

Study Type: Retrospective and prospective.

Population: A retrospective cohort (n = 378; median age, 60 years [range: 14, 90]; 194 male) from two institutions and a prospective cohort (n = 60; median age, 62.5 years [range: 15, 86]; 30 male) were included. Retrospective data were split by stratified random sampling into training, validation, and internal test sets (2:1:1) and an independent external test set. Benign cases were downsampled to balance the class distribution.

Field Strength/Sequence: 3 T MRCP (3D turbo spin echo: VISTA and SPACE).

Assessment: The primary retrospective endpoint was the area under the curve (AUC) across DL algorithms and ensembles. Prospectively, the accuracy, sensitivity, and specificity of the model were compared with those of three radiologists.

Statistical Tests: Group comparisons used Mann-Whitney U and chi-square tests, with p < 0.05 considered significant. Model performance was evaluated using the Hosmer-Lemeshow test, DeLong's test with Bonferroni correction (α = 0.005), and McNemar's test.

Results: The Xception model achieved AUCs of 0.816 (95% CI, 0.788-0.844) on the internal test set and 0.807 (95% CI, 0.779-0.835) on the external test set. The ensemble model incorporating logistic regression yielded higher patient-level AUCs of 0.890 and 0.885, with good calibration (p = 0.109). No significant differences were observed among the five ensemble strategies (minimum adjusted p = 0.62). In the prospective cohort, the model showed 90.0% accuracy, sensitivity, and specificity, comparable to those of the radiologists (76.7%-86.7%), with no significant difference (p = 0.143, 0.302, and 0.774, respectively).

Data Conclusion: The Xception-logistic regression (Xce-LR) ensemble model shows potential for automating BDD differentiation using MRCP.

Technical Efficacy: Stage 2.
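
As a rough illustration of two quantities named in the abstract, the sketch below computes a Bonferroni-corrected significance threshold and the accuracy, sensitivity, and specificity from a 2x2 confusion table. It is not the authors' code. Two assumptions are made here: that the reported α = 0.005 corresponds to the 10 pairwise comparisons among five ensemble strategies (0.05 / 10), which matches the reported value but is not stated in the abstract; and the confusion counts, which are hypothetical because the abstract does not give the benign/malignant split of the prospective cohort. The function names are likewise illustrative.

```python
from math import comb


def bonferroni_alpha(family_alpha: float, n_comparisons: int) -> float:
    """Per-comparison threshold under a Bonferroni correction."""
    return family_alpha / n_comparisons


def diagnostic_metrics(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Accuracy, sensitivity, and specificity from confusion counts.

    tp/fn count malignant cases called malignant/benign;
    tn/fp count benign cases called benign/malignant.
    """
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Assumption: the correction covers all pairwise comparisons
# among the five ensemble strategies mentioned in the abstract.
n_strategies = 5
n_pairs = comb(n_strategies, 2)          # number of unordered pairs
alpha = bonferroni_alpha(0.05, n_pairs)  # per-comparison threshold

# Hypothetical counts for a 60-patient cohort (NOT reported in the
# abstract): 30 malignant and 30 benign, 3 errors in each class.
metrics = diagnostic_metrics(tp=27, fn=3, tn=27, fp=3)

print(f"pairwise comparisons: {n_pairs}, per-test alpha: {alpha:.3f}")
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these hypothetical counts all three metrics come out equal; a real evaluation would substitute the observed counts for each reader and for the model.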

Topics

Journal Article