Development of a novel multimodal deep learning approach to improve diagnostic precision in ovarian cancer.
Authors
Affiliations (6)
- Institute of Health Data Analytics and Statistics, College of Public Health, National Taiwan University, No. 17, Xu-Zhou Road, ZhongZheng District, Taipei City 100025, Taiwan.
- Department of Obstetrics and Gynecology, National Taiwan University Hospital Hsin-Chu Branch, No. 2, Sec. 1, Shengyi Rd., Zhubei City, Hsinchu County 302058, Taiwan.
- Department of Obstetrics and Gynecology, College of Medicine, National Taiwan University, No. 7, Chung Shan S. Rd., ZhongZheng Dist., Taipei City 100225, Taiwan.
- Department of Obstetrics and Gynecology, National Taiwan University Hospital, No. 7, Chung Shan S. Rd., ZhongZheng Dist., Taipei City 100225, Taiwan.
- Department of Public Health, College of Public Health, National Taiwan University, No. 17, Xu-Zhou Road, ZhongZheng District, Taipei City 100025, Taiwan.
- National Institute of Environmental Health Sciences, National Health Research Institutes, No. 35, Keyan Road, Zhunan Town, Miaoli County 350401, Taiwan.
Abstract
Ovarian cancer is the leading cause of death among gynecological malignancies in women. Because treatment strategies for benign and malignant ovarian tumors differ substantially, accurate preoperative diagnosis is essential for clinical decision-making. Conventional ultrasound diagnosis is highly operator-dependent, introducing subjectivity and variability. To improve diagnostic precision in ovarian tumor classification, we developed a multimodal deep learning system that combines ultrasound images with the corresponding clinical text reports. We retrospectively analyzed 1342 ultrasound images from 1062 patients who underwent surgical treatment for ovarian tumors at National Taiwan University Hospital between 2011 and 2021. Patients were classified into benign (n = 612) and malignant (including borderline, n = 450) groups based on pathology. A multimodal deep learning architecture was developed, incorporating DenseNet-121 and Swin Transformer for image feature extraction and Bio-Clinical BERT for processing clinical text reports. The dataset was split using subject-level stratification with five-fold cross-validation and a 15% independent test set. Furthermore, an external validation cohort of 268 eligible cases from three independent medical centers was used to evaluate the model's generalizability. The multimodal model achieved superior performance at the subject level, with 81.77% (95% CI: 75.89%, 86.48%) accuracy, 79.59% (95% CI: 70.57%, 86.38%) sensitivity, 83.81% (95% CI: 75.59%, 89.64%) specificity, and an area under the curve (AUC) of 0.88 (95% CI: 0.83, 0.93). In external validation, the model maintained robust performance, with an accuracy of 88.81%, sensitivity of 92.59%, and specificity of 84.96%, outperforming the International Ovarian Tumor Analysis Simple Rules (accuracy 86.4%). Integrating clinical text information significantly improved diagnostic performance compared to image-only models. Backward selection analysis revealed that both uterine findings and ovarian tumor descriptions contributed synergistically to the final diagnosis. This study developed a multimodal deep learning model with diagnostic performance superior to traditional operator-dependent approaches. The model shows promise as a diagnostic tool for ovarian tumor classification, offering clinicians a way to improve preoperative diagnostic accuracy and enhance patient care quality.
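The abstract specifies a subject-level stratified split: all images from one patient must land in the same partition, a 15% patient-level test set is held out, and the remaining patients are divided into five cross-validation folds. The study's own code is not shown; the following is a minimal stdlib-only sketch of that splitting scheme, with illustrative function names, patient IDs, and toy labels that are assumptions, not the study's data.

```python
import random
from collections import defaultdict

def subject_level_split(image_records, n_folds=5, test_frac=0.15, seed=42):
    """image_records: list of (image_id, patient_id, label) tuples.

    Returns (test_patients, folds): a held-out set of patient IDs and a
    list of n_folds disjoint patient-ID sets for cross-validation.
    """
    # Group images by patient so one subject never straddles two splits.
    by_patient = defaultdict(list)
    for _, pid, label in image_records:
        by_patient[pid].append(label)
    # Pathology is determined per subject, so take one label per patient.
    patient_label = {pid: labels[0] for pid, labels in by_patient.items()}

    rng = random.Random(seed)
    test_patients, folds = set(), [set() for _ in range(n_folds)]
    # Stratify by class: sample the test set and folds within each class.
    for cls in set(patient_label.values()):
        pids = [p for p, y in patient_label.items() if y == cls]
        rng.shuffle(pids)
        n_test = round(len(pids) * test_frac)
        test_patients.update(pids[:n_test])
        # Deal the remaining patients round-robin into the folds.
        for i, pid in enumerate(pids[n_test:]):
            folds[i % n_folds].add(pid)
    return test_patients, folds

# Toy example: 20 patients (12 benign = 0, 8 malignant = 1),
# two ultrasound images per patient.
records = [(f"img{i}", f"pt{i % 20}", 0 if i % 20 < 12 else 1)
           for i in range(40)]
test_pts, cv_folds = subject_level_split(records)
```

Splitting by patient rather than by image prevents leakage: two images of the same tumor never end up on opposite sides of a train/test boundary, which would otherwise inflate the reported subject-level metrics.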
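The performance figures above (accuracy, sensitivity, specificity, AUC) follow their standard definitions for a binary benign/malignant task. As a reference, here is a small stdlib-only sketch of how these metrics are computed from per-patient malignancy scores; the scores, labels, and 0.5 threshold below are illustrative assumptions, not values from the study.

```python
def confusion_metrics(y_true, y_pred):
    """Accuracy, sensitivity, specificity from binary labels/predictions
    (1 = malignant, 0 = benign)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    sensitivity = tp / (tp + fn)   # fraction of malignant cases caught
    specificity = tn / (tn + fp)   # fraction of benign cases cleared
    return accuracy, sensitivity, specificity

def auc(y_true, scores):
    """AUC via the Mann-Whitney formulation: the probability that a
    malignant case outscores a benign one (ties count one half)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Illustrative per-patient labels and model scores.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.1, 0.7]
y_pred = [1 if s >= 0.5 else 0 for s in scores]
acc, sens, spec = confusion_metrics(y_true, y_pred)  # 0.75, 0.75, 0.75
model_auc = auc(y_true, scores)                      # 0.9375
```

Unlike accuracy, sensitivity, and specificity, the AUC is threshold-free, which is why it is reported alongside them as a summary of ranking quality across all operating points.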