Development and validation of an artificial intelligence-based model for diagnosing benign, borderline, and malignant adnexal masses.
Authors
Affiliations (7)
- Cancer Centre, Department of Ultrasound Medicine, Zhejiang Provincial People's Hospital (Affiliated People's Hospital), Hangzhou Medical College, Hangzhou, Zhejiang, China.
- Key Discipline of Zhejiang Province in Public Health and Preventive Medicine (First Class, Category A), Hangzhou Medical College, Hangzhou, Zhejiang, China.
- School of Mathematical Sciences, Zhejiang University, Zijingang Campus, Hangzhou, Zhejiang, China.
- Institute of Pathology and Southwest Cancer Center, Southwest Hospital, Third Military Medical University (Army Medical University) and Key Laboratory of Tumor Immunopathology, Ministry of Education of China, Chongqing, China.
- Department of Ultrasound Medicine, Second Affiliated Hospital of Harbin Medical University, Harbin, Heilongjiang, China.
- Department of Ultrasound Medicine, Sichuan Provincial Maternity and Child Health Care Hospital, Chengdu, Sichuan, China.
- Cancer Centre, Department of Ultrasound Medicine, Zhejiang Provincial People's Hospital (Affiliated People's Hospital), Hangzhou Medical College, Hangzhou, Zhejiang, China.
Abstract
Classifying adnexal masses as benign, borderline, or malignant is critical to effective clinical management but remains a challenge. We developed Clinical-Ovarian Multi-Task Attention (Clinical-OMTA), an artificial intelligence model based on a dual-backbone architecture (benign vs. non-benign, and borderline vs. malignant) that integrates ultrasound, age, and Carbohydrate Antigen 125 (CA125) for multi-class classification. The model's performance, generalisability, and clinical utility were evaluated. Retrospective data were collected from 23 hospitals (1882 patients from 21 hospitals for training, validation, and internal testing; 340 and 159 patients from two further hospitals for external testing). In the external image dataset, Clinical-OMTA demonstrated diagnostic performance comparable to that of ADNEX (area under the receiver operating characteristic curve [AUC]: 0.950 vs. 0.953, 0.870 vs. 0.853, 0.930 vs. 0.938) and to subjective assessment by an expert examiner (accuracy: 85.6% vs. 87.4%). Although Clinical-OMTA supported multimodal integration, it did not outperform Ovarian Multi-Task Attention (OMTA), which was trained only on images, indicating that including age and CA125 did not improve performance. Clinical-OMTA performed similarly across acquisition modes, equipment types, scanning methods, and centres (accuracy: 79.9%-87.7%). With Clinical-OMTA as a decision support tool, radiologists showed significantly improved inter-reader agreement (kappa: 0.17-0.78 vs. 0.86-0.98) and diagnostic accuracy (72.3% vs. 88.0%). Clinical-OMTA appears generalisable and could be especially useful in low-resource or remote settings where expert ultrasound examiners are scarce.
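The abstract describes two binary decisions (benign vs. non-benign, then borderline vs. malignant) but does not say how their outputs are merged into a three-class prediction. The sketch below shows one common way to do this, by chaining the two probabilities; it is an illustrative assumption, not the authors' method, and the function names `combine_hierarchical` and `predict_class` are hypothetical.

```python
from typing import Dict


def combine_hierarchical(p_non_benign: float,
                         p_malignant_given_non_benign: float) -> Dict[str, float]:
    """Chain two binary probabilities into three class probabilities.

    Assumption (not stated in the abstract): the second decision is read as
    conditional on the mass being non-benign, so the three outputs sum to 1.
    """
    for p in (p_non_benign, p_malignant_given_non_benign):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    return {
        "benign": 1.0 - p_non_benign,
        "borderline": p_non_benign * (1.0 - p_malignant_given_non_benign),
        "malignant": p_non_benign * p_malignant_given_non_benign,
    }


def predict_class(probs: Dict[str, float]) -> str:
    """Return the label with the highest combined probability."""
    return max(probs, key=probs.get)
```

For example, `predict_class(combine_hierarchical(0.8, 0.25))` selects the borderline class, because the mass is judged likely non-benign while the second decision leans away from malignancy.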