External Validation of a Deep Learning-Based Artificial Intelligence System for Ultrasound Diagnosis of Thyroid Nodules: A Two-Center Retrospective Study.

June 17, 2026

Authors

Tang Y, Xu YD, Zhao CK, Fan PL, Jin YJ, Ji ZB, Han H, Xu HX, Xu BH

Affiliations (3)

  • Department of Ultrasound, Zhongshan Hospital, Fudan University, Shanghai, China.
  • Institute of Ultrasound in Medicine and Engineering, Fudan University, Shanghai, China.
  • Shanghai Institute of Medical Imaging, Shanghai, China.

Abstract

This study investigated the performance of an artificial intelligence (AI) diagnostic system for thyroid nodule sonography based on a deep learning convolutional neural network (CNN). We retrospectively included 485 thyroid nodules with definite pathology from two tertiary hospitals. The AI diagnostic system, built on a deep learning CNN, performs automatic detection and diagnosis of nodules and offers an image mode and a video mode. One gray-scale ultrasound (US) image of each nodule from the two hospitals was selected for diagnosis in image mode (AI model<sub>img</sub>). A US video of each nodule from the second hospital was analyzed in video mode (AI model<sub>vid</sub>). The performance of AI model<sub>img</sub>, AI model<sub>vid</sub>, and three radiologists with 3-15 years of US experience was evaluated. Sonographic features likely to influence the accuracy of AI model<sub>img</sub> were identified by binary logistic regression analysis. Although the experienced radiologist achieved higher sensitivity, accuracy, and area under the receiver operating characteristic curve (AUC) than AI model<sub>img</sub> and the two junior radiologists, the AUCs of AI model<sub>img</sub> and the experienced radiologist did not differ significantly (0.770 [0.718-0.816] vs. 0.799 [0.750-0.843] in the first hospital dataset, p = 0.253; 0.731 [0.660-0.794] vs. 0.780 [0.712-0.838] in the second hospital dataset, p = 0.105). When US videos were used for diagnosis instead of images, AI model<sub>vid</sub> achieved significantly higher specificity (image vs. video: 0.575 [0.489-0.661] vs. 0.693 [0.613-0.773], p = 0.003), accuracy (0.667 [0.598-0.736] vs. 0.744 [0.681-0.808], p = 0.002), and AUC (0.731 [0.660-0.794] vs. 0.780 [0.713-0.839], p = 0.016).
Among benign nodules, AI model<sub>img</sub> was more likely to make a correct diagnosis in those with a circumscribed margin (OR = 3.46, p = 0.003), hyperechoic or isoechoic echogenicity (OR = 8.83, p < 0.001), and either no echogenic foci or large comet-tail artifacts (OR = 2.28, p = 0.041). Among malignant nodules, AI model<sub>img</sub> was more accurate in those with hypoechoic or very hypoechoic echogenicity (OR = 3.33, p = 0.034) and an irregular margin (OR = 4.51, p = 0.003). In TR1 and TR2 (ACR TI-RADS risk level) nodules, the accuracy of AI model<sub>img</sub> was 100% (6 of 6) and 90% (27 of 30), respectively. The AI diagnostic system is feasible and reliable for automatic detection and diagnosis of thyroid nodules and performs better when applied to US videos. Sonographic features of thyroid nodules are a crucial factor influencing the accuracy of the AI model.
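
As a rough illustration of how figures such as sensitivity, specificity, and accuracy with 95% intervals can be computed, the following minimal Python sketch derives them from confusion-matrix counts. It is not the authors' analysis code. The abstract does not state which interval method was used, so the Wald (normal approximation) interval here is an assumption, and the function names `proportion_with_wald_ci` and `diagnostic_metrics` are hypothetical. Only the TR1 and TR2 counts (6 of 6, 27 of 30) come from the abstract; the confusion-matrix counts are invented for demonstration.

```python
import math


def proportion_with_wald_ci(successes, total, z=1.96):
    """Return (estimate, lower, upper) for a proportion.

    Uses a Wald (normal approximation) interval, clipped to [0, 1].
    The interval method is an assumption; the abstract does not name one.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    half_width = z * math.sqrt(p * (1 - p) / total)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)


def diagnostic_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, and accuracy from confusion-matrix counts."""
    return {
        # Malignant nodules correctly called malignant
        "sensitivity": proportion_with_wald_ci(tp, tp + fn),
        # Benign nodules correctly called benign
        "specificity": proportion_with_wald_ci(tn, tn + fp),
        # All nodules correctly classified
        "accuracy": proportion_with_wald_ci(tp + tn, tp + fp + tn + fn),
    }


# Counts reported in the abstract for TR1 and TR2 nodules.
tr1 = proportion_with_wald_ci(6, 6)
tr2 = proportion_with_wald_ci(27, 30)

# Hypothetical counts for illustration only, NOT taken from the study.
example = diagnostic_metrics(tp=40, fp=20, tn=30, fn=10)

for name, (estimate, lower, upper) in example.items():
    print(f"{name}: {estimate:.3f} [{lower:.3f}-{upper:.3f}]")
```

With small groups such as the six TR1 nodules, a Wald interval collapses to zero width at 100% accuracy, so a Wilson or exact interval would usually be a better choice there.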

Topics

Journal Article
