Deep Learning-Based Multimodal Fusion of Ultrasound, Cytology, and Clinical Features to Distinguish Follicular Thyroid Carcinoma from Adenoma: A Multicenter Study.
Affiliations (6)
- Department of Ultrasound, The Second Affiliated Hospital of Soochow University, Suzhou, Jiangsu, China (X.-F.G., L.Z., S.-Y.Z.).
- Department of Ultrasound, Zhangjiagang Hospital of Traditional Chinese Medicine, Nanjing, Jiangsu, China (X.-Y.B.).
- Department of Ultrasound, The Third Affiliated Hospital of Soochow University, Changzhou First People's Hospital, Changzhou, Jiangsu, China (S.-Q.L.).
- Department of Thyroid Surgery, The Third Affiliated Hospital of Soochow University, Changzhou First People's Hospital, Changzhou, Jiangsu, China (J.-W.F., Y.J.).
- Department of Gastrointestinal Surgery, Southeast University Affiliated Xuzhou Central Hospital, Xuzhou, Jiangsu, China (Y.-L.Z.).
- Department of Ultrasound, The Second Affiliated Hospital of Soochow University, Suzhou, Jiangsu, China (X.-F.G., L.Z., S.-Y.Z.). Electronic address: [email protected].
Abstract
Preoperative differentiation between follicular thyroid carcinoma (FTC) and follicular thyroid adenoma (FTA) remains challenging because of overlapping cytological and ultrasonographic features. This study aimed to develop and validate a multimodal deep learning model integrating ultrasound images, fine-needle aspiration cytology (FNAC) images, and clinical features for preoperative differentiation of FTC from FTA in patients with cytologically indeterminate follicular thyroid neoplasms.

This retrospective multicenter study included 714 patients with pathologically confirmed follicular thyroid neoplasms from three medical centers. Patients were divided into a training set (n = 304), an internal validation set (n = 130), and two external validation sets (n = 201 and n = 79). The multimodal model employed a Swin Transformer for ultrasound feature extraction (intratumoral and peritumoral regions), attention-based multiple instance learning for cytological image processing, and a self-attention multilayer perceptron for clinical feature encoding. Cross-modal feature fusion was achieved through a Transformer module. Model performance was evaluated using the area under the receiver operating characteristic curve (AUC), calibration curves, and decision curve analysis (DCA).

The multimodal fusion model achieved AUCs of 0.947, 0.933, 0.936, and 0.928 in the training set, internal validation set, external validation set 1, and external validation set 2, respectively. Compared with the unimodal models, the multimodal model showed significant improvements (all P < 0.001): AUC increased by 0.080-0.098 versus the ultrasound model, 0.165-0.183 versus the cytological model, and 0.236-0.254 versus the clinical model. Adding the peritumoral region improved the ultrasound model's AUC by 0.042-0.043 (P < 0.05). Modality contribution analysis revealed that the peritumoral ultrasound attention weight was significantly higher in FTC than in FTA cases (29.3-30.5% vs. 23.9-24.6%, P < 0.001).
DCA demonstrated a superior net benefit for the multimodal model across threshold probabilities of 0.1-0.8. The multimodal deep learning model integrating ultrasound, cytological, and clinical features demonstrated favorable diagnostic performance for preoperative differentiation between FTC and FTA. The inclusion of the peritumoral region provided significant incremental diagnostic value. This model may serve as an effective auxiliary tool for individualized diagnosis and treatment decision-making in patients with cytologically indeterminate follicular thyroid neoplasms.
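To illustrate the general idea behind the Transformer-based cross-modal fusion and the modality-contribution analysis described above, the following is a minimal NumPy sketch. It is not the authors' implementation: the embedding dimension, the four modality tokens (intratumoral ultrasound, peritumoral ultrasound, MIL-pooled cytology, clinical), and the single-head attention layer are all illustrative assumptions standing in for the paper's trained encoders and fusion module.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(tokens, Wq, Wk, Wv):
    # Scaled dot-product self-attention over the modality tokens:
    # each modality attends to every other, mixing cross-modal information.
    Q, K, V = tokens @ Wq, tokens @ Wk, tokens @ Wv
    d = Q.shape[-1]
    attn = softmax(Q @ K.T / np.sqrt(d))  # (4, 4) attention matrix
    return attn @ V, attn

d = 16  # illustrative shared embedding dimension
# Hypothetical per-modality embeddings (stand-ins for the paper's encoders):
# row 0: intratumoral US, row 1: peritumoral US,
# row 2: MIL-pooled cytology, row 3: clinical features.
tokens = rng.standard_normal((4, d))

# Randomly initialized projections; in practice these are learned.
Wq, Wk, Wv = (rng.standard_normal((d, d)) * 0.1 for _ in range(3))
fused, attn = self_attention(tokens, Wq, Wk, Wv)

# A rough analogue of the modality-weight analysis: the average attention
# mass each modality token receives from all tokens (columns of attn).
contrib = attn.mean(axis=0)
print(contrib.round(3))
```

In this sketch the contributions sum to 1 by construction, so comparing them across cases is analogous to the paper's observation that peritumoral ultrasound carried more weight in FTC than in FTA.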