The impact of updated imaging software on the performance of machine learning models for breast cancer diagnosis: a multi-center, retrospective study.

Authors

Cai L, Golatta M, Sidey-Gibbons C, Barr RG, Pfob A

Affiliations (8)

  • Department of Obstetrics and Gynecology, Breast Cancer Center, Heidelberg University Hospital, Im Neuenheimer Feld 440, 69120, Heidelberg, Germany.
  • Breast Centre Heidelberg, Klinik St. Elisabeth, Heidelberg, Germany.
  • MD Anderson Center for INSPiRED Cancer Care (Integrated Systems for Patient-Reported Data), The University of Texas MD Anderson Cancer Center, Houston, TX, USA.
  • Department of Symptom Research, The University of Texas MD Anderson Cancer Center, Houston, TX, USA.
  • Department of Radiology, Northeast Ohio Medical University, Ravenna, OH, USA.
  • Department of Obstetrics and Gynecology, Breast Cancer Center, Heidelberg University Hospital, Im Neuenheimer Feld 440, 69120, Heidelberg, Germany. [email protected].
  • Breast Centre Heidelberg, Klinik St. Elisabeth, Heidelberg, Germany. [email protected].
  • National Center for Tumor Diseases (NCT) and German Cancer Research Center (DKFZ), Heidelberg, Germany. [email protected].

Abstract

Artificial intelligence (AI) models based on medical (imaging) data are increasingly being developed. However, the imaging software with which the original data are generated is frequently updated. The impact of updated imaging software on the performance of AI models is unclear. We aimed to develop machine learning models using shear wave elastography (SWE) data to identify malignant breast lesions and to test the models' generalizability by validating them on external data generated by both the original and updated software versions. We developed and validated different machine learning models (GLM, MARS, XGBoost, SVM) on multicenter, international SWE data (NCT02638935) using tenfold cross-validation. Findings were compared to the histopathologic evaluation of the biopsy specimen or 2-year follow-up. The outcome measure was the area under the receiver operating characteristic curve (AUROC). We included 1288 cases in the development set, generated with the original imaging software, and 385 cases in the external validation set, generated with both the original and updated software. In the external validation set, the GLM and XGBoost models performed better on the updated software data than on the original software data (AUROC 0.941 vs. 0.902, p < 0.001, and 0.934 vs. 0.872, p < 0.001). The MARS model performed worse on the updated software data (0.847 vs. 0.894, p = 0.045). The SVM model was not calibrated. In this multicenter study using SWE data, some machine learning models demonstrated strong potential to bridge the gap between original and updated software, whereas others exhibited weak generalizability.
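The abstract describes a standard development-and-external-validation workflow: train candidate models with tenfold cross-validation on data from the original software, then report AUROC on an external set that includes data from the updated software. The sketch below illustrates that workflow for one of the named model families (XGBoost); it is not the authors' code, and the file names, feature layout, and label column ("malignant") are hypothetical placeholders.

```python
# Minimal sketch of the evaluation workflow described in the abstract.
# Assumptions (not from the paper): CSV inputs with SWE-derived feature
# columns and a binary "malignant" label; hypothetical file names.
import pandas as pd
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.metrics import roc_auc_score
from xgboost import XGBClassifier

# Hypothetical inputs: development set (original software, n = 1288) and
# external validation set (original + updated software, n = 385).
dev = pd.read_csv("development_set.csv")
val = pd.read_csv("validation_set.csv")

X_dev, y_dev = dev.drop(columns=["malignant"]), dev["malignant"]
X_val, y_val = val.drop(columns=["malignant"]), val["malignant"]

model = XGBClassifier(eval_metric="logloss")

# Tenfold cross-validation on the development set, scored by AUROC.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
cv_auc = cross_val_score(model, X_dev, y_dev, cv=cv, scoring="roc_auc")
print(f"Development 10-fold AUROC: {cv_auc.mean():.3f} +/- {cv_auc.std():.3f}")

# Refit on the full development set, then score the external set. Splitting
# X_val by software version would reproduce the paper's per-version AUROCs.
model.fit(X_dev, y_dev)
val_auc = roc_auc_score(y_val, model.predict_proba(X_val)[:, 1])
print(f"External validation AUROC: {val_auc:.3f}")
```

In practice, the external validation set would be stratified by imaging software version so that AUROC can be compared between the original and updated data, as in the results reported above.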

Topics

Breast Neoplasms, Machine Learning, Software, Elasticity Imaging Techniques, Journal Article, Multicenter Study
