The impact of updated imaging software on the performance of machine learning models for breast cancer diagnosis: a multi-center, retrospective study.

Authors

Cai L, Golatta M, Sidey-Gibbons C, Barr RG, Pfob A

Affiliations (8)

  • Department of Obstetrics and Gynecology, Breast Cancer Center, Heidelberg University Hospital, Im Neuenheimer Feld 440, 69120, Heidelberg, Germany.
  • Breast Centre Heidelberg, Klinik St. Elisabeth, Heidelberg, Germany.
  • MD Anderson Center for INSPiRED Cancer Care (Integrated Systems for Patient-Reported Data), The University of Texas MD Anderson Cancer Center, Houston, TX, USA.
  • Department of Symptom Research, The University of Texas MD Anderson Cancer Center, Houston, TX, USA.
  • Department of Radiology, Northeast Ohio Medical University, Ravenna, OH, USA.
  • Department of Obstetrics and Gynecology, Breast Cancer Center, Heidelberg University Hospital, Im Neuenheimer Feld 440, 69120, Heidelberg, Germany. [email protected].
  • Breast Centre Heidelberg, Klinik St. Elisabeth, Heidelberg, Germany. [email protected].
  • National Center for Tumor Diseases (NCT) and German Cancer Research Center (DKFZ), Heidelberg, Germany. [email protected].

Abstract

Artificial intelligence (AI) models based on medical (imaging) data are increasingly being developed. However, the imaging software with which the original data are generated is frequently updated. The impact of updated imaging software on the performance of AI models is unclear. We aimed to develop machine learning models using shear wave elastography (SWE) data to identify malignant breast lesions and to test the models' generalizability by validating them on external data generated by both the original and updated software versions. We developed and validated different machine learning models (GLM, MARS, XGBoost, SVM) on multicenter, international SWE data (NCT02638935) using tenfold cross-validation. Findings were compared to the histopathologic evaluation of the biopsy specimen or 2-year follow-up. The outcome measure was the area under the receiver operating characteristic curve (AUROC). We included 1288 cases in the development set, generated with the original imaging software, and 385 cases in the external validation set, generated with both the original and updated software. In the external validation set, the GLM and XGBoost models performed better on the updated software data than on the original software data (AUROC 0.941 vs. 0.902, p < 0.001, and 0.934 vs. 0.872, p < 0.001). The MARS model performed worse on the updated software data (0.847 vs. 0.894, p = 0.045). The SVM model was not calibrated. In this multicenter study using SWE data, some machine learning models demonstrated strong potential to bridge the gap between original and updated software, whereas others exhibited weak generalizability.
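The abstract describes a standard development-and-external-validation workflow: train candidate models with tenfold cross-validation on data from the original software, then report AUROC on an external set that includes data from the updated software. The sketch below illustrates that workflow for one of the named model families (XGBoost); it is not the authors' code, and the file names, feature layout, and label column ("malignant") are hypothetical placeholders.

```python
# Minimal sketch of the evaluation workflow described in the abstract.
# Assumptions (not from the paper): CSV inputs with SWE-derived feature
# columns and a binary "malignant" label; hypothetical file names.
import pandas as pd
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.metrics import roc_auc_score
from xgboost import XGBClassifier

# Hypothetical inputs: development set (original software, n = 1288) and
# external validation set (original + updated software, n = 385).
dev = pd.read_csv("development_set.csv")
val = pd.read_csv("validation_set.csv")

X_dev, y_dev = dev.drop(columns=["malignant"]), dev["malignant"]
X_val, y_val = val.drop(columns=["malignant"]), val["malignant"]

model = XGBClassifier(eval_metric="logloss")

# Tenfold cross-validation on the development set, scored by AUROC.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
cv_auc = cross_val_score(model, X_dev, y_dev, cv=cv, scoring="roc_auc")
print(f"Development 10-fold AUROC: {cv_auc.mean():.3f} +/- {cv_auc.std():.3f}")

# Refit on the full development set, then score the external set. Splitting
# X_val by software version would reproduce the paper's per-version AUROCs.
model.fit(X_dev, y_dev)
val_auc = roc_auc_score(y_val, model.predict_proba(X_val)[:, 1])
print(f"External validation AUROC: {val_auc:.3f}")
```

In practice, the external validation set would be stratified by imaging software version so that AUROC can be compared between the original and updated data, as in the results reported above.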

Topics

Breast Neoplasms, Machine Learning, Software, Elasticity Imaging Techniques, Journal Article, Multicenter Study
