A Machine Learning Model for Predicting the HER2 Positive Expression of Breast Cancer Based on Clinicopathological and Imaging Features.

Authors

Qin X,Yang W,Zhou X,Yang Y,Zhang N

Affiliations (5)

  • College of Clinical Medicine, Ningxia Medical University, Yinchuan 750004, PR China (X.Q., X.Z.).
  • Department of Radiology, General Hospital of Ningxia Medical University, Yinchuan 750004, PR China (W.Y.).
  • College of Clinical Medicine, Ningxia Medical University, Yinchuan 750004, PR China (X.Q., X.Z.).
  • Information Technology Center, 32752 Troop, Xiangyang 441000, PR China (Y.Y.).
  • Department of Pathology, General Hospital of Ningxia Medical University, Yinchuan 750004, PR China (N.Z.).

Abstract

This study aimed to develop a machine learning (ML) model based on clinicopathological and imaging features to predict human epidermal growth factor receptor 2 (HER2) positive expression (HER2-p) in breast cancer (BC), and to compare its performance with that of a logistic regression (LR) model.
A total of 2541 consecutive female patients with pathologically confirmed primary breast lesions were enrolled. In chronological order, 2034 patients treated between January 2018 and December 2022 formed the retrospective development cohort, and 507 patients treated between January 2023 and May 2024 formed the prospective validation cohort. Within the development cohort, patients were randomly divided into a training cohort (n=1628) and a test cohort (n=406) in an 8:2 ratio. Pretreatment mammography (MG) and breast MRI data, along with clinicopathological features, were recorded. Features associated with HER2 positivity in BC were extracted in two ways: with Extreme Gradient Boosting (XGBoost), whose selected features were used to develop an Artificial Neural Network (ANN) model, and with multivariate LR analysis, which was used to develop an LR model. Predictive value was assessed using receiver operating characteristic (ROC) curves.
After Recursive Feature Elimination with Cross-Validation (RFE-CV) was applied for feature dimensionality reduction, the XGBoost algorithm identified tumor size, suspicious calcifications, Ki-67 index, spiculation, and minimum apparent diffusion coefficient (minimum ADC) as the key feature subset indicative of HER2-p in BC. The ANN model consistently outperformed the LR model, achieving an area under the curve (AUC) of 0.853 (95% CI: 0.837-0.872) in the training cohort, 0.821 (95% CI: 0.798-0.853) in the test cohort, and 0.809 (95% CI: 0.776-0.841) in the validation cohort.
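
For readers unfamiliar with how an AUC and its 95% confidence interval can be obtained, the following is a minimal sketch in plain Python. It is not the authors' code: the abstract does not state how the intervals were computed, so the percentile bootstrap used here is an assumption, and the function names (`roc_auc`, `bootstrap_auc_ci`) and the synthetic labels and scores are purely illustrative.

```python
import random


def roc_auc(labels, scores):
    """AUC as the probability that a random positive case scores higher
    than a random negative case (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels, scores, n_boot=500, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (an assumed method,
    not necessarily the one used in the study)."""
    rng = random.Random(seed)
    n = len(labels)
    aucs = []
    while len(aucs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain only one class
            aucs.append(roc_auc(ys, [scores[i] for i in idx]))
    aucs.sort()
    low = aucs[int((alpha / 2) * n_boot)]
    high = aucs[int((1 - alpha / 2) * n_boot) - 1]
    return low, high


# Synthetic example: 1 = HER2-positive, 0 = HER2-negative (made-up data).
rng = random.Random(42)
labels = [1 if rng.random() < 0.3 else 0 for _ in range(200)]
scores = [rng.gauss(1.0 if y else 0.0, 1.0) for y in labels]

auc = roc_auc(labels, scores)
ci_low, ci_high = bootstrap_auc_ci(labels, scores)
print(f"AUC = {auc:.3f} (95% CI: {ci_low:.3f}-{ci_high:.3f})")
```

The pairwise comparison is quadratic in the number of cases, which is acceptable for a sketch; a rank-based formula would be preferable for cohorts of the size reported here.
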
The ANN model, built using the significant feature subsets identified by the XGBoost algorithm with RFE-CV, demonstrates potential in predicting HER2-p in BC.
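
The cohort design described in the abstract (a chronological division into development and validation cohorts, followed by a random 8:2 split of the development cohort) can be sketched as follows. This is an illustration under stated assumptions rather than the study's procedure: the abstract gives only months, so the exact day boundaries are assumed; rounding the training size up is an assumption chosen because it reproduces the reported counts (1628 and 406); and the abstract does not say whether the random split was stratified. All names (`assign_cohort`, `split_train_test`) are hypothetical.

```python
import math
import random
from datetime import date

# Assumed day-level boundaries; the abstract states months only.
DEV_START, DEV_END = date(2018, 1, 1), date(2022, 12, 31)
VAL_START, VAL_END = date(2023, 1, 1), date(2024, 5, 31)


def assign_cohort(treated_on):
    """Assign a patient to a cohort by treatment date."""
    if DEV_START <= treated_on <= DEV_END:
        return "development"
    if VAL_START <= treated_on <= VAL_END:
        return "validation"
    return None  # outside both enrollment windows


def split_train_test(ids, train_fraction=0.8, seed=0):
    """Randomly split identifiers into training and test sets.
    The training size is rounded up (an assumption, see the lead-in)."""
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)
    n_train = math.ceil(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]


# 2034 placeholder patient identifiers standing in for the development cohort.
train_ids, test_ids = split_train_test(range(2034))
print(len(train_ids), len(test_ids))
```

Keeping the later-treated patients entirely out of model development is what allows the validation cohort to serve as a temporally independent check.
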

Topics

  • Breast Neoplasms
  • Machine Learning
  • Receptor, ErbB-2
  • Magnetic Resonance Imaging
  • Journal Article
