Hybrid metaheuristic feature selection for breast cancer detection in digital mammography: a radiomics and deep learning pilot feasibility study.
Affiliations
- Department of Medical Laboratory Sciences, College of Applied Medical Sciences, Shaqra University, Shaqra, Saudi Arabia.
Abstract
Artificial intelligence (AI) can improve breast cancer detection in mammography, but high-dimensional feature spaces and feature-selection instability remain challenging. This study developed a hybrid metaheuristic feature-selection framework that combines radiomic and deep learning features, and evaluated its methodological feasibility on a small pilot subset of real mammograms and on a controlled synthetic comparison designed to test its behavior under collapse-prone conditions, in which the classifier risks degenerating to zero sensitivity. From the public CBIS-DDSM dataset, 2,051 Image Biomarker Standardization Initiative (IBSI)-compliant radiomic features and 2,048 deep features from a pretrained, non-fine-tuned EfficientNet-B5 model were extracted for each lesion region of interest (ROI), giving 4,099 candidate features per ROI. A hybrid Grasshopper Optimization Algorithm and Crow Search Algorithm (GOA-CSA), guided by a proposed multi-constraint fitness function, searched for a compact, discriminative feature subset for a multilayer perceptron (MLP) classifier. Performance was assessed on a small CBIS-DDSM pilot subset (n = 22, 5-fold stratified cross-validation) and on a synthetic dataset (N = 16 samples, D = 1,114 features) designed to compare the proposed fitness with a legacy fitness under collapse-prone conditions. On the CBIS-DDSM pilot, the hybrid GOA-CSA model selected an average of 486 features and achieved a cross-validated area under the receiver operating characteristic curve (AUC) of 0.750 ± 0.433 and a sensitivity of 0.433 ± 0.435, both below the all-features baseline (AUC 0.900 ± 0.224; sensitivity 0.667 ± 0.471). In the synthetic comparison, the proposed fitness achieved an AUC of 0.810 ± 0.115 and a sensitivity of 0.571 ± 0.198, versus 0.476 ± 0.210 and 0.286 ± 0.241, respectively, for the legacy fitness. The collapse-prevention penalty was implemented but was never triggered in this pilot, because both models maintained non-zero sensitivity.
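The abstract does not give the form of either fitness function. As a hedged illustration of why a size-penalized, accuracy-based fitness can favor a collapsed subset, the sketch below compares a common wrapper fitness from the metaheuristic feature-selection literature, α·error + (1 − α)·|S|/|D|, assumed here to stand in for the legacy fitness, with a hypothetical multi-constraint fitness that adds AUC and sensitivity terms and a collapse-prevention penalty. All weights, thresholds, candidate metrics, and names (`FoldMetrics`, `legacy_fitness`, `multi_constraint_fitness`) are assumptions, not the study's values.

```python
"""Hypothetical legacy vs. multi-constraint wrapper fitness (lower is better).

Weights, thresholds and candidate metrics are illustrative assumptions,
not the values used in the study.
"""
from dataclasses import dataclass


@dataclass
class FoldMetrics:
    auc: float          # cross-validated AUC of the classifier on a candidate subset
    sensitivity: float  # recall on the malignant class
    accuracy: float     # overall accuracy


def legacy_fitness(m: FoldMetrics, n_selected: int, n_total: int,
                   alpha: float = 0.99) -> float:
    """Assumed legacy form: alpha * error + (1 - alpha) * |S| / |D|."""
    return alpha * (1.0 - m.accuracy) + (1.0 - alpha) * n_selected / n_total


def multi_constraint_fitness(m: FoldMetrics, n_selected: int, n_total: int,
                             w_auc: float = 0.6, w_sens: float = 0.3,
                             w_size: float = 0.1, min_features: int = 1,
                             collapse_penalty: float = 1.0) -> float:
    """Hypothetical multi-constraint form with a collapse-prevention penalty."""
    if n_selected < min_features:
        return float("inf")  # empty subsets are treated as infeasible
    cost = (w_auc * (1.0 - m.auc)
            + w_sens * (1.0 - m.sensitivity)
            + w_size * n_selected / n_total)
    if m.sensitivity == 0.0:
        # Penalty fires only when the model never flags a malignant ROI.
        cost += collapse_penalty
    return cost


# Placeholder candidates on an imbalanced fold (80% benign); not study data.
collapsed = FoldMetrics(auc=0.5, sensitivity=0.0, accuracy=0.8)  # predicts "benign" for every ROI
balanced = FoldMetrics(auc=0.8, sensitivity=0.6, accuracy=0.75)
N_TOTAL = 2051 + 2048  # radiomic + deep features per ROI

for name, m, k in [("collapsed", collapsed, 10), ("balanced", balanced, 486)]:
    print(name,
          round(legacy_fitness(m, k, N_TOTAL), 4),
          round(multi_constraint_fitness(m, k, N_TOTAL), 4))
```

With these placeholder numbers, the legacy fitness ranks the collapsed candidate better because its error is lower under class imbalance, whereas the multi-constraint fitness ranks the non-collapsed candidate better even before the penalty term is added.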
This pilot feasibility study demonstrates that the hybrid GOA-CSA framework can identify compact feature subsets combining radiomic and deep features. The results are exploratory and hypothesis-generating, and the small real-data sample (n = 22) precludes definitive performance evaluation. The synthetic experiment supports the conceptual value of the multi-constraint fitness design, but the collapse-prevention penalty remains empirically unvalidated on real mammography data. External validation on independent cohorts such as VinDr-Mammo is a priority for future work.
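To make the sample-size limitation concrete, the sketch below splits a hypothetical 22-ROI cohort with an assumed 11/11 benign/malignant split into five stratified folds using a simple round-robin scheme (not scikit-learn's `StratifiedKFold`), then reports the smallest possible change in each fold's AUC, 1/(n_pos · n_neg). With only 4 or 5 ROIs per test fold, one misranked benign-malignant pair moves a fold's AUC by 1/6 or 1/4, which is consistent with the large standard deviations reported in the abstract. The class split, the placeholder fold AUCs, and the use of the population SD are assumptions; the abstract does not state which SD was computed.

```python
"""Hypothetical stratified 5-fold split of a 22-ROI pilot and AUC granularity.

The 11/11 class split and the per-fold AUCs are placeholders, not study data.
"""
import random
import statistics
from fractions import Fraction


def stratified_folds(labels, k=5, seed=0):
    """Deal each class's shuffled indices round-robin across k folds."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    offset = 0
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[(offset + j) % k].append(i)
        offset += len(idx)  # next class continues where this one stopped, evening out fold sizes
    return folds


def auc_step(n_pos, n_neg):
    """Smallest possible AUC change: one of n_pos * n_neg pairs re-ranked."""
    return Fraction(1, n_pos * n_neg)


def mean_sd(values):
    """Mean and population SD of per-fold metrics."""
    return statistics.mean(values), statistics.pstdev(values)


labels = [0] * 11 + [1] * 11  # hypothetical 11 benign / 11 malignant pilot
folds = stratified_folds(labels)
for f, fold in enumerate(folds):
    n_pos = sum(labels[i] for i in fold)
    n_neg = len(fold) - n_pos
    print(f"fold {f}: {len(fold)} ROIs ({n_pos} malignant), AUC step = {auc_step(n_pos, n_neg)}")

print(mean_sd([1.0, 0.5, 1.0, 0.75, 0.25]))  # placeholder fold AUCs, not study results
```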