Development of a machine learning predictive model for early detection of breast cancer.
Authors
Affiliations (3)
- Department of Health Information Management, Manipal College of Health Professions, Manipal Academy of Higher Education, Manipal, Karnataka, 576104, India.
- Department of Medical Imaging Technology, Manipal College of Health Professions, Manipal Academy of Higher Education, Manipal, Karnataka, 576104, India.
- School of Information Science, Manipal Academy of Higher Education, Manipal, Karnataka, 576104, India.
Abstract
Breast cancer remains a significant global health concern, with over 7.8 million cases reported in the last five years. Early detection and accurate classification are crucial for reducing mortality rates and improving outcomes. Machine learning (ML) has emerged as a transformative tool in medical imaging, enabling more efficient and accurate diagnostic processes. This study aims to develop a machine learning-based predictive model for early detection and classification of breast cancer using the Wisconsin Breast Cancer Diagnostic dataset. The dataset, comprising 569 samples and 32 features derived from fine needle aspirate biopsy images, was pre-processed through data cleaning, normalization using the Robust Scaler, and feature selection. Five supervised ML algorithms were implemented: Logistic Regression, Support Vector Classification (SVC) with linear and radial basis function (RBF) kernels, Decision Tree, and Random Forest. Models were evaluated using accuracy, precision, sensitivity, specificity, and F1 score. The SVC-RBF model demonstrated the highest accuracy (98.68%) and balanced performance across the other metrics, making it the most effective classifier for distinguishing between benign and malignant tumors. Key features such as texture mean and area (worst) contributed significantly to classification accuracy. This study highlights the potential of ML algorithms, particularly SVC-RBF, to improve the accuracy and efficiency of breast cancer diagnostics. Future research should validate these findings with diverse datasets and explore their integration into clinical workflows to enhance decision-making and patient care.
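The workflow described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes scikit-learn and uses the copy of the Wisconsin Diagnostic dataset bundled with that library (569 samples, 30 numeric features after the ID and diagnosis columns are removed). The 80/20 stratified split, the random seed, the default hyperparameters, and the function name `evaluate_models` are all assumptions made for illustration, and the feature selection step is omitted because the abstract does not specify the method. Results from this sketch should not be expected to match the reported figures.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import RobustScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def evaluate_models(test_size=0.2, seed=42):
    """Train five classifiers on scaled data and return their test metrics.

    test_size and seed are illustrative assumptions, not values from the study.
    """
    data = load_breast_cancer()
    X = data.data
    # scikit-learn encodes malignant as 0 and benign as 1; flip the labels so
    # that malignant is the positive class (1) for sensitivity and specificity.
    y = 1 - data.target

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, stratify=y, random_state=seed
    )

    # Default hyperparameters throughout (an assumption).
    models = {
        "Logistic Regression": LogisticRegression(max_iter=1000),
        "SVC (linear)": SVC(kernel="linear"),
        "SVC (RBF)": SVC(kernel="rbf"),
        "Decision Tree": DecisionTreeClassifier(random_state=seed),
        "Random Forest": RandomForestClassifier(n_estimators=100,
                                                random_state=seed),
    }

    results = {}
    for name, model in models.items():
        # RobustScaler centres on the median and scales by the interquartile
        # range; the pipeline fits it on the training split only.
        pipeline = make_pipeline(RobustScaler(), model)
        pipeline.fit(X_train, y_train)
        pred = pipeline.predict(X_test)

        tn, fp, fn, tp = confusion_matrix(y_test, pred, labels=[0, 1]).ravel()
        results[name] = {
            "accuracy": accuracy_score(y_test, pred),
            "precision": precision_score(y_test, pred),
            "sensitivity": recall_score(y_test, pred),  # tp / (tp + fn)
            "specificity": tn / (tn + fp),
            "f1": f1_score(y_test, pred),
        }
    return results


if __name__ == "__main__":
    for name, metrics in evaluate_models().items():
        summary = ", ".join(f"{k}={v:.3f}" for k, v in metrics.items())
        print(f"{name}: {summary}")
```

Placing the scaler inside the pipeline keeps test-set statistics out of the fitted scaling parameters, which is a common precaution when comparing classifiers on a small dataset such as this one.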