Performance Comparison of Machine Learning Using Radiomic Features and CNN-Based Deep Learning in Benign and Malignant Classification of Vertebral Compression Fractures Using CT Scans.
Authors
Affiliations (7)
- Department of Bio-Health Medical Engineering, Gil Medical Center, Gachon University, Incheon, Republic of Korea.
- Department of Radiology, Gil Medical Center, Gachon University School of Medicine, 21, Namdong-Daero 774Beon-Gil, Namdong-Gu, Incheon, Republic of Korea.
- Gachon Biomedical & Convergence Institute, Gachon University Gil Medical Center, Incheon, Republic of Korea.
- Department of Radiology, Gil Medical Center, Gachon University School of Medicine, 21, Namdong-Daero 774Beon-Gil, Namdong-Gu, Incheon, Republic of Korea.
- Department of Biomedical Engineering, Gachon University, 191, Hambangmoe-Ro, Yeonsu-Gu, Incheon, 21936, Republic of Korea.
- Department of Biomedical Engineering, Gachon University College of Medicine, Gil Medical Center, 38-13 Docjeom-Ro 3 Beon-Gil, Namdong-Gu, Incheon, 21565, Republic of Korea.
- Department of Health Sciences and Technology, Gachon Advanced Institute for Health Sciences and Technology (GAIHST), Gachon University, Seongnam-Si 13120, Republic of Korea.
Abstract
Distinguishing benign from malignant vertebral compression fractures (VCFs) is critical for clinical management but remains challenging on contrast-enhanced abdominal CT, which lacks the soft tissue contrast of MRI. This study evaluates and compares radiomic feature-based machine learning (ML) and convolutional neural network (CNN)-based deep learning (DL) models for classifying VCFs on abdominal CT. A retrospective cohort of 447 VCFs (196 benign, 251 malignant) from 286 patients was analyzed. Radiomic features were extracted using PyRadiomics, and Recursive Feature Elimination selected six key texture-based features (e.g., Run Variance, Dependence Non-Uniformity Normalized), highlighting textural heterogeneity as a malignancy marker. ML models (XGBoost, SVM, KNN, Random Forest) and a 3D CNN were trained on the CT data, and performance was assessed via precision, recall, F1 score, accuracy, and AUC. The DL model achieved marginally superior overall performance, with a statistically significantly higher AUC (77.66% vs. 75.91%, p < 0.05) and better precision, F1 score, and accuracy than the top-performing ML model (XGBoost). The DL model's attention maps localized diagnostically relevant regions, mimicking radiologists' focus, whereas radiomics lacked spatial interpretability despite offering quantifiable biomarkers. This study underscores the complementary strengths of the two approaches: radiomics provides interpretable features tied to tumor heterogeneity, while DL autonomously extracts high-dimensional patterns with spatial explainability. Integrating both approaches could enhance diagnostic accuracy and clinician trust in abdominal CT-based VCF assessment. Limitations include the retrospective single-center design and potential selection bias. Future multi-center studies with diverse protocols and histopathological validation are warranted to generalize these findings.
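The five evaluation metrics named in the abstract can be illustrated with a minimal sketch. This is not the authors' evaluation code. The function names, the 0.5 decision threshold, and the encoding of malignant as 1 and benign as 0 are assumptions made here for illustration. AUC is computed with the pairwise (Mann-Whitney) formulation: the fraction of malignant/benign pairs in which the malignant case receives the higher score.

```python
from typing import Dict, Sequence


def classification_metrics(
    y_true: Sequence[int], y_score: Sequence[float], threshold: float = 0.5
) -> Dict[str, float]:
    """Precision, recall, F1, and accuracy at a fixed threshold.

    Assumption: label 1 = malignant, label 0 = benign.
    """
    if not y_true or len(y_true) != len(y_score):
        raise ValueError("y_true and y_score must be non-empty and equal in length")
    y_pred = [1 if s >= threshold else 0 for s in y_score]
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    # Guard against empty denominators (no predicted or no actual positives).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    accuracy = (tp + tn) / len(y_true)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


def roc_auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AUC as the probability that a positive outscores a negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy labels and scores, invented for illustration; not data from the study.
    labels = [1, 1, 1, 0, 0, 0, 1, 0]
    scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.7, 0.1]
    print(classification_metrics(labels, scores))
    print("AUC:", roc_auc(labels, scores))
```

Precision, recall, F1, and accuracy depend on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is one reason an AUC comparison between two models can be tested separately from the thresholded metrics.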