Performance Comparison of Machine Learning Using Radiomic Features and CNN-Based Deep Learning in Benign and Malignant Classification of Vertebral Compression Fractures Using CT Scans.

Authors

Yeom JC, Park SH, Kim YJ, Ahn TR, Kim KG

Affiliations (7)

  • Department of Bio-Health Medical Engineering, Gil Medical Center, Gachon University, Incheon, Republic of Korea.
  • Department of Radiology, Gil Medical Center, Gachon University School of Medicine, 21, Namdong-Daero 774Beon-Gil, Namdong-Gu, Incheon, Republic of Korea.
  • Gachon Biomedical & Convergence Institute, Gachon University Gil Medical Center, Incheon, Republic of Korea.
  • Department of Radiology, Gil Medical Center, Gachon University School of Medicine, 21, Namdong-Daero 774Beon-Gil, Namdong-Gu, Incheon, Republic of Korea.
  • Department of Biomedical Engineering, Gachon University, 191, Hambangmoe-Ro, Yeonsu-Gu, Incheon, 21936, Republic of Korea.
  • Department of Biomedical Engineering, Gachon University College of Medicine, Gil Medical Center, 38-13 Docjeom-Ro 3 Beon-Gil, Namdong-Gu, Incheon, 21565, Republic of Korea.
  • Department of Health Sciences and Technology, Gachon Advanced Institute for Health Sciences and Technology (GAIHST), Gachon University, Seongnam-Si 13120, Republic of Korea.

Abstract

Distinguishing benign from malignant vertebral compression fractures (VCFs) is critical for clinical management but remains challenging on contrast-enhanced abdominal CT, which lacks the soft tissue contrast of MRI. This study evaluates and compares radiomic feature-based machine learning (ML) and convolutional neural network (CNN)-based deep learning (DL) models for classifying VCFs on abdominal CT. A retrospective cohort of 447 VCFs (196 benign, 251 malignant) from 286 patients was analyzed. Radiomic features were extracted using PyRadiomics, and Recursive Feature Elimination selected six key texture-based features (e.g., Run Variance, Dependence Non-Uniformity Normalized), highlighting textural heterogeneity as a malignancy marker. ML models (XGBoost, SVM, KNN, Random Forest) and a 3D CNN were trained on the CT data, and performance was assessed via precision, recall, F1 score, accuracy, and AUC. The DL model achieved marginally superior overall performance, with a statistically significantly higher AUC (77.66% vs. 75.91%, p < 0.05) and better precision, F1 score, and accuracy than the top-performing ML model (XGBoost). The DL model's attention maps localized diagnostically relevant regions, mimicking radiologists' focus, whereas radiomics lacked spatial interpretability despite offering quantifiable biomarkers. This study underscores the complementary strengths of the two approaches: radiomics provides interpretable features tied to tumor heterogeneity, while DL autonomously extracts high-dimensional patterns with spatial explainability. Integrating both approaches could enhance diagnostic accuracy and clinician trust in abdominal CT-based VCF assessment. Limitations include retrospective single-center data and potential selection bias. Future multi-center studies with diverse protocols and histopathological validation are warranted to generalize these findings.
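The radiomics arm of the pipeline described above (feature selection by Recursive Feature Elimination, then a classifier scored on precision, recall, F1 score, accuracy, and AUC) can be illustrated with a minimal sketch. This is not the authors' code. It assumes scikit-learn, replaces the PyRadiomics feature table with synthetic data of roughly the same size and class balance, and uses a Random Forest where the study's best model was XGBoost. The 60-feature width, the 80/20 split, and all hyperparameters are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

# Synthetic stand-in for a radiomic feature table (assumption):
# 447 rows, 60 features, about 196 benign (0) vs. 251 malignant (1).
X, y = make_classification(n_samples=447, n_features=60, n_informative=6,
                           weights=[196 / 447], random_state=0)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# RFE sits inside the pipeline so features are selected on training data only.
pipeline = Pipeline([
    ("select", RFE(RandomForestClassifier(n_estimators=50, random_state=0),
                   n_features_to_select=6, step=5)),
    ("classify", RandomForestClassifier(n_estimators=200, random_state=0)),
])
pipeline.fit(X_train, y_train)

y_pred = pipeline.predict(X_test)
y_score = pipeline.predict_proba(X_test)[:, 1]  # probability of class 1

metrics = {
    "precision": precision_score(y_test, y_pred),
    "recall": recall_score(y_test, y_pred),
    "f1": f1_score(y_test, y_pred),
    "accuracy": accuracy_score(y_test, y_pred),
    "auc": roc_auc_score(y_test, y_score),
}

# Column indices of the six features RFE kept.
selected = np.flatnonzero(pipeline.named_steps["select"].support_)
print("selected feature indices:", selected)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

The numbers this prints describe the synthetic data only and say nothing about the study's results. With real data in which one patient contributes several fractures, splitting by patient rather than by fracture would be the more cautious choice.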

Topics

Journal Article
