Development of a machine learning model using structured and unstructured features for predicting surgery among patients with carpal tunnel syndrome: development and validation.
Authors
Affiliations (4)
- Department of Radiology, Seoul Metropolitan Government-Seoul National University Boramae Medical Center, 20, Boramae-ro 5-gil, Dongjak-gu, Seoul, Republic of Korea.
- Data Science Center, Biomedical Research Institute, Seoul Metropolitan Government-Seoul National University Boramae Medical Center, 20, Boramae-ro 5-gil, Dongjak-gu, Seoul, Republic of Korea.
- Department of Orthopaedic Surgery, Seoul Metropolitan Government-Seoul National University Boramae Medical Center, 20, Boramae-ro 5-gil, Dongjak-gu, Seoul, 07061, Republic of Korea.
- Department of Orthopaedic Surgery, Seoul National University College of Medicine, Seoul, Republic of Korea.
Abstract
Carpal tunnel syndrome (CTS) is the most common entrapment neuropathy and often requires surgery when symptoms persist. However, identifying which patients will require surgery remains challenging. This study aimed to develop a model for predicting surgery in patients with CTS by integrating structured data from a standardized common data model (CDM) with unstructured data from electromyography (EMG) and radiology reports. We retrospectively analyzed 3602 adults diagnosed with CTS at Seoul Metropolitan Boramae Medical Center, of whom 696 (19.3%) underwent surgery within 365 days. Unstructured report text was processed using topic modeling, EMG severity grades were extracted from the EMG reports, and both were combined with the structured CDM data. Predictive models were developed using LASSO logistic regression, gradient boosting machine (GBM), and random forest. Model performance was assessed using the area under the receiver operating characteristic curve (AUROC) and other metrics, and feature importance was evaluated using SHapley Additive exPlanations (SHAP) values. Models incorporating unstructured text and EMG grades outperformed those using structured data alone (AUROC 0.792 vs. 0.759, P < 0.001), with GBM achieving the best performance. SHAP analysis identified EMG grade and key text-derived topics as major predictors. These findings indicate that combining structured clinical data with unstructured text analysis improves prediction of surgery in patients with CTS and can support clinical decision-making through an integrated, data-driven approach.
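As an illustration of the headline metric, the AUROC can be read as the probability that a randomly chosen patient who underwent surgery receives a higher predicted score than a randomly chosen patient who did not. The following is a minimal sketch of that definition, not the authors' implementation: the function name `auroc`, the toy labels, and the toy scores are all hypothetical, and only the cohort counts (696 of 3602) come from the abstract.

```python
def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = surgery within 365 days, 0 = no surgery.
labels = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.2, 0.6, 0.7, 0.1, 0.8]
toy_auroc = auroc(labels, scores)

# Outcome prevalence from the counts reported in the abstract.
surgery_rate = 696 / 3602

print(f"toy AUROC: {toy_auroc:.3f}")
print(f"surgery rate: {surgery_rate:.1%}")
```

On this reading, the reported values of 0.792 and 0.759 would mean that the combined model ranks a surgical patient above a non-surgical patient in roughly 79% of such pairs, against roughly 76% for structured data alone. The pairwise loop is quadratic in cohort size and is meant only to make the definition concrete; a rank-based computation would be preferable at scale.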