Comparative diagnostic accuracy of ChatGPT-4 and machine learning in differentiating spinal tuberculosis and spinal tumors.

Authors

Hu X, Xu D, Zhang H, Tang M, Gao Q

Affiliations (5)

  • Department of Spine Surgery and Orthopaedics, Xiangya Hospital, Central South University, Changsha 410008, China; Department of Orthopedics, The Second Xiangya Hospital of Central South University, Changsha, 410011, Hunan, China.
  • Department of Spine Surgery and Orthopaedics, Xiangya Hospital, Central South University, Changsha 410008, China; Department of Spine Surgery, The Third Xiangya Hospital, Central South University, Changsha, Hunan, 410013, China.
  • Department of Spine Surgery and Orthopaedics, Xiangya Hospital, Central South University, Changsha 410008, China; National Clinical Research Center for Geriatric Disorders, Xiangya Hospital, Central South University, Changsha 410008, China.
  • Department of Spine Surgery and Orthopaedics, Xiangya Hospital, Central South University, Changsha 410008, China; National Clinical Research Center for Geriatric Disorders, Xiangya Hospital, Central South University, Changsha 410008, China. Electronic address: [email protected].
  • Department of Spine Surgery and Orthopaedics, Xiangya Hospital, Central South University, Changsha 410008, China; National Clinical Research Center for Geriatric Disorders, Xiangya Hospital, Central South University, Changsha 410008, China. Electronic address: [email protected].

Abstract

In clinical practice, distinguishing spinal tuberculosis (STB) from spinal tumors (ST) poses a significant diagnostic challenge. AI-driven large language models (LLMs) show great potential for improving the accuracy of this differential diagnosis. This study aimed to evaluate the performance of several machine learning models and ChatGPT-4 in distinguishing STB from ST. In this retrospective cohort study, 143 STB cases and 153 ST cases admitted to Xiangya Hospital, Central South University, from January 2016 to June 2023 were collected. The data comprised basic patient information, standard laboratory results, serum tumor markers, and comprehensive imaging records, including magnetic resonance imaging (MRI) and computed tomography (CT). Six machine learning models and ChatGPT-4 were applied separately to the differential diagnosis, and their diagnostic performance was compared. Among the six machine learning models, the Gradient Boosting Machine (GBM) model showed the highest diagnostic performance. In the training cohort, the GBM model achieved a sensitivity of 98.84% and a specificity of 100.00% in distinguishing STB from ST. In the testing cohort, its sensitivity was 98.25% and its specificity was 91.80%. ChatGPT-4 achieved a sensitivity of 70.37% and a specificity of 90.65%. In single-question cases, ChatGPT-4's sensitivity and specificity were 71.67% and 92.55%, respectively, whereas in re-questioning cases they were 44.44% and 76.92%. The GBM model demonstrates significant value in the differential diagnosis of STB and ST, whereas the diagnostic performance of ChatGPT-4 remains suboptimal.
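For readers less familiar with the two metrics reported above, the following minimal Python sketch shows how sensitivity and specificity are computed from binary predictions. It is an illustration only, not the authors' code: the function name `sensitivity_specificity`, the coding of one class as the positive label (1), and the toy labels are all assumptions, and the numbers are not study data.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for binary labels.

    Assumption: 1 marks the positive class and 0 the negative class.
    The abstract does not state which diagnosis was coded as positive.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must appear in y_true")
    sensitivity = tp / (tp + fn)  # share of positives correctly identified
    specificity = tn / (tn + fp)  # share of negatives correctly identified
    return sensitivity, specificity


# Toy labels for illustration only (hypothetical, not from the study).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1]
sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity={sens:.2%}, specificity={spec:.2%}")
```

With these toy labels, three of four positives and four of five negatives are classified correctly, so the sketch prints a sensitivity of 75.00% and a specificity of 80.00%.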

Topics

  • Machine Learning
  • Spinal Neoplasms
  • Tuberculosis, Spinal
  • Journal Article
  • Comparative Study
