Could a New Method of Acromiohumeral Distance Measurement Emerge? Artificial Intelligence vs. Physician.

Authors

Dede BT, Çakar İ, Oğuz M, Alyanak B, Bağcıer F

Affiliations (5)

  • Department of Physical Medicine and Rehabilitation, Prof. Dr. Cemil Tascioglu City Hospital, Istanbul, Turkey.
  • Department of Radiology, Basaksehir Cam and Sakura City Hospital, Istanbul, Turkey.
  • Department of Physical Medicine and Rehabilitation, Istanbul Training and Research Hospital, Istanbul, Turkey.
  • Department of Physical Medicine and Rehabilitation, Golcuk Necati Celik State Hospital, Kocaeli, Turkey.
  • Department of Physical Medicine and Rehabilitation, Basaksehir Cam and Sakura City Hospital, Istanbul, Turkey.

Abstract

The aim of this study was to evaluate the reliability of ChatGPT-4 in measuring acromiohumeral distance (AHD), a commonly used assessment in patients with shoulder pain. In this retrospective study, 71 registered shoulder magnetic resonance imaging (MRI) scans were included. AHD was measured on a coronal oblique T1 sequence with a clear view of the acromion and humerus. An experienced radiologist measured each scan twice, 3 days apart, and ChatGPT-4 measured each scan twice, 3 days apart, in separate sessions. The first, second, and mean AHD values measured by the physician were 7.6 ± 1.7, 7.5 ± 1.6, and 7.6 ± 1.7, respectively. The first, second, and mean values measured by ChatGPT-4 were 6.7 ± 0.8, 7.3 ± 1.1, and 7.1 ± 0.8, respectively. The physician and ChatGPT-4 differed significantly for the first and mean measurements (p < 0.0001 and p = 0.009, respectively), but not for the second measurements (p = 0.220). Intrarater reliability was excellent for the physician (ICC = 0.99) and poor for ChatGPT-4 (ICC = 0.41). Interrater reliability between the physician and ChatGPT-4 was poor (ICC = 0.45). In conclusion, this study demonstrated that the reliability of ChatGPT-4 in AHD measurement is inferior to that of an experienced radiologist. These findings may help inform the possible future contribution of large language models to medical science.
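
For readers unfamiliar with the intraclass correlation coefficient (ICC) reported above, the following is a minimal sketch of how one common form can be computed from repeated measurements. The abstract does not state which ICC model the authors used, so the choice here of ICC(2,1) (two-way random effects, absolute agreement, single measurement) is an assumption. The function name `icc_2_1` and the sample values are hypothetical and are not data from the study.

```python
def icc_2_1(ratings):
    """Compute ICC(2,1) from a table of measurements.

    ratings: list of rows, one row per subject (scan), one column per
    rater or session. Assumes a complete table with at least two
    subjects and two columns.
    """
    n = len(ratings)        # number of subjects
    k = len(ratings[0])     # number of raters or sessions

    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    # Mean squares
    msr = ss_rows / (n - 1)               # between subjects
    msc = ss_cols / (k - 1)               # between raters/sessions
    mse = ss_err / ((n - 1) * (k - 1))    # residual

    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


# Hypothetical example: two measurement sessions on five scans
# (illustrative values only, not taken from the study).
session_pairs = [
    [7.2, 7.4],
    [6.1, 6.0],
    [8.5, 8.3],
    [5.9, 6.2],
    [9.0, 8.8],
]
print(round(icc_2_1(session_pairs), 2))
```

Passing the two sessions of a single rater as columns gives an intrarater estimate; passing one column per rater gives an interrater estimate. Other ICC forms, such as consistency rather than absolute agreement, or averaged rather than single measurements, would yield different values from the same data.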

Topics

Journal Article
