Application of artificial intelligence chatbots in interpreting magnetic resonance imaging reports: a comparative study.

Authors

Bai X, Feng M, Ma W, Liao Y

Affiliations (2)

  • Department of Neurosurgery, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, No. 1 Shuaifuyuan Hutong, Dongcheng District, Beijing, 100730, China.
  • Department of Neurosurgery, The First Affiliated Hospital, Jinan University, Guangzhou, China.

Abstract

Artificial intelligence (AI) chatbots have emerged as promising tools for enhancing medical communication, yet their efficacy in interpreting complex radiological reports remains underexplored. This study evaluates the performance of AI chatbots in translating magnetic resonance imaging (MRI) reports into patient-friendly language and providing clinical recommendations. A cross-sectional analysis was conducted on 6174 MRI reports from tumor patients across three hospitals. Two AI chatbots, GPT o1-preview (Chatbot 1) and Deepseek-R1 (Chatbot 2), were tasked with interpreting reports, classifying tumor characteristics, assessing surgical necessity, and suggesting treatments. Readability was measured using Flesch-Kincaid and Gunning Fog metrics, and accuracy was evaluated by medical reviewers. Statistical analyses included Friedman and Wilcoxon signed-rank tests. Both chatbots significantly improved readability, with Chatbot 2 achieving higher Flesch-Kincaid Reading Ease scores (median: 58.70 vs. 46.00, p < 0.001) and lower text complexity. Chatbot 2 outperformed Chatbot 1 in diagnostic accuracy (92.05% vs. 89.03% for tumor classification; 95.12% vs. 84.73% for surgical necessity, p < 0.001). Treatment recommendations from Chatbot 2 were more clinically relevant (98.10% vs. 75.41% rated acceptable), and both chatbots were rated highly for empathy (92.82-96.11%). Errors included misinterpretations of medical terminology and occasional hallucinations. AI chatbots, particularly Deepseek-R1, effectively enhance the readability and accuracy of MRI report interpretations for patients. However, physician oversight remains critical to mitigate errors. These tools have the potential to reduce healthcare burdens but require further refinement before clinical integration.
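The readability comparison rests on standard formulas: Flesch Reading Ease (higher means easier), Flesch-Kincaid Grade Level, and the Gunning Fog index (lower means easier). The abstract does not say which tool or syllable counter the authors used, so the following is only a minimal sketch under stated assumptions: it treats the abstract's "Flesch-Kincaid Reading Ease" as the standard Flesch Reading Ease formula, approximates syllables by counting vowel groups, and counts every word of three or more syllables as "complex" for Gunning Fog without the usual exclusions. The names `count_syllables` and `readability` and the two sample sentences are hypothetical. Scores from this heuristic will differ from those of dedicated tools.

```python
import re
from dataclasses import dataclass

VOWEL_GROUPS = re.compile(r"[aeiouy]+")
WORD = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)?")


def count_syllables(word: str) -> int:
    """Heuristic syllable count (an assumption): vowel groups minus a silent final 'e'."""
    w = word.lower()
    groups = len(VOWEL_GROUPS.findall(w))
    # Drop a trailing silent 'e' ("make"), but keep "-le" endings ("table").
    if w.endswith("e") and not w.endswith("le") and groups > 1:
        groups -= 1
    return max(1, groups)


@dataclass
class Readability:
    flesch_reading_ease: float
    flesch_kincaid_grade: float
    gunning_fog: float


def readability(text: str) -> Readability:
    """Compute three readability scores with naive sentence and word splitting."""
    # Naive sentence split on terminal punctuation; decimals such as "3.5" would
    # be split incorrectly, which real tools handle more carefully.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = WORD.findall(text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = [count_syllables(w) for w in words]
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = sum(syllables) / len(words)
    # Gunning Fog "complex" words: three or more syllables (no exclusions applied).
    complex_ratio = sum(1 for s in syllables if s >= 3) / len(words)
    return Readability(
        flesch_reading_ease=206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word,
        flesch_kincaid_grade=0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59,
        gunning_fog=0.4 * (words_per_sentence + 100 * complex_ratio),
    )


if __name__ == "__main__":
    # Hypothetical text, not taken from the study's data.
    report = ("T2-weighted imaging demonstrates a heterogeneous enhancing intra-axial "
              "lesion with surrounding vasogenic edema and mass effect.")
    plain = "There is a growth in the brain. It has some swelling around it."
    print(readability(report))
    print(readability(plain))
```

Running it on the two sample sentences should show the plain-language version scoring higher on Reading Ease and lower on Gunning Fog, which is the direction of improvement the abstract reports for the chatbot rewrites.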

Topics

Journal Article
