Appropriateness of acute breast symptom recommendations provided by ChatGPT.

Authors

Byrd C, Kingsbury C, Niell B, Funaro K, Bhatt A, Weinfurtner RJ, Ataya D

Affiliations (3)

  • H. Lee Moffitt Cancer Center and Research Institute, Division of Breast Imaging, Department of Diagnostic Imaging, 12902 USF Magnolia Drive, Tampa, FL 33612, United States of America.
  • University of South Florida Morsani College of Medicine, 560 Channelside Dr, Tampa, FL 33602, United States of America.
  • H. Lee Moffitt Cancer Center and Research Institute, Division of Breast Imaging, Department of Diagnostic Imaging, 12902 USF Magnolia Drive, Tampa, FL 33612, United States of America. Electronic address: [email protected].

Abstract

We evaluated the accuracy of ChatGPT-3.5's responses to common questions regarding acute breast symptoms and explored whether using lay language, as opposed to medical language, affected the accuracy of the responses. Twenty questions addressing acute breast conditions were formulated, informed by the American College of Radiology (ACR) Appropriateness Criteria (AC) and our clinical experience at a tertiary referral breast center. Of these, seven addressed the most common acute breast symptoms, nine addressed pregnancy-associated breast symptoms, and four addressed specific management and imaging recommendations for a palpable breast abnormality. Each question was submitted three times to ChatGPT-3.5, and all responses were assessed by five fellowship-trained breast radiologists. Evaluation criteria included clinical judgment and adherence to the ACR guidelines, with responses scored as: 1) "appropriate," 2) "inappropriate" if any response contained inappropriate information, or 3) "unreliable" if responses were inconsistent. A majority vote determined the appropriateness of the responses to each question. ChatGPT-3.5-generated responses were appropriate for 7/7 (100 %) questions regarding common acute breast symptoms, whether phrased colloquially or in standard medical terminology. In contrast, ChatGPT-3.5-generated responses were appropriate for only 3/9 (33 %) questions about pregnancy-associated breast symptoms and 3/4 (75 %) questions about management and imaging recommendations for a palpable breast abnormality. ChatGPT-3.5 can automate the delivery of healthcare information related to appropriate management of acute breast symptoms when prompted with either standard medical terminology or lay phrasing. However, physician oversight remains critical given the presence of inappropriate recommendations for pregnancy-associated breast symptoms and management of palpable abnormalities.
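For readers interested in how the repeated-prompting and majority-vote protocol described above might be scripted, a minimal sketch follows, assuming the official openai Python client (v1.x). The abstract confirms the model family (ChatGPT-3.5) and the design (three submissions per question, five radiologist raters, majority vote); the client usage, question text, rating values, and helper names below are illustrative assumptions, not the authors' actual pipeline.

    # Sketch of the study protocol: submit each question to gpt-3.5-turbo
    # three times, then reduce five radiologist labels to one by majority vote.
    # Assumes the `openai` Python package (v1.x) and an OPENAI_API_KEY in the
    # environment; question text and labels are hypothetical placeholders.
    from collections import Counter

    from openai import OpenAI

    client = OpenAI()  # picks up OPENAI_API_KEY from the environment

    SUBMISSIONS_PER_QUESTION = 3  # per the abstract, each question was asked 3 times

    def collect_responses(question: str) -> list[str]:
        """Submit one question three times and return the raw response texts."""
        responses = []
        for _ in range(SUBMISSIONS_PER_QUESTION):
            completion = client.chat.completions.create(
                model="gpt-3.5-turbo",
                messages=[{"role": "user", "content": question}],
            )
            responses.append(completion.choices[0].message.content)
        return responses

    def majority_vote(radiologist_labels: list[str]) -> str:
        """Reduce five rater labels ('appropriate', 'inappropriate',
        'unreliable') to a single verdict by majority vote.

        Note: with five raters and three labels, a 2-2-1 split is possible;
        the abstract does not specify a tie-break rule, so Counter's
        arbitrary choice stands in for one here.
        """
        label, _count = Counter(radiologist_labels).most_common(1)[0]
        return label

    # Hypothetical usage:
    # responses = collect_responses("I felt a lump in my breast. Do I need an ultrasound?")
    # verdict = majority_vote(["appropriate", "appropriate", "appropriate",
    #                          "inappropriate", "unreliable"])

Note that per the abstract's rubric, a single inappropriate statement within any of the three responses rendered the whole question "inappropriate", so a faithful reproduction would also need per-response screening by the raters before the vote.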

Topics

Journal Article
