Radiologist-Large Language Model Collaboration in Dermatologic Ultrasound Reporting: Evaluating the Clinical Utility of ChatGPT.
Authors
Affiliations (2)
- Department of Dermatology, Ankara Bilkent City Hospital, Ankara, Türkiye.
- Department of Radiology, Ankara 29 Mayıs State Hospital, Ankara, Türkiye.
Abstract
Large language models (LLMs) are increasingly explored in medical imaging, but their reliability in independently interpreting images remains uncertain. This study evaluated the clinical utility of dermatologic ultrasound reports generated under three reporting conditions: Condition 1 (radiologist reporting), Condition 2 (LLM reporting based solely on the ultrasound image), and Condition 3 (LLM reporting using both the ultrasound image and the radiologist's report). A total of 202 dermatologic ultrasound images from a public dataset were analyzed. Reports were evaluated for diagnostic accuracy, appropriateness of next-step recommendations, and readability. Diagnostic accuracy was highest in Condition 3 (83.2%), compared with Condition 1 (55.4%) and Condition 2 (26.2%) (p < 0.001). Next-step recommendation accuracy was also highest in Condition 3 (77.7%), followed by Condition 2 (59.9%) and Condition 1 (38.6%) (p < 0.001). Report readability was likewise highest in Condition 3. Integrating LLM outputs with radiologist reports may improve clinical communication and decision support in dermatologic ultrasound.