The Effect of Structured Context on Chest Radiograph Interpretation by a Multimodal Large Language Model: A Pilot Comparative Study.
Authors
Affiliations (2)
- Radiology, Rocky Vista University College of Osteopathic Medicine, Parker, USA.
- Radiology, University of Nebraska Medical Center, Omaha, USA.
Abstract
Background: Multimodal large language models are increasingly discussed as potential adjuncts for medical image interpretation, yet the extent to which structured contextual guidance affects performance remains uncertain.

Objective: The aim of this study was to determine whether a structured contextual intervention improves ChatGPT performance on chest radiograph interpretation.

Methods: We conducted a comparative pilot study using 50 chest radiographs with established reference diagnoses. In the baseline condition, ChatGPT interpreted each image using only the standardized prompt, "Diagnose this X-ray image." In the structured-context condition, the model first reviewed a Radiopaedia-derived teaching module containing 100 labeled chest radiographs spanning 10 common diagnoses and then interpreted the same test set using an author-developed radiologic framework that emphasized study identification, image-quality assessment, systematic visual review, separation of findings from diagnostic impressions, explicit communication of uncertainty, and structured reporting. Performance was assessed with percent-correct scoring, with partial credit awarded when responses demonstrated appropriate reasoning but lacked full specificity.

Results: Overall accuracy improved from 28% at baseline to 62% after the structured-context intervention. Accuracy reached 100% for chronic obstructive pulmonary disease, pulmonary edema, and foreign body identification, and improved to 80% for pneumothorax, cardiomegaly, and pleural effusion. Performance remained limited for lung cancer (20%) and feeding tube placement (40%), and rib fractures were not identified in either condition. Accuracy for atelectasis declined from 40% to 20%.

Conclusions: Structured contextual guidance improved performance on selected chest radiograph tasks, but overall diagnostic reliability remained inadequate for independent clinical use. Larger studies with standardized scoring, external validation, and imaging-specific evaluation are needed.
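
As a rough consistency check on the reported figures, the following Python sketch averages the per-category post-intervention accuracies quoted in the Results. It assumes, hypothetically, that the ten categories named in the abstract make up the whole test set and are equally weighted (for example, five radiographs each), and that "not identified" for rib fractures corresponds to 0%; the abstract does not state either point. The names `post_accuracy` and `overall_accuracy` are illustrative only.

```python
# Post-intervention accuracy per category, in percent, as quoted in the abstract.
# ASSUMPTION: these ten categories are the full test set and are equally sized.
# ASSUMPTION: rib fractures "not identified" is treated as 0%.
post_accuracy = {
    "chronic obstructive pulmonary disease": 100,
    "pulmonary edema": 100,
    "foreign body": 100,
    "pneumothorax": 80,
    "cardiomegaly": 80,
    "pleural effusion": 80,
    "lung cancer": 20,
    "feeding tube placement": 40,
    "rib fracture": 0,
    "atelectasis": 20,
}


def overall_accuracy(per_category):
    """Unweighted mean of per-category accuracies (valid only for equal-sized categories)."""
    return sum(per_category.values()) / len(per_category)


overall = overall_accuracy(post_accuracy)
print(f"Overall accuracy under the equal-weight assumption: {overall:.0f}%")
```

Under these assumptions the unweighted mean matches the reported overall post-intervention accuracy of 62%. That agreement is compatible with equal category sizes but does not confirm them, and the sketch does not model how partial credit was assigned.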