The Effect of Structured Context on Chest Radiograph Interpretation by a Multimodal Large Language Model: A Pilot Comparative Study.

May 12, 2026, PubMed

Authors

Cusick A, Guy S, Herz C, Manfre C, DeVries R

Affiliations (2)

  • Radiology, Rocky Vista University College of Osteopathic Medicine, Parker, USA.
  • Radiology, University of Nebraska Medical Center, Omaha, USA.

Abstract

Background: Multimodal large language models are increasingly discussed as potential adjuncts for medical image interpretation, yet the extent to which structured contextual guidance affects performance remains uncertain.

Objective: The aim of this study was to determine whether a structured contextual intervention improves ChatGPT performance on chest radiograph interpretation.

Methods: We conducted a comparative pilot study using 50 chest radiographs with established reference diagnoses. In the baseline condition, ChatGPT interpreted each image using only the standardized prompt, "Diagnose this X-ray image." In the structured-context condition, the model first reviewed a Radiopaedia-derived teaching module containing 100 labeled chest radiographs spanning 10 common diagnoses and then interpreted the same test set using an author-developed radiologic framework that emphasized study identification, image-quality assessment, systematic visual review, separation of findings from diagnostic impressions, explicit communication of uncertainty, and structured reporting. Performance was assessed with percent-correct scoring, with partial credit awarded when responses demonstrated appropriate reasoning but lacked full specificity.

Results: Overall accuracy improved from 28% at baseline to 62% after the structured-context intervention. Accuracy reached 100% for chronic obstructive pulmonary disease, pulmonary edema, and foreign body identification, and improved to 80% for pneumothorax, cardiomegaly, and pleural effusion. Performance remained limited for lung cancer (20%) and feeding tube placement (40%), and rib fractures were not identified in either condition. Accuracy for atelectasis declined from 40% to 20%.

Conclusions: Structured contextual guidance improved performance on selected chest radiograph tasks, but overall diagnostic reliability remained inadequate for independent clinical use. Larger studies with standardized scoring, external validation, and imaging-specific evaluation are needed.
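The percent-correct scoring and the relation between per-category and overall accuracy can be sketched as below. This is a minimal illustration, not the authors' scoring code: the helper names are hypothetical, the partial-credit weight of 0.5 is an assumption (the abstract says partial credit was awarded but not how much), and the overall-accuracy check assumes each of the ten reported categories contributed the same number of test images, which the abstract does not state.

```python
from statistics import mean

# ASSUMPTION: the abstract does not give the partial-credit value;
# 0.5 is a hypothetical weight used only for illustration.
SCORE_WEIGHTS = {"correct": 1.0, "partial": 0.5, "incorrect": 0.0}


def percent_correct(grades):
    """Percent-correct score for a list of per-image grades (hypothetical helper)."""
    if not grades:
        raise ValueError("no grades supplied")
    return 100.0 * sum(SCORE_WEIGHTS[g] for g in grades) / len(grades)


def overall_accuracy(category_accuracy):
    """Unweighted mean of per-category accuracies.

    ASSUMPTION: this equals overall accuracy only if every category
    contributes the same number of test images.
    """
    return mean(category_accuracy.values())


# Post-intervention per-category accuracies (%) as reported in the abstract.
structured_context = {
    "chronic obstructive pulmonary disease": 100,
    "pulmonary edema": 100,
    "foreign body": 100,
    "pneumothorax": 80,
    "cardiomegaly": 80,
    "pleural effusion": 80,
    "lung cancer": 20,
    "feeding tube placement": 40,
    "rib fractures": 0,
    "atelectasis": 20,
}

if __name__ == "__main__":
    # Hypothetical grades for five images in one category.
    example = ["correct", "correct", "partial", "incorrect", "correct"]
    print(f"example category score: {percent_correct(example):.1f}%")
    print(f"mean of category accuracies: {overall_accuracy(structured_context):.0f}%")
```

Under the equal-weighting assumption, the mean of the ten reported post-intervention category accuracies is 62%, which matches the reported overall figure. If the categories were unevenly sized, a per-image average would be needed instead.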

Topics

Journal Article
