Radiologist Interaction with AI-Generated Preliminary Reports: A Longitudinal Multi-Reader Study.
Affiliations
- Mass General Brigham, Department of Radiology.
- Asan Medical Center, Department of Radiology.
- Mass General Brigham, Department of Radiology.
Abstract
This study investigated the integration of multimodal AI-generated reports into the radiology workflow over time, focusing on their impact on efficiency, acceptability, and report quality.
In a multicase, multireader study, five radiologists interpreted 756 publicly available chest radiographs, divided into seven sequential batches of 108 radiographs each, using preliminary reports generated by a radiology-specific multimodal AI model. Two thoracic radiologists assessed the final reports using RADPEER criteria for agreement and a 5-point Likert scale for quality. Reading time, the rate of acceptance without modification, agreement, and quality scores were measured, and statistical analyses evaluated trends across the seven batches.
Radiologists' reading time per chest radiograph decreased from 25.8 seconds in Batch 1 to 19.3 seconds in Batch 7 (p < .001). Acceptability increased from 54.6% to 60.2% (p < .001) and was higher for normal chest radiographs (68.9%) than for abnormal chest radiographs (52.6%; p < .001). Median agreement and quality scores remained stable for normal chest radiographs but varied significantly for abnormal chest radiographs (ps < .05).
The introduction of AI-generated reports improved the efficiency of chest radiograph interpretation, and acceptability increased over time. However, agreement and quality scores varied, particularly in abnormal cases, emphasizing the need for oversight when interpreting complex chest radiographs.
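To put the reported figures in proportion, the arithmetic below derives the size of each change directly from the numbers stated in the abstract. This is a simple illustration, not an analysis reported by the authors, and it ignores the variance and statistical modeling behind the reported p-values.

$$
\frac{25.8\,\text{s} - 19.3\,\text{s}}{25.8\,\text{s}} = \frac{6.5}{25.8} \approx 0.252 \quad (\text{roughly a } 25\% \text{ reduction in reading time, Batch 1 to Batch 7})
$$

$$
60.2\% - 54.6\% = 5.6 \text{ percentage points (acceptability, Batch 1 to Batch 7)}, \qquad 68.9\% - 52.6\% = 16.3 \text{ percentage points (normal vs. abnormal)}
$$

The batch design is consistent with the case total, since $7 \times 108 = 756$ radiographs.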