Multimodal Generative Artificial Intelligence Model for Creating Radiology Reports for Chest Radiographs in Patients Undergoing Tuberculosis Screening.
Authors
Affiliations (8)
- Mass General Brigham, Boston, USA.
- St. Mary's Hospital, Seoul, South Korea.
- Korea University College of Medicine, Seoul, South Korea.
- Severance Hospital, Seoul, South Korea.
- SoombitAI, Seoul, South Korea.
- Kakaobrain, Seoul, South Korea.
- Seoul National University Hospital, Seoul, South Korea.
- Kakao Corp., Seoul, South Korea.
Abstract
<b>Background:</b> Chest radiographs play a crucial role in tuberculosis screening in high-prevalence regions, although widespread radiographic screening requires expertise that may be unavailable in settings with limited medical resources.

<b>Objective:</b> To evaluate a multimodal generative artificial intelligence (AI) model for detecting tuberculosis-associated abnormalities on chest radiography in patients undergoing tuberculosis screening.

<b>Methods:</b> This retrospective study evaluated 800 chest radiographs obtained from two public datasets originating from tuberculosis screening programs. A generative AI model was used to create free-text reports for the radiographs. The AI-generated reports were classified in terms of the presence versus absence, and the laterality, of tuberculosis-related abnormalities. Two radiologists independently reviewed the radiographs for the presence and laterality of tuberculosis-related abnormalities in separate sessions, without and with use of the AI-generated reports, and recorded whether they would accept each report without modification. Two additional radiologists reviewed the radiographs and the datasets' clinical readings to determine the reference standard.

<b>Results:</b> By the reference standard, 422 of 800 radiographs were positive for tuberculosis-related abnormalities. For detection of tuberculosis-related abnormalities, sensitivity, specificity, and accuracy were 95.2%, 86.7%, and 90.8% for AI-generated reports; 93.1%, 93.6%, and 93.4% for reader 1 without AI-generated reports; 93.1%, 95.0%, and 94.1% for reader 1 with AI-generated reports; 95.8%, 87.2%, and 91.3% for reader 2 without AI-generated reports; and 95.8%, 91.5%, and 93.5% for reader 2 with AI-generated reports. Accuracy was significantly lower for AI-generated reports than for both readers alone (p<.001), but was significantly higher with than without AI-generated reports for reader 2 only (reader 1: p=.47). Localization performance was significantly lower (p<.001) for AI-generated reports (63.3%) than for reader 1 (79.9%) and reader 2 (77.9%) without AI-generated reports and did not significantly change for either reader with AI-generated reports (reader 1: 78.7%, p=.71; reader 2: 81.5%, p=.23). Among normal and abnormal radiographs, reader 1 accepted 91.7% and 52.4% of AI-generated reports, respectively, while reader 2 accepted 83.2% and 37.0%.

<b>Conclusion:</b> While AI-generated reports may augment radiologists' diagnostic assessments, the current model requires human oversight given its inferior standalone performance.

<b>Clinical Impact:</b> The generative AI model could potentially aid tuberculosis screening programs in medically underserved regions, although further technical improvement is needed.