Performance Comparison Between Two Versions of a Commercial Artificial Intelligence System for Chest Radiograph Interpretation: A Multicenter Study.
Authors
Affiliations (2)
- Department of Radiology, HT Médica Jaén Las Nieves, C. Carmelo Torres, 2, Jaen, 23007, Spain. [email protected].
- Department of Radiology, HT Médica Jaén Las Nieves, C. Carmelo Torres, 2, Jaen, 23007, Spain.
Abstract
This study compared the diagnostic performance of versions 1.5.0 and 1.5.4 of Gleamer ChestView, a deep learning-based artificial intelligence system for chest X-ray analysis, across multiple thoracic findings. A retrospective multicenter study was conducted on 187 chest radiographs from six centers using equipment from four manufacturers (Agfa-Gevaert N.V., Mortsel, Belgium; IRay Technology Co., Ltd., Shanghai, China; LG Electronics Inc., Seoul, South Korea; Siemens Healthineers, Erlangen, Germany). Eligible radiographs were those acquired during the month following the implementation of version 1.5.0 of Gleamer ChestView. Each radiograph was analyzed by both versions. Ground truth was established by chest CT performed within a week of the radiograph when available (49 cases) and by consensus of three board-certified general radiologists in the remaining 138 cases. The reference standard comprised 57 positive cases (pleural effusion, alveolar disease, mediastinal mass, pneumothorax, pulmonary nodule) and 130 normal studies. Performance metrics (accuracy, sensitivity, specificity, precision, F1 score) were calculated for each version. A total of 187 chest radiographs were analyzed (101 females, 86 males; mean age 59.2 ± 19.7 years; range 15-95). Overall performance improved from version 1.5.0 to version 1.5.4 (values given as 1.5.0 vs 1.5.4), with higher accuracy (87.7% vs 92.5%), precision (75.0% vs 85.2%), specificity (86.9% vs 93.1%), and F1 score (0.816 vs 0.881). For nodule detection, version 1.5.4 showed increased precision (47.8% vs 73.3%) while maintaining sensitivity. Gleamer ChestView version 1.5.4 demonstrated improved lesion-specific performance compared with version 1.5.0, with fewer false positives and higher diagnostic confidence. These findings support the implementation of updated AI systems following systematic version-to-version validation.
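For readers who want to see how the reported metrics relate to one another, the following is a minimal sketch of computing them from a 2x2 confusion matrix. The abstract does not report the underlying counts. The values used here are an assumption, back-calculated so that they are consistent with the 57 positive and 130 normal cases and with the reported overall percentages. The names `Confusion`, `V150`, and `V154` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Case-level 2x2 confusion matrix counts."""
    tp: int  # positive cases flagged by the AI
    fp: int  # normal studies flagged by the AI
    fn: int  # positive cases missed by the AI
    tn: int  # normal studies correctly cleared

    @property
    def accuracy(self) -> float:
        return (self.tp + self.tn) / (self.tp + self.fp + self.fn + self.tn)

    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    @property
    def precision(self) -> float:
        return self.tp / (self.tp + self.fp)

    @property
    def f1(self) -> float:
        # Harmonic mean of precision and sensitivity, in count form.
        return 2 * self.tp / (2 * self.tp + self.fp + self.fn)


# ASSUMED counts, not reported in the abstract: back-calculated to match
# 57 positives, 130 normals, and the reported overall percentages.
V150 = Confusion(tp=51, fp=17, fn=6, tn=113)
V154 = Confusion(tp=52, fp=9, fn=5, tn=121)

for name, cm in (("1.5.0", V150), ("1.5.4", V154)):
    print(
        f"v{name}: accuracy={cm.accuracy:.1%} precision={cm.precision:.1%} "
        f"specificity={cm.specificity:.1%} F1={cm.f1:.3f}"
    )
```

Under these assumed counts, the drop in false positives (17 to 9) accounts for most of the gain in precision and specificity. Any sensitivity value derived from them is an inference, since the abstract does not state overall sensitivity.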