Domain-Specific and Computer-Vision-Driven Versus General-Purpose AI Models in PA-CXR Analysis: A Comparative Study with Emergency-Medicine Specialists.
Authors
Affiliations (2)
- Department of Emergency Medicine, Haseki Training and Research Hospital, University of Health Sciences, Uğur Mumcu Mah. Belediye Sok. No:7 Sultangazi, Istanbul, Turkey.
- Department of Emergency Medicine, Haseki Training and Research Hospital, University of Health Sciences, Uğur Mumcu Mah. Belediye Sok. No:7 Sultangazi, Istanbul, Turkey. [email protected].
Abstract
This study evaluated the diagnostic accuracy of three AI models (GPT-5, Xray-GPT, and Qure.ai) against that of emergency-medicine specialists (EMSs) in interpreting posteroanterior chest X-rays (PA-CXRs), using a standardized test set derived from real-world emergency-department cases. In this prospective, cross-sectional diagnostic accuracy study conducted between June and December 2024, 40 PA-CXR questions were compiled from actual emergency-department presentations (40 patients; mean age, 42.8 ± 16.2 years; 62.5% male) and categorized into six diagnostic groups. Thirty EMSs completed the test once. Each AI model was evaluated on the same set once daily for 30 consecutive days. Diagnostic accuracy was compared across predefined PA-CXR subcategories and clinical case types. Qure.ai achieved the highest overall accuracy (median score, 37.0; IQR, 36.8–38.0), significantly outperforming both the EMSs and the GPT-based models (p < 0.001). In contrast, GPT-5 and Xray-GPT each achieved only approximately 50% accuracy and were significantly less accurate than the EMSs across nearly all case categories (p < 0.001). Qure.ai also outperformed all other groups in life-threatening cases, including pneumonia, pneumothorax, and pleural effusion, while performing comparably to the EMSs in identifying normal CXRs, perforated peptic ulcers, and tuberculosis lesions. In conclusion, Qure.ai was the only system whose diagnostic performance in PA-CXR interpretation was comparable to or exceeded that of human experts. These findings demonstrate the clinical value of domain-specific training and image-optimized architecture in AI systems.
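The abstract reports one result as a raw median (37.0) and another as a percentage (approximately 50%), so a short conversion may help readers compare them. The sketch below assumes, since the abstract does not say so explicitly, that the reported median is a count of correct answers out of the 40 test questions. The function names and the example score list are hypothetical and are not data from the study.

```python
from statistics import median, quantiles

TOTAL_QUESTIONS = 40  # from the abstract: 40 PA-CXR questions


def percent_accuracy(correct: float, total: int = TOTAL_QUESTIONS) -> float:
    """Convert a raw count of correct answers into percent accuracy."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * correct / total


def summarize(scores):
    """Return (median, Q1, Q3) for a list of per-run raw scores."""
    q1, _, q3 = quantiles(scores, n=4, method="inclusive")
    return median(scores), q1, q3


# Assumption: the reported Qure.ai median of 37.0 is a score out of 40.
qure_median_pct = percent_accuracy(37.0)

# Roughly 50% accuracy corresponds to about this many correct answers.
half_correct = 0.50 * TOTAL_QUESTIONS

# Hypothetical per-run scores, invented only to exercise summarize();
# they are NOT the study's data.
example_scores = [36, 37, 37, 38, 38]
example_summary = summarize(example_scores)
```

Under that assumption, a median of 37.0 corresponds to 92.5% accuracy, while the roughly 50% reported for GPT-5 and Xray-GPT corresponds to about 20 correct answers out of 40.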