Performance of chest X-ray with computer-aided detection powered by deep learning-based artificial intelligence for tuberculosis presumptive identification during case finding in the Philippines.

Authors

Marquez N, Carpio EJ, Santiago MR, Calderon J, Orillaza-Chi R, Salanap SS, Stevens L

Affiliations (4)

  • TBIHSS, FHI 360, Makati City, Philippines.
  • Philippine College of Radiology (PCR), Quezon City, Philippines.
  • TBIHSS, FHI 360, Makati City, Philippines.
  • Asia Pacific Regional Office, FHI 360, Bangkok, Thailand.

Abstract

The Philippines' high tuberculosis (TB) burden calls for effective point-of-care screening. Systematic TB case finding using chest X-ray (CXR) with computer-aided detection powered by deep learning-based artificial intelligence (AI-CAD) provided this opportunity. We aimed to comprehensively review AI-CAD's real-life performance in the local context to support refining its integration into the country's programmatic TB elimination efforts.

A retrospective cross-sectional analysis was done on data from case-finding activities conducted in four regions of the Philippines between May 2021 and March 2024. Individuals 15 years and older with complete CXR and molecular World Health Organization-recommended rapid diagnostic (mWRD) test results were included. Presumptive TB was identified by CXR or by TB signs and symptoms, and/or by official radiologist readings. The overall diagnostic accuracy of CXR with AI-CAD, stratified by different factors, was assessed using a fixed abnormality threshold and mWRD as the reference standard. Given the imbalanced dataset, we evaluated both precision-recall curve (PRC) and receiver operating characteristic (ROC) plots. Because verification of CAD-negative individuals was limited, we used "pseudo-sensitivity" and "pseudo-specificity" to reflect estimates based on partial testing. We identified potential factors that may affect performance metrics.

Using a 0.5 abnormality threshold in analyzing 5740 individuals, the AI-CAD model showed high pseudo-sensitivity at 95.6% (95% CI, 95.1-96.1) but low pseudo-specificity at 28.1% (26.9-29.2) and low positive predictive value (PPV) at 18.4% (16.4-20.4). The area under the receiver operating characteristic curve was 0.820, whereas the area under the precision-recall curve was 0.489. Pseudo-sensitivity was higher among males, younger individuals, and individuals with newly diagnosed TB. Threshold analysis revealed trade-offs: increasing the threshold score to 0.68 saved more mWRD tests (42%) but led to an increase in missed cases (10%). Threshold adjustments affected PPV, tests saved, and case detection differently across settings.

Scaling up AI-CAD use in TB screening could benefit TB elimination efforts. Threshold scores need to be calibrated to resource availability, prevalence, and program goals. ROC and PRC plots, which specify PPV, could serve as valuable metrics for capturing the best estimate of model performance and cost-benefit ratios for context-specific implementation in resource-limited settings.
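
The threshold trade-off described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' analysis code. The function name `screening_metrics`, the `ScreeningMetrics` fields, and the ten scores and labels are all hypothetical and chosen only to show the arithmetic. The sketch also assumes every individual has a reference result, whereas the study verified CAD-negative individuals only partially (hence its "pseudo" metrics), and it assumes "tests saved" means the share of individuals scoring below the threshold who would not be sent for mWRD.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class ScreeningMetrics:
    sensitivity: float  # TP / (TP + FN)
    specificity: float  # TN / (TN + FP)
    ppv: float          # TP / (TP + FP)
    tests_saved: float  # share of individuals scoring below the threshold


def screening_metrics(
    scores: Sequence[float], labels: Sequence[int], threshold: float
) -> ScreeningMetrics:
    """Confusion-matrix metrics when scores >= threshold are flagged as abnormal.

    labels: 1 = reference test positive, 0 = reference test negative.
    """
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        flagged = score >= threshold
        if flagged and label == 1:
            tp += 1
        elif flagged:
            fp += 1
        elif label == 1:
            fn += 1
        else:
            tn += 1

    def ratio(num: int, den: int) -> float:
        # Undefined ratios (empty denominator) are reported as NaN.
        return num / den if den else float("nan")

    return ScreeningMetrics(
        sensitivity=ratio(tp, tp + fn),
        specificity=ratio(tn, tn + fp),
        ppv=ratio(tp, tp + fp),
        tests_saved=ratio(tn + fn, tp + fp + tn + fn),
    )


# Hypothetical abnormality scores and reference results (not study data).
scores = [0.95, 0.85, 0.72, 0.60, 0.66, 0.58, 0.52, 0.45, 0.30, 0.10]
labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

for threshold in (0.50, 0.68):
    print(threshold, screening_metrics(scores, labels, threshold))
```

On this toy data, raising the threshold from 0.50 to 0.68 removes the false positives and increases the share of tests saved, but one reference-positive individual (score 0.60) is no longer flagged. The direction of the trade-off matches the one reported in the abstract; the magnitudes are artifacts of the made-up data.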

Topics

Journal Article
