Threshold optimization in AI chest radiography analysis: integrating real-world data and clinical subgroups.
Authors
Affiliations (6)
- Department of Radiology, LMU University Hospital, LMU Munich, Munich, Germany.
- XP Technology and Innovation, Siemens Healthineers AG, Forchheim, Germany.
- Department of Radiology, LMU University Hospital, LMU Munich, Munich, Germany.
- Comprehensive Pneumology Center, German Center for Lung Research, Munich, Germany.
- Department of Radiology, Asklepios Fachklinik München, Gauting, Germany.
- Institute of Neuroradiology, LMU University Hospital, LMU Munich, Munich, Germany.
Abstract
Manufacturer-defined AI thresholds for chest x-ray (CXR) often lack customization options. Threshold optimization strategies utilizing users' clinical real-world data along with pathology-enriched validation data may better address subgroup-specific and user-specific needs. A pathology-enriched dataset (study cohort, 563 CXRs) with pleural effusions, consolidations, pneumothoraces, nodules, and unremarkable findings was analysed by an AI system and six reference radiologists. The same AI model was applied to a routine dataset (clinical cohort, 15,786 consecutive routine CXRs). Iterative receiver operating characteristic analysis linked achievable sensitivities (study cohort) to the resulting AI alert rates in clinical routine inpatient or outpatient subgroups. "Optimized" thresholds (OTs) were defined as the point at which a further 1% increase in sensitivity led to more than a 1% rise in AI alert rates. Threshold comparisons (OTs versus the AI vendor's default thresholds (AIDTs) versus Youden's thresholds) were based on 400 clinical cohort cases with expert radiologists' reference. AIDTs, OTs, and Youden's thresholds varied across scenarios, and OTs differed depending on whether they were tailored to inpatient or outpatient CXRs. Lowering the AIDT was most reasonable for pleural effusion, where sensitivity increased from 46.8% (AIDT) to 87.2% (OT) for outpatients and from 76.3% (AIDT) to 93.5% (OT) for inpatients; similar trends appeared for consolidations. Conversely, for inpatient nodule detection, raising the threshold improved accuracy from 69.5% (AIDT) to 82.5% (OT) without compromising sensitivity. Graphical analysis supports threshold selection by illustrating estimated sensitivities and clinical routine AI alert rates. An innovative, subgroup-specific AI threshold optimization is proposed that is implemented automatically and is transferable to other AI algorithms and varying clinical subgroup settings.
Customizing thresholds to specific medical experts' needs and patient subgroup characteristics is promising and may enhance diagnostic accuracy and the clinical acceptance of diagnostic AI algorithms.
- Customizing AI thresholds individually addresses specific user/patient subgroup needs.
- The presented approach utilizes pathology-enriched and real-world subgroup data for optimization.
- Potential is shown by comparing individualized thresholds with vendor defaults.
- Distinct thresholds for in- and outpatient CXR AI analysis may improve perception.
- The automated pipeline methodology is transferable to other AI models or subgroups.
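The threshold rule described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the synthetic scores, the starting point of 1% sensitivity, and the reading of "1%" as one percentage point are all assumptions made for illustration. Sensitivity is estimated from pathology-positive cases of an enriched cohort, the alert rate from unlabeled routine cases of one subgroup, and the walk stops at the last sensitivity level before one further point of sensitivity would cost more than one point of alert rate. Youden's threshold is included for comparison, using its standard definition (maximizing sensitivity + specificity - 1).

```python
import numpy as np


def threshold_for_sensitivity(scores_pos, pct):
    """Highest threshold that flags at least pct percent of the positive cases."""
    ranked = np.sort(np.asarray(scores_pos, dtype=float))[::-1]  # descending
    n_needed = max(1, -(-pct * len(ranked) // 100))  # integer ceiling
    return float(ranked[n_needed - 1])


def alert_rate(scores_routine, threshold):
    """Fraction of routine cases whose AI score reaches the threshold."""
    return float(np.mean(np.asarray(scores_routine) >= threshold))


def optimized_threshold(scores_pos, scores_routine, max_rise=0.01):
    """Assumed reading of the OT rule: raise the sensitivity target in
    1-point steps and stop before the first step whose alert-rate rise
    exceeds max_rise (one percentage point by default)."""
    thr = threshold_for_sensitivity(scores_pos, 1)
    rate = alert_rate(scores_routine, thr)
    for pct in range(2, 101):
        next_thr = threshold_for_sensitivity(scores_pos, pct)
        next_rate = alert_rate(scores_routine, next_thr)
        if next_rate - rate > max_rise:
            break  # the next point of sensitivity is too expensive
        thr, rate = next_thr, next_rate
    sensitivity = float(np.mean(np.asarray(scores_pos) >= thr))
    return thr, sensitivity, rate


def youden_threshold(scores_pos, scores_neg):
    """Threshold maximizing Youden's J = sensitivity + specificity - 1."""
    scores_pos = np.asarray(scores_pos, dtype=float)
    scores_neg = np.asarray(scores_neg, dtype=float)
    candidates = np.unique(np.concatenate([scores_pos, scores_neg]))
    # J equals sensitivity minus false-positive rate at each candidate.
    j = [np.mean(scores_pos >= t) - np.mean(scores_neg >= t) for t in candidates]
    return float(candidates[int(np.argmax(j))])


# Synthetic, purely illustrative scores (not data from the study).
demo_pos = np.linspace(0.5, 1.0, 100)            # enriched-cohort positives
demo_routine = np.concatenate([
    np.linspace(0.75, 1.0, 50),                  # few high-scoring routine cases
    np.linspace(0.5, 0.7, 950),                  # many borderline routine cases
])
ot, ot_sens, ot_rate = optimized_threshold(demo_pos, demo_routine)
print(f"OT={ot:.3f} sensitivity={ot_sens:.2f} alert rate={ot_rate:.3f}")
```

Running the same function once with inpatient and once with outpatient routine scores would yield subgroup-specific thresholds, which is the idea behind tailoring OTs per setting. A real implementation would also need to handle noisy alert-rate curves, for example by smoothing, which this sketch does not attempt.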