
How threshold customisation affects the performance of a multiclass X-ray AI model for primary care triage: a retrospective study.

February 12, 2026

Authors

Sim JZT,Lin J,Fong QW,Soon AYQ,Khin LW,Balakrishnan S,Lin T,Wong S,Tan CH

Affiliations (7)

  • Diagnostic Radiology, Tan Tock Seng Hospital, Singapore, Singapore.
  • Clinical Research and Innovation Office, Tan Tock Seng Hospital, Singapore, Singapore.
  • Geylang Polyclinic, National Healthcare Group Polyclinics, Singapore, Singapore.
  • Diagnostic Radiology, Tan Tock Seng Hospital, Singapore, Singapore.
  • Resaro, Singapore, Singapore.
  • Clinical Research Unit, National Healthcare Group Polyclinics, Singapore, Singapore.
  • Lee Kong Chian School of Medicine, Singapore, Singapore.

Abstract

This study aimed to describe the structured process of threshold optimisation for a commercially available multiclass chest X-ray (CXR) deep learning model, to evaluate its diagnostic performance across different operating thresholds, and to estimate its potential operational impact within an artificial intelligence (AI)-enabled triage workflow in a primary care setting.

The design was a retrospective diagnostic performance evaluation with threshold-based analysis. The setting was primary care radiography services in Singapore, using data derived from two primary care clinics and a tertiary hospital.

A total of 816 adult frontal chest radiographs were included (multiethnic Asian, 464 males, 352 females; mean age 60.8 years). Images were selected to represent the spectrum of findings often encountered in primary care. Exclusion criteria included paediatric studies, lateral or oblique radiographs, and findings not supported by the AI model (eg, bony abnormalities and medical devices).

Primary outcome measures were sensitivity, specificity, and negative and positive predictive values (NPV and PPV). Secondary outcomes included estimated potential operational improvement, calculated as the number of true negatives divided by the total number of CXRs.

At the default threshold of 0.15, the AI model achieved a sensitivity of 87.3% (95% CI 83.9% to 90.4%) and an NPV of 87.0% (95% CI 83.6% to 90.2%). Lowering the threshold to 0.10 increased sensitivity to 93.2% (95% CI 90.7% to 95.5%) and NPV to 91.3% (95% CI 88.2% to 94.3%), with a specificity of 71.7% (95% CI 67.3% to 76.1%). These trade-offs were considered acceptable for a safety-focused co-triage workflow that prioritises minimising false negatives.

Threshold optimisation is critical for adapting AI models to context-specific clinical workflows. Our study shows that adjusting the operating threshold enabled prioritisation of sensitivity and NPV, supporting safe AI-assisted triage in primary care. This is a deeply collaborative process that must involve radiology and clinical teams in selecting thresholds aligned with clinical objectives for safe and effective implementation. Future work will assess real-world operational impact and user acceptance following prospective deployment.
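
The outcome measures above can be illustrated with a minimal Python sketch. It is not the authors' code: it assumes each radiograph has been reduced to a single abnormality score (the abstract does not say how the multiclass outputs are combined), that a score at or above the threshold flags the image as abnormal, and it uses made-up toy scores and labels. The names `triage_metrics` and `TriageMetrics` are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class TriageMetrics:
    sensitivity: float
    specificity: float
    ppv: float
    npv: float
    operational_improvement: float  # true negatives / total CXRs


def _ratio(numerator: int, denominator: int) -> float:
    # Return NaN rather than raising when a denominator is zero.
    return numerator / denominator if denominator else float("nan")


def triage_metrics(
    scores: Sequence[float], labels: Sequence[int], threshold: float
) -> TriageMetrics:
    """Compute triage metrics at one operating threshold.

    Assumption: score >= threshold means "flagged abnormal";
    labels use 1 for abnormal and 0 for normal (reference standard).
    """
    tp = fp = tn = fn = 0
    for score, abnormal in zip(scores, labels):
        flagged = score >= threshold
        if flagged and abnormal:
            tp += 1
        elif flagged and not abnormal:
            fp += 1
        elif not flagged and abnormal:
            fn += 1
        else:
            tn += 1
    total = tp + fp + tn + fn
    return TriageMetrics(
        sensitivity=_ratio(tp, tp + fn),
        specificity=_ratio(tn, tn + fp),
        ppv=_ratio(tp, tp + fp),
        npv=_ratio(tn, tn + fn),
        operational_improvement=_ratio(tn, total),
    )


# Toy data for illustration only; these are not study data.
toy_scores = [0.02, 0.05, 0.08, 0.12, 0.13, 0.20, 0.40, 0.11, 0.60, 0.90]
toy_labels = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]

for t in (0.15, 0.10):
    print(t, triage_metrics(toy_scores, toy_labels, t))
```

On this toy data, lowering the threshold from 0.15 to 0.10 raises sensitivity and NPV while reducing specificity and the share of true negatives, which is the same direction of trade-off the abstract reports.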

Topics

  • Triage
  • Primary Health Care
  • Artificial Intelligence
  • Radiography, Thoracic
  • Deep Learning
  • Journal Article
