Task-Based Sampling of Patient Data for Rigorous Machine Learning/AI Performance Assessment.

March 10, 2026

Authors

Baughan N, Whitney HM, Drukker K, Sahiner B, Hu T, Kim GH, McNitt-Gray M, Myers KJ, Giger ML

Affiliations (6)

  • Department of Radiation Oncology, Henry Ford Health, Detroit, MI, 48202, USA.
  • Department of Radiology, University of Chicago, Chicago, IL, 60637, USA.
  • Department of Radiology, University of Chicago, Chicago, IL, 60637, USA.
  • US Food and Drug Administration, Bethesda, MD, USA.
  • University of California Los Angeles, Los Angeles, CA, USA.
  • Puente Solutions, Phoenix, AZ, USA.

Abstract

Assessing the performance of an AI algorithm requires an independent dataset that matches the algorithm's intended clinical claim and intended population (e.g., patient characteristics). Using all available data for performance assessment may not be practical or optimal; to reduce the risk of sampling bias, the user is expected to use training and test data that are representative of the intended population. This work outlines a computational method for task-based sampling of data from a large repository and demonstrates its use, with demographic characteristics and disease states as examples of the clinical attributes to match to an intended population. To run the task-based sampling algorithm, the user defines the initial cohort from which to sample, a target distribution profile, and a maximum allowable deviation in any subcategory. The functionality and results of the workflow are described in the context of sampling the Medical Imaging and Data Resource Center (MIDRC) data commons for algorithm performance assessment. An initial cohort of over 4000 patients was selected from the MIDRC public data commons. The task-based sampling algorithm was used to select samples matched to an approximate CDC demographic distribution with maximum allowable deviations of 5% and 10%. The resulting final cohorts comprised 542 and 870 unique patients, with average clinical attribute differences of 1.0% and 2.1%, respectively. This investigation demonstrates that the task-based sampling algorithm can generate matched samples from a large dataset, reducing sampling bias in algorithm training and performance assessment.
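The abstract names three user inputs (an initial cohort, a target distribution profile, and a maximum allowable deviation in any subcategory) but does not describe the sampling procedure itself. The Python sketch below is one possible illustration, not the authors' method: it assumes a simple greedy strategy that repeatedly removes a patient from the most over-represented subcategory until every subcategory is within tolerance. The function names `deviations` and `task_based_sample`, the attribute names `sex` and `covid`, and the toy cohort are all hypothetical.

```python
from collections import Counter


def deviations(cohort, target):
    """Return {(attribute, category): observed proportion - target proportion}."""
    n = len(cohort)
    out = {}
    for attr, profile in target.items():
        counts = Counter(patient[attr] for patient in cohort)
        for category, wanted in profile.items():
            out[(attr, category)] = counts.get(category, 0) / n - wanted
    return out


def task_based_sample(cohort, target, max_dev):
    """Greedy sketch (an assumption, not the published algorithm).

    Drops one patient at a time from the most over-represented subcategory
    until every subcategory is within max_dev of its target proportion.
    Assumes each target profile covers all categories present in the cohort
    and sums to 1. Returns an empty list if no matched sample is reached.
    """
    sample = list(cohort)
    while sample:
        dev = deviations(sample, target)
        if max(abs(d) for d in dev.values()) <= max_dev:
            return sample
        # Pick the subcategory with the largest positive deviation.
        (attr, category), over = max(dev.items(), key=lambda kv: kv[1])
        if over <= 0:
            break  # nothing is over-represented, so removal cannot help
        for i, patient in enumerate(sample):
            if patient[attr] == category:
                del sample[i]
                break
    return []


# Hypothetical toy cohort: 100 patients described by two clinical attributes.
cohort = (
    [{"sex": "F", "covid": "pos"}] * 40
    + [{"sex": "F", "covid": "neg"}] * 30
    + [{"sex": "M", "covid": "pos"}] * 10
    + [{"sex": "M", "covid": "neg"}] * 20
)
target = {
    "sex": {"F": 0.5, "M": 0.5},
    "covid": {"pos": 0.5, "neg": 0.5},
}

sample = task_based_sample(cohort, target, max_dev=0.05)
final = deviations(sample, target)
mean_abs_dev = sum(abs(d) for d in final.values()) / len(final)
print(len(sample), round(mean_abs_dev, 3))
```

In a removal-based scheme like this one, a tighter tolerance generally discards more patients and so yields a smaller sample, which appears consistent with the abstract listing the smaller final cohort alongside the 5% deviation. A greedy pass is not guaranteed to find the largest matched sample when several attributes interact, so it should be read only as a starting point.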

Topics

Journal Article