
Metrics for Artificial Intelligence in Medicine: A Reference Resource.

March 11, 2026

Authors

Gonzales RA, Takahashi MS, Retson T, Banerjee I, Park SH, Kahn CE

Affiliations (6)

  • Balliol College, Radcliffe Department of Medicine, Oxford University, Oxford, UK.
  • Department of Radiology, University of North Carolina, Chapel Hill, NC.
  • Department of Radiology, University of California San Diego, San Diego, Calif.
  • Department of Radiology and Department of Artificial Intelligence and Informatics, Mayo Clinic, Phoenix, Ariz.
  • Department of Radiology and Research Institute of Radiology, University of Ulsan College of Medicine, Asan Medical Center, Seoul, Republic of Korea.
  • Department of Radiology and Institute for Biomedical Informatics, Perelman School of Medicine, University of Pennsylvania, 3400 Spruce St, 1 Silverstein, Philadelphia, PA 19104-6243.

Abstract

The effective integration of artificial intelligence (AI) systems into clinical medicine depends on comprehensive and transparent performance evaluation; however, the lack of standardized and widely accepted metrics poses challenges for reproducibility and model adoption. A comprehensive, machine-interpretable framework is presented to formalize the nomenclature and descriptions of 207 graphical, matrix, and scalar metrics used to measure AI model performance. The metrics taxonomy, developed as part of the Radiology Ontology of AI Datasets, Models and Projects (ROADMAP), provides a logically structured representation that captures the semantics of AI evaluation metrics, supports reasoning over metric classes, and enables automated completeness checks for AI model reporting. For each metric, the taxonomy incorporates a definition and citations to authoritative reference sources; where applicable, the taxonomy also includes synonyms, abbreviations, alternate language forms, mathematical formulae, and numerical bounds. The taxonomy supports evaluation of models operating on structured data, medical images, audio signals, and/or unstructured text. Logical axioms link each metric to one or more of 18 AI model performance criteria, including classification, calibration, image segmentation, and text analysis. By harmonizing terminology and enabling structured queries, ROADMAP's taxonomy of AI performance metrics facilitates model comparison, bias detection, and selection of appropriate evaluation methods across diverse datasets and clinical tasks.

© RSNA, 2026. See also the accompanying Special Report on the ROADMAP ontology.
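
The abstract describes metrics that carry numerical bounds and are linked to performance criteria, which in turn supports automated completeness checks. The Python sketch below illustrates that idea only; it is not ROADMAP's schema or tooling. The `Metric` class, the `CATALOG` entries, their criterion assignments, and the helper functions `missing_criteria` and `out_of_bounds` are all hypothetical names chosen for illustration. The bounds shown are the standard mathematical ranges of these metrics (the Brier score range assumes the binary formulation).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Metric:
    """Hypothetical record for one metric; not the ROADMAP representation."""
    name: str
    kind: str                      # "scalar", "matrix", or "graphical"
    criteria: frozenset            # performance criteria this metric addresses
    lower: Optional[float] = None  # numerical lower bound, if one applies
    upper: Optional[float] = None  # numerical upper bound, if one applies

    def in_bounds(self, value: float) -> bool:
        """Return True if value respects whichever bounds are defined."""
        if self.lower is not None and value < self.lower:
            return False
        if self.upper is not None and value > self.upper:
            return False
        return True


# Illustrative entries only; the criterion links here are assumptions.
CATALOG = {
    m.name: m
    for m in [
        Metric("sensitivity", "scalar", frozenset({"classification"}), 0.0, 1.0),
        Metric("Dice similarity coefficient", "scalar",
               frozenset({"image segmentation"}), 0.0, 1.0),
        Metric("Brier score", "scalar", frozenset({"calibration"}), 0.0, 1.0),
        Metric("confusion matrix", "matrix", frozenset({"classification"})),
    ]
}


def missing_criteria(reported, required):
    """Return the required criteria not covered by any reported metric."""
    covered = set()
    for name in reported:
        metric = CATALOG.get(name)
        if metric is not None:
            covered |= metric.criteria
    return set(required) - covered


def out_of_bounds(reported):
    """Return names of reported scalar values that violate their bounds."""
    return [
        name
        for name, value in reported.items()
        if name in CATALOG
        and value is not None
        and not CATALOG[name].in_bounds(value)
    ]
```

For example, a report containing only `sensitivity` and a Dice coefficient of 1.2 would be flagged as lacking a calibration metric and as containing an impossible Dice value. A real implementation would presumably query the ontology's logical axioms rather than a hand-written dictionary.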

Topics

Journal Article
