Metrics for Artificial Intelligence in Medicine: A Reference Resource.
Authors
Affiliations (6)
- Balliol College, Radcliffe Department of Medicine, Oxford University, Oxford, UK.
- Department of Radiology, University of North Carolina, Chapel Hill, NC.
- Department of Radiology, University of California San Diego, San Diego, Calif.
- Department of Radiology and Department of Artificial Intelligence and Informatics, Mayo Clinic, Phoenix, Ariz.
- Department of Radiology and Research Institute of Radiology, University of Ulsan College of Medicine, Asan Medical Center, Seoul, Republic of Korea.
- Department of Radiology and Institute for Biomedical Informatics, Perelman School of Medicine, University of Pennsylvania, 3400 Spruce St, 1 Silverstein, Philadelphia, PA 19104-6243.
Abstract
The effective integration of artificial intelligence (AI) systems into clinical medicine depends on comprehensive and transparent performance evaluation; however, the lack of standardized and widely accepted metrics poses challenges for reproducibility and model adoption. A comprehensive, machine-interpretable framework is presented to formalize the nomenclature and descriptions of 207 graphical, matrix, and scalar metrics used to measure AI model performance. The metrics taxonomy, developed as part of the Radiology Ontology of AI Datasets, Models and Projects (ROADMAP), provides a logically structured representation that captures the semantics of AI evaluation metrics, supports reasoning over metric classes, and enables automated completeness checks for AI model reporting. For each metric, the taxonomy incorporates a definition and citations to authoritative reference sources; where applicable, the taxonomy also includes synonyms, abbreviations, alternate language forms, mathematical formulae, and numerical bounds. The taxonomy supports evaluation of models operating on structured data, medical images, audio signals, and/or unstructured text. Logical axioms link each metric to one or more of 18 AI model performance criteria, including classification, calibration, image segmentation, and text analysis. By harmonizing terminology and enabling structured queries, ROADMAP's taxonomy of AI performance metrics facilitates model comparison, bias detection, and selection of appropriate evaluation methods across diverse datasets and clinical tasks.
© RSNA, 2026. See also accompanying Special Report on ROADMAP ontology.
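To make the structure described in the abstract concrete, the following minimal Python sketch shows how a metric entry with a definition, synonyms, numerical bounds, and links to performance criteria might support structured queries and a simple completeness check. It is an illustration under stated assumptions, not ROADMAP's actual schema or tooling: the `Metric` class, the field names, the three sample entries, and the functions `find_metric`, `metrics_for`, and `missing_criteria` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Metric:
    """Hypothetical record for one taxonomy entry (not ROADMAP's real schema)."""
    name: str
    definition: str
    criteria: frozenset            # performance criteria the metric is linked to
    synonyms: tuple = ()
    lower_bound: Optional[float] = None
    upper_bound: Optional[float] = None


# Illustrative entries only; the actual taxonomy describes 207 metrics.
METRICS = [
    Metric("sensitivity", "TP / (TP + FN)",
           frozenset({"classification"}),
           synonyms=("recall", "true positive rate"),
           lower_bound=0.0, upper_bound=1.0),
    Metric("Dice similarity coefficient", "2|A ∩ B| / (|A| + |B|)",
           frozenset({"image segmentation"}),
           synonyms=("Sørensen-Dice coefficient",),
           lower_bound=0.0, upper_bound=1.0),
    Metric("expected calibration error",
           "bin-weighted mean of |accuracy - confidence|",
           frozenset({"calibration"}),
           synonyms=("ECE",),
           lower_bound=0.0, upper_bound=1.0),
]


def find_metric(label):
    """Resolve a preferred name or synonym (case-insensitive) to a Metric."""
    key = label.casefold()
    for metric in METRICS:
        labels = (metric.name,) + metric.synonyms
        if key in (item.casefold() for item in labels):
            return metric
    return None


def metrics_for(criterion):
    """Structured query: all metrics linked to a performance criterion."""
    return [m for m in METRICS if criterion in m.criteria]


def missing_criteria(reported_labels, required_criteria):
    """Completeness check: required criteria not covered by any reported metric."""
    covered = set()
    for label in reported_labels:
        metric = find_metric(label)
        if metric is not None:     # unrecognized labels are skipped
            covered |= metric.criteria
    return set(required_criteria) - covered
```

For example, a report listing only "recall" and "Dice similarity coefficient" would be flagged as lacking a calibration metric by `missing_criteria(["recall", "Dice similarity coefficient"], {"classification", "calibration", "image segmentation"})`. Harmonizing synonyms to one preferred name is what lets such a check treat "recall" and "sensitivity" as the same metric.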