Mind the Gap: Human and AI Uncertainties in Cardiac MRI Segmentation.
Authors
Affiliations (5)
- School of Computer Science, University of Nottingham, Ningbo, China.
- School of Computer Science, University of Nottingham, Nottingham, NG81BB, UK.
- Nottingham University Hospitals NHS Trust, Nottingham, NG51PB, UK.
- Leeds Institute of Cardiovascular and Metabolic Medicine, University of Leeds, Leeds, LS29JT, UK.
- School of Computer Science, University of Nottingham, Ningbo, China.
Abstract
This study conducts quantitative and qualitative analyses of the relationship between human-annotated and AI-derived uncertainty in cardiac MRI (CMRI) segmentation, aiming to improve the reliability of AI-based CMRI segmentation models and to foster better human-AI collaboration. The CMRI dataset used in the experiments consists of 483 scans, each with two types of labels provided by clinicians: an annotated segmentation mask and an uncertainty score per CMRI slice. First, AI-derived uncertainty, estimated by a fuzzy-based algorithm, is used to indicate segmentation quality. The method yields uncertainties at multiple levels, including class-wise, slice-wise, and subject-wise. These are then compared with the human-annotated uncertainty scores. Finally, qualitative analyses are conducted with clinicians to investigate how the uncertainty measures could affect real clinical applications. Experimental results show a strong inverse correlation between AI-derived uncertainty and Dice score, a standard metric for segmentation quality, indicating that lower uncertainty predicts higher segmentation quality. Human-annotated uncertainty also coincides with AI-derived uncertainty for some anatomical structures (e.g., papillary muscle). However, high human-annotated uncertainty does not necessarily correspond to low AI segmentation quality, and there is no obvious association between human-annotated uncertainty and the size of the segmented structure. The qualitative analysis with clinicians suggests that humans are better at using prior knowledge (e.g., cardiac structural and contextual information) for uncertainty scoring, whereas the current AI method lacks this capability and is mainly data-driven in both decision-making and uncertainty estimation. AI-derived uncertainty could therefore serve as a quality-control signal for CMRI segmentation.
Humans utilize structural and contextual information to formulate uncertainty, while AI models currently lack this capability.
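
The reported inverse relationship between AI-derived uncertainty and Dice score can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the abstract does not specify the fuzzy-based algorithm, so mean binary entropy of a foreground-probability map stands in for slice-wise uncertainty, and the function names (`dice_score`, `slice_uncertainty`, `spearman`, `demo`) and the synthetic data are hypothetical rather than taken from the study.

```python
import numpy as np


def dice_score(pred, target):
    """Dice = 2|A ∩ B| / (|A| + |B|) for two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    denom = pred.sum() + target.sum()
    if denom == 0:
        return 1.0  # both masks empty: treated as perfect agreement (a convention)
    return 2.0 * np.logical_and(pred, target).sum() / denom


def slice_uncertainty(prob):
    """Mean binary entropy (in bits) of a foreground-probability map.

    Stand-in only: this is NOT the fuzzy-based method used in the study.
    """
    p = np.clip(np.asarray(prob, dtype=float), 1e-7, 1 - 1e-7)
    entropy = -(p * np.log2(p) + (1 - p) * np.log2(1 - p))
    return float(entropy.mean())


def spearman(x, y):
    """Spearman rank correlation; this simple version assumes no tied values."""
    rank_x = np.argsort(np.argsort(x))
    rank_y = np.argsort(np.argsort(y))
    return float(np.corrcoef(rank_x, rank_y)[0, 1])


def demo():
    """Synthetic slices that get both less accurate and less confident."""
    target = np.zeros((32, 32), dtype=bool)
    target[8:24, 8:24] = True
    dices, uncertainties = [], []
    for k in range(5):
        shifted = np.roll(target, k, axis=1)       # overlap shrinks as k grows
        confidence = 0.95 - 0.08 * k               # model confidence also drops
        prob = np.where(shifted, confidence, 1 - confidence)
        dices.append(dice_score(prob > 0.5, target))
        uncertainties.append(slice_uncertainty(prob))
    return spearman(uncertainties, dices)
```

Calling `demo()` returns the rank correlation between the synthetic uncertainties and Dice scores. Because the toy data are constructed so that confidence and overlap degrade together, the correlation is negative by design; on real data the strength of the relationship would have to be measured, and a quality-control threshold on uncertainty would need to be calibrated against clinician review.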