A MultiRater MultiOrgan Abdominal CT Dataset for Calibration Analysis and Uncertainty Modeling in Segmentation.

January 9, 2026

Authors

Riera-Marin M, Kleiss JM, Aubanell A, Antolin A, Moreno-Vedia J, Rodriguez-Comas J, O K S, May M, Garcia-Lopez J, Galdran A, González Ballester MA

Affiliations (11)

  • Sycai Technologies SL, Scientific and Technical Department, Barcelona, 08018, Spain.
  • Universitat Pompeu Fabra, BCN Medtech, Department of Engineering, Barcelona, 08018, Spain.
  • Universitätsklinikum Erlangen, Department of Radiology of the Uniklinikum Erlangen (UKER), Erlangen, 91054, Germany.
  • Hospital de Sant Pau i la Santa Creu, Diagnostic Imaging Department, Barcelona, 08025, Spain.
  • Institut de Recerca Sant Pau - Centre CERCA, Advanced Medical Imaging, Artificial Intelligence, and Imaging-Guided Therapy Research Group, Barcelona, 08025, Spain.
  • Hospital Universitari Vall d'Hebron, Department of Radiology, Institut de Diagnòstic per la Imatge (IDI), Barcelona, 08035, Spain.
  • Sycai Technologies SL, Scientific and Technical Department, Barcelona, 08018, Spain.
  • Universitat Pompeu Fabra, BCN Medtech, Department of Engineering, Barcelona, 08018, Spain.
  • University Hospital Erlangen, Imaging Science Institute, Erlangen, 91054, Germany.
  • Tecnalia Research and Innovation (Tecnalia), Basque Research and Technology Alliance (BRTA), Bizkaia, 48160, Spain.
  • Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, 08010, Spain.

Abstract

In medical imaging, deep learning (DL) models often struggle to delineate ambiguous structures such as tumors or organ boundaries, leading to uncertainty in defining precise contours. This challenge is amplified by inter-rater variability, where experts may disagree on boundary delineations, resulting in inconsistent segmentation outcomes. Addressing these issues requires robust algorithms capable of quantifying uncertainty, standardizing annotation practices, and improving calibration to ensure reliable predictions, particularly in multi-class and multi-rater scenarios. When models are miscalibrated and overconfident, their outputs can mislead clinical decision-making, potentially influencing radiologists to over- or under-estimate malignancy risks. The CURVAS challenge (Calibration and Uncertainty for multiRater Volume Assessment in multiorgan Segmentation) was established to address these challenges by jointly assessing uncertainty, calibration, and segmentation quality, as well as promoting clinical relevance by evaluating organ volumes while accounting for annotation variability. To support this, a dataset of 90 contrast-enhanced CT scans from University Hospital Erlangen was curated, containing pancreas, liver, and kidney segmentations annotated by three experts. This resource provides a foundation for developing and benchmarking algorithms that balance segmentation accuracy, calibration, and reliability. A quantitative analysis of the annotations shows that kidney and liver segmentations exhibit strong consistency, whereas the pancreas remains challenging, emphasizing the need for refined labeling protocols and improved training strategies.
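To make the idea of annotation consistency concrete, the following is a minimal sketch of one common way to quantify agreement among three raters: the mean pairwise Dice overlap between their masks for a single organ. The abstract does not state which agreement measure the authors used, so the choice of Dice, the function names `dice` and `mean_pairwise_dice`, and the toy 2D masks are all assumptions made for illustration only.

```python
from itertools import combinations

import numpy as np


def dice(a, b):
    """Dice overlap between two binary masks (1.0 if both are empty)."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    total = a.sum() + b.sum()
    if total == 0:
        # Convention chosen here: two empty masks agree perfectly.
        return 1.0
    return float(2.0 * np.logical_and(a, b).sum() / total)


def mean_pairwise_dice(masks):
    """Average Dice over every pair of rater masks for one organ."""
    return float(np.mean([dice(a, b) for a, b in combinations(masks, 2)]))


# Hypothetical toy data: three raters outlining the same structure
# on an 8x8 slice, with small disagreements at the boundary.
base = np.zeros((8, 8), dtype=bool)
base[2:6, 2:6] = True          # 16 pixels

rater_1 = base.copy()
rater_2 = base.copy()
rater_2[2, 2] = False          # one corner pixel left out
rater_3 = base.copy()
rater_3[6, 2:6] = True         # one extra row included

agreement = mean_pairwise_dice([rater_1, rater_2, rater_3])
print(f"mean pairwise Dice: {agreement:.3f}")
```

Under this measure, values near 1 would correspond to the strong consistency reported for kidney and liver, while lower values would reflect the kind of disagreement described for the pancreas. Real use would apply the same functions to 3D label volumes, one organ label at a time.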

Topics

Journal Article
