Hemorica: A Comprehensive CT Scan Dataset for Automated Brain Hemorrhage Classification, Segmentation, and Detection
Authors
Abstract
Timely diagnosis of Intracranial hemorrhage (ICH) on Computed Tomography (CT) scans remains a clinical priority, yet the development of robust Artificial Intelligence (AI) solutions is still hindered by fragmented public data. To close this gap, we introduce Hemorica, a publicly available collection of 372 head CT examinations acquired between 2012 and 2024. Each scan has been exhaustively annotated for five ICH subtypes-epidural (EPH), subdural (SDH), subarachnoid (SAH), intraparenchymal (IPH), and intraventricular (IVH)-yielding patient-wise and slice-wise classification labels, subtype-specific bounding boxes, two-dimensional pixel masks and three-dimensional voxel masks. A double-reading workflow, preceded by a pilot consensus phase and supported by neurosurgeon adjudication, maintained low inter-rater variability. Comprehensive statistical analysis confirms the clinical realism of the dataset. To establish reference baselines, standard convolutional and transformer architectures were fine-tuned for binary slice classification and hemorrhage segmentation. With only minimal fine-tuning, lightweight models such as MobileViT-XS achieved an F1 score of 87.8% in binary classification, whereas a U-Net with a DenseNet161 encoder reached a Dice score of 85.5% for binary lesion segmentation that validate both the quality of the annotations and the sufficiency of the sample size. Hemorica therefore offers a unified, fine-grained benchmark that supports multi-task and curriculum learning, facilitates transfer to larger but weakly labelled cohorts, and facilitates the process of designing an AI-based assistant for ICH detection and quantification systems.