Validation of aortic valve calcification quantification on contrast-enhanced computed tomography against ex vivo gravimetric analysis: comparison of fixed Hounsfield unit thresholds and deep learning segmentation.
Affiliations (11)
- Department of Cardiovascular Surgery, West China Hospital, Sichuan University, Chengdu, 610041, China.
- Department of Cardiovascular Surgery, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, 430022, China.
- Department of Cardiovascular Surgery, Zhongshan Hospital, Fudan University, Shanghai, 200032, China.
- Department of Cardiovascular Surgery, Xijing Hospital, Air Force Medical University, Xi'an, 710032, China.
- Department of Cardiac Surgery, Beijing Anzhen Hospital, Capital Medical University, Beijing, 100029, China.
- Department of Cardiothoracic Surgery, Nanjing Drum Tower Hospital, The Affiliated Hospital of Nanjing University Medical School, Nanjing, 210008, China.
- Department of Cardiovascular Surgery, Xinqiao Hospital, Army Medical University (Third Military Medical University), Chongqing, 400037, China.
- Department of Cardiovascular Surgery, The Second Xiangya Hospital, Central South University, Changsha, 410011, China.
- Department of Cardiovascular Surgery, The First Affiliated Hospital of Nanjing Medical University, Nanjing, 210029, China.
- Department of Cardiovascular Surgery, Guangdong Cardiovascular Institute, Guangdong Provincial People's Hospital, Guangdong Academy of Medical Sciences, Southern Medical University, Guangzhou, China.
- Department of Cardiovascular Surgery, West China Hospital, Sichuan University, Chengdu, 610041, China.
Abstract
Accurate quantification of aortic valve calcification (AVC) on contrast-enhanced computed tomography angiography (CTA) is pivotal for planning surgical and transcatheter aortic valve replacement. The optimal Hounsfield unit (HU) threshold for detecting calcification on contrast-enhanced images remains unresolved, and every prior validation study has relied on non-contrast Agatston scoring, itself an imaging estimate, as the reference standard. This study validated two widely used fixed HU thresholds (450 HU and 850 HU) and a self-configuring nnU-Net deep learning model against ex vivo gravimetric calcium weight as an absolute physical ground truth. Four hundred patients were included in a retrospective cohort study with a pre-specified temporal validation split: 300 with CT-confirmed AVC and 100 with normal aortic valves. Fifty chronologically later AVC patients who underwent elective open surgical aortic valve replacement (SAVR) within seven days of clinically indicated pre-operative contrast-enhanced CTA formed the locked surgical validation cohort; their excised native leaflets underwent standardised high-temperature ashing (550 °C, 12 h) and analytical weighing (precision 0.1 mg) to obtain gravimetric calcium mass. The remaining 350 cases were used exclusively for nnU-Net development (280 for training, 70 for internal validation). In the validation cohort, CT-derived calcium mass-equivalent estimates from each method were compared with gravimetric weight using Pearson and Spearman correlation and Bland-Altman analysis. The nnU-Net achieved the strongest observed correlation with gravimetric weight (Pearson r = 0.967; bias +6.2 mg; root-mean-square error [RMSE] 13.7 mg). Its correlation was significantly stronger than that of the 450 HU threshold (r = 0.864; bias +36.2 mg; RMSE 42.1 mg; Steiger p < 0.001) and showed a non-significant trend toward being stronger than that of 850 HU (r = 0.929; bias +17.5 mg; RMSE 23.9 mg; Steiger p = 0.085).
The 450 HU method exhibited significant proportional bias (p = 0.024), whereas neither 850 HU nor nnU-Net did. The nnU-Net achieved a mean Dice coefficient of 0.873 and an intersection-over-union of 0.812. Against physically weighed calcium, nnU-Net deep learning segmentation provided the most favourable overall performance profile on contrast-enhanced CTA: it had the lowest bias and RMSE and the strongest observed correlation, although its correlation advantage over 850 HU was only a non-significant trend. Among fixed thresholds, 850 HU substantially outperformed 450 HU, offering direct physical evidence, rather than surrogate imaging evidence, in support of 850 HU as the preferred fixed threshold for standard contrast-enhanced protocols.
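The segmentation-overlap metrics and the proportional-bias check mentioned above can be illustrated in the same hedged way. The sketch assumes that a fixed-threshold method labels every voxel at or above the threshold inside some region of interest as calcium, that Dice and IoU compare a predicted mask with a reference mask (the abstract does not say how the reference segmentation was produced), and that proportional bias was assessed by regressing the Bland-Altman differences on the pairwise means. The nnU-Net model itself is not reproduced, and all function names and toy values are hypothetical. The two-sided p-value for the slope would come from a t distribution with `n - 2` degrees of freedom (for example `2 * scipy.stats.t.sf(abs(t), df)`); it is omitted here to keep the sketch dependency-free.

```python
import math
from typing import List, Sequence, Tuple


def threshold_mask(hu_values: Sequence[float], threshold_hu: float) -> List[bool]:
    """Label each voxel as calcium if its attenuation is at or above a fixed HU threshold."""
    return [hu >= threshold_hu for hu in hu_values]


def dice_and_iou(pred: Sequence[bool], ref: Sequence[bool]) -> Tuple[float, float]:
    """Dice = 2|A and B| / (|A| + |B|); IoU = |A and B| / |A or B| for two binary masks."""
    if len(pred) != len(ref):
        raise ValueError("masks must contain the same number of voxels")
    inter = sum(1 for p, r in zip(pred, ref) if p and r)
    n_pred, n_ref = sum(pred), sum(ref)
    union = n_pred + n_ref - inter
    if union == 0:
        return 1.0, 1.0  # both masks empty: scored as perfect agreement by this sketch's convention
    return 2 * inter / (n_pred + n_ref), inter / union


def proportional_bias(ct_mg: Sequence[float], grav_mg: Sequence[float]) -> Tuple[float, float, int]:
    """OLS slope of (CT - gravimetric) on their pairwise mean, its t statistic, and degrees of freedom."""
    means = [(c + g) / 2 for c, g in zip(ct_mg, grav_mg)]
    diffs = [c - g for c, g in zip(ct_mg, grav_mg)]
    n = len(diffs)
    mx, my = sum(means) / n, sum(diffs) / n
    sxx = sum((m - mx) ** 2 for m in means)
    slope = sum((m - mx) * (d - my) for m, d in zip(means, diffs)) / sxx
    intercept = my - slope * mx
    rss = sum((d - intercept - slope * m) ** 2 for m, d in zip(means, diffs))
    se_slope = math.sqrt(rss / (n - 2) / sxx)
    return slope, slope / se_slope, n - 2


if __name__ == "__main__":
    hu_roi = [120, 300, 460, 700, 900, 1200]  # toy voxel values, not study data
    reference = [False, False, False, True, True, True]
    for thr in (450, 850):
        # 450 -> (0.857..., 0.75); 850 -> (0.8, 0.666...)
        print(thr, dice_and_iou(threshold_mask(hu_roi, thr), reference))
```

In the toy example, the voxel at 460 HU is labelled calcium by the 450 HU threshold but not by 850 HU, while the 700 HU voxel is missed by 850 HU; the two thresholds therefore trade false-positive voxels against false-negative ones, and the overlap metrics summarise that trade-off against whichever reference mask is used.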