Deep learning for liver lesion segmentation and classification on staging CT scans of colorectal cancer patients: a multi-site technical validation study.
Authors
Affiliations (10)
- Radiology Department, St Barts Hospital, W Smithfield, London EC1A 7BE, UK.
- Faculty of Health, Science, Social Care and Education, Kingston University, Kingston upon Thames KT2 7LB, UK.
- Radiology Department, St Barts Hospital, W Smithfield, London EC1A 7BE, UK.
- School of Electronic Engineering and Computer Science, Queen Mary University of London, UK; Digital Environment Research Institute, Queen Mary University of London, London E1 1HH, UK.
- School of Electronic Engineering and Computer Science, Queen Mary University of London, UK.
- Radiology Department, St Barts Hospital, W Smithfield, London EC1A 7BE, UK.
- Radiology Department, St Barts Hospital, W Smithfield, London EC1A 7BE, UK.
- Barts Cancer Institute, Queen Mary University, London EC1M 6BQ, UK.
- Digital Environment Research Institute, Queen Mary University of London, London E1 1HH, UK.
- School of Electronic Engineering and Computer Science, Queen Mary University of London, UK.
Abstract
This study aimed to validate a liver lesion detection and classification model on staging computed tomography (CT) scans of colorectal cancer (CRC) patients. A UNet-based deep learning model was trained on 272 public liver tumour CT scans and tested on 220 CRC staging CTs acquired from a single institution (2014-2019). Performance metrics were lesion detection rate by size (<10 mm, 10-20 mm, >20 mm), segmentation accuracy (Dice similarity coefficient, DSC), volume measurement agreement (Bland-Altman limits of agreement, LOAs; intraclass correlation coefficient, ICC), and classification accuracy (malignant vs benign) at patient and lesion levels (detected lesions only). The model detected 743 of 884 lesions (84%), with detection rates of 75%, 91.3%, and 96% for lesions <10 mm, 10-20 mm, and >20 mm, respectively. The median DSC was 0.76 (95% CI: 0.72-0.80) for lesions <10 mm, 0.83 (95% CI: 0.79-0.86) for 10-20 mm, and 0.85 (95% CI: 0.82-0.88) for >20 mm. Bland-Altman analysis showed a mean volume bias of -0.12 cm³ (LOAs: -1.68 to +1.43 cm³), and the ICC was 0.81. Lesion-level classification achieved 99.5% sensitivity, 65.7% specificity, 53.8% positive predictive value (PPV), 99.7% negative predictive value (NPV), and 75.4% accuracy. Patient-level classification achieved 100% sensitivity, 27.1% specificity, 59.2% PPV, 100% NPV, and 64.5% accuracy. The model demonstrates strong lesion detection and segmentation performance, including for sub-centimetre lesions. Although classification accuracy was moderate, the 100% patient-level NPV suggests strong potential as a CRC staging screening tool. Future studies will assess its impact on radiologist performance and efficiency.
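For readers less familiar with the reported metrics, the following is a minimal sketch of how the DSC, Bland-Altman limits of agreement, and the confusion-matrix statistics are conventionally defined. It is not the authors' evaluation code: the function names, the flat 0/1 mask representation, the empty-mask convention, and the 1.96 SD multiplier are assumptions made for illustration, and all inputs are toy values rather than study data.

```python
from statistics import mean, stdev


def dice(pred, truth):
    """Dice similarity coefficient for two binary masks given as flat 0/1 sequences."""
    pred, truth = list(pred), list(truth)
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    overlap = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    # Assumption: two empty masks count as perfect agreement.
    return 1.0 if total == 0 else 2.0 * overlap / total


def bland_altman(model_volumes, reference_volumes):
    """Return (bias, lower LOA, upper LOA) for paired volume measurements.

    Uses the conventional bias +/- 1.96 * SD of the paired differences;
    the abstract does not state which multiplier the study used.
    """
    diffs = [m - r for m, r in zip(model_volumes, reference_volumes)]
    bias = mean(diffs)
    spread = 1.96 * stdev(diffs)  # sample standard deviation
    return bias, bias - spread, bias + spread


def classification_metrics(tp, fp, tn, fn):
    """Standard diagnostic metrics from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


if __name__ == "__main__":
    # Toy inputs only; none of these numbers come from the study.
    print(dice([1, 1, 0, 0], [1, 0, 0, 0]))
    print(bland_altman([1.0, 2.0, 3.0], [1.5, 2.0, 3.5]))
    print(classification_metrics(tp=9, fp=3, tn=7, fn=1))
```

Because NPV is tn / (tn + fn), an NPV of 100% means no false negatives among the cases the model labelled benign, which is why it can coexist with low specificity when many benign cases are flagged as malignant.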