
Submanifold sparse convolutional networks for automated 3D segmentation of kidneys and kidney tumours in computed tomography.

June 1, 2026 (PubMed)

Authors

Alonso-Monsalve S, Whitehead LH, Aurisano A, Escudero Sanchez L

Affiliations (4)

  • Institute for Particle Physics and Astrophysics, ETH Zürich, Zürich, 8093, Switzerland.
  • Department of Physics, University of Cambridge, Cambridge, CB3 0US, UK.
  • Department of Physics, University of Cincinnati, Cincinnati, 45221-0011, OH, USA.
  • Department of Radiology, University of Cambridge, Cambridge, CB2 0QQ, UK.

Abstract

Accurate delineation of kidney tumours in Computed Tomography (CT) is essential for downstream quantitative analysis and for precision oncology that could enable personalised treatment, but manual segmentation is specialised, time-consuming, and difficult to scale in routine practice. Automated 3D segmentation remains challenging in medical imaging because CT scans are large, dense volumes: processing them at high resolution with conventional dense convolutional neural networks is computationally expensive, so such approaches often rely on downsampling or patch-based inference. To address this, we propose a two-stage 3D segmentation methodology based on voxel sparsification and submanifold sparse convolutional networks (SSCNs). In Stage 1, a low-resolution sparse network identifies a region of interest (ROI); in Stage 2, a high-resolution sparse network refines the segmentation within the cropped ROI. This design enables native 3D processing at high resolution while reducing CPU/GPU memory usage and inference time. We evaluate the method on the KiTS23 dataset of renal cancer CT scans using 5-fold cross-validation. Our method achieves Dice similarity coefficients of 95.8% for kidneys + masses, 85.7% for tumours + cysts, and 80.3% for tumours alone, a performance competitive with the top KiTS23 approaches. In direct comparisons on the same cross-validation folds, the proposed sparse method achieves tumour + cyst and tumour-only Dice scores comparable to, and slightly higher than, those of a patch-based nnU-Net baseline, while consistently requiring less VRAM and shorter inference time on all tested hardware. Across the tested GPUs, our sparse model is markedly faster than both nnU-Net and SegVol, a zero-shot zoom-out/zoom-in foundation model that localises kidneys well but underperforms on small, heterogeneous lesions.
Compared with an equivalent dense implementation of the same architecture, the proposed sparse approach reduces inference time by up to 60% and VRAM usage by up to 75% across the CPU and GPU configurations tested.
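
The two-stage idea can be illustrated with a minimal NumPy sketch of the data handling around the networks, not of the SSCNs themselves. Everything here is an assumption for illustration, since the abstract does not give these details: the hypothetical helper names (`sparsify`, `downsample`, `roi_from_coarse_mask`), the intensity threshold that decides which voxels are kept, strided subsampling as the low-resolution scheme, and a simple threshold standing in for the Stage 1 network. The sketch reduces a dense volume to a coordinate list plus per-voxel features (the kind of input sparse convolution libraries typically consume) and maps a coarse Stage 1 mask back to a padded full-resolution crop for Stage 2.

```python
import numpy as np


def sparsify(volume: np.ndarray, threshold: float) -> tuple[np.ndarray, np.ndarray]:
    """Keep only voxels above a hypothetical intensity threshold.

    Returns integer coordinates of shape (N, 3) and features of shape (N, 1).
    """
    coords = np.argwhere(volume > threshold)                     # (N, 3) z, y, x indices
    feats = volume[tuple(coords.T)][:, None].astype(np.float32)  # (N, 1) voxel values
    return coords, feats


def downsample(volume: np.ndarray, factor: int) -> np.ndarray:
    """Strided subsampling for the low-resolution Stage 1 input (assumed scheme)."""
    return volume[::factor, ::factor, ::factor]


def roi_from_coarse_mask(mask: np.ndarray, factor: int, margin: int, full_shape):
    """Map the bounding box of a low-resolution mask to a padded full-resolution crop."""
    idx = np.argwhere(mask)
    if idx.size == 0:
        raise ValueError("Stage 1 mask is empty; no ROI to crop")
    # Scale coarse indices back to full resolution, pad, and clip to the volume.
    lo = np.maximum(idx.min(axis=0) * factor - margin, 0)
    hi = np.minimum((idx.max(axis=0) + 1) * factor + margin, full_shape)
    return tuple(slice(int(a), int(b)) for a, b in zip(lo, hi))


# Synthetic volume: air-like background with a bright block standing in for a kidney.
vol = np.full((64, 64, 64), -1000.0, dtype=np.float32)
vol[20:30, 24:40, 10:18] = 40.0

coarse = downsample(vol, factor=4)   # (16, 16, 16) Stage 1 input
coarse_mask = coarse > -500          # hypothetical stand-in for the Stage 1 network output
roi = roi_from_coarse_mask(coarse_mask, factor=4, margin=4, full_shape=vol.shape)
crop = vol[roi]                      # full-resolution Stage 2 input
coords, feats = sparsify(crop, threshold=-500)
```

A margin of at least `factor - 1` voxels guards against boundary voxels that the strided subsampling skips. In a real pipeline the sparse crop would feed the high-resolution sparse network rather than a threshold.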
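
The Dice similarity coefficient reported above is DSC = 2|A ∩ B| / (|A| + |B|) for a predicted mask A and a reference mask B. The sketch below scores the three nested regions named in the abstract (kidneys + masses, tumours + cysts, tumours alone) by merging labels into binary masks. It assumes the KiTS23 label convention 0 = background, 1 = kidney, 2 = tumour, 3 = cyst, and scoring two empty masks as 1.0 is likewise an assumed convention; the official KiTS23 evaluation code may handle details differently.

```python
import numpy as np

# Assumed KiTS23 label map: 0 = background, 1 = kidney, 2 = tumour, 3 = cyst.
REGIONS = {
    "kidney + masses": (1, 2, 3),
    "tumour + cyst": (2, 3),
    "tumour": (2,),
}


def dice(pred: np.ndarray, ref: np.ndarray) -> float:
    """Dice similarity coefficient 2|A∩B| / (|A| + |B|) for boolean masks."""
    inter = np.logical_and(pred, ref).sum()
    total = pred.sum() + ref.sum()
    # Assumed convention: two empty masks count as perfect agreement.
    return 1.0 if total == 0 else float(2.0 * inter / total)


def region_dice(pred_labels: np.ndarray, ref_labels: np.ndarray) -> dict[str, float]:
    """Score each nested region by merging its labels into one binary mask."""
    return {
        name: dice(np.isin(pred_labels, labels), np.isin(ref_labels, labels))
        for name, labels in REGIONS.items()
    }


# Toy label arrays (flattened voxels) to exercise the metric.
ref = np.array([0, 1, 1, 2, 2, 3, 0, 0])
pred = np.array([0, 1, 1, 2, 3, 3, 0, 1])
scores = region_dice(pred, ref)
```

Merging labels before scoring means a tumour voxel predicted as cyst still counts as correct for the "tumour + cyst" region but not for "tumour" alone, which is why the nested regions can yield quite different scores.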

Topics

Journal Article
