A fully optimized deep learning framework for explainable and efficient kidney stone detection in computed tomography imaging.
Authors
Affiliations (2)
- Computer Engineering Department, Faculty of Engineering and Natural Sciences, Sivas University of Science and Technology, Sivas, Turkey.
- Computer Engineering Department, Faculty of Engineering and Natural Sciences, Sivas University of Science and Technology, Sivas, Turkey. [email protected].
Abstract
To develop and rigorously evaluate a hyperparameter-optimized custom convolutional neural network for kidney stone detection on axial computed tomography (CT) and to compare it, under matched optimization conditions, with widely used pretrained backbones, emphasizing statistical reliability, computational efficiency, and cross-institutional generalization.

A custom convolutional neural network with residual blocks was trained on 3,364 axial CT slices (Stone n = 1,577; Non-stone n = 1,787) under a stratified 70/15/15 split. The same Bayesian hyperparameter optimization (1,000 trials per architecture, tree-structured Parzen estimator) was applied uniformly to the proposed model and six pretrained backbones (ResNet101, DenseNet121, EfficientNetB3, MobileNetV3Large, VGG16, Xception). Robustness was characterized by stratified five-fold cross-validation, ten repeated fixed-split runs with controlled seeds, and 95% confidence intervals. External validation used an independent Bangladesh CT cohort (4,121 slices). Interpretability was probed with Grad-CAM++.

The proposed model and ResNet101 jointly achieved the best test performance (accuracy 0.9960, F1 0.9957, ROC-AUC ≥ 0.9995). The proposed model attained the lowest single-sample inference latency (2.24 ms; 1.5 to 53 times faster than the baselines) with 1.53 M parameters (2.3 to 28 times fewer). Five-fold cross-validation gave Macro-F1 0.9856 ± 0.0036; the Friedman test confirmed significant cross-architecture differences (p < 0.001). On the external cohort, calibration with 20% in-domain data recovered F1 to 0.9806. Grad-CAM++ activations were anatomically plausible.

Custom architecture search under a unified Bayesian protocol can match the ceiling of fully optimized pretrained backbones at a fraction of their computational cost. Methodologically consistent benchmarking, not peak-accuracy reporting, is essential for clinically deployable kidney stone detection.
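As an illustration of the stratified 70/15/15 split mentioned in the abstract, the following is a minimal sketch using only the Python standard library. It is not the authors' code: the function name `stratified_split`, the label strings, the seed, and the rounding rule are all assumptions, and the only figures taken from the abstract are the class counts (1,577 stone and 1,787 non-stone slices). The exact subset sizes used in the study may differ from what this sketch produces.

```python
import random


def stratified_split(labels, fractions=(0.70, 0.15, 0.15), seed=0):
    """Split sample indices into train/val/test while keeping class ratios.

    Hypothetical helper for illustration; rounding per class is an assumption.
    """
    rng = random.Random(seed)  # fixed seed so the split is reproducible

    # Group sample indices by class label.
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)

    train, val, test = [], [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        n_train = round(fractions[0] * len(indices))
        n_val = round(fractions[1] * len(indices))
        train += indices[:n_train]
        val += indices[n_train:n_train + n_val]
        test += indices[n_train + n_val:]  # remainder goes to the test set
    return train, val, test


# Class counts from the abstract; the label names are placeholders.
labels = ["stone"] * 1577 + ["non_stone"] * 1787
train, val, test = stratified_split(labels)
print(len(train), len(val), len(test))
```

Splitting each class separately keeps the stone to non-stone ratio nearly identical across the three subsets, which is what makes test metrics comparable with the validation metrics used during hyperparameter search.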