Fractal-driven self-supervised learning enhances early-stage lung cancer GTV segmentation: a novel transfer learning framework.
Affiliations (4)
- Department of Radiation Oncology, Tohoku University Graduate School of Medicine, Sendai, Japan.
- Department of Radiology, University of Yamanashi, Yamanashi, Japan.
- Department of Radiation Oncology, Tohoku University Graduate School of Medicine, Sendai, Japan.
- Department of Computer Science, University of Yamanashi, Yamanashi, Japan.
Abstract
To develop and evaluate a novel deep learning strategy for automated segmentation of the gross tumor volume (GTV) in early-stage lung cancer, using pre-training on mathematically generated, non-natural fractal images.
This retrospective study included 104 patients (aged 36–91 years; 81 men, 23 women) with peripheral early-stage non-small cell lung cancer who underwent radiotherapy at our institution between December 2017 and March 2025. First, we used the encoders of a convolutional neural network (CNN) and a Vision Transformer (ViT) under four strategies: training from scratch, and pre-training on ImageNet-1K (1,000 classes of natural images), FractalDB-1K (1,000 classes of fractal images), or FractalDB-10K (10,000 classes of fractal images); for the three pre-trained strategies, publicly available models were used. Second, the models were fine-tuned on CT images and physician-delineated contours. Segmentation accuracy was evaluated with the volumetric Dice similarity coefficient (vDSC), surface Dice similarity coefficient (sDSC), and 95th percentile Hausdorff distance (HD95) between the predicted and ground-truth GTV contours, averaged over fourfold cross-validation. To assess the impact of GTV shape complexity, accuracy was additionally compared between simple and complex groups defined by the surface-to-volume ratio.
Pre-training with FractalDB-10K yielded the best segmentation accuracy across all metrics. For the ViT model, vDSC, sDSC, and HD95 were 0.800 ± 0.079, 0.732 ± 0.152, and 2.04 ± 1.59 mm with FractalDB-10K; 0.779 ± 0.093, 0.688 ± 0.156, and 2.72 ± 3.12 mm with FractalDB-1K; and 0.764 ± 0.102, 0.660 ± 0.156, and 3.03 ± 3.47 mm with ImageNet-1K. In the comparison of FractalDB-1K with ImageNet-1K, there was no significant difference in the simple group, whereas the complex group showed a significantly higher vDSC with FractalDB-1K (0.743 ± 0.095 vs 0.714 ± 0.104, p = 0.006).
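The evaluation metrics and the shape-complexity descriptor can be made concrete with a small sketch. It assumes binary masks given as Python sets of (z, y, x) voxel indices with a known voxel spacing in mm, finds surface voxels by 6-connectivity, and uses a brute-force nearest-neighbour search, so it only suits small masks. Definitions vary between toolkits, and the abstract does not state the exact ones used: here the surface Dice counts surface voxels rather than weighting by surface-element area, HD95 pools distances from both directions, and the tolerance `tol_mm` is a hypothetical parameter because the value used in the study is not given. Likewise, only the surface-to-volume ratio itself is computed, since the threshold separating the simple and complex groups is not stated. All function names are illustrative.

```python
import math
from typing import List, Set, Tuple

Voxel = Tuple[int, int, int]          # (z, y, x) index of a foreground voxel
Spacing = Tuple[float, float, float]  # voxel size in mm along (z, y, x)

# 6-connected neighbour offsets along (z, y, x)
NEIGHBOURS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def volumetric_dice(a: Set[Voxel], b: Set[Voxel]) -> float:
    """vDSC = 2|A ∩ B| / (|A| + |B|); defined as 1.0 when both masks are empty."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


def surface_voxels(mask: Set[Voxel]) -> Set[Voxel]:
    """Foreground voxels with at least one 6-connected background neighbour."""
    return {
        v for v in mask
        if any((v[0] + dz, v[1] + dy, v[2] + dx) not in mask for dz, dy, dx in NEIGHBOURS)
    }


def directed_distances(src: Set[Voxel], dst: Set[Voxel], spacing: Spacing) -> List[float]:
    """Distance in mm from each voxel in src to its nearest voxel in dst (brute force)."""
    dst_mm = [tuple(c * s for c, s in zip(q, spacing)) for q in dst]
    return [
        min(math.dist(tuple(c * s for c, s in zip(p, spacing)), q) for q in dst_mm)
        for p in src
    ]


def percentile(values: List[float], q: float) -> float:
    """Linear-interpolation percentile (the same convention as NumPy's default)."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100
    lo, hi = math.floor(pos), math.ceil(pos)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def hd95(a: Set[Voxel], b: Set[Voxel], spacing: Spacing) -> float:
    """95th percentile of surface distances pooled over both directions (masks non-empty)."""
    sa, sb = surface_voxels(a), surface_voxels(b)
    return percentile(directed_distances(sa, sb, spacing) + directed_distances(sb, sa, spacing), 95)


def surface_dice(a: Set[Voxel], b: Set[Voxel], spacing: Spacing, tol_mm: float) -> float:
    """Fraction of surface voxels of either mask lying within tol_mm of the other surface."""
    sa, sb = surface_voxels(a), surface_voxels(b)
    d = directed_distances(sa, sb, spacing) + directed_distances(sb, sa, spacing)
    return sum(x <= tol_mm for x in d) / len(d)


def surface_to_volume_ratio(mask: Set[Voxel], spacing: Spacing) -> float:
    """Exposed voxel-face area (mm^2) divided by mask volume (mm^3)."""
    dz, dy, dx = spacing
    face_area = (dy * dx, dz * dx, dz * dy)  # area of a face normal to the z, y, x axis
    area = 0.0
    for v in mask:
        for step in NEIGHBOURS:
            if tuple(c + s for c, s in zip(v, step)) not in mask:
                axis = next(i for i, s in enumerate(step) if s != 0)
                area += face_area[axis]
    return area / (len(mask) * dz * dy * dx)
```

For example, two 4 × 4 × 4 voxel cubes offset by one voxel along x give vDSC = 0.75. On real CT masks, a Euclidean distance transform (for example SciPy's `ndimage.distance_transform_edt` with its `sampling` argument set to the voxel spacing) would replace the brute-force search.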
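The abstract indicates that publicly available FractalDB pre-trained models were used rather than newly generated images. For context, FractalDB categories are defined by iterated function systems (IFS): each category is a set of random affine maps, and its images are rendered from the resulting attractor. The sketch below is a simplified, hypothetical illustration of that idea, not the FractalDB generator. The number of maps, the contractivity filter, the weighting of maps by |det|, the image size, and all helper names are assumptions; FractalDB also filters categories by filling rate, which is only exposed here as a `filling_rate` helper with no threshold applied.

```python
import random
from typing import List, Tuple

AffineMap = Tuple[float, float, float, float, float, float]  # (a, b, c, d, e, f)


def sample_ifs(rng: random.Random, n_maps: int) -> List[AffineMap]:
    """Sample one hypothetical category: maps (x, y) -> (a x + b y + e, c x + d y + f)."""
    maps: List[AffineMap] = []
    while len(maps) < n_maps:
        a, b, c, d, e, f = (rng.uniform(-1.0, 1.0) for _ in range(6))
        # Simplifying assumption: keep only maps whose linear part has Frobenius norm < 1,
        # which guarantees each map is a contraction, so the iteration stays bounded.
        if a * a + b * b + c * c + d * d < 1.0:
            maps.append((a, b, c, d, e, f))
    return maps


def render_ifs(ifs: List[AffineMap], size: int = 64, n_points: int = 20000,
               burn_in: int = 100, seed: int = 0) -> List[List[int]]:
    """Render an IFS attractor as a binary size x size image with the chaos game."""
    rng = random.Random(seed)
    # Choose each map with probability proportional to |det| of its linear part.
    weights = [abs(a * d - b * c) + 1e-9 for a, b, c, d, _, _ in ifs]
    x = y = 0.0
    pts = []
    for i in range(burn_in + n_points):
        a, b, c, d, e, f = rng.choices(ifs, weights=weights)[0]
        x, y = a * x + b * y + e, c * x + d * y + f
        if i >= burn_in:  # skip early points that may not yet lie on the attractor
            pts.append((x, y))
    xs, ys = [p[0] for p in pts], [p[1] for p in pts]
    x0, y0 = min(xs), min(ys)
    # One common scale for both axes keeps the aspect ratio of the attractor.
    scale = (size - 1) / max(max(xs) - x0, max(ys) - y0, 1e-12)
    img = [[0] * size for _ in range(size)]
    for px, py in pts:
        img[int((py - y0) * scale)][int((px - x0) * scale)] = 1
    return img


def filling_rate(img: List[List[int]]) -> float:
    """Fraction of foreground pixels in a binary image."""
    return sum(map(sum, img)) / (len(img) * len(img[0]))
```

Because every category is produced by a formula rather than photographed, the number of classes can be scaled (as in the 1K versus 10K variants) without collecting or annotating natural images.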
Pre-training with fractal structures achieved accuracy comparable or superior to that of ImageNet pre-training for automated GTV segmentation in early-stage lung cancer.