Impact of data augmentation size on deep learning-based third lumbar vertebra computed tomography skeletal muscle segmentation performance.

May 18, 2026

Authors

Zhao X, Du Y, Zhu X, Liu Y, Xiao Y, Tian H

Affiliations (5)

  • School of Information Network Security, People's Public Security University of China, Beijing, China.
  • Key Laboratory of Carcinogenesis and Translational Research (Ministry of Education/Beijing), Department of Radiation Oncology, Peking University Cancer Hospital & Institute, Beijing, China.
  • Institute of Medical Technology, Peking University Health Science Center, Beijing, China.
  • Proton Center & Department of Radiation Oncology, Peking University Cancer Hospital (Inner Mongolia Campus) & Affiliated Cancer Hospital of Inner Mongolia Medical University, Inner Mongolia Autonomous Region, China.
  • School of National Security, People's Public Security University of China, Beijing, China.

Abstract

Various deep learning (DL) models have been proposed for skeletal muscle segmentation at the level of the third lumbar vertebra (L3) on computed tomography (CT) images, typically incorporating data augmentation to improve model robustness and mitigate overfitting. However, the impact of augmentation size on model performance remains unclear. This study aims to quantitatively evaluate this impact and provide evidence-based guidance for selecting appropriate augmentation sizes. A total of 400 patients diagnosed with rectal or cervical cancer were included, each with a pelvic CT series acquired using a clinical CT scanner. For each patient, skeletal muscle regions were manually annotated on three consecutive axial CT images at the L3 level to serve as ground truth. The dataset was divided into training, validation, and test sets comprising 280, 60, and 60 patients, respectively. Ten experimental groups (Labs 1-10) were established by varying the number of augmented images included in the training set to represent different augmentation sizes. Lab 1 contained only the original images. To create Labs 2-10, each original training image was augmented up to nine times using random combinations of horizontal flipping, translation, scaling, and Gaussian noise addition. Specifically, Labs 2-9 incorporated 1-8 randomly selected augmented images per original image, while Lab 10 included all nine augmented images. To enhance experimental reliability, the dataset creation process for Labs 2-10 was repeated ten times. Three representative DL models (U-Net, attention U-Net, and attention V-Net) were trained on each group and evaluated on the same test set using five quantitative metrics: Dice similarity coefficient (DSC), precision, recall, 95th percentile Hausdorff distance (HD95), and average surface distance (ASD). Model performance generally improved with increasing augmentation size, accompanied by enhanced robustness in skeletal muscle morphology and boundary delineation.
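The construction of the ten groups can be sketched as follows. This is only an illustrative sketch, not the authors' code: the names `OPS`, `random_op_combo`, and `build_lab` are hypothetical, each augmented image is represented by the tuple of operations applied rather than by pixel data, and it is assumed that every augmented copy uses a non-empty combination of the four operations. It also assumes that all three annotated slices of each of the 280 training patients are used, which would give 840 original training images.

```python
import random

# The four augmentation operations named in the abstract.
OPS = ("horizontal_flip", "translation", "scaling", "gaussian_noise")
N_AUGMENTED = 9  # augmented copies generated per original image


def random_op_combo(rng):
    """Pick a random non-empty combination of the four operations (assumption)."""
    k = rng.randint(1, len(OPS))
    return tuple(sorted(rng.sample(OPS, k), key=OPS.index))


def build_lab(original_ids, lab, rng):
    """Return the training items for Lab `lab` (1..10).

    Lab 1 holds only the originals. Lab k holds each original plus
    k - 1 augmented copies drawn at random from its pool of nine.
    Each item is (image_id, ops), with ops=None for an original image.
    """
    if not 1 <= lab <= 10:
        raise ValueError("lab must be between 1 and 10")
    items = []
    for image_id in original_ids:
        items.append((image_id, None))
        # Hypothetical pool of nine augmented variants for this image.
        pool = [(image_id, random_op_combo(rng)) for _ in range(N_AUGMENTED)]
        items.extend(rng.sample(pool, lab - 1))
    return items


if __name__ == "__main__":
    rng = random.Random(0)
    # 280 training patients x 3 annotated slices (assumed layout of IDs).
    ids = [f"patient{p:03d}_slice{s}" for p in range(280) for s in range(3)]
    for lab in (1, 7, 10):
        print(lab, len(build_lab(ids, lab, rng)))
```

Under these assumptions, Lab k contains k times as many training images as Lab 1, so the training set grows linearly with augmentation size. Repeating the call with different seeds would mirror the ten repetitions of dataset creation described for Labs 2-10.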
All three models achieved their best performance within Labs 7-10. The U-Net achieved best average DSC, precision, recall, HD95, and ASD values of 97.481%, 97.299%, 98.040%, 1.782 mm, and 0.268 mm, respectively. The attention U-Net achieved corresponding best values of 97.455%, 97.208%, 98.013%, 1.822 mm, and 0.270 mm, while the attention V-Net achieved 97.627%, 97.436%, 97.939%, 1.781 mm, and 0.250 mm, respectively. Data augmentation effectively improves DL model performance for L3 CT skeletal muscle segmentation. Based on systematic evaluation across ten augmentation sizes, an augmentation size of 6× or greater per L3 CT image is recommended to ensure high segmentation accuracy and robust generalization.
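For readers unfamiliar with the overlap metrics reported above, the sketch below computes DSC, precision, and recall from a predicted and a ground-truth binary mask using their standard definitions. The function name `overlap_metrics`, the nested-list mask format, and the conventions for empty masks are assumptions made for illustration; the abstract does not describe the authors' implementation. HD95 and ASD are omitted because they require surface-distance computation.

```python
def overlap_metrics(pred, truth):
    """Return (DSC, precision, recall) for two same-shaped binary masks.

    Masks are nested lists of 0/1 values (rows of pixels).
    """
    tp = fp = fn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1  # muscle pixel correctly predicted
            elif p:
                fp += 1  # predicted muscle, actually background
            elif t:
                fn += 1  # missed muscle pixel
    # Convention (assumption): two empty masks count as perfect overlap.
    dsc = 2 * tp / (2 * tp + fp + fn) if (tp + fp + fn) else 1.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return dsc, precision, recall


if __name__ == "__main__":
    pred = [[1, 1, 0],
            [0, 1, 0]]
    truth = [[1, 0, 0],
             [0, 1, 1]]
    dsc, precision, recall = overlap_metrics(pred, truth)
    print(f"DSC={dsc:.3f} precision={precision:.3f} recall={recall:.3f}")
```

In the toy example there are two true positives, one false positive, and one false negative, so all three metrics equal 2/3. The percentages in the abstract correspond to these ratios multiplied by 100.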

Topics

Journal Article
