The Impact of Data Consistency on Deep Learning Models for Nasopharyngeal Cancer Organ Auto-Segmentation.

March 31, 2026

Authors

Fang Y, Wang J, He X, Hu C, Yu L, Guo Y, Zhong Y, Mi J, Chen S, Qiao J, Yang Y, Hu W

Affiliations (6)

  • Department of Radiation Oncology, Fudan University Shanghai Cancer Center, 270 Dong'an Road, Xuhui District, Shanghai, 200032, China.
  • Department of Radiation Oncology Cancer Hospital and Department of Oncology Shanghai Medical College, Fudan University, Shanghai, Shanghai, 200433, China.
  • Department of Radiation Oncology, Fudan University Shanghai Cancer Center, Shanghai, Shanghai, 200032, China.
  • Department of Radiation Oncology, Fudan University Shanghai Cancer Center, Shanghai, Shanghai, 200433, China.
  • Fudan University Shanghai Cancer Center, 270 Dong'an Road, Shanghai, 200032, China.
  • Department of Radiation Oncology Cancer Hospital and Department of Oncology Shanghai Medical College, Fudan University Shanghai Cancer Center, Shanghai, Shanghai, 200032, China.

Abstract

To investigate how annotation consistency influences deep learning-based auto-contouring performance for organs-at-risk (OARs) in nasopharyngeal cancer radiotherapy. We evaluated CT scans from 1,301 nasopharyngeal carcinoma patients: 65 contoured by Physician A, 76 by Physician B, and 1,160 by heterogeneous multi-physician teams. Three training cohorts (50 samples each for Physicians A and B; 1,000 for the multi-physician group) were used with a standardized U-Net training protocol to produce Models A, B, and C. Model C then underwent physician-specific fine-tuning. Performance was quantified via Dice similarity coefficients (DSC) against ground-truth contours across 14 critical OARs. Each model achieved peak accuracy on physician-matched test data. Critically, the small, high-consistency models (A and B) outperformed the large, heterogeneous Model C on their target cohorts (Model A: 0.777 vs. Model C's 0.743 on Test A; Model B: 0.806 vs. 0.765 on Test B). Physician-specific fine-tuning closed institutional data gaps, boosting Model C's DSC to 0.795 (+7.08% vs. original) on Test A and 0.814 (+6.32%) on Test B, surpassing both the original Model C and the dedicated small-data models (Model A: +2.39%; Model B: +0.97%). Annotation consistency supersedes dataset scale as the primary performance driver for OAR auto-contouring. Small high-consistency datasets enable optimal native model accuracy, whereas fine-tuning large pre-trained models with targeted physician data closes domain adaptation gaps and delivers state-of-the-art segmentation, advancing precision radiotherapy for head and neck oncology.
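For readers unfamiliar with the metric, the following is a minimal sketch of how a Dice similarity coefficient and a relative improvement could be computed. It is not the authors' evaluation code: the function names (`dice`, `mean_dice`, `relative_gain`), the flattened 0/1 mask representation, the empty-mask convention, and the unweighted average over organs are all assumptions made for illustration.

```python
from typing import Dict, Sequence


def dice(pred: Sequence[int], truth: Sequence[int]) -> float:
    """DSC = 2|P ∩ T| / (|P| + |T|) for two flattened binary masks."""
    if len(pred) != len(truth):
        raise ValueError("masks must contain the same number of voxels")
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


def mean_dice(pred_by_organ: Dict[str, Sequence[int]],
              truth_by_organ: Dict[str, Sequence[int]]) -> float:
    """Unweighted mean DSC over organs (an assumed aggregation scheme)."""
    scores = [dice(pred_by_organ[name], truth_by_organ[name])
              for name in truth_by_organ]
    return sum(scores) / len(scores)


def relative_gain(new: float, old: float) -> float:
    """Relative change of `new` over `old`, in percent."""
    return 100.0 * (new - old) / old


if __name__ == "__main__":
    # Toy 4-voxel masks: one overlapping voxel, 2 predicted, 1 true.
    print(dice([1, 1, 0, 0], [1, 0, 0, 0]))
    # Fine-tuned Model C vs. original Model C on Test A, rounded DSCs.
    print(round(relative_gain(0.795, 0.743), 2))  # prints 7.0
```

Applied to the rounded DSC values quoted in the abstract, `relative_gain` gives roughly 7.0% for Test A rather than the reported 7.08%; the small difference is presumably because the reported percentages were derived from unrounded scores.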

Topics

Journal Article
