Transformer-based cardiac substructure segmentation from contrast and non-contrast computed tomography for radiotherapy planning.

June 2, 2026 (PubMed)

Authors

Rangnekar A, Mankuzhy N, Willmann J, Seo Choi CM, Wu A, Thor M, Rimner A, Veeraraghavan H

Affiliations (3)

  • Department of Medical Physics, Memorial Sloan Kettering Cancer Center, 1275 York Avenue, New York, NY 10065, USA.
  • Department of Radiation Oncology, Memorial Sloan Kettering Cancer Center, 1275 York Avenue, New York, NY 10065, USA.
  • Department of Radiation Oncology, Medical Center-University of Freiburg, Faculty of Medicine, University of Freiburg, German Cancer Consortium (DKTK), Partner Site DKTK-Freiburg, Freiburg, Germany.

Abstract

Accurate segmentation of cardiac substructures on computed tomography (CT) scans is essential for radiotherapy planning. This study evaluated whether pretrained transformers enabled data-efficient training using a fixed architecture with balanced curriculum learning while achieving robust generalization to imaging and patient variations. A hybrid pretrained transformer-convolutional network, self-distilled masked image transformer (SMIT), was fine-tuned using lung cancer patient scans (Cohort I, training N = 180) and tested on held-out Cohort I lung cancer scans (testing N = 60) and breast cancer scans (Cohort II, N = 65). Two configurations were evaluated: SMIT-Balanced (32 contrast-enhanced CTs, 32 non-contrast CTs) and SMIT-Oracle (180 CTs). Performance was compared with nnU-Net and TotalSegmentator. Segmentation accuracy was assessed primarily using the 95th percentile Hausdorff distance (HD95), along with radiation dose and overlap-based metrics as secondary endpoints. SMIT-Balanced approached SMIT-Oracle performance despite using 64% fewer training scans, with mean HD95 of 6.6 versus 5.4 mm in Cohort I and 10.0 versus 9.4 mm in Cohort II. On the Cohort I held-out test set, SMIT-Balanced mean HD95 was within 1.0 mm of nnU-Net. Cross-cohort testing showed larger accuracy degradation with nnU-Net than SMIT-Balanced (62% versus 50%, absolute change 4.5 mm versus 3.4 mm). Dose metrics derived from SMIT-Balanced were equivalent to manual delineations. Balanced curriculum training reduced labeled data requirements within the SMIT architecture. SMIT-Balanced was comparable to nnU-Net on Cohort I held-out data and showed smaller cross-cohort HD95 degradation.
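The percentages quoted in the abstract can be checked against the counts and mean HD95 values it reports. The following is a minimal sketch of that arithmetic, not the authors' analysis code; the function names `percent_reduction` and `degradation` are hypothetical, and the inputs are the rounded figures from the abstract, so results may differ slightly from values the authors computed from unrounded data.

```python
from typing import Tuple


def percent_reduction(full: int, reduced: int) -> float:
    """Percentage by which `reduced` is smaller than `full`."""
    return 100.0 * (full - reduced) / full


def degradation(in_cohort_mm: float, cross_cohort_mm: float) -> Tuple[float, float]:
    """Absolute (mm) and relative (%) increase in mean HD95 across cohorts."""
    absolute = cross_cohort_mm - in_cohort_mm
    relative = 100.0 * absolute / in_cohort_mm
    return absolute, relative


# Training set sizes reported in the abstract.
balanced_scans = 32 + 32   # contrast-enhanced + non-contrast CTs (SMIT-Balanced)
oracle_scans = 180         # SMIT-Oracle
scan_reduction = percent_reduction(oracle_scans, balanced_scans)

# Rounded mean HD95 for SMIT-Balanced: Cohort I (held-out) vs Cohort II.
balanced_abs, balanced_rel = degradation(6.6, 10.0)

print(f"SMIT-Balanced uses {scan_reduction:.1f}% fewer training scans")
print(f"SMIT-Balanced cross-cohort change: {balanced_abs:.1f} mm ({balanced_rel:.1f}%)")
```

With these rounded inputs the scan reduction comes to about 64% and the absolute change to 3.4 mm, matching the abstract. The relative change comes to roughly 52%, close to the reported 50%, which was presumably derived from unrounded means.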

Topics

Journal Article
