
Efficient transformer integration in nnU-Net for liver tumor segmentation: an external validation study.

May 22, 2026 (PubMed)

Authors

Cao H, Tao L, Li F, Pan X

Affiliations (3)

  • Inner Mongolia Medical University, Hohhot, China.
  • International Mongolian Medicine Hospital of Inner Mongolia Autonomous Region, Hohhot, China.
  • International Mongolian Medicine Hospital of Inner Mongolia Autonomous Region, Inner Mongolia Medical University, Hohhot, China.

Abstract

Small and low-contrast liver tumors remain difficult to segment on contrast-enhanced CT because of severe class imbalance and the limited long-range context modeling of conventional CNN encoders. We developed OF-TransUNet, a minimalist hybrid that, unlike parameter-heavy TransUNet-style variants, inserts a single lightweight mid-level Conv-Transformer block at encoder stage 3 (×8 downsampling) of an otherwise unchanged 2D nnU-Net and trains it with an output-focused progressive unfreezing schedule intended to stabilize adaptation. This design was motivated by the unstable, poorly reproducible optimization observed with immediate full fine-tuning in internal ablation experiments. A public dataset (n = 104) was used descriptively for architecture and schedule selection. The pre-specified primary endpoint was the difference in external per-patient tumor Dice versus a standardized 2D nnU-Net baseline on an independent cohort (n = 42). A pre-defined secondary lesion-level analysis focused on medium-sized tumors (10–50 mm). On the external cohort, OF-TransUNet achieved a numerically higher per-patient tumor Dice than nnU-Net (0.2788 ± 0.2575 vs 0.2400 ± 0.2426), with a mean paired difference of +0.0388 (95% CI 0.0029 to 0.0748). Because the paired differences were non-normal, the pre-specified Wilcoxon signed-rank test was retained as the primary inferential analysis; its result was borderline (p = 0.0553), whereas a supplementary paired t-test reached nominal significance (p = 0.0347). In the pre-defined medium-lesion analysis (10–50 mm; n = 84 lesions), the detection rate increased from 0.190 to 0.286 at the pre-specified 10% overlap threshold (McNemar exact p = 0.021; GEE OR 1.97, 95% CI 1.15–3.35). Post hoc sensitivity analyses at 5% and 15% overlap thresholds preserved the direction of the advantage. Relative to the baseline, OF-TransUNet had 8.4% more parameters and 18.3% more FLOPs, with no measurable latency penalty and only a minimal increase in memory use.
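The patient-level comparison above pairs each external patient's Dice under both models. The sketch below shows one way to compute the paired quantities with the Python standard library: per-patient Dice on sets of voxel indices, the mean paired difference with a t-based 95% CI, and a two-sided Wilcoxon signed-rank test using a tie-corrected normal approximation. It is not the authors' code; the set-based masks, the convention for empty masks, and the normal approximation (rather than an exact null distribution) are assumptions.

```python
import math
import statistics
from statistics import NormalDist
from typing import List, Sequence, Set, Tuple


def dice(pred: Set[int], truth: Set[int]) -> float:
    """Dice = 2|P & T| / (|P| + |T|) on sets of foreground voxel indices.

    Assumption: if both masks are empty the score is 1.0; the abstract does
    not state how tumor-free cases were scored.
    """
    if not pred and not truth:
        return 1.0
    return 2 * len(pred & truth) / (len(pred) + len(truth))


def paired_mean_ci(a: Sequence[float], b: Sequence[float],
                   t_crit: float) -> Tuple[float, float, float]:
    """Mean of the paired differences a - b and its t-based confidence interval.

    t_crit is the two-sided critical value for n - 1 degrees of freedom
    (about 2.0195 for n = 42 at the 95% level, from a t table).
    """
    d = [x - y for x, y in zip(a, b)]
    m = statistics.mean(d)
    half = t_crit * statistics.stdev(d) / math.sqrt(len(d))
    return m, m - half, m + half


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def wilcoxon_signed_rank(a: Sequence[float],
                         b: Sequence[float]) -> Tuple[float, float, float]:
    """Two-sided Wilcoxon signed-rank test via the normal approximation.

    Zero differences are dropped, the variance is corrected for tied |d|,
    and no continuity correction is applied. Returns (W+, z, p).
    """
    d = [x - y for x, y in zip(a, b) if x != y]
    n = len(d)
    ranks = average_ranks([abs(x) for x in d])
    w_plus = sum(r for r, x in zip(ranks, d) if x > 0)
    mean = n * (n + 1) / 4
    var = n * (n + 1) * (2 * n + 1) / 24
    tie_sizes: dict = {}
    for x in d:
        tie_sizes[abs(x)] = tie_sizes.get(abs(x), 0) + 1
    var -= sum(t ** 3 - t for t in tie_sizes.values()) / 48
    z = (w_plus - mean) / math.sqrt(var)
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return w_plus, z, p
```

Where SciPy is available, `scipy.stats.wilcoxon` and `scipy.stats.ttest_rel` are the usual choices; SciPy may use an exact null distribution for small samples without ties, so its p-value can differ from this approximation.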
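The lesion-level analysis above scores each medium-sized lesion as detected or missed by both models and compares the paired outcomes with McNemar's exact test. A minimal standard-library sketch follows. The abstract does not define "overlap", so this assumes it means the fraction of ground-truth lesion voxels covered by the prediction; the discordant counts in the usage line are hypothetical, not the study's.

```python
from math import comb
from typing import Sequence, Set, Tuple


def lesion_detected(lesion: Set[int], pred: Set[int], threshold: float = 0.10) -> bool:
    """True if the prediction covers at least `threshold` of the lesion's voxels.

    Assumption: "overlap" = |lesion & pred| / |lesion|.
    """
    return len(lesion & pred) / len(lesion) >= threshold


def detection_rate(lesions: Sequence[Set[int]], preds: Sequence[Set[int]],
                   threshold: float) -> float:
    """Fraction of lesions detected; preds[i] is the mask paired with lesions[i]."""
    hits = [lesion_detected(les, p, threshold) for les, p in zip(lesions, preds)]
    return sum(hits) / len(hits)


def discordant_counts(base_hits: Sequence[bool],
                      new_hits: Sequence[bool]) -> Tuple[int, int]:
    """(b, c): lesions detected only by the baseline, only by the new model."""
    b = sum(1 for x, y in zip(base_hits, new_hits) if x and not y)
    c = sum(1 for x, y in zip(base_hits, new_hits) if y and not x)
    return b, c


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value.

    Under H0 each discordant lesion is equally likely to favor either model,
    so min(b, c) follows Binomial(b + c, 0.5); the lower tail is doubled and
    capped at 1.
    """
    n = b + c
    if n == 0:
        return 1.0
    tail = sum(comb(n, i) for i in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2 * tail)
```

For example, with hypothetical counts b = 2 and c = 10, `mcnemar_exact(2, 10)` gives 158/4096, about 0.039. Re-running `detection_rate` at 0.05 and 0.15 mirrors the threshold sensitivity analysis. McNemar's test as written treats lesions as independent pairs; the GEE odds ratio reported above additionally accounts for several lesions per patient and is not reproduced here.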
In this single-center external validation cohort, adding a single mid-level Conv-Transformer block with output-focused progressive unfreezing was associated with a numerically higher per-patient tumor Dice and a statistically supported improvement in the pre-defined medium-lesion detection analysis, while preserving a lightweight computational profile. Because the pre-specified non-parametric primary patient-level analysis was borderline and did not reach conventional significance, the Dice finding should be interpreted cautiously. Tumor HD95 was numerically higher (i.e., worse) for OF-TransUNet, suggesting a possible trade-off between recall and boundary accuracy that requires further boundary-focused evaluation. Overall, these results support further multi-center validation of this minimal modification rather than definitive claims of superiority. A supplementary controlled benchmark on the public LiTS dataset, run under an identical pipeline, provided directionally consistent contextual evidence; it was limited to a single pre-fixed validation fold because only the 131 LiTS cases with publicly released annotations could be evaluated locally.
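The abstract does not give the exact output-focused progressive unfreezing schedule. The sketch below shows the general idea under stated assumptions: parameter groups are listed from the output side toward the input, each becomes trainable at a chosen epoch, and a per-parameter flag map is derived from that. The group names, the epoch numbers, the decision to train the inserted block from the start, and the prefix-based naming convention are all hypothetical.

```python
from typing import Callable, Dict, Iterable, List, Set, Tuple

# Hypothetical schedule, ordered from the output side toward the input:
# (parameter group, first epoch at which it is trainable). Names and epochs
# are illustrative assumptions, not values from the paper.
SCHEDULE: List[Tuple[str, int]] = [
    ("seg_head", 0),
    ("stage3_transformer", 0),  # assumed: the newly inserted block trains from the start
    ("decoder", 5),
    ("encoder_deep", 10),
    ("encoder_shallow", 20),
]


def trainable_groups(epoch: int, schedule: List[Tuple[str, int]]) -> Set[str]:
    """Groups whose unfreeze epoch has been reached."""
    return {group for group, start in schedule if epoch >= start}


def group_by_prefix(name: str) -> str:
    """Hypothetical convention: the group is the first dotted name component."""
    return name.split(".", 1)[0]


def requires_grad_map(
    param_names: Iterable[str],
    epoch: int,
    schedule: List[Tuple[str, int]],
    group_of: Callable[[str], str],
) -> Dict[str, bool]:
    """Map each parameter name to whether it should receive gradients.

    In PyTorch the flags could be applied with
    `for name, p in model.named_parameters(): p.requires_grad = flags[name]`.
    """
    active = trainable_groups(epoch, schedule)
    return {name: group_of(name) in active for name in param_names}
```

One design option is to build the optimizer over all parameters once and only toggle `requires_grad` at each step, since PyTorch optimizers generally skip parameters whose `.grad` is `None`; this avoids rebuilding optimizer state at every unfreeze.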

Topics

Journal Article
