Comparison of segmentation performance of CNNs, vision transformers, and hybrid networks for paranasal sinuses with sinusitis on CT images.
Authors
Affiliations (7)
- Interdisciplinary Program in Bioengineering, Graduate School of Engineering, Seoul National University, Seoul, Korea.
- Department of Applied Bioengineering, Graduate School of Convergence Science and Technology, Seoul National University, Seoul, Korea.
- Department of Biomedical Engineering, College of IT Convergence, Gachon University, Seongnam, Korea.
- Department of Otolaryngology-Head and Neck Surgery, Gachon University Gil Hospital, Incheon, 21565, Korea.
- Interdisciplinary Program in Bioengineering, Graduate School of Engineering, Seoul National University, Seoul, Korea.
- Department of Applied Bioengineering, Graduate School of Convergence Science and Technology, Seoul National University, Seoul, Korea.
- Department of Oral and Maxillofacial Radiology and Dental Research Institute, School of Dentistry, Seoul National University, Seoul, 03080, Korea.
Abstract
Accurate segmentation of the paranasal sinuses, including the frontal sinus (FS), ethmoid sinus (ES), sphenoid sinus (SS), and maxillary sinus (MS), plays an important role in supporting image-guided surgery (IGS) for sinusitis, facilitating safer intraoperative navigation by identifying anatomical variations and delineating surgical landmarks on CT imaging. To the best of our knowledge, no comparative studies of convolutional neural networks (CNNs), vision transformers (ViTs), and hybrid networks for segmenting each paranasal sinus in patients with sinusitis have been conducted. Therefore, the objective of this study was to compare the segmentation performance of CNNs, ViTs, and hybrid networks for individual paranasal sinuses with varying degrees of anatomical complexity and with morphological and textural variations caused by sinusitis on CT images. Performance was compared using the Jaccard index (JI), Dice similarity coefficient (DSC), precision (PR), recall (RC), and 95% Hausdorff distance (HD95) as segmentation accuracy metrics, and the number of parameters (Params) and inference time (IT) as computational efficiency metrics. The Swin UNETR hybrid network outperformed the other networks, achieving the highest segmentation scores (a JI of 0.719, a DSC of 0.830, a PR of 0.935, and an RC of 0.758) and the lowest HD95 (10.529), with the smallest number of model parameters (15.705 M Params). CoTr, another hybrid network, also demonstrated superior segmentation performance compared with CNNs and ViTs and achieved the fastest inference time (an IT of 0.149). Compared with CNNs and ViTs, hybrid networks significantly reduced false positives and enabled more precise boundary delineation, effectively capturing anatomical relationships among the sinuses and surrounding structures. This resulted in the lowest segmentation errors near critical surgical landmarks.
In conclusion, hybrid networks may provide a more balanced trade-off between segmentation accuracy and computational efficiency, with potential applicability in clinical decision support systems for sinusitis.
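The overlap-based accuracy metrics named in the abstract (JI, DSC, PR, RC) follow from the standard true-positive, false-positive, and false-negative voxel counts. The sketch below is a minimal illustration of those standard definitions, not the authors' evaluation code; the function name `overlap_metrics`, the flattened-list mask representation, and the toy masks are assumptions made for the example, and HD95 is left out because it requires surface distances rather than voxel counts.

```python
def overlap_metrics(pred, truth):
    """Return JI, DSC, PR, and RC for two flattened binary masks.

    `pred` and `truth` are equal-length sequences of 0/1 voxel labels
    (a hypothetical representation chosen for this sketch).
    """
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")

    # Count voxel-wise agreement and disagreement.
    tp = sum(1 for p, t in zip(pred, truth) if p and t)      # true positives
    fp = sum(1 for p, t in zip(pred, truth) if p and not t)  # false positives
    fn = sum(1 for p, t in zip(pred, truth) if not p and t)  # false negatives

    def ratio(num, den):
        # Return 0.0 for an empty denominator (a convention assumed here).
        return num / den if den else 0.0

    return {
        "JI": ratio(tp, tp + fp + fn),           # Jaccard index
        "DSC": ratio(2 * tp, 2 * tp + fp + fn),  # Dice similarity coefficient
        "PR": ratio(tp, tp + fp),                # precision
        "RC": ratio(tp, tp + fn),                # recall
    }


# Toy example: 3 true positives, 1 false positive, 1 false negative.
pred = [1, 1, 1, 0, 0, 0, 1, 0]
truth = [1, 1, 0, 0, 0, 1, 1, 0]
metrics = overlap_metrics(pred, truth)
print(metrics)
```

For a single pair of masks, DSC equals 2·JI / (1 + JI), so the two always rank predictions identically; scores averaged over many cases need not satisfy that identity exactly. A low false-positive count raises PR without affecting RC, which is why the two are reported separately.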