Wavelet-Transformed Frequency Linked Attention with Selective Hierarchy for Abdominal Organ Segmentation.
Authors
Affiliations (5)
- Institute of Artificial Intelligence Innovation, Industry Academia Innovation School, National Yang Ming Chiao Tung University, Hsinchu, Taiwan.
- Department of Computer Science and Information Engineering, National Chung Cheng University, 168 University Rd, Minhsiung, Chiayi, 621301, Taiwan.
- Department of Radiology, Mackay Memorial Hospital, No. 92, Sec. 2, Zhongshan N. Rd., Taipei City, 10449, Taiwan.
- Department of Computer Science and Information Engineering, National Chung Cheng University, 168 University Rd, Minhsiung, Chiayi, 621301, Taiwan.
- Advanced Institute of Manufacturing With High-Tech Innovations, National Chung Cheng University, Chiayi, Taiwan.
Abstract
The clinical use of computed tomography (CT) has risen steadily because CT scans provide detailed three-dimensional (3D) representations of organs quickly and at high resolution. With CT, physicians can identify potentially diseased organs and plan appropriate treatments. However, accurate quantitative evaluation is time-consuming, and precise automatic segmentation can significantly enhance the effectiveness of CT scans. Frequency linked attention with selective hierarchy (FLASH) was introduced as a novel deep learning architecture for multi-organ segmentation in abdominal CT scans. FLASH incorporated the 3D discrete wavelet transform into Transformer blocks as the backbone of the encoder. By separating 3D frequency bands, the network addressed the challenge of unclear organ boundaries caused by low contrast in abdominal CT scans. Two-directional skip connections were introduced to link the multi-scale frequency features into the bottleneck and to assign adaptive weights to frequency components using attention mechanisms. Trained from scratch on the organ dataset with five-fold validation, FLASH achieved the highest Dice similarity coefficient (DSC = 0.826) and normalized surface distance (NSD = 0.698), together with the lowest standard deviations (DSC: 0.034; NSD: 0.046), among the networks compared in the experiments. In addition to better segmentation results, FLASH had fewer parameters (96.86 M vs. 139.45 M) and a shorter computation time (36.3 h vs. 50.4 h) than Swin UNETR, making it more suitable for clinical use where resources and waiting time are limited. Our code can be found at https://github.com/jorden0721/FLASH-UNETR.git.
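To make two of the terms in the abstract concrete, the following is a minimal illustrative sketch, not the authors' implementation (for that, see the repository linked in the abstract). It assumes a single-level orthonormal Haar wavelet, which the abstract does not specify, and a volume with even-sized axes. The function names `haar_dwt_1d`, `haar_dwt_3d`, and `dice_coefficient` are hypothetical. The first two split a 3D volume into its eight frequency sub-bands (LLL through HHH); the last computes the Dice similarity coefficient between two binary masks.

```python
import numpy as np


def haar_dwt_1d(x, axis):
    """Single-level orthonormal Haar transform along one axis.

    Assumption: the axis length is even. Returns (low, high) halves.
    """
    x = np.moveaxis(x, axis, 0)
    even, odd = x[0::2], x[1::2]
    low = (even + odd) / np.sqrt(2.0)   # local average (low-pass)
    high = (even - odd) / np.sqrt(2.0)  # local difference (high-pass)
    return np.moveaxis(low, 0, axis), np.moveaxis(high, 0, axis)


def haar_dwt_3d(volume):
    """Split a 3D volume into 8 sub-bands keyed 'LLL' ... 'HHH'.

    Each key letter gives the filter applied along axes 0, 1, 2 in order.
    """
    bands = {"": np.asarray(volume, dtype=np.float64)}
    for axis in range(3):
        next_bands = {}
        for key, arr in bands.items():
            low, high = haar_dwt_1d(arr, axis)
            next_bands[key + "L"] = low
            next_bands[key + "H"] = high
        bands = next_bands
    return bands


def dice_coefficient(pred, target, eps=1e-8):
    """Dice similarity coefficient 2|A and B| / (|A| + |B|) for binary masks."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    intersection = np.logical_and(pred, target).sum()
    return (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    volume = rng.normal(size=(8, 8, 8))      # stand-in for a CT patch
    bands = haar_dwt_3d(volume)
    for name, band in sorted(bands.items()):
        print(name, band.shape, float(np.sum(band ** 2)))
```

In this sketch the LLL band is a half-resolution smoothed copy of the volume, while the seven bands containing at least one H respond to intensity changes along the corresponding axes, which is where boundary information would be expected to concentrate. Because the Haar filters used here are orthonormal, the total energy of the eight sub-bands equals that of the input.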