CVT-HNet: a fusion model for recognizing perianal fistulizing Crohn's disease based on CNN and ViT.

Authors

Li L, Wang Z, Wang C, Chen T, Deng K, Wei H, Wang D, Li J, Zhang H

Affiliations (9)

  • College of Physics and Information Engineering, Fuzhou University, Fuzhou, 350108, China.
  • Fujian Key Laboratory for Intelligent Processing and Wireless Transmission of Media Information, Fuzhou University, Fuzhou, 350108, China.
  • College of Chemical and Engineering, Fuzhou University, Fuzhou, 350108, China.
  • Department of Endoscopic Surgery, The Sixth Affiliated Hospital, Sun Yat-sen University, Guangzhou, 510655, China.
  • Guangdong Provincial Key Laboratory of Colorectal and Pelvic Floor Diseases, The Sixth Affiliated Hospital, Sun Yat-sen University, Guangzhou, 510655, China.
  • Biomedical Innovation Center, The Sixth Affiliated Hospital, Sun Yat-sen University, Guangzhou, 510655, China.
  • Department of General Surgery (Colorectal Surgery), The Sixth Affiliated Hospital, Sun Yat-sen University, Guangzhou, 510655, China.
  • Guangdong Provincial Key Laboratory of Colorectal and Pelvic Floor Diseases, The Sixth Affiliated Hospital, Sun Yat-sen University, Guangzhou, 510655, China.
  • Biomedical Innovation Center, The Sixth Affiliated Hospital, Sun Yat-sen University, Guangzhou, 510655, China.

Abstract

Accurate identification of anal fistulas is essential because it directly affects the severity of subsequent perianal infections, prognostic indicators, and overall treatment outcomes. Traditional manual recognition is inefficient, so computer vision methods have been adopted to improve efficiency. Current computer vision techniques for detecting anal fistulas rely mainly on convolutional neural networks (CNNs). However, these methods often struggle to capture long-range dependencies, which limits how well they handle images of anal fistulas. This study proposes a new fusion model, CVT-HNet, that integrates MobileNet with vision transformer (ViT) technology. The design uses CNNs to extract local features and Transformers to capture long-range dependencies. In addition, the MobileNetV2 with a Coordinate Attention mechanism and the encoder modules are optimized to improve the precision of anal fistula detection. Comparative experiments show that CVT-HNet achieves an accuracy of 80.66% with significant robustness, surpassing both pure Transformer models and other fusion networks. Internal validation demonstrates the reliability and consistency of CVT-HNet, and external validation demonstrates commendable transportability and generalizability. In visualization analysis, CVT-HNet focuses more tightly on the region of interest in images of anal fistulas. Ablation experiments evaluate the contribution of each CVT-HNet component module. Overall, the experimental results highlight the superior performance and practicality of CVT-HNet in detecting anal fistulas: by combining local and global information, the model achieves high accuracy, robustness, and strong generalizability. This makes it suitable for real-world applications where variability in data is common, and these findings emphasize its effectiveness in clinical contexts.
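
The abstract's core idea, local features from convolution combined with long-range dependencies from attention, can be illustrated with a toy one-dimensional sketch. This is not the CVT-HNet architecture: the function names (local_conv1d, global_self_attention, fuse), the scalar "tokens", the identity attention projections, and the smoothing kernel are all assumptions chosen for illustration. It only shows why a small convolution kernel cannot see a distant signal while self-attention can.

```python
import math

def local_conv1d(x, kernel):
    """Zero-padded 1-D convolution: each output sees only a small neighbourhood."""
    half = len(kernel) // 2
    out = []
    for i in range(len(x)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + k - half
            if 0 <= j < len(x):  # positions outside the sequence count as zero
                acc += w * x[j]
        out.append(acc)
    return out

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def global_self_attention(x):
    """Single-head self-attention on scalar tokens with identity projections
    (an illustrative simplification, not a trained Transformer encoder)."""
    out = []
    for q in x:
        weights = softmax([q * k for k in x])  # every position attends to all others
        out.append(sum(w * v for w, v in zip(weights, x)))
    return out

def fuse(x, kernel=(0.25, 0.5, 0.25)):
    """Pair each position's local (conv) feature with its global (attention) feature."""
    return list(zip(local_conv1d(x, kernel), global_self_attention(x)))

# An impulse at the far end of the sequence.
signal = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0]
features = fuse(signal)
print(features[0])  # local part is 0.0; global part reflects the distant impulse
```

At position 0 the convolution output is zero because the impulse lies outside the kernel's three-sample window, whereas the attention output is nonzero because every position contributes. A fusion model in the spirit described above would feed both kinds of feature to later layers rather than choosing one.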

Topics

  • Rectal Fistula
  • Neural Networks, Computer
  • Crohn Disease
  • Image Interpretation, Computer-Assisted
  • Journal Article
