A novel multimodal computer-aided diagnostic model for pulmonary embolism based on hybrid transformer-CNN and tabular transformer.

Authors

Zhang W, Gu Y, Ma H, Yang L, Zhang B, Wang J, Chen M, Lu X, Li J, Liu X, Yu D, Zhao Y, Tang S, He Q

Affiliations (6)

  • School of Digital and Intelligent Industry, Inner Mongolia University of Science and Technology, Baotou, 014010, China.
  • School of Digital and Intelligent Industry, Inner Mongolia University of Science and Technology, Baotou, 014010, China. [email protected].
  • School of Automation and Electrical Engineering, Inner Mongolia University of Science and Technology, Baotou, 014010, China.
  • School of Information and Electronics, Beijing Institute of Technology, Beijing, 100081, China.
  • College of Information Engineering, Inner Mongolia University of Technology, Hohhot, 010051, China.
  • School of Computer Science and Technology, Baotou Medical College, Inner Mongolia University of Science and Technology, Baotou, 014040, China.

Abstract

Pulmonary embolism (PE) is a life-threatening condition in which early diagnosis and prompt treatment are essential to reducing morbidity and mortality. While combining CT images with electronic health records (EHR) can improve computer-aided diagnosis, many challenges remain. The primary objective of this study is to leverage both 3D CT images and EHR data to improve PE diagnosis. First, for 3D CT images, we propose a network combining Swin Transformers with 3D CNNs, enhanced by a Multi-Scale Feature Fusion (MSFF) module to address fusion challenges between the two encoders. Second, we introduce a Polarized Self-Attention (PSA) module to strengthen the attention mechanism within the 3D CNN. Third, for EHR data, we design a Tabular Transformer for effective feature extraction. Finally, we design and evaluate three multimodal attention fusion modules to integrate CT and EHR features, selecting the most effective one for the final fusion. Experimental results on the RadFusion dataset demonstrate that our model significantly outperforms existing state-of-the-art methods, achieving an AUROC of 0.971, an F1 score of 0.926, and an accuracy of 0.920. These results underscore the effectiveness and innovation of our multimodal approach in advancing PE diagnosis.
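The final fusion step described above can be illustrated with a minimal sketch. The following is a hypothetical single-head cross-attention fusion in NumPy, where a pooled CT embedding queries EHR token features; the weight matrices are random stand-ins, and the paper's actual MSFF, PSA, and candidate fusion modules are not specified here, so every name and dimension below is an assumption for illustration only.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention_fusion(ct_feat, ehr_tokens, d_k=32, seed=0):
    """Fuse a pooled CT feature vector (query) with EHR token features
    (keys/values) via single-head cross-attention.
    Projection weights are random placeholders, not trained parameters."""
    rng = np.random.default_rng(seed)
    d_ct = ct_feat.shape[-1]
    d_ehr = ehr_tokens.shape[-1]
    W_q = rng.standard_normal((d_ct, d_k)) / np.sqrt(d_ct)
    W_k = rng.standard_normal((d_ehr, d_k)) / np.sqrt(d_ehr)
    W_v = rng.standard_normal((d_ehr, d_k)) / np.sqrt(d_ehr)
    q = ct_feat @ W_q                          # (d_k,)
    k = ehr_tokens @ W_k                       # (n_tokens, d_k)
    v = ehr_tokens @ W_v                       # (n_tokens, d_k)
    attn = softmax(q @ k.T / np.sqrt(d_k))     # (n_tokens,) attention weights
    fused = attn @ v                           # (d_k,) attended EHR summary
    # Concatenate image features with the attended EHR summary for a
    # downstream classifier head.
    return np.concatenate([ct_feat, fused])

ct = np.ones(64)          # e.g. pooled embedding from the CT branch
ehr = np.ones((10, 16))   # e.g. 10 tokens from a tabular transformer
out = cross_attention_fusion(ct, ehr)
print(out.shape)  # (96,)
```

In practice such a module would be trained end to end (e.g. in PyTorch) and compared against simpler alternatives such as plain concatenation or gated fusion, mirroring the paper's evaluation of three candidate fusion modules.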

Topics

Journal Article
