
CvTFuse: An unsupervised medical image fusion method for glioma T1-DWI modalities.

January 15, 2026 · PubMed

Authors

Huang Q, Chen W, Zeng J, Ding J, Xie K, Cao N, Sun K, Jiao Z, Cai J, Ni X

Affiliations (6)

  • School of Computer Science and Artificial Intelligence, Changzhou University, Changzhou 213159, China; Department of Radiotherapy, The Affiliated Changzhou No.2 People's Hospital of Nanjing Medical University, Changzhou 213003, China; Jiangsu Province Engineering Research Center of Medical Physics, Changzhou 213003, China; Center for Medical Physics, Nanjing Medical University, Changzhou 213003, China.
  • School of Mechanical Engineering and Rail Transit, Changzhou University, Changzhou 213164, China.
  • Department of Radiotherapy, The Affiliated Changzhou No.2 People's Hospital of Nanjing Medical University, Changzhou 213003, China; Jiangsu Province Engineering Research Center of Medical Physics, Changzhou 213003, China; Center for Medical Physics, Nanjing Medical University, Changzhou 213003, China.
  • School of Computer Science and Artificial Intelligence, Changzhou University, Changzhou 213159, China. Electronic address: [email protected].
  • Department of Health Technology and Informatics, The Hong Kong Polytechnic University, Hong Kong 999077, China.
  • Department of Radiotherapy, The Affiliated Changzhou No.2 People's Hospital of Nanjing Medical University, Changzhou 213003, China; Jiangsu Province Engineering Research Center of Medical Physics, Changzhou 213003, China; Center for Medical Physics, Nanjing Medical University, Changzhou 213003, China. Electronic address: [email protected].

Abstract

DWI provides microscopic information on the diffusion of water molecules, whereas T1WI provides high-resolution anatomical and histological information. Accurately and effectively fusing different MRI modalities can precisely localize lesion areas and provide rich information for analyzing the nature of lesions. We propose CvTFuse, a dual-branch medical image fusion network that combines a convolutional neural network (CNN) with a vision transformer. CvTFuse consists of three parts: an encoder, a fusion layer, and a decoder. The encoder is divided into a CNN module and a transformer module, which extract local and global features of the source images, respectively. To fully capture the contextual information of the image, a global context aggregation module (GCAM) is proposed that aggregates multi-scale features extracted from the transformer branch to improve the quality of the fused image. The fusion layer employs an energy-aware, gradient-enhanced fusion strategy to retain source-image details when fusing features from different MRI modalities. The decoder consists of five convolutional layers and two skip connections that reconstruct the fused features. Qualitatively, the method produced clear texture details and sharp boundaries, preserving the salient information of the source images to the greatest extent. Quantitatively, it achieved an average gradient, information entropy, mutual information, and visual saliency of 4.5975, 4.9073, 2.5181, and 0.77, respectively. Compared with deep learning fusion methods such as DenseFuse, RFN-Nest, MSDNet, IFCNN, CDDFuse, and SwinFusion, CvTFuse preserved gradient, texture, and edge information well while minimizing information loss and distortion. The method combines information from different MR modalities to localize lesion areas accurately and provides rich clinical information to support precise diagnosis and treatment planning.
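For readers who want a concrete picture of the dual-branch design, the sketch below is a minimal PyTorch rendering of the architecture the abstract describes: a CNN branch for local features, a transformer branch for global context, an energy-weighted fusion rule, and a five-layer decoder with two skip connections. All channel widths, depths, and the exact fusion rule are illustrative assumptions, not the authors' CvTFuse implementation (the paper's GCAM, in particular, is omitted here).

```python
# Minimal PyTorch sketch of a dual-branch (CNN + transformer) fusion network.
# Hyperparameters and the fusion rule are assumptions, NOT the authors' code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CNNBranch(nn.Module):
    """Local-feature extractor: two 3x3 convolutions."""
    def __init__(self, in_ch=1, ch=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True))

    def forward(self, x):
        return self.body(x)

class TransformerBranch(nn.Module):
    """Global-feature extractor: patch embedding -> transformer encoder."""
    def __init__(self, in_ch=1, dim=64, patch=8, depth=2, heads=4):
        super().__init__()
        self.patch = patch
        self.embed = nn.Conv2d(in_ch, dim, kernel_size=patch, stride=patch)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)

    def forward(self, x):
        f = self.embed(x)                          # B x dim x H/p x W/p
        b, c, h, w = f.shape
        tokens = self.encoder(f.flatten(2).transpose(1, 2))
        f = tokens.transpose(1, 2).reshape(b, c, h, w)
        # Upsample back to the CNN branch's spatial resolution.
        return F.interpolate(f, scale_factor=self.patch, mode="bilinear",
                             align_corners=False)

def energy_fusion(fa, fb, eps=1e-6):
    """Energy-weighted fusion: pixels with higher feature energy dominate.
    (A stand-in for the paper's energy-aware, gradient-enhanced strategy.)"""
    ea = fa.pow(2).mean(dim=1, keepdim=True)
    eb = fb.pow(2).mean(dim=1, keepdim=True)
    wa = ea / (ea + eb + eps)
    return wa * fa + (1.0 - wa) * fb

class Decoder(nn.Module):
    """Five convolutional layers with two additive skip connections."""
    def __init__(self, ch=64):
        super().__init__()
        self.convs = nn.ModuleList(
            [nn.Conv2d(ch, ch, 3, padding=1) for _ in range(4)]
            + [nn.Conv2d(ch, 1, 3, padding=1)])

    def forward(self, f):
        x1 = F.relu(self.convs[0](f))
        x2 = F.relu(self.convs[1](x1))
        x3 = F.relu(self.convs[2](x2) + x1)   # skip connection 1
        x4 = F.relu(self.convs[3](x3) + x2)   # skip connection 2
        return torch.sigmoid(self.convs[4](x4))

class DualBranchFuse(nn.Module):
    def __init__(self):
        super().__init__()
        self.cnn, self.vit, self.dec = CNNBranch(), TransformerBranch(), Decoder()

    def forward(self, t1, dwi):
        fa = self.cnn(t1) + self.vit(t1)      # local + global features of T1WI
        fb = self.cnn(dwi) + self.vit(dwi)    # local + global features of DWI
        return self.dec(energy_fusion(fa, fb))

# Usage: two single-channel 256x256 MR slices in, one fused slice out.
fused = DualBranchFuse()(torch.rand(1, 1, 256, 256), torch.rand(1, 1, 256, 256))
```

A shared-weight encoder for both modalities, as sketched here, is a common design choice in unsupervised fusion networks; the paper may instead use modality-specific encoders.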
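The reported metrics also have standard formulations worth spelling out. The NumPy sketch below implements the usual definitions of average gradient, information entropy, and mutual information from the image fusion literature (visual saliency involves a full perceptual model and is omitted). The paper may use different normalizations, so treat these as reference definitions rather than the authors' evaluation code.

```python
# Reference definitions of three of the reported fusion metrics (NumPy).
# Normalizations may differ from the paper's evaluation code.
import numpy as np

def average_gradient(img):
    """AG: mean local gradient magnitude; larger means sharper detail."""
    gy, gx = np.gradient(img.astype(np.float64))
    return np.mean(np.sqrt((gx ** 2 + gy ** 2) / 2.0))

def information_entropy(img, bins=256):
    """EN: Shannon entropy of the grey-level histogram, in bits."""
    hist, _ = np.histogram(img, bins=bins, range=(0, 256))
    p = hist / hist.sum()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def mutual_information(src, fused, bins=256):
    """MI between one source image and the fused image; fusion papers
    usually report the sum MI(A, F) + MI(B, F)."""
    joint, _, _ = np.histogram2d(src.ravel(), fused.ravel(), bins=bins)
    pxy = joint / joint.sum()
    px, py = pxy.sum(axis=1), pxy.sum(axis=0)
    nz = pxy > 0
    return np.sum(pxy[nz] * np.log2(pxy[nz] / (px[:, None] * py[None, :])[nz]))

# Usage with 8-bit slices loaded as NumPy arrays a, b (sources), f (fused):
# score = mutual_information(a, f) + mutual_information(b, f)
```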

Topics

Journal Article
