A Pilot Study: PET/CT Cross-Modal-Based Multi-Head Fusion Attention Generative Adversarial Network (MHFA-GAN) for PET Image Super-Resolution.
Authors
Affiliations (9)
- School of Medical Imaging, Xuzhou Medical University, Xuzhou, 221004, China.
- Department of Radiotherapy, Xuzhou Central Hospital, The Xuzhou School of Clinical Medicine of Nanjing Medical University, Xuzhou Clinical School of Xuzhou Medical University, Xuzhou, Jiangsu, China.
- Department of Anesthesiology, The Yancheng Clinical College of Xuzhou Medical University, The First People's Hospital of Yancheng, Yancheng, 224008, China.
- Department of Biomedical Imaging and Radiological Sciences, National Yang Ming Chiao Tung University, Taipei, Taiwan.
- Institute of Sensors, Signals and Systems, Heriot-Watt University, Edinburgh, EH14 4AS, UK.
- School of Medical Imaging, Xuzhou Medical University, Xuzhou, 221004, China. [email protected].
- Department of Biomedical Imaging and Radiological Sciences, National Yang Ming Chiao Tung University, Taipei, Taiwan. [email protected].
- Department of Biomedical Imaging and Radiological Science, China Medical University, Taichung, Taiwan. [email protected].
- School of Medical Imaging, Xuzhou Medical University, Xuzhou, 221004, China. [email protected].
Abstract
Positron emission tomography (PET) is a valuable imaging technique for assessing cellular metabolic activity and is widely used for early disease detection and diagnosis. While high-resolution (HR) PET images offer richer diagnostic information, achieving such quality often requires expensive equipment and time-intensive reconstruction processes. To address these challenges without hardware upgrades, this paper introduces a novel multi-head fusion attention generative adversarial network (MHFA-GAN), designed to enhance the spatial resolution and reconstruction efficiency of PET images through super-resolution (SR) techniques. The MHFA-GAN employs a multi-head mixed convolution mechanism to extract complementary features from PET and computed tomography (CT) images. An enhanced multi-head fusion attention module (EMHFA) is incorporated to adaptively adjust feature weights and align cross-modal information, effectively minimizing the impact of sub-pixel displacement while integrating metabolic and anatomical details. During the reconstruction phase, a global residual block (GRB) is utilized to balance local high-frequency details with global contextual information. Experiments conducted on small animal (SA), phantom, capillary, and ²²Na point source datasets demonstrate that MHFA-GAN outperforms current state-of-the-art methods in both qualitative and quantitative evaluations. The model also delivers strong results on SR tasks across datasets from different devices, further verifying its cross-device applicability and robustness. Using the full width at half maximum (FWHM) metric to assess spatial resolution, the SR images from the capillary dataset achieved an FWHM of 0.625 mm, approaching the physical limit of the SA PET scanner (0.6 mm) and significantly outperforming the HR images (0.871 mm).
These results highlight the effectiveness of MHFA-GAN in improving PET image quality, offering a cost-effective solution to enhance spatial resolution and support clinical diagnostics.
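The FWHM figures above are a standard way to quantify spatial resolution from a 1-D intensity profile drawn through a point or line source. As an illustration only (the paper does not specify its exact measurement procedure, and the function name and interpolation scheme here are assumptions), a minimal FWHM estimator might look like this:

```python
import numpy as np

def fwhm(profile, pixel_size_mm=1.0):
    """Estimate the full width at half maximum of a 1-D intensity profile.

    Subtracts the baseline, then locates the half-maximum crossings on
    either side of the peak with linear interpolation between samples.
    """
    y = np.asarray(profile, dtype=float)
    y = y - y.min()                      # remove constant background
    half = y.max() / 2.0
    above = np.where(y >= half)[0]       # indices at or above half-maximum
    if above.size < 2:
        raise ValueError("profile has no resolvable peak")
    left, right = above[0], above[-1]
    # Interpolate the left crossing between samples (left - 1) and left.
    if left > 0:
        left_x = (left - 1) + (half - y[left - 1]) / (y[left] - y[left - 1])
    else:
        left_x = float(left)
    # Interpolate the right crossing between samples right and (right + 1).
    if right < y.size - 1:
        right_x = right + (y[right] - half) / (y[right] - y[right + 1])
    else:
        right_x = float(right)
    return (right_x - left_x) * pixel_size_mm
```

For a Gaussian profile the result should approach the analytic value 2√(2 ln 2)·σ, scaled by the pixel size, which makes such an estimator easy to validate before applying it to reconstructed PET slices.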