
Deep Learning for Brain Tumour Analysis: A Systematic Review of CNN-Transformer Hybrids in Multimodal Imaging.

June 16, 2026 · PubMed

Authors

Antwi SB, Appiahene P, Ayawli BBK, Nimbe P

Affiliations (3)

  • Department of Computer Science and Informatics, University of Energy and Natural Resources, Sunyani, Ghana, uenr.edu.gh.
  • Department of Computer Science, Sunyani Technical University, Sunyani, Ghana, stu.edu.gh.
  • Department of Information Technology and Decision Sciences, University of Energy and Natural Resources, Sunyani, Ghana, uenr.edu.gh.

Abstract

Brain tumour detection and analysis using medical imaging requires the extraction of both local spatial features and global contextual representations. Although convolutional neural networks (CNNs) excel at capturing local spatial patterns and Transformer-based architectures effectively model long-range dependencies, the optimal architectural paradigm for clinical deployment remains unresolved. This systematic review and meta-analysis evaluates hybrid CNN-Transformer architectures for brain tumour detection, focusing on the integration of local and global feature learning, diagnostic accuracy and computational efficiency. It also critically examines the roles of generative adversarial networks (GANs) in addressing data scarcity and of multimodal imaging fusion in achieving diagnostic completeness.
A systematic search was conducted across IEEE Xplore, PubMed, Scopus and Google Scholar for studies published between January 2021 and May 2025. Of 1876 initially identified articles, 94 met the prespecified inclusion criteria after quality assessment with the QUADAS-2 and ROBINS-I frameworks. A random-effects meta-analysis of diagnostic accuracy was performed using the DerSimonian-Laird estimator. Statistical heterogeneity was quantified with I², and publication bias was assessed with funnel plot asymmetry and Egger's test. Computational efficiency was standardised to GigaFLOPs (GFLOPs) using a reference input of 240 × 240 × 155 voxels (the BraTS benchmark). FLOP counts were taken from the primary publications where available and otherwise bounded using theoretical complexity formulas. Estimated values are explicitly distinguished throughout. Across all 94 included studies, the pooled diagnostic accuracy was 93.5% (95% CI: 92.7%-94.4%). However, statistically significant evidence of publication bias (Egger's p = 0.043) indicates that this figure is an upper-bound approximation rather than an unbiased population estimate.
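
The pooling and bias checks named above can be illustrated with a minimal Python sketch that uses only the standard library. The sketch makes three assumptions that the abstract does not specify. First, accuracies are pooled on the logit scale. Second, a 0.5 continuity correction is applied. Third, the function names (`logit_accuracy`, `dersimonian_laird`, `egger_intercept`) and the study counts are hypothetical and do not reflect the authors' code or data. Egger's test is shown as the intercept of a regression of each study's standardised effect on its precision. Its p-value would come from a t distribution with k − 2 degrees of freedom, which this sketch does not compute.

```python
import math

def logit_accuracy(correct, total):
    """Logit accuracy and its approximate within-study variance.
    Assumes a 0.5 continuity correction so perfect accuracy stays finite."""
    x, n = correct + 0.5, total + 1.0
    return math.log(x / (n - x)), 1.0 / x + 1.0 / (n - x)

def dersimonian_laird(effects, variances):
    """Random-effects pooling with the DerSimonian-Laird tau^2 estimator.
    Returns (pooled effect, its SE, tau^2, Cochran's Q, I^2 as a fraction)."""
    k = len(effects)
    w = [1.0 / v for v in variances]                       # fixed-effect weights
    sw = sum(w)
    y_fe = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    q = sum(wi * (yi - y_fe) ** 2 for wi, yi in zip(w, effects))  # Cochran's Q
    df = k - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c)                          # between-study variance, truncated at 0
    w_re = [1.0 / (v + tau2) for v in variances]           # random-effects weights
    y_re = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se_re = math.sqrt(1.0 / sum(w_re))
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return y_re, se_re, tau2, q, i2

def egger_intercept(effects, variances):
    """Egger's regression of y/se on 1/se by ordinary least squares.
    Returns (intercept, SE of intercept, t statistic)."""
    se = [math.sqrt(v) for v in variances]
    z = [y / s for y, s in zip(effects, se)]               # standardised effects
    x = [1.0 / s for s in se]                              # precisions
    k = len(z)
    mx, mz = sum(x) / k, sum(z) / k
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (zi - mz) for xi, zi in zip(x, z)) / sxx
    intercept = mz - slope * mx
    rss = sum((zi - intercept - slope * xi) ** 2 for xi, zi in zip(x, z))
    se_int = math.sqrt(rss / (k - 2) * (1.0 / k + mx ** 2 / sxx))
    return intercept, se_int, intercept / se_int

def inv_logit(v):
    return 1.0 / (1.0 + math.exp(-v))

# Hypothetical (correct, total) test-set counts, NOT data from the review.
studies = [(930, 1000), (455, 500), (188, 200), (1420, 1500), (276, 300)]
ys, vs = zip(*(logit_accuracy(c, n) for c, n in studies))
y, se, tau2, q, i2 = dersimonian_laird(ys, vs)
lo, hi = inv_logit(y - 1.96 * se), inv_logit(y + 1.96 * se)  # Wald CI, back-transformed
print(f"pooled accuracy {inv_logit(y):.3f} (95% CI {lo:.3f}-{hi:.3f}), I^2 = {100 * i2:.1f}%")
print("Egger intercept, SE, t:", egger_intercept(ys, vs))
```

The DerSimonian-Laird τ² is a moment estimator truncated at zero. When there are few studies, both τ² and I² are imprecise, which is consistent with the abstract's caution about small subgroups.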
Because subgroup study counts were insufficient for formal random-effects pooling (CNN-only: n = 3; Transformer-only: n = 2; CNN-Transformer hybrid: n = 4; minimum recommended n = 10 per subgroup), no subgroup meta-analysis was performed. Instead, descriptive mean accuracies are reported as hypothesis-generating observations only: CNN-only models 91.7%, Transformer-only models 93.6% and CNN-Transformer hybrid models 94.6%. These figures must not be interpreted as pooled meta-analytic estimates. They are mean observed accuracies across a small number of included studies and are reported solely to illustrate directional trends consistent with the mechanistic rationale for hybridisation. Substantial heterogeneity was observed (I² = 78.3%; p < 0.001). Three integration paradigms were identified: sequential (45% of models; 93.8% accuracy; 1.8 GFLOPs), parallel (32%; 94.3%; 2.8 GFLOPs) and hierarchical (23%; 94.9%; 3.5 GFLOPs). Parallel architectures showed the most favourable clinical viability, balancing accuracy against a mean inference time of 2.1 s. GAN-based augmentation improved rare tumour class detection by 7%-10%, with conditional GANs outperforming vanilla architectures. Multimodal MRI + PET fusion achieved 94.2% accuracy at 2.8 GFLOPs, whereas triple-modality integration yielded only a marginal further gain (95.1%) at a substantially higher computational cost (9.1 GFLOPs). Notably, 65% of included studies used the BraTS benchmark exclusively. Hybrid model accuracy declined from 94.6% on high-grade gliomas to 88.3% on low-grade gliomas. Hybrid architectures were also 2.3× more susceptible to Gaussian noise than CNN-only equivalents. These limitations constrain generalisation to real-world clinical settings.
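
Where primary publications did not report FLOPs, the review bounded them with theoretical complexity formulas. The following minimal sketch shows two such formulas and rests on several assumptions. A multiply-add counts as 2 FLOPs. The convolution is dense and 3D, applied over the full 240 × 240 × 155 reference volume. Self-attention operates over non-overlapping patch tokens, and softmax and bias terms are ignored. The channel counts, embedding size and patch sizes are hypothetical illustration values. The numbers are not meant to reproduce the per-paradigm GFLOP figures reported above.

```python
import math

BRATS_SHAPE = (240, 240, 155)  # reference input used by the review (voxels)

def conv3d_flops(out_shape, c_in, c_out, k):
    """Dense 3D convolution: 2 * output voxels * C_in * C_out * k^3."""
    return 2 * math.prod(out_shape) * c_in * c_out * k ** 3

def tokens_after_patching(shape, patch):
    """Number of non-overlapping patch tokens, padding partial patches."""
    return math.prod(math.ceil(s / patch) for s in shape)

def self_attention_flops(n_tokens, dim):
    """One self-attention layer, ignoring softmax and biases.
    Q, K, V and output projections: 4 * (2 * N * d^2).
    Q @ K^T and attention @ V:      2 * (2 * N^2 * d)."""
    return 8 * n_tokens * dim ** 2 + 4 * n_tokens ** 2 * dim

def gflops(flops):
    return flops / 1e9

# Hypothetical layer settings, for illustration only.
conv = conv3d_flops(BRATS_SHAPE, c_in=4, c_out=16, k=3)
for patch in (16, 8):
    n = tokens_after_patching(BRATS_SHAPE, patch)
    attn = self_attention_flops(n, dim=256)
    print(f"patch {patch}: {n} tokens, attention {gflops(attn):.1f} GFLOPs "
          f"vs. one conv layer {gflops(conv):.1f} GFLOPs")
```

The N² term makes attention cost grow roughly with the square of the token count. Halving the patch size on a 3D volume multiplies the token count by about 8, so the attention cost rises much faster than the cost of a convolution over the same volume. This trade-off between global context and computational cost is what the integration paradigms above manage in different ways.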
Descriptive comparison of mean observed accuracies, based on study counts insufficient for confirmatory meta-analysis, suggests that hybrid CNN-Transformer architectures may offer diagnostic accuracy advantages over CNN-only and Transformer-only approaches. This observation is hypothesis-generating only and requires validation in a larger, more balanced evidence base. Among integration strategies, parallel architectures showed the most favourable accuracy-efficiency balance in the reviewed evidence. GANs and multimodal imaging function as essential architectural enablers, addressing data scarcity and diagnostic incompleteness, respectively. Significant challenges remain in computational efficiency, noise robustness and generalisation to rare tumour subtypes, and these are priority directions for future research.

Topics

Journal Article, Review
