Factors predicting MRI glioma segmentation accuracy in deep learning models: a systematic review and meta-analysis.

May 7, 2026

Authors

Di Cosmo L, Colella FE, Łajczak P, Schifino E, Cuervo SN, El Choueiri J, Centini FR, Pellicanò F, Łajczak A, Mazzapicchi E, Schiariti MP, de Almeida AGC, Zaed I, Santos BFO

Affiliations (8)

  • Department of Biomedical Sciences, Humanitas University, Pieve Emanuele, Milan, Italy. Electronic address: [email protected].
  • Department of Biomedical Sciences, Humanitas University, Pieve Emanuele, Milan, Italy.
  • Medical University of Silesia, Katowice, Poland.
  • Department of Neurosurgery, Azienda Ospedaliero Universitaria Pisana, Pisa, Italy.
  • Department of Neurosurgery, Fondazione IRCCS Istituto Neurologico C. Besta, University of Milan, Milan, Italy.
  • Department of Medicine, Federal University of Sergipe, Aracaju, SE, Brazil.
  • Department of Neurosurgery, Neurocenter of South Switzerland, EOC, Lugano, Switzerland.
  • Health Sciences Graduate Program, Federal University of Sergipe, Aracaju, SE, Brazil.

Abstract

Accurate segmentation is central to the diagnosis and treatment of gliomas. Although manual segmentation remains the clinical standard, it is time-consuming and subject to inter-operator variability. In recent years, deep learning (DL) models have been developed to automate this process, offering scalable alternatives. Performance across these models remains variable, and the factors driving this heterogeneity are poorly understood. Following PRISMA guidelines, databases were searched for studies reporting the performance of models that segment gliomas preoperatively. Data on model and patient characteristics were extracted, and subgroup analyses and mixed-effects meta-regressions were performed to identify factors linked to segmentation accuracy, as measured by the Dice Similarity Coefficient (DSC). Eighty-eight models were identified, of which 36 were included in quantitative analyses. Whole-tumor segmentation showed generally high accuracy (DSC 0.860, 95% CI 0.840-0.881), with significantly lower performance for enhancing, non-enhancing, and tumor core delineation. Subgroup analyses found that models using 3D and multiparametric MRI inputs consistently outperformed those that did not. Models trained on BraTS datasets were associated with higher performance than those trained on original institutional data. Segmentation of high-grade gliomas showed a trend toward improved accuracy, but the difference was not statistically significant. Training dataset size was not associated with segmentation performance. In multivariate meta-regression, only publication year independently predicted improved accuracy (β = 0.023, p = 0.017). Performance of DL-based glioma MRI segmentation is most consistently associated with the use of 3D model architectures and multiparametric MRI inputs. Models trained on BraTS datasets showed a trend toward higher performance, suggesting a possible benchmarking effect. However, in both univariate and multivariate analyses, no single factor explained the variability observed across studies. Future studies should prioritize multivariable analyses to better define the determinants of model performance and, in turn, support the integration of these models into everyday practice.
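
For readers unfamiliar with the accuracy metric used throughout the abstract, the following is a minimal sketch of how a Dice Similarity Coefficient can be computed for a pair of binary masks, using the standard definition DSC = 2|P ∩ T| / (|P| + |T|). The function name `dice_similarity`, the flattened-list mask representation, the empty-mask convention, and the toy masks are assumptions made for illustration; they are not taken from the review or from any of the included models.

```python
from typing import Sequence


def dice_similarity(pred: Sequence[int], truth: Sequence[int]) -> float:
    """Dice Similarity Coefficient between two flattened binary masks.

    DSC = 2 * |P ∩ T| / (|P| + |T|); 0 means no overlap, 1 means identical masks.
    """
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    # Voxels labeled as tumor in both the prediction and the reference.
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    # Total tumor voxels in the prediction plus total in the reference.
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    # Assumed convention: two empty masks count as a perfect match.
    if total == 0:
        return 1.0
    return 2.0 * intersection / total


# Hypothetical toy masks (1 = tumor voxel, 0 = background), not data from the review.
predicted = [0, 1, 1, 1, 0, 0, 1, 0]
reference = [0, 1, 1, 0, 0, 1, 1, 0]
print(dice_similarity(predicted, reference))  # 3 shared voxels, 4 + 4 labeled -> 0.75
```

In practice the masks would be 3D volumes, and a separate DSC would be computed for each region (whole tumor, tumor core, enhancing tumor), which is why the abstract reports region-specific accuracy.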

Topics

Journal Article, Review