Deep Learning for Automated Meningioma Segmentation: Toward Clinical Integration and Workflow Efficiency
Authors
Affiliations (1)
- Queen Square Institute of Neurology, University College London
Abstract
Key Results
- In five-fold cross-validation (1000 cases, six institutions), the model achieved mean Dice similarity coefficients of 0.939 for enhancing tumor, 0.937 for tumor core, and 0.921 for whole tumor, with tumor core volumes strongly correlated with reference volumes (r = 0.995).
- In external validation (310 cases, single institution), mean tumor core Dice was 0.872 despite heterogeneous MRI protocols and incomplete sequences; tumor core volume correlation remained strong (r = 0.971).
- In a blinded evaluation by 10 radiologists across 510 cases, model segmentations scored higher than reference annotations, with the advantage fourfold larger in real-world clinical data; mean inference time was 1.2 seconds per case.

Summary Statement
A fully automated deep learning model achieved high meningioma segmentation accuracy, generalized to heterogeneous clinical imaging with a mean inference time of 1.2 seconds per case, and surpassed reference annotation quality in blinded radiologist evaluation.

Background
Meningiomas are the most common primary intracranial tumors in adults, and volumetric assessment increasingly guides surveillance and treatment decisions. Automated segmentation could enable standardized volumetry but requires robust validation.

Purpose
To develop a fully automated three-dimensional deep learning model for meningioma segmentation on multiparametric MRI, and to evaluate segmentation accuracy, external generalizability, failure modes, radiologist-rated clinical plausibility, and workflow feasibility.

Methods
From 2024 to 2026, this retrospective study trained a custom 3D nnU-Net residual encoder model. Expert segmentations covered enhancing tumor (ET), tumor core (TC), and whole tumor (WT). Dice similarity coefficient (DSC) was the primary metric. External validation used an independent single-institution dataset (n = 310 intracranial cases) with incomplete MRI protocols. Failure modes, model equity, and inference time were assessed. A blinded multi-rater study (10 radiologists; 510 cases) rated TC segmentations using a 0-10 Likert scale, analyzed with linear mixed-effects models.

Results
Model training used the BraTS Meningioma 2023 dataset (n = 1000; mean age, 60.2 ± 14.5 years; 705 female). In cross-validation, mean DSC was 0.939 for ET, 0.937 for TC, and 0.921 for WT. In external validation, mean DSC was 0.872 for TC and 0.842 for WT, despite heterogeneous protocols and incomplete sequences. Predicted TC volumes correlated strongly with reference volumes in cross-validation (r = 0.995) and external validation (r = 0.971). The most common failure modes were skull base and intraosseous tumors; performance was equitable across demographic subgroups. Mean inference time was 1.2 seconds. In blinded evaluation (1120 ratings), model segmentations received higher scores than reference annotations (+0.32 BraTS; +1.38 external validation).

Conclusion
A fully automated deep learning model achieved high meningioma segmentation accuracy across multi-institutional training data and external clinical imaging. In a blinded study, model segmentation quality exceeded reference annotations, and 1.2-second inference supported workflow integration. Prospective evaluation is warranted before routine deployment.
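
For readers unfamiliar with the primary metric, the following is a minimal sketch of how a Dice similarity coefficient and a mask volume can be computed from binary segmentation masks. It is not the study's evaluation code, which the abstract does not describe. The function names `dice_coefficient` and `mask_volume_ml`, the toy masks, the 1 mm isotropic voxel spacing, and the convention that two empty masks score 1.0 are all assumptions made for illustration.

```python
import numpy as np


def dice_coefficient(pred, ref):
    """DSC = 2 * |P intersect R| / (|P| + |R|) for two binary masks of equal shape."""
    pred = np.asarray(pred, dtype=bool)
    ref = np.asarray(ref, dtype=bool)
    if pred.shape != ref.shape:
        raise ValueError("masks must have the same shape")
    denom = pred.sum() + ref.sum()
    if denom == 0:
        # Assumption: two empty masks are treated as perfect agreement.
        return 1.0
    return float(2.0 * np.logical_and(pred, ref).sum() / denom)


def mask_volume_ml(mask, voxel_spacing_mm=(1.0, 1.0, 1.0)):
    """Volume of a binary mask in millilitres (1 mL = 1000 mm^3)."""
    voxel_mm3 = float(np.prod(voxel_spacing_mm))
    return float(np.asarray(mask, dtype=bool).sum() * voxel_mm3 / 1000.0)


# Toy example (hypothetical data): two 4x4x4 cubes offset by one voxel.
ref = np.zeros((10, 10, 10), dtype=bool)
pred = np.zeros((10, 10, 10), dtype=bool)
ref[2:6, 2:6, 2:6] = True    # 64 reference voxels
pred[3:7, 2:6, 2:6] = True   # 64 predicted voxels, 48 of them overlapping

print(dice_coefficient(pred, ref))  # 2 * 48 / (64 + 64) = 0.75
print(mask_volume_ml(ref))          # 64 mm^3 = 0.064 mL
```

Per-case volumes obtained this way for predicted and reference masks could then be compared across a cohort, for example with `np.corrcoef`, to obtain a Pearson correlation of the kind reported above.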