CMHF-3DNet: A Transformer-Based Framework for Improved Brain Tumor Segmentation Across Modalities.
Authors
Affiliations (2)
- School of Software, Northwestern Polytechnical University, Xi'an 710000, China (A.H., W.J., A.T., Y.G., I.A., A.W.). Electronic address: [email protected].
- School of Software, Northwestern Polytechnical University, Xi'an 710000, China (A.H., W.J., A.T., Y.G., I.A., A.W.).
Abstract
Accurate segmentation of brain tumor subregions in multi-modal MRI is essential for diagnosis, treatment planning, and longitudinal monitoring. However, delineating heterogeneous regions such as the enhancing tumor (ET) and tumor core (TC) remains challenging due to weak boundaries and substantial appearance variability across imaging modalities and scanners. To address these challenges, we propose Cross-Modal Hierarchical Fusion U-Net 3D (CMHF-3DNet), a 3D encoder-decoder framework that performs voxel-wise cross-modal transformer fusion to explicitly model inter-modal dependencies and employs hierarchy-aware multi-task learning to enforce anatomical consistency constraints (ET ⊂ TC ⊂ WT, where WT denotes the whole tumor). The proposed method was evaluated on the BraTS 2023, BraTS 2024, and BraTS 2025 validation datasets. CMHF-3DNet achieved a mean Dice Similarity Coefficient (DSC) of 0.8527 on BraTS 2025 and 0.8526 on BraTS 2023, along with a mean 95th percentile Hausdorff Distance (HD95) of 6.47 mm on BraTS 2025. These results indicate improved boundary delineation for clinically relevant tumor subregions, particularly the ET and TC. Overall, CMHF-3DNet demonstrates robust performance across multiple BraTS benchmarks and suggests potential utility in supporting automated brain tumor assessment.
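To make the reported metric and the nesting constraint concrete, the following is a minimal Python sketch, not the authors' implementation. It computes the DSC for binary masks and a simple hinge-style penalty for voxels whose predicted probabilities violate ET ⊂ TC ⊂ WT. The function names `dice` and `hierarchy_violation`, the flattened-list mask representation, the empty-mask convention, and the penalty form are all illustrative assumptions; the abstract does not specify how the hierarchy-aware loss is defined.

```python
def dice(pred, target):
    """Dice Similarity Coefficient between two binary masks.

    Masks are flat sequences of 0/1 values (a flattened 3D volume).
    """
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    # Assumed convention: two empty masks count as a perfect match.
    if total == 0:
        return 1.0
    return 2.0 * intersection / total


def hierarchy_violation(p_et, p_tc, p_wt):
    """Mean hinge penalty for voxels that break ET ⊂ TC ⊂ WT.

    Hypothetical penalty: a nested prediction should satisfy
    p_et <= p_tc <= p_wt at every voxel, so any excess is penalised.
    """
    penalty = sum(
        max(0.0, e - c) + max(0.0, c - w)
        for e, c, w in zip(p_et, p_tc, p_wt)
    )
    return penalty / len(p_et)


# Toy masks for a six-voxel volume (illustrative values only).
wt_pred, wt_true = [1, 1, 1, 1, 0, 0], [1, 1, 1, 0, 0, 0]
tc_pred, tc_true = [1, 1, 0, 0, 0, 0], [1, 1, 1, 0, 0, 0]
et_pred, et_true = [1, 0, 0, 0, 0, 0], [1, 0, 0, 0, 0, 0]

scores = [
    dice(et_pred, et_true),
    dice(tc_pred, tc_true),
    dice(wt_pred, wt_true),
]
mean_dsc = sum(scores) / len(scores)

# Toy probabilities: voxel 0 breaks ET <= TC, voxel 2 breaks TC <= WT.
violation = hierarchy_violation(
    [0.9, 0.2, 0.1], [0.8, 0.6, 0.1], [0.95, 0.7, 0.05]
)
```

Averaging the three per-region scores is one plausible reading of "mean DSC"; the abstract does not state whether the mean is taken over regions, cases, or both. HD95 is omitted here because it requires surface-distance computation on the 3D grid with voxel spacing.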