Automated identification of MRI series using a hierarchical modular machine-learning pipeline.

May 28, 2026


DOI: 10.1186/s41747-026-00740-z PMID: 42207236

Authors

Kujawa MJ, Fernández-Patón M, Cerdá Alberich L, Veiga-Canuto D, Martí-Bonmatí L

Affiliations (4)

Grupo de Investigación Biomédica en Imagen (GIBI230), Instituto de Investigación Sanitaria La Fe, Valencia, Spain.
2nd Department of Radiology, Medical University of Gdansk, Gdansk, Poland.
Grupo de Investigación Biomédica en Imagen (GIBI230), Instituto de Investigación Sanitaria La Fe, Valencia, Spain.
Área Clínica de Imagen Médica, Hospital Universitario y Politécnico La Fe, Valencia, Spain.

Abstract

The volume and diversity of large MR imaging datasets require efficient automated labelling tools for cataloguing MR series, as manual annotation is impractical and costly. However, relying on DICOM header fields alone is unreliable because sequence descriptors are heterogeneous and locally defined, frequently missing or incorrect, and may be altered or removed during anonymisation.

We developed an AI-based modular model to classify MR series. The pipeline comprises five sequential classifiers (Family, Weighting, Fat Suppression, Contrast, and Others) and was trained and tested on 18,181 MRI series from the multicentre PRIMAGE repository. The dataset was split by patient into 80% training/validation and 20% testing; within the training/validation subset, five-fold cross-validation was used. With the exception of the Contrast classifier, all modules used DICOM tag-based machine learning models (CatBoost/Random Forest); the Contrast classifier incorporated image analysis using a pretrained single-slice ResNet-50. Ethical approval for the study was obtained.

Accuracy was 0.994 for Weighting, 0.984 for Family, 0.959 for Fat Suppression and 0.958 for Others; the Contrast classifier recorded 0.841. Overall, the end-to-end classification yielded a weighted F1 of 0.849 (CI: 0.837-0.861) and an accuracy of 0.853 (CI: 0.841-0.865).

The proposed approach provides a reliable and scalable solution for labelling large, heterogeneous MRI datasets across multiple anatomical regions. The pipeline achieved excellent performance for Weighting and Family classification and solid performance for Fat Suppression and Others. However, Contrast classification remains the main limitation and warrants further refinement and/or additional modules.

Reliable, automated MRI sequence labelling enables faster, reproducible cohort selection and protocol harmonisation in large archives, supporting downstream clinical research and AI tools while reducing manual curation effort and error.
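The patient-level split described in the abstract can be illustrated with a minimal sketch. It assumes the 80/20 proportion applies to patients rather than to series, and the function name, parameters, and round-robin fold assignment are hypothetical choices for illustration, not the authors' implementation. The point it shows is that all series from one patient land in the same partition, so no patient contributes to both training and testing.

```python
import random
from collections import defaultdict


def split_by_patient(patient_ids, test_fraction=0.2, n_folds=5, seed=0):
    """Split series indices so that no patient spans two partitions.

    patient_ids[i] is the (hypothetical) patient identifier of series i.
    Returns (test_idx, folds), where folds is a list of n_folds index lists
    drawn from the remaining training/validation patients.
    """
    # Group series indices by patient.
    by_patient = defaultdict(list)
    for idx, pid in enumerate(patient_ids):
        by_patient[pid].append(idx)

    # Shuffle patients, not series, so the split is made at patient level.
    patients = sorted(by_patient)
    random.Random(seed).shuffle(patients)

    n_test = max(1, round(test_fraction * len(patients)))
    test_patients = patients[:n_test]
    dev_patients = patients[n_test:]

    test_idx = [i for p in test_patients for i in by_patient[p]]

    # Assign the remaining patients to cross-validation folds round-robin.
    folds = [[] for _ in range(n_folds)]
    for k, p in enumerate(dev_patients):
        folds[k % n_folds].extend(by_patient[p])
    return test_idx, folds


if __name__ == "__main__":
    # Toy data: 10 hypothetical patients with 5 series each.
    ids = [f"P{i % 10}" for i in range(50)]
    test_idx, folds = split_by_patient(ids)
    print(len(test_idx), [len(f) for f in folds])
```

In each cross-validation round, one fold would serve as validation data and the other four as training data; the held-out test indices are used only for the final evaluation.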
Multicentre PRIMAGE cohort (18,181 MR series) enabling evaluation in a heterogeneous setting. Machine learning model combining DICOM metadata and image features. High accuracy in series classification: up to 0.994 on key tasks. A scalable pipeline reduces manual annotation workload for radiologists.
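For readers unfamiliar with the two end-to-end metrics quoted in the abstract, the following sketch shows how accuracy and support-weighted F1 are conventionally computed. It assumes, purely for illustration, that a series' end-to-end label is the tuple of the five module outputs; the abstract does not state how the authors define the combined label, and the label values below are hypothetical.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of series whose predicted label matches the reference label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights equal to each class's support."""
    support = Counter(y_true)    # number of true instances per class
    predicted = Counter(y_pred)  # number of predictions per class
    true_pos = Counter(t for t, p in zip(y_true, y_pred) if t == p)

    total = 0.0
    for label, n_true in support.items():
        precision = true_pos[label] / predicted[label] if predicted[label] else 0.0
        recall = true_pos[label] / n_true
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        total += n_true * f1
    return total / len(y_true)


if __name__ == "__main__":
    # Hypothetical composite labels:
    # (family, weighting, fat suppression, contrast, others)
    a = ("SE", "T1", "FS", "contrast", "none")
    b = ("SE", "T1", "FS", "no-contrast", "none")
    y_true = [a, a, b, b]
    y_pred = [a, b, b, b]
    print(accuracy(y_true, y_pred), weighted_f1(y_true, y_pred))
```

Under this assumed definition, an error in any single module makes the whole composite label wrong, which is one plausible reason an end-to-end score can sit close to that of the weakest module.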


Topics

Magnetic Resonance Imaging; Machine Learning; Image Processing, Computer-Assisted; Journal Article
