Self-supervised Text-vision Alignment for Automated Brain MRI Abnormality Detection: A Multicenter Study (ALIGN Study).
Affiliations (11)
- School of Biomedical Engineering and Imaging Sciences, King's College London, Rayne Institute, 4th Floor, Lambeth Wing, London SE1 7EH, UK.
- King's College Hospital NHS Foundation Trust, London, United Kingdom.
- Guy's and St Thomas' NHS Foundation Trust, London, United Kingdom.
- Department of Neuroimaging, Institute of Psychiatry, Psychology, & Neuroscience, King's College London, United Kingdom.
- Centre for Medical Image Computing, Department of Computer Science, University College London, United Kingdom.
- Bedfordshire Hospitals NHS Foundation Trust, Bedford Hospital, South Wing, Kempston Road, Bedford, United Kingdom.
- Radiological Sciences, School of Medicine, University of Nottingham, Nottingham, United Kingdom.
- University Hospitals NHS Trust, Nottingham, United Kingdom.
- Department of Neuroradiology, Floor B, Clarendon Wing, Leeds General Infirmary, Leeds, United Kingdom.
- Yeovil Hospital, Somerset NHS Foundation Trust, Yeovil, United Kingdom.
- Department of Radiology, Norfolk and Norwich University Hospital, Norwich, Norfolk, United Kingdom.
Abstract
Purpose: To develop a self-supervised text-vision framework that detects abnormalities on brain MRI scans by leveraging free-text neuroradiology reports, eliminating the need for expert-labeled training datasets.

Materials and Methods: This retrospective and prospective multicenter study included 81,936 brain MRI examinations and their corresponding radiology reports from adult patients at two UK National Health Service (NHS) hospitals, acquired from January 2008 to December 2019, for training and internal testing. A further 1,369 examinations were prospectively collected between March 2022 and March 2024 at four separate NHS hospitals for external testing (clinicaltrials.gov NCT043681). A neuroradiology language model (NeuroBERT) was trained on self-supervised tasks to generate report embeddings. Convolutional neural networks (one per MRI sequence) were then trained to map scans to these report embeddings by minimizing a mean squared error loss. The framework detected abnormalities in new examinations by scoring scans against query sentences using text-image similarity. Model diagnostic performance was assessed using the area under the receiver operating characteristic curve (AUC).

Results: The framework achieved an AUC of 0.95 (95% CI: 0.94, 0.97) for normal versus abnormal classification and generalized to the external sites, with examination-level AUCs of 0.90 (95% CI: 0.86, 0.93), 0.87 (95% CI: 0.83, 0.90), 0.86 (95% CI: 0.83, 0.90), and 0.85 (95% CI: 0.81, 0.89). In five zero-shot classification tasks (acute stroke, multiple sclerosis, intracranial hemorrhage, meningioma, and hydrocephalus), the framework achieved a mean AUC of 0.89 (range, 0.77-0.93). For visual-semantic image retrieval, mean precision among the top 15 retrieved images across seven pathologies was 0.84.

Conclusion: The self-supervised text-vision framework accurately detected brain MRI abnormalities without expert-labeled datasets.

© The Author(s) 2025. Published by the Radiological Society of North America under a CC BY 4.0 license.
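The abstract names three computational steps: regressing image embeddings onto report embeddings with a mean squared error loss, scoring a scan against query sentences by text-image similarity, and evaluating with AUC and top-k retrieval precision. The sketch below illustrates those steps under loudly stated assumptions and is not the authors' code. Embeddings are plain Python lists standing in for NeuroBERT and CNN outputs. Cosine similarity is *assumed* as the similarity measure. The "abnormal minus normal query" scoring rule is a *hypothetical* choice, and all function names are illustrative.

```python
import math
from typing import Sequence


def mse_loss(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error between a predicted image embedding and a report
    embedding (the regression objective the CNNs are described as minimizing)."""
    if len(pred) != len(target):
        raise ValueError("embeddings must have the same dimension")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; ASSUMED here as the text-image similarity measure."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("zero-length embedding")
    return dot / (norm_a * norm_b)


def abnormality_score(image_emb, abnormal_query_emb, normal_query_emb) -> float:
    """HYPOTHETICAL scoring rule: similarity to an 'abnormal' query sentence
    minus similarity to a 'normal' query sentence (higher = more abnormal)."""
    return (cosine_similarity(image_emb, abnormal_query_emb)
            - cosine_similarity(image_emb, normal_query_emb))


def roc_auc(scores, labels) -> float:
    """AUC as the probability that a random abnormal case (label 1) outscores
    a random normal case (label 0); ties count as half (Mann-Whitney form)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def precision_at_k(scores, labels, k: int) -> float:
    """Fraction of the k highest-scoring items that are relevant (label 1)."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    return sum(label for _, label in ranked[:k]) / k


if __name__ == "__main__":
    # Toy 3-d embeddings, purely illustrative (not data from the study).
    abnormal_q = [1.0, 0.0, 0.0]
    normal_q = [0.0, 1.0, 0.0]
    images = [[0.9, 0.1, 0.0], [0.2, 0.8, 0.0], [0.6, 0.4, 0.1], [0.3, 0.7, 0.2]]
    labels = [1, 0, 1, 0]
    scores = [abnormality_score(e, abnormal_q, normal_q) for e in images]
    print(roc_auc(scores, labels))             # 1.0
    print(precision_at_k(scores, labels, 2))   # 1.0
```

The pairwise AUC loop costs O(positives × negatives) and is meant only to make the metric's definition concrete. At the scale of the study, a rank-based implementation such as scikit-learn's `roc_auc_score` would be the practical choice.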