Natural Language Processing Based Solution for Labeling Brain Metastasis Identified in Radiology Reports

June 15, 2026 (medRxiv preprint)

Authors

Liu, T., Han, Y. T., Zuo, H., Das, S., Lin, H.-M., Colak, E., Istasy, M., Ladak, A. M., Bigenimana, J. C., Gondara, L., Simkin, J., Lee, J., Roozbeh, D., Nichol, A. M., Easaw, J., Walker, E., Yip, S., Mou, L., Yuan, Y.

Affiliations (1)

  • University of Alberta

Abstract

Purpose: Brain metastases (BM) far outnumber primary CNS tumours and constitute the majority of the workload for neuro-oncology care providers. Currently, cancer registries capture only synchronous BMs, which are a small proportion of all BMs. We aim to develop and validate a natural language processing (NLP) algorithm that identifies brain metastases in radiology reports, enabling scalable surveillance of asynchronous BMs.

Methods: Using population-based cancer registry data in Alberta, Canada, we identified a cohort of patients diagnosed with cancer between 2012 and 2019, with follow-up to 2022. All brain/head radiology reports at and after cancer diagnosis were identified. Reports were sampled through a multi-phase approach and manually labeled for BM presence. We trained two Bio_ClinicalBERT models, one on the "Findings" section and one on the "Impressions" section, and took the maximum of the two predicted probabilities as the report-level prediction. Internal and external validation used reports from the Canadian provinces of Alberta, Ontario, and British Columbia.

Results: The models were trained on 1,879 samples. For internal validation, 1,833 reports from 357 patients were tested. At a probability threshold of 0.4, the ensemble achieved a sensitivity of 0.888 and a precision of 0.499. It substantially outperformed the single-section models, which achieved sensitivities of only 0.678 (Findings) and 0.742 (Impressions). On external validation, sensitivity was 0.918 in Ontario and 0.726 in British Columbia, demonstrating robustness across diverse data distributions.

Conclusions: An NLP-based pipeline that processes both the Findings and Impressions sections has been developed and validated in three Canadian provinces. It meets cancer registry operational requirements and is to be implemented into the surveillance workflow in Alberta and British Columbia, providing a foundation for population-level BM surveillance.
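The report-level decision rule described in the abstract (take the larger of the Findings and Impressions probabilities, then apply a 0.4 threshold) can be illustrated with a minimal Python sketch. This is not the authors' code: the function names, the toy probabilities and labels, and the choice to treat a score exactly equal to the threshold as positive are all assumptions made for illustration, and no Bio_ClinicalBERT model is loaded here.

```python
from typing import Sequence, Tuple

THRESHOLD = 0.4  # probability threshold quoted in the abstract


def report_probability(p_findings: float, p_impressions: float) -> float:
    """Report-level score: the maximum of the two section-level probabilities."""
    return max(p_findings, p_impressions)


def label_report(p_findings: float, p_impressions: float,
                 threshold: float = THRESHOLD) -> bool:
    """Flag a report as BM-positive when its score reaches the threshold.

    Assumption: ">=" is used; the abstract does not say how ties are handled.
    """
    return report_probability(p_findings, p_impressions) >= threshold


def sensitivity_and_precision(y_true: Sequence[bool],
                              y_pred: Sequence[bool]) -> Tuple[float, float]:
    """Sensitivity = TP / (TP + FN); precision = TP / (TP + FP)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    precision = tp / (tp + fp) if (tp + fp) else float("nan")
    return sensitivity, precision


# Hypothetical (Findings, Impressions) probabilities for five reports.
section_probs = [(0.10, 0.85), (0.55, 0.20), (0.05, 0.30), (0.45, 0.10), (0.20, 0.35)]
# Hypothetical manual labels for the same five reports.
truth = [True, True, False, False, True]

predictions = [label_report(pf, pi) for pf, pi in section_probs]
sens, prec = sensitivity_and_precision(truth, predictions)
print(predictions, round(sens, 3), round(prec, 3))
```

Taking the maximum means a report is flagged if either section alone looks positive, which tends to raise sensitivity at some cost in precision; that is consistent with the direction of the reported results, though the toy numbers above say nothing about the actual models.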

Topics

epidemiology
