
Performance of Natural Language Processing Model in Extracting Information from Free-Text Radiology Reports: A Systematic Review and Meta-Analysis.

October 28, 2025

Authors

Yang Q, Jiang J, Dong X, Yang H, Wang Q, Yang Z, Yang D, Liu P

Affiliations (3)

  • Department of Radiology, Beijing Friendship Hospital, Capital Medical University, No. 95 Yong'an Road, Xicheng District, Beijing, 100050, China.
  • Department of Radiology, Beijing Friendship Hospital, Capital Medical University, No. 95 Yong'an Road, Xicheng District, Beijing, 100050, China. [email protected].
  • Department of Radiology, Beijing Friendship Hospital, Capital Medical University, No. 95 Yong'an Road, Xicheng District, Beijing, 100050, China. [email protected].

Abstract

The free-text format is widely used in radiology reports for its flexibility of expression; however, its unstructured nature leads to substantial amounts of report data remaining underutilized. A natural language processing (NLP) model for automatic extraction of information from free-text radiology reports can contribute substantially to the development of structured databases, thereby optimizing data utilization. This study aimed to perform a systematic review and meta-analysis evaluating the performance of NLP systems in extracting information from free-text radiology reports. A systematic literature search was conducted from November 21 to 23, 2024, in PubMed/MEDLINE, Embase, EBSCO, Ovid, Web of Science, and the Cochrane Library. Study quality was assessed using the QUADAS-2 tool. A bivariate random-effects model was applied to obtain the pooled sensitivity, specificity, diagnostic odds ratio (DOR), positive likelihood ratio (PLR), negative likelihood ratio (NLR), and area under the summary receiver operating characteristic curve (AUC). Subgroup analyses (e.g., NLP model types, dataset sources, and language types) and a random-effects multivariable meta-regression based on the restricted maximum likelihood (REML) method were conducted to explore potential sources of heterogeneity. Sensitivity analyses (excluding high-risk studies, the leave-one-out method, and a comparison of data integration strategies) were performed to assess the robustness of the findings. A total of 28 studies were included in the final analysis, with 421,692 extracted entities in 51,187 free-text radiology reports. NLP systems achieved high pooled sensitivity (91% [95% CI: 87, 93]) and specificity (96% [95% CI: 93, 97]), with a diagnostic odds ratio of 220 (95% CI: 112, 435) and an area under the curve of 0.98 (95% CI: 0.96, 0.99). Subgroup analysis revealed significantly better performance for extracting single anatomical sites (AUC 0.99; 95% CI: 0.97, 0.99) compared with multiple sites (AUC 0.95; 95% CI: 0.93, 0.97; p = 0.001). No significant differences were observed across NLP model types, dataset sources, external validation status, languages, or imaging modalities. Multivariable meta-regression further identified anatomical site as the only significant contributor to heterogeneity (coefficient = 2.26; 95% CI: 0.25, 4.27; p = 0.027). Sensitivity analyses confirmed the robustness of the findings, and no evidence of publication bias was detected. NLP models demonstrated excellent performance in extracting information from free-text radiology reports. However, the observed heterogeneity highlights the need for enhanced report standardization and improved model generalizability.
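
For readers less familiar with the summary measures quoted above, the minimal sketch below (not from the paper) shows how PLR, NLR, and DOR are derived from sensitivity and specificity, using the pooled point estimates reported in the abstract. Note that the plug-in DOR (~243) is close to, but not identical with, the pooled DOR of 220, because the paper estimates these measures jointly via a bivariate random-effects model rather than by direct substitution.

```python
# Illustrative sketch only (not the authors' code): standard relationships
# between sensitivity/specificity and the likelihood ratios / diagnostic
# odds ratio summarized in the abstract.

def likelihood_ratios(sensitivity: float, specificity: float) -> tuple[float, float]:
    """Return (PLR, NLR): PLR = Se / (1 - Sp), NLR = (1 - Se) / Sp."""
    plr = sensitivity / (1.0 - specificity)
    nlr = (1.0 - sensitivity) / specificity
    return plr, nlr

def diagnostic_odds_ratio(sensitivity: float, specificity: float) -> float:
    """DOR = PLR / NLR = (Se * Sp) / ((1 - Se) * (1 - Sp))."""
    plr, nlr = likelihood_ratios(sensitivity, specificity)
    return plr / nlr

if __name__ == "__main__":
    se, sp = 0.91, 0.96  # pooled point estimates quoted in the abstract
    plr, nlr = likelihood_ratios(se, sp)
    # Plug-in values: PLR ~ 22.8, NLR ~ 0.09, DOR ~ 243 (vs. pooled DOR 220
    # from the bivariate random-effects model).
    print(f"PLR ~ {plr:.1f}, NLR ~ {nlr:.2f}, DOR ~ {diagnostic_odds_ratio(se, sp):.0f}")
```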

Topics

Journal Article
