Natural Language Processing of Large Numbers of Radiology Reports in a Public Health System to Extract Structured Data, With a Test Case of CT KUB.
Authors
Affiliations (3)
- Department of Anatomy and Medical Imaging, University of Auckland, Auckland, New Zealand.
- Amazon Web Services New Zealand, Auckland, New Zealand.
- Waitemata District Health Board, Auckland, New Zealand.
Abstract
Natural language processing (NLP) was used to extract structured information from large numbers of radiology reports, with the aim of showing the feasibility of this approach for system monitoring, clinical research and practice improvement. In total, 220,000 consecutive radiology reports were processed using an NLP pipeline (Radiology Text Analysis, or RATA). The indications, modality, technique, anatomy and findings were mapped to SNOMED CT codes. A subset of 941 reports identified as CT-KUB was analysed to examine the pipeline's performance in detecting renal tract stones (RTS), compared with a manual reference standard. The Fisher exact test and the Cohen kappa statistic were applied. Compared with the reference standard, RATA achieved an accuracy of 95%, sensitivity of 94%, specificity of 97%, positive predictive value of 98% and negative predictive value of 91%, with a kappa statistic of 0.9. Subgroup analysis showed that 50% of 366 females had negative RTS diagnoses, compared with only 32% of 566 males (two-tailed p < 0.00001). The RATA pipeline has acceptable performance in extracting structured data from large numbers of radiology reports. Clinically relevant information, such as variations in use, can be uncovered.
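The agreement statistics named in the abstract can be illustrated with a minimal sketch. It is not the authors' analysis code; the function names `diagnostic_metrics` and `fisher_exact_two_tailed` are hypothetical, and the confusion-matrix counts in the first usage example are invented for illustration only. The sex-by-diagnosis table in the second example is an approximation reconstructed from the rounded percentages reported above (50% of 366 and 32% of 566), so it may differ slightly from the study's actual counts.

```python
from math import comb


def diagnostic_metrics(tp, fp, fn, tn):
    """Agreement metrics for a binary classifier against a reference standard."""
    n = tp + fp + fn + tn
    accuracy = (tp + tn) / n
    # Chance agreement expected from the marginal totals (for Cohen's kappa).
    p_expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n**2
    return {
        "accuracy": accuracy,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "kappa": (accuracy - p_expected) / (1 - p_expected),
    }


def fisher_exact_two_tailed(a, b, c, d):
    """Two-tailed Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum every table that is at least as unlikely as the observed one.
    return sum(
        p
        for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-7)
    )


# Hypothetical counts, NOT taken from the study.
example = diagnostic_metrics(tp=90, fp=10, fn=10, tn=90)

# Approximate counts reconstructed from the rounded percentages in the abstract:
# females 183 negative / 183 positive, males 181 negative / 385 positive.
p_sex = fisher_exact_two_tailed(183, 183, 181, 385)
```

With the reconstructed table, `p_sex` should fall well below the reported threshold of 0.00001, which is consistent with the abstract, although the exact value depends on the true underlying counts.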