
Automated Detection and Classification of Radiology Report Discrepancies Using NLP: A Tool for Resident Education and Quality Assurance.

March 31, 2026

Authors

Wang K, Wang H, Wu J, Wang K, Liu W, Zhang Y, Wang X

Affiliations (7)

  • Department of Radiology, Peking University First Hospital, Beijing 100034, China. Electronic address: [email protected].
  • Department of Radiology, Peking University First Hospital, Beijing 100034, China. Electronic address: [email protected].
  • Department of Radiology, Peking University First Hospital, Beijing 100034, China. Electronic address: [email protected].
  • Department of Radiology, Peking University First Hospital, Beijing 100034, China. Electronic address: [email protected].
  • Beijing Smart Tree Medical Technology Co., Ltd., Beijing 100011, China. Electronic address: [email protected].
  • Beijing Smart Tree Medical Technology Co., Ltd., Beijing 100011, China. Electronic address: [email protected].
  • Department of Radiology, Peking University First Hospital, Beijing 100034, China. Electronic address: [email protected].

Abstract

To develop and evaluate a natural language processing (NLP) system that automatically detects and classifies discrepancies between preliminary and final radiology reports, with the goal of enhancing resident education through structured feedback. We retrospectively analyzed 889 de-identified lumbar spine MRI reports (768 with revisions) from December 2023 to March 2024. Preliminary diagnostic reports were generated by trainee residents during daytime rotations; final reports were subsequently verified remotely by attending radiologists. Discrepancies in the diagnostic impression section were extracted using a multi-step NLP pipeline: sentence segmentation, BERT-based sentence matching, GPT-4-based named entity recognition, and rule-based classification into 11 correction types (e.g., missed diagnosis, misdiagnosis, missed image feature, misidentified image feature, localization error, diagnostic reasoning error, clinical query omission, severity error, confidence difference, typographic error, and terminology refinement). Ground truth was established by three radiologists. System performance was evaluated for each correction type individually using accuracy, sensitivity, specificity, and the intraclass correlation coefficient (ICC). Resident and attending radiologist performance trends were analyzed at the report level. The NLP system achieved high accuracy (0.983-0.999), sensitivity (0.977-1.000), and specificity (0.900-1.000) across the 11 correction types, with strong inter-rater reliability (ICC > 0.75). The most common corrections were misdiagnosis (504/768, 65.6%) and missed diagnosis (356/768, 46.4%). Residents showed significant variability in error rates, especially in missed diagnosis (11.1-59.1% across 16 residents) and misdiagnosis (24.0-71.1% across 16 residents).
Attending radiologists exhibited marked heterogeneity in correction patterns (n=6, individual workloads 95-187 reports, median 159), with significant variability across all major error types (p<0.001 for missed diagnosis [20.6%-82.0%], misdiagnosis [31.4%-66.7%], localization error [15.8%-54.7%], and terminology refinement [3.2%-36.7%]). The NLP-based discrepancy tracking system accurately identifies and classifies report modifications, enabling scalable, targeted feedback for radiology residents. Inter-resident and inter-attending variability highlights the need for individualized training and standardized review practices.
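The pipeline the abstract describes (segment sentences, match preliminary to final sentences, then classify each pair) can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are assumptions, `difflib` character similarity stands in for the BERT-based matcher, and only two of the 11 correction types are covered by toy rules (the paper additionally uses GPT-4-based named entity recognition before classification).

```python
import difflib
import re


def split_sentences(text):
    """Naive impression-section segmentation on '.' and ';'.
    (A stand-in for the paper's dedicated sentence segmenter.)"""
    return [s.strip() for s in re.split(r"[.;]\s*", text) if s.strip()]


def match_sentences(prelim_sents, final_sents, threshold=0.6):
    """Pair each final-report sentence with its most similar preliminary
    sentence. The paper matches with BERT embeddings; difflib's character
    similarity is used here as an offline proxy."""
    pairs = []
    for f in final_sents:
        best, score = None, 0.0
        for p in prelim_sents:
            r = difflib.SequenceMatcher(None, p, f).ratio()
            if r > score:
                best, score = p, r
        pairs.append((best if score >= threshold else None, f))
    return pairs


def classify_pair(prelim_sent, final_sent):
    """Toy rule-based classifier covering two of the 11 correction types."""
    if prelim_sent is None:
        return "missed diagnosis"  # finding appears only in the final report
    if prelim_sent != final_sent:
        return "misdiagnosis"      # attending revised the resident's finding
    return "no change"
```

A small worked example on a hypothetical report pair:

```python
prelim = split_sentences("Disc bulge at L4-L5. No spinal stenosis.")
final = split_sentences("Disc herniation at L4-L5. No spinal stenosis. Mild scoliosis.")
labels = [classify_pair(p, f) for p, f in match_sentences(prelim, final)]
# labels → ["misdiagnosis", "no change", "missed diagnosis"]
```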
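The per-type accuracy, sensitivity, and specificity figures reported above follow from treating each correction type as a binary flag per report and comparing the system's flags against the radiologist-established ground truth. A minimal sketch of that computation (function name and data layout are assumptions; the paper's ICC analysis is not reproduced here):

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, sensitivity, and specificity for one correction type,
    given 0/1 ground-truth and predicted flags per report."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
    }
```

For example, `binary_metrics([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])` gives accuracy 0.6, sensitivity 2/3, and specificity 0.5; the study computes these separately for each of the 11 correction types.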

Topics

Journal Article
