Development of a Hybrid Algorithm of Claims Data and EMRs with NLP for Lung Cancer Identification.
Affiliations
- Department of Medical Informatics, The University of Osaka, Graduate School of Medicine, Osaka, Japan.
- Department of Transformative System for Medical Information, The University of Osaka, Graduate School of Medicine, Osaka, Japan.
Abstract
The use of real-world data, which encompasses administrative claims and electronic medical records (EMRs), has gained significance in clinical research. Although administrative claims data are widely used, they often lack clinical specificity and diagnostic reliability. To overcome this, we developed a hybrid "NLP-complemented" algorithm that uses natural language processing (NLP) to integrate claims data with detailed clinical information extracted from unstructured radiology reports. Its performance in identifying lung cancer cases was compared against a claims-only baseline, using an institutional cancer registry as the reference standard. The NLP-complemented algorithm improved sensitivity by 5.8 percentage points (from 92.7% to 98.5%) by identifying registered cases without treatment codes through analysis of their radiology reports. However, the positive predictive value (PPV) decreased by 4.4 percentage points (from 86.0% to 81.6%). The PPV decrease was attributed to the algorithm's detection of clinically significant pre-invasive lesions that were not included in the registry. These results highlight the limitations of using cancer registries as a gold standard for early detection and the need for new evaluation frameworks for identifying clinically relevant patient cohorts.
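The comparison described above can be sketched as two case-definition rules scored against a registry. This is a minimal sketch under stated assumptions, not the authors' implementation. The abstract does not give the exact rules, so the claims-only baseline is assumed here to require both a lung cancer diagnosis code and a treatment code. The NLP-complemented rule is assumed to additionally accept diagnosis-coded patients without a treatment code whose radiology report was flagged positive by NLP. The `Patient` fields, the function names, and the six toy patients are hypothetical and are not study data. The NLP step itself is reduced to a precomputed boolean flag.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Patient:
    """Hypothetical per-patient flags; field names are illustrative only."""
    pid: str
    has_dx_code: bool   # lung cancer diagnosis code present in claims
    has_tx_code: bool   # lung cancer treatment code present in claims
    nlp_positive: bool  # radiology report flagged as suggestive of cancer (assumed precomputed)


def claims_only(p: Patient) -> bool:
    # Assumed baseline rule: diagnosis code AND treatment code.
    return p.has_dx_code and p.has_tx_code


def nlp_complemented(p: Patient) -> bool:
    # Assumed hybrid rule: keep every claims-only case, and also accept
    # diagnosis-coded patients lacking a treatment code if NLP flags their report.
    return claims_only(p) or (p.has_dx_code and p.nlp_positive)


def sensitivity_ppv(patients, rule, registry):
    """Score a case-definition rule against a registry used as the reference standard."""
    predicted = {p.pid for p in patients if rule(p)}
    tp = len(predicted & registry)   # identified and registered
    fn = len(registry - predicted)   # registered but missed
    fp = len(predicted - registry)   # identified but not registered
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    ppv = tp / (tp + fp) if tp + fp else float("nan")
    return sensitivity, ppv


# Made-up toy cohort for illustration only.
patients = [
    Patient("A", True, True, True),    # treated, registered
    Patient("B", True, True, False),   # treated, registered
    Patient("C", True, False, True),   # untreated, registered: only the NLP branch finds it
    Patient("D", True, False, True),   # NLP-positive but not registered (e.g. a pre-invasive lesion)
    Patient("E", True, True, False),   # treated but not registered
    Patient("F", True, False, False),  # diagnosis code only, e.g. a rule-out code
]
registry = {"A", "B", "C"}

if __name__ == "__main__":
    for name, rule in [("claims-only", claims_only), ("NLP-complemented", nlp_complemented)]:
        sens, ppv = sensitivity_ppv(patients, rule, registry)
        print(f"{name}: sensitivity={sens:.3f}, PPV={ppv:.3f}")
    # claims-only: sensitivity=0.667, PPV=0.667
    # NLP-complemented: sensitivity=1.000, PPV=0.600
```

On this toy data the hybrid rule recovers the untreated registered patient C, so sensitivity rises, but it also admits D, an NLP-positive patient absent from the registry, so PPV falls. This reproduces only the direction of the trade-off reported in the abstract, not its figures.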