Automated classification of clinical T staging pulmonary oncology from chest CT reports: a natural language processing framework for real-world data utilization.

June 17, 2026

DOI: 10.1186/s12911-026-03596-w PMID: 42310616

Authors

Nishijima N, Sugimoto K, Konishi S, Wada S, Hirata H, Yanagawa M, Tsuboyama T, Fujii A, Murata T, Mihara N, Okada K, Matsumura Y, Tomiyama N, Takeda T

Affiliations (7)

Department of Medical Informatics, The University of Osaka Graduate School of Medicine, 2-2 Yamadaoka, Suita, Osaka, 565-0871, Japan.
Department of Medical Informatics, The University of Osaka Graduate School of Medicine, 2-2 Yamadaoka, Suita, Osaka, 565-0871, Japan.
Department of Transformative System for Medical Information, The University of Osaka Graduate School of Medicine, 2-2 Yamadaoka, Suita, Osaka, 565-0871, Japan.
Department of Respiratory Medicine and Clinical Immunology, The University of Osaka Graduate School of Medicine, 2-2 Yamadaoka, Suita, Osaka, 565-0871, Japan.
Department of Diagnostic and Interventional Radiology, The University of Osaka Graduate School of Medicine, 2-2 Yamadaoka, Suita, Osaka, 565-0871, Japan.
Department of Medical Informatics, The University of Osaka Hospital, 2-15 Yamadaoka, Suita, Osaka, 565-0871, Japan.
National Hospital Organization Osaka National Hospital, 2-1-14 Hoenzaka, Chuo-ku, Osaka, Japan.

Abstract

Clinical Tumor, Node, and Metastasis (cTNM) classification is vital for predicting treatment efficacy and prognosis in patients with cancer; however, its use in real-world data (RWD) research is hindered because it is frequently stored as free text. Although chest computed tomography (CT) reports contain the information needed to label the clinical T (cT) classification, extraction is typically manual and time-consuming. We developed a natural language processing (NLP) system that labels cT classifications from chest CT reports. The system combines deep learning with rule-based algorithms and follows the 9th Edition of the TNM Classification for Lung Cancer. It first identifies reports containing sufficient information for cT classification and then assigns the appropriate cT substage. In our training and test sets of 284 and 165 reports, 83% and 85%, respectively, contained sufficient information for labeling cT substage. The NLP system showed effective extraction, achieving weighted F1 scores of 0.96 (0.94-0.98) and 0.93 (0.89-0.97) for the training and test sets, respectively. Among reports correctly predicted to contain adequate information, cT substage classification achieved weighted F1 scores of 0.97 (0.94-0.99) and 0.95 (0.92-0.99) for the training and test sets, respectively. This system reduces annotation costs and makes it more feasible to use cT data from RWD.
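
The two-stage evaluation described above can be illustrated with a minimal Python sketch. It is not the authors' code: the function name `weighted_f1`, the tuple layout, and the toy labels are all hypothetical, and it assumes that the first reported score refers to the sufficiency-identification step and that the second is computed only on reports both truly sufficient and predicted sufficient. Weighted F1 is taken here in its usual sense, the mean of per-class F1 scores weighted by each class's share of the true labels.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted mean of per-class F1 scores."""
    support = Counter(y_true)          # number of true instances per class
    total = len(y_true)
    score = 0.0
    for label, n in support.items():
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = n - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        score += (n / total) * f1      # weight by class support
    return score


# Hypothetical toy labels, NOT data from the study.
# Each tuple: (true_sufficient, pred_sufficient, true_cT, pred_cT)
reports = [
    (True, True, "T1b", "T1b"),
    (True, True, "T2a", "T2a"),
    (True, True, "T1c", "T2a"),
    (True, False, "T3", None),
    (False, False, None, None),
    (False, True, None, "T1a"),
]

# Stage 1: does the report contain enough information to assign cT?
stage1_f1 = weighted_f1(
    [r[0] for r in reports],
    [r[1] for r in reports],
)

# Stage 2: substage accuracy, restricted to reports that are truly
# sufficient and were also predicted sufficient (an assumption about
# how the abstract's second score is defined).
kept = [r for r in reports if r[0] and r[1]]
stage2_f1 = weighted_f1(
    [r[2] for r in kept],
    [r[3] for r in kept],
)

print(round(stage1_f1, 3), round(stage2_f1, 3))
```

Because the second score is conditioned on the first stage being correct, it does not reflect errors made when a usable report is wrongly discarded; an end-to-end figure would need both stages evaluated together.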


Topics

Journal Article
