Large Language Models Utility for Rapid On-Site Evaluation in Interventional Pulmonology.

May 28, 2026


DOI: 10.3390/diagnostics16111658 PMID: 42279526

Authors

Flaschner M, Kramer MR, Krolik A, Gershman E, Tovar A, Rosengarten D, Amor SM, Pertzov B, Heching M, Shtraichman O, Freidkin L

Affiliations (4)

Pulmonary Institute, Rabin Medical Center, Petah Tikva 4941492, Israel.
Gray School of Medicine, Faculty of Medical and Health Sciences, Tel Aviv University, Tel Aviv 6997801, Israel.
Department of Pathology, Rabin Medical Center, Petah Tikva 4941492, Israel.
Institute of Pulmonary Medicine, Tel Aviv Sourasky Medical Center, Tel Aviv 6423906, Israel.

Abstract

Background/Objectives: Rapid on-site evaluation (ROSE) is a valuable technique in interventional procedures for immediately assessing the adequacy and quality of biopsy specimens at the time they are obtained. Integrating artificial intelligence (AI) into ROSE workflows has demonstrated diagnostic accuracy comparable to that of experienced cytologists. However, clinical implementation of AI-based ROSE models is limited by complex and expensive development. In contrast, free or near-free, widely available large language models (LLMs) offer a significant advantage by making diagnostic support more accessible. This study aimed to assess the diagnostic accuracy of the LLMs ChatGPT and Gemini in evaluating cytological smears during interventional pulmonology procedures.

Methods: We retrospectively evaluated the efficacy of LLMs in assessing cytological smears obtained from adult patients who underwent interventional bronchoscopic and ultrasound-guided biopsies between 2020 and 2025. Images of ROSE-prepared samples were analyzed by ChatGPT-4o, ChatGPT-5, ChatGPT-5 "thinking", and Gemini 2.5.

Results: Forty-eight procedures in 47 patients (mean age, 65 years) were analyzed; 79% of biopsies were malignant. With the final histopathology report as the reference standard, cytologists achieved a balanced accuracy of 0.75 (Gwet's AC1 = 0.53, sensitivity 0.71, specificity 0.78). ChatGPT-5 "thinking" showed high concordance (balanced accuracy 0.65, Gwet's AC1 = 0.81, sensitivity 1.00, specificity 0.30). Gemini reached a balanced accuracy of 0.59 (Gwet's AC1 = 0.76, sensitivity 0.97, specificity 0.20).

Conclusions: To our knowledge, this is the first study to evaluate LLM-assisted ROSE in interventional pulmonology. The results suggest that integrating this AI technology into the workflow of the pulmonary division may be feasible. Larger prospective studies are needed to confirm its effect on diagnostic yield.
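The abstract reports each reader's performance as balanced accuracy, sensitivity, and specificity, and reports agreement as Gwet's AC1. The following is a minimal sketch of those two metrics and is not the authors' analysis code. It assumes the standard definitions: balanced accuracy is the mean of sensitivity and specificity, and AC1 is the two-rater, two-category form. It also assumes each reader's call is binary (for example, malignant vs. non-malignant), which the abstract does not state explicitly. The function names and the 2x2 cell labels a, b, c, d are hypothetical. As a check, the reported sensitivity/specificity pairs average to 0.745, 0.65, and 0.585, which round to the reported 0.75, 0.65, and 0.59.

```python
def balanced_accuracy(sensitivity: float, specificity: float) -> float:
    """Balanced accuracy: the mean of sensitivity and specificity."""
    return (sensitivity + specificity) / 2


def gwet_ac1(a: int, b: int, c: int, d: int) -> float:
    """Gwet's AC1 for two raters and two categories (hypothetical helper).

    2x2 layout, rater 1 in rows and rater 2 in columns:
        a: both positive
        b: rater 1 positive, rater 2 negative
        c: rater 1 negative, rater 2 positive
        d: both negative
    """
    n = a + b + c + d
    p_a = (a + d) / n  # observed agreement
    # Mean proportion of "positive" calls across the two raters
    pi_pos = ((a + b) / n + (a + c) / n) / 2
    # Chance agreement for K = 2: sum_k pi_k * (1 - pi_k) / (K - 1) = 2 * pi * (1 - pi)
    p_e = 2 * pi_pos * (1 - pi_pos)
    return (p_a - p_e) / (1 - p_e)


# Sensitivity/specificity pairs as reported in the abstract
reported = {
    "Cytologists": (0.71, 0.78),
    'ChatGPT-5 "thinking"': (1.00, 0.30),
    "Gemini": (0.97, 0.20),
}
for reader, (sens, spec) in reported.items():
    print(f"{reader}: balanced accuracy = {balanced_accuracy(sens, spec):.3f}")

# Hypothetical counts (not study data): perfect agreement gives AC1 = 1.0
print(gwet_ac1(a=10, b=0, c=0, d=10))
```

With two categories, the chance-agreement term p_e is at most 0.5, so the AC1 denominator never reaches zero. The two metrics also answer different questions. AC1 summarizes overall agreement and, in a mostly malignant cohort, can stay high for a reader that labels almost everything positive. Balanced accuracy penalizes the resulting low specificity.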


Topics

Journal Article
