
Large Language Models Utility for Rapid On-Site Evaluation in Interventional Pulmonology.

May 28, 2026

Authors

Flaschner M, Kramer MR, Krolik A, Gershman E, Tovar A, Rosengarten D, Amor SM, Pertzov B, Heching M, Shtraichman O, Freidkin L

Affiliations (4)

  • Pulmonary Institute, Rabin Medical Center, Petah Tikva 4941492, Israel.
  • Gray School of Medicine, Faculty of Medical and Health Sciences, Tel Aviv University, Tel Aviv 6997801, Israel.
  • Department of Pathology, Rabin Medical Center, Petah Tikva 4941492, Israel.
  • Institute of Pulmonary Medicine, Tel Aviv Sourasky Medical Center, Tel Aviv 6423906, Israel.

Abstract

<b>Background/Objectives:</b> Rapid on-site evaluation (ROSE) is used during interventional procedures to assess the adequacy and quality of biopsy specimens immediately, at the time they are obtained. AI models integrated into ROSE workflows have demonstrated diagnostic accuracy comparable to that of experienced cytologists. However, clinical implementation of these dedicated AI-based ROSE models is limited because they are complex and expensive to develop. In contrast, free or near-free, widely available large language models (LLMs) could make diagnostic support more accessible. This study aimed to assess the diagnostic accuracy of the LLMs ChatGPT and Gemini in evaluating cytological smears during interventional pulmonology procedures.

<b>Methods:</b> We retrospectively evaluated LLM assessment of cytological smears obtained from adult patients who underwent interventional bronchoscopic and ultrasound-guided biopsies between 2020 and 2025. Images of ROSE-prepared samples were analyzed by ChatGPT-4o, ChatGPT-5, ChatGPT-5 "thinking", and Gemini 2.5 models.

<b>Results:</b> Forty-eight procedures in 47 patients (mean age 65 years) were analyzed; 79% of biopsies were malignant. Using the final histopathology report as reference, cytologists achieved a balanced accuracy of 0.75 (Gwet's AC1 = 0.53, sensitivity 0.71, specificity 0.78). ChatGPT-5 "thinking" showed high concordance (accuracy 0.65, Gwet's AC1 = 0.81, sensitivity 1.00, specificity 0.30). Gemini reached an accuracy of 0.59 (Gwet's AC1 = 0.76, sensitivity 0.97, specificity 0.20).

<b>Conclusions:</b> To our knowledge, this study is the first to evaluate LLM-assisted ROSE in interventional pulmonology. The results suggest that integrating this AI technology into the workflow of a pulmonary division is potentially feasible. Larger prospective studies are needed to confirm effects on diagnostic yield.
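The accuracy figures in the Results appear consistent with balanced accuracy, the mean of sensitivity and specificity, although the abstract names that metric explicitly only for the cytologists. The short Python sketch below recomputes it from the reported sensitivity and specificity values as a sanity check. The function name and the dictionary layout are illustrative assumptions, not part of the study, and the inputs are the rounded values from the abstract, so small rounding differences are expected.

```python
def balanced_accuracy(sensitivity: float, specificity: float) -> float:
    """Balanced accuracy: the unweighted mean of sensitivity and specificity."""
    return (sensitivity + specificity) / 2


# (sensitivity, specificity, accuracy as reported in the abstract).
# Inputs are already rounded to two decimals in the source.
reported = {
    "Cytologists": (0.71, 0.78, 0.75),
    'ChatGPT-5 "thinking"': (1.00, 0.30, 0.65),
    "Gemini": (0.97, 0.20, 0.59),
}

for rater, (sens, spec, reported_acc) in reported.items():
    computed = balanced_accuracy(sens, spec)
    print(f"{rater}: computed {computed:.3f}, reported {reported_acc:.2f}")
```

Read this way, the LLM results reflect very high sensitivity paired with low specificity: the models flagged nearly every malignant sample but also most benign ones, which is why their balanced accuracy falls below that of the cytologists.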

Topics

Journal Article
