Artificial intelligence for immunotherapy response assessment in lung cancer using PET/CT reports.

Authors

Ismayilov R, Altundag O, Gencoglu EA, Aktas A, Alparslan S, Ozcicek A, Turhanoglu D, Oguz A, Farzaliyeva A, Ramazanoglu MN, Kocak M, Akcali Z

Affiliations (5)

  • Department of Medical Oncology, Baskent University Faculty of Medicine, Ankara, Türkiye.
  • Department of Medical Oncology, Baskent University Faculty of Medicine, Ankara, Türkiye.
  • Department of Nuclear Medicine, Baskent University Faculty of Medicine, Ankara, Türkiye.
  • Baskent University Faculty of Medicine, Ankara, Türkiye.
  • Department of Medical Informatics, Baskent University Faculty of Medicine, Ankara, Türkiye.

Abstract

Accurate and timely assessment of immunotherapy response is vital for optimizing lung cancer management. This study evaluates the efficacy of large language models (LLMs) in automating response assessment using positron emission tomography/computed tomography (PET/CT) reports based on the European Organization for Research and Treatment of Cancer (EORTC) criteria. An effective prompting strategy was developed using Google Gemini 2.5 Pro Experimental 03-25, with explicit instructions for applying EORTC criteria via few-shot prompting. This prompt was then tested with both Gemini 2.5 Pro and OpenAI ChatGPT 4o to assess cross-model performance. Pre- and post-immunotherapy PET/CT reports in text format from 36 lung cancer patients were independently classified by the LLMs and an experienced nuclear medicine specialist. Performance metrics, including precision, recall, F1-score, and support, were calculated for each response category. Inter-rater agreement was assessed using Cohen's kappa. The nuclear medicine specialist classified 5, 21, 6, and 4 reports as complete metabolic response (CMR), progressive metabolic disease (PMD), partial metabolic response (PMR), and stable metabolic disease (SMD), respectively, while Gemini 2.5 Pro classified 4, 21, 8, and 3 of them. Gemini achieved an overall accuracy of 94% and demonstrated strong agreement with the expert (overall Cohen's kappa: 0.907). F1-scores were 0.86 for PMR and SMD, 0.89 for CMR, and 1.00 for PMD, with per-label kappa scores ranging from 0.824 (PMR) to 1.00 (PMD). In comparison, ChatGPT 4o achieved perfect agreement with the expert across all 36 cases (accuracy = 100%, Cohen's kappa = 1.000). When guided by a structured and task-specific prompt, both Gemini 2.5 Pro and ChatGPT 4o demonstrated strong capability for automating accurate immunotherapy response assessment in lung cancer using PET/CT reports. These results underscore the potential of LLMs to streamline clinical workflows and improve efficiency. Validation with larger data sets is warranted to support clinical implementation.
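
For readers who want to see how the agreement metrics named in the abstract are computed, the following is a minimal Python sketch of accuracy, per-label precision/recall/F1/support, and Cohen's kappa (overall and one-vs-rest) from a 4x4 confusion matrix. The abstract reports only the per-category totals for the expert (5, 21, 6, 4) and for Gemini 2.5 Pro (4, 21, 8, 3), not the individual cell counts, so the matrix below is an assumption: one arrangement that is consistent with those totals, in which one CMR report and one SMD report are labelled PMR by the model. The function and variable names are illustrative and do not come from the paper.

```python
LABELS = ["CMR", "PMD", "PMR", "SMD"]

# ASSUMPTION: hypothetical confusion matrix (rows = expert, columns = model).
# Only the row and column totals are taken from the abstract; the individual
# cells are a reconstruction, not data reported by the authors.
CONFUSION = [
    [4, 0, 1, 0],   # expert CMR (5 reports)
    [0, 21, 0, 0],  # expert PMD (21 reports)
    [0, 0, 6, 0],   # expert PMR (6 reports)
    [0, 0, 1, 3],   # expert SMD (4 reports)
]


def accuracy(cm):
    """Fraction of cases on the diagonal (observed agreement)."""
    total = sum(sum(row) for row in cm)
    return sum(cm[i][i] for i in range(len(cm))) / total


def cohen_kappa(cm):
    """Cohen's kappa: (p_o - p_e) / (1 - p_e)."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    p_o = accuracy(cm)
    row_totals = [sum(row) for row in cm]
    col_totals = [sum(cm[i][j] for i in range(n)) for j in range(n)]
    # Chance agreement from the product of the two raters' marginals.
    p_e = sum(r * c for r, c in zip(row_totals, col_totals)) / total**2
    return (p_o - p_e) / (1 - p_e)


def label_metrics(cm, k):
    """Precision, recall, F1 and support for label index k."""
    tp = cm[k][k]
    support = sum(cm[k])                 # cases the expert assigned to k
    predicted = sum(row[k] for row in cm)  # cases the model assigned to k
    precision = tp / predicted if predicted else 0.0
    recall = tp / support if support else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1, support


def one_vs_rest(cm, k):
    """Collapse the matrix to label k versus all other labels."""
    total = sum(sum(row) for row in cm)
    tp = cm[k][k]
    fn = sum(cm[k]) - tp
    fp = sum(row[k] for row in cm) - tp
    tn = total - tp - fn - fp
    return [[tp, fn], [fp, tn]]


if __name__ == "__main__":
    print(f"accuracy={accuracy(CONFUSION):.3f} kappa={cohen_kappa(CONFUSION):.3f}")
    for k, name in enumerate(LABELS):
        p, r, f1, n = label_metrics(CONFUSION, k)
        kappa_k = cohen_kappa(one_vs_rest(CONFUSION, k))
        print(f"{name}: precision={p:.2f} recall={r:.2f} f1={f1:.2f} "
              f"support={n} kappa={kappa_k:.3f}")
```

Under this assumed matrix the sketch yields an accuracy of 34/36 and an overall kappa of 703/775, which round to the 94% and 0.907 quoted in the abstract. That agreement shows the assumed arrangement is compatible with the reported figures; it does not show that it is the authors' actual confusion matrix.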

Topics

Journal Article
