Staging Prostate Cancer with AI: A Comparative Study of Large Language Models and Expert Interpretation on PSMA PET-CT Reports.

December 5, 2025 · PubMed

Authors

Ismayilov R, Aktas A, Gencoglu EA, Oguz A, Altundag O, Akcali Z

Affiliations (4)

  • Department of Medical Oncology, Baskent University, Faculty of Medicine, Ankara, Türkiye.
  • Department of Nuclear Medicine, Baskent University, Faculty of Medicine, Ankara, Türkiye.
  • Department of Medical Oncology, Baskent University, Faculty of Medicine, Ankara, Türkiye.
  • Department of Medical Informatics, Baskent University, Faculty of Medicine, Ankara, Türkiye.

Abstract

Accurate staging of prostate cancer is essential for therapeutic decision-making. While PSMA PET-CT reports offer rich clinical data, their unstructured format hinders large-scale analysis. Recent advances in large language models (LLMs) offer new opportunities to extract structured information from narrative radiology reports. However, their ability to perform multi-step clinical reasoning, particularly for cancer staging, remains underexplored. In this feasibility study, 80 anonymized, Turkish-language PSMA PET-CT reports were independently interpreted by two LLMs: Gemini 2.5 Pro (Google) and ChatGPT 4o (OpenAI). Using a structured prompt containing an embedded knowledge base (AJCC/CHAARTED criteria) and few-shot examples, both LLMs generated classifications for T, N, M, and overall clinical stage/disease volume. Outputs were benchmarked against expert classifications by a senior nuclear medicine specialist. Performance was evaluated using accuracy, precision, recall, F1-score, and Cohen's kappa. For the composite task of classifying clinical stage and disease volume, Gemini 2.5 Pro achieved an accuracy of 93.8% (95% CI: 86.0-97.9) and a Cohen's kappa of 0.910 (95% CI: 0.834-0.986), while ChatGPT 4o achieved 91.3% accuracy (95% CI: 82.8-96.4) with a kappa of 0.874 (95% CI: 0.786-0.962). For T staging, Gemini showed a higher accuracy point estimate (95.0% [95% CI: 87.7-98.6] vs. 91.3% [95% CI: 82.8-96.4]), while both models excelled at the binary N and M classifications, achieving accuracies above 95% and kappa values indicating near-perfect agreement (κ > 0.900). LLMs, when guided by expert-informed prompt engineering, can accurately stage prostate cancer from free-text PSMA PET-CT reports and may serve as a powerful assistive tool for data automation, research acceleration, and quality assurance.
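The agreement metrics named in the abstract can be illustrated with a short sketch. The abstract does not say how the confidence intervals were computed, so the exact (Clopper-Pearson) binomial interval used here is an assumption, as are the function names `clopper_pearson` and `cohen_kappa` and the toy labels in the usage lines. Only the Python standard library is used.

```python
from collections import Counter
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _solve(f, target, increasing):
    """Bisection on [0, 1] for f(p) = target, with f monotone in p."""
    lo, hi = 0.0, 1.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if (f(mid) < target) == increasing:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(k, n, alpha=0.05):
    """Exact two-sided binomial CI for k successes out of n (assumed method)."""
    # Lower bound: p with P(X >= k) = alpha/2; upper bound: p with P(X <= k) = alpha/2.
    lower = 0.0 if k == 0 else _solve(
        lambda p: 1 - binom_cdf(k - 1, n, p), alpha / 2, increasing=True)
    upper = 1.0 if k == n else _solve(
        lambda p: binom_cdf(k, n, p), alpha / 2, increasing=False)
    return lower, upper


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two equal-length label sequences."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement from each rater's marginal label frequencies.
    expected = sum(count_a[l] * count_b[l] for l in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical toy labels, not data from the study.
expert = ["high", "high", "low", "low"]
model = ["high", "high", "low", "high"]
print(cohen_kappa(expert, model))
print(clopper_pearson(75, 80))
```

The count of 75 correct out of 80 is inferred from the reported 93.8% accuracy (75/80 = 93.75%) and is not stated in the abstract. Under that assumption the exact interval comes out close to the reported 86.0-97.9.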

Topics

Journal Article
