Automated PROMISE V2 Scoring from PSMA PET/CT Reports Using Large Language Models: A Comparative Evaluation of Prompt Design and Model Performance.

June 9, 2026

Authors

Speicher T, Demirkol IE, Blickle A, Bastian MB, Maus S, Schaefer-Schuler A, Bartholomä M, Burgard C, Ezziddin S, Rosar F

Affiliations (2)

  • Department of Nuclear Medicine, Saarland University-Medical Center, 66421 Homburg, Germany.
  • Department of Nuclear Medicine, Friedrich-Alexander-Universität Erlangen-Nürnberg and Universitätsklinikum Erlangen, 91054 Erlangen, Germany.

Abstract

Large language models (LLMs) are increasingly explored for clinical use. However, the extent to which such models can reliably support physicians in reporting, staging, and classification remains an active area of research. This study aimed to evaluate and compare multiple LLMs for automated PROMISE V2 classification of prostate cancer. A total of 126 unambiguous German-language PSMA PET/CT text reports were retrospectively analyzed, with reference standards established by expert consensus based on image interpretation and the original report text. Five LLMs (GPT-5.4, DeepSeek-V3.2, Claude Sonnet 4.6, Gemini 3 Flash, and Grok 4) were assessed using two English-language prompting strategies of differing complexity (a short and a long prompt). Agreement with the reference standard served as the primary endpoint. Agreement varied widely in the short-prompt setting (36.5-79.4%) but improved consistently with the long prompt (74.6-86.5%), with Gemini 3 Flash achieving the highest agreement. Across the PROMISE V2 subcategories, agreement rates were high (miT: 81.0-92.1%, miN: 92.9-96.0%, miM: 92.9-95.2%), despite inter-model differences. In conclusion, contemporary LLMs demonstrate promising performance in deriving PROMISE V2 scores from unambiguous original report texts, particularly when guided by detailed prompts.
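As a rough illustration of how agreement with a reference standard could be computed, the following minimal Python sketch compares model outputs against reference labels per subcategory and overall. It is not the authors' code: the function name `agreement_rates`, the dictionary layout, and the toy labels are hypothetical, and treating "overall" agreement as an exact match on all three subcategories is an assumption, since the abstract does not spell out that definition.

```python
from typing import Dict, List

# PROMISE V2 subcategories named in the abstract.
COMPONENTS = ("miT", "miN", "miM")


def agreement_rates(
    predictions: List[Dict[str, str]],
    references: List[Dict[str, str]],
) -> Dict[str, float]:
    """Return percent agreement per subcategory and for the full score.

    Assumption: "overall" counts a report as correct only if all three
    subcategories match the reference exactly.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have equal length")
    if not references:
        raise ValueError("at least one report is required")

    n = len(references)
    pairs = list(zip(predictions, references))
    rates: Dict[str, float] = {}

    # Per-subcategory agreement: share of reports where that field matches.
    for comp in COMPONENTS:
        matches = sum(pred[comp] == ref[comp] for pred, ref in pairs)
        rates[comp] = 100.0 * matches / n

    # Overall agreement: every subcategory must match for the same report.
    full_matches = sum(
        all(pred[c] == ref[c] for c in COMPONENTS) for pred, ref in pairs
    )
    rates["overall"] = 100.0 * full_matches / n
    return rates


# Toy, made-up labels for four reports (not data from the study).
reference = [
    {"miT": "miT2", "miN": "miN0", "miM": "miM0"},
    {"miT": "miT3a", "miN": "miN1", "miM": "miM0"},
    {"miT": "miT2", "miN": "miN0", "miM": "miM1b"},
    {"miT": "miT0", "miN": "miN0", "miM": "miM0"},
]
model_output = [
    {"miT": "miT2", "miN": "miN0", "miM": "miM0"},   # full match
    {"miT": "miT3b", "miN": "miN1", "miM": "miM0"},  # miT differs
    {"miT": "miT2", "miN": "miN0", "miM": "miM1b"},  # full match
    {"miT": "miT0", "miN": "miN1", "miM": "miM0"},   # miN differs
]

rates = agreement_rates(model_output, reference)
print(rates)
```

Under this definition, overall agreement can never exceed the lowest subcategory agreement, because a report counts as correct only when every subcategory matches.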

Topics

  • Positron Emission Tomography Computed Tomography
  • Prostatic Neoplasms
  • Glutamate Carboxypeptidase II
  • Journal Article
  • Comparative Study
