Can ChatGPT-4o reliably standardize PSMA PET/CT and PET/MRI reports using PROMISE V2 criteria? - An exploratory study.

June 26, 2026

DOI: 10.1186/s13550-026-01475-z PMID: 42360587

Authors

Hinterberger A, Mangold MH, Weigel C, Hartmann H, Nörenberg D, Froelich MF, Ebner R, Haney-Aubert CM, Kowalewski KF, Schönberg SO, Grawe F

Affiliations (10)

DKFZ Hector Cancer Institute at the University Medical Center Mannheim, Heidelberg, Germany.
Junior Clinical Cooperation Unit Translational Molecular Imaging in Oncologic Therapy Monitoring (E310), German Cancer Research Center, Heidelberg, Germany.
Junior Clinical Cooperation Unit Intelligent Systems and Robotics in Urology (ISRU), German Cancer Research Center, Heidelberg, Germany.
Department of Urology and Urologic Surgery, University Medical Centre Mannheim, University of Heidelberg, Mannheim, Germany.
Department of Radiology and Nuclear Medicine, University Medical Center Mannheim, Heidelberg University, Mannheim, Germany.
Department of Radiology, LMU University Hospital, LMU Munich, Munich, Germany.
DKFZ Hector Cancer Institute at the University Medical Center Mannheim, Heidelberg, Germany.
Junior Clinical Cooperation Unit Translational Molecular Imaging in Oncologic Therapy Monitoring (E310), German Cancer Research Center, Heidelberg, Germany.
Department of Radiology and Nuclear Medicine, University Medical Center Mannheim, Heidelberg University, Mannheim, Germany.
German Cancer Research Center (DKFZ), Im Neuenheimer Feld 280, 69120, Heidelberg, Germany.

Abstract

Structured reporting standardizes and facilitates reporting, improves the accuracy of communication, and ultimately supports clinical decision-making. Although standardized frameworks such as the PROMISE criteria are available for prostate-specific membrane antigen positron emission tomography (PSMA PET) in patients with prostate cancer, free-text reporting remains predominant in both clinical routine and trials. Large language models (LLMs) may enable low-effort, time-efficient extraction of structured classifications from narrative reports. This study evaluated the performance of ChatGPT-4o in extracting PROMISE V2-based classifications from unstructured PSMA-PET/CT and PET/MRI reports. For PSMA-PET/CT, overall miTNM accuracy was 79.8%, whereas PSMA-PET/MRI achieved a significantly higher accuracy of 91.0% (OR = 2.80, 95% CI: 1.32-6.51, p = 0.003). Component-wise, PET/MRI outperformed PET/CT in T-stage classification (83.8% vs. 57.7%; OR = 3.83, 95% CI: 1.34-12.69, p = 0.006) and showed numerically higher N-stage classification accuracy (100% vs. 85.9%, p = 0.014), while M-stage classification accuracy was comparable between modalities (89.1% vs. 95.7%; OR = 0.84, 95% CI: 0.20-4.19, p = 0.748). PRIMARY score accuracy was also comparable between PET/CT and PET/MRI (70.4% vs. 88.1%; OR = 0.43, 95% CI: 0.05-2.14, p = 0.315). ChatGPT-4o's rationale for its classifications was rated highly plausible across modalities, with Likert scores of at least 4.8 for miTNM and at least 4.1 for PRIMARY. ChatGPT-4o enables reliable extraction of PROMISE V2-based N- and M-stage classifications from free-text PSMA-PET reports but shows limited accuracy for T-stage classification. This work is a first step toward leveraging LLMs to support structured and efficient reporting in PSMA PET imaging and highlights current limitations.


Topics

Journal Article