
Towards Automated FIGO Staging in Radiology: The Role of LLMs in Cervical and Endometrial Cancer.

February 4, 2026

Authors

Martín-Noguerol T, López-Úbeda P, Barrientos-Manrique EA, García-Ferrer M, Luna A

Affiliations (4)

  • MRI Unit, Radiology Department, HT medica, Carmelo Torres 2, 23007 Jaén, Spain (T.M.-N., A.L.). Electronic address: [email protected].
  • NLP Unit, HT medica, Jaén, Spain (P.L.-Ú.).
  • Radiology Department, HT medica, Sevilla, Spain (E.A.B.-M., M.G.-F.).
  • MRI Unit, Radiology Department, HT medica, Carmelo Torres 2, 23007 Jaén, Spain (T.M.-N., A.L.).

Abstract

Staging gynecological malignancies is a complex process, and radiologists must keep pace with evolving FIGO staging criteria. Large Language Models (LLMs) could support radiologists by automating stage classification from free-text MRI reports. We conducted a retrospective study using two curated datasets of pelvic MRI reports from patients with cervical cancer (n = 261, FIGO 2018) and endometrial cancer (n = 555, FIGO 2023). A general-purpose LLM (Cohere Command-A) was evaluated under three prompting strategies (zero-shot, guided, and chain-of-thought [CoT]) using three measures: exact stage accuracy, an ordinal FIGO distance metric, and the rate of severe errors. Command-A was chosen for its long-context reasoning, instruction-following capabilities, reproducible fixed version, and secure handling of sensitive clinical data. Although alternative LLMs (e.g., GPT-4o, Gemini, Llama-3, DeepSeek) could offer complementary insights, access, resource, and compliance constraints limited broader comparisons. For cervical cancer, CoT prompting achieved the highest accuracy (80.5%) and the lowest FIGO distance, with 23 severe misclassifications (≥2-stage deviation), outperforming guided and zero-shot prompting. For endometrial cancer, all strategies performed adequately, with CoT again yielding the best results: the highest accuracy (90.6%) and the fewest severe misclassifications (37 cases) compared with guided and zero-shot prompting. In a small subset of cases where no prompting strategy agreed with the reference label, manual review showed that only a minority had potentially suboptimal annotations, suggesting that CoT-based predictions may also help flag doubtful reports. The LLM evaluated performed strongly in automatically assigning FIGO stages for cervical and endometrial cancers from MRI reports. Its integration could reduce workload and improve consistency in staging. Further validation is needed before clinical implementation.
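
The three evaluation measures named in the abstract can be illustrated with a minimal Python sketch. The abstract does not say how the ordinal FIGO distance is computed, so the mapping here is an assumption: each label is collapsed to its main stage (I to IV), distance is the absolute difference between main stages, and a severe error is a distance of 2 or more. The function names `main_stage` and `staging_metrics` are hypothetical and are not taken from the paper.

```python
import re

# Assumption: substages (e.g. "IB2", "IIIC1") are collapsed to the main stage.
_MAIN_STAGE = {"I": 1, "II": 2, "III": 3, "IV": 4}


def main_stage(label: str) -> int:
    """Map a FIGO label such as 'IIIC1' to its main stage as an integer (1-4)."""
    # Longer numerals are listed first so "III" is not read as "I" or "II".
    match = re.match(r"(IV|III|II|I)", label.strip().upper())
    if match is None:
        raise ValueError(f"Unrecognised FIGO label: {label!r}")
    return _MAIN_STAGE[match.group(1)]


def staging_metrics(reference, predicted, severe_threshold=2):
    """Return exact accuracy, mean ordinal distance, and severe-error count."""
    if len(reference) != len(predicted) or not reference:
        raise ValueError("Inputs must be non-empty and of equal length")

    # Exact accuracy compares the full label, including the substage.
    exact = sum(
        r.strip().upper() == p.strip().upper()
        for r, p in zip(reference, predicted)
    )
    # Ordinal distance compares main stages only (an assumption, see above).
    distances = [
        abs(main_stage(r) - main_stage(p)) for r, p in zip(reference, predicted)
    ]
    return {
        "accuracy": exact / len(reference),
        "mean_distance": sum(distances) / len(distances),
        "severe_errors": sum(d >= severe_threshold for d in distances),
    }


# Made-up labels for illustration only; these are not data from the study.
reference = ["IB2", "IIIC1", "IVA", "IIA1"]
predicted = ["IB2", "IB1", "IVA", "IIIB"]
metrics = staging_metrics(reference, predicted)
```

Under these assumptions, a prediction of IB1 for a reference of IIIC1 counts as a severe error (two main stages apart), while IIIB for IIA1 counts as an inexact but non-severe prediction. A finer-grained ordinal scale that ranks substages individually would be an equally plausible reading of the metric.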

Topics

Journal Article
