University of Toronto researchers found that large language models (LLMs) such as DeepSeek V3 and GPT-4o offer promising support for radiology decision-making in pancreatic cancer when their recommendations cite guideline sources.
Key Details
- The LLMs were tested on generating National Comprehensive Cancer Network (NCCN) guideline-compliant management plans for 328 pancreatic ductal adenocarcinoma (PDAC) cases.
- DeepSeek V3 had a 100% completion rate and 1.5% discordance; GPT-4o had a 96.3% completion rate and 8.8% discordance, a statistically significant difference.
- Both LLMs showed high category-specific concordance (>91%), with DeepSeek outperforming GPT-4o. The one exception to the >91% figure was GPT-4o's 86% concordance for locally advanced nonresectable cancer.
- Radiologist review flagged occasional inaccurate recommendations, including misclassification of tumor resectability and overtreatment.
- Researchers emphasized that LLM explainability and the ability to cite guidelines are key to clinical trust and workflow integration.
Source
AuntMinnie