Automated Resectability Classification of Pancreatic Cancer CT Reports with Privacy-Preserving Open-Weight Large Language Models: A Multicenter Study.

Authors

Lee JH, Min JH, Gu K, Han S, Hwang JA, Choi SY, Song KD, Lee JE, Lee J, Moon JE, Adetyan H, Yang JD

Affiliations (7)

  • Department of Radiology and Center for Imaging Science, Samsung Medical Center, Sungkyunkwan University School of Medicine, 81 Irwon-ro Gangnam-gu, Seoul, 06351, Republic of Korea.
  • Department of Radiology and Center for Imaging Science, Samsung Medical Center, Sungkyunkwan University School of Medicine, 81 Irwon-ro Gangnam-gu, Seoul, 06351, Republic of Korea. [email protected].
  • Department of Radiology, Soonchunhyang University College of Medicine, Bucheon Hospital, Bucheon, Republic of Korea.
  • Department of Radiology, Chungnam National University Hospital, Chungnam National University College of Medicine, Daejeon, Republic of Korea.
  • Department of Radiology, Chungbuk National University, Chungbuk National University Hospital, Cheongju, Republic of Korea.
  • Department of Radiology, Kangdong Seong-Sim Hospital, Hallym University College of Medicine, Seoul, Republic of Korea.
  • Karsh Division of Gastroenterology and Hepatology, Cedars-Sinai Medical Center, Los Angeles, CA, USA.

Abstract

Purpose. To evaluate the effectiveness of open-weight large language models (LLMs) in extracting key radiological features and determining National Comprehensive Cancer Network (NCCN) resectability status from free-text radiology reports for pancreatic ductal adenocarcinoma (PDAC).

Methods. Prompts were developed using 30 fictitious reports, internally validated on 100 additional fictitious reports, and tested using 200 real reports from two institutions (January 2022 to December 2023). Two radiologists established ground truth for 18 key features and resectability status. Gemma-2-27b-it and Llama-3-70b-instruct models were evaluated using recall, precision, F1-score, extraction accuracy, and overall resectability accuracy. Statistical analyses included McNemar's test and mixed-effects logistic regression.

Results. In internal validation, Llama had significantly higher recall than Gemma (99% vs. 95%, p < 0.01) and slightly higher extraction accuracy (98% vs. 97%). Llama also demonstrated higher overall resectability accuracy (93% vs. 91%). In the internal test set, both models achieved 96% recall and 96% extraction accuracy. Overall resectability accuracy was 95% for Llama and 93% for Gemma. In the external test set, both models had 93% recall. Extraction accuracy was 93% for Llama and 95% for Gemma. Gemma achieved higher overall resectability accuracy (89% vs. 83%), but the difference was not statistically significant (p > 0.05).

Conclusion. Open-weight models accurately extracted key radiological features and determined NCCN resectability status from free-text PDAC reports. While internal dataset performance was robust, performance on external data decreased, highlighting the need for institution-specific optimization.
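For readers unfamiliar with the evaluation metrics named in the abstract, the following is a minimal sketch of how precision, recall, and F1-score are computed from extraction counts, and how an exact McNemar's test compares two models on the same set of reports. It is illustrative only: the function names and all input numbers are hypothetical, none are taken from the study, and the abstract does not state whether the authors used the exact or the asymptotic form of McNemar's test.

```python
from math import comb


def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Return (precision, recall, F1) from true-positive, false-positive,
    and false-negative counts. Undefined ratios are reported as 0.0."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value.

    b = reports where model A is correct and model B is wrong,
    c = reports where model A is wrong and model B is correct.
    Under the null hypothesis the discordant pairs follow Binomial(b + c, 0.5).
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, so no evidence of a difference
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical counts, chosen only to show the calls.
p, r, f1 = precision_recall_f1(tp=90, fp=10, fn=10)
p_value = mcnemar_exact(b=1, c=9)
print(f"precision={p:.2f} recall={r:.2f} F1={f1:.2f} McNemar p={p_value:.4f}")
```

Only the discordant pairs enter McNemar's test; reports on which both models agree (both right or both wrong) do not affect the p-value, which is why a paired test suits a comparison of two models on identical reports.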

Topics

  • Pancreatic Neoplasms
  • Tomography, X-Ray Computed
  • Carcinoma, Pancreatic Ductal
  • Natural Language Processing
  • Journal Article
  • Multicenter Study
