
Data Extraction and Curation from Radiology Reports for Pancreatic Cyst Surveillance Using Large Language Models.

Authors

Choubey AP, Eguia E, Hollingsworth A, Chatterjee S, D'Angelica MI, Jarnagin WR, Wei AC, Schattner MA, Do RKG, Soares KC

Affiliations (4)

  • Department of Surgery, Memorial Sloan Kettering Cancer Center, New York, New York.
  • Artificial Intelligence & Machine Learning, Digital, Informatics and Technology Solutions (DigITs), Memorial Sloan Kettering Cancer Center, New York, New York.
  • Department of Gastroenterology, Hepatology, and Nutrition, Memorial Sloan Kettering Cancer Center, New York, New York.
  • Department of Radiology, Memorial Sloan Kettering Cancer Center, New York, New York.

Abstract

Manual curation of radiographic features in pancreatic cyst registries for data abstraction and longitudinal evaluation is time-consuming and limits widespread implementation. We examined the feasibility and accuracy of using large language models (LLMs) to extract clinical variables from radiology reports. This single-center retrospective study included patients under surveillance for pancreatic cysts. Nine radiographic elements used to monitor cyst progression were included: cyst size and main pancreatic duct (MPD) size (continuous variables), number of lesions, MPD dilation ≥5 mm (categorical), branch duct dilation, presence of solid component, calcific lesion, pancreatic atrophy, and pancreatitis. LLMs from the OpenAI GPT-4 platform (hereafter GPT) were employed to extract the elements of interest using a zero-shot approach, in which prompting alone drives annotation without any training data. A manually annotated institutional cyst database served as the ground truth (GT) for comparison. Overall, 3198 longitudinal scans from 991 patients were included. GPT extracted the selected radiographic elements with high accuracy. Among categorical variables, accuracy ranged from 97% for solid component to 99% for calcific lesions. Among continuous variables, accuracy ranged from 92% for cyst size to 97% for MPD size. However, Cohen's kappa was higher for cyst size (0.92) than for MPD size (0.82). The lowest accuracy (81%) was noted for the multi-class variable, number of cysts. LLMs can accurately extract and curate data from radiology reports for pancreatic cyst surveillance and can be reliably used to assemble longitudinal databases. Future applications of this work may facilitate the development of artificial intelligence-based surveillance models.
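The abstract reports both accuracy and Cohen's kappa, and the two can rank variables differently (MPD size has the higher accuracy, cyst size the higher kappa). The sketch below is one way to see why: kappa discounts the agreement expected by chance, which is large when one label dominates. It is a minimal illustration on invented toy labels, not the study's data or evaluation code; the function names and the "absent"/"present" labels are assumptions made for the example.

```python
from collections import Counter


def accuracy(truth, pred):
    """Fraction of items where the extracted label equals the ground truth."""
    if len(truth) != len(pred) or not truth:
        raise ValueError("truth and pred must be non-empty and equal in length")
    return sum(t == p for t, p in zip(truth, pred)) / len(truth)


def cohen_kappa(truth, pred):
    """Cohen's kappa: (observed agreement - chance agreement) / (1 - chance agreement)."""
    n = len(truth)
    p_observed = accuracy(truth, pred)
    truth_counts = Counter(truth)
    pred_counts = Counter(pred)
    labels = set(truth_counts) | set(pred_counts)
    # Chance agreement: probability that two independent raters with these
    # marginal label frequencies would pick the same label.
    p_chance = sum(truth_counts[label] * pred_counts[label] for label in labels) / (n * n)
    if p_chance == 1.0:
        # Both raters used a single identical label; kappa is undefined,
        # so this sketch treats it as perfect agreement by convention.
        return 1.0
    return (p_observed - p_chance) / (1.0 - p_chance)


# Hypothetical toy data: a rare finding present in 5 of 100 reports.
# The extractor misses 3 of the 5 positives and is otherwise correct.
truth = ["absent"] * 95 + ["present"] * 5
pred = ["absent"] * 95 + ["present"] * 2 + ["absent"] * 3

acc = accuracy(truth, pred)
kappa = cohen_kappa(truth, pred)
print(f"accuracy = {acc:.2f}, kappa = {kappa:.2f}")
```

In this toy case almost every report is labeled "absent", so raw agreement is high even though most positives are missed, and kappa comes out much lower than accuracy. This is why reporting both metrics, as the abstract does, gives a fuller picture than accuracy alone.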

Topics

Journal Article
