Comparative analysis of natural language processing methodologies for classifying computed tomography enterography reports in Crohn's disease patients.
Authors
Affiliations (7)
- College of Health Sciences, University of Alberta, Edmonton, AB, Canada.
- College of Natural and Applied Sciences, University of Alberta, Edmonton, AB, Canada.
- Department of Science, University of Alberta, Camrose, AB, Canada.
- Alberta Machine Intelligence Institute (Amii), Edmonton, AB, Canada.
- College of Health Sciences, University of Alberta, Edmonton, AB, Canada. [email protected].
- College of Natural and Applied Sciences, University of Alberta, Edmonton, AB, Canada. [email protected].
- Charité-Universitätsmedizin Berlin, Berlin, Germany. [email protected].
Abstract
Imaging is crucial for assessing disease extent, activity, and outcomes in inflammatory bowel disease (IBD). Artificial intelligence (AI)-based image interpretation requires, as an initial step, the automated retrieval and use of imaging studies at scale. Here we evaluate natural language processing (NLP) methods for classifying Crohn's disease (CD) from computed tomography enterography (CTE) reports. From our population-representative IBD registry, CTE reports from a sample of CD patients (male: 44.6%; median age: 50 years, IQR 37-60) and controls (n = 981 each) were extracted and split into training (n = 1568), development (n = 196), and testing (n = 198) datasets, with reports of around 200 words each and balanced label numbers in every dataset. Predictive classification was evaluated with CNN, Bi-LSTM, BERT-110M, LLaMA-3.3-70B-Instruct, and DeepSeek-R1-Distill-LLaMA-70B. Our custom IBDBERT, fine-tuned on expert IBD knowledge (i.e., ACG, AGA, and ECCO guidelines), outperformed rule- and rationale-extraction-based classifiers in predictive performance (accuracy 88.6% with a pre-tuning learning rate of 0.00001; AUC 0.945). However, LLaMA, but not DeepSeek, achieved overall superior results (accuracy 91.2% vs. 88.9%; F1 0.907 vs. 0.874).
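The reported split sizes are consistent with an exactly balanced design: 981 CD and 981 control reports give 1962 reports in total (1568 + 196 + 198, roughly 80/10/10), and each split halves evenly into 784/98/99 reports per class. The Python sketch below illustrates such a per-class (stratified) split. The function name `balanced_split`, the per-class sizes, the placeholder report IDs, and the fixed random seed are assumptions for illustration only, not the authors' published procedure.

```python
import random
from typing import Dict, List, Tuple

# Assumption: every split holds equal numbers of CD and control reports.
# The per-class sizes are inferred by halving the abstract's totals
# (1568/196/198); they are not reported directly.
PER_CLASS_SIZES = {"train": 784, "dev": 98, "test": 99}


def balanced_split(
    reports_by_label: Dict[str, List[str]], seed: int = 0
) -> Dict[str, List[Tuple[str, str]]]:
    """Hypothetical helper: shuffle each class separately, then carve it into train/dev/test."""
    rng = random.Random(seed)
    splits: Dict[str, List[Tuple[str, str]]] = {name: [] for name in PER_CLASS_SIZES}
    needed = sum(PER_CLASS_SIZES.values())  # 981 reports per class
    for label, reports in reports_by_label.items():
        if len(reports) != needed:
            raise ValueError(f"expected {needed} reports for {label!r}, got {len(reports)}")
        shuffled = reports[:]
        rng.shuffle(shuffled)
        start = 0
        for name, size in PER_CLASS_SIZES.items():
            # Attach the class label to each report assigned to this split.
            splits[name].extend((report, label) for report in shuffled[start:start + size])
            start += size
    # Shuffle within each split so the two classes are interleaved.
    for items in splits.values():
        rng.shuffle(items)
    return splits


if __name__ == "__main__":
    # Placeholder IDs stand in for the real CTE report texts.
    data = {
        "CD": [f"cd_{i}" for i in range(981)],
        "control": [f"ctrl_{i}" for i in range(981)],
    }
    for name, items in balanced_split(data).items():
        print(name, len(items))  # train 1568, dev 196, test 198
```

Splitting each class independently, rather than the pooled set, guarantees the label balance described in the abstract in every split instead of only in expectation.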