Privacy-Preserving Generation of Structured Lymphoma Progression Reports from Cross-sectional Imaging: A Comparative Analysis of Llama 3.3 and Llama 4.
Authors
Affiliations (10)
- Institute for Diagnostic and Interventional Radiology, School of Medicine and Health, TUM University Hospital rechts der Isar, Technical University of Munich, Munich, Germany.
- Institute of Diagnostic and Interventional Neuroradiology, School of Medicine and Health, TUM University Hospital rechts der Isar, Technical University of Munich, Munich, Germany.
- Institute for Cardiovascular Radiology and Nuclear Medicine, School of Medicine and Health, TUM University Hospital German Heart Center, Technical University of Munich, Munich, Germany.
- Department of Radiology, Charité - Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin and Humboldt Universität zu Berlin, Berlin, Germany.
- Department of Interventional Radiology, Faculty of Medicine, Medical Center - University of Freiburg, Freiburg im Breisgau, Germany.
- Department of Neuroradiology, Charité - Universitätsmedizin Berlin, Corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany.
- Department of Neuroradiology, Hôpital Maison-Blanche, CHU Reims, Université Reims-Champagne-Ardenne, Reims, France.
- Berlin Institute of Health at Charité, Universitätsmedizin Berlin, Berlin, Germany.
- Department of Radiology, University Hospital RWTH Aachen, Aachen, Germany.
- Institute for Diagnostic and Interventional Radiology, School of Medicine and Health, TUM University Hospital rechts der Isar, Technical University of Munich, Munich, Germany.
Abstract
Efficient processing of radiology reports for monitoring disease progression is crucial in oncology. Although large language models (LLMs) show promise in extracting structured information from medical reports, privacy concerns limit their clinical implementation. This study evaluates the feasibility and accuracy of two of the most recent Llama models for generating structured lymphoma progression reports from cross-sectional imaging data in a privacy-preserving, real-world clinical setting. This single-center, retrospective study included adult lymphoma patients who underwent cross-sectional imaging and treatment between July 2023 and July 2024. We established a chain-of-thought prompting strategy to leverage the locally deployed Llama-3.3-70B-Instruct and Llama-4-Scout-17B-16E-Instruct models to generate lymphoma disease progression reports across three iterations. Two radiologists independently scored nodal and extranodal involvement, as well as Lugano staging and treatment response classifications. For each LLM and task, we calculated the F1 score, accuracy, recall, precision, and specificity per label, as well as the case-weighted average with 95% confidence intervals (CIs). Both LLMs correctly implemented the template structure for all 65 patients included in this study. Llama-4-Scout-17B-16E-Instruct demonstrated significantly greater accuracy in extracting nodal and extranodal involvement information (nodal: 0.99 [95% CI = 0.98-0.99] vs. 0.97 [95% CI = 0.95-0.96], p < 0.001; extranodal: 0.99 [95% CI = 0.99-1.00] vs. 0.99 [95% CI = 0.98-0.99], p = 0.013). This difference was more pronounced when predicting Lugano stage and treatment response (stage: 0.85 [95% CI = 0.79-0.89] vs. 0.60 [95% CI = 0.53-0.67], p < 0.001; treatment response: 0.88 [95% CI = 0.83-0.92] vs. 0.65 [95% CI = 0.58-0.71], p < 0.001). Neither model produced hallucinations of newly involved nodal or extranodal sites. 
The highest relative error rates were found when interpreting the level of disease after treatment. In conclusion, privacy-preserving LLMs can effectively extract clinical information from lymphoma imaging reports. While they excel at data extraction, they are limited in their ability to generate new clinical inferences from the extracted information. Our findings suggest their potential utility in streamlining documentation and highlight areas requiring optimization before clinical implementation.
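The evaluation metrics named in the abstract (per-label precision, recall, specificity, accuracy, and F1, plus a case-weighted average with a 95% CI) can be illustrated with a minimal Python sketch. This is not the authors' code. The function names, the toy treatment-response labels, and the percentile bootstrap used for the interval are all assumptions made for illustration; the abstract does not state how the confidence intervals were computed.

```python
import random
from dataclasses import dataclass


@dataclass
class LabelMetrics:
    precision: float
    recall: float
    specificity: float
    accuracy: float
    f1: float
    support: int  # number of cases whose reference label is this label


def safe_div(num, den):
    # Return 0.0 instead of raising when a denominator is empty.
    return num / den if den else 0.0


def per_label_metrics(y_true, y_pred, label):
    # One-vs-rest confusion counts for a single label.
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    tn = len(pairs) - tp - fp - fn
    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    return LabelMetrics(
        precision=precision,
        recall=recall,
        specificity=safe_div(tn, tn + fp),
        accuracy=safe_div(tp + tn, len(pairs)),
        f1=safe_div(2 * precision * recall, precision + recall),
        support=tp + fn,
    )


def weighted_average(metrics, field):
    # Average a metric over labels, weighting each label by its case count.
    total = sum(m.support for m in metrics)
    return safe_div(sum(getattr(m, field) * m.support for m in metrics), total)


def weighted_f1(y_true, y_pred):
    labels = sorted(set(y_true))
    return weighted_average(
        [per_label_metrics(y_true, y_pred, lab) for lab in labels], "f1"
    )


def bootstrap_ci(y_true, y_pred, stat, n_boot=1000, alpha=0.05, seed=0):
    # Percentile bootstrap over cases (an assumed method, see lead-in).
    rng = random.Random(seed)
    n = len(y_true)
    values = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        values.append(stat([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    values.sort()
    lower = values[int((alpha / 2) * n_boot)]
    upper = values[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper


# Toy data (hypothetical): reference vs. model treatment-response labels.
reference = ["CR", "CR", "PR", "PD", "PR", "CR"]
predicted = ["CR", "PR", "PR", "PD", "PR", "CR"]

cr = per_label_metrics(reference, predicted, "CR")
overall_f1 = weighted_f1(reference, predicted)
ci_low, ci_high = bootstrap_ci(reference, predicted, weighted_f1)
```

Weighting by support means that frequent labels dominate the average, so a rare but clinically important category can score poorly without moving the headline figure much. That is one reason the per-label values are reported alongside it.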