BI-RADS-compliant structured mammography reporting using locally deployed large language models under privacy constraints.
Affiliations (8)
- School of Physics and Electronic Engineering, Linyi University, Linyi, China.
- Information Center, Shandong Provincial Maternal and Child Health Care Hospital affiliated to Qingdao University, Jinan, China.
- Department of Radiology, Shandong Provincial Maternal and Child Health Care Hospital affiliated to Qingdao University, Jinan, China.
- Department of Radiology, Jinan Maternal and Child Health Care Hospital affiliated to Shandong First Medical University & Shandong Academy of Medical Sciences, Jinan, China.
- Department of Radiology, Shandong Provincial Hospital Affiliated to Shandong First Medical University, Jinan, China.
- School of Physics and Electronic Engineering, Linyi University, Linyi, China. [email protected].
- Information Center, Shandong Provincial Maternal and Child Health Care Hospital affiliated to Qingdao University, Jinan, China. [email protected].
- School of Radiology, Shandong First Medical University & Shandong Academy of Medical Sciences, Jinan, China. [email protected].
Abstract
The aim of this study was to develop a privacy-preserving method for structuring free-text mammography reports using a locally fine-tuned, open-source large language model (LLM). In this retrospective multicenter study, 7161 unstructured mammography reports were collected from three institutions. The open-source Llama-3 model was fine-tuned with low-rank adaptation (LoRA) via supervised learning; all training labels were pseudo-labels generated by the commercial Qwen-Max model rather than human annotations. Structured outputs followed a BI-RADS-oriented nested JSON schema. Performance was evaluated across 23 features using precision, recall, and F1-score, and structural integrity was assessed with the JSON format accuracy (JFA) and field integrity accuracy (FIA) metrics. Statistical comparisons were performed using the paired Wilcoxon signed-rank test and Cohen's d effect size. The fine-tuned model achieved strong performance at epoch 10 (precision 0.942, recall 0.929, F1-score 0.932), with JFA and FIA reaching 0.964 and 1.000, respectively, showing significant gains over the base model (p < 0.05, Cohen's d > 0.8). Compared with Qwen-Max, the fine-tuned model performed slightly lower overall, with moderate yet statistically significant differences (p < 0.05; 0.5 < Cohen's d < 0.8), most notably in the "special signs" category (F1 = 0.737 vs 0.947). In conclusion, this method effectively converts free-text mammography reports into structured data using a locally fine-tuned, open-source LLM. Despite a modest performance trade-off relative to the commercial model, local deployment preserves patient privacy, and the method's accuracy, clinical relevance, and regulatory compliance make it a practical solution for medical institutions.

Question: Free-text mammography reports lack standardization, making structured extraction difficult, while existing solutions often compromise privacy, limit adaptability, or require costly commercial tools.

Findings: The fine-tuned Llama-3 model achieved high extraction accuracy (F1-score 0.932) and complete structural integrity (FIA 1.000) within a fully local deployment pipeline.

Clinical relevance: This method enables standardized, privacy-preserving mammography reporting without disrupting the clinical workflow, supporting safer AI integration, improved data quality, and compliance with regulations such as the General Data Protection Regulation (GDPR).
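To make the target representation concrete, the sketch below shows what a BI-RADS-oriented nested JSON record of the kind described above might look like. The abstract reports a 23-feature schema but does not enumerate its fields, so every field name and value here is a hypothetical illustration, not the authors' actual schema.

```python
import json

# Illustrative BI-RADS-oriented nested record.
# All field names/values are hypothetical; the paper's 23-feature schema
# is not listed in the abstract.
example_report = {
    "breast_composition": "c",            # ACR density category (assumed field)
    "findings": {
        "mass": {
            "present": True,
            "shape": "irregular",
            "margin": "spiculated",
            "density": "high",
        },
        "calcifications": {
            "present": True,
            "morphology": "fine pleomorphic",
            "distribution": "segmental",
        },
        "special_signs": {                # the "special signs" category noted in the abstract
            "architectural_distortion": True,
            "skin_thickening": False,
        },
    },
    "bi_rads_category": "4C",
    "laterality": "left",
}

print(json.dumps(example_report, indent=2, ensure_ascii=False))
```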
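The fine-tuning step described above (supervised LoRA adaptation of Llama-3 on Qwen-Max pseudo-labels) is commonly set up with the Hugging Face transformers and peft libraries. The sketch below shows such a setup under stated assumptions: the base checkpoint, rank, scaling, dropout, and target modules are illustrative values, not the hyperparameters used in the paper.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

# Assumed base checkpoint; the abstract only says "Llama-3".
BASE_MODEL = "meta-llama/Meta-Llama-3-8B-Instruct"

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
model = AutoModelForCausalLM.from_pretrained(BASE_MODEL)

lora_config = LoraConfig(
    r=16,                                 # rank of the low-rank update matrices (assumed)
    lora_alpha=32,                        # scaling factor for the LoRA update (assumed)
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # attention projections commonly adapted
    task_type="CAUSAL_LM",
)

# Wrap the frozen base model so only the small LoRA adapters are trained.
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()

# Supervised fine-tuning would then pair each free-text report (prompt) with
# its Qwen-Max pseudo-labelled JSON (target) and optimize a standard
# causal-LM loss, e.g. via transformers.Trainer or a similar training loop.
```

Training only the adapter weights keeps the memory footprint small enough for local deployment, which is the privacy motivation of the study.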
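The abstract does not formally define JFA and FIA, so the evaluation sketch below assumes plausible interpretations: JFA as the share of outputs that parse as valid JSON, and FIA as the share of parseable outputs containing every required field (computed over parsed outputs, which is consistent with FIA exceeding JFA as reported). The paired Wilcoxon test and Cohen's d follow their standard definitions; all inputs are toy values.

```python
import json
import numpy as np
from scipy.stats import wilcoxon

# Hypothetical required top-level fields (the real schema is not given).
REQUIRED_FIELDS = {"breast_composition", "findings", "bi_rads_category"}

def jfa_fia(raw_outputs):
    """Assumed metrics: JFA = valid-JSON rate; FIA = complete-field rate among parsed outputs."""
    parsed, complete = 0, 0
    for text in raw_outputs:
        try:
            obj = json.loads(text)
        except json.JSONDecodeError:
            continue
        parsed += 1
        if REQUIRED_FIELDS.issubset(obj):
            complete += 1
    n = len(raw_outputs)
    jfa = parsed / n if n else 0.0
    fia = complete / parsed if parsed else 0.0
    return jfa, fia

def paired_cohens_d(scores_a, scores_b):
    """Cohen's d for paired samples: mean of the differences over their standard deviation."""
    diff = np.asarray(scores_a) - np.asarray(scores_b)
    return diff.mean() / diff.std(ddof=1)

# Toy per-report F1-scores for two models, compared with a paired test.
f1_fine_tuned = np.array([0.95, 0.91, 0.93, 0.88, 0.94])
f1_base_model = np.array([0.70, 0.65, 0.72, 0.60, 0.68])

stat, p_value = wilcoxon(f1_fine_tuned, f1_base_model)
d = paired_cohens_d(f1_fine_tuned, f1_base_model)
print(f"Wilcoxon p = {p_value:.4f}, Cohen's d = {d:.2f}")
```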