BI-RADS-compliant structured mammography reporting using locally deployed large language models under privacy constraints.
Affiliations (8)
- School of Physics and Electronic Engineering, Linyi University, Linyi, China.
- Information Center, Shandong Provincial Maternal and Child Health Care Hospital affiliated to Qingdao University, Jinan, China.
- Department of Radiology, Shandong Provincial Maternal and Child Health Care Hospital affiliated to Qingdao University, Jinan, China.
- Department of Radiology, Jinan Maternal and Child Health Care Hospital affiliated to Shandong First Medical University & Shandong Academy of Medical Sciences, Jinan, China.
- Department of Radiology, Shandong Provincial Hospital Affiliated to Shandong First Medical University, Jinan, China.
- School of Physics and Electronic Engineering, Linyi University, Linyi, China. [email protected].
- Information Center, Shandong Provincial Maternal and Child Health Care Hospital affiliated to Qingdao University, Jinan, China. [email protected].
- School of Radiology, Shandong First Medical University & Shandong Academy of Medical Sciences, Jinan, China. [email protected].
Abstract
The aim of this study was to develop a privacy-preserving method for structuring free-text mammography reports using a locally fine-tuned, open-source large language model (LLM). In this retrospective multicenter study, 7161 unstructured mammography reports were collected from three institutions. The open-source Llama-3 model was fine-tuned with low-rank adaptation (LoRA) via supervised learning; all training labels were pseudo-labels generated by the commercial Qwen-Max model rather than human annotations. Structured outputs followed a BI-RADS-oriented nested JSON schema. Performance was evaluated across 23 features using precision, recall, and F1-score, and structural integrity was assessed with the JSON format accuracy (JFA) and field integrity accuracy (FIA) metrics. Statistical comparisons were performed using the paired Wilcoxon signed-rank test and Cohen's d effect size. The fine-tuned model achieved strong performance at epoch 10 (precision 0.942, recall 0.929, F1-score 0.932), with JFA and FIA reaching 0.964 and 1.000, respectively, showing significant gains over the base model (p < 0.05, Cohen's d > 0.8). Compared with Qwen-Max, the fine-tuned model performed slightly lower overall, with moderate yet statistically significant differences (p < 0.05; 0.5 < Cohen's d < 0.8), most notably in the "special signs" category (F1 = 0.737 vs 0.947). In conclusion, this method effectively converts free-text mammography reports into structured data using a locally fine-tuned, open-source LLM. Despite a modest performance trade-off relative to the commercial model, local deployment preserves patient privacy, and the method's accuracy, clinical relevance, and regulatory compliance make it a practical solution for medical institutions.

Question: Free-text mammography reports lack standardization, making structured extraction difficult, while existing solutions often compromise privacy, limit adaptability, or require costly commercial tools.

Findings: The fine-tuned Llama-3 model achieved high extraction accuracy (F1-score 0.932) and complete structural integrity (FIA 1.000) within a fully local deployment pipeline.

Clinical relevance: This method enables standardized, privacy-preserving mammography reporting without disrupting the clinical workflow, supporting safer AI integration, improved data quality, and compliance with regulations such as the General Data Protection Regulation (GDPR).
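To make the target representation concrete, the sketch below shows what a BI-RADS-oriented nested JSON record of the kind described above might look like. The abstract reports a 23-feature schema but does not enumerate its fields, so every field name and value here is a hypothetical illustration, not the authors' actual schema.

```python
import json

# Illustrative BI-RADS-oriented nested record.
# All field names/values are hypothetical; the paper's 23-feature schema
# is not listed in the abstract.
example_report = {
    "breast_composition": "c",            # ACR density category (assumed field)
    "findings": {
        "mass": {
            "present": True,
            "shape": "irregular",
            "margin": "spiculated",
            "density": "high",
        },
        "calcifications": {
            "present": True,
            "morphology": "fine pleomorphic",
            "distribution": "segmental",
        },
        "special_signs": {                # the "special signs" category noted in the abstract
            "architectural_distortion": True,
            "skin_thickening": False,
        },
    },
    "bi_rads_category": "4C",
    "laterality": "left",
}

print(json.dumps(example_report, indent=2, ensure_ascii=False))
```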
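The fine-tuning step described above (supervised LoRA adaptation of Llama-3 on Qwen-Max pseudo-labels) is commonly set up with the Hugging Face transformers and peft libraries. The sketch below shows such a setup under stated assumptions: the base checkpoint, rank, scaling, dropout, and target modules are illustrative values, not the hyperparameters used in the paper.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

# Assumed base checkpoint; the abstract only says "Llama-3".
BASE_MODEL = "meta-llama/Meta-Llama-3-8B-Instruct"

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
model = AutoModelForCausalLM.from_pretrained(BASE_MODEL)

lora_config = LoraConfig(
    r=16,                                 # rank of the low-rank update matrices (assumed)
    lora_alpha=32,                        # scaling factor for the LoRA update (assumed)
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # attention projections commonly adapted
    task_type="CAUSAL_LM",
)

# Wrap the frozen base model so only the small LoRA adapters are trained.
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()

# Supervised fine-tuning would then pair each free-text report (prompt) with
# its Qwen-Max pseudo-labelled JSON (target) and optimize a standard
# causal-LM loss, e.g. via transformers.Trainer or a similar training loop.
```

Training only the adapter weights keeps the memory footprint small enough for local deployment, which is the privacy motivation of the study.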
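The abstract does not formally define JFA and FIA, so the evaluation sketch below assumes plausible interpretations: JFA as the share of outputs that parse as valid JSON, and FIA as the share of parseable outputs containing every required field (computed over parsed outputs, which is consistent with FIA exceeding JFA as reported). The paired Wilcoxon test and Cohen's d follow their standard definitions; all inputs are toy values.

```python
import json
import numpy as np
from scipy.stats import wilcoxon

# Hypothetical required top-level fields (the real schema is not given).
REQUIRED_FIELDS = {"breast_composition", "findings", "bi_rads_category"}

def jfa_fia(raw_outputs):
    """Assumed metrics: JFA = valid-JSON rate; FIA = complete-field rate among parsed outputs."""
    parsed, complete = 0, 0
    for text in raw_outputs:
        try:
            obj = json.loads(text)
        except json.JSONDecodeError:
            continue
        parsed += 1
        if REQUIRED_FIELDS.issubset(obj):
            complete += 1
    n = len(raw_outputs)
    jfa = parsed / n if n else 0.0
    fia = complete / parsed if parsed else 0.0
    return jfa, fia

def paired_cohens_d(scores_a, scores_b):
    """Cohen's d for paired samples: mean of the differences over their standard deviation."""
    diff = np.asarray(scores_a) - np.asarray(scores_b)
    return diff.mean() / diff.std(ddof=1)

# Toy per-report F1-scores for two models, compared with a paired test.
f1_fine_tuned = np.array([0.95, 0.91, 0.93, 0.88, 0.94])
f1_base_model = np.array([0.70, 0.65, 0.72, 0.60, 0.68])

stat, p_value = wilcoxon(f1_fine_tuned, f1_base_model)
d = paired_cohens_d(f1_fine_tuned, f1_base_model)
print(f"Wilcoxon p = {p_value:.4f}, Cohen's d = {d:.2f}")
```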