A fine-tuned, domain-specific LLM (LLM-RadSum) outperforms GPT-4o in accurately summarizing radiology reports across multiple patient demographics and modalities.
Key Details
- LLM-RadSum, based on Llama2, was trained and evaluated on over 1 million CT and MRI radiology reports from five hospitals.
- The model achieved higher summarization F1 scores than GPT-4o (0.58 vs. 0.3, p < 0.001), a result that held across anatomic regions, modalities, sex, and age groups.
- 88.9% of LLM-RadSum's outputs were 'completely consistent' with original reports, versus 43.1% for GPT-4o.
- 81.5% of LLM-RadSum outputs met senior radiologists’ standards for safety and clinical use; most GPT-4o outputs required minor edits.
- Human evaluation included 1,800 randomly selected reports, underscoring generalizability within diverse hospital settings.
Why It Matters
This study demonstrates the tangible benefits of local domain adaptation of LLMs in radiology report workflows, setting the stage for safer and more accurate clinical deployment of AI-generated content. It prompts strategic considerations for hospitals about investing in specialty versus general-purpose AI models for radiology.

Source
AuntMinnie