A fine-tuned, domain-specific LLM (LLM-RadSum) outperforms GPT-4o in accurately summarizing radiology reports across multiple patient demographics and modalities.
Key Details
- LLM-RadSum, based on Llama2, was trained and evaluated on over 1 million CT and MRI radiology reports from five hospitals.
- The model achieved higher F1 scores in summarization compared to GPT-4o (0.58 vs. 0.3, p < 0.001), consistent across anatomic regions, modalities, sex, and ages.
- 88.9% of LLM-RadSum's outputs were 'completely consistent' with original reports, versus 43.1% for GPT-4o.
- 81.5% of LLM-RadSum outputs met senior radiologists’ standards for safety and clinical use; most GPT-4o outputs required minor edits.
- Human evaluation included 1,800 randomly selected reports, underscoring generalizability within diverse hospital settings.
Why It Matters
This study demonstrates the tangible benefits of local domain adaptation of LLMs in radiology report workflows, setting the stage for safer and more accurate clinical deployment of AI-generated content. It prompts strategic considerations for hospitals about investing in specialty versus general-purpose AI models for radiology.

Source
AuntMinnie