DeepSeek-V3 and ChatGPT-4o excelled in accurately answering patient questions about interventional radiology procedures, suggesting LLMs' growing role in clinical communication.
Key Details
- 1Study evaluated four LLMs (ChatGPT-4o, DeepSeek-V3, OpenBioLLM-8b, BioMistral-7b) on 107 real-world patient questions covering TAPE, CT-guided HDR brachytherapy, and BEST.
- 2Questions and their answers were independently scored for accuracy by two board-certified radiologists using a Likert scale.
- 3DeepSeek-V3 achieved the highest mean scores for BEST (4.49) and CT-HDR (4.24), while matching ChatGPT-4o on TAPE (4.20 vs 4.17).
- 4OpenBioLLM-8b and BioMistral-7b scored significantly lower and produced potentially hazardous responses.
- 5LLMs' responses show promise for supporting—but not replacing—comprehensive medical consultations.
- 6Future studies should include patient feedback and focus on alignment with clinical guidelines.
Why It Matters

Source
AuntMinnie
Related News

Radiologists Prefer Domain-Specific AI for CT Report Generation
Radiologists show a clear preference for domain-specific AI models in generating accurate and timely CT report impressions.

Radiology Receives Declining Share of Industry Research Funding
Radiologists received only 1.1% of industry-funded research payments in 2024, with a continuing downward trend.

GPT-4o AI Matches Radiologists in Follow-Up Imaging Recommendations
GPT-4o matched the performance of experienced radiologists and surpassed residents in recommending follow-up imaging from routine radiology reports.