DeepSeek-V3 and ChatGPT-4o answered patient questions about interventional radiology procedures with high accuracy, suggesting a growing role for LLMs in clinical communication.
Key Details
- The study evaluated four LLMs (ChatGPT-4o, DeepSeek-V3, OpenBioLLM-8b, BioMistral-7b) on 107 real-world patient questions covering TAPE, CT-guided HDR brachytherapy, and BEST.
- Two board-certified radiologists independently scored the models' answers for accuracy on a Likert scale.
- DeepSeek-V3 achieved the highest mean scores for BEST (4.49) and CT-HDR (4.24) and performed on par with ChatGPT-4o for TAPE (4.20 vs. 4.17).
- OpenBioLLM-8b and BioMistral-7b scored significantly lower and produced potentially hazardous responses.
- LLM responses show promise for supporting, but not replacing, comprehensive medical consultations.
- Future studies should include patient feedback and focus on alignment with clinical guidelines.
Source
AuntMinnie