DeepSeek-V3 and ChatGPT-4o gave the most accurate answers among four LLMs tested on patient questions about interventional radiology procedures, suggesting a growing role for LLMs in clinical communication.
Key Details
- Study evaluated four LLMs (ChatGPT-4o, DeepSeek-V3, OpenBioLLM-8b, BioMistral-7b) on 107 real-world patient questions covering TAPE, CT-guided HDR brachytherapy, and BEST.
- Questions and their answers were independently scored for accuracy by two board-certified radiologists using a Likert scale.
- DeepSeek-V3 achieved the highest mean scores for BEST (4.49) and CT-HDR (4.24), while matching ChatGPT-4o on TAPE (4.20 vs 4.17).
- OpenBioLLM-8b and BioMistral-7b scored significantly lower and produced potentially hazardous responses.
- LLMs' responses show promise for supporting, but not replacing, comprehensive medical consultations.
- Future studies should include patient feedback and focus on alignment with clinical guidelines.
Why It Matters

Source
AuntMinnie