
Latest multimodal large language models show limitations on image-based radiology exam questions.
Key Details
- Researchers tested ChatGPT-4v and ChatGPT-4o on 222 image-based multiple-choice questions from national radiology board exams (2020 and 2024).
- These LLMs have been recently trained to process both text and images.
- Despite advancements, significant concerns remain regarding their reliability for diagnostic tasks in radiology.
- The potential of such models in radiology workflows, such as report generation and diagnostic support, is still under early investigation.
Why It Matters
As large language models gain capability for image analysis, assessing their reliability is crucial for safe deployment in radiology. Failures on board-style questions highlight the need for ongoing scrutiny before clinical trust is warranted.

Source
Radiology Business