New multimodal large language models (LLMs) such as OpenAI o3 and Gemini 2.5 Pro showed markedly better performance on Japanese radiology board exam questions, particularly when given image input.
Key Details
- Eight LLMs were tested on the Japan Diagnostic Radiology Board Examination (JDRBE).
- OpenAI o3 achieved 67% accuracy with text-only input and 72% with image input.
- Gemini 2.5 Pro also showed notable accuracy improvements with image data.
- Both OpenAI o3 and Gemini 2.5 Pro received higher legitimacy scores from radiologist raters than some competing models.
- The test set comprised 233 questions and 477 images (including 184 CT, 159 MRI, 15 x-ray, and 90 nuclear medicine).
- Image input produced a statistically significant improvement in diagnostic accuracy for several models.
Why It Matters
This study marks the first demonstration of statistically significant improvement in LLM diagnostic accuracy with image input on a radiology board exam, signaling meaningful progress for AI-assisted radiological training and assessment.

Source
AuntMinnie