New multimodal large language models (LLMs) like OpenAI o3 and Gemini 2.5 Pro demonstrated significant advancements in answering Japanese radiology board exam questions, particularly with image input.
Key Details
- Eight LLMs were tested on the Japan Diagnostic Radiology Board Examination (JDRBE).
- OpenAI o3 achieved 67% accuracy (text-only) and 72% with image input.
- Gemini 2.5 Pro also showed notable accuracy improvements with image data.
- Both OpenAI o3 and Gemini 2.5 Pro received higher legitimacy scores from radiologist raters than some competitors.
- The test set comprised 233 questions and 477 images, including 184 CT, 159 MRI, 15 x-ray, and 90 nuclear medicine images.
- Image input produced a statistically significant improvement in accuracy for several models.
Source
AuntMinnie