New multimodal large language models (LLMs) such as OpenAI o3 and Gemini 2.5 Pro showed significant gains in answering Japanese radiology board exam questions, particularly when image input was provided.
Key Details
- Eight LLMs were tested on the Japan Diagnostic Radiology Board Examination (JDRBE).
- OpenAI o3 achieved 67% accuracy with text-only input and 72% with image input.
- Gemini 2.5 Pro also showed notable accuracy improvements when image data were added.
- OpenAI o3 and Gemini 2.5 Pro both received higher legitimacy scores from radiologist raters than some of the other models tested.
- The test set included 233 questions and 477 images (184 CT, 159 MRI, 15 X-ray, 90 nuclear medicine).
- Image input produced statistically significant accuracy improvements for several models.
Source
AuntMinnie