Large Multimodal AI Models Compared for Lung Cancer CT Interpretation

July 30, 2025

A new study evaluates the diagnostic accuracy of three leading generative multimodal AI models in interpreting CT images for lung cancer detection.

Key Details

  • Three models compared: Gemini-pro-vision (Google), Claude-3-opus (Anthropic), and GPT-4-turbo (OpenAI).
  • On 184 malignant lung cases, Gemini-pro-vision achieved the highest single-image accuracy (>90%), followed by Claude-3-opus, with GPT-4-turbo lowest at 65.2%.
  • Gemini's accuracy dropped to 58.5% when presented with continuous CT slices, indicating challenges with spatial reasoning across sequential images.
  • Simplified text prompts improved diagnostic AUCs: Gemini (0.76), GPT (0.73), and Claude (0.69); an illustrative query sketch follows this list.
  • Claude-3-opus showed superior consistency and lower variation in lesion feature analysis.
  • External validation with the TCGA and MIDRC datasets supported these findings, particularly under the simplified prompt strategies.
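
For context, the sketch below shows what a single-slice query with a simplified text prompt could look like against one of the evaluated models, here GPT-4-turbo via the OpenAI Python SDK. The prompt wording, image file name, and output format are illustrative assumptions; the study's exact prompts and pipeline are not reproduced here.

    import base64
    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    # Illustrative input: a single axial CT slice exported as a PNG (hypothetical file name).
    with open("ct_slice.png", "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("utf-8")

    # A deliberately simple prompt, in the spirit of the "simplified prompt" strategy
    # reported in the study (this wording is an assumption, not the study's prompt).
    prompt = (
        "You are assisting with a research benchmark. Based only on this chest CT slice, "
        "is the dominant lung lesion more likely benign or malignant? "
        "Answer with one word, then a confidence between 0 and 1."
    )

    response = client.chat.completions.create(
        model="gpt-4-turbo",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
            ],
        }],
        max_tokens=50,
    )

    print(response.choices[0].message.content)

Per-case answers collected this way can then be scored against ground-truth pathology to compute accuracy or AUC, which is how single-image performance figures of this kind are typically derived.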

Why It Matters

This benchmark provides essential insight into the current capabilities and limitations of leading multimodal LLMs for radiological image analysis. Understanding model strengths, weaknesses, and prompt engineering strategies will guide their optimal integration into clinical workflows.

Read more
