
A new study evaluates the diagnostic accuracy of three leading generative multimodal AI models in interpreting CT images for lung cancer detection.
Key Details
- Three models compared: Gemini-pro-vision (Google), Claude-3-opus (Anthropic), and GPT-4-turbo (OpenAI).
- On 184 malignant lung cases, Gemini achieved the highest single-image accuracy (>90%), followed by Claude-3-opus; GPT-4-turbo scored lowest (65.2%).
- Gemini's accuracy dropped to 58.5% when given continuous CT slices, suggesting difficulty with spatial reasoning across images.
- Simplified text prompts improved diagnostic AUCs: Gemini (0.76), GPT (0.73), and Claude (0.69).
- Claude-3-opus showed superior consistency and lower variation in lesion feature analysis.
- External validation on the TCGA and MIDRC datasets supported the findings, especially with simplified prompt strategies.
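
For readers unfamiliar with the two metrics quoted above, the following is a minimal sketch of how accuracy and AUC are commonly computed. It is not the study's evaluation code: the function names and the toy labels and scores are hypothetical, chosen only for illustration, and the AUC is computed with the standard pairwise (Mann-Whitney) formulation.

```python
def accuracy(labels, predictions):
    """Fraction of cases where the predicted class matches the true class."""
    correct = sum(1 for y, p in zip(labels, predictions) if y == p)
    return correct / len(labels)


def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive case receives a
    higher score than a randomly chosen negative case (ties count as half).
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data, NOT taken from the study:
# 1 = malignant, 0 = benign; scores are made-up model confidences.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.6, 0.4, 0.5, 0.3, 0.2]
toy_predictions = [1 if s >= 0.5 else 0 for s in toy_scores]

print(f"accuracy = {accuracy(toy_labels, toy_predictions):.3f}")
print(f"AUC      = {auc(toy_labels, toy_scores):.3f}")
```

Accuracy depends on the chosen decision threshold (0.5 in this sketch), whereas AUC summarizes ranking quality across all thresholds, which is one reason the two figures for a given model need not move together.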
Why It Matters
This benchmark provides essential insight into the current capabilities and limitations of leading multimodal LLMs for radiological image analysis. Understanding model strengths, weaknesses, and prompt engineering strategies will guide their optimal integration into clinical workflows.

Source
EurekAlert