Stanford researchers introduce Merlin, a 3D vision-language AI model for interpreting abdominal CT scans; the model demonstrates strong performance across multiple radiology tasks.
Key Details
- Merlin is a 3D vision-language model designed for abdominal CT interpretation.
- Evaluated on over 44,000 CT scans from multiple sites covering various anatomies.
- Trained on over 6 million CT images, ~2 million diagnosis codes, and 6 million radiology report tokens.
- Tested on 752 individual tasks: zero-shot classification, disease risk prediction, cross-modal retrieval, report generation, and 3D organ segmentation.
- Achieved an F1 score of 0.741 for zero-shot classification and an AUROC of 0.757 for five-year chronic disease risk prediction.
- Model and code are publicly available on GitHub, HuggingFace, and PyPI.
Source
AuntMinnie
Related News

Study: Computer Vision Models Best LLMs in Chest CT Breast Abnormality Detection
Computer vision models (CVMs) surpass large language models (LLMs) in accurately labeling incidental breast abnormalities on chest CT scans.

Radiology Maintains Lead in FDA-Cleared AI Algorithms, Cardiology Follows
Radiology remains the top specialty for FDA-cleared AI, with cardiology as a strong second, particularly in cardiovascular imaging.

Deep Learning Models Rival Radiologists for Pancreatic Cancer Detection on CT
Deep-learning models achieved accuracy comparable or superior to that of experienced radiologists in detecting pancreatic cancer on CT scans, especially for small tumors.