Stanford researchers introduce Merlin, a 3D vision-language AI model for interpreting abdominal CT scans, demonstrating strong performance across multiple radiology tasks.
Key Details
- Merlin is a 3D vision-language model designed for abdominal CT interpretation.
- Evaluated on over 44,000 CT scans from multiple sites covering various anatomies.
- Trained on over 6 million CT images, ~2 million diagnosis codes, and 6 million radiology report tokens.
- Tested on 752 individual tasks: zero-shot classification, disease risk prediction, cross-modal retrieval, report generation, and 3D organ segmentation.
- Achieved 0.741 F1 score for zero-shot classification and AUROC of 0.757 for chronic disease risk prediction over five years.
- Model and code are publicly available on GitHub, HuggingFace, and PyPI.
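For readers unfamiliar with the two metrics quoted above, the following is a minimal sketch of how an F1 score and an AUROC are computed for a binary task. It is not Merlin's evaluation code: the function names, the 0.5 decision threshold, and the six-sample toy data are all assumptions made for illustration.

```python
# Illustrative only: toy labels and scores, not data from the Merlin study.

def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def auroc(y_true, scores):
    """Probability that a random positive outranks a random negative.

    Ties count as half a win. This pairwise form is O(n^2), which is
    fine for a small example.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical ground truth and model scores for six scans.
y_true = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.3, 0.7, 0.2]

# F1 needs hard predictions, so a threshold is applied (0.5 is an assumption).
y_pred = [1 if s >= 0.5 else 0 for s in scores]

f1 = f1_score(y_true, y_pred)
auc = auroc(y_true, scores)
print(f"F1 = {f1:.3f}, AUROC = {auc:.3f}")
```

F1 depends on the chosen threshold, while AUROC summarizes ranking quality across all thresholds, which is why the two are often reported for different kinds of tasks.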
Why It Matters
This foundation model is a step toward automating CT scan interpretation, which could help ease radiologist shortages and speed up reading workflows. Because the model and code are public, other researchers can build on it and benchmark their own methods against it.

Source
AuntMinnie