
Vision-language models in diagnostic imaging: review of technical advances, clinical validation, and practical deployment.

March 15, 2026 · PubMed

Authors

Dutta N, Bose K, Syailendra E, Chu L, Gupta P

Affiliations (3)

  • Department of Radiodiagnosis, Postgraduate Institute of Medical Education and Research, Chandigarh 160012, India.
  • The Russell H. Morgan Department of Radiology and Radiological Science, Johns Hopkins University School of Medicine, Baltimore, MD 21287, United States.
  • Department of Radiodiagnosis, Postgraduate Institute of Medical Education and Research, Chandigarh 160012, India. Electronic address: [email protected].

Abstract

Radiology faces an unprecedented workload crisis, creating demand for AI solutions to enhance efficiency and quality. Vision-language models (VLMs) represent a paradigm shift from narrow AI tools to integrated systems for image interpretation and report generation. However, their rapid technical progress has outpaced rigorous clinical validation, creating a critical gap between their theoretical potential and safe, practical deployment.

This review critically evaluates the state of VLMs in diagnostic imaging by assessing their clinical validation, identifying deployment challenges, and examining their impact on the radiological workflow. By analyzing the gap between model performance and real-world utility, it provides a roadmap for responsible clinical integration.

A narrative review of the literature from January 2017 to May 2025 was conducted. The search focused on VLM applications in radiology, including automated report generation and visual question answering. We synthesized findings from technical and clinical validation studies, thematically organized around architectural evolution, applications, validation, and implementation barriers.

A clear progression from encoder-decoder models to sophisticated LLM-integrated foundation models was identified. While these models achieve high performance on NLP metrics, their clinical utility remains limited. Key findings include: (1) pervasive model hallucination, with factual errors in ~22% of AI-generated reports; (2) a lack of external validation on diverse, multi-institutional datasets; and (3) significant implementation barriers, including high computational costs, poor workflow integration, and unresolved liability. Human expert evaluations show that while AI-generated reports for routine cases are often acceptable (77.7% in one study), accuracy declines significantly in complex cases.

VLMs hold transformative potential but are not ready for autonomous clinical use. Their primary value lies in augmenting radiologists' workflow. For successful adoption, the field must shift its focus from algorithmic metrics to proving clinical safety and efficacy through rigorous validation, developing robust hallucination mitigation strategies, and designing seamless workflow integration.

Topics

Diagnostic Imaging · Artificial Intelligence · Image Interpretation, Computer-Assisted · Journal Article · Review
