Large Language Models for Cardiac MRI Diagnosis Based on Standardized Text Descriptions.

April 23, 2026


DOI: 10.1002/jmri.70327 PMID: 42026856

Authors

Zhang H, Zhou J, Zhang C, Lu G, Lu Z, Wang L, Wang L, Gong H, Zhao L, Ma X

Affiliations (3)

Department of Interventional Diagnosis and Treatment, Beijing Anzhen Hospital Affiliated to Capital Medical University, Beijing, China.
Department of Radiology, Beijing Anzhen Hospital Affiliated to Capital Medical University, Beijing, China.
Capital Medical University, Beijing, China.

Abstract

Background: MRI is important for cardiac disease evaluation, but accurate diagnosis remains challenging in less experienced centers. Although large language models (LLMs) have shown promise in medical imaging diagnosis, their application in cardiac MRI is limited.

Hypothesis: LLMs may be effective in achieving cardiac MRI diagnosis based on standardized descriptions.

Study Type: Retrospective.

Population: Patients with hypertrophic cardiomyopathy (n = 203), dilated cardiomyopathy (n = 186), hypertensive heart disease (n = 46), ischemic cardiomyopathy (n = 198), constrictive pericarditis (n = 38), cardiac amyloidosis (n = 45), and myocarditis (n = 91), and 144 normal controls.

Field Strength/Sequence: Balanced steady-state free-precession, short tau inversion recovery, and breath-hold inversion-recovery segmented gradient-echo sequences at 3.0 T.

Assessment: Clinical and cardiac MRI information from each subject was converted into standardized descriptions and input into four LLMs: Generative Pre-trained Transformer-4.5 (GPT-4.5), GPT-4 Omni (GPT-4o), Deepseek-V3, and Deepseek-R1. Cardiac MRI information included left ventricular (LV) function, wall thickness and motion, and abnormalities on T2-weighted imaging (T2WI), perfusion, and late gadolinium enhancement sequences. Each model was asked to generate an imaging diagnosis. In addition, a medical student (8 months' experience) and three radiologists (junior, mid-level, and senior, with 3, 6, and 10 years' experience, respectively) provided diagnoses based on cardiac MRI images and clinical information.

Statistical Tests: Frequency-weighted sensitivity and specificity were calculated. The diagnostic performances of the LLMs and human readers were compared using the McNemar test with Bonferroni correction. A p value < 0.05 was considered significant.

Results: All LLMs showed excellent frequency-weighted specificity (0.973-0.983). The frequency-weighted sensitivities of all LLMs were not significantly different from that of the junior radiologist, were significantly higher than that of the medical student, and were significantly lower than that of the senior radiologist (GPT-4.5: 0.863, GPT-4o: 0.821, Deepseek-V3: 0.843, and Deepseek-R1: 0.851 vs. junior radiologist: 0.850, all adjusted p = 1.000; vs. medical student: 0.731, all adjusted p < 0.001; vs. senior radiologist: 0.942, all adjusted p < 0.001). Additionally, the mid-level radiologist achieved a frequency-weighted sensitivity of 0.895, outperforming all LLMs except GPT-4.5.

Data Conclusion: LLMs may generate accurate diagnoses from standardized cardiac MRI descriptions, potentially benefiting less experienced physicians.

Technical Efficacy: Stage 5.
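
The statistical approach named in the abstract (frequency-weighted sensitivity and specificity, and McNemar comparisons with Bonferroni correction) can be illustrated with a minimal Python sketch. The abstract does not define "frequency-weighted", so the sketch assumes one-vs-rest sensitivity and specificity per diagnosis, averaged with weights equal to each diagnosis's share of cases. It also assumes the exact (binomial) form of the McNemar test. The function names and the toy labels are hypothetical and are not taken from the study.

```python
from math import comb


def frequency_weighted_metrics(y_true, y_pred):
    """Return (sensitivity, specificity), each a class-frequency-weighted
    average of one-vs-rest values. Assumed definition, not the paper's."""
    n = len(y_true)
    sens = spec = 0.0
    for c in sorted(set(y_true)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        tn = sum(t != c and p != c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        weight = (tp + fn) / n  # share of cases with diagnosis c
        sens += weight * tp / (tp + fn)
        if tn + fp:  # undefined when every case belongs to class c
            spec += weight * tn / (tn + fp)
    return sens, spec


def discordant_counts(y_true, pred_a, pred_b):
    """Count cases where exactly one of two readers is correct."""
    b = sum(a == t and r != t for t, a, r in zip(y_true, pred_a, pred_b))
    c = sum(a != t and r == t for t, a, r in zip(y_true, pred_a, pred_b))
    return b, c


def mcnemar_exact(b, c):
    """Two-sided exact McNemar p value from the discordant counts."""
    n = b + c
    if n == 0:
        return 1.0
    tail = sum(comb(n, k) for k in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2 * tail)


def bonferroni(p, n_comparisons):
    """Bonferroni-adjusted p value, capped at 1."""
    return min(1.0, p * n_comparisons)


# Toy labels for illustration only (not study data).
y_true = ["HCM", "HCM", "DCM", "DCM", "DCM"] + ["Normal"] * 5
model = ["HCM", "DCM", "DCM", "DCM", "DCM"] + ["Normal"] * 4 + ["HCM"]
reader = ["HCM", "HCM", "DCM", "DCM", "Normal"] + ["Normal"] * 5

sens, spec = frequency_weighted_metrics(y_true, model)
b, c = discordant_counts(y_true, model, reader)
p_adj = bonferroni(mcnemar_exact(b, c), n_comparisons=4)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} adjusted p={p_adj:.3f}")
```

Under this assumed weighting, the frequency-weighted sensitivity reduces to overall accuracy, because each class's true-positive rate is multiplied by that class's share of cases. The number of comparisons passed to `bonferroni` is likewise a placeholder; the abstract does not state how many comparisons were corrected for.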


Topics

Journal Article
