
Large Language Models for Cardiac MRI Diagnosis Based on Standardized Text Descriptions.

April 23, 2026 (via PubMed)

Authors

Zhang H, Zhou J, Zhang C, Lu G, Lu Z, Wang L, Wang L, Gong H, Zhao L, Ma X

Affiliations (3)

  • Department of Interventional Diagnosis and Treatment, Beijing Anzhen Hospital Affiliated to Capital Medical University, Beijing, China.
  • Department of Radiology, Beijing Anzhen Hospital Affiliated to Capital Medical University, Beijing, China.
  • Capital Medical University, Beijing, China.

Abstract

MRI is important for evaluating cardiac disease, but accurate diagnosis remains challenging in less experienced centers. Although large language models (LLMs) have shown promise in medical imaging diagnosis, their application to cardiac MRI has been limited. The study hypothesized that LLMs can make accurate cardiac MRI diagnoses from standardized text descriptions.

This retrospective study included 203 patients with hypertrophic cardiomyopathy, 186 with dilated cardiomyopathy, 46 with hypertensive heart disease, 198 with ischemic cardiomyopathy, 38 with constrictive pericarditis, 45 with cardiac amyloidosis, and 91 with myocarditis, as well as 144 normal controls. Imaging was performed at 3.0 T with balanced steady-state free-precession, short tau inversion recovery, and breath-hold inversion-recovery segmented gradient-echo sequences.

Clinical and cardiac MRI information from each subject was converted into standardized text descriptions and entered into four LLMs: Generative Pre-trained Transformer-4.5 (GPT-4.5), GPT-4 Omni (GPT-4o), DeepSeek-V3, and DeepSeek-R1. The cardiac MRI information covered left ventricular (LV) function, wall thickness and motion, and abnormalities on T2-weighted imaging (T2WI), perfusion, and late gadolinium enhancement sequences. Each model was asked to generate an imaging diagnosis. For comparison, a medical student (8 months' experience) and three radiologists (junior, mid-level, and senior, with 3, 6, and 10 years' experience, respectively) provided diagnoses from the cardiac MRI images and clinical information.

Frequency-weighted sensitivity and specificity were calculated. The diagnostic performance of the LLMs and human readers was compared using the McNemar test with Bonferroni correction, with p < 0.05 considered significant. All LLMs showed excellent frequency-weighted specificity (0.973–0.983).
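
The abstract does not define "frequency-weighted". A common reading, assumed here, is that sensitivity and specificity are computed one-vs-rest for each diagnostic class and then averaged with weights proportional to each class's number of true cases (the same idea as scikit-learn's `average="weighted"`). The sketch below implements that assumption; the function name and the toy labels are hypothetical and are not taken from the study.

```python
from collections import Counter


def frequency_weighted_metrics(y_true, y_pred):
    """Hypothetical helper: one-vs-rest sensitivity and specificity per class,
    averaged with weights n_c / N, where n_c is the true count of class c.
    Assumes y_true contains at least two distinct classes."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal length")
    n = len(y_true)
    support = Counter(y_true)  # true cases per class
    sens_sum = 0.0
    spec_sum = 0.0
    for c, n_c in support.items():
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        tn = (n - n_c) - fp  # negatives for class c not called c
        weight = n_c / n
        sens_sum += weight * (tp / n_c)
        spec_sum += weight * (tn / (n - n_c))
    return sens_sum, spec_sum


if __name__ == "__main__":
    # Toy labels for illustration only (not study data)
    truth = ["HCM", "HCM", "DCM", "Normal"]
    pred = ["HCM", "DCM", "DCM", "Normal"]
    sens, spec = frequency_weighted_metrics(truth, pred)
    print(round(sens, 3), round(spec, 3))  # prints 0.75 0.917
```

Under this weighting, sensitivity reduces to overall accuracy, because the sum over classes of (n_c/N)(TP_c/n_c) equals the sum of TP_c divided by N. Specificity does not simplify this way, which is one reason both numbers are worth reporting.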
The frequency-weighted sensitivities of all LLMs (GPT-4.5: 0.863, GPT-4o: 0.821, DeepSeek-V3: 0.843, DeepSeek-R1: 0.851) did not differ significantly from that of the junior radiologist (0.850; all adjusted p = 1.000), were significantly higher than that of the medical student (0.731; all adjusted p < 0.001), and were significantly lower than that of the senior radiologist (0.942; all adjusted p < 0.001). The mid-level radiologist achieved a frequency-weighted sensitivity of 0.895, outperforming all LLMs except GPT-4.5.

LLMs may generate accurate diagnoses from standardized cardiac MRI descriptions, potentially benefiting less experienced physicians.

Technical Efficacy: Stage 5.
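
The pairwise comparisons above can be sketched as follows, assuming each reader's output is coded per subject as correct or incorrect and using the exact (binomial) form of the McNemar test; the study may have used the chi-square variant instead, and the abstract does not state how many comparisons entered the Bonferroni adjustment. Function names and data are hypothetical. Capping the adjusted value at 1 is why several comparisons report adjusted p = 1.000.

```python
from math import comb


def mcnemar_exact(correct_a, correct_b):
    """Two-sided exact McNemar test on paired per-subject outcomes.

    correct_a[i] / correct_b[i] are True when reader A / reader B gave the
    correct diagnosis for subject i. Only discordant pairs carry information.
    """
    if len(correct_a) != len(correct_b):
        raise ValueError("paired sequences must have equal length")
    # b: A right and B wrong; c: A wrong and B right
    b = sum(1 for x, y in zip(correct_a, correct_b) if x and not y)
    c = sum(1 for x, y in zip(correct_a, correct_b) if y and not x)
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs: no evidence of a difference
    # Under H0 the discordant split follows Binomial(n, 0.5);
    # double the smaller tail for a two-sided p value, capped at 1.
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


def bonferroni(p_values):
    """Multiply each p value by the number of comparisons, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


if __name__ == "__main__":
    # Toy data (not from the study): 15 subjects, 5 discordant pairs favoring A
    reader_a = [True] * 15
    reader_b = [False] * 5 + [True] * 10
    raw = [mcnemar_exact(reader_a, reader_b), mcnemar_exact(reader_a, reader_a)]
    print(raw, bonferroni(raw))  # prints [0.0625, 1.0] [0.125, 1.0]
```

Because the test uses only discordant subjects, two readers with similar accuracy can still differ significantly if their errors fall on different cases, and vice versa.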

Topics

Journal Article
