EchoGraph System for Automated Quality Assessment of Echocardiography Reports
Authors
Affiliations (1)
- Mayo Clinic Rochester
Abstract
Generative AI applications in clinical documentation require automated metrics of text accuracy, yet none exist for echocardiography. To address this, we developed EchoGraph, a BERT-based model trained on 600 densely annotated echocardiography reports from the Mayo Clinic (2017), split 7:2:1 into training, validation, and test sets, using a tailored schema comprising 48,256 annotated entities and 29,731 annotated relations. Sixty randomly selected MIMIC-EchoNote reports (3,672 entities and 2,360 relations) were annotated for external validation. EchoGraph demonstrated strong performance in predicting entities (micro F1 0.85) and relations (micro F1 0.70), and maintained performance on external validation (entity micro F1 0.80, relation micro F1 0.52). The EchoGraph F1 score showed superior error sensitivity compared with RadGraph F1, with a 2.8-fold higher slope magnitude (-0.817 vs. -0.291) and greater variance explained (R² = 0.803 vs. 0.578). EchoGraph offers an effective solution for evaluating language model-based echocardiography applications, supporting more accurate AI-generated reports.
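The abstract reports micro-averaged F1 over predicted entities and relations. As a minimal sketch only, the snippet below shows how a micro F1 of this kind can be computed by pooling exact-match true positives over entity and relation tuples; the actual EchoGraph F1 matching criteria (e.g., span alignment rules) are not described here, and all spans, labels, and relation names in the example are hypothetical.

```python
# Illustrative sketch: micro F1 over pooled predicted vs. reference tuples.
# Matching is exact-match on tuples; this is an assumption, not the authors'
# published matching procedure.

from typing import Iterable, Tuple


def micro_f1(predicted: Iterable[Tuple], reference: Iterable[Tuple]) -> float:
    """Micro-averaged F1: pool all tuples, count exact-match true positives."""
    pred, ref = set(predicted), set(reference)
    tp = len(pred & ref)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(ref) if ref else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical entities: (span text, entity label)
    gold_entities = {("left ventricle", "ANATOMY"), ("mildly dilated", "OBSERVATION")}
    pred_entities = {("left ventricle", "ANATOMY"), ("dilated", "OBSERVATION")}

    # Hypothetical relations: (head span, relation label, tail span)
    gold_relations = {("mildly dilated", "modifies", "left ventricle")}
    pred_relations = {("dilated", "modifies", "left ventricle")}

    print(f"entity micro F1:   {micro_f1(pred_entities, gold_entities):.2f}")
    print(f"relation micro F1: {micro_f1(pred_relations, gold_relations):.2f}")
```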