Hybrid rule-based and on-premises LLM pipeline for extracting CMR and CPET metrics from free-text reports in repaired tetralogy of Fallot
Authors
Affiliations (1)
- Department of Heart, Vascular, and Thoracic; Division of Cardiology and Cardiovascular Medicine; Cleveland Clinic Children's; Cleveland Clinic Children's Center
Abstract
Background: Patients with repaired tetralogy of Fallot (rTOF) require lifelong surveillance with cardiovascular magnetic resonance (CMR) and cardiopulmonary exercise testing (CPET). However, results are frequently stored as unstructured free-text reports, hindering large-scale analysis and research.
Objectives: This study aimed to develop and evaluate a privacy-preserving hybrid natural language processing pipeline combining regular expressions (regex) and an on-premises large language model (LLM) to accurately extract key CMR and CPET metrics from legacy free-text reports in patients with rTOF.
Methods: We retrospectively analyzed 430 CMR and 262 CPET reports (2005-2023) from patients with rTOF. A two-stage hybrid pipeline was implemented: regex rules were applied first, followed by targeted prompting of an on-premises Llama-3.1-8B-Instruct LLM only when regex failed or returned ambiguous results. Performance was compared against regex-only and LLM-only approaches using coverage, precision, recall, and F1-score.
Results: In CMR reports, the hybrid pipeline achieved perfect coverage (1.00) and F1-score (1.00) versus 0.98 coverage and 0.93 F1-score with regex alone, while reducing computational cost by ~75% compared with LLM-only. In CPET reports, the hybrid approach improved F1-score from 0.74 (regex alone) to 0.98, with particularly large gains for semantically complex variables (e.g., peak VO₂, test termination reason).
Conclusions: A hybrid regex-on-premises LLM pipeline provides near-perfect, efficient, and HIPAA-compliant extraction of clinical metrics from unstructured cardiology reports, offering a scalable solution for retrospective research and quality improvement in congenital heart disease.
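The two-stage logic described in the Methods (regex first, LLM only on failure or ambiguity) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the pattern `RVEF_PATTERN`, the choice of RV ejection fraction as the example metric, the function names, and the "more than one distinct value means ambiguous" rule are all assumptions made for illustration. The LLM is represented by a caller-supplied function `llm_fallback`, since the abstract does not describe how the on-premises Llama-3.1-8B-Instruct model is invoked.

```python
import re
from typing import Callable, Optional, Tuple

# Hypothetical rule for one metric (RV ejection fraction, in percent).
# The real pipeline's patterns are not given in the abstract.
RVEF_PATTERN = re.compile(
    r"\b(?:RVEF|RV\s+ejection\s+fraction)\b[^\d\n]{0,20}(\d{1,3}(?:\.\d+)?)\s*%",
    re.IGNORECASE,
)


def regex_extract(report: str) -> Tuple[Optional[float], str]:
    """Return (value, status); status is 'ok', 'missing', or 'ambiguous'."""
    values = {float(v) for v in RVEF_PATTERN.findall(report)}
    if not values:
        return None, "missing"
    if len(values) > 1:
        # Assumed ambiguity rule: conflicting numbers (e.g. current vs.
        # prior study) are not resolved by regex.
        return None, "ambiguous"
    return values.pop(), "ok"


def hybrid_extract(
    report: str,
    llm_fallback: Callable[[str], Optional[float]],
) -> Tuple[Optional[float], str]:
    """Stage 1: regex. Stage 2: call the LLM only if regex did not succeed.

    `llm_fallback` is a placeholder for a targeted prompt to an on-premises
    model; its interface here is an assumption.
    """
    value, status = regex_extract(report)
    if status == "ok":
        return value, "regex"
    return llm_fallback(report), "llm"
```

Under these assumptions, a report such as `"RVEF: 45%"` is resolved by the regex stage and never reaches the model, while a report with no match or with conflicting values is passed to `llm_fallback`. Restricting model calls to that second group is the kind of design that would account for a lower computational cost than an LLM-only approach.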