Hybrid rule-based and on-premises LLM pipeline for extracting CMR and CPET metrics from free-text reports in repaired tetralogy of Fallot

January 23, 2026 · medRxiv preprint

Authors

Akbasli, I. T.; Beck, K. L.; Liou, W.; Du, S.; Nemer, G.; Baloglu, O.; Latifi, S. Q.; Marino, B. S.; Albahra, S.; Tandon, A.

Affiliations (1)

  • Department of Heart, Vascular, and Thoracic; Division of Cardiology and Cardiovascular Medicine; Cleveland Clinic Children's; Cleveland Clinic Children's Center

Abstract

Background: Patients with repaired tetralogy of Fallot (rTOF) require lifelong surveillance with cardiovascular magnetic resonance (CMR) and cardiopulmonary exercise testing (CPET). However, results are frequently stored as unstructured free-text reports, which hinders large-scale analysis and research.

Objectives: This study aimed to develop and evaluate a privacy-preserving hybrid natural language processing pipeline that combines regular expressions (regex) with an on-premises large language model (LLM) to accurately extract key CMR and CPET metrics from legacy free-text reports in patients with rTOF.

Methods: We retrospectively analyzed 430 CMR and 262 CPET reports (2005-2023) from patients with rTOF. A two-stage hybrid pipeline was implemented: regex rules were applied first, followed by targeted prompting of an on-premises Llama-3.1-8B-Instruct LLM only when regex failed or returned ambiguous results. Performance was compared against regex-only and LLM-only approaches using coverage, precision, recall, and F1-score.

Results: In CMR reports, the hybrid pipeline achieved perfect coverage (1.00) and F1-score (1.00), versus 0.98 coverage and 0.93 F1-score with regex alone, while reducing computational cost by approximately 75% compared with the LLM-only approach. In CPET reports, the hybrid approach improved F1-score from 0.74 (regex alone) to 0.98, with particularly large gains for semantically complex variables (e.g., peak VO₂, test termination reason).

Conclusions: A hybrid pipeline combining regex with an on-premises LLM provides near-perfect, efficient, and HIPAA-compliant extraction of clinical metrics from unstructured cardiology reports, offering a scalable solution for retrospective research and quality improvement in congenital heart disease.
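The two-stage control flow described in the Methods (regex first, LLM only on failure or ambiguity) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the pattern `RVEF_PATTERN`, the function names, the choice of RV ejection fraction as the example metric, and the rule that conflicting matches count as "ambiguous" are all assumptions made for illustration. The LLM is represented by a caller-supplied function rather than any specific model API.

```python
import re
from typing import Callable, Optional, Tuple

# Hypothetical pattern for one metric (RV ejection fraction, in percent).
# The abstract does not list the actual regex rules.
RVEF_PATTERN = re.compile(
    r"\bRV\s*EF\b\D{0,15}(\d{1,3}(?:\.\d+)?)\s*%", re.IGNORECASE
)


def extract_with_regex(report: str) -> Optional[float]:
    """Return the value only if regex finds exactly one distinct match."""
    values = {float(v) for v in RVEF_PATTERN.findall(report)}
    if len(values) == 1:
        return values.pop()
    # No match, or conflicting matches (treated here as "ambiguous").
    return None


def extract_hybrid(
    report: str, llm_extract: Callable[[str], Optional[float]]
) -> Tuple[Optional[float], str]:
    """Stage 1: regex. Stage 2: call the LLM only when stage 1 gives no answer."""
    value = extract_with_regex(report)
    if value is not None:
        return value, "regex"
    # llm_extract stands in for a prompt sent to an on-premises model.
    return llm_extract(report), "llm"
```

Because the LLM is invoked only for reports that regex cannot resolve, the number of model calls scales with the regex failure rate rather than with the total number of reports, which is the mechanism behind the cost reduction the abstract reports relative to an LLM-only approach.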

Topics

cardiovascular medicine
