CineScribe: LLM-based detection and mitigation of ambiguity in cine cardiac magnetic resonance reports.
Authors
Affiliations (4)
- Barcelona Institute for Global Health (ISGlobal), C/ del Dr. Aiguader 88, Barcelona, 08003, Spain; Universitat Pompeu Fabra (UPF), Barcelona, 08003, Spain.
- Cardiac Imaging Unit, Cardiology Department, Hospital de la Santa Creu i Sant Pau, Barcelona, 08025, Spain; Universitat Autónoma de Barcelona (UAB), Barcelona, 08193, Spain.
- Cardiac Imaging Unit, Cardiology Department, Hospital de la Santa Creu i Sant Pau, Barcelona, 08025, Spain.
- Barcelona Institute for Global Health (ISGlobal), C/ del Dr. Aiguader 88, Barcelona, 08003, Spain; Universitat Pompeu Fabra (UPF), Barcelona, 08003, Spain; Barcelona Supercomputing Center, Edifici Omega 201, Jordi Girona 1 and 3, Barcelona, 08034, Spain. Electronic address: [email protected].
Abstract
Cine cardiac magnetic resonance (cineCMR) is widely regarded as a reference standard for evaluating left-ventricular wall motion, yet interpretation remains variable, and free-text reports frequently contain imprecise language that can lead to miscommunication. In this study, we present CineScribe, a novel lightweight large language model (LLM) trained on expert-annotated cineCMR reports to convert free-text clinical reports into structured regional wall-motion abnormality (RWMA) representations and to support standardized report generation from validated diagnostic findings. CineScribe achieves a macro-F1 of 0.92 (95% CI, 0.89-0.94) on the report-structuring task, enabling automated extraction of RWMAs from routine clinical documentation. Beyond report structuring, we introduce a Report-Level Confidence (RLC) score, derived from the model's token-level conditional probabilities, to quantify uncertainty during structured information extraction. Using an enriched multi-expert annotated dataset, we show that reduced model confidence is strongly associated with report ambiguity, which in turn corresponds to diagnostically complex cineCMR cases characterized by increased inter-observer variability during expert video review. CineScribe detected ambiguous reports with good discrimination (ROC-AUC 0.76; 95% CI, 0.68-0.85). Human evaluation using the QUEST framework showed that reports generated from expert-validated findings were clinically appropriate, with 78% (95% CI, 74%-83%) rated as both correct and complete. Together, these results show that model confidence during clinical information extraction can serve as a practical indicator of ambiguity in medical reporting, enabling targeted expert review of diagnostically complex cases and supporting more consistent and efficient cineCMR documentation.
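To make the idea of a confidence score built from token-level conditional probabilities concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not say how token probabilities are aggregated, so the length-normalized geometric mean used here is an assumption, and the function names `report_level_confidence` and `roc_auc` and all numbers in the toy data are hypothetical. The second function shows one standard way to check how well low confidence separates ambiguous from unambiguous reports.

```python
import math
from typing import Sequence


def report_level_confidence(token_logprobs: Sequence[float]) -> float:
    """Aggregate per-token log-probabilities into one score in (0, 1].

    ASSUMPTION: geometric mean of token probabilities, i.e.
    exp(mean(log p_t)). The source only states that the score is derived
    from token-level conditional probabilities.
    """
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(sum(token_logprobs) / len(token_logprobs))


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC-AUC as the probability that a positive outranks a negative.

    Ties count as half a win (Mann-Whitney formulation).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: per-token log-probabilities for four structured outputs.
toy_logprobs = [
    [-0.01, -0.02, -0.01],  # confident extraction
    [-0.05, -0.03, -0.02],  # confident extraction
    [-0.90, -0.40, -1.20],  # uncertain extraction
    [-0.60, -1.10, -0.30],  # uncertain extraction
]
toy_ambiguous = [0, 0, 1, 1]  # 1 = report labelled ambiguous (made-up labels)

# Low confidence should flag ambiguity, so 1 - RLC is used as the ambiguity score.
ambiguity_scores = [1.0 - report_level_confidence(lp) for lp in toy_logprobs]
print(roc_auc(ambiguity_scores, toy_ambiguous))
```

Averaging in log space keeps the score comparable across outputs of different lengths, which a plain product of probabilities would not. Other aggregations, such as the minimum token probability, are equally plausible readings of the abstract.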