
CineScribe: LLM-based detection and mitigation of ambiguity in cine cardiac magnetic resonance reports.

June 20, 2026 (PubMed)

Authors

Villanueva Benito G, Descalzo ML, Pujadas S, Fernandez J, Calandrelli M, Petrone P

Affiliations (4)

  • Barcelona Institute for Global Health (ISGlobal), C/ del Dr. Aiguader 88, Barcelona, 08003, Spain; Universitat Pompeu Fabra (UPF), Barcelona, 08003, Spain.
  • Cardiac Imaging Unit, Cardiology Department, Hospital de la Santa Creu i Sant Pau, Barcelona, 08025, Spain; Universitat Autónoma de Barcelona (UAB), Barcelona, 08193, Spain.
  • Cardiac Imaging Unit, Cardiology Department, Hospital de la Santa Creu i Sant Pau, Barcelona, 08025, Spain.
  • Barcelona Institute for Global Health (ISGlobal), C/ del Dr. Aiguader 88, Barcelona, 08003, Spain; Universitat Pompeu Fabra (UPF), Barcelona, 08003, Spain; Barcelona Supercomputing Center, Edifici Omega 201, Jordi Girona 1 and 3, Barcelona, 08034, Spain.

Abstract

Cine Cardiac Magnetic Resonance (cineCMR) is widely regarded as a reference standard for evaluating left-ventricular wall motion, yet its interpretation remains variable, and free-text reports frequently contain imprecise language that can lead to miscommunication. In this study, we present CineScribe, a novel lightweight large language model (LLM) trained on expert-annotated cineCMR reports to convert free-text clinical reports into structured regional wall-motion abnormality (RWMA) representations and to support standardized report generation from validated diagnostic findings. CineScribe achieves high performance on the report-structuring task, with a macro-F1 of 0.92 (95% CI, 0.89-0.94), enabling automated extraction of RWMAs from routine clinical documentation. Beyond structuring, we introduce a Report-Level Confidence (RLC) score, derived from the model's token-level conditional probabilities, to quantify uncertainty during structured information extraction. Using an enriched dataset annotated by multiple experts, we show that reduced model confidence is strongly associated with report ambiguity, which in turn corresponds to diagnostically complex cineCMR cases characterized by increased inter-observer variability during expert video review. CineScribe detected ambiguous reports with good discrimination (ROC-AUC, 0.76; 95% CI, 0.68-0.85). Human evaluation using the QUEST framework showed that reports generated from expert-validated findings were clinically appropriate, with 78% (95% CI, 74%-83%) rated as both correct and complete. Together, these results show that model confidence during clinical information extraction can serve as a practical indicator of ambiguity in medical reporting, enabling targeted expert review of diagnostically complex cases and supporting more consistent and efficient cineCMR documentation.
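The abstract does not specify how token-level conditional probabilities are aggregated into the RLC score. The sketch below assumes one common choice, the geometric mean of the per-token probabilities p(y_t | y_<t, x) (equivalently, exp of the mean token log-probability), and then scores ambiguity detection with a rank-based ROC-AUC, the metric the abstract reports. The function names, the aggregation rule, and the toy numbers are illustrative assumptions, not the authors' implementation or data.

```python
import math
from typing import Sequence


def report_level_confidence(token_logprobs: Sequence[float]) -> float:
    """Hypothetical RLC: geometric mean of per-token conditional probabilities.

    ASSUMPTION: the abstract only says RLC is derived from token-level
    conditional probabilities; exp(mean log-prob) is one plausible choice.
    Returns a value in (0, 1]; lower means the model was less confident.
    """
    if not token_logprobs:
        raise ValueError("need at least one generated token")
    return math.exp(sum(token_logprobs) / len(token_logprobs))


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC-AUC via the Mann-Whitney formulation.

    Probability that a randomly chosen positive (label 1) scores higher than a
    randomly chosen negative (label 0); ties count as one half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example (made-up numbers): per-token log-probabilities for four reports,
# with label 1 marking a report that experts judged ambiguous.
reports = [
    [-0.01, -0.02, -0.03],  # confident extraction
    [-0.5, -1.2, -0.1],     # low-confidence extraction
    [-0.05, -0.05],
    [-0.9, -0.3],
]
ambiguous = [0, 1, 0, 1]

# Low confidence should flag ambiguity, so use 1 - RLC as the detection score.
ambiguity_scores = [1.0 - report_level_confidence(r) for r in reports]
auc = roc_auc(ambiguity_scores, ambiguous)
```

In this toy set the two low-confidence reports are exactly the ambiguous ones, so `auc` evaluates to 1.0; the reported 0.76 on real data reflects that confidence separates ambiguous from clear reports well, but not perfectly. The quadratic pairwise loop is kept for readability; a sort-based rank computation scales better to large cohorts.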

Topics

Journal Article
