Vision-language model for report generation and outcome prediction in CT pulmonary angiogram.

Authors

Zhong Z, Wang Y, Wu J, Hsu WC, Somasundaram V, Bi L, Kulkarni S, Ma Z, Collins S, Baird G, Ahn SH, Feng X, Kamel I, Lin CT, Greineder C, Atalay M, Jiao Z, Bai H

Affiliations (9)

  • Department of Diagnostic Imaging, Brown University Health, Providence, RI, USA.
  • Warren Alpert Medical School of Brown University, Providence, RI, USA.
  • Department of Biomedical Engineering, Johns Hopkins University School of Medicine, Baltimore, MD, USA.
  • Second Xiangya Hospital, Central South University, Changsha, Hunan, China.
  • Department of Medical Imaging and Intervention, Chang Gung Memorial Hospital at Linkou, Taoyuan, Taiwan, ROC.
  • Department of Radiology and Radiological Sciences, Johns Hopkins University School of Medicine, Baltimore, MD, USA.
  • Carina AI, Lexington, KY, USA.
  • Department of Radiology, University of Colorado School of Medicine, Aurora, CO, USA.
  • Department of Emergency Medicine and Department of Pharmacology, University of Michigan, Ann Arbor, MI, USA.

Abstract

Accurate and comprehensive interpretation of pulmonary embolism (PE) from Computed Tomography Pulmonary Angiography (CTPA) scans remains a clinical challenge due to the limited specificity and structure of existing AI tools. We propose an agent-based framework that integrates Vision-Language Models (VLMs) for detecting 32 PE-related abnormalities and Large Language Models (LLMs) for structured report generation. Trained on over 69,000 CTPA studies from 24,890 patients across Brown University Health (BUH), Johns Hopkins University (JHU), and the INSPECT dataset from Stanford, the model demonstrated strong performance in abnormality classification and report generation. For abnormality classification, it achieved AUROC scores of 0.788 (BUH), 0.754 (INSPECT), and 0.710 (JHU), with corresponding BERT-F1 scores of 0.891, 0.829, and 0.842. The abnormality-guided reporting strategy consistently outperformed the organ-based and holistic captioning baselines. For survival prediction, a multimodal fusion model that incorporates imaging, clinical variables, diagnostic outputs, and generated reports achieved concordance indices of 0.863 (BUH) and 0.731 (JHU), outperforming the traditional Pulmonary Embolism Severity Index (PESI) score. This framework provides a clinically meaningful and interpretable solution for end-to-end PE diagnosis, structured reporting, and outcome prediction.
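The abstract describes an abnormality-guided pipeline (a VLM scores PE-related findings, an LLM turns the positive and negative findings into a structured report) and a survival model evaluated by concordance index. The sketch below is a minimal, illustrative reconstruction of that flow under stated assumptions: the label subset, the classify_abnormalities stub, the prompt format, and the 0.5 threshold are hypothetical placeholders rather than the authors' actual 32-label set, model, or prompting scheme; the concordance-index function is the standard pairwise definition commonly used for survival evaluation, not necessarily the paper's exact implementation.

```python
import random
from typing import Dict, List

# Illustrative subset of finding labels (hypothetical; the paper covers 32 PE-related abnormalities).
ABNORMALITY_LABELS: List[str] = [
    "pulmonary_embolism", "right_heart_strain", "pleural_effusion", "consolidation",
]

def classify_abnormalities(ctpa_volume) -> Dict[str, float]:
    """Stand-in for the VLM abnormality classifier: one probability per finding.
    Random scores here; the real model would score the CTPA volume."""
    return {label: random.random() for label in ABNORMALITY_LABELS}

def build_report_prompt(findings: Dict[str, float], threshold: float = 0.5) -> str:
    """Abnormality-guided prompting: pass explicit positives and pertinent
    negatives to the LLM that drafts the structured report."""
    positives = [f for f, p in findings.items() if p >= threshold]
    negatives = [f for f, p in findings.items() if p < threshold]
    return (
        "Write a structured CTPA report.\n"
        f"Positive findings: {', '.join(positives) or 'none'}\n"
        f"Pertinent negatives: {', '.join(negatives) or 'none'}\n"
    )

def concordance_index(times: List[float], events: List[int], risks: List[float]) -> float:
    """Pairwise concordance index: among comparable pairs, the fraction where
    the higher predicted risk belongs to the subject with the earlier event."""
    concordant, comparable = 0.0, 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            # Comparable pair: subject i had an observed event before subject j's time.
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1
                elif risks[i] == risks[j]:
                    concordant += 0.5
    return concordant / comparable if comparable else float("nan")

if __name__ == "__main__":
    findings = classify_abnormalities(ctpa_volume=None)  # placeholder input
    print(build_report_prompt(findings))
    # Toy survival data: risk ordering matches event ordering, so c-index = 1.0.
    print(concordance_index(times=[2.0, 5.0, 8.0], events=[1, 1, 0], risks=[0.9, 0.4, 0.1]))
```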

Topics

Journal Article
