
Retrieval-augmented generation-enhanced large language models for comprehensive CAD-RADS 2.0 categorization from structured coronary CTA reports.

February 20, 2026 (PubMed)

Authors

Kaba E, Çubukçu Y, Uzunibrahimoğlu B, Yılmaz YE, Çınar M, Tabakoğlu S, Bal EM, Beydüz YS, Solak M, Oğuz E, Varlık AT, Malkoç G, Beyazal M, Celiker FB, Akkaya S

Affiliations (3)

  • Department of Radiology, Training and Research Hospital, Recep Tayyip Erdogan University, 53100, Rize, Türkiye.
  • Department of Radiology, Training and Research Hospital, Recep Tayyip Erdogan University, 53100, Rize, Türkiye.
  • Department of Radiology, Karadeniz Technical University School of Medicine, Trabzon, Türkiye.

Abstract

To evaluate the performance of large language models (LLMs), including retrieval-augmented generation (RAG)-based approaches, in extracting components and management recommendations from structured coronary computed tomography angiography (CCTA) reports according to the Coronary Artery Disease Reporting and Data System 2.0 (CAD-RADS 2.0).

A total of 320 fully structured CCTA reports were analyzed with three LLMs: the standard closed-source ChatGPT-5, NotebookLM (a RAG-based model), and a RAG-adapted ChatGPT-5 model (ChatGPT-5-RAG). Each model extracted the CAD-RADS category, plaque burden, presence of high-risk plaque (HRP), other modifiers, full score, and management recommendations in accordance with the CAD-RADS 2.0 guidelines. LLM outputs were compared with reference standards determined by two expert cardiovascular radiologists.

ChatGPT-5-RAG showed the highest accuracy for CAD-RADS category (0.959, 95% CI: 0.932-0.976), plaque burden (0.912, 95% CI: 0.876-0.939), HRP detection (0.988, 95% CI: 0.968-0.995), other modifiers (0.950, 95% CI: 0.920-0.969), and full score (0.828, 95% CI: 0.783-0.866). The standard ChatGPT-5 showed the weakest performance across all components. Differences among the three models were statistically significant (p < 0.001). Management recommendations were rated qualitatively on a three-point Likert scale; although agreement between models was low, ChatGPT-5-RAG and NotebookLM performed almost perfectly (median score 3).

This study demonstrates that RAG-enhanced LLMs significantly improve accuracy and reliability in extracting CAD-RADS 2.0 components and generating clinical management recommendations. The findings highlight the potential of RAG-based LLMs as explainable tools for automated, standardized CCTA reporting in clinical radiology workflows.
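For orientation, the CAD-RADS category component boils down to mapping the most severe stenosis to a category. The sketch below encodes that mapping as commonly summarized for CAD-RADS 2.0 (0: no plaque or stenosis; 1: 1-24% or plaque without stenosis; 2: 25-49%; 3: 50-69%; 4A: 70-99% in one or two vessels; 4B: left main ≥50% or three-vessel ≥70% disease; 5: total occlusion). It is a simplified rule-based illustration, not the study's method (which used LLMs); the function and parameter names are hypothetical, and it omits the nondiagnostic (N) category, plaque burden, and modifiers. Check the thresholds against the CAD-RADS 2.0 guideline before relying on them.

```python
def cad_rads_category(max_stenosis: float,
                      left_main_stenosis: float = 0.0,
                      vessels_ge_70: int = 0,
                      plaque_present: bool = False) -> str:
    """Simplified CAD-RADS 2.0 stenosis category (hypothetical helper).

    Ignores the nondiagnostic (N) category, stents, grafts, and exceptions.
    Stenosis values are percentages in [0, 100].
    """
    for value in (max_stenosis, left_main_stenosis):
        if not 0 <= value <= 100:
            raise ValueError("stenosis must be a percentage between 0 and 100")
    if max_stenosis >= 100:
        return "5"   # total occlusion
    if left_main_stenosis >= 50 or vessels_ge_70 >= 3:
        return "4B"  # left main >=50% or three-vessel >=70% disease
    if max_stenosis >= 70:
        return "4A"  # 70-99% in one or two vessels
    if max_stenosis >= 50:
        return "3"   # 50-69%
    if max_stenosis >= 25:
        return "2"   # 25-49%
    if max_stenosis > 0 or plaque_present:
        return "1"   # 1-24%, or plaque without stenosis
    return "0"       # no plaque and no stenosis


print(cad_rads_category(80, vessels_ge_70=1))
print(cad_rads_category(55, left_main_stenosis=55))
```

A deterministic rule like this could serve as a consistency check on an LLM's extracted category when the structured report already lists per-segment stenosis.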
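The abstract reports accuracy separately for the category, plaque burden, HRP, other modifiers, and the full score. One way to score an output against the reference is to split each full score into those components and compare them one by one, counting the full score as correct only when every component matches. The slash-separated string format (e.g. `CAD-RADS 4A/P2/HRP/S`), the names `CadRadsScore`, `parse_cad_rads`, and `compare`, and the choice to compare modifiers as an unordered set are assumptions for illustration, not the study's evaluation protocol.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CadRadsScore:
    """One CAD-RADS 2.0 full score split into separately scored components."""
    category: str                 # e.g. "0", "3", "4A", "4B", "5", "N"
    plaque_burden: Optional[str]  # e.g. "P1".."P4"; None when absent
    hrp: bool                     # high-risk plaque modifier present
    other_modifiers: frozenset    # remaining modifiers, order-insensitive


def parse_cad_rads(text: str) -> CadRadsScore:
    """Parse strings like 'CAD-RADS 4A/P2/HRP/S' (assumed slash-separated format)."""
    body = text.strip()
    if body.upper().startswith("CAD-RADS"):
        body = body[len("CAD-RADS"):]
    tokens = [t.strip().upper() for t in body.split("/") if t.strip()]
    if not tokens:
        raise ValueError(f"no CAD-RADS category found in {text!r}")
    category, rest = tokens[0], tokens[1:]
    # Plaque burden tokens look like P1..P4; everything except HRP is an "other" modifier.
    plaque = next((t for t in rest if t.startswith("P") and t[1:].isdigit()), None)
    hrp = "HRP" in rest
    others = frozenset(t for t in rest if t not in ("HRP", plaque))
    return CadRadsScore(category, plaque, hrp, others)


def compare(pred: CadRadsScore, ref: CadRadsScore) -> dict[str, bool]:
    """Per-component agreement; the full score counts only if all components match."""
    return {
        "category": pred.category == ref.category,
        "plaque_burden": pred.plaque_burden == ref.plaque_burden,
        "hrp": pred.hrp == ref.hrp,
        "other_modifiers": pred.other_modifiers == ref.other_modifiers,
        "full_score": pred == ref,  # dataclass equality compares every field
    }


result = compare(parse_cad_rads("CAD-RADS 3/P2/HRP/S"),
                 parse_cad_rads("CAD-RADS 3/P1/S/HRP"))
print(result)
```

Treating modifiers as a set means a model is not penalized for listing correct modifiers in a different order; a stricter protocol could compare the raw strings instead.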
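The reported 95% confidence intervals are consistent with Wilson score intervals over the 320 reports; for example, 307/320 correct gives 0.959 (0.932-0.976), matching the CAD-RADS category result for ChatGPT-5-RAG. The abstract does not state the interval method, so the sketch below assumes Wilson intervals and uses correct-answer counts (307, 292, 316, 304, 265) back-calculated from the reported accuracies; `wilson_interval` is a hypothetical helper, not the authors' code.

```python
import math


def wilson_interval(correct: int, total: int, z: float = 1.96) -> tuple:
    """Wilson score interval for a binomial proportion (95% when z = 1.96)."""
    if total <= 0 or not 0 <= correct <= total:
        raise ValueError("need 0 <= correct <= total and total > 0")
    p = correct / total
    denom = 1 + z**2 / total
    center = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return center - half, center + half


# Counts out of 320 reports, back-calculated from the reported accuracies (assumption).
counts = {
    "CAD-RADS category": 307,
    "plaque burden": 292,
    "HRP": 316,
    "other modifiers": 304,
    "full score": 265,
}
for name, k in counts.items():
    lo, hi = wilson_interval(k, 320)
    print(f"{name}: {k / 320:.4f} (95% CI {lo:.3f}-{hi:.3f})")
```

Unlike the simple normal approximation, Wilson intervals stay inside [0, 1] and remain sensible near 100% accuracy, as with HRP detection here.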
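The abstract reports significant differences among the three models (p < 0.001) without naming the test. Because all three models read the same 320 reports, per-report correctness is a paired binary outcome, and Cochran's Q is one standard omnibus test for that design; the sketch below assumes it and is not confirmed as the authors' method. With three models the statistic has 2 degrees of freedom, and the chi-square survival function then reduces exactly to exp(-Q/2), so no statistics library is needed. The function name and the toy data are hypothetical.

```python
from __future__ import annotations

import math


def cochrans_q(outcomes: list[list[int]]) -> tuple[float, float]:
    """Cochran's Q for paired binary outcomes (rows = reports, columns = models).

    Returns (Q, p). The p-value uses the chi-square survival function with
    k - 1 degrees of freedom, which for k = 3 models is exactly exp(-Q / 2).
    """
    k = len(outcomes[0])
    if k != 3 or any(len(row) != k for row in outcomes):
        raise ValueError("this sketch expects exactly 3 models per row")
    col_totals = [sum(row[j] for row in outcomes) for j in range(k)]
    row_totals = [sum(row) for row in outcomes]
    n = sum(row_totals)
    denom = k * n - sum(r * r for r in row_totals)
    if denom == 0:
        raise ValueError("every report was scored identically by all models")
    q = (k - 1) * (k * sum(c * c for c in col_totals) - n * n) / denom
    return q, math.exp(-q / 2)


# Hypothetical toy data: 1 = correct; columns = (model A, model B, model C).
toy = [[1, 1, 1]] * 5 + [[1, 0, 0]] * 3 + [[1, 1, 0]] * 2
q, p = cochrans_q(toy)
print(f"Q = {q:.2f}, p = {p:.4f}")
```

A significant Q only says the models differ somewhere; pairwise McNemar tests with a multiple-comparison correction are the usual follow-up to locate which pairs differ.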

Topics

Journal Article
