Effect of ChatGPT-Assisted Reflective Reasoning on Guideline-Concordant Procedural Decision-Making Among Early-Career Interventional Radiologists.

January 30, 2026

Authors

Yasar Y, Demir M, Canturk A, Ozyilmaz S, Turgan AH, Agackaya Y

Affiliations (3)

  • University of Health Sciences, Umraniye Training and Research Hospital, Department of Radiology, Istanbul, Turkey (Y.Y., M.D., S.O., A.H.T., Y.A.). Electronic address: [email protected].
  • University of Health Sciences, Umraniye Training and Research Hospital, Department of Radiology, Istanbul, Turkey (Y.Y., M.D., S.O., A.H.T., Y.A.).
  • University of Health Sciences, Sultan 2. Abdul Hamid Khan Educational and Research Hospital, Department of Radiology, Istanbul, Turkey (A.C.).

Abstract

This study aimed to evaluate the effect of ChatGPT-assisted reflective reasoning on guideline-concordant procedural decision-making among early-career interventional radiologists, using standardized clinical scenarios based on the American College of Radiology Appropriateness Criteria. This prospective simulation-based study included 128 scenarios across common interventional radiology indications. Two expert interventional radiologists served as the reference standard. Three early-career radiologists completed all scenarios twice: first independently (pre-ChatGPT) and, after a two-month washout period, with access to ChatGPT-generated reasoning before recording final decisions (post-ChatGPT). Guideline concordance was assessed using a three-tier scoring system (appropriate = 2, may be appropriate = 1, inappropriate = 0) and a binary score reflecting avoidance of inappropriate decisions. Pre-post differences were analyzed with the Wilcoxon signed-rank and McNemar tests, and agreement with the experts was measured using Cohen's kappa. ChatGPT-assisted reflective reasoning significantly improved guideline-concordant decision-making: the mean detailed compliance score increased from 1.697 to 1.900, and minimal compliance rose from 90.89% to 98.70%. Thirty scenario-level decisions shifted from inappropriate to guideline-concordant selections (McNemar χ² = 27.03; p < 0.0001), and detailed compliance improved significantly for all three radiologists (p < 0.01). Weighted Cohen's kappa increased from 0.08-0.13 to 0.21-0.30, indicating better agreement with expert consensus, and performance variability decreased, narrowing the gap between early-career radiologists and experts. ChatGPT-assisted reflective reasoning thus enhanced guideline alignment and reduced inappropriate procedural selections among early-career interventional radiologists. These findings support the role of large language models as cognitive support tools during early clinical practice and warrant prospective evaluation in real-world settings.
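For readers who want to reproduce the style of analysis the abstract describes, below is a minimal Python sketch, not the authors' code: a paired Wilcoxon signed-rank test on the three-tier scores, McNemar's test on the binary "avoided an inappropriate choice" outcome, and weighted Cohen's kappa against the expert reference. The scores are simulated placeholders; only the choice of tests and the scoring scales come from the abstract.

```python
# Illustrative sketch of the pre/post analysis described in the abstract.
# All data below are HYPOTHETICAL; only the test choices mirror the paper.
import numpy as np
from scipy.stats import wilcoxon
from sklearn.metrics import cohen_kappa_score
from statsmodels.stats.contingency_tables import mcnemar

rng = np.random.default_rng(0)

# Simulated per-scenario scores for one early-career radiologist on the
# 3-tier ACR scale (appropriate = 2, may be appropriate = 1, inappropriate = 0),
# before (pre) and after (post) access to ChatGPT-generated reasoning.
pre = rng.choice([0, 1, 2], size=128, p=[0.10, 0.20, 0.70])
post = np.minimum(pre + rng.choice([0, 1], size=128), 2)  # simulated improvement
expert = rng.choice([0, 1, 2], size=128, p=[0.05, 0.15, 0.80])

# Detailed compliance: mean 3-tier score, paired Wilcoxon signed-rank test.
stat, p = wilcoxon(pre, post)
print(f"mean pre={pre.mean():.3f}, post={post.mean():.3f}, Wilcoxon p={p:.4f}")

# Minimal compliance: binary "avoided an inappropriate choice" (score > 0),
# compared pre vs. post with McNemar's test on the paired 2x2 table.
ok_pre, ok_post = pre > 0, post > 0
table = [[np.sum(ok_pre & ok_post), np.sum(ok_pre & ~ok_post)],
         [np.sum(~ok_pre & ok_post), np.sum(~ok_pre & ~ok_post)]]
res = mcnemar(table, exact=False, correction=True)
print(f"McNemar chi2={res.statistic:.2f}, p={res.pvalue:.4f}")

# Agreement with the expert reference standard via weighted Cohen's kappa.
for label, scores in [("pre", pre), ("post", post)]:
    kappa = cohen_kappa_score(expert, scores, weights="linear")
    print(f"weighted kappa ({label} vs. expert) = {kappa:.2f}")
```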

Topics

Journal Article
