Improving diagnostic accuracy in preoperative glioma classification: performance of knowledge-enhanced large language models compared with radiologists.

November 27, 2025

Authors

Li S, Fang X, Jin Y, Deng Y, Hu W, Wu B, Zhou X, Wang G, Li K, Yue Q

Affiliations (6)

  • Department of Radiology, West China Hospital, Sichuan University, Chengdu, China.
  • West China Biomedical Big Data Center, West China Hospital, Sichuan University, Chengdu, China.
  • School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, USA.
  • School of Mechanical and Electrical Engineering, University of Electronic Science and Technology of China, Chengdu, China.
  • West China Biomedical Big Data Center, West China Hospital, Sichuan University, Chengdu, China. [email protected].
  • Department of Radiology, West China Hospital, Sichuan University, Chengdu, China. [email protected].

Abstract

Accurate preoperative MRI classification of gliomas is essential but challenging because of complex radiological features and inter-observer variability. This study evaluated three large language models (LLMs) for VASARI-based glioma classification and compared them with radiologist interpretations. We retrospectively analyzed 150 histopathologically confirmed gliomas (43 circumscribed astrocytic, 53 high-grade diffuse, and 54 low-grade diffuse gliomas) imaged with standardized MRI protocols. Three radiologists extracted VASARI features, and three LLMs (GPT-4, Claude3.5-Sonnet, Claude3.0-Opus) analyzed these features using either standard input-output prompting or knowledge-enhanced prompting that incorporated diagnostic guidelines. Knowledge-enhanced prompting consistently outperformed standard prompting and improved diagnostic consistency (intra-model agreement: Sonnet κ = 0.91, Opus κ = 0.92, GPT-4 κ = 0.72). For diffuse versus circumscribed classification, senior radiologists (AUC = 0.88) and Claude3.5-Sonnet with knowledge-enhanced prompting (AUC = 0.84) performed similarly (p > 0.05). LLM assistance significantly improved junior radiologists' performance, raising AUC from 0.77 to 0.83 (p = 0.026). Knowledge-enhanced LLMs demonstrate diagnostic performance comparable to that of experienced radiologists and improve junior radiologists' accuracy, suggesting potential as decision-support tools requiring radiologist oversight.
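The abstract reports two kinds of metric: κ for intra-model agreement and AUC for diagnostic discrimination. As a rough illustration of what those numbers measure, the sketch below computes both on toy data. It assumes Cohen's kappa between two repeated runs of one model and a rank-based AUC for a binary task; the abstract does not state which kappa variant or AUC estimator the authors used, and the function names and all data here are hypothetical.

```python
from collections import Counter


def cohen_kappa(run_a, run_b):
    """Cohen's kappa between two equal-length sequences of class labels."""
    if len(run_a) != len(run_b) or not run_a:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(run_a)
    # Observed agreement: fraction of cases given the same label in both runs.
    observed = sum(a == b for a, b in zip(run_a, run_b)) / n
    # Chance agreement: product of the marginal label frequencies, summed over labels.
    freq_a, freq_b = Counter(run_a), Counter(run_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both runs used one identical label for every case
    return (observed - expected) / (1.0 - expected)


def auc(scores, labels):
    """AUC as the probability that a positive case outscores a negative one.

    Ties count as half a win. Labels must be 1 (positive) or 0 (negative).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: two repeated runs of one model on six cases.
run_a = ["diffuse", "diffuse", "circumscribed", "circumscribed", "diffuse", "circumscribed"]
run_b = ["diffuse", "diffuse", "circumscribed", "diffuse", "diffuse", "circumscribed"]

# Hypothetical scores for "diffuse" (label 1) versus "circumscribed" (label 0).
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2]
labels = [1, 1, 1, 0, 0, 0]

if __name__ == "__main__":
    print(f"kappa = {cohen_kappa(run_a, run_b):.3f}")
    print(f"AUC   = {auc(scores, labels):.3f}")
```

On this toy data the two runs agree on five of six cases, but half of that agreement is expected by chance, so κ is lower than the raw agreement rate. This is why κ rather than percent agreement is the usual consistency measure.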

Topics

Journal Article
