Improving diagnostic accuracy in preoperative glioma classification: performance of knowledge-enhanced large language models compared with radiologists.

November 27, 2025

Authors

Li S, Fang X, Jin Y, Deng Y, Hu W, Wu B, Zhou X, Wang G, Li K, Yue Q

Affiliations (6)

  • Department of Radiology, West China Hospital, Sichuan University, Chengdu, China.
  • West China Biomedical Big Data Center, West China Hospital, Sichuan University, Chengdu, China.
  • School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, USA.
  • School of Mechanical and Electrical Engineering, University of Electronic Science and Technology of China, Chengdu, China.
  • West China Biomedical Big Data Center, West China Hospital, Sichuan University, Chengdu, China. [email protected].
  • Department of Radiology, West China Hospital, Sichuan University, Chengdu, China. [email protected].

Abstract

Accurate preoperative MRI classification of gliomas is essential but challenging due to complex radiological features and inter-observer variability. This study evaluated three large language models (LLMs) for VASARI-based glioma classification compared to radiologist interpretations. We retrospectively analyzed 150 histopathologically confirmed gliomas (43 circumscribed astrocytic, 53 high-grade diffuse, 54 low-grade diffuse gliomas) using standardized MRI protocols. Three radiologists extracted VASARI features, while three LLMs (GPT-4, Claude3.5-Sonnet, Claude3.0-Opus) analyzed these features using standard input-output or knowledge-enhanced prompting incorporating diagnostic guidelines. Knowledge-enhanced prompting consistently outperformed standard prompting, improving diagnostic consistency (intra-model agreement: Sonnet κ = 0.91, Opus κ = 0.92, GPT-4 κ = 0.72). For diffuse versus circumscribed classification, senior radiologists (AUC = 0.88) and Claude3.5-Sonnet with knowledge-enhanced prompting (AUC = 0.84) performed similarly (p > 0.05). LLM assistance significantly improved junior radiologists' performance, with AUC increases from 0.77 to 0.83 (p = 0.026). Knowledge-enhanced LLMs demonstrate diagnostic performance comparable to experienced radiologists and improve junior accuracy, suggesting potential as decision-support tools requiring radiologist oversight.
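The two statistics reported in the abstract, Cohen's κ for intra-model agreement and AUC for the diffuse versus circumscribed task, can be illustrated with a minimal sketch. This is not the authors' analysis code: the function names (`cohen_kappa`, `binary_auc`) and the toy labels and scores are hypothetical and chosen only to show how each quantity is computed, and the study may have used a different software package or a weighted κ.

```python
from collections import Counter


def cohen_kappa(run_a, run_b):
    """Unweighted Cohen's kappa between two equal-length label sequences."""
    if len(run_a) != len(run_b) or not run_a:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(run_a)
    # Observed agreement: fraction of cases where both runs give the same label.
    observed = sum(a == b for a, b in zip(run_a, run_b)) / n
    # Chance agreement: product of the marginal label frequencies, summed over labels.
    freq_a, freq_b = Counter(run_a), Counter(run_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both runs used one identical label throughout
    return (observed - expected) / (1.0 - expected)


def binary_auc(y_true, scores):
    """AUC as the probability that a positive case outscores a negative one (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: two repeated runs of the same model on six cases.
run_1 = ["diffuse", "diffuse", "circumscribed", "circumscribed", "diffuse", "circumscribed"]
run_2 = ["diffuse", "diffuse", "circumscribed", "diffuse", "diffuse", "circumscribed"]
kappa = cohen_kappa(run_1, run_2)  # observed 5/6, chance 1/2, so kappa = 2/3

# Hypothetical toy data: 1 = diffuse, 0 = circumscribed, with model confidence scores.
truth = [1, 1, 1, 0, 0, 0]
confidence = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
auc = binary_auc(truth, confidence)  # 8 of 9 positive/negative pairs ranked correctly

print(f"kappa = {kappa:.3f}, AUC = {auc:.3f}")
```

In this sketch, κ corrects raw agreement between repeated runs for the agreement expected by chance, which is why it suits the "diagnostic consistency" comparison, while the pairwise-ranking form of AUC makes explicit what a value such as 0.84 or 0.88 means.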

Topics

Journal Article
