Improving diagnostic accuracy in preoperative glioma classification: performance of knowledge-enhanced large language models compared with radiologists.
Authors
Affiliations (6)
- Department of Radiology, West China Hospital, Sichuan University, Chengdu, China.
- West China Biomedical Big Data Center, West China Hospital, Sichuan University, Chengdu, China.
- School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, USA.
- School of Mechanical and Electrical Engineering, University of Electronic Science and Technology of China, Chengdu, China.
- West China Biomedical Big Data Center, West China Hospital, Sichuan University, Chengdu, China.
- Department of Radiology, West China Hospital, Sichuan University, Chengdu, China.
Abstract
Accurate preoperative MRI classification of gliomas is essential but challenging because of complex radiological features and inter-observer variability. This study evaluated three large language models (LLMs) for VASARI-based glioma classification and compared them with radiologist interpretations. We retrospectively analyzed 150 histopathologically confirmed gliomas (43 circumscribed astrocytic, 53 high-grade diffuse, and 54 low-grade diffuse gliomas) using standardized MRI protocols. Three radiologists extracted VASARI features, and three LLMs (GPT-4, Claude3.5-Sonnet, Claude3.0-Opus) classified the tumors from these features using either standard input-output prompting or knowledge-enhanced prompting that incorporated diagnostic guidelines. Knowledge-enhanced prompting consistently outperformed standard prompting and improved diagnostic consistency (intra-model agreement: Sonnet κ = 0.91, Opus κ = 0.92, GPT-4 κ = 0.72). For diffuse versus circumscribed classification, senior radiologists (AUC = 0.88) and Claude3.5-Sonnet with knowledge-enhanced prompting (AUC = 0.84) performed similarly (p > 0.05). LLM assistance significantly improved junior radiologists' performance, raising AUC from 0.77 to 0.83 (p = 0.026). Knowledge-enhanced LLMs show diagnostic performance comparable to that of experienced radiologists and improve junior radiologists' accuracy, suggesting potential as decision-support tools that require radiologist oversight.
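The intra-model agreement values above are reported as κ. As a rough illustration of how such a statistic can be computed, the sketch below implements Cohen's kappa for two repeated runs of the same model on the same cases. The abstract does not say which kappa variant the authors used. The function name `cohen_kappa`, the label strings, and the six example cases are all hypothetical and are not data from the study.

```python
from collections import Counter


def cohen_kappa(run_a, run_b):
    """Cohen's kappa for two label sequences over the same cases.

    Hypothetical helper for illustration; not the authors' code.
    """
    if len(run_a) != len(run_b) or not run_a:
        raise ValueError("runs must be non-empty and of equal length")
    n = len(run_a)
    # Observed agreement: fraction of cases given the same label in both runs.
    observed = sum(a == b for a, b in zip(run_a, run_b)) / n
    # Chance agreement: sum over labels of the product of marginal frequencies.
    count_a = Counter(run_a)
    count_b = Counter(run_b)
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if expected == 1.0:
        # Both runs used a single identical label; agreement is trivially perfect.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Made-up labels for six cases, classified twice by the same model.
run_1 = ["circumscribed", "high-grade", "low-grade",
         "high-grade", "low-grade", "circumscribed"]
run_2 = ["circumscribed", "high-grade", "low-grade",
         "low-grade", "low-grade", "circumscribed"]

kappa = cohen_kappa(run_1, run_2)
print(f"kappa = {kappa:.2f}")
```

In this toy example the two runs agree on five of six cases (observed agreement 5/6) and chance agreement is 1/3, so κ comes out at 0.75.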