Predicting molecular types of adult-type diffuse gliomas based on MRI reports with large language models.
Affiliations (15)
- Department of Radiology and Research Institute of Radiological Science and Center for Clinical Imaging Data Science, Yonsei University College of Medicine, Seoul, Korea.
- Department of Biomedical Systems Informatics, Yonsei University College of Medicine, Seoul, Korea.
- Department of Psychiatry, Yonsei University College of Medicine, Seoul, Korea.
- Institute of Behavioral Science in Medicine, Yonsei University College of Medicine, Seoul, Korea.
- Department of Radiology, Seoul National University Hospital, Seoul, Korea.
- Department of Radiology and Research Institute of Radiology, University of Ulsan College of Medicine, Seoul, Korea.
- Department of Neurosurgery, Yonsei University College of Medicine, Seoul, Korea.
- Department of Pathology, Yonsei University College of Medicine, Seoul, Korea.
- Division for Computational Radiology & Clinical AI (CCIBonn.ai), Clinic for Neuroradiology, University Hospital Bonn, Bonn, Germany.
- Medical Faculty Bonn, University of Bonn, Bonn, Germany.
- Division of Medical Image Computing, German Cancer Research Center, Heidelberg, Germany.
- Department of Biomedical Systems Informatics, Yonsei University College of Medicine, Seoul, Korea.
- Department of Psychiatry, Yonsei University College of Medicine, Seoul, Korea.
- Institute of Behavioral Science in Medicine, Yonsei University College of Medicine, Seoul, Korea.
- Department of Radiology and Research Institute of Radiological Science and Center for Clinical Imaging Data Science, Yonsei University College of Medicine, Seoul, Korea.
Abstract
To evaluate the performance of large language models (LLMs) in predicting molecular types of adult-type diffuse gliomas according to the 2021 WHO classification using MRI radiology reports.
This retrospective study included 2169 patients diagnosed with adult-type diffuse gliomas (294 oligodendrogliomas, 295 IDH-mutant astrocytomas, and 1580 IDH-wildtype glioblastomas) between July 2005 and March 2024 at four hospitals in Asia and Europe. Seven proprietary and open-source LLMs were assessed: GPT-4o-mini, GPT-4.1-mini, Llama 3.1 8B, Llama 3.1 70B, Qwen2.5 7B, Deepseek-r1 8B, and Mistral 7B. Each model's accuracy in classifying molecular types was compared between a prompt that supplied relevant knowledge of glioma imaging findings (knowledge-based prompt) and one that did not (naïve prompt). The impact of radiologists' subspecialization in neuro-oncology, report quality, and reporting language on LLM performance was also evaluated.
With the knowledge-based prompt, accuracy was significantly higher than with the naïve prompt for three models (naïve vs. knowledge-based: GPT-4o-mini, 77.0% vs. 79.1%; Qwen2.5 7B, 75.9% vs. 79.5%; Deepseek-r1 8B, 66.0% vs. 73.2%; all p < 0.001) and comparable for three others (GPT-4.1-mini, 78.7% vs. 78.6%; Llama 3.1 70B, 78.0% vs. 78.1%; Mistral 7B, 58.4% vs. 57.4%). The exception was Llama 3.1 8B, whose accuracy fell from 65.4% to 44.6% (p < 0.001). Differences in accuracy between the two prompts were more pronounced in smaller LLMs. Additionally, accuracy was significantly higher for reports written by neuro-oncology specialists and for high-quality reports in all LLMs (p < 0.001).
LLMs may provide preoperative information on the tumor types of adult-type diffuse gliomas from MRI reports when relevant knowledge is supplied in the prompt. Informative and descriptive reports could further enhance LLM performance.
Question This study evaluated the ability of large language models (LLMs) to efficiently predict molecular types of adult-type diffuse gliomas according to the 2021 WHO classification.
Findings Larger models generally showed better accuracy and were less sensitive to domain-specific knowledge. Performance improved with high-quality, longer reports or reports written by neuro-oncology specialists.
Clinical relevance These findings highlight the potential role of LLMs in predicting glioma molecular types, underscoring the importance of informative and descriptive reports in enhancing their performance.
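As a reading aid, the short Python sketch below recomputes two quantities from the figures reported in the abstract: the class composition of the cohort and the change in accuracy from the naïve to the knowledge-based prompt for each model. The majority-class baseline (always predicting IDH-wildtype glioblastoma) is not reported in the abstract. It is added here only as an assumed point of reference, and it presumes that accuracy was computed over the full cohort of 2169 patients, which the abstract does not state. All variable names are illustrative.

```python
# Cohort counts as reported in the abstract.
counts = {
    "oligodendroglioma": 294,
    "IDH-mutant astrocytoma": 295,
    "IDH-wildtype glioblastoma": 1580,
}
total = sum(counts.values())  # should match the reported 2169 patients

# Share of each tumor type in the cohort, in percent.
shares = {name: round(100 * n / total, 1) for name, n in counts.items()}

# ASSUMPTION (not from the source): a trivial classifier that always
# predicts the most frequent class, used only as a reference point.
majority_class = max(counts, key=counts.get)
majority_baseline = shares[majority_class]

# Reported accuracy in percent: (naive prompt, knowledge-based prompt).
reported = {
    "GPT-4o-mini": (77.0, 79.1),
    "GPT-4.1-mini": (78.7, 78.6),
    "Llama 3.1 8B": (65.4, 44.6),
    "Llama 3.1 70B": (78.0, 78.1),
    "Qwen2.5 7B": (75.9, 79.5),
    "Deepseek-r1 8B": (66.0, 73.2),
    "Mistral 7B": (58.4, 57.4),
}

# Change in accuracy (percentage points) when knowledge is added to the prompt.
deltas = {
    model: round(knowledge - naive, 1)
    for model, (naive, knowledge) in reported.items()
}

if __name__ == "__main__":
    print(f"Total patients: {total}")
    print(f"Majority class: {majority_class} ({majority_baseline}%)")
    for model, delta in sorted(deltas.items(), key=lambda kv: kv[1]):
        print(f"{model:>15}: {delta:+.1f} points")
```

Under these assumptions, glioblastoma makes up roughly 72.8% of the cohort, so overall accuracy alone says little about how well the two rarer IDH-mutant types are separated. Per-class metrics would be needed for that, and the abstract does not report them.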