Page 2 of 90896 results

Performance of GPT-5 in Brain Tumor MRI Reasoning

Mojtaba Safari, Shansong Wang, Mingzhe Hu, Zach Eidex, Qiang Li, Xiaofeng Yang

arXiv preprint, Aug 14 2025
Accurate differentiation of brain tumor types on magnetic resonance imaging (MRI) is critical for guiding treatment planning in neuro-oncology. Recent advances in large language models (LLMs) have enabled visual question answering (VQA) approaches that integrate image interpretation with natural language reasoning. In this study, we evaluated GPT-4o, GPT-5-nano, GPT-5-mini, and GPT-5 on a curated brain tumor VQA benchmark derived from 3 Brain Tumor Segmentation (BraTS) datasets - glioblastoma (GLI), meningioma (MEN), and brain metastases (MET). Each case included multi-sequence MRI triplanar mosaics and structured clinical features transformed into standardized VQA items. Models were assessed in a zero-shot chain-of-thought setting for accuracy on both visual and reasoning tasks. Results showed that GPT-5-mini achieved the highest macro-average accuracy (44.19%), followed by GPT-5 (43.71%), GPT-4o (41.49%), and GPT-5-nano (35.85%). Performance varied by tumor subtype, with no single model dominating across all cohorts. These findings suggest that GPT-5 family models can achieve moderate accuracy in structured neuro-oncological VQA tasks, but not at a level acceptable for clinical use.
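
The macro-average accuracy reported above can be illustrated with a short sketch. It assumes that "macro-average" means the unweighted mean of per-cohort accuracies (the abstract does not say whether the averaging is over cohorts or over question types). The function name and all counts below are hypothetical and are not taken from the study.

```python
def macro_average_accuracy(correct_by_cohort, total_by_cohort):
    """Return (macro average, per-cohort accuracies).

    Assumption: each cohort contributes equally, regardless of its size.
    """
    per_cohort = {
        cohort: correct_by_cohort[cohort] / total_by_cohort[cohort]
        for cohort in total_by_cohort
    }
    macro = sum(per_cohort.values()) / len(per_cohort)
    return macro, per_cohort


# Hypothetical counts for illustration only (not results from the paper).
correct = {"GLI": 45, "MEN": 20, "MET": 30}
total = {"GLI": 100, "MEN": 50, "MET": 60}

macro, per_cohort = macro_average_accuracy(correct, total)

# The micro average pools all items, so larger cohorts weigh more.
micro = sum(correct.values()) / sum(total.values())
```

With unequal cohort sizes the macro and micro averages generally differ, which is one reason a per-cohort breakdown is worth reading alongside a single headline number.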

SingleStrip: learning skull-stripping from a single labeled example

Bella Specktor-Fadida, Malte Hoffmann

arXiv preprint, Aug 14 2025
Deep learning segmentation relies heavily on labeled data, but manual labeling is laborious and time-consuming, especially for volumetric images such as brain magnetic resonance imaging (MRI). While recent domain-randomization techniques alleviate the dependency on labeled data by synthesizing diverse training images from label maps, they offer limited anatomical variability when very few label maps are available. Semi-supervised self-training addresses label scarcity by iteratively incorporating model predictions into the training set, enabling networks to learn from unlabeled data. In this work, we combine domain randomization with self-training to train three-dimensional skull-stripping networks using as little as a single labeled example. First, we automatically bin voxel intensities, yielding labels we use to synthesize images for training an initial skull-stripping model. Second, we train a convolutional autoencoder (AE) on the labeled example and use its reconstruction error to assess the quality of brain masks predicted for unlabeled data. Third, we select the top-ranking pseudo-labels to fine-tune the network, achieving skull-stripping performance on out-of-distribution data that approaches models trained with more labeled images. We compare AE-based ranking to consistency-based ranking under test-time augmentation, finding that the AE approach yields a stronger correlation with segmentation accuracy. Our results highlight the potential of combining domain randomization and AE-based quality control to enable effective semi-supervised segmentation from extremely limited labeled data. This strategy may ease the labeling burden that slows progress in studies involving new anatomical structures or emerging imaging techniques.
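
The pseudo-label selection step described above might look roughly like the following sketch. It assumes that a lower autoencoder reconstruction error indicates a better predicted mask and that a fixed fraction of the ranked cases is kept. Both the selection rule and the names used here are hypothetical, since the abstract only states that top-ranking pseudo-labels are selected.

```python
def select_top_pseudo_labels(reconstruction_errors, fraction):
    """Rank cases by reconstruction error (ascending) and keep the best ones.

    reconstruction_errors: dict mapping a case ID to its AE reconstruction error.
    fraction: share of cases to keep (assumed selection rule, at least one case).
    """
    ranked = sorted(reconstruction_errors, key=reconstruction_errors.get)
    keep = max(1, int(len(ranked) * fraction))
    return ranked[:keep]


# Hypothetical errors for four unlabeled scans.
errors = {"scan_a": 0.30, "scan_b": 0.10, "scan_c": 0.20, "scan_d": 0.50}
selected = select_top_pseudo_labels(errors, fraction=0.5)
```

The selected cases would then be added to the training set for fine-tuning; the fraction is a tunable trade-off between pseudo-label quantity and quality.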

Automated Segmentation of Coronal Brain Tissue Slabs for 3D Neuropathology

Jonathan Williams Ramirez, Dina Zemlyanker, Lucas Deden-Binder, Rogeny Herisse, Erendira Garcia Pallares, Karthik Gopinath, Harshvardhan Gazula, Christopher Mount, Liana N. Kozanno, Michael S. Marshall, Theresa R. Connors, Matthew P. Frosch, Mark Montine, Derek H. Oakley, Christine L. Mac Donald, C. Dirk Keene, Bradley T. Hyman, Juan Eugenio Iglesias

arXiv preprint, Aug 13 2025
Advances in image registration and machine learning have recently enabled volumetric analysis of postmortem brain tissue from conventional photographs of coronal slabs, which are routinely collected in brain banks and neuropathology laboratories worldwide. One caveat of this methodology is the requirement of segmentation of the tissue from photographs, which currently requires costly manual intervention. In this article, we present a deep learning model to automate this process. The automatic segmentation tool relies on a U-Net architecture that was trained with a combination of (i) 1,414 manually segmented images of both fixed and fresh tissue, from specimens with varying diagnoses, photographed at two different sites; and (ii) 2,000 synthetic images with randomized contrast and corresponding masks generated from MRI scans for improved generalizability to unseen photographic setups. Automated model predictions on a subset of photographs not seen in training were analyzed to estimate performance compared to manual labels, including both inter- and intra-rater variability. Our model achieved a median Dice score over 0.98, mean surface distance under 0.4 mm, and 95% Hausdorff distance under 1.60 mm, which approaches inter-/intra-rater levels. Our tool is publicly available at surfer.nmr.mgh.harvard.edu/fswiki/PhotoTools.
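
For readers unfamiliar with the Dice score quoted above, here is a minimal sketch of how it is computed for two binary masks: Dice = 2|A ∩ B| / (|A| + |B|). The masks are flattened to plain lists of 0/1 values for simplicity. The function name and example masks are illustrative and are not part of the published tool.

```python
def dice_score(predicted, reference):
    """Dice overlap between two flattened binary masks of equal length."""
    if len(predicted) != len(reference):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(1 for p, r in zip(predicted, reference) if p and r)
    size_sum = sum(1 for p in predicted if p) + sum(1 for r in reference if r)
    if size_sum == 0:
        # Convention assumed here: two empty masks agree perfectly.
        return 1.0
    return 2.0 * intersection / size_sum


# Toy 4-pixel masks: one overlapping pixel, three foreground pixels in total.
predicted_mask = [1, 1, 0, 0]
manual_mask = [1, 0, 0, 0]
score = dice_score(predicted_mask, manual_mask)
```

A score of 1.0 means the masks are identical and 0.0 means they share no foreground pixels.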

Automatic detection of arterial input function for brain DCE-MRI in multi-site cohorts.

Saca L, Gaggar R, Pappas I, Benzinger T, Reiman EM, Shiroishi MS, Joe EB, Ringman JM, Yassine HN, Schneider LS, Chui HC, Nation DA, Zlokovic BV, Toga AW, Chakhoyan A, Barnes S

PubMed paper, Aug 13 2025
Arterial input function (AIF) extraction is a crucial step in quantitative pharmacokinetic modeling of DCE-MRI. This work proposes a robust deep learning model that can precisely extract an AIF from DCE-MRI images. A diverse dataset of human brain DCE-MRI images from 289 participants, totaling 384 scans, from five different institutions with extracted gadolinium-based contrast agent curves from large penetrating arteries, and with most data collected for blood-brain barrier (BBB) permeability measurement, was retrospectively analyzed. A 3D UNet model was implemented and trained on manually drawn AIF regions. The testing cohort was compared using proposed AIF quality metric AIFitness and K^trans values from a standard DCE pipeline. This UNet was then applied to a separate dataset of 326 participants with a total of 421 DCE-MRI images with analyzed AIF quality and K^trans values. The resulting 3D UNet model achieved an average AIFitness score of 93.9 compared to 99.7 for manually selected AIFs, and white matter K^trans values were 0.45 × 10^-3/min and 0.45 × 10^-3/min, respectively. The intraclass correlation between automated and manual K^trans values was 0.89. The separate replication dataset yielded an AIFitness score of 97.0 and white matter K^trans of 0.44 × 10^-3/min. Findings suggest a 3D UNet model with additional convolutional neural network kernels and a modified Huber loss function achieves superior performance for identifying AIF curves from DCE-MRI in a diverse multi-center cohort. AIFitness scores and DCE-MRI-derived metrics, such as K^trans maps, showed no significant differences in gray and white matter between manually drawn and automated AIFs.

Multimodal ensemble machine learning predicts neurological outcome within three hours after out of hospital cardiac arrest.

Kawai Y, Yamamoto K, Tsuruta K, Miyazaki K, Asai H, Fukushima H

PubMed paper, Aug 13 2025
This study aimed to determine if an ensemble (stacking) model that integrates three independently developed base models can reliably predict patients' neurological outcomes following out-of-hospital cardiac arrest (OHCA) within 3 h of arrival and outperform each individual model. This retrospective study included patients with OHCA (≥ 18 years) admitted directly to Nara Medical University between April 2015 and March 2024 who remained comatose for ≥ 3 h after arrival and had suitable head computed tomography (CT) images. The area under the receiver operating characteristic curve (AUC) and Brier scores were used to evaluate the performance of four models (resuscitation-related background OHCA score factors, bilateral pupil diameter, single-slice head CT within 3 h of arrival, and an ensemble stacked model combining these three models) in predicting favourable neurological outcomes at hospital discharge or 1 month, as defined by a Cerebral Performance Category scale of 1-2. Among 533 patients, 82 (15%) had favourable outcomes. The OHCA, pupil, and head CT models yielded AUCs of 0.76, 0.65, and 0.68 with Brier scores of 0.11, 0.13, and 0.12, respectively. The ensemble model outperformed the other models (AUC, 0.82; Brier score, 0.10), thereby supporting its application for early clinical decision-making and optimising resource allocation.
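
The two evaluation metrics named above can be sketched in a few lines. The Brier score is the mean squared difference between predicted probability and the 0/1 outcome (lower is better), and the AUC is the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The function names and toy data are hypothetical; this is not the study's evaluation code.

```python
def brier_score(probabilities, outcomes):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for p, y in zip(probabilities, outcomes)) / len(outcomes)


def auc(probabilities, outcomes):
    """Pairwise (Mann-Whitney) estimate of the area under the ROC curve."""
    positives = [p for p, y in zip(probabilities, outcomes) if y == 1]
    negatives = [p for p, y in zip(probabilities, outcomes) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    # A positive scored above a negative counts 1, a tie counts 0.5.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Toy predictions for four patients (illustrative values only).
predicted = [0.9, 0.8, 0.3, 0.1]
observed = [1, 0, 1, 0]
toy_brier = brier_score(predicted, observed)
toy_auc = auc(predicted, observed)
```

The AUC measures ranking quality only, while the Brier score also penalises poorly calibrated probabilities, which is why the two are often reported together.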

Machine Learning-Driven Radiomic Profiling of Thalamus-Amygdala Nuclei for Prediction of Postoperative Delirium After STN-DBS in Parkinson's Disease Patients: A Pilot Study.

Radziunas A, Davidavicius G, Reinyte K, Pranckeviciene A, Fedaravicius A, Kucinskas V, Laucius O, Tamasauskas A, Deltuva V, Saudargiene A

PubMed paper, Aug 13 2025
Postoperative delirium is a common complication following sub-thalamic nucleus deep brain stimulation surgery in Parkinson's disease patients. Postoperative delirium has been shown to prolong hospital stays, harm cognitive function, and negatively impact outcomes. Utilizing radiomics as a predictive tool for identifying patients at risk of delirium is a novel and personalized approach. This pilot study analyzed preoperative T1-weighted and T2-weighted magnetic resonance images from 34 Parkinson's disease patients, which were used to segment the thalamus, amygdala, and hippocampus, resulting in 10,680 extracted radiomic features. Feature selection using the minimum redundancy maximal relevance method identified the 20 most informative features, which were input into eight different machine learning algorithms. A high predictive accuracy of postoperative delirium was achieved by applying regularized binary logistic regression and linear discriminant analysis and using the 10 most informative radiomic features. Regularized logistic regression resulted in 96.97% (±6.20) balanced accuracy, 99.5% (±4.97) sensitivity, 94.43% (±10.70) specificity, and area under the receiver operating characteristic curve of 0.97 (±0.06). Linear discriminant analysis showed 98.42% (±6.57) balanced accuracy, 98.00% (±9.80) sensitivity, 98.83% (±4.63) specificity, and area under the receiver operating characteristic curve of 0.98 (±0.07). The feed-forward neural network also demonstrated strong predictive capacity, achieving 96.17% (±10.40) balanced accuracy, 94.5% (±19.87) sensitivity, 97.83% (±7.87) specificity, and an area under the receiver operating characteristic curve of 0.96 (±0.10). However, when the feature set was extended to 20 features, both logistic regression and linear discriminant analysis showed reduced performance, while the feed-forward neural network achieved the highest predictive accuracy of 99.28% (±2.71), with 100.0% (±0.00) sensitivity, 98.57% (±5.42) specificity, and an area under the receiver operating characteristic curve of 0.99 (±0.03). Selected radiomic features might indicate network dysfunction between thalamic laterodorsal, reuniens medial ventral, and amygdala basal nuclei with hippocampus cornu ammonis 4 in these patients. This finding expands previous research suggesting the importance of the thalamic-hippocampal-amygdala network for postoperative delirium due to alterations in neuronal activity.

Current imaging applications, radiomics, and machine learning modalities of CNS demyelinating disorders and its mimickers.

Alam Z, Maddali A, Patel S, Weber N, Al Rikabi S, Thiemann D, Desai K, Monoky D

PubMed paper, Aug 12 2025
Distinguishing among neuroinflammatory demyelinating diseases of the central nervous system can present a significant diagnostic challenge due to substantial overlap in clinical presentations and imaging features. Collaboration between specialists, novel antibody testing, and dedicated magnetic resonance imaging protocols have helped to narrow the diagnostic gap, but challenging cases remain. Machine learning algorithms have proven to be able to identify subtle patterns that escape even the most experienced human eye. Indeed, machine learning and the subfield of radiomics have demonstrated exponential growth and improvement in diagnosis capacity within the past decade. The sometimes daunting diagnostic overlap of various demyelinating processes thus provides a unique opportunity: can the elite pattern recognition powers of machine learning close the gap in making the correct diagnosis? This review specifically focuses on neuroinflammatory demyelinating diseases, exploring the role of artificial intelligence in the detection, diagnosis, and differentiation of the most common pathologies: multiple sclerosis (MS), neuromyelitis optica spectrum disorder (NMOSD), acute disseminated encephalomyelitis (ADEM), Sjogren's syndrome, MOG antibody-associated disorder (MOGAD), and neuropsychiatric systemic lupus erythematosus (NPSLE). Understanding how these tools enhance diagnostic precision may lead to earlier intervention, improved outcomes, and optimized management strategies.

Development and validation of machine learning models to predict vertebral artery injury by C2 pedicle screws.

Ye B, Sun Y, Chen G, Wang B, Meng H, Shan L

PubMed paper, Aug 12 2025
Cervical 2 pedicle screw (C2PS) fixation is widely used in posterior cervical surgery but carries risks of vertebral artery injury (VAI), a rare yet severe complication. This study aimed to identify risk factors for VAI during C2PS placement and develop a machine learning (ML)-based predictive model to enhance preoperative risk assessment. Clinical and radiological data from 280 patients undergoing head and neck CT angiography were retrospectively analyzed. Three-dimensional reconstructed images simulated C2PS placement, classifying patients into injury (n = 98) and non-injury (n = 182) groups. Fifteen variables, including patient characteristics and anatomic variables, were evaluated. Eight ML algorithms were trained (70% training cohort) and validated (30% validation cohort). Model performance was assessed using AUC, sensitivity, specificity, and SHAP (SHapley Additive exPlanations) for interpretability. Six key risk factors were identified: pedicle diameter, high-riding vertebral artery (HRVA), intra-axial vertebral artery (IAVA), vertebral artery diameter (VAD), distance between the transverse foramen and the posterior end of the vertebral body (TFPEVB), and distance between the vertebral artery and the vertebral body (VAVB). The neural network model (NNet) demonstrated optimal predictive performance, achieving AUCs of 0.929 (training) and 0.936 (validation). SHAP analysis confirmed these variables as primary contributors to VAI risk. This study established an ML-driven predictive model for VAI during C2PS placement, highlighting six critical anatomical and radiological risk factors. Integrating this model into clinical workflows may optimize preoperative planning, reduce complications, and improve surgical outcomes. External validation in multicenter cohorts is warranted to enhance generalizability.

Hierarchical Variable Importance with Statistical Control for Medical Data-Based Prediction

Joseph Paillard, Antoine Collas, Denis A. Engemann, Bertrand Thirion

arXiv preprint, Aug 12 2025
Recent advances in machine learning have greatly expanded the repertoire of predictive methods for medical imaging. However, the interpretability of complex models remains a challenge, which limits their utility in medical applications. Recently, model-agnostic methods have been proposed to measure conditional variable importance and accommodate complex non-linear models. However, they often lack power when dealing with highly correlated data, a common problem in medical imaging. We introduce Hierarchical-CPI, a model-agnostic variable importance measure that frames the inference problem as the discovery of groups of variables that are jointly predictive of the outcome. By exploring subgroups along a hierarchical tree, it remains computationally tractable, yet also enjoys explicit family-wise error rate control. Moreover, we address the issue of vanishing conditional importance under high correlation with a tree-based importance allocation mechanism. We benchmarked Hierarchical-CPI against state-of-the-art variable importance methods. Its effectiveness is demonstrated in two neuroimaging datasets: classifying dementia diagnoses from MRI data (ADNI dataset) and analyzing the Berger effect on EEG data (TDBRAIN dataset), identifying biologically plausible variables.

Dynamic Survival Prediction using Longitudinal Images based on Transformer

Bingfan Liu, Haolun Shi, Jiguo Cao

arXiv preprint, Aug 12 2025
Survival analysis utilizing multiple longitudinal medical images plays a pivotal role in the early detection and prognosis of diseases by providing insight beyond single-image evaluations. However, current methodologies often inadequately utilize censored data, overlook correlations among longitudinal images measured over multiple time points, and lack interpretability. We introduce SurLonFormer, a novel Transformer-based neural network that integrates longitudinal medical imaging with structured data for survival prediction. Our architecture comprises three key components: a Vision Encoder for extracting spatial features, a Sequence Encoder for aggregating temporal information, and a Survival Encoder based on the Cox proportional hazards model. This framework effectively incorporates censored data, addresses scalability issues, and enhances interpretability through occlusion sensitivity analysis and dynamic survival prediction. Extensive simulations and a real-world application in Alzheimer's disease analysis demonstrate that SurLonFormer achieves superior predictive performance and successfully identifies disease-related imaging biomarkers.
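
To make the role of censored data in a Cox-based objective concrete, the sketch below computes the negative log partial likelihood that Cox proportional hazards models typically minimise. It uses the standard textbook form with Breslow-style risk sets and is not SurLonFormer's actual loss. The risk scores stand in for whatever a network would output, and all names are hypothetical.

```python
import math


def cox_neg_log_partial_likelihood(risk_scores, times, events):
    """Negative log partial likelihood of the Cox proportional hazards model.

    risk_scores: predicted log-hazard per subject (e.g. a network output).
    times: observed follow-up time per subject.
    events: 1 if the event was observed, 0 if the subject was censored.
    """
    total = 0.0
    for i, (time_i, event_i) in enumerate(zip(times, events)):
        if not event_i:
            # Censored subjects add no term of their own, but they still
            # appear in the risk sets of earlier events below.
            continue
        log_risk_set = math.log(
            sum(
                math.exp(risk_scores[j])
                for j, time_j in enumerate(times)
                if time_j >= time_i
            )
        )
        total += log_risk_set - risk_scores[i]
    return total


# Three subjects with equal risk scores; the second one is censored.
loss_all_events = cox_neg_log_partial_likelihood([0.0, 0.0, 0.0], [1, 2, 3], [1, 1, 1])
loss_with_censoring = cox_neg_log_partial_likelihood([0.0, 0.0, 0.0], [1, 2, 3], [1, 0, 1])
```

Censored subjects influence the loss only through the risk sets, which is how this family of models uses partial follow-up information without discarding those cases.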