
Radiomics and machine learning for predicting valve vegetation in infective endocarditis: a comparative analysis of mitral and aortic valves using TEE imaging.

Esmaely F, Moradnejad P, Boudagh S, Bitarafan-Rajabi A

pubmed · Jun 12 2025
Detecting valve vegetation in infective endocarditis (IE) poses challenges, particularly with mechanical valves, because acoustic shadowing artefacts often obscure critical diagnostic details. This study aimed to classify native and prosthetic mitral and aortic valves with and without vegetation using radiomics and machine learning. 286 TEE scans from suspected IE cases (August 2023-November 2024) were analysed alongside 113 cases in which IE was rejected, which served as controls. Frames were preprocessed using the Extreme Total Variation Bilateral (ETVB) filter, and radiomics features were extracted for classification using machine learning models, including Random Forest, Decision Tree, SVM, k-NN, and XGBoost. To evaluate the models, AUC, ROC curves, and Decision Curve Analysis (DCA) were used. For native mitral valves, SVM achieved the highest performance with an AUC of 0.88, a sensitivity of 0.91, and a specificity of 0.87. Mechanical mitral valves were also best classified with SVM (AUC: 0.85, sensitivity: 0.73, specificity: 0.92). Native aortic valves were best classified using SVM (AUC: 0.86, sensitivity: 0.87, specificity: 0.86), while Random Forest performed best for mechanical aortic valves (AUC: 0.81, sensitivity: 0.89, specificity: 0.78). These findings suggest that combining the models with the clinician's report may enhance the diagnostic accuracy of TEE, particularly in the absence of advanced imaging methods such as PET/CT.
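The AUC values reported above can be recovered from raw classifier scores by a rank comparison (the Mann-Whitney formulation). A minimal sketch, using hypothetical scores and labels rather than any study data:

```python
# Rank-based AUC: probability that a random positive outscores a random
# negative, with ties counted as half. Scores/labels below are illustrative.
def auc_score(labels, scores):
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]  # one positive outscored by a negative
print(auc_score(labels, scores))  # 8 of 9 pairs concordant
```

An AUC of 1.0 would mean every vegetation-positive scan outscores every control; the mid-0.8 values reported above correspond to most, but not all, pairs being ordered correctly.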

Summary Report of the SNMMI AI Task Force Radiomics Challenge 2024.

Boellaard R, Rahmim A, Eertink JJ, Duehrsen U, Kurch L, Lugtenburg PJ, Wiegers SE, Zwezerijnen GJC, Zijlstra JM, Heymans MW, Buvat I

pubmed · Jun 12 2025
In medical imaging, challenges are competitions that aim to provide a fair comparison of different methodologic solutions to a common problem. Challenges typically focus on addressing real-world problems, such as segmentation, detection, and prediction tasks, using various types of medical images and associated data. Here, we describe the organization and results of such a challenge to compare machine-learning models for predicting survival in patients with diffuse large B-cell lymphoma using a baseline ¹⁸F-FDG PET/CT radiomics dataset. Methods: This challenge aimed to predict progression-free survival (PFS) in patients with diffuse large B-cell lymphoma, either as a binary outcome (shorter than 2 y versus longer than 2 y) or as a continuous outcome (survival in months). All participants were provided with a radiomic training dataset, including the ground truth survival for designing a predictive model, and a radiomic test dataset without ground truth. Figures of merit (FOMs) used to assess model performance were the root-mean-square error for continuous outcomes and the C-index for 1-, 2-, and 3-y PFS binary outcomes. The challenge was endorsed and initiated by the Society of Nuclear Medicine and Molecular Imaging AI Task Force. Results: Nineteen models for predicting PFS as a continuous outcome were received from 15 teams. Among those models, external validation identified 6 models showing similar performance to that of a simple general linear reference model using SUV and total metabolic tumor volume (TMTV) only. Twelve models for predicting binary outcomes were submitted by 9 teams. External validation showed that 1 model had higher, but nonsignificant, C-index values compared with values obtained by a simple logistic regression model using SUV and TMTV.
Conclusion: Some of the radiomics-based machine-learning models developed by participants showed better FOMs than did simple linear or logistic regression models based on SUV and TMTV only, although the differences in observed FOMs were nonsignificant. This suggests that, for the challenge dataset, there was limited or no added value from sophisticated radiomic features and machine learning when developing models for outcome prediction.
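The C-index figure of merit used above measures how well a model's risk scores order patients by time to progression; one common formulation (Harrell's C) can be sketched in a few lines. The survival times, event flags, and risk scores below are hypothetical, not challenge data, and tied event times are not specially handled:

```python
# Harrell's C-index: among comparable pairs (the earlier subject had an
# observed event), the fraction where the higher-risk subject fails first;
# ties in risk count 0.5. Inputs below are illustrative stand-ins.
def c_index(times, events, risk):
    conc, comp = 0.0, 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if events[i] and times[i] < times[j]:  # comparable pair
                comp += 1
                if risk[i] > risk[j]:
                    conc += 1.0
                elif risk[i] == risk[j]:
                    conc += 0.5
    return conc / comp

times  = [5, 10, 15, 20]        # months to progression/censoring
events = [1, 1, 0, 1]           # 1 = progression observed, 0 = censored
risk   = [0.9, 0.6, 0.4, 0.2]   # model risk scores; perfectly ordered here
print(c_index(times, events, risk))
```

A C-index of 0.5 corresponds to random ordering, which is why a model must clear the simple SUV + TMTV baseline, not just 0.5, to demonstrate added value.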

Machine Learning-Based Prediction of Delayed Neurological Sequelae in Carbon Monoxide Poisoning Using Automatically Extracted MR Imaging Features.

Lee GY, Sohn CH, Kim D, Jeon SB, Yun J, Ham S, Nam Y, Yum J, Kim WY, Kim N

pubmed · Jun 12 2025
Delayed neurological sequelae are among the most serious complications of carbon monoxide poisoning. However, no reliable tools are available for evaluating the risk of these sequelae. We aimed to assess whether machine learning models using imaging features that were automatically extracted from brain MRI can predict the risk of delayed neurological sequelae in patients with acute carbon monoxide poisoning. This single-center, retrospective, observational study analyzed a prospectively collected registry of acute carbon monoxide poisoning patients who visited our emergency department from April 2011 to December 2015. Overall, 1618 radiomics and 4 lesion-segmentation features from DWI b1000 and ADC images, as well as 62 clinical variables, were extracted for each patient. The entire dataset was divided into five subsets, with one serving as the hold-out test set and the remaining four used for training and tuning. Four machine learning models (logistic regression, support vector machine, random forest, and extreme gradient boosting), as well as an ensemble model, were trained and evaluated using 20 different data configurations. The primary evaluation metric was the mean and 95% CI of the area under the receiver operating characteristic curve. Shapley additive explanations were calculated and visualized to enhance model interpretability. Of the 373 patients, delayed neurological sequelae occurred in 99 (26.5%) patients (mean age, 43.0 ± 15.2 years; 62.0% male). The means [95% CIs] of the area under the receiver operating characteristic curve, accuracy, sensitivity, and specificity of the best-performing machine learning model for predicting the development of delayed neurological sequelae were 0.88 [0.86-0.90], 0.82 [0.80-0.83], 0.81 [0.79-0.83], and 0.82 [0.80-0.84], respectively.
Among imaging features, the presence, size, and number of acute brain lesions on DWI b1000 and ADC images predicted DNS risk more accurately than advanced radiomics features based on shape, texture, and wavelet transformation. Machine learning models developed using automatically extracted brain MRI features together with clinical features can distinguish patients at risk of delayed neurological sequelae. The models enable effective prediction of delayed neurological sequelae in patients with acute carbon monoxide poisoning, facilitating timely treatment planning for prevention. ABL = acute brain lesion; AUROC = area under the receiver operating characteristic curve; CO = carbon monoxide; DNS = delayed neurological sequelae; LR = logistic regression; ML = machine learning; RF = random forest; SVM = support vector machine; XGBoost = extreme gradient boosting.
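The mean [95% CI] metrics above come from repeating evaluation over 20 data configurations. One minimal way to aggregate such repeated results is a normal-approximation interval around the mean; the AUROC values below are hypothetical, not the study's:

```python
import statistics

# Mean and normal-approximation 95% CI from repeated evaluations.
def mean_ci95(values):
    m = statistics.mean(values)
    se = statistics.stdev(values) / len(values) ** 0.5  # standard error
    return m, (m - 1.96 * se, m + 1.96 * se)

# Hypothetical AUROCs from 20 train/test configurations.
aurocs = [0.86, 0.88, 0.90, 0.87, 0.89] * 4
m, (lo, hi) = mean_ci95(aurocs)
print(round(m, 3), round(lo, 3), round(hi, 3))
```

With many configurations and modest variance, the interval tightens around the mean, which is how narrow ranges like 0.88 [0.86-0.90] arise.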

Improving the Robustness of Deep Learning Models in Predicting Hematoma Expansion from Admission Head CT.

Tran AT, Abou Karam G, Zeevi D, Qureshi AI, Malhotra A, Majidi S, Murthy SB, Park S, Kontos D, Falcone GJ, Sheth KN, Payabvash S

pubmed · Jun 12 2025
Robustness against input data perturbations is essential for deploying deep learning models in clinical practice. Adversarial attacks involve subtle, voxel-level manipulations of scans designed to increase a deep learning model's prediction errors. Testing deep learning model performance on adversarial examples provides a measure of robustness, and including adversarial images in the training set can improve the model's robustness. In this study, we examined adversarial training and input modifications to improve the robustness of deep learning models in predicting hematoma expansion (HE) from admission head CTs of patients with acute intracerebral hemorrhage (ICH). We used a multicenter cohort of n = 890 patients for cross-validation/training, and a cohort of n = 684 consecutive patients with ICH from 2 stroke centers for independent validation. Fast gradient sign method (FGSM) and projected gradient descent (PGD) adversarial attacks were applied for training and testing. We developed and tested 4 different models to predict ≥3 mL, ≥6 mL, ≥9 mL, and ≥12 mL HE, evaluated with the area under the receiver operating characteristic curve (AUC) in an independent validation cohort. We examined varying mixtures of adversarial and nonperturbed (clean) scans for training, as well as adding the hyperparameter-free Otsu multithreshold segmentation as an additional model input. When deep learning models trained solely on clean scans were tested with PGD and FGSM adversarial images, the average HE prediction AUC decreased from 0.80 to 0.67 and 0.71, respectively. Overall, the best-performing strategy to improve model robustness was training with a 5:3 mix of clean and PGD adversarial scans plus the addition of Otsu multithreshold segmentation to the model input, increasing the average AUC to 0.77 against both PGD and FGSM adversarial attacks.
Adversarial training with FGSM improved robustness against attacks of the same type but offered limited cross-attack robustness against PGD-type images. Adversarial training and the inclusion of threshold-based segmentation as an additional input can improve the robustness of deep learning models predicting HE from admission head CTs in acute ICH.
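The FGSM attack referenced above perturbs each input element by a fixed budget epsilon in the direction of the loss gradient's sign. A toy sketch on a linear-logistic stand-in model (not the paper's CT network), chosen because its input gradient is available in closed form; all weights and inputs are illustrative:

```python
import math

# One-step FGSM on a toy linear-logistic "model": nudge each input feature by
# epsilon in the direction that increases the loss. Values are illustrative.
w = [0.4, -0.7, 0.2, 0.9, -0.3]   # stand-in model weights
x = [1.0, 0.5, -1.2, 0.3, 0.8]    # stand-in input (e.g., flattened voxels)
y = 1.0                            # true label
eps = 0.05                         # attack budget (max per-feature change)

def predict(w, x):
    z = sum(wi * xi for wi, xi in zip(w, x))
    return 1.0 / (1.0 + math.exp(-z))

# For a linear-logistic model, dLoss/dx_i = (p - y) * w_i
p = predict(w, x)
sign = lambda v: (v > 0) - (v < 0)
x_adv = [xi + eps * sign((p - y) * wi) for xi, wi in zip(x, w)]
p_adv = predict(w, x_adv)  # confidence in the true class drops
```

PGD iterates this step with a projection back into the epsilon-ball, which is why it is the stronger attack and why FGSM-only adversarial training transfers poorly to it.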

Slide-free surface histology enables rapid colonic polyp interpretation across specialties and foundation AI

Yong, A., Husna, N., Tan, K. H., Manek, G., Sim, R., Loi, R., Lee, O., Tang, S., Soon, G., Chan, D., Liang, K.

medrxiv preprint · Jun 11 2025
Colonoscopy is a mainstay of colorectal cancer screening and has helped to lower cancer incidence and mortality. The resection of polyps during colonoscopy is critical for tissue diagnosis and prevention of colorectal cancer, albeit resulting in increased resource requirements and expense. Discarding resected benign polyps without sending them for histopathological processing and confirmatory diagnosis, known as the resect and discard strategy, could enhance efficiency but is not commonly practiced owing to endoscopists' predominant preference for pathological confirmation. The inaccessibility of histopathology from unprocessed resected tissue hampers endoscopic decisions. We show that intraprocedural fibre-optic microscopy with ultraviolet-C surface excitation (FUSE) of polyps post-resection enables rapid diagnosis, potentially complementing endoscopic interpretation and incorporating pathologist oversight. In a clinical study of 28 patients, slide-free FUSE microscopy of freshly resected polyps yielded mucosal views that greatly magnified the surface patterns observed on endoscopy and revealed previously unavailable histopathological signatures. We term this new cross-specialty readout surface histology. In blinded interpretations of 42 polyps (19 training, 23 reading) by endoscopists and pathologists of varying experience, surface histology differentiated normal/benign, low-grade dysplasia, and high-grade dysplasia and cancer, with 100% performance in classifying high/low risk. This FUSE dataset was also successfully interpreted by foundation AI models pretrained on histopathology slides, illustrating a new potential for these models not only to expedite conventional pathology tasks but also to autonomously provide instant expert feedback during procedures that typically lack pathologists. Surface histology readouts during colonoscopy promise to empower endoscopists' decisions and broadly enhance confidence and participation in resect and discard.
One Sentence Summary: Rapid microscopy of resected polyps during colonoscopy yielded accurate diagnoses, promising to enhance colorectal screening.

Cross-dataset Evaluation of Dementia Longitudinal Progression Prediction Models

Zhang, C., An, L., Wulan, N., Nguyen, K.-N., Orban, C., Chen, P., Chen, C., Zhou, J. H., Liu, K., Yeo, B. T. T., the Alzheimer's Disease Neuroimaging Initiative, the Australian Imaging Biomarkers and Lifestyle Study of Aging

medrxiv preprint · Jun 11 2025
Introduction: Accurately predicting Alzheimer's disease (AD) progression is useful for clinical care. The 2019 TADPOLE (The Alzheimer's Disease Prediction Of Longitudinal Evolution) challenge evaluated 92 algorithms from 33 teams worldwide. Unlike typical clinical prediction studies, TADPOLE accommodates (1) a variable number of observed timepoints across patients, (2) missing data across modalities and visits, and (3) prediction over an open-ended time horizon, which better reflects real-world data. However, TADPOLE used only the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset, so how well top algorithms generalize to other cohorts remains unclear. Methods: We tested five algorithms on three external datasets covering 2,312 participants and 13,200 timepoints. The algorithms included FROG, the overall TADPOLE winner, which utilized a unique Longitudinal-to-Cross-sectional (L2C) transformation to convert variable-length longitudinal histories into feature vectors of the same length across participants. We also considered two FROG variants. One variant unified all XGBoost models from the original FROG with a single feedforward neural network (FNN), which we refer to as L2C-FNN. We also included minimal recurrent neural networks (MinimalRNN), which ranked second at publication time, as well as AD Course Map (AD-Map), which outperformed MinimalRNN at publication time. All five models - three FROG variants, MinimalRNN, and AD-Map - were trained on ADNI and tested on the external datasets. Results: L2C-FNN performed the best overall. For predicting cognition and ventricle volume, L2C-FNN and AD-Map were the best. For clinical diagnosis prediction, L2C-FNN was the best, while AD-Map was the worst. L2C-FNN also maintained its edge over other models regardless of the number of observed timepoints and regardless of the prediction horizon, from 0 to 6 years into the future.
Conclusions: L2C-FNN shows strong potential for both short-term and long-term dementia progression prediction. Pretrained ADNI models are available at https://github.com/ThomasYeoLab/CBIG/tree/master/stable_projects/predict_phenotypes/Zhang2025_L2CFNN.
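The L2C idea of collapsing variable-length visit histories into same-length feature vectors might be sketched as follows; the summary features chosen here (last value, slope, visit count) are illustrative assumptions, not FROG's actual feature set:

```python
# Sketch of a Longitudinal-to-Cross-sectional (L2C) style transformation:
# any-length history of (time, score) visits -> a fixed-length feature vector,
# so downstream models (XGBoost, FNN) see a uniform input shape.
def l2c(visits):
    """visits: list of (months_from_baseline, score) pairs, any length."""
    visits = sorted(visits)
    t0, v0 = visits[0]
    t_last, v_last = visits[-1]
    slope = (v_last - v0) / (t_last - t0) if t_last > t0 else 0.0
    return [v_last, slope, len(visits)]  # same length for every participant

short_history = [(0, 28.0), (12, 26.0)]
long_history = [(0, 29.0), (6, 28.0), (12, 27.5), (24, 25.0)]
print(l2c(short_history), l2c(long_history))
```

The payoff of this design is exactly what the challenge setup demands: patients with two visits and patients with ten visits map to the same feature space, so one model handles both.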

ADAgent: LLM Agent for Alzheimer's Disease Analysis with Collaborative Coordinator

Wenlong Hou, Guangqian Yang, Ye Du, Yeung Lau, Lihao Liu, Junjun He, Ling Long, Shujun Wang

arxiv preprint · Jun 11 2025
Alzheimer's disease (AD) is a progressive and irreversible neurodegenerative disease. Early and precise diagnosis of AD is crucial for timely intervention and treatment planning to alleviate the progressive neurodegeneration. However, most existing methods rely on single-modality data, which contrasts with the multifaceted approach used by medical experts. While some deep learning approaches process multi-modal data, they are limited to specific tasks with a small set of input modalities and cannot handle arbitrary combinations. This highlights the need for a system that can address diverse AD-related tasks, process multi-modal or missing input, and integrate multiple advanced methods for improved performance. In this paper, we propose ADAgent, the first specialized AI agent for AD analysis, built on a large language model (LLM) to address user queries and support decision-making. ADAgent integrates a reasoning engine, specialized medical tools, and a collaborative outcome coordinator to facilitate multi-modal diagnosis and prognosis tasks in AD. Extensive experiments demonstrate that ADAgent outperforms SOTA methods, achieving significant improvements in accuracy, including a 2.7% increase in multi-modal diagnosis, a 0.7% improvement in multi-modal prognosis, and enhancements in MRI and PET diagnosis tasks.

Test-Time-Scaling for Zero-Shot Diagnosis with Visual-Language Reasoning

Ji Young Byun, Young-Jin Park, Navid Azizan, Rama Chellappa

arxiv preprint · Jun 11 2025
As a cornerstone of patient care, clinical decision-making significantly influences patient outcomes and can be enhanced by large language models (LLMs). Although LLMs have demonstrated remarkable performance, their application to visual question answering in medical imaging, particularly for reasoning-based diagnosis, remains largely unexplored. Furthermore, supervised fine-tuning for reasoning tasks is largely impractical due to limited data availability and high annotation costs. In this work, we introduce a zero-shot framework for reliable medical image diagnosis that enhances the reasoning capabilities of LLMs in clinical settings through test-time scaling. Given a medical image and a corresponding textual prompt, a vision-language model generates multiple descriptions or interpretations of visual features. These interpretations are then fed to an LLM, where a test-time scaling strategy consolidates multiple candidate outputs into a reliable final diagnosis. We evaluate our approach across various medical imaging modalities -- including radiology, ophthalmology, and histopathology -- and demonstrate that the proposed test-time scaling strategy enhances diagnostic accuracy for both our and baseline methods. Additionally, we provide an empirical analysis showing that the proposed approach, which allows unbiased prompting in the first stage, improves the reliability of LLM-generated diagnoses and enhances classification accuracy.
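The consolidation step of test-time scaling can be as simple as a self-consistency vote over several sampled candidate outputs; a minimal sketch, where the candidate strings are hypothetical LLM outputs rather than the paper's actual pipeline:

```python
from collections import Counter

# Self-consistency-style consolidation: sample several candidate diagnoses
# from the LLM, then keep the majority answer as the final diagnosis.
def consolidate(candidates):
    return Counter(candidates).most_common(1)[0][0]

candidates = ["pneumonia", "pneumonia", "pleural effusion", "pneumonia"]
print(consolidate(candidates))  # -> pneumonia
```

Averaging out sampling noise this way is what lets the framework trade extra inference compute for reliability without any fine-tuning.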

Towards a general-purpose foundation model for fMRI analysis

Cheng Wang, Yu Jiang, Zhihao Peng, Chenxin Li, Changbae Bang, Lin Zhao, Jinglei Lv, Jorge Sepulcre, Carl Yang, Lifang He, Tianming Liu, Daniel Barron, Quanzheng Li, Randy Hirschtick, Byung-Hoon Kim, Xiang Li, Yixuan Yuan

arxiv preprint · Jun 11 2025
Functional Magnetic Resonance Imaging (fMRI) is essential for studying brain function and diagnosing neurological disorders, but current analysis methods face reproducibility and transferability issues due to complex pre-processing and task-specific models. We introduce NeuroSTORM (Neuroimaging Foundation Model with Spatial-Temporal Optimized Representation Modeling), a generalizable framework that directly learns from 4D fMRI volumes and enables efficient knowledge transfer across diverse applications. NeuroSTORM is pre-trained on 28.65 million fMRI frames (>9,000 hours) from over 50,000 subjects across multiple centers and ages 5 to 100. Using a Mamba backbone and a shifted scanning strategy, it efficiently processes full 4D volumes. We also propose a spatial-temporal optimized pre-training approach and task-specific prompt tuning to improve transferability. NeuroSTORM outperforms existing methods across five tasks: age/gender prediction, phenotype prediction, disease diagnosis, fMRI-to-image retrieval, and task-based fMRI classification. It demonstrates strong clinical utility on datasets from hospitals in the U.S., South Korea, and Australia, achieving top performance in disease diagnosis and cognitive phenotype prediction. NeuroSTORM provides a standardized, open-source foundation model to improve reproducibility and transferability in fMRI-based clinical research.