
Comparative Evaluation of Radiomics and Deep Learning Models for Disease Detection in Chest Radiography.

He Z, McMillan AB

PubMed · Sep 23, 2025
The application of artificial intelligence (AI) in medical imaging has revolutionized diagnostic practices, enabling advanced analysis and interpretation of radiological data. This study presents a comprehensive evaluation of radiomics-based and deep learning-based approaches for disease detection in chest radiography, focusing on COVID-19, lung opacity, and viral pneumonia. While deep learning models, particularly convolutional neural networks (CNNs) and vision transformers (ViTs), learn directly from image data, radiomics-based models extract handcrafted features, offering potential advantages in data-limited scenarios. We systematically compared the diagnostic performance of various AI models, including Decision Trees, Gradient Boosting, Random Forests, Support Vector Machines (SVMs), and Multi-Layer Perceptrons (MLPs) for radiomics, against state-of-the-art deep learning models such as InceptionV3, EfficientNetL, and ConvNeXtXLarge. Performance was evaluated across multiple sample sizes. At 24 samples, EfficientNetL achieved an AUC of 0.839, outperforming SVM (AUC = 0.762). At 4000 samples, InceptionV3 achieved the highest AUC of 0.996, compared to 0.885 for Random Forest. A Scheirer-Ray-Hare test confirmed significant main and interaction effects of model type and sample size on all metrics. Post hoc Mann-Whitney U tests with Bonferroni correction further revealed consistent performance advantages for deep learning models across most conditions. These findings provide statistically validated, data-driven recommendations for model selection in diagnostic AI. Deep learning models demonstrated higher performance and better scalability with increasing data availability, while radiomics-based models may remain useful in low-data contexts. This study addresses a critical gap in AI-based diagnostic research by offering practical guidance for deploying AI models across diverse clinical environments.
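
As a rough illustration of the post hoc comparison this study reports, the sketch below runs a Mann-Whitney U test on two sets of AUC values and applies a Bonferroni correction. The AUC samples and the number of comparisons are invented placeholders, not the paper's data.

```python
# Sketch: Mann-Whitney U test with Bonferroni correction on per-replicate
# AUCs of two model families, mirroring the abstract's post hoc analysis.
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(0)
# Hypothetical AUC samples, e.g. one per bootstrap replicate at n=4000.
auc_deep = rng.normal(0.996, 0.01, size=30).clip(0, 1)       # e.g. InceptionV3
auc_radiomics = rng.normal(0.885, 0.02, size=30).clip(0, 1)  # e.g. Random Forest

n_comparisons = 10  # assumed total number of pairwise tests being run
stat, p = mannwhitneyu(auc_deep, auc_radiomics, alternative="two-sided")
p_bonferroni = min(p * n_comparisons, 1.0)
print(f"U={stat:.1f}, raw p={p:.4g}, Bonferroni-corrected p={p_bonferroni:.4g}")
```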

Exploring the role of preprocessing combinations in hyperspectral imaging for deep learning colorectal cancer detection.

Tkachenko M, Huber B, Hamotskyi S, Jansen-Winkeln B, Gockel I, Neumuth T, Köhler H, Maktabi M

PubMed · Sep 23, 2025
This study compares preprocessing techniques for hyperspectral, deep learning-based cancer diagnostics. It considers different spectrum-scaling and noise-reduction options across the spatial and spectral axes of hyperspectral datacubes, as well as varying levels of blood and light-reflection removal. We also examine how the size of the patches extracted from the hyperspectral data affects model performance, and we explore strategies to mitigate our dataset's imbalance (cancerous tissues are underrepresented). Our results indicate that: standardization significantly improves both sensitivity and specificity compared with normalization; larger input patch sizes enhance performance by capturing more spatial context; noise reduction unexpectedly degrades performance; and blood filtering is more effective than filtering reflected-light pixels, although neither approach produces significant results. By carefully maintaining consistent testing conditions, we ensure a fair comparison across preprocessing methods and reproducibility. Our findings highlight the necessity of careful preprocessing selection to maximize deep learning performance in medical imaging applications.
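
To make the scaling and patching options concrete, here is a minimal NumPy sketch assuming a hypothetical (H, W, bands) datacube; the patch size, stride, and per-pixel-spectrum scaling axis are assumptions, not the paper's exact configuration.

```python
# Sketch: per-spectrum standardization vs. min-max normalization of a
# hyperspectral datacube, plus square spatial patch extraction.
import numpy as np

cube = np.random.rand(256, 256, 100)  # hypothetical (H, W, bands) datacube

def standardize(c):
    # Zero mean, unit variance along each pixel's spectral axis.
    mu = c.mean(axis=-1, keepdims=True)
    sd = c.std(axis=-1, keepdims=True) + 1e-8
    return (c - mu) / sd

def normalize(c):
    # Min-max scaling of each pixel's spectrum to [0, 1].
    lo = c.min(axis=-1, keepdims=True)
    hi = c.max(axis=-1, keepdims=True)
    return (c - lo) / (hi - lo + 1e-8)

def extract_patches(c, size=32, stride=32):
    # Non-overlapping spatial patches, each keeping the full spectrum.
    h, w, _ = c.shape
    return [c[i:i + size, j:j + size]
            for i in range(0, h - size + 1, stride)
            for j in range(0, w - size + 1, stride)]

patches = extract_patches(standardize(cube))
print(len(patches), patches[0].shape)  # 64 (32, 32, 100)
```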

Early prediction of periventricular leukomalacia from MRI changes: a machine learning approach for risk stratification.

Lin J, Luo J, Luo Y, Zhuang Y, Mo T, Wen S, Chen T, Yun G, Zeng H

PubMed · Sep 23, 2025
To develop an accessible model integrating clinical, MRI, and radiomic features to predict periventricular leukomalacia (PVL) in high-risk infants. Two hundred and seventeen infants (2015-2022) with suspected motor abnormalities were stratified into training (n = 124), internal validation (n = 31), and external validation (n = 62) cohorts by MRI scanner. Radiomic features were extracted from white matter regions on axial sequences. Feature selection employed t-tests, correlation filtering, Random Forest, and LASSO regression. Multivariate logistic models were evaluated by receiver operating characteristic (ROC) analysis, accuracy, sensitivity, specificity, positive predictive value, negative predictive value, calibration, decision curve analysis (DCA), net reclassification index (NRI), and integrated discrimination improvement (IDI). Clinical predictors (gestational age, neonatal hypoglycemia, hypoxic-ischemic events, infection) and MRI features (dilated lateral ventricles, delayed myelination, and periventricular abnormal signal) were retained through univariate and multivariate screening. Five predictive models were developed and validated using internal testing, bootstrapping, and external cohorts: a clinical model (Model C), an MRI model (Model M), a clinical + MRI model (Model C + M), a radiomics model, and a clinical + MRI + radiomics model (Model C + M + R). Among them, Model C + M + R achieved the best overall performance, with an area under the curve (AUC) of 0.96 (95% CI: 0.90-1.00), accuracy of 0.87 (95% CI: 0.76-0.94), sensitivity of 0.88, specificity of 0.85, PPV of 0.96, and NPV of 0.65 in the external validation cohort. Compared with Model C + M, Model C + M + R demonstrated significant reclassification (NRI = 0.631, p < 0.001) and discrimination (IDI = 0.037, p = 0.020) improvements. Conventional MRI-derived radiomics thus enhances PVL risk stratification, and this interpretable, accessible model provides a new tool for evaluating high-risk infants. Question: Periventricular leukomalacia requires early identification to optimize neurorehabilitation, yet early white matter injury in infants is challenging to identify through conventional MRI visual assessment. Findings: The clinical-MRI-radiomics model demonstrates the best performance for predicting PVL, with an AUC of 0.93 in the training cohort and 0.96 in the external validation cohort. Clinical relevance: An accessible and interpretable predictive tool for PVL has been developed and validated, which may enable earlier targeted interventions.
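
The feature-selection cascade named above (univariate t-test, correlation filtering, LASSO-penalized logistic regression) can be sketched as follows; the synthetic data, thresholds, and penalty strength are illustrative assumptions, not the study's settings.

```python
# Sketch of a t-test -> correlation-filter -> LASSO selection pipeline
# feeding a logistic model, in the spirit of the abstract's workflow.
import numpy as np
from scipy.stats import ttest_ind
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
X = rng.normal(size=(124, 200))   # 124 training infants, 200 radiomic features
y = rng.integers(0, 2, size=124)  # hypothetical PVL labels

# 1) Univariate t-test filter.
_, p = ttest_ind(X[y == 1], X[y == 0], axis=0)
Xf = X[:, p < 0.05]

# 2) Drop one of each highly correlated pair (|r| > 0.9).
corr = np.corrcoef(Xf, rowvar=False)
drop = {j for i in range(corr.shape[0])
        for j in range(i + 1, corr.shape[0]) if abs(corr[i, j]) > 0.9}
Xf = Xf[:, [c for c in range(Xf.shape[1]) if c not in drop]]

# 3) L1-penalized (LASSO-style) logistic regression as the final signature.
clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.1).fit(Xf, y)
print("selected features:", int((clf.coef_ != 0).sum()),
      "| train AUC:", round(roc_auc_score(y, clf.decision_function(Xf)), 3))
```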

Deep Learning Modeling to Differentiate Multiple Sclerosis From MOG Antibody-Associated Disease.

Cortese R, Sforazzini F, Gentile G, de Mauro A, Luchetti L, Amato MP, Apóstolos-Pereira SL, Arrambide G, Bellenberg B, Bianchi A, Bisecco A, Bodini B, Calabrese M, Camera V, Celius EG, de Medeiros Rimkus C, Duan Y, Durand-Dubief F, Filippi M, Gallo A, Gasperini C, Granziera C, Groppa S, Grothe M, Gueye M, Inglese M, Jacob A, Lapucci C, Lazzarotto A, Liu Y, Llufriu S, Lukas C, Marignier R, Messina S, Müller J, Palace J, Pastó L, Paul F, Prados F, Pröbstel AK, Rovira À, Rocca MA, Ruggieri S, Sastre-Garriga J, Sato DK, Schneider R, Sepulveda M, Sowa P, Stankoff B, Tortorella C, Barkhof F, Ciccarelli O, Battaglini M, De Stefano N

PubMed · Sep 23, 2025
Multiple sclerosis (MS) is common in adults, while myelin oligodendrocyte glycoprotein antibody-associated disease (MOGAD) is rare. Our previous machine-learning algorithm, using clinical variables, ≤6 brain lesions, and no Dawson fingers, achieved 79% accuracy, 78% sensitivity, and 80% specificity in distinguishing MOGAD from MS but lacked validation. The aims of this study were to (1) evaluate the clinical/MRI algorithm for distinguishing MS from MOGAD, (2) develop a deep learning (DL) model, (3) assess the benefit of combining both, and (4) identify key differentiators using probability attention maps (PAMs). This multicenter, retrospective, cross-sectional MAGNIMS study included scans from 19 centers. Inclusion criteria were as follows: adults with non-acute MS and MOGAD, with high-quality T2-fluid-attenuated inversion recovery and T1-weighted scans. Brain scans were scored by 2 readers to assess the performance of the clinical/MRI algorithm on the validation data set. A DL-based classifier using a ResNet-10 convolutional neural network was developed and tested on an independent validation data set. PAMs were generated by averaging correctly classified attention maps from both groups, identifying key differentiating regions. We included 406 MRI scans (218 with relapsing-remitting MS [RRMS], mean age: 39 years ±11, 69% F; 188 with MOGAD, mean age: 41 years ±14, 61% F), split into 2 data sets: a training/testing set (n = 265: 150 with RRMS, mean age: 39 years ±10, 72% F; 115 with MOGAD, mean age: 42 years ±13, 61% F) and an independent validation set (n = 141: 68 with RRMS, mean age: 40 years ±14, 65% F; 73 with MOGAD, mean age: 40 years ±15, 63% F). The clinical/MRI algorithm predicted RRMS over MOGAD with 75% accuracy (95% CI 67-82), 96% sensitivity (95% CI 88-99), and 56% specificity (95% CI 44-68) in the validation cohort. The DL model achieved 77% accuracy (95% CI 64-89), 73% sensitivity (95% CI 57-89), and 83% specificity (95% CI 65-96) in the training/testing cohort, and 70% accuracy (95% CI 63-77), 67% sensitivity (95% CI 55-79), and 73% specificity (95% CI 61-83) in the validation cohort without retraining. When combined, the classifiers reached 86% accuracy (95% CI 81-92), 84% sensitivity (95% CI 75-92), and 89% specificity (95% CI 81-96). PAMs identified key region volumes: corpus callosum (1872 mm³), left precentral gyrus (341 mm³), right thalamus (193 mm³), and right cingulate cortex (186 mm³) for identifying RRMS, and brainstem (629 mm³), hippocampus (234 mm³), and parahippocampal gyrus (147 mm³) for identifying MOGAD. Both classifiers effectively distinguished RRMS from MOGAD. The clinical/MRI model showed higher sensitivity while the DL model offered higher specificity, suggesting complementary roles. Their combination improved diagnostic accuracy, and PAMs revealed distinct damage patterns. Future prospective studies should validate these models in diverse, real-world settings. This study provides Class III evidence that both a clinical/MRI algorithm and an MRI-based DL model accurately distinguish RRMS from MOGAD.
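
The abstract does not specify how the two classifiers were combined; the sketch below shows one minimal, hypothetical fusion rule (weighted averaging of the two probability estimates), purely as an illustration of the idea.

```python
# Sketch: fusing a rule-based clinical/MRI probability with a CNN probability.
# The averaging rule and weight are assumptions, not the paper's method.
import numpy as np

def combine(p_clinical: np.ndarray, p_dl: np.ndarray, w: float = 0.5) -> np.ndarray:
    """Weighted average of the two models' P(RRMS) estimates."""
    return w * p_clinical + (1 - w) * p_dl

p_clinical = np.array([0.9, 0.4, 0.7])  # hypothetical algorithm outputs
p_dl = np.array([0.8, 0.2, 0.9])        # hypothetical ResNet-10 outputs
pred_rrms = combine(p_clinical, p_dl) >= 0.5
print(pred_rrms)  # [ True False  True]
```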

The LongiMam model for improved breast cancer risk prediction using longitudinal mammograms

Manel Rakez, Thomas Louis, Julien Guillaumin, Foucauld Chamming's, Pierre Fillard, Brice Amadeo, Virginie Rondeau

arXiv preprint · Sep 23, 2025
Risk-adapted breast cancer screening requires robust models that leverage longitudinal imaging data. Most current deep learning models use a single mammogram or a limited number of priors and lack adaptation for real-world settings marked by imbalanced outcome distributions and heterogeneous follow-up. We developed LongiMam, an end-to-end deep learning model that integrates the current mammogram and up to four priors. LongiMam combines a convolutional and a recurrent neural network to capture spatial and temporal patterns predictive of breast cancer. The model was trained and evaluated using a large, population-based screening dataset with the disproportionate case-to-control ratio typical of clinical screening. Across several scenarios that varied the number and composition of prior exams, LongiMam consistently improved prediction when prior mammograms were included. Combining prior and current visits outperformed single-visit models, while priors alone performed less well, highlighting the importance of combining historical and recent information. Subgroup analyses confirmed the model's efficacy across key risk groups, including women with dense breasts and those aged 55 years or older. Moreover, the model performed best in women with observed changes in mammographic density over time. These findings demonstrate that longitudinal modeling enhances breast cancer prediction and support the use of repeated mammograms to refine risk stratification in screening programs. LongiMam is publicly available as open-source software.
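
A minimal PyTorch sketch of the CNN-plus-RNN pattern the abstract describes (a shared image encoder feeding a recurrent layer over visits) follows; all layer sizes and the five-visit layout are assumptions, not LongiMam's actual architecture, for which the authors' open-source release is the reference.

```python
# Sketch: shared CNN encodes each visit's mammogram; a GRU aggregates
# up to five visits (current + four priors) into one risk logit.
import torch
import torch.nn as nn

class LongitudinalRiskModel(nn.Module):
    def __init__(self, feat_dim=64):
        super().__init__()
        self.encoder = nn.Sequential(          # toy per-image CNN encoder
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, feat_dim),
        )
        self.gru = nn.GRU(feat_dim, feat_dim, batch_first=True)
        self.head = nn.Linear(feat_dim, 1)     # cancer-risk logit

    def forward(self, x):                      # x: (batch, visits, 1, H, W)
        b, t = x.shape[:2]
        feats = self.encoder(x.flatten(0, 1)).view(b, t, -1)
        _, h = self.gru(feats)                 # last hidden state summarizes history
        return self.head(h[-1]).squeeze(-1)

model = LongitudinalRiskModel()
logits = model(torch.randn(2, 5, 1, 128, 128))  # 2 women, 5 visits each
print(logits.shape)  # torch.Size([2])
```

The recurrent summary is what lets one model score women with varying screening histories: shorter histories can be padded or truncated along the visit axis.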

A systematic review of early neuroimaging and neurophysiological biomarkers for post-stroke mobility prognostication

Levy, C., Dalton, E. J., Ferris, J. K., Campbell, B. C. V., Brodtmann, A., Brauer, S., Churilov, L., Hayward, K. S.

medRxiv preprint · Sep 23, 2025
Background: Accurate prognostication of mobility outcomes is essential to guide rehabilitation and manage patient expectations, but the prognostic utility of neuroimaging and neurophysiological biomarkers measured early post-stroke remains uncertain. This systematic review examined the prognostic capacity of early neuroimaging and neurophysiological biomarkers for mobility outcomes up to 24 months post-stroke. Methods: MEDLINE and EMBASE were searched from inception to June 2025. Cohort studies that reported neuroimaging or neurophysiological biomarkers measured ≤14 days post-stroke and mobility outcome(s) assessed >14 days and ≤24 months post-stroke were included. Biomarker analyses were classified by statistical analysis approach (association, discrimination/classification, or validation). The magnitude of relevant statistical measures was used as the primary indicator of prognostic capacity. Risk of bias was assessed using the Quality in Prognostic Studies tool. Meta-analysis was not performed due to heterogeneity. Results: Twenty reports from 18 independent study samples (n=2,160 participants) were included. Biomarkers were measured a median of 7.5 days post-stroke, and outcomes were assessed between 1 and 12 months. Eighty-six biomarker analyses were identified (61 neuroimaging, 25 neurophysiological), and the majority used an association approach (88%). Few used discrimination/classification methods (11%), and only one conducted internal validation (1%): an MRI-based machine learning model that demonstrated excellent discrimination but still requires external validation. Structural and functional corticospinal tract integrity were frequently investigated, and most associations were small or non-significant. Lesion location and size were also commonly examined, but findings were inconsistent and often lacked magnitude reporting. Methodological limitations were common, including small sample sizes, moderate to high risk of bias, poor reporting of magnitudes, and heterogeneous outcome measures and follow-up time points. Conclusions: Current evidence provides limited support for early neuroimaging and neurophysiological biomarkers to prognosticate post-stroke mobility outcomes. Most analyses remain at the association stage, with minimal progress toward validation and clinical implementation. Advancing the field requires international collaboration using harmonized methodologies, standardised statistical reporting, and consistent outcome measures and timepoints. Registration: URL: https://www.crd.york.ac.uk/prospero/; unique identifier: CRD42022350771.

Enhancing the CAD-RADS™ 2.0 Category Assignment Performance of ChatGPT and DeepSeek Through "Few-shot" Prompting.

Kaya HE

PubMed · Sep 23, 2025
To assess whether few-shot prompting improves the performance of 2 popular large language models (LLMs) (ChatGPT o1 and DeepSeek-R1) in assigning Coronary Artery Disease Reporting and Data System (CAD-RADS™ 2.0) categories. A detailed few-shot prompt based on the CAD-RADS™ 2.0 framework was developed using 20 reports from the MIMIC-IV database. Subsequently, 100 modified reports from the same database were categorized using zero-shot and few-shot prompts through the models' user interface. Model accuracy was evaluated by comparing assignments to a reference radiologist's classifications, including stenosis categories and modifiers. To assess reproducibility, 50 reports were reclassified using the same few-shot prompt. McNemar tests and Cohen kappa were used for statistical analysis. Using zero-shot prompting, accuracy was low for both models (ChatGPT: 14%, DeepSeek: 8%), with correct assignments occurring almost exclusively in CAD-RADS 0 cases. Hallucinations occurred frequently (ChatGPT: 19%, DeepSeek: 54%). Few-shot prompting significantly improved accuracy to 98% for ChatGPT and 93% for DeepSeek (both P<0.001) and eliminated hallucinations. Kappa values for agreement between model-generated and radiologist-assigned classifications were 0.979 (0.950, 1.000) (P<0.001) for ChatGPT and 0.916 (0.859, 0.973) (P<0.001) for DeepSeek, indicating almost perfect agreement for both models without a significant difference between them (P=0.180). Reproducibility analysis yielded kappa values of 0.957 (0.900, 1.000) (P<0.001) for ChatGPT and 0.873 (0.779, 0.967) (P<0.001) for DeepSeek, indicating almost perfect and strong agreement between repeated assignments, respectively, with no significant difference between the models (P=0.125). Few-shot prompting substantially enhances LLMs' accuracy in assigning CAD-RADS™ 2.0 categories, suggesting potential for clinical application and facilitating system adoption.
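
Few-shot prompting here amounts to prepending worked report-to-category examples to each query. The sketch below shows the general prompt shape; the instruction text and example reports are invented placeholders, not the study's 20-example prompt.

```python
# Sketch: assembling a few-shot prompt for CAD-RADS category assignment.
FEW_SHOT_EXAMPLES = [
    ("Coronary CTA: no coronary plaque or stenosis.", "CAD-RADS 0"),
    ("Coronary CTA: 60% stenosis of the proximal LAD.", "CAD-RADS 3"),
]

def build_prompt(report: str) -> str:
    parts = ["You are assigning CAD-RADS 2.0 categories to coronary CTA reports.",
             "Answer with the category and applicable modifiers only.", ""]
    for example_report, label in FEW_SHOT_EXAMPLES:
        parts += [f"Report: {example_report}", f"Answer: {label}", ""]
    parts += [f"Report: {report}", "Answer:"]
    return "\n".join(parts)

print(build_prompt("Coronary CTA: 30% stenosis of the mid RCA."))
```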

CT-based radiomics deep learning signatures for noninvasive prediction of early recurrence after radical surgery in locally advanced colorectal cancer: A multicenter study.

Zhou Y, Zhao J, Tan Y, Zou F, Fang L, Wei P, Zeng W, Gong L, Liu L, Zhong L

PubMed · Sep 23, 2025
Preoperative identification of high-risk locally advanced colorectal cancer (LACRC) patients is vital for optimizing treatment and minimizing toxicity. This study aimed to develop and validate a model combining CT-based images and clinical laboratory parameters to noninvasively predict postoperative early recurrence (ER) in LACRC patients. A retrospective cohort of 560 pathologically confirmed LACRC patients, collected from three centers between July 2018 and March 2022, was analyzed together with the Gene Expression Omnibus (GEO) dataset. We extracted radiomics and deep learning signatures (RDs) using eight machine learning techniques, integrated them with clinical laboratory parameters to construct a preoperative combined model, and validated it in two external datasets. Its predictive performance was compared with postoperative pathological and TNM staging models. Kaplan-Meier analysis was used to evaluate preoperative risk stratification, and molecular correlates of ER were explored using GEO RNA-sequencing data. The model included five independent prognostic factors: RDs, lymphocyte-to-monocyte ratio, neutrophil-to-lymphocyte ratio, lymphocyte-albumin, and prognostic nutritional index. It outperformed the pathological and TNM models in both external datasets (AUC for test set 1: 0.865 vs. 0.766 and 0.665; AUC for test set 2: 0.848 vs. 0.754 and 0.694). Preoperative risk stratification identified significantly better disease-free survival in low-risk vs. high-risk patients across all subgroups (p < 0.01). High enrichment scores were associated with upregulated tumor proliferation pathways (epithelial-mesenchymal transition [EMT] and inflammatory response pathways) and altered immune cell infiltration patterns in the tumor microenvironment. The preoperative model enables treatment strategy optimization and reduces unnecessary drug toxicity by noninvasively predicting ER in LACRC.
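
For reference, the inflammatory and nutritional indices named above can be computed from a routine blood panel as sketched below; the PNI formula shown is the commonly used Onodera definition, an assumption that should be checked against the paper before reuse.

```python
# Sketch: NLR, LMR, and PNI from a routine blood panel.
def lab_indices(neutrophils, lymphocytes, monocytes, albumin_g_dl):
    """Counts in 10^9/L; lymphocytes are converted to /mm^3 for the PNI."""
    nlr = neutrophils / lymphocytes   # neutrophil-to-lymphocyte ratio
    lmr = lymphocytes / monocytes     # lymphocyte-to-monocyte ratio
    # Assumed Onodera PNI: 10 * albumin (g/dL) + 0.005 * lymphocytes (/mm^3).
    pni = 10 * albumin_g_dl + 0.005 * (lymphocytes * 1000)
    return {"NLR": round(nlr, 2), "LMR": round(lmr, 2), "PNI": round(pni, 1)}

print(lab_indices(neutrophils=4.2, lymphocytes=1.8, monocytes=0.5, albumin_g_dl=4.0))
# {'NLR': 2.33, 'LMR': 3.6, 'PNI': 49.0}
```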

Dual-Feature Cross-Fusion Network for Precise Brain Tumor Classification: A Neurocomputational Approach.

M M, G S, Bendre M, Nirmal M

PubMed · Sep 23, 2025
Brain tumors represent a significant neurological challenge, affecting individuals across all age groups. Accurate and timely diagnosis of tumor types is critical for effective treatment planning. Magnetic Resonance Imaging (MRI) remains a primary diagnostic modality due to its non-invasive nature and ability to provide detailed brain imaging. However, traditional tumor classification relies on expert interpretation, which is time-consuming and prone to subjectivity. This study proposes a novel deep learning architecture, the Dual-Feature Cross-Fusion Network (DF-CFN), for the automated classification of brain tumors using MRI data. The model integrates ConvNeXt for capturing global contextual features and a shallow CNN combined with Feature Channel Attention Network (FcaNet) for extracting local features. These are fused through a cross-feature fusion mechanism for improved classification. The model is trained and validated using a Kaggle dataset encompassing four tumor classes (glioma, meningioma, pituitary, and non-tumor), achieving an accuracy of 99.33%. Its generalizability is further confirmed using the Figshare dataset, yielding 99.22% accuracy. Comparative analyses with baseline and recent models validate the superiority of DF-CFN in terms of precision and robustness. This approach demonstrates strong potential for assisting clinicians in reliable brain tumor classification, thereby improving diagnostic efficiency and reducing the burden on healthcare professionals.
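
A minimal sketch of the dual-branch, cross-fusion idea follows: a global branch standing in for ConvNeXt, a shallow local branch standing in for the CNN + FcaNet path, and multi-head cross-attention as the fusion step. All components are simplified assumptions, not the DF-CFN implementation.

```python
# Sketch: dual-branch feature extraction with cross-attention fusion
# for 4-way brain-tumor classification.
import torch
import torch.nn as nn

class DualBranchFusion(nn.Module):
    def __init__(self, dim=64, n_classes=4):
        super().__init__()
        self.global_branch = nn.Sequential(    # stand-in for a ConvNeXt encoder
            nn.Conv2d(1, dim, 7, stride=4, padding=3), nn.GELU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.local_branch = nn.Sequential(     # shallow CNN for local features
            nn.Conv2d(1, dim, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.cross_attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.head = nn.Linear(dim, n_classes)  # glioma/meningioma/pituitary/non-tumor

    def forward(self, x):
        g = self.global_branch(x).unsqueeze(1)   # (B, 1, dim) global features
        l = self.local_branch(x).unsqueeze(1)    # (B, 1, dim) local features
        fused, _ = self.cross_attn(query=g, key=l, value=l)
        return self.head(fused.squeeze(1))

model = DualBranchFusion()
print(model(torch.randn(2, 1, 224, 224)).shape)  # torch.Size([2, 4])
```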

Learning Contrastive Multimodal Fusion with Improved Modality Dropout for Disease Detection and Prediction

Yi Gu, Kuniaki Saito, Jiaxin Ma

arXiv preprint · Sep 22, 2025
As medical diagnoses increasingly leverage multimodal data, machine learning models are expected to fuse heterogeneous information effectively while remaining robust to missing modalities. In this work, we propose a novel multimodal learning framework that integrates enhanced modality dropout and contrastive learning to address real-world limitations such as modality imbalance and missingness. Our approach introduces learnable modality tokens to improve missingness-aware fusion of modalities and augments conventional unimodal contrastive objectives with fused multimodal representations. We validate our framework on large-scale clinical datasets for disease detection and prediction tasks encompassing both visual and tabular modalities. Experimental results demonstrate that our method achieves state-of-the-art performance, particularly in challenging and practical scenarios where only a single modality is available. Furthermore, we show its adaptability through successful integration with a recent CT foundation model. Our findings highlight the effectiveness, efficiency, and generalizability of our approach for multimodal learning, offering a scalable, low-cost solution with significant potential for real-world clinical applications. The code is available at https://github.com/omron-sinicx/medical-modality-dropout.
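
The core mechanism, replacing a dropped or missing modality's embedding with a learnable modality token before fusion, can be sketched as follows; dimensions and the drop probability are assumptions, and the authors' linked repository is the reference implementation.

```python
# Sketch: modality dropout with learnable modality tokens. When a modality is
# dropped during training (or absent at inference), its embedding is replaced
# by a learned token so the fusion layer always sees a fixed layout.
import torch
import torch.nn as nn

class ModalityDropoutFusion(nn.Module):
    def __init__(self, n_modalities=2, dim=32, p_drop=0.3):
        super().__init__()
        self.tokens = nn.Parameter(torch.zeros(n_modalities, dim))  # learnable fill-ins
        self.fuse = nn.Linear(n_modalities * dim, dim)
        self.p_drop = p_drop

    def forward(self, feats):  # feats: list of (B, dim) embeddings, one per modality
        out = []
        for m, f in enumerate(feats):
            drop = self.training and (torch.rand(1).item() < self.p_drop)
            out.append(self.tokens[m].expand_as(f) if drop else f)
        return self.fuse(torch.cat(out, dim=-1))

fusion = ModalityDropoutFusion()
img, tab = torch.randn(4, 32), torch.randn(4, 32)  # visual + tabular embeddings
print(fusion([img, tab]).shape)  # torch.Size([4, 32])
```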