Latest Papers on Radiology AI. Tags: Benchmark SOTA

Deep Learning Modeling to Differentiate Multiple Sclerosis From MOG Antibody-Associated Disease.

Cortese R, Sforazzini F, Gentile G, de Mauro A, Luchetti L, Amato MP, Apóstolos-Pereira SL, Arrambide G, Bellenberg B, Bianchi A, Bisecco A, Bodini B, Calabrese M, Camera V, Celius EG, de Medeiros Rimkus C, Duan Y, Durand-Dubief F, Filippi M, Gallo A, Gasperini C, Granziera C, Groppa S, Grothe M, Gueye M, Inglese M, Jacob A, Lapucci C, Lazzarotto A, Liu Y, Llufriu S, Lukas C, Marignier R, Messina S, Müller J, Palace J, Pastó L, Paul F, Prados F, Pröbstel AK, Rovira À, Rocca MA, Ruggieri S, Sastre-Garriga J, Sato DK, Schneider R, Sepulveda M, Sowa P, Stankoff B, Tortorella C, Barkhof F, Ciccarelli O, Battaglini M, De Stefano N

•papers•Sep 23 2025

Multiple sclerosis (MS) is common in adults while myelin oligodendrocyte glycoprotein antibody-associated disease (MOGAD) is rare. Our previous machine-learning algorithm, using clinical variables, ≤6 brain lesions, and no Dawson fingers, achieved 79% accuracy, 78% sensitivity, and 80% specificity in distinguishing MOGAD from MS but lacked validation. The aim of this study was to (1) evaluate the clinical/MRI algorithm for distinguishing MS from MOGAD, (2) develop a deep learning (DL) model, (3) assess the benefit of combining both, and (4) identify key differentiators using probability attention maps (PAMs). This multicenter, retrospective, cross-sectional MAGNIMS study included scans from 19 centers. Inclusion criteria were as follows: adults with non-acute MS and MOGAD, with high-quality T2-fluid-attenuated inversion recovery and T1-weighted scans. Brain scans were scored by 2 readers to assess the performance of the clinical/MRI algorithm on the validation data set. A DL-based classifier using a ResNet-10 convolutional neural network was developed and tested on an independent validation data set. PAMs were generated by averaging correctly classified attention maps from both groups, identifying key differentiating regions. We included 406 MRI scans (218 with relapsing remitting MS [RRMS], mean age: 39 years ±11, 69% F; 188 with MOGAD, age: 41 years ±14, 61% F), split into 2 data sets: a training/testing set (n = 265: 150 with RRMS, age: 39 years ±10, 72% F; 115 with MOGAD, age: 42 years ±13, 61% F) and an independent validation set (n = 141: 68 with RRMS, age: 40 years ±14, 65% F; 73 with MOGAD, age: 40 years ±15, 63% F). The clinical/MRI algorithm predicted RRMS over MOGAD with 75% accuracy (95% CI 67-82), 96% sensitivity (95% CI 88-99), and 56% specificity (95% CI 44-68) in the validation cohort.
The DL model achieved 77% accuracy (95% CI 64-89), 73% sensitivity (95% CI 57-89), and 83% specificity (95% CI 65-96) in the training/testing cohort, and 70% accuracy (95% CI 63-77), 67% sensitivity (95% CI 55-79), and 73% specificity (95% CI 61-83) in the validation cohort without retraining. When combined, the classifiers reached 86% accuracy (95% CI 81-92), 84% sensitivity (95% CI 75-92), and 89% specificity (95% CI 81-96). PAMs identified key region volumes: corpus callosum (1872 mm³), left precentral gyrus (341 mm³), right thalamus (193 mm³), and right cingulate cortex (186 mm³) for identifying RRMS and brainstem (629 mm³), hippocampus (234 mm³), and parahippocampal gyrus (147 mm³) for identifying MOGAD. Both classifiers effectively distinguished RRMS from MOGAD. The clinical/MRI model showed higher sensitivity while the DL model offered higher specificity, suggesting complementary roles. Their combination improved diagnostic accuracy, and PAMs revealed distinct damage patterns. Future prospective studies should validate these models in diverse, real-world settings. This study provides Class III evidence that both a clinical/MRI algorithm and an MRI-based DL model accurately distinguish RRMS from MOGAD.
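As a rough illustration of how the reported accuracy, sensitivity, and specificity relate to one another, the sketch below computes them from a 2×2 confusion table. The counts are hypothetical: they were chosen only to match the validation cohort sizes (68 RRMS, 73 MOGAD) and the rounded percentages reported for the clinical/MRI algorithm, and they are not taken from the paper. Treating RRMS as the positive class is also an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """2x2 confusion table with RRMS assumed to be the positive class."""
    tp: int  # RRMS scans called RRMS
    fn: int  # RRMS scans called MOGAD
    tn: int  # MOGAD scans called MOGAD
    fp: int  # MOGAD scans called RRMS

    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    @property
    def accuracy(self) -> float:
        total = self.tp + self.fn + self.tn + self.fp
        return (self.tp + self.tn) / total


# Hypothetical counts (not from the paper): 68 RRMS and 73 MOGAD scans.
counts = ConfusionCounts(tp=65, fn=3, tn=41, fp=32)

print(f"sensitivity = {counts.sensitivity:.3f}")
print(f"specificity = {counts.specificity:.3f}")
print(f"accuracy    = {counts.accuracy:.3f}")
```

With these illustrative counts, a high sensitivity and a modest specificity combine into an overall accuracy near 75%, because accuracy weights each class by its share of the cohort.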

MRI Classification Neurological Retrospective Clinical In Silico Consortium Benchmark SOTA

Dual-Feature Cross-Fusion Network for Precise Brain Tumor Classification: A Neurocomputational Approach.

M M, G S, Bendre M, Nirmal M

•papers•Sep 23 2025

Brain tumors represent a significant neurological challenge, affecting individuals across all age groups. Accurate and timely diagnosis of tumor types is critical for effective treatment planning. Magnetic Resonance Imaging (MRI) remains a primary diagnostic modality due to its non-invasive nature and ability to provide detailed brain imaging. However, traditional tumor classification relies on expert interpretation, which is time-consuming and prone to subjectivity. This study proposes a novel deep learning architecture, the Dual-Feature Cross-Fusion Network (DF-CFN), for the automated classification of brain tumors using MRI data. The model integrates ConvNeXt for capturing global contextual features and a shallow CNN combined with Feature Channel Attention Network (FcaNet) for extracting local features. These are fused through a cross-feature fusion mechanism for improved classification. The model is trained and validated using a Kaggle dataset encompassing four tumor classes (glioma, meningioma, pituitary, and non-tumor), achieving an accuracy of 99.33%. Its generalizability is further confirmed using the Figshare dataset, yielding 99.22% accuracy. Comparative analyses with baseline and recent models validate the superiority of DF-CFN in terms of precision and robustness. This approach demonstrates strong potential for assisting clinicians in reliable brain tumor classification, thereby improving diagnostic efficiency and reducing the burden on healthcare professionals.

MRI Classification Neurological Methodology In Silico Academic Lab Benchmark SOTA

Exploiting Cross-modal Collaboration and Discrepancy for Semi-supervised Ischemic Stroke Lesion Segmentation from Multi-sequence MRI Images.

Cao Y, Qin T, Liu Y

•papers•Sep 23 2025

Accurate ischemic stroke lesion segmentation helps to define the optimal reperfusion treatment and to unveil the stroke etiology. Despite the importance of diffusion-weighted MRI (DWI) for stroke diagnosis, learning from additional MRI sequences such as apparent diffusion coefficient (ADC) maps can capitalize on the complementary information across modalities and shows strong potential to improve segmentation performance. However, existing deep learning-based methods require large amounts of well-annotated data from multiple modalities for training, and acquiring such datasets is often impractical. We explore semi-supervised stroke lesion segmentation from multi-sequence MRI images, using unlabeled data to improve performance under limited annotation, and propose a novel framework that exploits cross-modality collaboration and discrepancy to use unlabeled data efficiently. Specifically, we adopt a cross-modal bidirectional copy-paste strategy to enable information collaboration between different modalities and a cross-modal discrepancy-informed correction strategy to efficiently learn from limited labeled multi-sequence MRI data and abundant unlabeled data. Extensive experiments on the ischemic stroke lesion segmentation (ISLES 22) dataset demonstrate that our method efficiently utilizes unlabeled data, with a 12.32% DSC improvement over a supervised baseline using 10% annotations, and outperforms existing semi-supervised segmentation methods.
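For readers unfamiliar with the Dice similarity coefficient (DSC) quoted above, here is a minimal sketch of the standard definition, DSC = 2|A ∩ B| / (|A| + |B|), on flattened binary masks. The masks are toy data, and the convention of returning 1.0 when both masks are empty is an assumption; evaluation code for ISLES 22 may handle that case differently.

```python
from typing import Sequence


def dice_coefficient(pred: Sequence[int], truth: Sequence[int]) -> float:
    """Dice similarity coefficient between two flattened binary masks."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    # Voxels labelled as lesion in both masks.
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    if total == 0:
        # Assumed convention: two empty masks agree perfectly.
        return 1.0
    return 2.0 * intersection / total


# Toy masks: 3 predicted lesion voxels, 3 true lesion voxels, 2 shared.
pred_mask = [0, 1, 1, 1, 0, 0]
true_mask = [0, 0, 1, 1, 1, 0]
score = dice_coefficient(pred_mask, true_mask)
print(f"DSC = {score:.3f}")
```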

MRI Segmentation Neurological Methodology In Silico Academic Lab Benchmark SOTA

Including AI in diffusion-weighted breast MRI has potential to increase reader confidence and reduce workload.

Bounias D, Simons L, Baumgartner M, Ehring C, Neher P, Kapsner LA, Kovacs B, Floca R, Jaeger PF, Eberle J, Hadler D, Laun FB, Ohlmeyer S, Maier-Hein L, Uder M, Wenkel E, Maier-Hein KH, Bickelhaupt S

•papers•Sep 23 2025

Breast diffusion-weighted imaging (DWI) has shown potential as a standalone imaging technique for certain indications, e.g., supplemental screening of women with dense breasts. This study evaluates an artificial intelligence (AI)-powered computer-aided diagnosis (CAD) system for clinical interpretation and workload reduction in breast DWI. This retrospective IRB-approved study included n = 824 examinations for model development (2017-2020) and n = 235 for evaluation (01/2021-06/2021). Readings were performed by three readers using either the AI-CAD or manual readings. BI-RADS-like (Breast Imaging Reporting and Data System) classification was based on DWI. Histopathology served as ground truth. The model was nnDetection-based, trained using 5-fold cross-validation and ensembling. Statistical significance was determined using McNemar's test. Inter-rater agreement was calculated using Cohen's kappa. Model performance was calculated using the area under the receiver operating characteristic curve (AUC). The AI-augmented approach significantly reduced BI-RADS-like 3 calls in breast DWI by 29% (P = .019) and increased inter-rater agreement (0.57 ± 0.10 vs 0.49 ± 0.11), while preserving diagnostic accuracy. Two of the three readers detected more malignant lesions (63/69 vs 59/69 and 64/69 vs 62/69) with the AI-CAD. The AI model achieved an AUC of 0.78 (95% CI: [0.72, 0.85]; P < .001), which increased for women at screening age to 0.82 (95% CI: [0.73, 0.90]; P < .001), indicating a potential for workload reduction of 20.9% at 96% sensitivity. Breast DWI might benefit from AI support. In our study, AI showed potential for reduction of BI-RADS-like 3 calls and increase of inter-rater agreement. However, given the limited study size, further research is needed.
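The inter-rater agreement above is measured with Cohen's kappa, which corrects raw agreement for the agreement expected by chance. A minimal sketch of the unweighted statistic for two readers follows; the reader labels are invented BI-RADS-like categories, not study data, and the study may have used a weighted variant.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Unweighted Cohen's kappa for two raters labelling the same cases."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(a)
    # Observed proportion of cases on which the raters agree.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement from the product of the raters' marginal frequencies.
    count_a, count_b = Counter(a), Counter(b)
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical category throughout
    return (observed - expected) / (1.0 - expected)


# Invented BI-RADS-like calls from two hypothetical readers on 8 exams.
reader_a = [3, 3, 4, 2, 3, 4, 2, 2]
reader_b = [3, 4, 4, 2, 3, 4, 2, 3]
kappa = cohens_kappa(reader_a, reader_b)
print(f"kappa = {kappa:.3f}")
```

In this toy case the readers agree on 6 of 8 exams (75%), but kappa is lower because some of that agreement would be expected by chance alone.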

MRI Classification Breast Retrospective Clinical In Silico Academic Lab Benchmark SOTA

Enhancing AI-based decision support system with automatic brain tumor segmentation for EGFR mutation classification.

Gökmen N, Kocadağlı O, Cevik S, Aktan C, Eghbali R, Liu C

•papers•Sep 23 2025

Glioblastoma (GBM) carries poor prognosis; epidermal-growth-factor-receptor (EGFR) mutations further shorten survival. We propose a fully automated MRI-based decision-support system (DSS) that segments GBM and classifies EGFR status, reducing reliance on invasive biopsy. The segmentation module (UNet SI) fuses multiresolution, entropy-ranked shearlet features with CNN features, preserving fine detail through identity long-skip connections, to yield a lightweight 1.9 M-parameter network. Tumour masks are fed to an Inception ResNet-v2 classifier via a 512-D bottleneck. The pipeline was five-fold cross-validated on 98 contrast-enhanced T1-weighted scans (Memorial Hospital; Ethics 24.12.2021/008) and externally validated on BraTS 2019. On the Memorial cohort, UNet SI achieved Dice 0.873, Jaccard 0.853, SSIM 0.992, and HD95 24.19 mm. EGFR classification reached accuracy 0.960, precision 1.000, recall 0.871, and AUC 0.94, surpassing published state-of-the-art results. Inference time is ≤ 0.18 s per slice on a 4 GB GPU. By combining shearlet-enhanced segmentation with streamlined classification, the DSS delivers superior EGFR prediction and is suitable for integration into routine clinical workflows.

MRI Segmentation Neurological Retrospective Clinical In Silico Academic Lab Benchmark SOTA

MOIS-SAM2: Exemplar-based Segment Anything Model 2 for multilesion interactive segmentation of neurofibromas in whole-body MRI

Georgii Kolokolnikov, Marie-Lena Schmalhofer, Sophie Götz, Lennart Well, Said Farschtschi, Victor-Felix Mautner, Inka Ristow, Rene Werner

•preprint•Sep 23 2025

Background and Objectives: Neurofibromatosis type 1 is a genetic disorder characterized by the development of numerous neurofibromas (NFs) throughout the body. Whole-body MRI (WB-MRI) is the clinical standard for detection and longitudinal surveillance of NF tumor growth. Existing interactive segmentation methods fail to combine high lesion-wise precision with scalability to hundreds of lesions. This study proposes a novel interactive segmentation model tailored to this challenge. Methods: We introduce MOIS-SAM2, a multi-object interactive segmentation model that extends the state-of-the-art, transformer-based, promptable Segment Anything Model 2 (SAM2) with exemplar-based semantic propagation. MOIS-SAM2 was trained and evaluated on 119 WB-MRI scans from 84 NF1 patients acquired using T2-weighted fat-suppressed sequences. The dataset was split at the patient level into a training set and four test sets (one in-domain and three reflecting different domain shift scenarios, e.g., MRI field strength variation, low tumor burden, differences in clinical site and scanner vendor). Results: On the in-domain test set, MOIS-SAM2 achieved a scan-wise DSC of 0.60 against expert manual annotations, outperforming baseline 3D nnU-Net (DSC: 0.54) and SAM2 (DSC: 0.35). Performance of the proposed model was maintained under MRI field strength shift (DSC: 0.53) and scanner vendor variation (DSC: 0.50), and improved in low tumor burden cases (DSC: 0.61). Lesion detection F1 scores ranged from 0.62 to 0.78 across test sets. Preliminary inter-reader variability analysis showed model-to-expert agreement (DSC: 0.62-0.68), comparable to inter-expert agreement (DSC: 0.57-0.69). Conclusions: The proposed MOIS-SAM2 enables efficient and scalable interactive segmentation of NFs in WB-MRI with minimal user input and strong generalization, supporting integration into clinical workflows.
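The abstract reports lesion detection F1 scores alongside scan-wise DSC. The sketch below shows one simple way such a lesion-level F1 could be computed, under the assumption that a lesion counts as detected when it shares at least one voxel with a lesion in the other set. The matching rule actually used in the paper is not stated here, and the lesions are toy voxel-index sets.

```python
from typing import List, Set


def lesion_f1(pred_lesions: List[Set[int]], true_lesions: List[Set[int]]) -> float:
    """Lesion-level F1 with an assumed 'any overlapping voxel' matching rule."""
    # True positives: reference lesions touched by at least one prediction.
    tp = sum(1 for t in true_lesions if any(t & p for p in pred_lesions))
    fn = len(true_lesions) - tp
    # False positives: predicted lesions that touch no reference lesion.
    fp = sum(1 for p in pred_lesions if not any(p & t for t in true_lesions))
    denom = 2 * tp + fp + fn
    if denom == 0:
        return 1.0  # assumed convention: nothing to find, nothing predicted
    return 2.0 * tp / denom


# Toy example: each lesion is a set of flattened voxel indices.
reference = [{1, 2, 3}, {10, 11}, {20}]
predicted = [{2, 3, 4}, {10}, {30, 31}]
f1 = lesion_f1(predicted, reference)
print(f"lesion-level F1 = {f1:.3f}")
```

Here two of three reference lesions are found and one prediction is spurious, so precision and recall are both 2/3 and F1 is 2/3.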

MRI Segmentation Whole Body Retrospective Clinical In Silico Academic Lab Benchmark SOTA

Graph-Radiomic Learning (GrRAiL) Descriptor to Characterize Imaging Heterogeneity in Confounding Tumor Pathologies

Dheerendranath Battalapalli, Apoorva Safai, Maria Jaramillo, Hyemin Um, Gustavo Adalfo Pineda Ortiz, Ulas Bagci, Manmeet Singh Ahluwalia, Marwa Ismail, Pallavi Tiwari

•preprint•Sep 23 2025

A significant challenge in solid tumors is reliably distinguishing confounding pathologies from malignant neoplasms on routine imaging. While radiomics methods seek surrogate markers of lesion heterogeneity on CT/MRI, many aggregate features across the region of interest (ROI) and miss complex spatial relationships among varying intensity compositions. We present a new Graph-Radiomic Learning (GrRAiL) descriptor for characterizing intralesional heterogeneity (ILH) on clinical MRI scans. GrRAiL (1) identifies clusters of sub-regions using per-voxel radiomic measurements, then (2) computes graph-theoretic metrics to quantify spatial associations among clusters. The resulting weighted graphs encode higher-order spatial relationships within the ROI, aiming to reliably capture ILH and disambiguate confounding pathologies from malignancy. To assess efficacy and clinical feasibility, GrRAiL was evaluated in n=947 subjects spanning three use cases: differentiating tumor recurrence from radiation effects in glioblastoma (GBM; n=106) and brain metastasis (n=233), and stratifying pancreatic intraductal papillary mucinous neoplasms (IPMNs) into no+low vs high risk (n=608). In a multi-institutional setting, GrRAiL consistently outperformed state-of-the-art baselines - Graph Neural Networks (GNNs), textural radiomics, and intensity-graph analysis. In GBM, cross-validation (CV) and test accuracies for recurrence vs pseudo-progression were 89% and 78% with >10% test-accuracy gains over comparators. In brain metastasis, CV and test accuracies for recurrence vs radiation necrosis were 84% and 74% (>13% improvement). For IPMN risk stratification, CV and test accuracies were 84% and 75%, showing >10% improvement.

MRI Classification Neurological Retrospective Clinical In Silico Academic Lab Benchmark SOTA

Learning neuroimaging models from health system-scale data

Yiwei Lyu, Samir Harake, Asadur Chowdury, Soumyanil Banerjee, Rachel Gologorsky, Shixuan Liu, Anna-Katharina Meissner, Akshay Rao, Chenhui Zhao, Akhil Kondepudi, Cheng Jiang, Xinhai Hou, Rushikesh S. Joshi, Volker Neuschmelting, Ashok Srinivasan, Dawn Kleindorfer, Brian Athey, Vikas Gulani, Aditya Pandey, Honglak Lee, Todd Hollon

•preprint•Sep 23 2025

Neuroimaging is a ubiquitous tool for evaluating patients with neurological diseases. The global demand for magnetic resonance imaging (MRI) studies has risen steadily, placing significant strain on health systems, prolonging turnaround times, and intensifying physician burnout. These challenges disproportionately impact patients in low-resource and rural settings. Here, we utilized a large academic health system as a data engine to develop Prima, the first vision language model (VLM) serving as an AI foundation for neuroimaging that supports real-world, clinical MRI studies as input. Trained on over 220,000 MRI studies, Prima uses a hierarchical vision architecture that provides general and transferable MRI features. Prima was tested in a 1-year health system-wide study that included 30K MRI studies. Across 52 radiologic diagnoses from the major neurologic disorders, including neoplastic, inflammatory, infectious, and developmental lesions, Prima achieved a mean diagnostic area under the ROC curve of 92.0, outperforming other state-of-the-art general and medical AI models. Prima offers explainable differential diagnoses, worklist priority for radiologists, and clinical referral recommendations across diverse patient demographics and MRI systems. Prima demonstrates algorithmic fairness across sensitive groups and can help mitigate health system biases, such as prolonged turnaround times for low-resource populations. These findings highlight the transformative potential of health system-scale VLMs and Prima's role in advancing AI-driven healthcare.
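A mean area under the ROC curve across many diagnoses can be read as a per-diagnosis AUC averaged with equal weight. The sketch below illustrates that idea using the pairwise (Mann-Whitney) formulation of AUC. The diagnosis names and scores are invented, and equal-weight averaging is an assumption about how the mean was formed.

```python
from typing import Dict, Sequence, Tuple


def roc_auc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """AUC as the probability that a positive case outscores a negative one."""
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos_scores) * len(neg_scores))


# Invented model scores per diagnosis: (scores for positives, scores for negatives).
scores_by_diagnosis: Dict[str, Tuple[Sequence[float], Sequence[float]]] = {
    "diagnosis_a": ([0.9, 0.8, 0.4], [0.7, 0.3, 0.2]),
    "diagnosis_b": ([0.6, 0.5], [0.5, 0.1]),
}

aucs = {name: roc_auc(pos, neg) for name, (pos, neg) in scores_by_diagnosis.items()}
mean_auc = sum(aucs.values()) / len(aucs)
print(aucs, f"mean AUC = {mean_auc:.3f}")
```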

MRI Classification Neurological Retrospective Clinical In Silico Academic Lab Benchmark SOTA GenAI

Enhancing the CAD-RADS™ 2.0 Category Assignment Performance of ChatGPT and DeepSeek Through "Few-shot" Prompting.

Kaya HE

•papers•Sep 23 2025

This study assessed whether few-shot prompting improves the performance of 2 popular large language models (LLMs) (ChatGPT o1 and DeepSeek-R1) in assigning Coronary Artery Disease Reporting and Data System (CAD-RADS™ 2.0) categories. A detailed few-shot prompt based on the CAD-RADS™ 2.0 framework was developed using 20 reports from the MIMIC-IV database. Subsequently, 100 modified reports from the same database were categorized using zero-shot and few-shot prompts through the models' user interface. Model accuracy was evaluated by comparing assignments to a reference radiologist's classifications, including stenosis categories and modifiers. To assess reproducibility, 50 reports were reclassified using the same few-shot prompt. McNemar tests and Cohen kappa were used for statistical analysis. Using zero-shot prompting, accuracy was low for both models (ChatGPT: 14%, DeepSeek: 8%), with correct assignments occurring almost exclusively in CAD-RADS 0 cases. Hallucinations occurred frequently (ChatGPT: 19%, DeepSeek: 54%). Few-shot prompting significantly improved accuracy to 98% for ChatGPT and 93% for DeepSeek (both P<0.001) and eliminated hallucinations. Kappa values for agreement between model-generated and radiologist-assigned classifications were 0.979 (0.950, 1.000) (P<0.001) for ChatGPT and 0.916 (0.859, 0.973) (P<0.001) for DeepSeek, indicating almost perfect agreement for both models without a significant difference between the models (P=0.180). Reproducibility analysis yielded kappa values of 0.957 (0.900, 1.000) (P<0.001) for ChatGPT and 0.873 (0.779, 0.967) (P<0.001) for DeepSeek, indicating almost perfect and strong agreement between repeated assignments, respectively, with no significant difference between the models (P=0.125). Few-shot prompting substantially enhances LLMs' accuracy in assigning CAD-RADS™ 2.0 categories, suggesting potential for clinical application and facilitating system adoption.
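Because the same reports were categorized under both prompting strategies, the accuracy comparison is paired, which is the setting McNemar's test addresses. The sketch below implements the exact (binomial) form of the test on the discordant pairs. The discordant counts are hypothetical, chosen only to be consistent with the reported ChatGPT accuracies (14% zero-shot, 98% few-shot, 100 reports). The abstract does not give the actual paired table or say which variant of the test was used.

```python
from math import comb


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value from the two discordant-pair counts.

    b: cases wrong under condition 1 but right under condition 2
    c: cases right under condition 1 but wrong under condition 2
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, so no evidence of a difference
    k = min(b, c)
    # Under H0 each discordant pair falls either way with probability 0.5.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2**n
    return min(1.0, 2.0 * tail)


# Hypothetical paired outcomes for 100 reports (not from the paper):
# 14 correct under both prompts, 84 correct only with the few-shot prompt,
# 0 correct only with the zero-shot prompt, 2 incorrect under both.
improved, worsened = 84, 0
p_value = mcnemar_exact(improved, worsened)
print(f"exact McNemar p = {p_value:.3g}")
```

Only the discordant pairs enter the test; reports classified the same way under both prompts carry no information about which prompt is better.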

CT Classification Cardiac Retrospective Clinical In Silico Academic Lab Benchmark SOTA
