
Comparative Analysis of CNN Performance in Keras, PyTorch and JAX on PathMNIST

Anida Nezović, Jalal Romano, Nada Marić, Medina Kapo, Amila Akagić

arXiv preprint · Jul 16, 2025
Deep learning has significantly advanced the field of medical image classification, particularly with the adoption of Convolutional Neural Networks (CNNs). Various deep learning frameworks such as Keras, PyTorch and JAX offer unique advantages in model development and deployment. However, their comparative performance in medical imaging tasks remains underexplored. This study presents a comprehensive analysis of CNN implementations across these frameworks, using the PathMNIST dataset as a benchmark. We evaluate training efficiency, classification accuracy and inference speed to assess their suitability for real-world applications. Our findings highlight the trade-offs between computational speed and model accuracy, offering valuable insights for researchers and practitioners in medical image analysis.
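The abstract does not specify the network architecture or measurement protocol, so the following is only a minimal sketch of how such a framework comparison might be set up: a small CNN sized for PathMNIST (3×28×28 inputs, 9 classes) written in PyTorch, with a crude inference-timing probe. The Keras and JAX counterparts would mirror the same layer layout; all sizes and layers here are assumptions.

```python
# Minimal sketch (assumptions: 3x28x28 PathMNIST inputs, 9 classes; the paper's
# exact architecture and hyperparameters are not given in the abstract).
import time
import torch
import torch.nn as nn

class SmallCNN(nn.Module):
    def __init__(self, num_classes: int = 9):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),   # 28 -> 14
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # 14 -> 7
        )
        self.classifier = nn.Linear(64 * 7 * 7, num_classes)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

# Crude inference-speed probe of the kind such framework comparisons rely on.
model = SmallCNN().eval()
x = torch.randn(64, 3, 28, 28)
with torch.no_grad():
    start = time.perf_counter()
    _ = model(x)
    print(f"batch inference: {(time.perf_counter() - start) * 1e3:.2f} ms")
```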

Interpreting Radiologist's Intention from Eye Movements in Chest X-ray Diagnosis

Trong-Thang Pham, Anh Nguyen, Zhigang Deng, Carol C. Wu, Hien Van Nguyen, Ngan Le

arXiv preprint · Jul 16, 2025
Radiologists rely on eye movements to navigate and interpret medical images. A trained radiologist possesses knowledge about the potential diseases that may be present in the images and, when searching, follows a mental checklist to locate them using their gaze. This is a key observation, yet existing models fail to capture the underlying intent behind each fixation. In this paper, we introduce a deep learning-based approach, RadGazeIntent, designed to model this behavior: having an intention to find something and actively searching for it. Our transformer-based architecture processes both the temporal and spatial dimensions of gaze data, transforming fine-grained fixation features into coarse, meaningful representations of diagnostic intent to interpret radiologists' goals. To capture the nuances of radiologists' varied intention-driven behaviors, we process existing medical eye-tracking datasets to create three intention-labeled subsets: RadSeq (Systematic Sequential Search), RadExplore (Uncertainty-driven Exploration), and RadHybrid (Hybrid Pattern). Experimental results demonstrate RadGazeIntent's ability to predict which findings radiologists are examining at specific moments, outperforming baseline methods across all intention-labeled datasets.
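As an illustration of the kind of architecture described (not the authors' implementation), the sketch below runs a transformer encoder over a sequence of per-fixation features and emits per-fixation intention logits; the feature dimension, number of intention classes, and head design are assumptions.

```python
# Illustrative sketch only: a transformer encoder over fixation sequences,
# mapped to per-fixation intention labels.
import torch
import torch.nn as nn

class GazeIntentModel(nn.Module):
    def __init__(self, feat_dim=32, d_model=128, num_intentions=10, num_layers=2):
        super().__init__()
        self.embed = nn.Linear(feat_dim, d_model)  # per-fixation (x, y, duration, ...) features
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
        self.head = nn.Linear(d_model, num_intentions)  # which finding is being examined

    def forward(self, fixations):          # (batch, seq_len, feat_dim)
        h = self.encoder(self.embed(fixations))
        return self.head(h)                # per-fixation intention logits

logits = GazeIntentModel()(torch.randn(2, 50, 32))
print(logits.shape)                        # torch.Size([2, 50, 10])
```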

Site-Level Fine-Tuning with Progressive Layer Freezing: Towards Robust Prediction of Bronchopulmonary Dysplasia from Day-1 Chest Radiographs in Extremely Preterm Infants

Sybelle Goedicke-Fritz, Michelle Bous, Annika Engel, Matthias Flotho, Pascal Hirsch, Hannah Wittig, Dino Milanovic, Dominik Mohr, Mathias Kaspar, Sogand Nemat, Dorothea Kerner, Arno Bücker, Andreas Keller, Sascha Meyer, Michael Zemlin, Philipp Flotho

arXiv preprint · Jul 16, 2025
Bronchopulmonary dysplasia (BPD) is a chronic lung disease affecting 35% of extremely low birth weight infants. Defined by oxygen dependence at 36 weeks postmenstrual age, it causes lifelong respiratory complications. However, preventive interventions carry severe risks, including neurodevelopmental impairment, ventilator-induced lung injury, and systemic complications. Therefore, early BPD prognosis and prediction of BPD outcome are crucial to avoid unnecessary toxicity in low-risk infants. Admission radiographs of extremely preterm infants are routinely acquired within 24 h of life and could serve as a non-invasive prognostic tool. In this work, we developed and investigated a deep learning approach using chest X-rays from 163 extremely low-birth-weight infants ($\leq$32 weeks gestation, 401-999 g) obtained within 24 hours of birth. We fine-tuned a ResNet-50 pretrained specifically on adult chest radiographs, employing progressive layer freezing with discriminative learning rates to prevent overfitting, and evaluated CutMix augmentation and linear probing. For moderate/severe BPD outcome prediction, our best-performing model, with progressive freezing, linear probing, and CutMix, achieved an AUROC of 0.78 $\pm$ 0.10, a balanced accuracy of 0.69 $\pm$ 0.10, and an F1-score of 0.67 $\pm$ 0.11. In-domain pretraining significantly outperformed ImageNet initialization (p = 0.031), confirming that domain-specific pretraining is important for BPD outcome prediction. Routine IRDS grades showed limited prognostic value (AUROC 0.57 $\pm$ 0.11), confirming the need for learned markers. Our approach demonstrates that domain-specific pretraining enables accurate BPD prediction from routine day-1 radiographs. Through progressive freezing and linear probing, the method remains computationally feasible for site-level implementation and future federated learning deployments.
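A hedged sketch of progressive layer freezing with discriminative learning rates on a ResNet-50, as described above; ImageNet weights stand in for the in-domain chest-radiograph pretraining used in the study, and the stage selection and learning rates are illustrative assumptions.

```python
# Sketch, not the authors' implementation: later ResNet stages get larger
# learning rates, and a helper freezes everything outside a chosen set of stages.
import torch
import torchvision

model = torchvision.models.resnet50(weights="IMAGENET1K_V2")
model.fc = torch.nn.Linear(model.fc.in_features, 1)  # binary BPD-outcome head

# Discriminative learning rates: earlier (more generic) stages learn more slowly.
param_groups = [
    {"params": model.layer3.parameters(), "lr": 1e-5},
    {"params": model.layer4.parameters(), "lr": 1e-4},
    {"params": model.fc.parameters(),     "lr": 1e-3},
]
optimizer = torch.optim.AdamW(param_groups)

def freeze_outside(model, trainable=("layer3", "layer4", "fc")):
    """Freeze all parameters except those in the named stages; calling this with
    a different tuple each epoch realizes a progressive freezing schedule."""
    for name, p in model.named_parameters():
        p.requires_grad = any(name.startswith(t) for t in trainable)

freeze_outside(model)
```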

Hybrid Ensemble Approaches: Optimal Deep Feature Fusion and Hyperparameter-Tuned Classifier Ensembling for Enhanced Brain Tumor Classification

Zahid Ullah, Dragan Pamucar, Jihie Kim

arXiv preprint · Jul 16, 2025
Magnetic Resonance Imaging (MRI) is widely recognized as the most reliable tool for detecting tumors due to its capability to produce detailed images that reveal their presence. However, diagnostic accuracy can be compromised when human specialists evaluate these images. Factors such as fatigue, limited expertise, and insufficient image detail can lead to errors. For example, small tumors might go unnoticed, or overlap with healthy brain regions could result in misidentification. To address these challenges and enhance diagnostic precision, this study proposes a novel double-ensembling framework consisting of ensembled pre-trained deep learning (DL) models for feature extraction and ensembled machine learning (ML) classifiers with fine-tuned hyperparameters to efficiently classify brain tumors. Specifically, our method includes extensive preprocessing and augmentation, transfer learning using various pre-trained deep convolutional neural networks and vision transformer networks to extract deep features from brain MRI, and fine-tuning of the ML classifiers' hyperparameters. Our experiments used three publicly available Kaggle brain tumor MRI datasets to evaluate the pre-trained DL feature extractors, the ML classifiers, and the effectiveness of ensembling deep features together with ensembling ML classifiers for brain tumor classification. Our results indicate that the proposed feature fusion and classifier fusion improve upon the state of the art, with hyperparameter fine-tuning providing a significant enhancement over the ensemble method. Additionally, we present an ablation study to illustrate how each component contributes to accurate brain tumor classification.
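The sketch below illustrates the general double-ensembling pattern, deep-feature fusion feeding an ensemble of hyperparameter-tuned classical classifiers, using synthetic features and an illustrative choice of classifiers; it is not the authors' pipeline.

```python
# Sketch under stated assumptions: f_a and f_b stand in for deep features
# extracted from two pretrained backbones for the same MRI scans.
import numpy as np
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

rng = np.random.default_rng(0)
f_a, f_b = rng.normal(size=(200, 512)), rng.normal(size=(200, 768))
y = rng.integers(0, 4, size=200)            # toy 4-class tumor labels

X = np.concatenate([f_a, f_b], axis=1)      # feature-level fusion

# Hyperparameter-tuned base classifiers combined by soft voting.
svm = GridSearchCV(SVC(probability=True), {"C": [0.1, 1, 10]}, cv=3)
rf = GridSearchCV(RandomForestClassifier(), {"n_estimators": [100, 300]}, cv=3)
ensemble = VotingClassifier([("svm", svm), ("rf", rf)], voting="soft")
ensemble.fit(X, y)
print(ensemble.predict(X[:5]))
```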

Benchmarking and Explaining Deep Learning Cortical Lesion MRI Segmentation in Multiple Sclerosis

Nataliia Molchanova, Alessandro Cagol, Mario Ocampo-Pineda, Po-Jui Lu, Matthias Weigel, Xinjie Chen, Erin Beck, Charidimos Tsagkas, Daniel Reich, Colin Vanden Bulcke, Anna Stolting, Serena Borrelli, Pietro Maggi, Adrien Depeursinge, Cristina Granziera, Henning Mueller, Pedro M. Gordaliza, Meritxell Bach Cuadra

arXiv preprint · Jul 16, 2025
Cortical lesions (CLs) have emerged as valuable biomarkers in multiple sclerosis (MS), offering high diagnostic specificity and prognostic relevance. However, their routine clinical integration remains limited due to their subtle appearance on magnetic resonance imaging (MRI), challenges in expert annotation, and a lack of standardized automated methods. We propose a comprehensive multi-centric benchmark of CL detection and segmentation in MRI. A total of 656 MRI scans, including clinical trial and research data from four institutions, were acquired at 3T and 7T using MP2RAGE and MPRAGE sequences with expert-consensus annotations. We rely on the self-configuring nnU-Net framework, designed for medical image segmentation, and propose adaptations tailored to improved CL detection. We evaluate model generalization through out-of-distribution testing, demonstrating strong lesion detection capabilities with F1-scores of 0.64 in domain and 0.5 out of domain. We also analyze internal model features and model errors for a better understanding of AI decision-making. Our study examines how data variability, lesion ambiguity, and protocol differences impact model performance, offering recommendations to address these barriers to clinical adoption. To reinforce reproducibility, the implementation and models will be publicly accessible and ready to use at https://github.com/Medical-Image-Analysis-Laboratory/ and https://doi.org/10.5281/zenodo.15911797.
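For readers unfamiliar with lesion-wise evaluation, the following is a sketch of a detection F1 computed by matching connected components of a predicted mask against reference lesions; the any-overlap matching criterion is an assumption, not necessarily the benchmark's exact rule.

```python
# Sketch of a lesion-wise detection F1: each reference lesion counts as detected
# if any predicted voxel overlaps it, and each predicted component with no
# reference overlap counts as a false positive.
import numpy as np
from scipy import ndimage

def lesion_detection_f1(pred_mask: np.ndarray, ref_mask: np.ndarray) -> float:
    pred_lab, n_pred = ndimage.label(pred_mask)
    ref_lab, n_ref = ndimage.label(ref_mask)
    tp = sum(1 for i in range(1, n_ref + 1) if (pred_mask & (ref_lab == i)).any())
    fp = sum(1 for j in range(1, n_pred + 1) if not (ref_mask & (pred_lab == j)).any())
    fn = n_ref - tp
    return 2 * tp / max(2 * tp + fp + fn, 1)

pred = np.zeros((64, 64, 64), dtype=bool)
ref = np.zeros_like(pred)
pred[10:14, 10:14, 10:14] = True
ref[11:15, 11:15, 11:15] = True
print(lesion_detection_f1(pred, ref))  # 1.0 for this single overlapping lesion
```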

Identifying Signatures of Image Phenotypes to Track Treatment Response in Liver Disease

Matthias Perkonigg, Nina Bastati, Ahmed Ba-Ssalamah, Peter Mesenbrink, Alexander Goehler, Miljen Martic, Xiaofei Zhou, Michael Trauner, Georg Langs

arXiv preprint · Jul 16, 2025
Quantifiable image patterns associated with disease progression and treatment response are critical tools for guiding individual treatment and for developing novel therapies. Here, we show that unsupervised machine learning can identify a pattern vocabulary of liver tissue in magnetic resonance images that quantifies treatment response in diffuse liver disease. Deep clustering networks simultaneously encode and cluster patches of medical images into a low-dimensional latent space to establish a tissue vocabulary. The resulting tissue types capture differential tissue change and its location in the liver associated with treatment response. We demonstrate the utility of the vocabulary on a randomized controlled trial cohort of non-alcoholic steatohepatitis patients. First, we use the vocabulary to compare longitudinal liver change in a placebo and a treatment cohort. Results show that the method identifies specific liver tissue change pathways associated with treatment, and enables better separation between treatment groups than established non-imaging measures. Moreover, we show that the vocabulary can predict biopsy-derived features from non-invasive imaging data. Finally, we validate the method on a separate replication cohort to demonstrate its applicability.
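As a rough illustration of building a tissue "vocabulary" from image patches: the paper learns encoding and clustering jointly with deep clustering networks, whereas the sketch below simply runs k-means on frozen latent codes as a stand-in for that joint step.

```python
# Illustrative sketch only: encode MR patches into a low-dimensional latent
# space, then cluster the codes into a small tissue "vocabulary".
import torch
import torch.nn as nn
from sklearn.cluster import KMeans

encoder = nn.Sequential(
    nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),   # 32 -> 16
    nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),  # 16 -> 8
    nn.Flatten(),
    nn.Linear(32 * 8 * 8, 16),                              # 16-D latent code
)

patches = torch.randn(500, 1, 32, 32)                        # toy liver MR patches
with torch.no_grad():
    z = encoder(patches).numpy()

vocab = KMeans(n_clusters=8, n_init=10).fit(z)               # 8 tissue "words"
print(vocab.labels_[:10])
```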

Multi-DECT image-based radiomics with interpretable machine learning for preoperative prediction of tumor budding grade and prognosis in colorectal cancer: a dual-center study.

Lin G, Chen W, Chen Y, Cao J, Mao W, Xia S, Chen M, Xu M, Lu C, Ji J

PubMed paper · Jul 16, 2025
This study evaluates the predictive ability of multiparametric dual-energy computed tomography (multi-DECT) radiomics for tumor budding (TB) grade and prognosis in patients with colorectal cancer (CRC). The study comprised 510 CRC patients at two institutions. Radiomics features from multi-DECT images (including polyenergetic, virtual monoenergetic, iodine concentration [IC], and effective atomic number images) were screened to build radiomics models using nine machine learning (ML) algorithms. An ML-based fusion model combining clinical-radiological variables and radiomics features was developed. Model performance was assessed using the area under the receiver operating characteristic curve (AUC), and model interpretability was assessed with Shapley additive explanations (SHAP). The prognostic significance of the fusion model was determined via survival analysis. CT-reported lymph node status and normalized IC were used to develop a clinical-radiological model. Among the nine examined ML algorithms, extreme gradient boosting (XGB) performed best. The XGB-based fusion model containing multi-DECT radiomics features outperformed the clinical-radiological model in predicting TB grade, demonstrating superior AUCs of 0.969 in the training cohort, 0.934 in the internal validation cohort, and 0.897 in the external validation cohort. The SHAP analysis identified the variables influencing model predictions. Patients with a model-predicted high TB grade had worse recurrence-free survival (RFS) in both the training (P < 0.001) and internal validation (P = 0.016) cohorts. The XGB-based fusion model using multi-DECT radiomics could serve as a non-invasive tool to preoperatively predict TB grade and RFS in patients with CRC.
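The modeling pattern described above, a gradient-boosted classifier on radiomics features explained with SHAP, can be sketched as follows with synthetic stand-in data; it is not the study's code, feature set, or tuning.

```python
# Hedged sketch: XGBoost on a radiomics-style feature matrix, with SHAP values
# summarizing per-feature contributions to the predictions.
import numpy as np
import shap
import xgboost as xgb

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 20))                      # synthetic radiomics features
y = (X[:, 0] + 0.5 * X[:, 3] + rng.normal(scale=0.5, size=300) > 0).astype(int)

model = xgb.XGBClassifier(n_estimators=200, max_depth=3, learning_rate=0.1)
model.fit(X, y)

explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)
print(np.abs(shap_values).mean(axis=0)[:5])         # mean |SHAP| per feature
```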

Multi-scale machine learning model predicts muscle and functional disease progression.

Blemker SS, Riem L, DuCharme O, Pinette M, Costanzo KE, Weatherley E, Statland J, Tapscott SJ, Wang LH, Shaw DWW, Song X, Leung D, Friedman SD

PubMed paper · Jul 16, 2025
Facioscapulohumeral muscular dystrophy (FSHD) is a genetic neuromuscular disorder characterized by progressive muscle degeneration with substantial variability in severity and progression patterns. FSHD is a highly heterogeneous disease, yet the clinical metrics currently used to track disease progression lack the sensitivity needed for personalized assessment, which greatly limits the design and execution of clinical trials. This study introduces a multi-scale machine learning framework that leverages whole-body magnetic resonance imaging (MRI) and clinical data to predict regional, muscle, joint, and functional progression in FSHD. The goal of this work is to create a 'digital twin' of individual FSHD patients that can be leveraged in clinical trials. Using a combined dataset of over 100 patients from seven studies, baseline MRI-derived metrics, including fat fraction, lean muscle volume, and fat spatial heterogeneity, were integrated with clinical and functional measures. A three-stage random forest model was developed to predict annualized changes in muscle composition and in a functional outcome, the timed up-and-go (TUG) test. All model stages showed strong predictive performance in separate holdout datasets. After training, the models predicted fat fraction change with a root mean square error (RMSE) of 2.16% and lean volume change with an RMSE of 8.1 ml in a holdout testing dataset. Feature analysis revealed that metrics of fat heterogeneity within muscle predict muscle-level progression. The stage-3 model, which combined functional muscle groups, predicted change in TUG with an RMSE of 0.6 s in the holdout testing dataset. This study demonstrates that machine learning models incorporating individual muscle and performance data can effectively predict MRI-measured disease progression and functional performance on complex tasks, addressing the heterogeneity and nonlinearity inherent in FSHD. Further studies incorporating larger longitudinal cohorts, as well as comprehensive clinical and functional measures, will allow this model to be expanded and refined. As many neuromuscular diseases are characterized by variability and heterogeneity similar to FSHD, such approaches have broad applicability.
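A minimal sketch of the staged idea described above, with synthetic data: one random forest predicts per-muscle fat-fraction change from baseline features, and a later stage predicts TUG change from the muscle-level predictions. The feature set and stage wiring are illustrative assumptions, not the study's model.

```python
# Staged ("multi-scale") prediction sketch on toy data.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)
baseline = rng.normal(size=(120, 12))               # per-muscle baseline MRI features
ff_change = baseline[:, 0] * 0.3 + rng.normal(scale=0.1, size=120)
tug_change = ff_change * 0.5 + rng.normal(scale=0.2, size=120)

stage1 = RandomForestRegressor(n_estimators=300).fit(baseline, ff_change)
muscle_pred = stage1.predict(baseline).reshape(-1, 1)

# Later stage: functional outcome driven by the predicted muscle-level change.
stage3 = RandomForestRegressor(n_estimators=300).fit(muscle_pred, tug_change)
rmse = np.sqrt(np.mean((stage3.predict(muscle_pred) - tug_change) ** 2))
print(f"in-sample TUG-change RMSE: {rmse:.2f} s")   # in-sample only, for illustration
```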

Scaling Chest X-ray Foundation Models from Mixed Supervisions for Dense Prediction.

Wang F, Yu L

PubMed paper · Jul 16, 2025
Foundation models have revolutionized the field of chest X-ray diagnosis with their ability to transfer across diseases and tasks. However, previous works have predominantly relied on self-supervised learning from medical image-text pairs, and such coarse pair-level supervision falls short on dense medical prediction tasks, limiting applicability to detailed diagnostics. In this paper, we introduce a Dense Chest X-ray Foundation Model (DCXFM), which utilizes mixed supervision types (i.e., text, label, and segmentation masks) to significantly enhance the scalability of foundation models across various medical tasks. Our model involves two training stages: we first employ a novel self-distilled multimodal pretraining paradigm to exploit text and label supervision, along with local-to-global self-distillation and soft cross-modal contrastive alignment strategies to enhance localization capabilities. Subsequently, we introduce an efficient cost aggregation module, comprising spatial and class aggregation mechanisms, to further advance dense prediction tasks with densely annotated datasets. Comprehensive evaluations on three tasks (phrase grounding, zero-shot semantic segmentation, and zero-shot classification) demonstrate DCXFM's superior performance over other state-of-the-art medical image-text pretraining models. Remarkably, DCXFM exhibits powerful zero-shot capabilities across various datasets in phrase grounding and zero-shot semantic segmentation, underscoring its superior generalization in dense prediction tasks.
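As a rough illustration of a soft cross-modal contrastive alignment term of the general kind mentioned above: image and text embeddings are aligned against soft, rather than strictly one-hot, targets. The target construction and temperature below are assumptions, not DCXFM's exact formulation.

```python
# Sketch of a soft cross-modal contrastive loss on toy embeddings.
import torch
import torch.nn.functional as F

def soft_contrastive_loss(img_emb, txt_emb, soft_targets, tau=0.07):
    img = F.normalize(img_emb, dim=-1)
    txt = F.normalize(txt_emb, dim=-1)
    logits = img @ txt.t() / tau                      # (N, N) similarity matrix
    log_probs = F.log_softmax(logits, dim=-1)
    return -(soft_targets * log_probs).sum(dim=-1).mean()

n, d = 8, 256
img_emb, txt_emb = torch.randn(n, d), torch.randn(n, d)
targets = 0.9 * torch.eye(n) + 0.1 / n                # mostly-diagonal soft targets
print(soft_contrastive_loss(img_emb, txt_emb, targets).item())
```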

Evaluating Artificial Intelligence-Assisted Prostate Biparametric MRI Interpretation: An International Multireader Study.

Gelikman DG, Yilmaz EC, Harmon SA, Huang EP, An JY, Azamat S, Law YM, Margolis DJA, Marko J, Panebianco V, Esengur OT, Lin Y, Belue MJ, Gaur S, Bicchetti M, Xu Z, Tetreault J, Yang D, Xu D, Lay NS, Gurram S, Shih JH, Merino MJ, Lis R, Choyke PL, Wood BJ, Pinto PA, Turkbey B

PubMed paper · Jul 16, 2025
Background: Variability in prostate biparametric MRI (bpMRI) interpretation limits diagnostic reliability for prostate cancer (PCa). Artificial intelligence (AI) has the potential to reduce this variability and improve diagnostic accuracy. Objective: The objective of this study was to evaluate the impact of a deep learning AI model on lesion- and patient-level clinically significant PCa (csPCa) and PCa detection rates and on interreader agreement in bpMRI interpretation. Methods: This retrospective, multireader, multicenter study used a balanced incomplete block design for MRI randomization. Six radiologists of varying experience interpreted bpMRI scans with and without AI assistance in alternating sessions. The reference standard for lesion-level detection was whole-mount pathology after radical prostatectomy for cases and negative 12-core systematic biopsies for control patients. In all, 180 patients (120 in the case group, 60 in the control group) who underwent mpMRI and prostate biopsy or radical prostatectomy between January 2013 and December 2022 were included. Lesion-level sensitivity, PPV, patient-level AUC for csPCa and PCa detection, and interreader agreement in lesion-level PI-RADS scores and size measurements were assessed. Results: AI assistance improved lesion-level PPV (PI-RADS ≥ 3: 77.2% [95% CI, 71.0-83.1%] vs 67.2% [61.1-72.2%] for csPCa; 80.9% [75.2-85.7%] vs 69.4% [63.4-74.1%] for PCa; both p < .001), reduced lesion-level sensitivity (PI-RADS ≥ 3: 44.4% [38.6-50.5%] vs 48.0% [42.0-54.2%] for csPCa, p = .01; 41.7% [37.0-47.4%] vs 44.9% [40.5-50.2%] for PCa, p = .01), and produced no difference in patient-level AUC (0.822 [95% CI, 0.768-0.866] vs 0.832 [0.787-0.868] for csPCa, p = .61; 0.833 [0.782-0.874] vs 0.835 [0.792-0.871] for PCa, p = .91). AI assistance improved interreader agreement for lesion-level PI-RADS scores (κ = 0.748 [95% CI, 0.701-0.796] vs 0.336 [0.288-0.381], p < .001), lesion size measurements (coverage probability of 0.397 [0.376-0.419] vs 0.367 [0.349-0.383], p < .001), and patient-level PI-RADS scores (κ = 0.704 [0.627-0.767] vs 0.507 [0.421-0.584], p < .001). Conclusion: AI improved lesion-level PPV and interreader agreement with slightly lower lesion-level sensitivity. Clinical Impact: AI may enhance consistency and reduce false positives in bpMRI interpretations. Further optimization is required to improve sensitivity without compromising specificity.
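As a small illustration of the agreement statistic reported above, a weighted kappa over paired PI-RADS scores can be computed as follows; the scores are made up, and the study pools agreement across six readers rather than a single pair.

```python
# Toy interreader-agreement computation on PI-RADS scores for the same lesions.
from sklearn.metrics import cohen_kappa_score

reader_a = [3, 4, 5, 3, 2, 4, 5, 3]   # reader A's PI-RADS scores
reader_b = [3, 4, 4, 3, 2, 4, 5, 4]   # reader B's scores for the same lesions
print(cohen_kappa_score(reader_a, reader_b, weights="quadratic"))
```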