FractMorph: A Fractional Fourier-Based Multi-Domain Transformer for Deformable Image Registration

Shayan Kebriti, Shahabedin Nabavi, Ali Gooya

arXiv preprint · Aug 17 2025
Deformable image registration (DIR) is a crucial and challenging technique for aligning anatomical structures in medical images and is widely applied in diverse clinical applications. However, existing approaches often struggle to capture fine-grained local deformations and large-scale global deformations simultaneously within a unified framework. We present FractMorph, a novel 3D dual-parallel transformer-based architecture that enhances cross-image feature matching through multi-domain fractional Fourier transform (FrFT) branches. Each Fractional Cross-Attention (FCA) block applies parallel FrFTs at fractional angles of 0°, 45°, and 90°, along with a log-magnitude branch, to effectively extract local, semi-global, and global features at the same time. These features are fused via cross-attention between the fixed and moving image streams. A lightweight U-Net-style network then predicts a dense deformation field from the transformer-enriched features. On the ACDC cardiac MRI dataset, FractMorph achieves state-of-the-art performance with an overall Dice Similarity Coefficient (DSC) of 86.45%, an average per-structure DSC of 75.15%, and a 95th-percentile Hausdorff distance (HD95) of 1.54 mm on our data split. We also introduce FractMorph-Light, a lightweight variant of our model with only 29.6M parameters, which maintains the superior accuracy of the main model while using approximately half the memory. Our results demonstrate that multi-domain spectral-spatial attention in transformers can robustly and efficiently model complex non-rigid deformations in medical images using a single end-to-end network, without the need for scenario-specific tuning or hierarchical multi-scale networks. The source code of our implementation is available at https://github.com/shayankebriti/FractMorph.
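
For readers unfamiliar with fractional-order Fourier transforms, the sketch below illustrates one common discrete construction, a fractional power of the unitary DFT matrix, so that 0° is the identity, 90° is the ordinary DFT, and 45° lies in between. The `frft_branches` helper is purely illustrative (1D NumPy, not the authors' 3D transformer implementation), and the eigenvector-based fractional power shown here is only one of several possible discrete FrFT definitions.

```python
import numpy as np
from scipy.linalg import dft


def frft_matrix(n: int, angle_deg: float) -> np.ndarray:
    """Discrete FrFT as a fractional power of the unitary DFT matrix.

    angle 0 -> identity, angle 90 -> ordinary DFT. The eigenvector choice for
    degenerate eigenvalues makes this one of several possible DFrFT definitions.
    """
    F = dft(n, scale="sqrtn")                 # unitary n x n DFT matrix
    w, V = np.linalg.eig(F)                   # eigenvalues lie on the unit circle
    a = angle_deg / 90.0                      # fractional order
    Wa = np.diag(np.exp(1j * a * np.angle(w)))  # branch choice: principal argument
    return V @ Wa @ np.linalg.inv(V)


def frft_branches(x: np.ndarray) -> dict:
    """Apply parallel FrFT branches at 0/45/90 degrees plus a log-magnitude branch."""
    n = x.shape[-1]
    out = {f"{a}deg": frft_matrix(n, a) @ x for a in (0, 45, 90)}
    out["log_magnitude"] = np.log1p(np.abs(out["90deg"]))  # spectral magnitude branch
    return out


if __name__ == "__main__":
    signal = np.random.default_rng(0).standard_normal(64)
    for name, branch in frft_branches(signal).items():
        print(name, branch.shape, branch.dtype)
```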

Impact of Clinical Image Quality on Efficient Foundation Model Finetuning

Yucheng Tang, Pawel Rajwa, Alexander Ng, Yipei Wang, Wen Yan, Natasha Thorley, Aqua Asif, Clare Allen, Louise Dickinson, Francesco Giganti, Shonit Punwani, Daniel C. Alexander, Veeru Kasivisvanathan, Yipeng Hu

arXiv preprint · Aug 16 2025
Foundation models in medical imaging have shown promising label efficiency, achieving high downstream performance with only a fraction of annotated data. Here, we evaluate this in prostate multiparametric MRI using ProFound, a domain-specific vision foundation model pretrained on large-scale prostate MRI datasets. We investigate how variable image quality affects label-efficient finetuning by measuring the generalisability of finetuned models. Experiments systematically vary high-/low-quality image ratios in finetuning and evaluation sets. Our findings indicate that image quality distribution and its finetune-and-test mismatch significantly affect model performance. In particular: a) Varying the ratio of high- to low-quality images between finetuning and test sets leads to notable differences in downstream performance; and b) The presence of sufficient high-quality images in the finetuning set is critical for maintaining strong performance, whilst the importance of matched finetuning and testing distribution varies between different downstream tasks, such as automated radiology reporting and prostate cancer detection. When quality ratios are consistent, finetuning needs far less labeled data than training from scratch, but label efficiency depends on image quality distribution. Without enough high-quality finetuning data, pretrained models may fail to outperform those trained without pretraining. This highlights the importance of assessing and aligning quality distributions between finetuning and deployment, and the need for quality standards in finetuning data for specific downstream tasks. Using ProFound, we show the value of quantifying image quality in both finetuning and deployment to fully realise the data and compute efficiency benefits of foundation models.
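
The controlled quality-ratio experiments can be pictured with a small splitting helper. The function below is an illustrative sketch (its name, arguments, and structure are assumptions, not code from the paper) that builds finetuning and test subsets with specified fractions of high-quality images.

```python
import random
from typing import List, Tuple


def split_by_quality(high: List[str], low: List[str],
                     n_finetune: int, n_test: int,
                     finetune_hq_ratio: float, test_hq_ratio: float,
                     seed: int = 0) -> Tuple[List[str], List[str]]:
    """Build finetuning/test sets with controlled high-quality image fractions.

    `high` and `low` are disjoint lists of image IDs labelled by quality.
    """
    rng = random.Random(seed)
    hi, lo = high[:], low[:]
    rng.shuffle(hi)
    rng.shuffle(lo)

    def take(n: int, hq_ratio: float) -> List[str]:
        n_hq = round(n * hq_ratio)
        assert n_hq <= len(hi) and (n - n_hq) <= len(lo), "not enough images in a pool"
        picked = [hi.pop() for _ in range(n_hq)] + [lo.pop() for _ in range(n - n_hq)]
        rng.shuffle(picked)
        return picked

    finetune = take(n_finetune, finetune_hq_ratio)
    test = take(n_test, test_hq_ratio)
    return finetune, test
```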

Point-of-Care Ultrasound Imaging for Automated Detection of Abdominal Haemorrhage: A Systematic Review.

Zgool T, Antico M, Edwards C, Fontanarosa D

PubMed paper · Aug 16 2025
Abdominal haemorrhage is a life-threatening condition requiring prompt detection to enable timely intervention. Conventional ultrasound (US) is widely used but is highly operator-dependent, limiting its reliability outside clinical settings. In certain anatomical regions, in particular Morison's pouch, US provides higher detection reliability due to the preferential accumulation of free fluid in dependent areas. Recent advancements in artificial intelligence (AI)-integrated point-of-care US (POCUS) systems show promise for use in emergency, pre-hospital, military, and resource-limited environments. This systematic review evaluates the performance of AI-driven POCUS systems for detecting and estimating abdominal haemorrhage. A systematic search of Scopus, PubMed, EMBASE, and Web of Science (2014-2024) identified seven studies with sample sizes ranging from 94 to 6608 images and patient numbers ranging from 78 to 864 trauma patients. AI models, including YOLOv3, U-Net, and ResNet50, demonstrated high diagnostic accuracy, with sensitivity ranging from 88% to 98% and specificity from 68% to 99%. Most studies utilized 2D US imaging and conducted internal validation, typically employing systems such as the Philips Lumify and Mindray TE7. Model performance was predominantly assessed using internal datasets, wherein training and evaluation were performed on the same dataset. Of particular note, only one study validated its model on an independent dataset obtained from a different clinical setting. This limited use of external validation restricts the ability to evaluate the applicability of AI models across diverse populations and varying imaging conditions. Moreover, the Focused Assessment with Sonography in Trauma (FAST) is a protocol-driven US method for detecting free fluid in the abdominal cavity, primarily in trauma cases. However, while it is commonly used to assess the right upper quadrant, particularly Morison's pouch, which is gravity-dependent and sensitive for early haemorrhage, its application to other abdominal regions, such as the left upper quadrant and pelvis, remains underexplored. This is clinically significant, as fluid may preferentially accumulate in these areas depending on the mechanism of injury, patient positioning, or time since trauma, underscoring the need for broader anatomical coverage in AI applications. Researchers aiming to address the current reliance on 2D imaging and the limited use of external validation should focus future studies on integrating 3D imaging and utilising diverse, multicentre datasets to improve the reliability and generalizability of AI-driven POCUS systems for haemorrhage detection in trauma care.

Improving skull-stripping for infant MRI via weakly supervised domain adaptation using adversarial learning.

Omidi A, Shamaei A, Aktar M, King R, Leijser L, Souza R

PubMed paper · Aug 16 2025
Skull-stripping is an essential preprocessing step in the analysis of brain Magnetic Resonance Imaging (MRI). While deep learning-based methods have shown success with this task, strong domain shifts between adult and newborn brain MR images complicate model transferability. We previously developed unsupervised domain adaptation techniques to address the domain shift between these data, without requiring newborn MRI data to be labeled. In this work, we build upon our previous domain adaptation framework by extensively expanding the training and validation datasets using weakly labeled newborn MRI scans from the Developing Human Connectome Project (dHCP), our private newborn dataset, and synthetic data generated by a Gaussian Mixture Model (GMM). While the core model architecture remains similar, we focus on validating the model's generalization across four diverse domains (adult, synthetic, public newborn, and private newborn MRI), demonstrating improved performance and robustness over our prior methods. These results highlight the impact of incorporating broader training data under weak supervision for newborn brain imaging analysis. The experimental results reveal that our proposed approach outperforms our previous work, achieving a Dice coefficient of 0.9509±0.0055 and a Hausdorff distance of 3.0883±0.1833 for newborn MRI data, surpassing state-of-the-art models such as SynthStrip (Dice = 0.9412±0.0063, Hausdorff = 3.1570±0.1389). These results show that including weakly labeled newborn data improves model performance and generalization and is useful for newborn brain imaging analysis. Our code is available at: https://github.com/abbasomidi77/Weakly-Supervised-DAUnet.
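
The two reported metrics follow standard definitions; a minimal NumPy/SciPy sketch (not the authors' evaluation code, and without the surface-distance refinements some segmentation pipelines use) is shown below.

```python
import numpy as np
from scipy.spatial.distance import directed_hausdorff


def dice_coefficient(pred: np.ndarray, target: np.ndarray) -> float:
    """Dice = 2|A∩B| / (|A| + |B|) for binary masks."""
    pred, target = pred.astype(bool), target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    denom = pred.sum() + target.sum()
    return 2.0 * intersection / denom if denom else 1.0


def hausdorff_distance(pred: np.ndarray, target: np.ndarray) -> float:
    """Symmetric Hausdorff distance between the foreground voxel coordinates."""
    p = np.argwhere(pred.astype(bool))
    t = np.argwhere(target.astype(bool))
    return max(directed_hausdorff(p, t)[0], directed_hausdorff(t, p)[0])
```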

An interpretable CT-based deep learning model for predicting overall survival in patients with bladder cancer: a multicenter study.

Zhang M, Zhao Y, Hao D, Song Y, Lin X, Hou F, Huang Y, Yang S, Niu H, Lu C, Wang H

PubMed paper · Aug 16 2025
Predicting the prognosis of bladder cancer remains challenging despite standard treatments. We developed an interpretable bladder cancer deep learning (BCDL) model using preoperative CT scans to predict overall survival. The model was trained on a cohort (n = 765) and validated in three independent cohorts (n = 438; n = 181; n = 72). The BCDL model outperformed other models in survival risk prediction, with the SHapley Additive exPlanations (SHAP) method identifying pixel-level features contributing to predictions. Patients were stratified into high- and low-risk groups using a deep learning score cutoff. Adjuvant therapy significantly improved overall survival in high-risk patients (p = 0.028) and women in the low-risk group (p = 0.046). RNA sequencing analysis revealed differential gene expression and pathway enrichment between risk groups, with high-risk patients exhibiting an immunosuppressive microenvironment and altered microbial composition. Our BCDL model accurately predicts survival risk and supports personalized treatment strategies for improved clinical decision-making.
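
The risk-stratification and survival-comparison step can be sketched with the lifelines package; the cutoff, column names, and data layout below are illustrative assumptions rather than details from the study.

```python
import pandas as pd
from lifelines import KaplanMeierFitter
from lifelines.statistics import logrank_test


def stratify_and_test(df: pd.DataFrame, cutoff: float):
    """Split patients into high-/low-risk groups by a DL score cutoff and
    compare overall survival with a log-rank test.

    Expects columns: 'dl_score', 'os_months', 'event' (1 = death observed).
    """
    high = df[df["dl_score"] >= cutoff]
    low = df[df["dl_score"] < cutoff]

    km_high, km_low = KaplanMeierFitter(), KaplanMeierFitter()
    km_high.fit(high["os_months"], high["event"], label="high risk")
    km_low.fit(low["os_months"], low["event"], label="low risk")

    result = logrank_test(high["os_months"], low["os_months"],
                          event_observed_A=high["event"],
                          event_observed_B=low["event"])
    return km_high, km_low, result.p_value
```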

Diagnostic performance of deep learning for predicting glioma isocitrate dehydrogenase and 1p/19q co-deletion in MRI: a systematic review and meta-analysis.

Farahani S, Hejazi M, Tabassum M, Di Ieva A, Mahdavifar N, Liu S

PubMed paper · Aug 16 2025
We aimed to evaluate the diagnostic performance of deep learning (DL)-based radiomics models for the noninvasive prediction of isocitrate dehydrogenase (IDH) mutation and 1p/19q co-deletion status in glioma patients using MRI sequences, and to identify methodological factors influencing accuracy and generalizability. Following PRISMA guidelines, we systematically searched major databases (PubMed, Scopus, Embase, Web of Science, and Google Scholar) up to March 2025, screening studies that utilized DL to predict IDH and 1p/19q co-deletion status from MRI data. We assessed study quality and risk of bias using the Radiomics Quality Score and the QUADAS-2 tool. Our meta-analysis employed a bivariate model to compute pooled sensitivity and specificity, and meta-regression to assess interstudy heterogeneity. Among the 1517 unique publications, 104 were included in the qualitative synthesis, and 72 underwent meta-analysis. Pooled estimates for IDH prediction in test cohorts yielded a sensitivity of 0.80 (95% CI: 0.77-0.83) and specificity of 0.85 (95% CI: 0.81-0.87). For 1p/19q co-deletion, sensitivity was 0.75 (95% CI: 0.65-0.82) and specificity was 0.82 (95% CI: 0.75-0.88). Meta-regression identified the tumor segmentation method and the extent of DL integration into the radiomics pipeline as significant contributors to interstudy variability. Although DL models demonstrate strong potential for noninvasive molecular classification of gliomas, clinical translation requires several critical steps: harmonization of multi-center MRI data using techniques such as histogram matching and DL-based style transfer; adoption of standardized and automated segmentation protocols; extensive multi-center external validation; and prospective clinical validation. Question: Can DL-based radiomics using routine MRI noninvasively predict IDH mutation and 1p/19q co-deletion status in gliomas, and what factors affect diagnostic accuracy? Findings: Meta-analysis showed 80% sensitivity and 85% specificity for predicting IDH mutation, and 75% sensitivity and 82% specificity for 1p/19q co-deletion status. Clinical relevance: MRI-based DL models demonstrate clinically useful accuracy for noninvasive glioma molecular classification, but data harmonization, standardized automated segmentation, and rigorous multi-center external validation are essential for clinical adoption.
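
The review pooled accuracy with a bivariate model; as a simplified illustration of random-effects pooling only, the sketch below applies a univariate DerSimonian-Laird estimate to logit-transformed per-study sensitivities (a deliberate simplification of the bivariate approach, not the review's actual analysis).

```python
import numpy as np


def pool_logit(events: np.ndarray, totals: np.ndarray):
    """DerSimonian-Laird random-effects pooling of proportions on the logit scale.

    For sensitivity: events = true positives per study, totals = diseased cases per study.
    Returns the pooled proportion and an approximate 95% CI.
    """
    e = events.astype(float) + 0.5            # continuity correction
    n = totals.astype(float) + 1.0
    y = np.log(e / (n - e))                   # per-study logit
    v = 1.0 / e + 1.0 / (n - e)               # approximate within-study variance
    w = 1.0 / v

    y_fixed = np.sum(w * y) / np.sum(w)
    q = np.sum(w * (y - y_fixed) ** 2)        # Cochran's Q
    df = len(y) - 1
    tau2 = max(0.0, (q - df) / (np.sum(w) - np.sum(w ** 2) / np.sum(w)))

    w_star = 1.0 / (v + tau2)                 # random-effects weights
    y_pooled = np.sum(w_star * y) / np.sum(w_star)
    se = np.sqrt(1.0 / np.sum(w_star))

    def inv_logit(z: float) -> float:
        return 1.0 / (1.0 + np.exp(-z))

    return inv_logit(y_pooled), (inv_logit(y_pooled - 1.96 * se),
                                 inv_logit(y_pooled + 1.96 * se))
```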

Deep learning-based identification of necrosis and microvascular proliferation in adult diffuse gliomas from whole-slide images

Guo, Y., Huang, H., Liu, X., Zou, W., Qiu, F., Liu, Y., Chai, R., Jiang, T., Wang, J.

medRxiv preprint · Aug 16 2025
For adult diffuse gliomas (ADGs), most grading can be achieved through molecular subtyping, retaining only two key histopathological features for high-grade glioma (HGG): necrosis (NEC) and microvascular proliferation (MVP). We developed a deep learning (DL) framework to automatically identify and characterize these features. We trained patch-level models to detect and quantify NEC and MVP using a dataset that employed active learning, incorporating patches from 621 whole-slide images (WSIs) from the Chinese Glioma Genome Atlas (CGGA). Utilizing the trained patch-level models, we integrated the predicted outcomes and positions of individual patches within WSIs from The Cancer Genome Atlas (TCGA) cohort to form datasets. Subsequently, we introduced a patient-level model, named PLNet (Probability Localization Network), which was trained on these datasets to facilitate patient diagnosis. We also explored the subtypes of NEC and MVP based on the features extracted from the patch-level models, with a clustering process applied to all positive patches. The patient-level models demonstrated exceptional performance, achieving AUCs of 0.9968 and 0.9995 and AUPRCs of 0.9788 and 0.9860 for NEC and MVP, respectively. Compared to pathological reports, our patient-level models achieved accuracies of 88.05% for NEC and 90.20% for MVP, along with sensitivities of 73.68% and 77%. When sensitivity was set at 80%, the accuracy for NEC reached 79.28% and for MVP reached 77.55%. DL models enable more efficient and accurate histopathological image analysis, which will aid traditional glioma diagnosis. Clustering-based analyses utilizing features extracted from patch-level models could further investigate the subtypes of NEC and MVP.
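
The patch-to-patient aggregation idea can be illustrated with a simple pooling baseline; the top-k rule and function names below are assumptions for illustration only, since PLNet additionally exploits the positions of patches within the WSI.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, average_precision_score


def patient_score(patch_probs: np.ndarray, top_k: int = 10) -> float:
    """Aggregate patch-level probabilities for one slide into a patient-level score
    by averaging the top-k most suspicious patches (a simple pooling baseline)."""
    k = min(top_k, patch_probs.size)
    return float(np.sort(patch_probs)[-k:].mean())


def evaluate(per_patient_patch_probs: list, labels: np.ndarray) -> dict:
    """Compute patient-level AUC and AUPRC from per-patient patch probability arrays."""
    scores = np.array([patient_score(p) for p in per_patient_patch_probs])
    return {"auc": roc_auc_score(labels, scores),
            "auprc": average_precision_score(labels, scores)}
```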

Determination of Skeletal Age From Hand Radiographs Using Deep Learning.

Bram JT, Pareek A, Beber SA, Jones RH, Shariatnia MM, Daliliyazdi A, Tracey OC, Green DW, Fabricant PD

PubMed paper · Aug 15 2025
Surgeons treating skeletally immature patients use skeletal age to determine appropriate surgical strategies. Traditional bone age estimation methods utilizing hand radiographs are time-consuming. The aim of this cohort study was to develop highly accurate and reliable deep learning (DL) models for determining skeletal age from hand radiographs. The authors utilized 3 publicly available hand radiograph data sets for model development/validation from (1) the Radiological Society of North America (RSNA), (2) the Radiological Hand Pose Estimation (RHPE) data set, and (3) the Digital Hand Atlas (DHA). All 3 data sets report corresponding sex and skeletal age. The RHPE and DHA also contain chronological age. After image preprocessing, a ConvNeXt model was trained first on the RSNA data set using sex/skeletal age as inputs and 5-fold cross-validation, with subsequent training on the RHPE with the addition of chronological age. Final model validation was performed on the DHA and an institutional data set of 200 images. The first model, trained on the RSNA, achieved a mean absolute error (MAE) of 3.68 months on the RSNA test set and 5.66 months on the DHA. This outperformed the 4.2 months achieved on the RSNA test set by the best model from previous work (12.4% improvement) and 3.9 months by the open-source software Deeplasia (5.6% improvement). After incorporation of chronological age from the RHPE in model 2, this error improved to an MAE of 4.65 months on the DHA, again surpassing the best previously published models (19.8% improvement). Leveraging newer DL technologies trained on >20,000 hand radiographs across 3 distinct, diverse data sets, this study developed a robust model for predicting bone age. Utilizing features extracted from an RSNA model, combined with chronological age inputs, this model outperforms previous state-of-the-art models when applied to validation data sets. These results indicate that the models provide a highly accurate and reliable platform for clinical use to improve confidence about appropriate surgical selection (e.g., physeal-sparing procedures) and time savings for orthopaedic surgeons/radiologists evaluating skeletal age. Development of an accurate DL model for determination of bone age from the hand reduces the time required for age estimation. Additionally, streamlined skeletal age estimation can aid practitioners in determining optimal treatment strategies and may be useful in research settings to decrease workload and improve reporting.
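
A hedged sketch of a ConvNeXt regression setup in PyTorch is shown below; it is not the authors' model, and the sex and chronological-age inputs described in the abstract would require an extended head that is omitted here.

```python
import torch
import torch.nn as nn
from torchvision.models import convnext_tiny


class BoneAgeRegressor(nn.Module):
    """ConvNeXt backbone with a single-output regression head (age in months)."""

    def __init__(self):
        super().__init__()
        self.backbone = convnext_tiny(weights=None)
        in_features = self.backbone.classifier[2].in_features   # 768 for convnext_tiny
        self.backbone.classifier[2] = nn.Linear(in_features, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.backbone(x).squeeze(-1)


model = BoneAgeRegressor()
criterion = nn.L1Loss()                        # optimises mean absolute error in months
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

# One dummy training step on random data, standing in for preprocessed radiographs.
images = torch.randn(4, 3, 224, 224)
ages = torch.tensor([96.0, 120.0, 60.0, 150.0])
optimizer.zero_grad()
loss = criterion(model(images), ages)
loss.backward()
optimizer.step()
```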

AI-Driven Integrated System for Burn Depth Prediction With Electronic Medical Records: Algorithm Development and Validation.

Rahman MM, Masry ME, Gnyawali SC, Xue Y, Gordillo G, Wachs JP

PubMed paper · Aug 15 2025
Burn injuries represent a significant clinical challenge due to the complexity of accurately assessing burn depth, which directly influences the course of treatment and patient outcomes. Traditional diagnostic methods primarily rely on visual inspection by experienced burn surgeons. Studies report diagnostic accuracies of around 76% for experts, dropping to nearly 50% for less experienced clinicians. Such inaccuracies can result in suboptimal clinical decisions, delaying vital surgical interventions in severe cases or initiating unnecessary treatments for superficial burns. This diagnostic variability not only compromises patient care but also strains health care resources and increases the likelihood of adverse outcomes. Hence, a more consistent and precise approach to burn classification is urgently needed. The objective was to determine whether a multimodal integrated artificial intelligence (AI) system for accurate classification of burn depth can preserve diagnostic accuracy and provide an important resource when used as part of the electronic medical record (EMR). This study used a novel multimodal AI system, integrating digital photographs and ultrasound tissue Doppler imaging (TDI) data to accurately assess burn depth. These imaging modalities were accessed and processed through an EMR system, enabling real-time data retrieval and AI-assisted evaluation. TDI was instrumental in evaluating the biomechanical properties of subcutaneous tissues, using color-coded images to identify burn-induced changes in tissue stiffness and elasticity. The collected imaging data were uploaded to the EMR system (DrChrono), where they were processed by a vision-language model built on GPT-4 architecture. This model received expert-formulated prompts describing how to interpret both digital and TDI images, guiding the AI in making explainable classifications. This study evaluated whether a multimodal AI classifier, designed to identify first-, second-, and third-degree burns, could be effectively applied to imaging data stored within an EMR system. The classifier achieved an overall accuracy of 84.38%, significantly surpassing human performance benchmarks typically cited in the literature. This highlights the potential of the AI model to serve as a robust clinical decision support tool, especially in settings lacking highly specialized expertise. In addition to accuracy, the classifier demonstrated strong performance across multiple evaluation metrics. The classifier's ability to distinguish between burn severities was further validated by the area under the receiver operating characteristic curve: 0.97 for first-degree, 0.96 for second-degree, and a perfect 1.00 for third-degree burns, each with narrow 95% CIs. The storage of multimodal imaging data within the EMR, along with the ability for post hoc analysis by AI algorithms, offers significant advancements in burn care, enabling real-time burn depth prediction on currently available data. Using digital photos for superficial burns, easily diagnosed through physical examinations, reduces reliance on TDI, while TDI helps distinguish deep second- and third-degree burns, enhancing diagnostic efficiency.
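
How a photograph and a TDI image might be passed to a GPT-4-class vision-language model with an expert-written prompt can be sketched as follows; the client call is generic OpenAI SDK usage, and the model name, prompt wording, and file paths are placeholders (the study's DrChrono EMR integration is not shown).

```python
import base64
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment


def to_data_url(path: str) -> str:
    """Encode a local JPEG as a base64 data URL for the vision API."""
    with open(path, "rb") as f:
        return "data:image/jpeg;base64," + base64.b64encode(f.read()).decode()


EXPERT_PROMPT = (
    "You are assisting with burn depth assessment. Using the digital photograph and "
    "the colour-coded tissue Doppler image, classify the burn as first-, second-, or "
    "third-degree and briefly explain the visual evidence."
)


def classify_burn(photo_path: str, tdi_path: str, model: str = "gpt-4o") -> str:
    """Send both images with an expert-style prompt and return the model's answer."""
    response = client.chat.completions.create(
        model=model,
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": EXPERT_PROMPT},
                {"type": "image_url", "image_url": {"url": to_data_url(photo_path)}},
                {"type": "image_url", "image_url": {"url": to_data_url(tdi_path)}},
            ],
        }],
    )
    return response.choices[0].message.content
```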