Latest Papers on Radiology AI. Tags: Benchmark SOTA

Dual-Parallel Artificial Intelligence Framework for Breast Cancer Grading via High-Intensity Ultrasound and Biomarkers.

Parwekar P, Agrawal KK, Ali J, Gundagatti S, Rajpoot DS, Ahmed T, Vidyarthi A

•papers•Oct 1 2025

Background: Accurate and noninvasive breast cancer grading and therapy monitoring remain critical challenges in oncology. Traditional methods often rely on invasive histopathological assessments or imaging-only techniques, which may not fully capture the molecular and morphological intricacies of tumor response. Method: This article presents a novel, noninvasive framework for breast cancer analysis and therapy monitoring that combines two parallel mechanisms: (1) a dual-stream convolutional neural network (CNN) processing high-intensity ultrasound images, and (2) a biomarker-aware CNN stream utilizing patient-specific breast cancer biomarkers, including carbohydrate antigen 15-3, carcinoembryonic antigen, and human epidermal growth factor receptor 2 levels. The imaging stream extracts spatial and morphological features, while the biomarker stream encodes quantitative molecular indicators, enabling a multimodal understanding of tumor characteristics. The outputs from both streams are fused to predict the cancer grade (G1-G3) with high reliability. Results: Experimental evaluation on a cohort of pre- and postchemotherapy patients demonstrated the effectiveness of the proposed approach, achieving an overall grading accuracy of 97.8%, with an area under the curve of 0.981 for malignancy classification. The model also enables quantitative post-therapy analysis, revealing an average tumor response improvement of 41.3% across the test set, as measured by predicted regression in grade and changes in biomarker-imaging correlation. Conclusions: This dual-parallel artificial intelligence strategy offers a promising noninvasive alternative to traditional histopathological and imaging-alone methods, supporting real-time cancer monitoring and personalized treatment evaluation. The integration of high-resolution imaging with biomolecular data significantly enhances diagnostic depth, paving the way for intelligent, patient-specific breast cancer management.

Ultrasound Classification Breast Retrospective Clinical In Silico Academic Lab Benchmark SOTA

Real-Time Deep-Learning Image Reconstruction and Instrument Tracking in MR-Guided Biopsies.

Noordman CR, Te Molder LPW, Maas MC, Overduin CG, Fütterer JJ, Huisman HJ

•papers•Oct 1 2025

Transrectal in-bore MR-guided biopsy (MRGB) is accurate but time-consuming, limiting clinical throughput. Faster imaging could improve workflow and enable real-time instrument tracking. Existing acceleration methods often use simulated data and lack validation in clinical settings. This study aimed to accelerate MRGB by using deep learning for undersampled image reconstruction and instrument tracking, with models trained on multi-slice MR DICOM images and evaluated on raw k-space acquisitions. It was a prospective feasibility study: 1289 male patients (aged 44-87, median age 68) were used for model training and 8 male patients (aged 59-78, median age 65) for prospective feasibility testing. Imaging used a 2D Cartesian balanced steady-state free precession sequence at 3 T. Segmentation and reconstruction models were trained on 8464 MRGB confirmation scans containing a biopsy needle guide instrument and evaluated on 10 prospectively acquired dynamic k-space samples. Needle guide tracking accuracy was assessed using instrument tip prediction (ITP) error, computed per frame as the Euclidean distance from reference positions defined via pre- and post-movement scans. Feasibility was measured by the proportion of frames with < 5 mm error. Additional experiments tested model robustness under increasing undersampling rates. In a segmentation validation experiment, a one-sample t-test tested whether the mean ITP error was below 5 mm. Statistical significance was defined as p < 0.05. In the tracking experiments, the mean, standard deviation, and Wilson 95% CI of the ITP success rate were computed per sample, across undersampling levels. ITP was first evaluated independently on 201 fully sampled scans, yielding an ITP error of 1.55 ± 1.01 mm (95% CI: 1.41-1.69). Tracking performance was then assessed across increasing undersampling factors, with high ITP success rates ranging from 97.5% ± 5.8% (68.8%-99.9%) at 8× to 92.5% ± 10.3% (62.5%-98.9%) at 16× undersampling. Performance declined at 18×, dropping to 74.6% ± 33.6% (43.8%-91.7%).
Results confirm stable needle guide tip prediction accuracy and support the robustness of the reconstruction model for tracking at high undersampling. 2. Stage 2.
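The tracking metrics described in this abstract (per-frame Euclidean ITP error, the proportion of frames under 5 mm, and a Wilson 95% CI on that proportion) can be sketched as below. This is a minimal illustration, not the authors' code: the function names, the tuple-based coordinate format, and the choice of z = 1.96 are assumptions.

```python
import math

def itp_errors(predicted, reference):
    """Per-frame Euclidean distance (mm) between predicted and reference tip positions.

    Both arguments are sequences of coordinate tuples in mm (hypothetical format).
    """
    return [math.dist(p, r) for p, r in zip(predicted, reference)]

def success_rate(errors, threshold_mm=5.0):
    """Fraction of frames whose error is strictly below the threshold."""
    if not errors:
        raise ValueError("no frames to evaluate")
    return sum(e < threshold_mm for e in errors) / len(errors)

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 gives ~95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width
```

For example, if 8 of 10 frames fall under the 5 mm threshold, `wilson_interval(8, 10)` gives an interval of roughly 0.49 to 0.94, which shows how wide such intervals are for small samples.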

MRI Reconstruction Abdominal Prospective Clinical Pilot Academic Lab Benchmark SOTA

An interpretable hybrid deep learning framework for gastric cancer diagnosis using histopathological imaging.

Ren T, Govindarajan V, Bourouis S, Wang X, Ke S

•papers•Oct 1 2025

The increasing incidence of gastric cancer and the complexity of histopathological image interpretation present significant challenges for accurate and timely diagnosis. Manual assessments are often subjective and time-intensive, leading to a growing demand for reliable, automated diagnostic tools in digital pathology. This study proposes a hybrid deep learning approach combining convolutional neural networks (CNNs) and Transformer-based architectures to classify gastric histopathological images with high precision. The model is designed to enhance feature representation and spatial contextual understanding, particularly across diverse tissue subtypes and staining variations. Three publicly available datasets (GasHisSDB, TCGA-STAD, and NCT-CRC-HE-100K) were utilized to train and evaluate the model. Image patches were preprocessed through stain normalization, augmented using standard techniques, and fed into the hybrid model. The CNN backbone extracts local spatial features, while the Transformer encoder captures global context. Performance was assessed using fivefold cross-validation and evaluated through accuracy, F1-score, AUC, and Grad-CAM-based interpretability. The proposed model achieved a 99.2% accuracy on the GasHisSDB dataset, with a macro F1-score of 0.991 and AUC of 0.996. External validation on TCGA-STAD and NCT-CRC-HE-100K further confirmed the model's robustness. Grad-CAM visualizations highlighted biologically relevant regions, demonstrating interpretability and alignment with expert annotations. This hybrid deep learning framework offers a reliable, interpretable, and generalizable tool for gastric cancer diagnosis. Its superior performance and explainability highlight its clinical potential for deployment in digital pathology workflows.
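The fivefold cross-validation mentioned in this abstract can be illustrated with a small index splitter. This is a sketch under stated assumptions: the abstract does not say whether folds were shuffled, stratified, or grouped by patient or slide, so the contiguous, unshuffled split and the name `kfold_indices` are purely illustrative.

```python
def kfold_indices(n_samples, k=5):
    """Yield (train_idx, val_idx) pairs so each sample is validated exactly once.

    Folds are contiguous and unshuffled (an assumption made for illustration).
    """
    if k < 2 or k > n_samples:
        raise ValueError("k must be between 2 and n_samples")
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, val_idx
        start += size
```

A model would be trained once per fold on `train_idx` and scored on `val_idx`, with the reported metric averaged over the five folds.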

OCT Classification Abdominal Methodology In Silico Benchmark SOTA

From 2D to 3D, Deep Learning-based Shape Reconstruction in Magnetic Resonance Imaging: A Review

Emma McMillian, Abhirup Banerjee, Alfonso Bueno-Orovio

•preprint•Oct 1 2025

Deep learning-based 3-dimensional (3D) shape reconstruction from 2-dimensional (2D) magnetic resonance imaging (MRI) has become increasingly important in medical disease diagnosis, treatment planning, and computational modeling. This review surveys the methodological landscape of 3D MRI reconstruction, focusing on 4 primary approaches: point cloud, mesh-based, shape-aware, and volumetric models. For each category, we analyze the current state-of-the-art techniques, their methodological foundation, limitations, and applications across anatomical structures. We provide an extensive overview ranging from cardiac to neurological to lung imaging. We also focus on the clinical applicability of models to diseased anatomy, and the influence of their training and testing data. We examine publicly available datasets, computational demands, and evaluation metrics. Finally, we highlight the emerging research directions including multimodal integration and cross-modality frameworks. This review aims to provide researchers with a structured overview of current 3D reconstruction methodologies to identify opportunities for advancing deep learning towards more robust, generalizable, and clinically impactful solutions.

MRI Reconstruction Review Concept Benchmark SOTA Open Dataset

Deep learning motion correction of quantitative stress perfusion cardiovascular magnetic resonance

Noortje I. P. Schueler, Nathan C. K. Wong, Richard J. Crawley, Josien P. W. Pluim, Amedeo Chiribiri, Cian M. Scannell

•preprint•Oct 1 2025

Background: Quantitative stress perfusion cardiovascular magnetic resonance (CMR) is a powerful tool for assessing myocardial ischemia. Motion correction is essential for accurate pixel-wise mapping but traditional registration-based methods are slow and sensitive to acquisition variability, limiting robustness and scalability. Methods: We developed an unsupervised deep learning-based motion correction pipeline that replaces iterative registration with efficient one-shot estimation. The method corrects motion in three steps and uses robust principal component analysis to reduce contrast-related effects. It aligns the perfusion series and auxiliary images (arterial input function and proton density-weighted series). Models were trained and validated on multivendor data from 201 patients, with 38 held out for testing. Performance was assessed via temporal alignment and quantitative perfusion values, compared to a previously published registration-based method. Results: The deep learning approach significantly improved temporal smoothness of time-intensity curves (p<0.001). Myocardial alignment (Dice = 0.92 (0.04) and 0.91 (0.05)) was comparable to the baseline and superior to before registration (Dice = 0.80 (0.09), p<0.001). Perfusion maps showed reduced motion, with lower standard deviation in the myocardium (0.52 (0.39) ml/min/g) compared to baseline (0.55 (0.44) ml/min/g). Processing time was reduced 15-fold. Conclusion: This deep learning pipeline enables fast, robust motion correction for stress perfusion CMR, improving accuracy across dynamic and auxiliary images. Trained on multivendor data, it generalizes across sequences and may facilitate broader clinical adoption of quantitative perfusion imaging.
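The myocardial alignment figures in this abstract are Dice scores. As a reminder of what that number measures, here is a minimal sketch of the Dice coefficient for two binary masks; the nested-list mask format and the function name are assumptions, and the authors' evaluation code may differ.

```python
def dice(mask_a, mask_b):
    """Dice coefficient 2|A∩B| / (|A| + |B|) for two binary masks given as nested lists."""
    flat_a = [bool(v) for row in mask_a for v in row]
    flat_b = [bool(v) for row in mask_b for v in row]
    if len(flat_a) != len(flat_b):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(a and b for a, b in zip(flat_a, flat_b))
    total = sum(flat_a) + sum(flat_b)
    # Two empty masks are treated as a perfect match (a common convention).
    return 1.0 if total == 0 else 2.0 * intersection / total
```

A score of 1.0 means the two myocardial masks overlap perfectly and 0.0 means they are disjoint, so the reported rise from 0.80 to about 0.92 reflects a tighter frame-to-frame overlap after correction.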

MRI Registration Cardiac Retrospective Clinical In Silico Academic Lab Benchmark SOTA

Does Bigger Mean Better? Comparitive Analysis of CNNs and Biomedical Vision Language Modles in Medical Diagnosis

Ran Tong, Jiaqi Liu, Su Liu, Jiexi Xu, Lanruo Wang, Tong Wang

•preprint•Oct 1 2025

The accurate interpretation of chest radiographs using automated methods is a critical task in medical imaging. This paper presents a comparative analysis between a supervised lightweight Convolutional Neural Network (CNN) and a state-of-the-art, zero-shot medical Vision-Language Model (VLM), BiomedCLIP, across two distinct diagnostic tasks: pneumonia detection on the PneumoniaMNIST benchmark and tuberculosis detection on the Shenzhen TB dataset. Our experiments show that supervised CNNs serve as highly competitive baselines in both cases. While the default zero-shot performance of the VLM is lower, we demonstrate that its potential can be unlocked via a simple yet crucial remedy: decision threshold calibration. By optimizing the classification threshold on a validation set, the performance of BiomedCLIP is significantly boosted across both datasets. For pneumonia detection, calibration enables the zero-shot VLM to achieve a superior F1-score of 0.8841, surpassing the supervised CNN's 0.8803. For tuberculosis detection, calibration dramatically improves the F1-score from 0.4812 to 0.7684, bringing it close to the supervised baseline's 0.7834. This work highlights a key insight: proper calibration is essential for leveraging the full diagnostic power of zero-shot VLMs, enabling them to match or even outperform efficient, task-specific supervised models.
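Decision threshold calibration, as described in this abstract, amounts to choosing the cut-off on a validation set rather than using a fixed default. The sketch below assumes the criterion is F1 and that candidate thresholds are the observed validation scores; the abstract does not specify either detail, and the function names are hypothetical.

```python
def f1_score(y_true, y_pred):
    """Binary F1 for label lists containing 0/1; returns 0.0 when there are no true positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def calibrate_threshold(val_scores, val_labels):
    """Return (threshold, F1) maximising F1 on the validation set.

    A sample is predicted positive when its score is >= threshold.
    Ties keep the lowest threshold that reaches the best F1.
    """
    best_threshold, best_f1 = 0.5, -1.0
    for t in sorted(set(val_scores)):
        preds = [1 if s >= t else 0 for s in val_scores]
        f1 = f1_score(val_labels, preds)
        if f1 > best_f1:
            best_threshold, best_f1 = t, f1
    return best_threshold, best_f1
```

The chosen threshold is then frozen and applied to the test set. When zero-shot similarity scores cluster below 0.5, the default cut-off labels almost everything negative, which is one plausible reason calibration can change F1 so much.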

X-Ray Classification Chest Methodology In Silico Benchmark SOTA

U-DFA: A Unified DINOv2-Unet with Dual Fusion Attention for Multi-Dataset Medical Segmentation

Zulkaif Sajjad, Furqan Shaukat, Junaid Mir

•preprint•Oct 1 2025

Accurate medical image segmentation plays a crucial role in overall diagnosis and is one of the most essential tasks in the diagnostic pipeline. CNN-based models, despite their extensive use, suffer from a local receptive field and fail to capture the global context. A common approach that combines CNNs with transformers attempts to bridge this gap but fails to effectively fuse the local and global features. Recently emerged VLMs and foundation models have been adapted for downstream medical imaging tasks; however, they suffer from an inherent domain gap and high computational cost. To this end, we propose U-DFA, a unified DINOv2-Unet encoder-decoder architecture that integrates a novel Local-Global Fusion Adapter (LGFA) to enhance segmentation performance. LGFA modules inject spatial features from a CNN-based Spatial Pattern Adapter (SPA) module into frozen DINOv2 blocks at multiple stages, enabling effective fusion of high-level semantic and spatial features. Our method achieves state-of-the-art performance on the Synapse and ACDC datasets with only 33% of the trainable model parameters. These results demonstrate that U-DFA is a robust and scalable framework for medical image segmentation across multiple modalities.

Mixed Modality Segmentation Abdominal Methodology In Silico Academic Lab Benchmark SOTA

CardioBench: Do Echocardiography Foundation Models Generalize Beyond the Lab?

Darya Taratynova, Ahmed Aly, Numan Saeed, Mohammad Yaqub

•preprint•Oct 1 2025

Foundation models (FMs) are reshaping medical imaging, yet their application in echocardiography remains limited. While several echocardiography-specific FMs have recently been introduced, no standardized benchmark exists to evaluate them. Echocardiography poses unique challenges, including noisy acquisitions, high frame redundancy, and limited public datasets. Most existing solutions evaluate on private data, restricting comparability. To address this, we introduce CardioBench, a comprehensive benchmark for echocardiography FMs. CardioBench unifies eight publicly available datasets into a standardized suite spanning four regression and five classification tasks, covering functional, structural, diagnostic, and view recognition endpoints. We evaluate several leading FMs, including cardiac-specific, biomedical, and general-purpose encoders, under consistent zero-shot, probing, and alignment protocols. Our results highlight complementary strengths across model families: temporal modeling is critical for functional regression, retrieval provides robustness under distribution shift, and domain-specific text encoders capture physiologically meaningful axes. General-purpose encoders transfer strongly and often close the gap with probing, but struggle with fine-grained distinctions like view classification and subtle pathology recognition. By releasing preprocessing, splits, and public evaluation pipelines, CardioBench establishes a reproducible reference point and offers actionable insights to guide the design of future echocardiography foundation models.

Ultrasound Classification Cardiac Dataset Release In Silico Academic Lab Benchmark SOTA Open Dataset Reproducibility

AI-Driven CBCT Analysis for Surgical Decision-Making and Mucosal Damage Prediction in Sinus Lift Surgery for patients with low RBH.

Deng Y, He Y, Liu C, Gao Z, Yu S, Cao S, Li C, Zhu Q, Ma P

•papers•Oct 1 2025

Decision-making for maxillary sinus floor elevation (MSFE) surgery in patients with low residual bone height (<4 mm) presents significant challenges, particularly in selecting surgical approaches and predicting intraoperative mucosal perforation. Traditional methods rely heavily on physician experience, lack standardization and objectivity, and often fail to meet the demands of precision medicine. This study aims to build an intelligent decision-making system based on deep learning to optimize surgical selection and predict the risk of mucosal perforation, providing clinicians with a reliable auxiliary tool. This study retrospectively analysed the cone-beam computed tomography imaging data of 79 patients who underwent MSFE and constructed a three-dimensional (3D) deep-learning model based on the overall CT data of the patients for surgical procedure selection and prediction of mucosal perforation. The model innovatively introduced the Convolutional Block Attention Module mechanism and depthwise separable convolution technology to enhance the model's ability to capture spatial features and computational efficiency. The model was rigorously trained and validated on multiple datasets, with visualization achieved through attention heatmaps to improve interpretability. The modified EfficientNet model achieved an F1 score of 0.6 in the procedure decision task of MSFE. For predicting mucosal perforation, the improved ResNet model achieved an accuracy of 0.8485 and an F1-score of 0.7273 on the mixed dataset. In the experimental group, the improved ResNet model achieved an accuracy of 0.8235, a recall of 0.7619, and an F1-score of 0.7302. In the control group, the model also maintained stable performance, with an F1-score of 0.6483. Overall, the 3D convolutional model enhanced the accuracy and stability of mucosal perforation prediction by leveraging the spatial features of cone-beam computed tomography imaging, demonstrating a certain degree of generalization capability. 
This study is the first to construct a deep learning-based 3D intelligent decision-making model for MSFE. These findings confirm the model's effectiveness in surgical decision-making and in predicting the risk of mucosal perforation. The system provides an objective decision-making basis for clinicians, improves the standardization level of complex case management, and demonstrates potential for clinical application.

CT Classification Retrospective Clinical In Silico Academic Lab Benchmark SOTA

Human Readers versus AI-Based Systems in ASPECTS Scoring for Acute Ischemic Stroke: A Systematic Review and Meta-Analysis with Region-Specific Guidance.

Azzam AY, Hadadi I, Al-Shahrani LM, Shanqeeti UA, Alqurqush NA, Alsehli MA, Alali RS, Tammar RS, Morsy MM, Essibayi MA

•papers•Oct 1 2025

The Alberta Stroke Program Early CT Score (ASPECTS) is widely used to evaluate early ischemic changes and guide thrombectomy decisions in acute stroke patients. However, significant interobserver variability in manual ASPECTS assessment presents a challenge. Recent advances in artificial intelligence have enabled the development of automated ASPECTS scoring systems; however, their comparative performance against expert interpretation remains insufficiently studied. We conducted a systematic review and meta-analysis following PRISMA 2020 guidelines. We searched multiple scientific databases for studies comparing automated and manual ASPECTS on non-contrast computed tomography (NCCT). Interobserver reliability was assessed using pooled intraclass correlation coefficients (ICCs). Subgroup analyses were performed by software type, reference standard, time window, and computed tomography-based factors. Eleven studies with a total of 1,976 patients were included. Automated ASPECTS demonstrated good reliability against reference standards (ICC: 0.72), comparable to expert readings (ICC: 0.62). RAPID ASPECTS performed highest (ICC: 0.86), especially for high-stakes decision-making. AI advantages were most significant with thin-slice CT (≤2.5 mm; +0.16), intermediate time windows (120-240 min; +0.16), and higher NIHSS scores (p=0.026). AI-driven ASPECTS systems perform comparably to human readers in detecting early ischemic changes, and better in some specific scenarios. Strategic utilization focusing on high-impact scenarios and region-specific performance patterns offers better diagnostic accuracy, reduced interpretation times, and better-informed treatment selection in acute stroke care.

CT Classification Neurological Meta Analysis In Silico Benchmark SOTA
