Latest Papers on Radiology AI. Tags: Benchmark SOTA

A novel interpreted deep network for Alzheimer's disease prediction based on inverted self attention and vision transformer.

Ibrar W, Khan MA, Hamza A, Rubab S, Alqahtani O, Alouane MT, Teng S, Nam Y

•papers•Aug 15 2025

Worldwide, Alzheimer's disease (AD) is the most common cause of dementia. AD causes memory loss and impaired mental function in aging people, which places a significant burden on patients as well as on society. So far, there is no treatment that can cure AD; however, early diagnosis can help slow its progression. Deep learning has shown substantial success in diagnosing AD. However, challenges remain due to limited data, improper model selection, and extraction of irrelevant features. In this work, we propose a fully automated framework based on the fusion of a vision transformer and a novel inverted residual bottleneck with self-attention (IRBwSA) for AD diagnosis. In the first step, data augmentation was performed to balance the selected dataset. After that, the vision model was designed and modified according to the dataset. Similarly, a new inverted bottleneck self-attention model was developed. The designed models were trained on the augmented dataset, and the extracted features were fused using a novel search-based approach. Moreover, the designed models were interpreted using an explainable artificial intelligence technique named LIME. The fused features were finally classified using a shallow wide neural network and other classifiers. The experiments were conducted on an augmented MRI dataset, and 96.1% accuracy and a 96.05% precision rate were obtained. Comparison with a few recent techniques shows the proposed framework's better performance.

MRI Classification Neurological Methodology In Silico Academic Lab Benchmark SOTA

Efficient Image-to-Image Schrödinger Bridge for CT Field of View Extension

Zhenhao Li, Long Yang, Xiaojie Yin, Haijun Yu, Jiazhou Wang, Hongbin Han, Weigang Hu, Yixing Huang

•preprint•Aug 15 2025

Computed tomography (CT) is a cornerstone imaging modality for non-invasive, high-resolution visualization of internal anatomical structures. However, when the scanned object exceeds the scanner's field of view (FOV), projection data are truncated, resulting in incomplete reconstructions and pronounced artifacts near FOV boundaries. Conventional reconstruction algorithms struggle to recover accurate anatomy from such data, limiting clinical reliability. Deep learning approaches have been explored for FOV extension, with diffusion generative models representing the latest advances in image synthesis. Yet, conventional diffusion models are computationally demanding and slow at inference due to their iterative sampling process. To address these limitations, we propose an efficient CT FOV extension framework based on the image-to-image Schrödinger Bridge (I²SB) diffusion model. Unlike traditional diffusion models that synthesize images from pure Gaussian noise, I²SB learns a direct stochastic mapping between paired limited-FOV and extended-FOV images. This direct correspondence yields a more interpretable and traceable generative process, enhancing anatomical consistency and structural fidelity in reconstructions. I²SB achieves superior quantitative performance, with root-mean-square error (RMSE) values of 49.8 HU on simulated noisy data and 152.0 HU on real data, outperforming state-of-the-art diffusion models such as conditional denoising diffusion probabilistic models (cDDPM) and patch-based diffusion methods. Moreover, its one-step inference enables reconstruction in just 0.19 s per 2D slice, representing over a 700-fold speedup compared to cDDPM (135 s) and surpassing diffusionGAN (0.58 s), the second fastest. This combination of accuracy and efficiency makes I²SB highly suitable for real-time or clinical deployment.

CT Image Synthesis Methodology In Silico Academic Lab Benchmark SOTA
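
The Schrödinger Bridge entry above reports RMSE in Hounsfield units and a speedup relative to cDDPM. As a rough illustration of how such figures are typically computed, here is a minimal sketch. The function names and the toy pixel values are assumptions made for this example; only the timings (0.19 s, 135 s, 0.58 s) come from the abstract.

```python
import math


def rmse_hu(predicted, reference):
    """Root-mean-square error between two equally sized lists of HU values."""
    if len(predicted) != len(reference):
        raise ValueError("images must have the same number of pixels")
    squared = [(p - r) ** 2 for p, r in zip(predicted, reference)]
    return math.sqrt(sum(squared) / len(squared))


def speedup(baseline_seconds, method_seconds):
    """How many times faster the method is than the baseline."""
    return baseline_seconds / method_seconds


# Toy 2x2 "slice" flattened to a list; these HU values are invented for illustration.
reference = [0.0, 40.0, -1000.0, 300.0]
predicted = [10.0, 30.0, -990.0, 310.0]
toy_rmse = rmse_hu(predicted, reference)  # every pixel is off by exactly 10 HU

# Per-slice inference times quoted in the abstract.
i2sb_vs_cddpm = speedup(135.0, 0.19)
i2sb_vs_diffusiongan = speedup(0.58, 0.19)
```

With the quoted timings, the ratio against cDDPM comes out at roughly 710, which is consistent with the "over 700-fold" claim, and the ratio against diffusionGAN is about 3.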

Benchmarking GPT-5 for Zero-Shot Multimodal Medical Reasoning in Radiology and Radiation Oncology

Mingzhe Hu, Zach Eidex, Shansong Wang, Mojtaba Safari, Qiang Li, Xiaofeng Yang

•preprint•Aug 15 2025

Radiology, radiation oncology, and medical physics require decision-making that integrates medical images, textual reports, and quantitative data under high-stakes conditions. With the introduction of GPT-5, it is critical to assess whether recent advances in large multimodal models translate into measurable gains in these safety-critical domains. We present a targeted zero-shot evaluation of GPT-5 and its smaller variants (GPT-5-mini, GPT-5-nano) against GPT-4o across three representative tasks: (1) VQA-RAD, a benchmark for visual question answering in radiology; (2) SLAKE, a semantically annotated, multilingual VQA dataset testing cross-modal grounding; and (3) a curated Medical Physics Board Examination-style dataset of 150 multiple-choice questions spanning treatment planning, dosimetry, imaging, and quality assurance. Across all datasets, GPT-5 achieved the highest accuracy, with substantial gains over GPT-4o of up to +20.00% in challenging anatomical regions such as the chest-mediastinal, +13.60% in lung-focused questions, and +11.44% in brain-tissue interpretation. On the board-style physics questions, GPT-5 attained 90.7% accuracy (136/150), exceeding the estimated human passing threshold, while GPT-4o trailed at 78.0%. These results demonstrate that GPT-5 delivers consistent and often pronounced performance improvements over GPT-4o in both image-grounded reasoning and domain-specific numerical problem-solving, highlighting its potential to augment expert workflows in medical imaging and therapeutic physics.

Mixed Modality Classification Methodology In Silico Academic Lab Benchmark SOTA GenAI
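
The board-style accuracy figures in the GPT-5 entry above can be checked with simple arithmetic. The sketch below is illustrative only: the count of 136 correct answers out of 150 is from the abstract, while the GPT-4o count is not reported there and is derived here from the quoted 78.0% as an assumption.

```python
def accuracy_percent(correct, total):
    """Accuracy as a percentage of correctly answered questions."""
    return 100.0 * correct / total


TOTAL_QUESTIONS = 150  # size of the board-style question set, per the abstract

# GPT-5: 136 correct answers are reported directly.
gpt5_acc = accuracy_percent(136, TOTAL_QUESTIONS)

# GPT-4o: only the percentage is reported; the count below is implied, not quoted.
gpt4o_correct_implied = round(0.78 * TOTAL_QUESTIONS)

# Gap in percentage points between the two models on this task.
gap_points = gpt5_acc - 78.0
```

Rounded to one decimal place, `gpt5_acc` matches the reported 90.7%, and the gap on this task is a little under 13 percentage points.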

SMAS: Structural MRI-based AD Score using Bayesian supervised VAE.

Nemali A, Bernal J, Yakupov R, D S, Dyrba M, Incesoy EI, Mukherjee S, Peters O, Ersözlü E, Hellmann-Regen J, Preis L, Priller J, Spruth E, Altenstein S, Lohse A, Schneider A, Fliessbach K, Kimmich O, Wiltfang J, Hansen N, Schott B, Rostamzadeh A, Glanz W, Butryn M, Buerger K, Janowitz D, Ewers M, Perneczky R, Rauchmann B, Teipel S, Kilimann I, Goerss D, Laske C, Sodenkamp S, Spottke A, Coenjaerts M, Brosseron F, Lüsebrink F, Dechent P, Scheffler K, Hetzer S, Kleineidam L, Stark M, Jessen F, Duzel E, Ziegler G

•papers•Aug 15 2025

This study introduces the Structural MRI-based Alzheimer's Disease Score (SMAS), a novel index intended to quantify Alzheimer's Disease (AD)-related morphometric patterns using a deep learning Bayesian-supervised Variational Autoencoder (Bayesian-SVAE). The SMAS index was constructed using baseline structural MRI data from the DELCODE study and evaluated longitudinally in two independent cohorts: DELCODE (n=415) and ADNI (n=190). Our findings indicate that SMAS has strong associations with cognitive performance (DELCODE: r=-0.83; ADNI: r=-0.62), age (DELCODE: r=0.50; ADNI: r=0.28), hippocampal volume (DELCODE: r=-0.44; ADNI: r=-0.66), and total gray matter volume (DELCODE: r=-0.42; ADNI: r=-0.47), suggesting its potential as a biomarker for AD-related brain atrophy. Moreover, our longitudinal studies indicated that SMAS may be useful for the early identification and tracking of AD. The model demonstrated significant predictive accuracy in distinguishing cognitively healthy individuals from those with AD (DELCODE: AUC=0.971 at baseline, 0.833 at 36 months; ADNI: AUC=0.817 at baseline, improving to 0.903 at 24 months). Notably, over 36 months, the SMAS index outperformed existing measures such as SPARE-AD and hippocampal volume. The relevance map analysis revealed significant morphological changes in key AD-related brain regions, including the hippocampus, posterior cingulate cortex, precuneus, and lateral parietal cortex, highlighting that SMAS is a sensitive and interpretable biomarker of brain atrophy, suitable for early AD detection and longitudinal monitoring of disease progression.

MRI Classification Neurological Retrospective Clinical In Silico Academic Lab Benchmark SOTA
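
The SMAS entry above summarizes discrimination between cognitively healthy individuals and those with AD using AUC. As a reminder of what that number measures, here is a minimal sketch of the pairwise (Mann-Whitney) definition of AUC. The function name and the toy scores are assumptions for illustration and are not data from the study.

```python
def roc_auc(scores_positive, scores_negative):
    """Probability that a random positive case scores higher than a random
    negative case, counting ties as half a win."""
    wins = 0.0
    for p in scores_positive:
        for n in scores_negative:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_positive) * len(scores_negative))


# Invented index values: higher is assumed to mean a more AD-like pattern.
ad_scores = [0.9, 0.8, 0.6, 0.4]
healthy_scores = [0.7, 0.3, 0.2, 0.1]

auc = roc_auc(ad_scores, healthy_scores)  # 14 of 16 pairs are ranked correctly
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, so the reported values between 0.817 and 0.971 indicate that most AD/healthy pairs are ordered correctly by the score.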

AI-Driven Integrated System for Burn Depth Prediction With Electronic Medical Records: Algorithm Development and Validation.

Rahman MM, Masry ME, Gnyawali SC, Xue Y, Gordillo G, Wachs JP

•papers•Aug 15 2025

Burn injuries represent a significant clinical challenge due to the complexity of accurately assessing burn depth, which directly influences the course of treatment and patient outcomes. Traditional diagnostic methods primarily rely on visual inspection by experienced burn surgeons. Studies report diagnostic accuracies of around 76% for experts, dropping to nearly 50% for less experienced clinicians. Such inaccuracies can result in suboptimal clinical decisions, such as delaying vital surgical interventions in severe cases or initiating unnecessary treatments for superficial burns. This diagnostic variability not only compromises patient care but also strains health care resources and increases the likelihood of adverse outcomes. Hence, a more consistent and precise approach to burn classification is urgently needed. The objective is to determine whether a multimodal integrated artificial intelligence (AI) system for accurate classification of burn depth can preserve diagnostic accuracy and provide an important resource when used as part of the electronic medical record (EMR). This study used a novel multimodal AI system, integrating digital photographs and ultrasound tissue Doppler imaging (TDI) data to accurately assess burn depth. These imaging modalities were accessed and processed through an EMR system, enabling real-time data retrieval and AI-assisted evaluation. TDI was instrumental in evaluating the biomechanical properties of subcutaneous tissues, using color-coded images to identify burn-induced changes in tissue stiffness and elasticity. The collected imaging data were uploaded to the EMR system (DrChrono), where they were processed by a vision-language model built on GPT-4 architecture. This model received expert-formulated prompts describing how to interpret both digital and TDI images, guiding the AI in making explainable classifications.
This study evaluated whether a multimodal AI classifier, designed to identify first-, second-, and third-degree burns, could be effectively applied to imaging data stored within an EMR system. The classifier achieved an overall accuracy of 84.38%, significantly surpassing human performance benchmarks typically cited in the literature. This highlights the potential of the AI model to serve as a robust clinical decision support tool, especially in settings lacking highly specialized expertise. In addition to accuracy, the classifier demonstrated strong performance across multiple evaluation metrics. The classifier's ability to distinguish between burn severities was further validated by the area under the receiver operating characteristic curve: 0.97 for first-degree, 0.96 for second-degree, and a perfect 1.00 for third-degree burns, each with narrow 95% CIs. The storage of multimodal imaging data within the EMR, along with the ability for post hoc analysis by AI algorithms, offers significant advancements in burn care, enabling real-time burn depth prediction on currently available data. Using digital photos for superficial burns, which are easily diagnosed through physical examination, reduces reliance on TDI, while TDI helps distinguish deep second- and third-degree burns, enhancing diagnostic efficiency.

Mixed Modality Classification Retrospective Clinical In Silico Academic Lab Benchmark SOTA GenAI

UniDCF: A Foundation Model for Comprehensive Dentocraniofacial Hard Tissue Reconstruction

Chunxia Ren, Ning Zhu, Yue Lai, Gui Chen, Ruijie Wang, Yangyi Hu, Suyao Liu, Shuwen Mao, Hong Su, Yu Zhang, Li Xiao

•preprint•Aug 15 2025

Dentocraniofacial hard tissue defects profoundly affect patients' physiological functions, facial aesthetics, and psychological well-being, posing significant challenges for precise reconstruction. Current deep learning models are limited to single-tissue scenarios and modality-specific imaging inputs, resulting in poor generalizability and trade-offs between anatomical fidelity, computational efficiency, and cross-tissue adaptability. Here we introduce UniDCF, a unified framework capable of reconstructing multiple dentocraniofacial hard tissues through multimodal fusion encoding of point clouds and multi-view images. By leveraging the complementary strengths of each modality and incorporating a score-based denoising module to refine surface smoothness, UniDCF overcomes the limitations of prior single-modality approaches. We curated the largest multimodal dataset, comprising intraoral scans, CBCT, and CT from 6,609 patients, resulting in 54,555 annotated instances. Evaluations demonstrate that UniDCF outperforms existing state-of-the-art methods in terms of geometric precision, structural completeness, and spatial accuracy. Clinical simulations indicate UniDCF reduces reconstruction design time by 99% and achieves clinician-rated acceptability exceeding 94%. Overall, UniDCF enables rapid, automated, and high-fidelity reconstruction, supporting personalized and precise restorative treatments, streamlining clinical workflows, and enhancing patient outcomes.

Mixed Modality Reconstruction Methodology In Silico Academic Lab Benchmark SOTA Open Dataset

FusionFM: Fusing Eye-specific Foundational Models for Optimized Ophthalmic Diagnosis

Ke Zou, Jocelyn Hui Lin Goh, Yukun Zhou, Tian Lin, Samantha Min Er Yew, Sahana Srinivasan, Meng Wang, Rui Santos, Gabor M. Somfai, Huazhu Fu, Haoyu Chen, Pearse A. Keane, Ching-Yu Cheng, Yih Chung Tham

•preprint•Aug 15 2025

Foundation models (FMs) have shown great promise in medical image analysis by improving generalization across diverse downstream tasks. In ophthalmology, several FMs have recently emerged, but there is still no clear answer to fundamental questions: Which FM performs the best? Are they equally good across different tasks? What if we combine all FMs together? To our knowledge, this is the first study to systematically evaluate both single and fused ophthalmic FMs. To address these questions, we propose FusionFM, a comprehensive evaluation suite, along with two fusion approaches to integrate different ophthalmic FMs. Our framework covers both ophthalmic disease detection (glaucoma, diabetic retinopathy, and age-related macular degeneration) and systemic disease prediction (diabetes and hypertension) based on retinal imaging. We benchmarked four state-of-the-art FMs (RETFound, VisionFM, RetiZero, and DINORET) using standardized datasets from multiple countries and evaluated their performance using AUC and F1 metrics. Our results show that DINORET and RetiZero achieve superior performance in both ophthalmic and systemic disease tasks, with RetiZero exhibiting stronger generalization on external datasets. Regarding fusion strategies, the Gating-based approach provides modest improvements in predicting glaucoma, AMD, and hypertension. Despite these advances, predicting systemic diseases, especially hypertension in external cohorts, remains challenging. These findings provide an evidence-based evaluation of ophthalmic FMs, highlight the benefits of model fusion, and point to strategies for enhancing their clinical applicability.

OCT Classification Retrospective Clinical In Silico Academic Lab Benchmark SOTA

BRIEF: BRain-Inspired network connection search with Extensive temporal feature Fusion enhances disease classification

Xiangxiang Cui, Min Zhao, Dongmei Zhi, Shile Qi, Vince D Calhoun, Jing Sui

•preprint•Aug 15 2025

Existing deep learning models for functional MRI-based classification have limitations in network architecture determination (relying on experience) and feature space fusion (mostly simple concatenation, lacking mutual learning). Inspired by the human brain's mechanism of updating neural connections through learning and decision-making, we proposed a novel BRain-Inspired feature Fusion (BRIEF) framework, which is able to optimize network architecture automatically by incorporating an improved neural network connection search (NCS) strategy and a Transformer-based multi-feature fusion module. Specifically, we first extracted 4 types of fMRI temporal representations, i.e., time series (TCs), static/dynamic functional connection (FNC/dFNC), and multi-scale dispersion entropy (MsDE), to construct four encoders. Within each encoder, we employed a modified Q-learning to dynamically optimize the NCS to extract high-level feature vectors, where the NCS is formulated as a Markov Decision Process. Then, all feature vectors were fused via a Transformer, leveraging both stable/time-varying connections and multi-scale dependencies across different brain regions to achieve the final classification. Additionally, an attention module was embedded to improve interpretability. The classification performance of our proposed BRIEF was compared with 21 state-of-the-art models by discriminating two mental disorders from healthy controls: schizophrenia (SZ, n=1100) and autism spectrum disorder (ASD, n=1550). BRIEF demonstrated significant improvements of 2.2% to 12.1% compared to 21 algorithms, reaching an AUC of 91.5% ± 0.6% for SZ and 78.4% ± 0.5% for ASD, respectively. This is the first attempt to incorporate a brain-inspired, reinforcement learning strategy to optimize fMRI-based mental disorder classification, showing significant potential for identifying precise neuroimaging biomarkers.

MRI Classification Neurological Methodology In Silico Benchmark SOTA
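
The BRIEF entry above formulates connection search as a Markov Decision Process solved with a modified Q-learning. The abstract does not describe the modification, so the sketch below shows only the standard tabular Q-learning update with epsilon-greedy selection. The state encoding (a layer index), the two actions, and the reward value are hypothetical choices made for this example, not the authors' design.

```python
import random


def q_update(q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.9):
    """Standard Q-learning step:
    Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))."""
    best_next = max(q.get((next_state, a), 0.0) for a in actions)
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
    return q[(state, action)]


def epsilon_greedy(q, state, actions, epsilon, rng):
    """Explore with probability epsilon, otherwise pick the highest-valued action."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: q.get((state, a), 0.0))


# Hypothetical setup: at each layer (state) the agent keeps or skips a connection.
actions = ["skip", "connect"]
q = {}  # unseen (state, action) pairs default to a value of 0.0

# Two identical rewarded transitions; the reward stands in for validation accuracy.
first = q_update(q, state=0, action="connect", reward=1.0, next_state=1, actions=actions)
second = q_update(q, state=0, action="connect", reward=1.0, next_state=1, actions=actions)

# With epsilon = 0 the choice is purely greedy.
greedy_choice = epsilon_greedy(q, 0, actions, epsilon=0.0, rng=random.Random(0))
```

Repeated rewarded transitions move the stored value toward the reward, so the greedy policy comes to prefer the connection that improved the outcome.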

Data-Driven Abdominal Phenotypes of Type 2 Diabetes in Lean, Overweight, and Obese Cohorts

Lucas W. Remedios, Chloe Choe, Trent M. Schwartz, Dingjie Su, Gaurav Rudravaram, Chenyu Gao, Aravind R. Krishnan, Adam M. Saunders, Michael E. Kim, Shunxing Bao, Alvin C. Powers, Bennett A. Landman, John Virostko

•preprint•Aug 14 2025

Purpose: Although elevated BMI is a well-known risk factor for type 2 diabetes, the disease's presence in some lean adults and absence in others with obesity suggests that detailed body composition may uncover abdominal phenotypes of type 2 diabetes. With AI, we can now extract detailed measurements of size, shape, and fat content from abdominal structures in 3D clinical imaging at scale. This creates an opportunity to empirically define body composition signatures linked to type 2 diabetes risk and protection using large-scale clinical data. Approach: To uncover BMI-specific diabetic abdominal patterns from clinical CT, we applied our design four times: once on the full cohort (n = 1,728) and once each on the lean (n = 497), overweight (n = 611), and obese (n = 620) subgroups. Briefly, our experimental design transforms abdominal scans into collections of explainable measurements through segmentation, classifies type 2 diabetes through a cross-validated random forest, measures how features contribute to model-estimated risk or protection through SHAP analysis, groups scans by shared model decision patterns (clustering from SHAP), and links back to anatomical differences (classification). Results: The random forests achieved mean AUCs of 0.72-0.74. There were shared type 2 diabetes signatures in each group: fatty skeletal muscle, older age, greater visceral and subcutaneous fat, and a smaller or fat-laden pancreas. Univariate logistic regression confirmed the direction of 14-18 of the top 20 predictors within each subgroup (p < 0.05). Conclusions: Our findings suggest that abdominal drivers of type 2 diabetes may be consistent across weight classes.

CT Segmentation Abdominal Retrospective Clinical In Silico Academic Lab Benchmark SOTA

Deep learning-based non-invasive prediction of PD-L1 status and immunotherapy survival stratification in esophageal cancer using [18F]FDG PET/CT.

Xie F, Zhang M, Zheng C, Zhao Z, Wang J, Li Y, Wang K, Wang W, Lin J, Wu T, Wang Y, Chen X, Li Y, Zhu Z, Wu H, Li Y, Liu Q

•papers•Aug 14 2025

This study aimed to develop and validate deep learning models using [18F]FDG PET/CT to predict PD-L1 status in esophageal cancer (EC) patients. Additionally, we assessed the potential of derived deep learning model scores (DLS) for survival stratification in immunotherapy. In this retrospective study, we included 331 EC patients from two centers, dividing them into training, internal validation, and external validation cohorts. Fifty patients who received immunotherapy were followed up. We developed four 3D ResNet10-based models: PET + CT + clinical factors (CPC), PET + CT (PC), PET (P), and CT (C), all using pre-treatment [18F]FDG PET/CT scans. For comparison, we also constructed a logistic model incorporating clinical factors (clinical model). The DLS were evaluated as radiological markers for survival stratification, and nomograms for predicting survival were constructed. The models demonstrated accurate prediction of PD-L1 status. The areas under the curve (AUCs) for predicting PD-L1 status were as follows: CPC (0.927), PC (0.904), P (0.886), C (0.934), and the clinical model (0.603) in the training cohort; CPC (0.882), PC (0.848), P (0.770), C (0.745), and the clinical model (0.524) in the internal validation cohort; and CPC (0.843), PC (0.806), P (0.759), C (0.667), and the clinical model (0.671) in the external validation cohort. The CPC and PC models exhibited superior predictive performance. Survival analysis revealed that the DLS from most models effectively stratified overall survival and progression-free survival at appropriate cut-off points (P < 0.05), outperforming stratification based on PD-L1 status (combined positive score ≥ 10). Furthermore, incorporating model scores with clinical factors in nomograms enhanced the predictive probability of survival after immunotherapy. Deep learning models based on [18F]FDG PET/CT can accurately predict PD-L1 status in esophageal cancer patients.
The derived DLS can effectively stratify survival outcomes following immunotherapy, particularly when combined with clinical factors.

PET Classification Abdominal Retrospective Clinical In Silico Academic Lab Benchmark SOTA
