Latest Papers on Radiology AI. Tags: Benchmark SOTA

A novel deep learning framework for retinal disease detection leveraging contextual and local features cues from retinal images.

Khan SD, Basalamah S, Lbath A

•papers•Jul 1 2025

Retinal diseases are a serious global threat to human vision, and early identification is essential for effective prevention and treatment. However, current diagnostic methods rely on manual analysis of fundus images, which heavily depends on the expertise of ophthalmologists. This manual process is time-consuming and labor-intensive and can sometimes lead to missed diagnoses. With advancements in computer vision technology, several automated models have been proposed to improve diagnostic accuracy for retinal diseases and medical imaging in general. However, these methods face challenges in accurately detecting specific diseases within images due to inherent issues associated with fundus images, including inter-class similarities, intra-class variations, limited local information, insufficient contextual understanding, and class imbalances within datasets. To address these challenges, we propose a novel deep learning framework for accurate retinal disease classification. This framework is designed to achieve high accuracy in identifying various retinal diseases while overcoming inherent challenges associated with fundus images. The framework consists of three main modules. The first module is a Densely Connected Multidilated Convolution Neural Network (DCM-CNN), which extracts global contextual information by integrating novel Casual Dilated Dense Convolutional Blocks (CDDCBs). The second module, the Local-Patch-based Convolution Neural Network (LP-CNN), uses the Class Activation Map (CAM) obtained from DCM-CNN to extract local and fine-grained information. The third module is a synergic network that takes the feature maps of both DCM-CNN and LP-CNN and connects them in a fully connected fashion to identify the correct class and minimize errors.
The framework is evaluated through a comprehensive set of experiments, both quantitatively and qualitatively, using two publicly available benchmark datasets: RFMiD and ODIR-5K. Our experimental results demonstrate the effectiveness of the proposed framework, which achieves higher performance on the RFMiD and ODIR-5K datasets than the reference methods.
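
The abstract does not give kernel sizes or dilation rates, so the following is only a minimal sketch of why dilated convolutions are a common way to gather wider context: stacking layers with growing dilation enlarges the receptive field without adding weights. The helper names `dilated_conv1d` and `receptive_field`, the 1D setting, and the rates used are illustrative assumptions, not details of DCM-CNN.

```python
def dilated_conv1d(signal, kernel, dilation):
    """Valid-mode 1D dilated convolution (cross-correlation form).

    Kernel taps are spaced `dilation` samples apart, so a kernel of
    size k covers dilation * (k - 1) + 1 input samples.
    """
    span = dilation * (len(kernel) - 1)
    return [
        sum(kernel[j] * signal[i + j * dilation] for j in range(len(kernel)))
        for i in range(len(signal) - span)
    ]


def receptive_field(kernel_size, dilations):
    """Receptive field of stacked stride-1 dilated convolutions."""
    rf = 1
    for d in dilations:
        rf += d * (kernel_size - 1)
    return rf


if __name__ == "__main__":
    # Hypothetical rates; the paper's actual configuration is not stated here.
    print(receptive_field(3, [1, 2, 4]))
    print(dilated_conv1d([1, 2, 3, 4, 5], [1, 0, -1], dilation=2))
```

With three size-3 layers at dilations 1, 2, and 4, each output sees 15 input samples, compared with 7 for three undilated layers.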

OCT Classification Methodology In Silico Academic Lab Benchmark SOTA Open Dataset

A Workflow-Efficient Approach to Pre- and Post-Operative Assessment of Weight-Bearing Three-Dimensional Knee Kinematics.

Banks SA, Yildirim G, Jachode G, Cox J, Anderson O, Jensen A, Cole JD, Kessler O

•papers•Jul 1 2025

Knee kinematics during daily activities reflect disease severity preoperatively and are associated with clinical outcomes after total knee arthroplasty (TKA). It is widely believed that measured kinematics would be useful for preoperative planning and postoperative assessment. Despite decades-long interest in measuring three-dimensional (3D) knee kinematics, no methods are available for routine, practical clinical examinations. We report a clinically practical method utilizing machine-learning-enhanced software and upgraded C-arm fluoroscopy for the accurate and time-efficient measurement of pre-TKA and post-TKA 3D dynamic knee kinematics. Using a common C-arm with an upgraded detector and software, we performed an 8-s horizontal sweeping pulsed fluoroscopic scan of the weight-bearing knee joint. The patient's knee was then imaged using pulsed C-arm fluoroscopy while performing standing, kneeling, squatting, stair, chair, and gait motion activities. We used limited-arc cone-beam reconstruction methods to create 3D models of the femur and tibia/fibula bones with implants, which can then be used to perform model-image registration to quantify the 3D knee kinematics. The proposed protocol can be accomplished by an individual radiology technician in ten minutes and does not require additional equipment beyond a step and stool. The image analysis can be performed by a computer onboard the upgraded C-arm or in the cloud, before loading the examination results into the Picture Archiving and Communication System and Electronic Medical Record systems. Weight-bearing kinematics affects knee function pre- and post-TKA. It has long been exclusively the domain of researchers to make such measurements. We present an approach that leverages common, but digitally upgraded, imaging hardware and software to implement an efficient examination protocol for accurately assessing 3D knee kinematics. 
With these capabilities, it will be possible to include dynamic 3D knee kinematics as a component of the routine clinical workup for patients who have diseased or replaced knees.

Fluoroscopy Registration Musculoskeletal Methodology Clinical Pilot Startup Benchmark SOTA

Automatic recognition and differentiation of pulmonary contusion and bacterial pneumonia based on deep learning and radiomics.

Deng T, Feng J, Le X, Xia Y, Shi F, Yu F, Zhan Y, Liu X, Li C

•papers•Jul 1 2025

In clinical work, it is difficult to distinguish pulmonary contusion (PC) from bacterial pneumonia (BP) on CT images by the naked eye alone when the history of trauma is unknown. Artificial intelligence is widely used in medical imaging, but its diagnostic performance for pulmonary contusion is unclear. In this study, artificial intelligence was used for the first time to identify pulmonary contusion and bacterial pneumonia, and its diagnostic performance was compared with that of manual diagnosis. In this retrospective study, 2179 patients from two hospitals between April 2016 and July 2022 were collected and divided into a training set, an internal validation set, and an external validation set. PC and BP were automatically recognized and segmented using VB-net, and radiomics features were automatically extracted. Four machine learning algorithms, namely Decision Trees, Logistic Regression, Random Forests, and Support Vector Machines (SVM), were used to build the models. The DeLong test was used to compare performance among the models. The best-performing model and four radiologists diagnosed the external validation set, and the diagnostic efficacy of the human readers and the artificial intelligence was compared. VB-net automatically detected and segmented PC and BP. Among the four machine learning models, the DeLong test showed that the SVM model had the best performance, with AUC, accuracy, sensitivity, and specificity of 0.998 (95% CI: 0.995-1), 0.980, 0.979, and 0.982 in the training set; 0.891 (95% CI: 0.854-0.928), 0.979, 0.750, and 0.860 in the internal validation set; and 0.885 (95% CI: 0.850-0.920), 0.903, 0.976, and 0.794 in the external validation set. The diagnostic ability of the SVM model was superior to that of the human readers (P < 0.05). Our VB-net automatically recognizes and segments PC and BP in chest CT images. An SVM model based on radiomics features can quickly and accurately differentiate between them, with higher accuracy than experienced radiologists.
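
For readers less familiar with the four reported metrics, here is a minimal sketch of how they are conventionally computed for a binary classifier. The function names, the toy labels and scores, and the 0.5 decision threshold are assumptions for illustration; none of them comes from the study, and the AUC here uses the pairwise (Mann-Whitney) formulation rather than any particular library routine.

```python
def confusion_metrics(y_true, y_pred):
    """Accuracy, sensitivity, and specificity from binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
    }


def auc(y_true, scores):
    """AUC as the probability that a positive case outscores a negative one.

    Ties count as half a win (Mann-Whitney U statistic, normalized).
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy data, not from the paper.
    y_true = [1, 1, 0, 0]
    scores = [0.9, 0.4, 0.6, 0.1]
    y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold
    print(auc(y_true, scores), confusion_metrics(y_true, y_pred))
```

AUC is threshold-free, whereas accuracy, sensitivity, and specificity all depend on the chosen operating point, which is why the four numbers can move in different directions between validation sets.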

CT Segmentation Chest Retrospective Clinical In Silico Academic Lab Benchmark SOTA

Integrating multi-scale information and diverse prompts in large model SAM-Med2D for accurate left ventricular ejection fraction estimation.

Wu Y, Zhao T, Hu S, Wu Q, Chen Y, Huang X, Zheng Z

•papers•Jul 1 2025

Left ventricular ejection fraction (LVEF) is a critical indicator of cardiac function, aiding in the assessment of heart conditions. Accurate segmentation of the left ventricle (LV) is essential for LVEF calculation. However, current methods are often limited by small datasets and exhibit poor generalization. While leveraging large models can address this issue, many fail to capture multi-scale information and introduce additional burdens on users to generate prompts. To overcome these challenges, we propose LV-SAM, a model based on the large model SAM-Med2D, for accurate LV segmentation. It comprises three key components: an image encoder with a multi-scale adapter (MSAd), a multimodal prompt encoder (MPE), and a multi-scale decoder (MSD). The MSAd extracts multi-scale information at the encoder level and fine-tunes the model, while the MSD employs skip connections to effectively utilize multi-scale information at the decoder level. Additionally, we introduce an automated pipeline for generating self-extracted dense prompts and use a large language model to generate text prompts, reducing the user burden. The MPE processes these prompts, further enhancing model performance. Evaluations on the CAMUS dataset show that LV-SAM outperforms existing SOTA methods in LV segmentation, achieving the lowest MAE of 5.016 in LVEF estimation.
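
As a reminder of what the reported MAE measures, the sketch below uses the standard definition of ejection fraction from end-diastolic and end-systolic volumes and a plain mean absolute error. How LV-SAM derives volumes from its segmentations is not described in the abstract, so the volumes are simply taken as inputs here; the function names and numbers are hypothetical.

```python
def lvef_percent(edv_ml, esv_ml):
    """Ejection fraction in percent: (EDV - ESV) / EDV * 100."""
    if edv_ml <= 0:
        raise ValueError("end-diastolic volume must be positive")
    return (edv_ml - esv_ml) / edv_ml * 100.0


def mean_absolute_error(predicted, reference):
    """Mean absolute difference between paired estimates."""
    if len(predicted) != len(reference):
        raise ValueError("inputs must have the same length")
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / len(predicted)


if __name__ == "__main__":
    # Illustrative volumes in millilitres, not taken from CAMUS.
    print(lvef_percent(edv_ml=120.0, esv_ml=50.0))
    print(mean_absolute_error([55.0, 60.0], [50.0, 62.0]))
```

Because LVEF is a ratio of two segmented volumes, small contour errors at either cardiac phase propagate into the estimate, which is one reason segmentation quality and LVEF error are reported together.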

Ultrasound Segmentation Cardiac Methodology In Silico Academic Lab Benchmark SOTA

Robust and generalizable artificial intelligence for multi-organ segmentation in ultra-low-dose total-body PET imaging: a multi-center and cross-tracer study.

Wang H, Qiao X, Ding W, Chen G, Miao Y, Guo R, Zhu X, Cheng Z, Xu J, Li B, Huang Q

•papers•Jul 1 2025

Positron Emission Tomography (PET) is a powerful molecular imaging tool that visualizes radiotracer distribution to reveal physiological processes. Recent advances in total-body PET have enabled low-dose, CT-free imaging; however, accurate organ segmentation using PET-only data remains challenging. This study develops and validates a deep learning model for multi-organ PET segmentation across varied imaging conditions and tracers, addressing critical needs for fully PET-based quantitative analysis. This retrospective study employed a 3D deep learning-based model for automated multi-organ segmentation on PET images acquired under diverse conditions, including low-dose and non-attenuation-corrected scans. Using a dataset of 798 patients from multiple centers with varied tracers, model robustness and generalizability were evaluated via multi-center and cross-tracer tests. Ground-truth labels for 23 organs were generated from CT images, and segmentation accuracy was assessed using the Dice similarity coefficient (DSC). In the multi-center dataset from four different institutions, our model achieved average DSC values of 0.834, 0.825, 0.819, and 0.816 across varying dose reduction factors and correction conditions for FDG PET images. In the cross-tracer dataset, the model reached average DSC values of 0.737, 0.573, 0.830, 0.661, and 0.708 for DOTATATE, FAPI, FDG, Grazytracer, and PSMA, respectively. The proposed model demonstrated effective, fully PET-based multi-organ segmentation across a range of imaging conditions, centers, and tracers, achieving high robustness and generalizability. These findings underscore the model's potential to enhance clinical diagnostic workflows by supporting ultra-low dose PET imaging. This is a retrospective study based on collected data, which has been approved by the Research Ethics Committee of Ruijin Hospital affiliated to Shanghai Jiao Tong University School of Medicine.
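
The Dice similarity coefficient used above can be sketched as follows for flattened label maps. This is a generic illustration, not the study's evaluation code: the function names, the toy label maps, and the convention of returning 1.0 when an organ is absent from both maps are all assumptions.

```python
def dice(pred_mask, ref_mask):
    """Dice similarity coefficient for two flattened binary masks.

    DSC = 2 * |A ∩ B| / (|A| + |B|). Returns 1.0 when both masks are
    empty (an assumed convention; other choices exist).
    """
    intersection = sum(1 for p, r in zip(pred_mask, ref_mask) if p and r)
    total = sum(1 for p in pred_mask if p) + sum(1 for r in ref_mask if r)
    return 1.0 if total == 0 else 2.0 * intersection / total


def mean_dice(pred_labels, ref_labels, organ_ids):
    """Average per-organ DSC over integer label maps."""
    per_organ = [
        dice([p == organ for p in pred_labels], [r == organ for r in ref_labels])
        for organ in organ_ids
    ]
    return sum(per_organ) / len(per_organ)


if __name__ == "__main__":
    # Toy label maps with two hypothetical organ ids (0 is background).
    predicted = [1, 1, 2, 0]
    reference = [1, 2, 2, 0]
    print(mean_dice(predicted, reference, organ_ids=[1, 2]))
```

Averaging per organ, as here, weights small and large organs equally; pooling all voxels before computing DSC would instead favor large organs, so the two averages are not interchangeable.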

PET Segmentation Whole Body Retrospective Clinical In Silico Academic Lab Benchmark SOTA

Using deep feature distances for evaluating the perceptual quality of MR image reconstructions.

Adamson PM, Desai AD, Dominic J, Varma M, Bluethgen C, Wood JP, Syed AB, Boutin RD, Stevens KJ, Vasanawala S, Pauly JM, Gunel B, Chaudhari AS

•papers•Jul 1 2025

Commonly used MR image quality (IQ) metrics have poor concordance with radiologist-perceived diagnostic IQ. Here, we develop and explore deep feature distances (DFDs), which are distances computed in a lower-dimensional feature space encoded by a convolutional neural network (CNN), as improved perceptual IQ metrics for MR image reconstruction. We further explore the impact of distribution shifts between images in the DFD CNN encoder training data and the IQ metric evaluation. We compare commonly used IQ metrics (PSNR and SSIM) to two "out-of-domain" DFDs with encoders trained on natural images, an "in-domain" DFD trained on MR images alone, and two domain-adjacent DFDs trained on large medical imaging datasets. We additionally compare these with several state-of-the-art but less commonly reported IQ metrics, visual information fidelity (VIF), noise quality metric (NQM), and the high-frequency error norm (HFEN). IQ metric performance is assessed via correlations with five expert radiologist reader scores of perceived diagnostic IQ of various accelerated MR image reconstructions. We characterize the behavior of these IQ metrics under common distortions expected during image acquisition, including their sensitivity to acquisition noise. All DFDs and HFEN correlate more strongly with radiologist-perceived diagnostic IQ than SSIM, PSNR, and other state-of-the-art metrics, with correlations being comparable to radiologist inter-reader variability. Surprisingly, out-of-domain DFDs perform comparably to in-domain and domain-adjacent DFDs. A suite of IQ metrics, including DFDs and HFEN, should be used alongside commonly reported IQ metrics for a more holistic evaluation of MR image reconstruction perceptual quality. We also observe that general vision encoders are capable of assessing visual IQ even for MR images.
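
To make the contrast between pixel-space and feature-space metrics concrete, here is a minimal sketch. `psnr` follows the standard definition. `feature_distance` shows only the general shape of a deep feature distance (encode both images, then measure distance between the encodings); the `toy_encoder` is a hypothetical stand-in for a trained CNN, and the Euclidean distance is an assumed choice that the abstract does not specify.

```python
import math


def psnr(reference, reconstruction, max_value=1.0):
    """Peak signal-to-noise ratio in dB for flattened images."""
    mse = sum((a - b) ** 2 for a, b in zip(reference, reconstruction)) / len(reference)
    if mse == 0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / mse)


def feature_distance(reference, reconstruction, encoder):
    """Euclidean distance between encoder outputs of the two images."""
    f_ref = encoder(reference)
    f_rec = encoder(reconstruction)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(f_ref, f_rec)))


def toy_encoder(image):
    """Hypothetical two-feature 'encoder': mean intensity and dynamic range.

    A real DFD would use intermediate activations of a trained network.
    """
    return [sum(image) / len(image), max(image) - min(image)]


if __name__ == "__main__":
    ref = [0.0, 1.0]
    rec = [1.0, 1.0]
    print(psnr(ref, rec), feature_distance(ref, rec, toy_encoder))
```

The point of the design is that the encoder decides which differences matter: two reconstructions with the same pixel-wise error can sit at very different distances in feature space.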

MRI Reconstruction Methodology In Silico Academic Lab Benchmark SOTA

Diffusion-driven multi-modality medical image fusion.

Qu J, Huang D, Shi Y, Liu J, Tang W

•papers•Jul 1 2025

Multi-modality medical image fusion (MMIF) technology utilizes the complementarity of different modalities to provide more comprehensive diagnostic insights for clinical practice. Existing deep learning-based methods often focus on extracting the primary information from individual modalities while ignoring the correlation of information distribution across different modalities, which leads to insufficient fusion of image details and color information. To address this problem, a diffusion-driven MMIF method is proposed to leverage the information distribution relationship among multi-modality images in the latent space. To better preserve the complementary information from different modalities, a local and global network (LAGN) is suggested. Additionally, a loss strategy is designed to establish robust constraints among diffusion-generated images, original images, and fused images. This strategy supervises the training process and prevents information loss in fused images. The experimental results demonstrate that the proposed method surpasses state-of-the-art image fusion methods in terms of unsupervised metrics on three datasets: MRI/CT, MRI/PET, and MRI/SPECT images. The proposed method successfully captures rich details and color information. Furthermore, 16 doctors and medical students were invited to evaluate the effectiveness of our method in assisting clinical diagnosis and treatment.

Mixed Modality Image Synthesis Methodology In Silico Academic Lab Benchmark SOTA

Prediction of adverse pathology in prostate cancer using a multimodal deep learning approach based on [18F]PSMA-1007 PET/CT and multiparametric MRI.

Lin H, Yao F, Yi X, Yuan Y, Xu J, Chen L, Wang H, Zhuang Y, Lin Q, Xue Y, Yang Y, Pan Z

•papers•Jul 1 2025

Accurate prediction of adverse pathology (AP) in prostate cancer (PCa) patients is crucial for formulating effective treatment strategies. This study aims to develop and evaluate a multimodal deep learning model based on [18F]PSMA-1007 PET/CT and multiparametric MRI (mpMRI) to predict the presence of AP, and to investigate whether the model that integrates [18F]PSMA-1007 PET/CT and mpMRI outperforms the individual PET/CT or mpMRI models in predicting AP. A total of 341 PCa patients who underwent radical prostatectomy (RP) with mpMRI and PET/CT scans were retrospectively analyzed. We generated a deep learning signature from mpMRI and PET/CT with a multimodal deep learning model (MPC) based on convolutional neural networks and a transformer, which was subsequently combined with clinical characteristics to construct an integrated model (MPCC). These models were compared with clinical models and single mpMRI or PET/CT models. The MPCC model showed the best performance in predicting AP (AUC, 0.955 [95% CI: 0.932-0.975]), which is higher than that of the MPC model (AUC, 0.930 [95% CI: 0.901-0.955]). The MPC model performed better than the single PET/CT (AUC, 0.813 [95% CI: 0.780-0.845]) or mpMRI (AUC, 0.865 [95% CI: 0.829-0.901]) models. Additionally, the MPCC model is also effective in predicting single adverse pathological features. The deep learning model that integrates mpMRI and [18F]PSMA-1007 PET/CT enhances the predictive capabilities for the presence of AP in PCa patients. This improvement aids physicians in making informed preoperative decisions, ultimately enhancing patient prognosis.

Mixed Modality Classification Abdominal Retrospective Clinical In Silico Academic Lab Benchmark SOTA

Measuring kidney stone volume - practical considerations and current evidence from the EAU endourology section.

Grossmann NC, Panthier F, Afferi L, Kallidonis P, Somani BK

•papers•Jul 1 2025

This narrative review provides an overview of the use, differences, and clinical impact of current methods for kidney stone volume assessment. The different approaches to volume measurement are based on noncontrast computed tomography (NCCT). While volume measurement using formulas is sufficient for smaller stones, it tends to overestimate volume for larger or irregularly shaped calculi. In contrast, software-based segmentation significantly improves accuracy and reproducibility, and artificial intelligence based volumetry additionally shows excellent agreement with reference standards while reducing observer variability and measurement time. Moreover, specific CT preparation protocols may further enhance image quality and thus improve measurement accuracy. Clinically, stone volume has proven to be a superior predictor of stone-related events during follow-up, spontaneous stone passage under conservative management, and stone-free rates after shockwave lithotripsy (SWL) and ureteroscopy (URS) compared to linear measurements. Although manual measurement remains practical, its accuracy diminishes for complex or larger stones. Software-based segmentation and volumetry offer higher precision and efficiency but require established standards and broader access to dedicated software for routine clinical use.
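
The two families of volume measurement contrasted in the review can be sketched as below. The review does not state which formula it means; the ellipsoid formula (π/6 × length × width × height) is used here as a commonly cited example and is an assumption, as are the function names and the sample dimensions. The segmentation-based estimate is simply the number of stone voxels times the volume of one voxel.

```python
import math


def ellipsoid_volume_mm3(length_mm, width_mm, height_mm):
    """Formula-based estimate: volume of an ellipsoid with the given diameters."""
    return math.pi / 6.0 * length_mm * width_mm * height_mm


def segmented_volume_mm3(voxel_count, spacing_mm):
    """Segmentation-based estimate: voxel count times single-voxel volume.

    `spacing_mm` is the (x, y, z) voxel spacing of the CT volume.
    """
    sx, sy, sz = spacing_mm
    return voxel_count * sx * sy * sz


if __name__ == "__main__":
    # Illustrative values only; not measurements from the review.
    print(ellipsoid_volume_mm3(10.0, 8.0, 6.0))
    print(segmented_volume_mm3(1000, (0.5, 0.5, 1.0)))
```

The formula assumes a smooth, regular shape, so its error depends on how far the stone departs from an ellipsoid, whereas the voxel count follows the segmented outline directly and is limited mainly by scan resolution and segmentation quality.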

CT Segmentation Abdominal Review In Silico Benchmark SOTA

Dual-type deep learning-based image reconstruction for advanced denoising and super-resolution processing in head and neck T2-weighted imaging.

Fujima N, Shimizu Y, Ikebe Y, Kameda H, Harada T, Tsushima N, Kano S, Homma A, Kwon J, Yoneyama M, Kudo K

•papers•Jul 1 2025

To assess the utility of dual-type deep learning (DL)-based image reconstruction with DL-based image denoising and super-resolution processing by comparing images reconstructed with the conventional method in head and neck fat-suppressed (Fs) T2-weighted imaging (T2WI). We retrospectively analyzed the cases of 43 patients who underwent head/neck Fs-T2WI for the assessment of their head and neck lesions. All patients underwent two sets of Fs-T2WI scans with conventional- and DL-based reconstruction. The Fs-T2WI with DL-based reconstruction was acquired based on a 30% reduction of its spatial resolution in both the x- and y-axes with a shortened scan time. Qualitative and quantitative assessments were performed with both the conventional method- and DL-based reconstructions. For the qualitative assessment, we visually evaluated the overall image quality, visibility of anatomical structures, degree of artifact(s), lesion conspicuity, and lesion edge sharpness based on five-point grading. In the quantitative assessment, we measured the signal-to-noise ratio (SNR) of the lesion and the contrast-to-noise ratio (CNR) between the lesion and the adjacent or nearest muscle. In the qualitative analysis, significant differences were observed between the Fs-T2WI with the conventional- and DL-based reconstruction in all of the evaluation items except the degree of the artifact(s) (p < 0.001). In the quantitative analysis, significant differences were observed in the SNR between the Fs-T2WI with conventional- (21.4 ± 14.7) and DL-based reconstructions (26.2 ± 13.5) (p < 0.001). In the CNR assessment, the CNR between the lesion and adjacent or nearest muscle in the DL-based Fs-T2WI (16.8 ± 11.6) was significantly higher than that in the conventional Fs-T2WI (14.2 ± 12.9) (p < 0.001). 
Dual-type DL-based image reconstruction by an effective denoising and super-resolution process successfully provided high image quality in head and neck Fs-T2WI with a shortened scan time compared to the conventional imaging method.
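
The abstract reports SNR and CNR without giving the formulas used, and definitions vary between studies. The sketch below uses one common convention, stated here as an assumption: SNR is the mean lesion signal divided by the standard deviation of a noise region, and CNR is the difference between mean lesion and mean muscle signal divided by the same noise standard deviation. The function names and region-of-interest values are hypothetical.

```python
import statistics


def snr(lesion_roi, noise_roi):
    """Mean lesion signal over the standard deviation of a noise region."""
    return statistics.fmean(lesion_roi) / statistics.pstdev(noise_roi)


def cnr(lesion_roi, muscle_roi, noise_roi):
    """Difference of mean lesion and muscle signal over noise standard deviation."""
    contrast = statistics.fmean(lesion_roi) - statistics.fmean(muscle_roi)
    return contrast / statistics.pstdev(noise_roi)


if __name__ == "__main__":
    # Toy pixel intensities, not data from the study.
    lesion = [100.0, 110.0, 90.0]
    muscle = [50.0, 60.0, 40.0]
    noise = [8.0, 12.0, 10.0, 10.0]
    print(snr(lesion, noise), cnr(lesion, muscle, noise))
```

Under this convention, denoising raises both values by shrinking the denominator, so SNR and CNR gains alone do not show that lesion detail is preserved; that is why the study pairs them with reader-based qualitative scores.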

MRI Reconstruction Retrospective Clinical Clinical Pilot Academic Lab Benchmark SOTA
