Latest Papers on Radiology AI. Tags: Benchmark SOTA, Order: Best Match, Limit: 10.

Using deep feature distances for evaluating the perceptual quality of MR image reconstructions.

Adamson PM, Desai AD, Dominic J, Varma M, Bluethgen C, Wood JP, Syed AB, Boutin RD, Stevens KJ, Vasanawala S, Pauly JM, Gunel B, Chaudhari AS

•papers•Jul 1 2025

Commonly used MR image quality (IQ) metrics have poor concordance with radiologist-perceived diagnostic IQ. Here, we develop and explore deep feature distances (DFDs), distances computed in a lower-dimensional feature space encoded by a convolutional neural network (CNN), as improved perceptual IQ metrics for MR image reconstruction. We further explore the impact of distribution shifts between images in the DFD CNN encoder training data and the IQ metric evaluation. We compare commonly used IQ metrics (PSNR and SSIM) to two "out-of-domain" DFDs with encoders trained on natural images, an "in-domain" DFD trained on MR images alone, and two domain-adjacent DFDs trained on large medical imaging datasets. We additionally compare these with several state-of-the-art but less commonly reported IQ metrics: visual information fidelity (VIF), noise quality metric (NQM), and the high-frequency error norm (HFEN). IQ metric performance is assessed via correlations with five expert radiologist reader scores of perceived diagnostic IQ of various accelerated MR image reconstructions. We characterize the behavior of these IQ metrics under common distortions expected during image acquisition, including their sensitivity to acquisition noise. All DFDs and HFEN correlate more strongly with radiologist-perceived diagnostic IQ than SSIM, PSNR, and other state-of-the-art metrics, with correlations being comparable to radiologist inter-reader variability. Surprisingly, out-of-domain DFDs perform comparably to in-domain and domain-adjacent DFDs. A suite of IQ metrics, including DFDs and HFEN, should be used alongside commonly reported IQ metrics for a more holistic evaluation of MR image reconstruction perceptual quality. We also observe that general vision encoders are capable of assessing visual IQ even for MR images.

MRI Reconstruction Methodology In Silico Academic Lab Benchmark SOTA
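
As a rough illustration of the evaluation described in this abstract, the sketch below computes PSNR for a reconstruction and rank-correlates metric values with reader scores. The abstract does not say which correlation statistic was used, so Spearman correlation is an assumption here, and the function names and toy numbers are hypothetical rather than taken from the paper.

```python
import math


def psnr(reference, reconstruction, data_range=1.0):
    """Peak signal-to-noise ratio (dB) between two equally sized pixel lists."""
    if len(reference) != len(reconstruction):
        raise ValueError("images must have the same number of pixels")
    mse = sum((r - x) ** 2 for r, x in zip(reference, reconstruction)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(data_range ** 2 / mse)


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical metric values and reader scores for four reconstructions.
metric_values = [28.1, 31.5, 34.2, 30.0]
reader_scores = [2, 3, 5, 3]
agreement = spearman(metric_values, reader_scores)
```

The same `spearman` call could be repeated for each metric (PSNR, SSIM, a DFD, and so on) to compare how closely each one tracks the readers.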

Robust and generalizable artificial intelligence for multi-organ segmentation in ultra-low-dose total-body PET imaging: a multi-center and cross-tracer study.

Wang H, Qiao X, Ding W, Chen G, Miao Y, Guo R, Zhu X, Cheng Z, Xu J, Li B, Huang Q

•papers•Jul 1 2025

Positron Emission Tomography (PET) is a powerful molecular imaging tool that visualizes radiotracer distribution to reveal physiological processes. Recent advances in total-body PET have enabled low-dose, CT-free imaging; however, accurate organ segmentation using PET-only data remains challenging. This study develops and validates a deep learning model for multi-organ PET segmentation across varied imaging conditions and tracers, addressing critical needs for fully PET-based quantitative analysis. This retrospective study employed a 3D deep learning-based model for automated multi-organ segmentation on PET images acquired under diverse conditions, including low-dose and non-attenuation-corrected scans. Using a dataset of 798 patients from multiple centers with varied tracers, model robustness and generalizability were evaluated via multi-center and cross-tracer tests. Ground-truth labels for 23 organs were generated from CT images, and segmentation accuracy was assessed using the Dice similarity coefficient (DSC). In the multi-center dataset from four different institutions, our model achieved average DSC values of 0.834, 0.825, 0.819, and 0.816 across varying dose reduction factors and correction conditions for FDG PET images. In the cross-tracer dataset, the model reached average DSC values of 0.737, 0.573, 0.830, 0.661, and 0.708 for DOTATATE, FAPI, FDG, Grazytracer, and PSMA, respectively. The proposed model demonstrated effective, fully PET-based multi-organ segmentation across a range of imaging conditions, centers, and tracers, achieving high robustness and generalizability. These findings underscore the model's potential to enhance clinical diagnostic workflows by supporting ultra-low-dose PET imaging. This retrospective study of collected data was approved by the Research Ethics Committee of Ruijin Hospital, affiliated with Shanghai Jiao Tong University School of Medicine.

PET Segmentation Whole Body Retrospective Clinical In Silico Academic Lab Benchmark SOTA
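
The Dice similarity coefficient used in this abstract is 2|A ∩ B| / (|A| + |B|) for a predicted region A and a reference region B. A minimal sketch follows, assuming label maps flattened to plain lists of integer organ labels; the function names, the toy label maps, and the convention of returning 1.0 when an organ is absent from both maps are assumptions, not details from the paper.

```python
def dice_coefficient(prediction, ground_truth, label):
    """Dice similarity coefficient for one label in two flattened label maps."""
    if len(prediction) != len(ground_truth):
        raise ValueError("label maps must have the same number of voxels")
    pred_count = sum(1 for p in prediction if p == label)
    truth_count = sum(1 for t in ground_truth if t == label)
    overlap = sum(
        1 for p, t in zip(prediction, ground_truth) if p == label and t == label
    )
    if pred_count + truth_count == 0:
        return 1.0  # assumed convention: label absent from both maps
    return 2.0 * overlap / (pred_count + truth_count)


def mean_dice(prediction, ground_truth, labels):
    """Average the per-label Dice scores over the given organ labels."""
    scores = [dice_coefficient(prediction, ground_truth, lab) for lab in labels]
    return sum(scores) / len(scores)


# Toy example: 0 is background, 1 and 2 are two hypothetical organs.
predicted = [1, 1, 2, 0, 2, 2]
reference = [1, 0, 2, 0, 2, 1]
average = mean_dice(predicted, reference, labels=[1, 2])
```

An average DSC such as those reported above would be this per-organ mean, taken across the 23 organs and then across patients.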

Integrating multi-scale information and diverse prompts in large model SAM-Med2D for accurate left ventricular ejection fraction estimation.

Wu Y, Zhao T, Hu S, Wu Q, Chen Y, Huang X, Zheng Z

•papers•Jul 1 2025

Left ventricular ejection fraction (LVEF) is a critical indicator of cardiac function, aiding in the assessment of heart conditions. Accurate segmentation of the left ventricle (LV) is essential for LVEF calculation. However, current methods are often limited by small datasets and exhibit poor generalization. While leveraging large models can address this issue, many fail to capture multi-scale information and introduce additional burdens on users to generate prompts. To overcome these challenges, we propose LV-SAM, a model based on the large model SAM-Med2D, for accurate LV segmentation. It comprises three key components: an image encoder with a multi-scale adapter (MSAd), a multimodal prompt encoder (MPE), and a multi-scale decoder (MSD). The MSAd extracts multi-scale information at the encoder level and fine-tunes the model, while the MSD employs skip connections to effectively utilize multi-scale information at the decoder level. Additionally, we introduce an automated pipeline for generating self-extracted dense prompts and use a large language model to generate text prompts, reducing the user burden. The MPE processes these prompts, further enhancing model performance. Evaluations on the CAMUS dataset show that LV-SAM outperforms existing state-of-the-art (SOTA) methods in LV segmentation, achieving the lowest MAE of 5.016 in LVEF estimation.

Ultrasound Segmentation Cardiac Methodology In Silico Academic Lab Benchmark SOTA
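
For context on the reported MAE, LVEF is conventionally computed from the end-diastolic and end-systolic LV volumes as (EDV − ESV) / EDV × 100. The sketch below assumes those volumes have already been derived from the segmentations, a step the abstract does not detail; the function names and the example volumes are hypothetical.

```python
def lvef_percent(end_diastolic_volume, end_systolic_volume):
    """Ejection fraction in percent from end-diastolic and end-systolic volumes."""
    if end_diastolic_volume <= 0:
        raise ValueError("end-diastolic volume must be positive")
    stroke_volume = end_diastolic_volume - end_systolic_volume
    return 100.0 * stroke_volume / end_diastolic_volume


def mean_absolute_error(predicted, reference):
    """Mean absolute difference between predicted and reference values."""
    if len(predicted) != len(reference):
        raise ValueError("inputs must have the same length")
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / len(predicted)


# Hypothetical volumes in mL for one patient.
example_lvef = lvef_percent(120.0, 48.0)

# Hypothetical predicted vs. reference LVEF values (percent) for three patients.
example_mae = mean_absolute_error([60.0, 45.0, 30.0], [55.0, 50.0, 30.0])
```

Because LVEF is a ratio of two segmented volumes, small segmentation errors at either phase feed directly into the MAE.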

Automatic recognition and differentiation of pulmonary contusion and bacterial pneumonia based on deep learning and radiomics.

Deng T, Feng J, Le X, Xia Y, Shi F, Yu F, Zhan Y, Liu X, Li C

•papers•Jul 1 2025

In clinical practice, it is difficult to distinguish pulmonary contusion (PC) from bacterial pneumonia (BP) on CT images by eye alone when the history of trauma is unknown. Artificial intelligence is widely used in medical imaging, but its diagnostic performance for pulmonary contusion is unclear. In this study, artificial intelligence was used for the first time to differentiate pulmonary contusion from bacterial pneumonia, and its diagnostic performance was compared with that of manual reading. In this retrospective study, 2179 patients seen between April 2016 and July 2022 at two hospitals were collected and divided into a training set, an internal validation set, and an external validation set. PC and BP were automatically recognized and segmented using VB-net, and radiomics features were automatically extracted. Four machine learning algorithms (Decision Trees, Logistic Regression, Random Forests, and Support Vector Machines (SVM)) were used to build the models. The DeLong test was used to compare performance among models. The best-performing model and four radiologists each diagnosed the external validation set, and the diagnostic efficacy of human readers and artificial intelligence was compared. VB-net automatically detected and segmented PC and BP. Among the four machine learning models, the DeLong test showed that the SVM model had the best performance, with AUC, accuracy, sensitivity, and specificity of 0.998 (95% CI: 0.995-1), 0.980, 0.979, and 0.982 in the training set; 0.891 (95% CI: 0.854-0.928), 0.979, 0.750, and 0.860 in the internal validation set; and 0.885 (95% CI: 0.850-0.920), 0.903, 0.976, and 0.794 in the external validation set. The diagnostic ability of the SVM model was superior to that of the human readers (P < 0.05). VB-net automatically recognizes and segments PC and BP in chest CT images. An SVM model based on radiomics features can quickly differentiate between them with higher accuracy than experienced radiologists.

CT Segmentation Chest Retrospective Clinical In Silico Academic Lab Benchmark SOTA
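
The accuracy, sensitivity, specificity, and AUC figures in this abstract follow standard definitions, sketched below. Which class counts as "positive" is not stated in the abstract, so that choice, the function names, and all counts and scores here are illustrative assumptions. The AUC is computed as the probability that a randomly chosen positive case scores higher than a randomly chosen negative one, with ties counted as one half.

```python
def confusion_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, and accuracy from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


def auc_from_scores(positive_scores, negative_scores):
    """AUC as the fraction of positive/negative pairs ranked correctly."""
    if not positive_scores or not negative_scores:
        raise ValueError("both classes need at least one score")
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(positive_scores) * len(negative_scores))


# Hypothetical counts and classifier scores, not values from the study.
metrics = confusion_metrics(tp=40, fp=5, tn=45, fn=10)
auc = auc_from_scores([0.9, 0.8], [0.1, 0.85])
```

Sensitivity and specificity depend on the decision threshold, while the AUC summarizes ranking quality across all thresholds, which is why the two can move differently between validation sets.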

A Workflow-Efficient Approach to Pre- and Post-Operative Assessment of Weight-Bearing Three-Dimensional Knee Kinematics.

Banks SA, Yildirim G, Jachode G, Cox J, Anderson O, Jensen A, Cole JD, Kessler O

•papers•Jul 1 2025

Knee kinematics during daily activities reflect disease severity preoperatively and are associated with clinical outcomes after total knee arthroplasty (TKA). It is widely believed that measured kinematics would be useful for preoperative planning and postoperative assessment. Despite decades-long interest in measuring three-dimensional (3D) knee kinematics, no methods are available for routine, practical clinical examinations. We report a clinically practical method utilizing machine-learning-enhanced software and upgraded C-arm fluoroscopy for the accurate and time-efficient measurement of pre-TKA and post-TKA 3D dynamic knee kinematics. Using a common C-arm with an upgraded detector and software, we performed an 8-s horizontal sweeping pulsed fluoroscopic scan of the weight-bearing knee joint. The patient's knee was then imaged using pulsed C-arm fluoroscopy while performing standing, kneeling, squatting, stair, chair, and gait motion activities. We used limited-arc cone-beam reconstruction methods to create 3D models of the femur and tibia/fibula bones with implants, which can then be used to perform model-image registration to quantify the 3D knee kinematics. The proposed protocol can be accomplished by an individual radiology technician in ten minutes and does not require additional equipment beyond a step and stool. The image analysis can be performed by a computer onboard the upgraded C-arm or in the cloud, before loading the examination results into the Picture Archiving and Communication System and Electronic Medical Record systems. Weight-bearing kinematics affects knee function pre- and post-TKA. It has long been exclusively the domain of researchers to make such measurements. We present an approach that leverages common, but digitally upgraded, imaging hardware and software to implement an efficient examination protocol for accurately assessing 3D knee kinematics. With these capabilities, it will be possible to include dynamic 3D knee kinematics as a component of the routine clinical workup for patients who have diseased or replaced knees.

Fluoroscopy Registration Musculoskeletal Methodology Clinical Pilot Startup Benchmark SOTA

A novel deep learning framework for retinal disease detection leveraging contextual and local features cues from retinal images.

Khan SD, Basalamah S, Lbath A

•papers•Jul 1 2025

Retinal diseases are a serious global threat to human vision, and early identification is essential for effective prevention and treatment. However, current diagnostic methods rely on manual analysis of fundus images, which heavily depends on the expertise of ophthalmologists. This manual process is time-consuming and labor-intensive and can sometimes lead to missed diagnoses. With advancements in computer vision technology, several automated models have been proposed to improve diagnostic accuracy for retinal diseases and medical imaging in general. However, these methods face challenges in accurately detecting specific diseases within images due to inherent issues associated with fundus images, including inter-class similarities, intra-class variations, limited local information, insufficient contextual understanding, and class imbalances within datasets. To address these challenges, we propose a novel deep learning framework for accurate retinal disease classification. This framework is designed to achieve high accuracy in identifying various retinal diseases while overcoming inherent challenges associated with fundus images. The framework consists of three main modules. The first module is a Densely Connected Multidilated Convolution Neural Network (DCM-CNN) that extracts global contextual information by effectively integrating novel Casual Dilated Dense Convolutional Blocks (CDDCBs). The second module, a Local-Patch-based Convolution Neural Network (LP-CNN), uses the Class Activation Map (CAM) obtained from DCM-CNN to extract local and fine-grained information. The third module is a synergic network that takes the feature maps of both DCM-CNN and LP-CNN and connects them in a fully connected fashion to identify the correct class and minimize the error. The framework is evaluated through a comprehensive set of experiments, both quantitatively and qualitatively, using two publicly available benchmark datasets: RFMiD and ODIR-5K. Our experimental results demonstrate the effectiveness of the proposed framework, which achieves higher performance on the RFMiD and ODIR-5K datasets than reference methods.

OCT Classification Methodology In Silico Academic Lab Benchmark SOTA Open Dataset

Intraindividual Comparison of Image Quality Between Low-Dose and Ultra-Low-Dose Abdominal CT With Deep Learning Reconstruction and Standard-Dose Abdominal CT Using Dual-Split Scan.

Lee TY, Yoon JH, Park JY, Park SH, Kim H, Lee CM, Choi Y, Lee JM

•papers•Jul 1 2025

The aim of this study was to intraindividually compare the conspicuity of focal liver lesions (FLLs) between low- and ultra-low-dose computed tomography (CT) with deep learning reconstruction (DLR) and standard-dose CT with model-based iterative reconstruction (MBIR) from a single CT using dual-split scan in patients with suspected liver metastasis via a noninferiority design. This prospective study enrolled participants who met the eligibility criteria at 2 tertiary hospitals in South Korea from June 2022 to January 2023. The criteria included (a) being aged between 20 and 85 years and (b) having suspected or known liver metastases. Dual-source CT scans were conducted, with the standard radiation dose divided in a 2:1 ratio between tubes A and B (67% and 33%, respectively). The voltage settings of 100/120 kVp were selected based on the participant's body mass index (<30 vs ≥30 kg/m²). For image reconstruction, MBIR was utilized for standard-dose (100%) images, whereas DLR was employed for both low-dose (67%) and ultra-low-dose (33%) images. Three radiologists independently evaluated FLL conspicuity, the probability of metastasis, and subjective image quality using a 5-point Likert scale, in addition to quantitative signal-to-noise and contrast-to-noise ratios. The noninferiority margins were set at -0.5 for conspicuity and -0.1 for detection. One hundred thirty-three participants (male = 58, mean body mass index = 23.0 ± 3.4 kg/m²) were included in the analysis. The low- and ultra-low-dose scans had a lower radiation dose than the standard-dose scan (median CT dose index volume: 3.75, 1.87 vs 5.62 mGy, respectively, in the arterial phase; 3.89, 1.95 vs 5.84 mGy in the portal venous phase; P < 0.001 for all). Median FLL conspicuity was lower in the low- and ultra-low-dose scans compared with the standard-dose scan (3.0 [interquartile range, IQR: 2.0, 4.0], 3.0 [IQR: 1.0, 4.0] vs 3.0 [IQR: 2.0, 4.0] in the arterial phase; 4.0 [IQR: 1.0, 5.0], 3.0 [IQR: 1.0, 4.0] vs 4.0 [IQR: 2.0, 5.0] in the portal venous phase), yet within the noninferiority margin (P < 0.001 for all). FLL detection was also lower but remained within the margin (lesion detection rate: 0.772 [95% confidence interval, CI: 0.727, 0.812], 0.754 [95% CI: 0.708, 0.795], respectively) compared with the standard-dose scan (0.810 [95% CI: 0.770, 0.844]). Sensitivity for liver metastasis differed between the standard- (80.6% [95% CI: 76.0, 84.5]), low-, and ultra-low-dose scans (75.7% [95% CI: 70.2, 80.5], 73.7% [95% CI: 68.3, 78.5], respectively, P < 0.001 for both), whereas specificity was similar (P > 0.05). Low- and ultra-low-dose CT with DLR showed noninferior FLL conspicuity and detection compared with standard-dose CT with MBIR. Caution is needed due to a potential decrease in sensitivity for metastasis (clinicaltrials.gov/NCT05324046).

CT Reconstruction Abdominal Prospective Clinical Pilot Academic Lab Benchmark SOTA
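
The dose figures in this abstract can be checked with simple arithmetic: a 2:1 split of the standard dose between the two tubes gives 2/3 and 1/3 of the standard CT dose index volume. The sketch below redoes that arithmetic with the reported medians and applies the stated detection margin of -0.1 to the reported detection rates. Comparing point estimates against the margin is a simplification assumed here; a formal noninferiority test rests on a confidence bound for the difference, which the abstract does not report. The function names are hypothetical.

```python
def split_dose(standard_dose, parts=(2, 1)):
    """Divide a dose between tubes in the given ratio (2:1 by default)."""
    total = sum(parts)
    return tuple(standard_dose * part / total for part in parts)


def difference_within_margin(test_value, reference_value, margin):
    """True when (test - reference) stays above the negative margin."""
    return (test_value - reference_value) > margin


# Reported median CTDIvol (mGy) for the standard-dose images.
arterial_low, arterial_ultra_low = split_dose(5.62)
portal_low, portal_ultra_low = split_dose(5.84)

# Reported lesion detection rates: low-dose, ultra-low-dose, standard-dose.
low_ok = difference_within_margin(0.772, 0.810, margin=-0.1)
ultra_low_ok = difference_within_margin(0.754, 0.810, margin=-0.1)
```

Rounded to two decimals, the split reproduces the reported low-dose and ultra-low-dose medians (3.75 and 1.87 mGy arterial, 3.89 and 1.95 mGy portal venous).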

Machine-learning model based on ultrasomics for non-invasive evaluation of fibrosis in IgA nephropathy.

Huang Q, Huang F, Chen C, Xiao P, Liu J, Gao Y

•papers•Jul 1 2025

To develop and validate an ultrasomics-based machine-learning (ML) model for non-invasive assessment of interstitial fibrosis and tubular atrophy (IF/TA) in patients with IgA nephropathy (IgAN). In this multi-center retrospective study, 471 patients with primary IgA nephropathy from four institutions were included (training, n = 275; internal testing, n = 69; external testing, n = 127). The least absolute shrinkage and selection operator logistic regression with tenfold cross-validation was used to identify the most relevant features. The ML models were constructed based on ultrasomics. The Shapley Additive Explanation (SHAP) was used to explore the interpretability of the models. Logistic regression analysis was employed to combine ultrasomics, clinical data, and ultrasound imaging characteristics, creating a comprehensive model. A receiver operating characteristic curve, calibration, decision curve, and clinical impact curve were used to evaluate prediction performance. To differentiate between mild and moderate-to-severe IF/TA, three prediction models were developed: the Rad_SVM_Model, Clinic_LR_Model, and Rad_Clinic_Model. The areas under the curve of these three models were 0.861, 0.884, and 0.913 in the training cohort; 0.760, 0.860, and 0.894 in the internal validation cohort; and 0.794, 0.865, and 0.904 in the external validation cohort. SHAP identified the contribution of radiomics features. Difference analysis showed that there were significant differences between radiomics features and fibrosis. The comprehensive model was superior to the individual indicators and performed well. We developed and validated a model that combined ultrasomics, clinical data, and clinical ultrasonic characteristics based on ML to assess the extent of fibrosis in IgAN.
Question: Currently, there is a lack of a comprehensive ultrasomics-based machine-learning model for non-invasive assessment of the extent of Immunoglobulin A nephropathy (IgAN) fibrosis.
Findings: We have developed and validated a robust and interpretable machine-learning model based on ultrasomics for assessing the degree of fibrosis in IgAN.
Clinical relevance: The machine-learning model developed in this study has significant interpretable clinical relevance. The ultrasomics-based comprehensive model had the potential for non-invasive assessment of fibrosis in IgAN, which helped evaluate disease progress.

Ultrasound Classification Abdominal Retrospective Clinical In Silico Academic Lab Benchmark SOTA

Automated vs manual cardiac MRI planning: a single-center prospective evaluation of reliability and scan times.

Glessgen C, Crowe LA, Wetzl J, Schmidt M, Yoon SS, Vallée JP, Deux JF

•papers•Jul 1 2025

To evaluate the impact of an AI-based automated cardiac MRI (CMR) planning software on procedure errors and scan times compared to manual planning alone. Consecutive patients undergoing non-stress CMR were prospectively enrolled at a single center (August 2023-February 2024) and randomized to manual or automated scan execution using prototype software. Patients with pacemakers, targeted indications, or inability to consent were excluded. All patients underwent the same CMR protocol with contrast, in breath-hold (BH) or free breathing (FB). Supervising radiologists recorded procedure errors (plane prescription, forgotten views, incorrect propagation of cardiac planes, and field-of-view mismanagement). Scan times and idle phase (non-acquisition portion) were computed from scanner logs. Most data were non-normally distributed and compared using non-parametric tests. Eighty-two patients (mean age, 51.6 years ± 17.5; 56 men) were included. Forty-four patients underwent automated and 38 manual CMRs. The mean rate of procedure errors was significantly (p = 0.01) lower in the automated (0.45) than in the manual group (1.13). The rate of error-free examinations was higher (p = 0.03) in the automated (31/44; 70.5%) than in the manual group (17/38; 44.7%). Automated studies were shorter than manual studies in FB (30.3 vs 36.5 min, p < 0.001) but had similar durations in BH (42.0 vs 43.5 min, p = 0.42). The idle phase was lower in automated studies for FB and BH strategies (both p < 0.001). An AI-based automated software performed CMR at a clinical level with fewer planning errors and improved efficiency compared to manual planning.
Question: What is the impact of an AI-based automated CMR planning software on procedure errors and scan times compared to manual planning alone?
Findings: Software-driven examinations were more reliable (71% error-free) than human-planned ones (45% error-free) and showed improved efficiency with reduced idle time.
Clinical relevance: CMR examinations require extensive technologist training and continuous attention, and involve many planning steps. Fully automated software reliably acquired non-stress CMR, potentially reducing the risk of mistakes and increasing data homogeneity.

MRI Reconstruction Cardiac Prospective Clinical Pilot Academic Lab Benchmark SOTA
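
The error-free examination counts in this abstract (31/44 automated vs 17/38 manual) form a 2×2 table that can be compared with an exact test. The abstract does not name the test behind the reported p = 0.03, so the two-sided Fisher exact test below is an assumption chosen for illustration, and the function name is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def probability(x):
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    observed = probability(a)
    p_value = sum(
        probability(x)
        for x in range(lowest, highest + 1)
        if probability(x) <= observed * (1 + 1e-9)  # tolerance for float ties
    )
    return min(1.0, p_value)


# Rows: automated, manual. Columns: error-free, with at least one error.
automated_rate = 31 / 44
manual_rate = 17 / 38
p_value = fisher_exact_two_sided(31, 44 - 31, 17, 38 - 17)
```

The two rates round to the reported 70.5% and 44.7%; the p-value from this sketch need not match the published one exactly if the authors used a different test.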

CT-based clinical-radiomics model to predict progression and drive clinical applicability in locally advanced head and neck cancer.

Bruixola G, Dualde-Beltrán D, Jimenez-Pastor A, Nogué A, Bellvís F, Fuster-Matanzo A, Alfaro-Cervelló C, Grimalt N, Salhab-Ibáñez N, Escorihuela V, Iglesias ME, Maroñas M, Alberich-Bayarri Á, Cervantes A, Tarazona N

•papers•Jul 1 2025

Definitive chemoradiation is the primary treatment for locally advanced head and neck squamous cell carcinoma (LAHNSCC). Optimising outcome predictions requires validated biomarkers, since TNM8 and HPV could have limitations. Radiomics may enhance risk stratification. This single-centre observational study collected clinical data and baseline CT scans from 171 LAHNSCC patients treated with chemoradiation. The dataset was divided into training (80%) and test (20%) sets, with a 5-fold cross-validation on the training set. Researchers extracted 108 radiomics features from each primary tumour and applied survival analysis and classification models to predict progression-free survival (PFS) and 5-year progression, respectively. Performance was evaluated using inverse probability of censoring weights and c-index for the PFS model and AUC, sensitivity, specificity, and accuracy for the 5-year progression model. Feature importance was measured by the SHapley Additive exPlanations (SHAP) method and patient stratification was assessed through Kaplan-Meier curves. The final dataset included 171 LAHNSCC patients, with 53% experiencing disease progression at 5 years. The random survival forest model best predicted PFS, with an AUC of 0.64 and a c-index (CI) of 0.66 on the test set, highlighting 4 radiomics features and TNM8 as significant contributors. It successfully stratified patients into low- and high-risk groups (log-rank p < 0.005). The extreme gradient boosting model most effectively predicted 5-year progression, incorporating 12 radiomics features and four clinical variables, achieving an AUC of 0.74, sensitivity of 0.53, specificity of 0.81, and accuracy of 0.66 on the test set. The combined clinical-radiomics model improved on the standard TNM8 and clinical variables in predicting 5-year progression, though further validation is necessary.
Question: There is an unmet need for non-invasive biomarkers to guide treatment in locally advanced head and neck cancer.
Findings: Clinical data (TNM8 staging, primary tumour site, age, and smoking) plus radiomics improved 5-year progression prediction compared with the clinical comprehensive model or TNM staging alone.
Clinical relevance: SHAP simplifies complex machine learning radiomics models for clinicians by using easy-to-understand graphical representations, promoting explainability.

CT Classification Retrospective Clinical In Silico Academic Lab Benchmark SOTA
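
The c-index mentioned in this abstract measures how often a survival model ranks patients correctly. A minimal sketch of Harrell's concordance index follows, assuming each patient has a follow-up time, an event flag (1 for progression, 0 for censored), and a model risk score. The abstract also mentions inverse probability of censoring weights, which this unweighted version does not implement; the function name and toy data are hypothetical.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's c-index: fraction of comparable pairs ranked correctly.

    A pair is comparable when the patient with the shorter follow-up time
    had an observed event. It is concordant when that patient also has the
    higher risk score; tied risk scores count as one half.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data: follow-up in months, event flags, and model risk scores.
follow_up = [2, 4, 6, 8]
progressed = [1, 0, 1, 1]
risk = [0.9, 0.2, 0.5, 0.6]
c_index = concordance_index(follow_up, progressed, risk)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which puts the reported test-set value of 0.66 in context.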
