You are viewing papers added to our database from 2025-09-15 to 2025-09-21.

MUSCLE: A New Perspective to Multi-scale Fusion for Medical Image Classification based on the Theory of Evidence.

Qiu J, Cao J, Huang Y, Zhu Z, Wang F, Lu C, Li Y, Zheng Y

PubMed · Sep 19 2025
In the field of medical image analysis, medical image classification is one of the most fundamental and critical tasks. Current research often relies on off-the-shelf backbone networks derived from the field of computer vision, hoping to achieve satisfactory classification performance for medical images. However, given the characteristics of medical images, such as scattered distribution and varying sizes of lesions, features extracted at a single scale from the existing backbones often fail to support accurate medical image classification. To this end, we propose a novel multi-scale learning paradigm, namely MUlti-SCale Learning with trusted Evidences (MUSCLE), which extracts and integrates features from different scales based on the theory of evidence, to generate a more comprehensive feature representation for the medical image classification task. In particular, the proposed MUSCLE first estimates the uncertainties of features extracted from different scales/stages of the classification backbone as the evidences, and accordingly forms opinions regarding feature trustworthiness via a set of evidential deep neural networks. These opinions on different scales of features are then ensembled to yield an aggregated opinion, which can be used to adaptively tune the weights of multi-scale features for scattered and size-varying lesions, and consequently improve the network capacity for accurate medical image classification. Our MUSCLE paradigm has been evaluated on five publicly available medical image datasets. The experimental results show that the proposed MUSCLE not only improves the accuracy of the original backbone network, but also enhances the reliability and interpretability of model decisions with the trusted evidences (https://github.com/Q4CS/MUSCLE).
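For a concrete sense of the evidential fusion idea, below is a minimal PyTorch sketch, not the authors' MUSCLE implementation: each scale gets a Dirichlet-based evidential head, and the resulting uncertainty estimates re-weight the multi-scale features before classification. The names `EvidentialHead` and `fuse_multiscale` are illustrative assumptions.

```python
# Minimal sketch of Dirichlet-based evidential fusion across scales
# (illustrative only; not the MUSCLE codebase).
import torch
import torch.nn as nn
import torch.nn.functional as F

class EvidentialHead(nn.Module):
    """Maps a pooled feature vector to non-negative Dirichlet evidence."""
    def __init__(self, in_dim: int, num_classes: int):
        super().__init__()
        self.fc = nn.Linear(in_dim, num_classes)

    def forward(self, feat: torch.Tensor):
        evidence = F.softplus(self.fc(feat))        # e_k >= 0
        alpha = evidence + 1.0                      # Dirichlet parameters
        strength = alpha.sum(dim=1, keepdim=True)   # S = sum_k alpha_k
        belief = evidence / strength                # b_k = e_k / S
        uncertainty = alpha.shape[1] / strength     # u = K / S  (subjective logic)
        return belief, uncertainty

def fuse_multiscale(feats, heads):
    """Down-weight each scale's features by the uncertainty of its opinion."""
    weighted, uncertainties = [], []
    for feat, head in zip(feats, heads):
        _, u = head(feat)                           # u in (0, 1]
        weighted.append((1.0 - u) * feat)           # trust-weighted features
        uncertainties.append(u)
    return torch.cat(weighted, dim=1), torch.cat(uncertainties, dim=1)

# toy usage: three scales with 64-dim pooled features, 4 classes
feats = [torch.randn(8, 64) for _ in range(3)]
heads = nn.ModuleList([EvidentialHead(64, 4) for _ in range(3)])
fused, u = fuse_multiscale(feats, heads)
print(fused.shape, u.shape)   # torch.Size([8, 192]) torch.Size([8, 3])
```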

AI-Driven Multimodality Fusion in Cardiac Imaging: Integrating CT, MRI, and Echocardiography for Precision.

Tran HH, Thu A, Twayana AR, Fuertes A, Gonzalez M, Basta M, James M, Mehta KA, Elias D, Figaro YM, Islek D, Frishman WH, Aronow WS

PubMed · Sep 19 2025
Artificial intelligence (AI)-enabled multimodal cardiovascular imaging holds significant promise for improving diagnostic accuracy, enhancing risk stratification, and supporting clinical decision-making. However, its translation into routine practice remains limited by multiple technical, infrastructural, and clinical barriers. This review synthesizes current challenges, including variability in image quality, alignment, and acquisition protocols; scarcity of large, annotated multimodality datasets; interoperability limitations across vendors and institutions; clinical skepticism due to limited prospective validation; and substantial development and implementation costs. Drawing from recent advances, we outline future research priorities to bridge the gap between technical feasibility and clinical utility. Key strategies include developing unified, vendor-agnostic AI models resilient to inter-institutional variability; integrating diverse data types such as genomics, wearable biosensors, and longitudinal clinical records; leveraging reinforcement learning for adaptive decision-support systems; and employing longitudinal imaging fusion for disease tracking and predictive analytics. We emphasize the need for rigorous prospective clinical trials, harmonized imaging standards, and collaborative data-sharing frameworks to ensure robust, equitable, and scalable deployment. Addressing these challenges through coordinated multidisciplinary efforts will be essential to realize the full potential of AI-driven multimodal cardiovascular imaging in advancing precision cardiovascular care.

Assessing Inter-rater Reliability of ChatGPT-4 and Orthopaedic Clinicians in Radiographic Fracture Classification.

Walker AN, Smith JB, Simister SK, Patel O, Choudhary S, Seidu M, Dallas-Orr D, Tse S, Shahzad H, Wise P, Scott M, Saiz AM, Lum ZC

PubMed · Sep 19 2025
To assess the inter-rater reliability of ChatGPT-4 to that of orthopaedic surgery attendings and residents in classifying fractures on upper extremity (UE) and lower extremity (LE) radiographs. 84 radiographs of various fracture patterns were collected from publicly available online repositories. These images were presented to ChatGPT-4 with the prompt asking it to identify the view, body location, fracture type, and AO/OTA fracture classification. Two orthopaedic surgery residents and two attending orthopaedic surgeons also independently reviewed the images and identified the same categories. Fleiss' Kappa values were calculated to determine inter-rater reliability (IRR) for the following: All Raters Combined; AI vs. Residents (AIR); AI vs. Attendings (AIA); Attendings vs. Residents (AR). ChatGPT-4 achieved substantial to almost perfect agreement with clinicians on location (UE: κ = 0.655-0.708, LE: κ = 0.834-0.909) and fracture type (UE: κ = 0.546-0.563, LE: κ = 0.58-0.697). For view, ChatGPT-4 showed consistent fair agreement for both UE (κ = 0.370-0.404) and LE (κ = 0.309-0.390). ChatGPT-4 struggled the most with AO/OTA classification, achieving slight agreement for UE (κ = -0.062 to 0.159) and moderate agreement for LE (κ = 0.418-0.455). IRR for AIR was consistently lower than IRR for AR. For AR comparisons, almost perfect agreement was observed for location (UE: κ = 0.896, LE: κ = 0.912) and fracture type (UE: κ = 0.948, LE: κ = 0.859), while AO/OTA classification showed fair agreement for UE (κ = 0.257) and moderate for LE (κ = 0.517). The p-values for all comparison groups were significant except for LE AO/OTA classification between AI and residents (p = 0.051). Although ChatGPT-4 showed promise in classifying basic fracture features, it was not yet at a level comparable to experts, especially with more nuanced interpretations. These findings suggest that the use of AI is more effective as an adjunct to the judgment of trained clinicians rather than a replacement for it.
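As a concrete illustration of the agreement statistic used in this study, the sketch below computes Fleiss' kappa from a subjects-by-raters label matrix using statsmodels; the toy `ratings` matrix is invented and does not reflect the study data.

```python
# Minimal sketch: Fleiss' kappa for multiple raters assigning categorical labels.
import numpy as np
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

# rows = radiographs, columns = raters (e.g. AI, residents, attendings),
# values = assigned category index (illustrative data only)
ratings = np.array([
    [0, 0, 0, 1],
    [1, 1, 1, 1],
    [2, 2, 1, 2],
    [0, 0, 0, 0],
    [1, 2, 1, 1],
])

# aggregate_raters converts subject-x-rater labels into subject-x-category counts
counts, _ = aggregate_raters(ratings)
kappa = fleiss_kappa(counts, method="fleiss")
print(f"Fleiss' kappa = {kappa:.3f}")
```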

Deep learning-based acceleration and denoising of 0.55T MRI for enhanced conspicuity of vestibular Schwannoma post contrast administration.

Hinsen M, Nagel A, Heiss R, May M, Wiesmueller M, Mathy C, Zeilinger M, Hornung J, Mueller S, Uder M, Kopp M

PubMed · Sep 19 2025
Deep learning (DL)-based MRI denoising techniques promise improved image quality and shorter examination times. This advancement is particularly beneficial for 0.55T MRI, where the inherently lower signal-to-noise ratio (SNR) can compromise image quality. Sufficient SNR is crucial for the reliable detection of vestibular schwannoma (VS). The objective of this study is to evaluate the VS conspicuity and acquisition time (TA) of contrast-enhanced 0.55T MRI examinations using a DL-denoising algorithm. From January 2024 to October 2024, we retrospectively included 30 patients with VS (9 women). We acquired a clinical reference protocol of the cerebellopontine angle containing a T1w fat-saturated (fs) axial (number of signal averages [NSA] 4) and a T1w Spectral Attenuated Inversion Recovery (SPAIR) coronal (NSA 2) sequence after contrast agent (CA) application, without advanced DL-based denoising (w/o DL). We reconstructed the axial T1w fs CA and coronal T1w SPAIR CA sequences with the full DL-denoising mode, first without changing the NSA (DL&4NSA) and secondly with 1 NSA for both sequences (DL&1NSA). Each sequence was rated on a 5-point Likert scale (1: insufficient; 3: moderate, clinically sufficient; 5: perfect) for overall image quality (IQ), VS conspicuity, and artifacts. Secondly, we analyzed the reliability of the size measurements. Two radiologists specializing in head and neck imaging performed the reading and measurements. The Wilcoxon signed-rank test was used for non-parametric statistical comparison. The DL&4NSA axial/coronal sequences achieved the highest overall IQ (median 4.9). IQ for DL&1NSA was higher than for the reference sequence without DL (median 4.0 versus 3.5, p < 0.01). Similarly, VS conspicuity was best for DL&4NSA (median 4.9), decreased for DL&1NSA (median 4.1), and was lower but still sufficient without DL (median 3.7, each p < 0.01). The TA for the axial and coronal post-contrast sequences was 8:59 minutes for DL&4NSA and w/o DL and decreased to 3:24 minutes with DL&1NSA. This study underlines that advanced DL-based denoising techniques can reduce the examination time by more than half while simultaneously improving image quality.
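For illustration, the sketch below runs the same kind of paired, non-parametric comparison (Wilcoxon signed-rank test) on invented 5-point Likert ratings for matched sequences with and without DL denoising; the values are synthetic, not the study's data.

```python
# Minimal sketch: paired Wilcoxon signed-rank test on Likert image-quality ratings.
import numpy as np
from scipy.stats import wilcoxon

iq_without_dl = np.array([3, 4, 3, 3, 4, 3, 4, 3, 3, 4])   # reference (w/o DL), illustrative
iq_dl_1nsa    = np.array([4, 4, 4, 4, 5, 4, 4, 4, 4, 5])   # DL-denoised, 1 NSA, illustrative

stat, p_value = wilcoxon(iq_dl_1nsa, iq_without_dl)
print(f"Wilcoxon statistic = {stat}, p = {p_value:.4f}")
```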

AI-driven innovations for dental implant treatment planning: A systematic review.

Zaww K, Abbas H, Vanegas Sáenz JR, Hong G

PubMed · Sep 19 2025
This systematic review evaluates the effectiveness of artificial intelligence (AI) models in dental implant treatment planning, focusing on: 1) identification, detection, and segmentation of anatomical structures; 2) technical assistance during treatment planning; and 3) additional relevant applications. A literature search of PubMed/MEDLINE, Scopus, and Web of Science was conducted for studies published in English until July 31, 2024. The included studies explored AI applications in implant treatment planning, excluding expert opinions, guidelines, and protocols. Three reviewers independently assessed study quality using the Joanna Briggs Institute (JBI) Critical Appraisal Checklist for Quasi-Experimental Studies, resolving disagreements by consensus. Of the 28 included studies, four were high, four were medium, and 20 were low quality according to the JBI scale. Eighteen studies on anatomical segmentation have demonstrated AI models with accuracy rates ranging from 66.4% to 99.1%. Eight studies examined AI's role in technical assistance for surgical planning, demonstrating its potential in predicting jawbone mineral density, optimizing drilling protocols, and classifying plans for maxillary sinus augmentation. One study indicated a learning curve for AI in implant planning, recommending at least 50 images for over 70% predictive accuracy. Another study reported 83% accuracy in localizing stent markers for implant sites, suggesting additional imaging planes to address a 17% miss rate and 2.8% false positives. AI models exhibit potential for automating dental implant planning with high accuracy in anatomical segmentation and insightful technical assistance. However, further well-designed studies with standardized evaluation parameters are required for pragmatic integration into clinical settings.

Intratumoral and peritumoral heterogeneity based on CT to predict the pathological response after neoadjuvant chemoimmunotherapy in esophageal squamous cell carcinoma.

Ling X, Yang X, Wang P, Li Y, Wen Z, Wang J, Chen K, Yu Y, Liu A, Ma J, Meng W

PubMed · Sep 19 2025
Neoadjuvant chemoimmunotherapy (NACI) regimen (camrelizumab plus paclitaxel and nedaplatin) has shown promising potential in patients with esophageal squamous cell carcinoma (ESCC), but accurately predicting the therapeutic response remains a challenge. To develop and validate a CT-based machine learning model that incorporates both intratumoral and peritumoral heterogeneity for predicting the pathological response of ESCC patients after NACI. Patients with ESCC who underwent surgery following NACI between June 2020 and July 2024 were included retrospectively and prospectively. Univariate and multivariate logistic regression analyses were performed to identify clinical variables associated with pathological response. Traditional radiomics features and habitat radiomics features from the intratumoral and peritumoral regions were extracted from post-treatment CT images, and six predictive models were established using 14 machine learning algorithms. The combined model was developed by integrating intratumoral and peritumoral habitat radiomics features with clinical variables. The performance of the models was evaluated using the area under the receiver operating characteristic curve (AUC). A total of 157 patients (mean [SD] age, 59.6 [6.5] years) were enrolled in our study, of whom 60 (38.2%) achieved major pathological response (MPR) and 40 (25.5%) achieved pathological complete response (pCR). The combined model demonstrated excellent predictive ability for MPR after NACI, with an AUC of 0.915 (95% CI, 0.844-0.981), accuracy of 0.872, sensitivity of 0.733, and specificity of 0.938 in the test set. In sensitivity analysis focusing on pCR, the combined model exhibited robust performance, with an AUC of 0.895 (95% CI, 0.782-0.980) in the test set. The combined model integrating intratumoral and peritumoral habitat radiomics features with clinical variables can accurately predict MPR in ESCC patients after NACI and shows promising potential in predicting pCR.
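As a rough sketch of the final modelling step, the example below concatenates synthetic radiomics and clinical features, fits a logistic-regression pipeline (chosen here only for illustration; the study compared 14 machine learning algorithms), and evaluates it with the AUC on a held-out split. All data and feature names are invented.

```python
# Minimal sketch: combined radiomics + clinical model evaluated with ROC AUC.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 157
radiomics = rng.normal(size=(n, 20))   # intratumoral + peritumoral habitat features (synthetic)
clinical = rng.normal(size=(n, 3))     # e.g. age and staging variables (synthetic)
X = np.hstack([radiomics, clinical])
y = rng.integers(0, 2, size=n)         # 1 = major pathological response (synthetic labels)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y)

model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)
auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
print(f"test AUC = {auc:.3f}")
```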

Fully Automated Image-Based Multiplexing of Serial PET/CT Imaging for Facilitating Comprehensive Disease Phenotyping.

Shiyam Sundar LK, Gutschmayer S, Pires M, Ferrara D, Nguyen T, Abdelhafez YG, Spencer B, Cherry SR, Badawi RD, Kersting D, Fendler WP, Kim MS, Lassen ML, Hasbak P, Schmidt F, Linder P, Mu X, Jiang Z, Abenavoli EM, Sciagrà R, Frille A, Wirtz H, Hesse S, Sabri O, Bailey D, Chan D, Callahan J, Hicks RJ, Beyer T

PubMed · Sep 18 2025
Combined PET/CT imaging provides critical insights into both anatomic and molecular processes, yet traditional single-tracer approaches limit multidimensional disease phenotyping; to address this, we developed the PET Unified Multitracer Alignment (PUMA) framework, an open-source postprocessing tool that multiplexes serial PET/CT scans for comprehensive voxelwise tissue characterization. Methods: PUMA utilizes artificial intelligence-based CT segmentation from multiorgan objective segmentation to generate multilabel maps of 24 body regions, guiding a 2-step registration: affine alignment followed by symmetric diffeomorphic registration. Tracer images are then normalized and assigned to red-green-blue channels for simultaneous visualization of up to 3 tracers. The framework was evaluated on longitudinal PET/CT scans from 114 subjects across multiple centers and vendors. Rigid, affine, and deformable registration methods were compared for optimal coregistration. Performance was assessed using the Dice similarity coefficient for organ alignment and absolute percentage differences in organ intensity and tumor SUVmean. Results: Deformable registration consistently achieved superior alignment, with Dice similarity coefficient values exceeding 0.90 in 60% of organs while maintaining organ intensity differences below 3%; similarly, SUVmean differences for tumors were minimal at 1.6% ± 0.9%, confirming that PUMA preserves quantitative PET data while enabling robust spatial multiplexing. Conclusion: PUMA provides a vendor-independent solution for postacquisition multiplexing of serial PET/CT images, integrating complementary tracer data voxelwise into a composite image without modifying clinical protocols. This enhances multidimensional disease phenotyping and supports better diagnostic and therapeutic decisions using serial multitracer PET/CT imaging.
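To make the multiplexing step concrete, the sketch below normalizes three already co-registered tracer volumes and stacks them into red-green-blue channels; it assumes the registration step has already been performed on a common grid and is not the PUMA code itself.

```python
# Minimal sketch: intensity-normalize three co-registered tracer volumes
# and assign them to RGB channels of one composite volume.
import numpy as np

def normalize(vol: np.ndarray) -> np.ndarray:
    """Scale a tracer volume to [0, 1] using a robust (99th percentile) maximum."""
    vmax = np.percentile(vol, 99)
    return np.clip(vol / vmax, 0.0, 1.0) if vmax > 0 else np.zeros_like(vol)

def multiplex_rgb(tracer_a, tracer_b, tracer_c) -> np.ndarray:
    """Stack three normalized tracer volumes into an RGB composite."""
    return np.stack([normalize(tracer_a), normalize(tracer_b), normalize(tracer_c)], axis=-1)

# toy example: three 64x64x32 volumes already resampled onto the same grid
vols = [np.random.rand(64, 64, 32) for _ in range(3)]
rgb = multiplex_rgb(*vols)
print(rgb.shape)   # (64, 64, 32, 3)
```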

Deep Learning for Automated Measures of SUV and Molecular Tumor Volume in [68Ga]PSMA-11 or [18F]DCFPyL, [18F]FDG, and [177Lu]Lu-PSMA-617 Imaging with Global Threshold Regional Consensus Network.

Jackson P, Buteau JP, McIntosh L, Sun Y, Kashyap R, Casanueva S, Ravi Kumar AS, Sandhu S, Azad AA, Alipour R, Saghebi J, Kong G, Jewell K, Eifer M, Bollampally N, Hofman MS

PubMed · Sep 18 2025
Metastatic castration-resistant prostate cancer has a high rate of mortality with a limited number of effective treatments after hormone therapy. Radiopharmaceutical therapy with [177Lu]Lu-prostate-specific membrane antigen-617 (LuPSMA) is one treatment option; however, response varies and is partly predicted by PSMA expression and metabolic activity, assessed on [68Ga]PSMA-11 or [18F]DCFPyL and [18F]FDG PET, respectively. Automated methods to measure these on PET imaging have previously yielded modest accuracy. Refining computational workflows and standardizing approaches may improve patient selection and prognostication for LuPSMA therapy. Methods: PET/CT and quantitative SPECT/CT images from an institutional cohort of patients staged for LuPSMA therapy were annotated for total disease burden. In total, 676 [68Ga]PSMA-11 or [18F]DCFPyL PET, 390 [18F]FDG PET, and 477 LuPSMA SPECT images were used for development of the automated workflow, which was tested on 56 cases with externally referred PET/CT staging. A segmentation framework, the Global Threshold Regional Consensus Network, was developed based on nnU-Net, with processing refinements to improve boundary definition and overall label accuracy. Results: Using the model to contour disease extent, the mean volumetric Dice similarity coefficient was 0.94 for [68Ga]PSMA-11 or [18F]DCFPyL PET, 0.84 for [18F]FDG PET, and 0.97 for LuPSMA SPECT. On external test cases, Dice accuracy was 0.95 and 0.84 on PSMA and FDG PET, respectively. The refined models yielded consistent improvements compared with nnU-Net, with an increase of 3%-5% in Dice accuracy and 10%-17% in surface agreement. Quantitative biomarkers were compared with a human-defined ground truth using the Pearson coefficient, with scores for [68Ga]PSMA-11 or [18F]DCFPyL, [18F]FDG, and LuPSMA, respectively, of 0.98, 0.94, and 0.99 for disease volume; 0.98, 0.88, and 0.99 for SUVmean; 0.96, 0.91, and 0.99 for SUVmax; and 0.97, 0.96, and 0.99 for volume intensity product. Conclusion: Delineation of disease extent and tracer avidity can be performed with a high degree of accuracy using automated deep learning methods. By incorporating threshold-based postprocessing, the tools can closely match the output of manual workflows. Pretrained models and scripts to adapt to institutional data are provided for open use.
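As a minimal illustration of the reported metrics, the sketch below computes a volumetric Dice coefficient between a predicted and a reference disease mask and derives simple SUV-based biomarkers (disease volume, SUVmean, SUVmax, volume intensity product) from synthetic data; the function names and thresholds are illustrative assumptions.

```python
# Minimal sketch: volumetric Dice and SUV-based biomarkers from a segmentation mask.
import numpy as np

def dice(pred: np.ndarray, ref: np.ndarray) -> float:
    pred, ref = pred.astype(bool), ref.astype(bool)
    denom = pred.sum() + ref.sum()
    return 2.0 * np.logical_and(pred, ref).sum() / denom if denom else 1.0

def suv_biomarkers(suv: np.ndarray, mask: np.ndarray, voxel_volume_ml: float) -> dict:
    values = suv[mask.astype(bool)]
    volume_ml = values.size * voxel_volume_ml
    suv_mean = float(values.mean()) if values.size else 0.0
    return {
        "volume_ml": volume_ml,
        "suv_mean": suv_mean,
        "suv_max": float(values.max()) if values.size else 0.0,
        "volume_intensity_product": volume_ml * suv_mean,
    }

# synthetic stand-ins for a PET volume and its reference/predicted masks
suv = np.random.rand(64, 64, 64) * 10
ref_mask = suv > 7.0
pred_mask = suv > 6.8
print(f"Dice = {dice(pred_mask, ref_mask):.3f}")
print(suv_biomarkers(suv, pred_mask, voxel_volume_ml=0.064))
```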

Optimising Generalisable Deep Learning Models for CT Coronary Segmentation: A Multifactorial Evaluation.

Zhang S, Gharleghi R, Singh S, Shen C, Adikari D, Zhang M, Moses D, Vickers D, Sowmya A, Beier S

PubMed · Sep 18 2025
Coronary artery disease (CAD) remains a leading cause of morbidity and mortality worldwide, with incidence rates continuing to rise. Automated coronary artery medical image segmentation can ultimately improve CAD management by enabling more advanced and efficient diagnostic assessments. Deep learning-based segmentation methods have shown significant promise, offering higher accuracy while reducing reliance on manual inputs. However, achieving consistent performance across diverse datasets remains a persistent challenge due to substantial variability in imaging protocols, equipment and patient-specific factors, such as signal intensities, anatomical differences and disease severity. This study investigates the influence of image quality and resolution, governed by vessel size and common disease characteristics that introduce artefacts, such as calcification, on coronary artery segmentation accuracy in computed tomography coronary angiography (CTCA). Two datasets were utilised for model training and validation, including the publicly available ASOCA dataset (40 cases) and a GeoCAD dataset (70 cases) with more cases of coronary disease. Coronary artery segmentations were generated using three deep learning frameworks/architectures: default U-Net, Swin-UNETR, and EfficientNet-LinkNet. The impact of various factors on model generalisation was evaluated, focusing on imaging characteristics (contrast-to-noise ratio, artery contrast enhancement, and edge sharpness) and the extent of calcification at both the coronary tree and individual vessel branch levels. The calcification ranges considered were 0 (no calcification), 1-99 (low), 100-399 (moderate), and > 400 (high). The findings demonstrated that image features, including artery contrast enhancement (r = 0.408, p < 0.001) and edge sharpness (r = 0.239, p = 0.046), were significantly correlated with improved segmentation performance in test cases. Regardless of severity, calcification had a negative impact on segmentation accuracy, with low calcification associated with the poorest segmentation (p < 0.05). This may be because smaller calcified lesions produce less distinct contrast against the bright lumen, making it harder for the model to accurately identify and segment these lesions. Additionally, in males, a larger diameter of the first obtuse marginal branch (OM1) (p = 0.036) was associated with improved segmentation performance for OM1. Similarly, in females, larger diameters of the left main (LM) coronary artery (p = 0.008) and right coronary artery (RCA) (p < 0.001) were associated with better segmentation performance for LM and RCA, respectively. These findings emphasise the importance of accounting for imaging characteristics and anatomical variability when developing generalisable deep learning models for coronary artery segmentation. Unlike previous studies, which broadly acknowledge the role of image quality in segmentation, our work quantitatively demonstrates the extent to which contrast enhancement, edge sharpness, calcification and vessel diameter impact segmentation performance, offering a data-driven foundation for model adaptation strategies. Potential improvements include optimising pre-segmentation imaging (e.g. ensuring adequate edge sharpness in low-contrast regions) and developing algorithms to address vessel-specific challenges, such as improving segmentation of low-level calcifications and accurately identifying LM, RCA and OM1 branches of smaller diameters.
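For a concrete sense of the per-case analysis described, the sketch below correlates a synthetic image-quality metric (artery contrast enhancement) with per-case Dice scores using a Pearson test; all values are invented for illustration.

```python
# Minimal sketch: Pearson correlation between an image-quality metric and per-case Dice.
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(1)
contrast_enhancement = rng.normal(350, 60, size=70)            # HU-like values, synthetic
dice_scores = np.clip(0.6 + 0.0005 * contrast_enhancement
                      + rng.normal(0, 0.03, size=70), 0, 1)    # synthetic dependence

r, p = pearsonr(contrast_enhancement, dice_scores)
print(f"r = {r:.3f}, p = {p:.3g}")
```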

Rapid and robust quantitative cartilage assessment for the clinical setting: deep learning-enhanced accelerated T2 mapping.

Carretero-Gómez L, Wiesinger F, Fung M, Nunes B, Pedoia V, Majumdar S, Desai AD, Gatti A, Chaudhari A, Sánchez-Lacalle E, Malpica N, Padrón M

PubMed · Sep 18 2025
Clinical adoption of T2 mapping is limited by poor reproducibility, lengthy examination times, and cumbersome image analysis. This study aimed to develop an accelerated deep learning (DL)-enhanced cartilage T2 mapping sequence (DL CartiGram), demonstrate its repeatability and reproducibility, and evaluate its accuracy compared to conventional T2 mapping using a semi-automatic pipeline. DL CartiGram was implemented using a modified 2D Multi-Echo Spin-Echo sequence at 3 T, incorporating parallel imaging and DL-based image reconstruction. Phantom tests were performed at two sites to obtain test-retest T2 maps, using single-echo spin-echo (SE) measurements as reference values. At one site, DL CartiGram and conventional T2 mapping were performed on 43 patients. T2 values were extracted from 52 patellar and femoral compartments using DL knee segmentation and the DOSMA framework. Repeatability and reproducibility were assessed using coefficients of variation (CV), Bland-Altman analysis, and concordance correlation coefficients (CCC). T2 differences were evaluated with Wilcoxon signed-rank tests, paired t tests, and accuracy CV. Phantom tests showed intra-site repeatability with CVs ≤ 2.52% and T2 precision ≤ 1 ms. Inter-site reproducibility showed a CV of 2.74% and a CCC of 99% (CI 92-100%). Bland-Altman analysis showed a bias of 1.56 ms between sites (p = 0.03), likely due to temperature effects. In vivo, DL CartiGram reduced scan time by 40%, yielding accurate cartilage T2 measurements (CV = 0.97%) with no significant differences compared to conventional T2 mapping (p = 0.1). DL CartiGram significantly accelerates T2 mapping, while still assuring excellent repeatability and reproducibility. Combined with the semi-automatic post-processing pipeline, it emerges as a promising tool for quantitative T2 cartilage biomarker assessment in clinical settings.
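For context on what a T2 map encodes, the sketch below fits the mono-exponential decay S(TE) = S0 · exp(-TE/T2) to synthetic multi-echo signals with a log-linear least-squares fit; the echo times and noise level are illustrative assumptions, not DL CartiGram parameters.

```python
# Minimal sketch: per-voxel T2 estimation from multi-echo signals via log-linear fit.
import numpy as np

def fit_t2(te_ms: np.ndarray, signal: np.ndarray) -> float:
    """Return T2 (ms) from a log-linear fit of S(TE) = S0 * exp(-TE / T2); assumes positive signals."""
    slope, _ = np.polyfit(te_ms, np.log(signal), 1)   # log S = log S0 - TE / T2
    return -1.0 / slope

te = np.array([10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0])   # echo times (ms), illustrative
true_t2, s0 = 35.0, 1000.0
noise = 1 + np.random.default_rng(0).normal(0, 0.01, te.size)
signal = s0 * np.exp(-te / true_t2) * noise
print(f"fitted T2 = {fit_t2(te, signal):.1f} ms")
```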