Latest Papers on Radiology AI. Tags: Benchmark SOTA

[Advances in low-dose cone-beam computed tomography image reconstruction methods based on deep learning].

Shi J, Song Y, Li G, Bai S

•papers•Jun 25 2025

Cone-beam computed tomography (CBCT) is widely used in dentistry, surgery, radiotherapy and other medical fields. However, repeated CBCT scans expose patients to additional radiation doses, increasing the risk of secondary malignant tumors. Low-dose CBCT image reconstruction technology, which employs advanced algorithms to reduce radiation dose while enhancing image quality, has emerged as a focal point of recent research. This review systematically examined deep learning-based methods for low-dose CBCT reconstruction. It compared different network architectures in terms of noise reduction, artifact removal, detail preservation, and computational efficiency, covering three approaches: image-domain, projection-domain, and dual-domain techniques. The review also explored how emerging technologies like multimodal fusion and self-supervised learning could enhance these methods. By summarizing the strengths and weaknesses of current approaches, this work provides insights to optimize low-dose CBCT algorithms and support their clinical adoption.

CT Reconstruction Review Concept Academic Lab Benchmark SOTA

Computed tomography-derived quantitative imaging biomarkers enable the prediction of disease manifestations and survival in patients with systemic sclerosis.

Sieren MM, Grasshoff H, Riemekasten G, Berkel L, Nensa F, Hosch R, Barkhausen J, Kloeckner R, Wegner F

•papers•Jun 25 2025

Systemic sclerosis (SSc) is a complex inflammatory vasculopathy with diverse symptoms and variable disease progression. Despite its known impact on body composition (BC), clinical decision-making has yet to incorporate these biomarkers. This study aims to extract quantitative BC imaging biomarkers from CT scans to assess disease severity, define BC phenotypes, track changes over time and predict survival. CT exams were extracted from a prospectively maintained cohort of 452 SSc patients. Of these, 128 patients with at least one CT exam were included. An artificial intelligence-based 3D body composition analysis (BCA) algorithm assessed muscle volume, different adipose tissue compartments, and bone mineral density. These parameters were analysed with regard to various clinical, laboratory, and functional parameters and survival. Phenotypes were identified by K-means cluster analysis. Longitudinal evaluation of BCA changes employed regression analyses. A regression model using BCA parameters outperformed models based on Body Mass Index and clinical parameters in predicting survival (area under the curve (AUC)=0.75). Longitudinal development of the cardiac marker enabled prediction of survival with an AUC=0.82. Patients with altered BCA parameters had increased odds ratios (ORs) for various complications, including interstitial lung disease (p<0.05). Two distinct BCA phenotypes were identified, showing significant differences in gastrointestinal disease manifestations (p<0.01). This study highlights several parameters with the potential to reshape clinical pathways for SSc patients. Quantitative BCA biomarkers offer a means to predict survival and individual disease manifestations, in part outperforming established parameters. These insights open new avenues for research into the mechanisms driving body composition changes in SSc and for developing enhanced disease management tools, ultimately leading to more personalised and effective patient care.

CT Segmentation Whole Body Retrospective Clinical In Silico Academic Lab Benchmark SOTA

U-R-VEDA: Integrating UNET, Residual Links, Edge and Dual Attention, and Vision Transformer for Accurate Semantic Segmentation of CMRs

Racheal Mukisa, Arvind K. Bansal

•preprint•Jun 25 2025

Artificial intelligence, including deep learning models, will play a transformative role in automated medical image analysis for the diagnosis of cardiac disorders and their management. Accurate automated delineation of cardiac images is a necessary first step for the quantification and automated diagnosis of cardiac disorders. In this paper, we propose a deep-learning-based enhanced UNet model, U-R-Veda, which integrates convolution transformations, vision transformer, residual links, channel attention, and spatial attention, together with edge-detection-based skip connections for accurate, fully automated semantic segmentation of cardiac magnetic resonance (CMR) images. The model extracts local features and their interrelationships using a stack of combination convolution blocks, with embedded channel and spatial attention in the convolution block, and vision transformers. Deep embedding of channel and spatial attention in the convolution block identifies important features and their spatial localization. Combining edge information with channel and spatial attention in the skip connections reduces information loss during convolution transformations. The overall model significantly improves the semantic segmentation of CMR images necessary for improved medical image analysis. An algorithm for the dual attention module (channel and spatial attention) is presented. Performance results show that U-R-Veda achieves an average accuracy of 95.2%, as measured by the Dice similarity coefficient (DSC). The model outperforms other models on DSC and Hausdorff distance (HD) metrics, especially for the delineation of the right ventricle and the left-ventricle myocardium.
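
For readers unfamiliar with the DSC metric cited in this abstract, the following is a minimal sketch of how a Dice similarity coefficient is commonly computed for two binary masks. The function name `dice_coefficient`, the flat 0/1 list representation, and the empty-mask convention are assumptions made for illustration; this is not the authors' evaluation code.

```python
def dice_coefficient(pred, truth):
    """Dice similarity coefficient for two binary masks.

    Both masks are flat sequences of 0/1 values of equal length
    (an illustrative representation, not the paper's data format).
    """
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of pixels")
    # Pixels labelled foreground in both masks.
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    # Total foreground pixels across both masks.
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    if total == 0:
        # Both masks empty: treated here as perfect agreement (a convention).
        return 1.0
    return 2.0 * intersection / total


if __name__ == "__main__":
    predicted = [1, 1, 0, 0]
    reference = [1, 0, 0, 0]
    print(f"DSC = {dice_coefficient(predicted, reference):.3f}")
```

A DSC of 1 means the predicted and reference masks overlap exactly; 0 means they share no foreground pixels.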

MRI Segmentation Cardiac Methodology In Silico Academic Lab Benchmark SOTA

How well do multimodal LLMs interpret CT scans? An auto-evaluation framework for analyses.

Zhu Q, Hou B, Mathai TS, Mukherjee P, Jin Q, Chen X, Wang Z, Cheng R, Summers RM, Lu Z

•papers•Jun 25 2025

This study introduces a novel evaluation framework, GPTRadScore, to systematically assess the performance of multimodal large language models (MLLMs) in generating clinically accurate findings from CT imaging. Specifically, GPTRadScore leverages LLMs as an evaluation metric, aiming to provide a more accurate and clinically informed assessment than traditional language-specific methods. Using this framework, we evaluate the capability of several MLLMs, including GPT-4 with Vision (GPT-4V), Gemini Pro Vision, LLaVA-Med, and RadFM, to interpret findings in CT scans. This retrospective study leverages a subset of the public DeepLesion dataset to evaluate the performance of several multimodal LLMs in describing findings in CT slices. GPTRadScore was developed to assess the generated descriptions (location, body part, and type) using GPT-4, alongside traditional metrics. RadFM was fine-tuned using a subset of the DeepLesion dataset with additional labeled examples targeting complex findings. Post fine-tuning, performance was reassessed using GPTRadScore to measure accuracy improvements. Evaluations demonstrated a high correlation of GPTRadScore with clinician assessments, with Pearson's correlation coefficients of 0.87, 0.91, 0.75, 0.90, and 0.89. These results highlight its superiority over traditional metrics, such as BLEU, METEOR, and ROUGE, and indicate that GPTRadScore can serve as a reliable evaluation metric. Using GPTRadScore, it was observed that while GPT-4V and Gemini Pro Vision outperformed other models, significant areas for improvement remain, primarily due to limitations in the datasets used for training. Fine-tuning RadFM resulted in substantial accuracy gains: location accuracy increased from 3.41% to 12.8%, body part accuracy improved from 29.12% to 53%, and type accuracy rose from 9.24% to 30%. These findings reinforce the hypothesis that fine-tuning RadFM can significantly enhance its performance. GPT-4 effectively correlates with expert assessments, validating its use as a reliable metric for evaluating multimodal LLMs in radiological diagnostics. Additionally, the results underscore the efficacy of fine-tuning approaches in improving the descriptive accuracy of LLM-generated medical imaging findings.
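
The agreement between an automated score and clinician ratings reported above is expressed as Pearson's correlation coefficient. A minimal sketch of that calculation follows; the function name `pearson_r` and the example score lists are hypothetical and are not taken from the study.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of co-deviations and sums of squared deviations.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical per-case scores: an automated metric vs. a clinician rating.
    metric_scores = [0.2, 0.5, 0.7, 0.9, 1.0]
    clinician_scores = [0.1, 0.6, 0.6, 0.8, 1.0]
    print(f"r = {pearson_r(metric_scores, clinician_scores):.2f}")
```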

CT LLM Radiology Report Whole Body Retrospective Clinical In Silico Academic Lab Benchmark SOTA Open Dataset

High-performance Open-source AI for Breast Cancer Detection and Localization in MRI.

Hirsch L, Sutton EJ, Huang Y, Kayis B, Hughes M, Martinez D, Makse HA, Parra LC

•papers•Jun 25 2025

"Just Accepted" papers have undergone full peer review and have been accepted for publication in Radiology: Artificial Intelligence. Purpose: To develop and evaluate an open-source deep learning model for detection and localization of breast cancer on MRI. Materials and Methods: In this retrospective study, a deep learning model for breast cancer detection and localization was trained on the largest breast MRI dataset to date. Data included all breast MRIs conducted at a tertiary cancer center in the United States between 2002 and 2019. The model was validated on sagittal MRIs from the primary site (n = 6,615 breasts). Generalizability was assessed by evaluating model performance on axial data from the primary site (n = 7,058 breasts) and a second clinical site (n = 1,840 breasts). Results: The primary site dataset included 30,672 sagittal MRI examinations (52,598 breasts) from 9,986 female patients (mean [SD] age, 53 [11] years). The model achieved an area under the receiver operating characteristic curve (AUC) of 0.95 for detecting cancer in the primary site. At 90% specificity (5717/6353), model sensitivity was 83% (217/262), which was comparable to historical performance data for radiologists. The model generalized well to axial examinations, achieving an AUC of 0.92 on data from the same clinical site and 0.92 on data from a secondary site. The model accurately located the tumor in 88.5% (232/262) of sagittal images, 92.8% (272/293) of axial images from the primary site, and 87.7% (807/920) of secondary site axial images. Conclusion: The model demonstrated state-of-the-art performance on breast cancer detection. Code and weights are openly available to stimulate further development and validation. ©RSNA, 2025.
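
The sensitivity and specificity quoted in this abstract can be reproduced from the counts given in parentheses. The sketch below does that arithmetic; the helper names are illustrative, and only the counts (217/262 and 5717/6353) come from the abstract.

```python
def sensitivity(true_positives, total_positives):
    """Fraction of cancer-positive breasts that the model flagged."""
    return true_positives / total_positives


def specificity(true_negatives, total_negatives):
    """Fraction of cancer-negative breasts that the model cleared."""
    return true_negatives / total_negatives


if __name__ == "__main__":
    # Counts as reported in the abstract for the sagittal validation set.
    sens = sensitivity(217, 262)
    spec = specificity(5717, 6353)
    print(f"sensitivity = {sens:.1%}, specificity = {spec:.1%}")
    # The two denominators add up to the stated validation size (6,615 breasts).
    print(f"validation breasts = {262 + 6353}")
```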

MRI Detection Breast Retrospective Clinical In Silico Academic Lab Open Code Benchmark SOTA Open Dataset

Diagnostic Performance of Radiomics for Differentiating Intrahepatic Cholangiocarcinoma from Hepatocellular Carcinoma: A Systematic Review and Meta-analysis.

Wang D, Sun L

•papers•Jun 25 2025

Differentiating intrahepatic cholangiocarcinoma (ICC) from hepatocellular carcinoma (HCC) is essential for selecting the most effective treatment strategies. However, traditional imaging modalities and serum biomarkers often lack sufficient specificity. Radiomics, a sophisticated image analysis approach that derives quantitative data from medical imaging, has emerged as a promising non-invasive tool. This study aimed to systematically review and meta-analyze the diagnostic accuracy of radiomics in differentiating ICC from HCC. PubMed, EMBASE, and Web of Science databases were systematically searched through January 24, 2025. Studies evaluating radiomics models for distinguishing ICC from HCC were included. The quality of included studies was assessed using the Quality Assessment of Diagnostic Accuracy Studies-2 (QUADAS-2) and METhodological RadiomICs Score tools. Pooled sensitivity, specificity, and area under the curve (AUC) were calculated using a bivariate random-effects model. Subgroup and publication bias analyses were also performed. Twelve studies with 2541 patients were included, with 14 validation cohorts entered into the meta-analysis. The pooled sensitivity and specificity of radiomics models were 0.82 (95% CI: 0.76-0.86) and 0.90 (95% CI: 0.85-0.93), respectively, with an AUC of 0.88 (95% CI: 0.85-0.91). Subgroup analyses revealed variations based on segmentation method, software used, and sample size, though not all differences were statistically significant. Publication bias was not detected. Radiomics demonstrates high diagnostic accuracy in distinguishing ICC from HCC and offers a non-invasive adjunct to conventional diagnostics. Further prospective, multicenter studies with standardized workflows are needed to enhance clinical applicability and reproducibility.

CT Classification Abdominal Meta Analysis In Silico Academic Lab Benchmark SOTA

The evaluation of artificial intelligence in mammography-based breast cancer screening: Is breast-level analysis enough?

Taib AG, Partridge GJW, Yao L, Darker I, Chen Y

•papers•Jun 25 2025

This study assessed whether the diagnostic performance of a commercial artificial intelligence (AI) algorithm for mammography differs between breast-level and lesion-level interpretations and compared its performance with that of a large population of specialised human readers. We retrospectively analysed 1200 mammograms from the NHS breast cancer screening programme using a commercial AI algorithm and assessments from 1258 trained human readers from the Personal Performance in Mammographic Screening (PERFORMS) external quality assurance programme. For breasts containing pathologically confirmed malignancies, both breast-level and lesion-level analyses were performed. The latter considered the locations of marked regions of interest for AI and humans. The highest score per lesion was recorded. For non-malignant breasts, a breast-level analysis recorded the highest score per breast. Area under the curve (AUC), sensitivity and specificity were calculated at the developer's recommended threshold for recall. The study was designed to detect a medium-sized effect (odds ratio 3.5 or 0.29) for sensitivity. The test set contained 882 non-malignant (73%) and 318 malignant breasts (27%), with 328 cancer lesions. The AI AUC was 0.942 at breast level and 0.929 at lesion level (difference -0.013, p < 0.01). The mean human AUC was 0.878 at breast level and 0.851 at lesion level (difference -0.027, p < 0.01). AI outperformed human readers at both the breast and lesion level (both p < 0.01) according to the AUC. AI's diagnostic performance significantly decreased at the lesion level, indicating reduced accuracy in localising malignancies. However, its overall performance exceeded that of human readers. Question: AI often recalls mammography cases not recalled by humans; to understand why, we as humans must consider the regions of interest it has marked as cancerous. Findings: Evaluations of AI typically occur at the breast level, but performance decreases when AI is evaluated on a lesion level. This also occurs for humans. Clinical relevance: To improve human-AI collaboration, AI should be assessed at the lesion level; poor accuracy here may lead to automation bias and unnecessary patient procedures.
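
The AUC values compared in this abstract can be read as the probability that a randomly chosen malignant case receives a higher suspicion score than a randomly chosen non-malignant one. A minimal sketch of that pairwise definition follows; the function name `auc_from_scores` and the example scores are hypothetical, and this is not the study's analysis code.

```python
def auc_from_scores(positive_scores, negative_scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct ranking. This pairwise form is equivalent
    to the area under the ROC curve, but is quadratic in the number of cases.
    """
    if not positive_scores or not negative_scores:
        raise ValueError("need at least one positive and one negative score")
    correct = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(positive_scores) * len(negative_scores))


if __name__ == "__main__":
    # Hypothetical suspicion scores, one per malignant / non-malignant breast.
    malignant = [0.9, 0.8]
    non_malignant = [0.1, 0.85]
    print(f"AUC = {auc_from_scores(malignant, non_malignant):.2f}")
```

Under this definition, moving from breast-level to lesion-level scoring changes which score is assigned to each malignant case, which is one way the AUC can drop even when the underlying model is unchanged.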

Mammography Detection Breast Retrospective Clinical In Silico Academic Lab Benchmark SOTA

Comparative Analysis of Automated vs. Expert-Designed Machine Learning Models in Age-Related Macular Degeneration Detection and Classification.

Durmaz Engin C, Beşenk U, Özizmirliler D, Selver MA

•papers•Jun 25 2025

This study compared the effectiveness of expert-designed machine learning models and code-free automated machine learning (AutoML) models in classifying optical coherence tomography (OCT) images for detecting age-related macular degeneration (AMD) and distinguishing between its dry and wet forms. Custom models were developed by an artificial intelligence expert using the EfficientNet V2 architecture, while AutoML models were created by an ophthalmologist utilizing LobeAI with transfer learning via ResNet-50 V2. Both models were designed to differentiate normal OCT images from AMD and to distinguish between dry and wet AMD. The models were trained and tested using an 80:20 split, with each diagnostic group containing 500 OCT images. Performance metrics, including sensitivity, specificity, accuracy, and F1 scores, were calculated and compared. The expert-designed model achieved an overall accuracy of 99.67% for classifying all images, with F1 scores of 0.99 or higher across all binary class comparisons. In contrast, the AutoML model achieved an overall accuracy of 89.00%, with F1 scores ranging from 0.86 to 0.90 in binary comparisons. Notably lower recall was observed for dry AMD vs. normal (0.85) in the AutoML model, indicating challenges in correctly identifying dry AMD. While the AutoML models demonstrated acceptable performance in identifying and classifying AMD cases, the expert-designed models significantly outperformed them. The use of advanced neural network architectures and rigorous optimization in the expert-developed models underscores the continued necessity of expert involvement in the development of high-precision diagnostic tools for medical image classification.

OCT Classification Methodology In Silico Academic Lab Benchmark SOTA

Few-Shot Learning for Prostate Cancer Detection on MRI: Comparative Analysis with Radiologists' Performance.

Yamagishi Y, Baba Y, Suzuki J, Okada Y, Kanao K, Oyama M

•papers•Jun 25 2025

Deep-learning models for prostate cancer detection typically require large datasets, limiting clinical applicability across institutions due to domain shift issues. This study aimed to develop a few-shot learning deep-learning model for prostate cancer detection on multiparametric MRI that requires minimal training data and to compare its diagnostic performance with experienced radiologists. In this retrospective study, we used 99 cases (80 positive, 19 negative) of biopsy-confirmed prostate cancer (2017-2022), with 20 cases for training, 5 for validation, and 74 for testing. A 2D transformer model was trained on T2-weighted, diffusion-weighted, and apparent diffusion coefficient map images. Model predictions were compared with two radiologists using Matthews correlation coefficient (MCC) and F1 score, with 95% confidence intervals (CIs) calculated via bootstrap method. The model achieved an MCC of 0.297 (95% CI: 0.095-0.474) and F1 score of 0.707 (95% CI: 0.598-0.847). Radiologist 1 had an MCC of 0.276 (95% CI: 0.054-0.484) and F1 score of 0.741; Radiologist 2 had an MCC of 0.504 (95% CI: 0.289-0.703) and F1 score of 0.871, showing that the model performance was comparable to Radiologist 1. External validation on the Prostate158 dataset revealed that ImageNet pretraining substantially improved model performance, increasing study-level ROC-AUC from 0.464 to 0.636 and study-level PR-AUC from 0.637 to 0.773 across all architectures. Our findings demonstrate that few-shot deep-learning models can achieve clinically relevant performance when using pretrained transformer architectures, offering a promising approach to address domain shift challenges across institutions.
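
The two metrics used in this abstract to compare the model with radiologists, the Matthews correlation coefficient (MCC) and the F1 score, can both be computed from a 2x2 confusion matrix. The sketch below shows the standard formulas; the function names and the example counts are hypothetical (the abstract does not report confusion matrices), and the bootstrap confidence intervals are not reproduced here.

```python
import math


def matthews_corrcoef(tp, fp, tn, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        # A row or column of the matrix is empty; 0 is a common convention.
        return 0.0
    return (tp * tn - fp * fn) / denominator


def f1_score(tp, fp, fn):
    """F1 score: harmonic mean of precision and recall."""
    denominator = 2 * tp + fp + fn
    if denominator == 0:
        return 0.0
    return 2 * tp / denominator


if __name__ == "__main__":
    # Hypothetical counts for illustration only.
    tp, fp, tn, fn = 5, 1, 3, 1
    print(f"MCC = {matthews_corrcoef(tp, fp, tn, fn):.3f}")
    print(f"F1  = {f1_score(tp, fp, fn):.3f}")
```

Unlike F1, MCC also uses the true-negative count, so it tends to be the more conservative of the two when classes are imbalanced, as in the 80 positive / 19 negative cohort described above.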

MRI Detection Abdominal Retrospective Clinical In Silico Academic Lab Benchmark SOTA

Generalizable medical image enhancement using structure-preserved diffusion models.

Chen L, Yu X, Li H, Lin H, Niu K, Li H

•papers•Jun 25 2025

Clinical medical images often suffer from compromised quality, which negatively impacts the diagnostic process by both clinicians and AI algorithms. While GAN-based enhancement methods have been commonly developed in recent years, delicate model training is necessary due to issues with artifacts, mode collapse, and instability. Diffusion models have shown promise in generating high-quality images superior to GANs, but challenges in training data collection and domain gaps hinder applying them for medical image enhancement. Additionally, preserving fine structures in enhancing medical images with diffusion models is still an area that requires further exploration. To overcome these challenges, we propose structure-preserved diffusion models for generalizable medical image enhancement (GEDM). GEDM leverages joint supervision from enhancement and segmentation to boost structure preservation and generalizability. Specifically, synthetic data is used to collect high-low quality paired training data with structure masks, and the Laplace transform is employed to reduce domain gaps and introduce multi-scale conditions. GEDM conducts medical image enhancement and segmentation jointly, supervised by high-quality references and structure masks from the training data. Four datasets of two medical imaging modalities were collected to implement the experiments, where GEDM outperformed state-of-the-art methods in image enhancement, as well as follow-up medical analysis tasks.

Mixed Modality Image Synthesis Methodology In Silico Academic Lab Benchmark SOTA
