
Beyond Benchmarks: Towards Robust Artificial Intelligence Bone Segmentation in Socio-Technical Systems

Xie, K., Gruber, L. J., Crampen, M., Li, Y., Ferreira, A., Tappeiner, E., Gillot, M., Schepers, J., Xu, J., Pankert, T., Beyer, M., Shahamiri, N., ten Brink, R., Dot, G., Weschke, C., van Nistelrooij, N., Verhelst, P.-J., Guo, Y., Xu, Z., Bienzeisler, J., Rashad, A., Flügge, T., Cotton, R., Vinayahalingam, S., Ilesan, R., Raith, S., Madsen, D., Seibold, C., Xi, T., Berge, S., Nebelung, S., Kodym, O., Sundqvist, O., Thieringer, F., Lamecker, H., Coppens, A., Potrusil, T., Kraeima, J., Witjes, M., Wu, G., Chen, X., Lambrechts, A., Cevidanes, L. H. S., Zachow, S., Hermans, A., Truhn, D., Alves,

medRxiv preprint · Jun 13 2025
Despite advances in automated medical image segmentation, AI models still underperform in various clinical settings, challenging real-world integration. In this multicenter evaluation, we analyzed 20 state-of-the-art mandibular segmentation models across 19,218 segmentations of 1,000 clinically resampled CT/CBCT scans. We show that segmentation accuracy varies by up to 25% depending on socio-technical factors such as voxel size and bone orientation, and on patient conditions such as osteosynthesis or pathology. Higher sharpness, smaller isotropic voxels, and neutral orientation significantly improved results, while metallic osteosynthesis and anatomical complexity led to significant degradation. Our findings challenge the common view of AI models as "plug-and-play" tools and suggest evidence-based optimization recommendations for both clinicians and developers. This will in turn boost the integration of AI segmentation tools in routine healthcare.
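Benchmarks like this one typically score models with overlap metrics; a minimal sketch of the Dice similarity coefficient (the standard segmentation accuracy measure, though the abstract does not name its exact metric set) on toy binary masks:

```python
import numpy as np

def dice(pred: np.ndarray, gt: np.ndarray) -> float:
    """Dice similarity coefficient between two binary masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    denom = pred.sum() + gt.sum()
    return 2.0 * inter / denom if denom else 1.0

# Two toy 3D masks that overlap in half their voxels
a = np.zeros((4, 4, 4)); a[1:3, 1:3, 1:3] = 1
b = np.zeros((4, 4, 4)); b[1:3, 1:3, 2:4] = 1
print(dice(a, b))  # → 0.5
```

A model whose Dice drops 25% between favorable and unfavorable acquisition settings would show exactly the voxel-size and orientation sensitivity the study reports.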

Does restrictive anorexia nervosa impact brain aging? A machine learning approach to estimate age based on brain structure.

Gupta Y, de la Cruz F, Rieger K, di Giuliano M, Gaser C, Cole J, Breithaupt L, Holsen LM, Eddy KT, Thomas JJ, Cetin-Karayumak S, Kubicki M, Lawson EA, Miller KK, Misra M, Schumann A, Bär KJ

PubMed · Jun 13 2025
Anorexia nervosa (AN), a severe eating disorder marked by extreme weight loss and malnutrition, leads to significant alterations in brain structure. This study used machine learning (ML) to estimate brain age from structural MRI scans and investigated brain-predicted age difference (brain-PAD) as a potential biomarker in AN. Structural MRI scans were collected from female participants aged 10-40 years across two institutions (Boston, USA, and Jena, Germany), including acute AN (acAN; n=113), weight-restored AN (wrAN; n=35), and age-matched healthy controls (HC; n=90). The ML model was trained on 3487 healthy female participants (ages 5-45 years) from ten datasets, using 377 neuroanatomical features extracted from T1-weighted MRI scans. The model achieved strong performance with a mean absolute error (MAE) of 1.93 years and a correlation of r = 0.88 in HCs. In acAN patients, brain age was overestimated by an average of +2.25 years, suggesting advanced brain aging. In contrast, wrAN participants showed significantly lower brain-PAD than acAN (+0.26 years, p=0.0026) and did not differ from HC (p=0.98), suggesting normalization of brain age estimates following weight restoration. A significant group-by-age interaction effect on predicted brain age (p<0.001) indicated that brain age deviations were most pronounced in younger acAN participants. Brain-PAD in acAN was significantly negatively associated with BMI (r = -0.291, p<sub>fdr</sub> = 0.005), but not in the wrAN or HC groups. Importantly, no significant associations were found between brain-PAD and clinical symptom severity. These findings suggest that acute AN is linked to advanced brain aging during the acute stage, and that this may partially normalize following weight recovery.
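The brain-PAD biomarker described above is simply the gap between model-predicted and chronological age; a minimal sketch with hypothetical toy ages (not study data):

```python
import numpy as np

def brain_pad(predicted_age, chronological_age):
    """Brain-predicted age difference; positive = older-looking brain."""
    return np.asarray(predicted_age, float) - np.asarray(chronological_age, float)

def mae(predicted_age, chronological_age):
    """Mean absolute error of the age model (1.93 years in the paper's HCs)."""
    return float(np.mean(np.abs(brain_pad(predicted_age, chronological_age))))

pred = [24.5, 31.0, 18.2]   # hypothetical model outputs
chron = [22.0, 30.0, 19.0]  # hypothetical chronological ages
print(brain_pad(pred, chron))        # per-subject gaps
print(round(mae(pred, chron), 2))    # → 1.43
```

Group comparisons like acAN vs wrAN then reduce to comparing the distributions of these per-subject gaps.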

Quantitative and qualitative assessment of ultra-low-dose paranasal sinus CT using deep learning image reconstruction: a comparison with hybrid iterative reconstruction.

Otgonbaatar C, Lee D, Choi J, Jang H, Shim H, Ryoo I, Jung HN, Suh S

PubMed · Jun 13 2025
This study aimed to evaluate the quantitative and qualitative performances of ultra-low-dose computed tomography (CT) with deep learning image reconstruction (DLR) compared with those of hybrid iterative reconstruction (IR) for preoperative paranasal sinus (PNS) imaging. This retrospective analysis included 132 patients who underwent non-contrast ultra-low-dose sinus CT (0.03 mSv). Images were reconstructed using hybrid IR and DLR. Objective image quality metrics, including image noise, signal-to-noise ratio (SNR), contrast-to-noise ratio (CNR), noise power spectrum (NPS), and no-reference perceptual image sharpness, were assessed. Two board-certified radiologists independently performed subjective image quality evaluations. The ultra-low-dose CT protocol achieved a low radiation dose (effective dose: 0.03 mSv). DLR showed significantly lower image noise (28.62 ± 4.83 Hounsfield units) compared to hybrid IR (140.70 ± 16.04, p < 0.001), with DLR yielding smoother and more uniform images. DLR demonstrated significantly improved SNR (22.47 ± 5.82 vs 9.14 ± 2.45, p < 0.001) and CNR (71.88 ± 14.03 vs 11.81 ± 1.50, p < 0.001). NPS analysis revealed that DLR reduced the noise magnitude and NPS peak values. Additionally, DLR demonstrated significantly sharper images (no-reference perceptual sharpness metric: 0.56 ± 0.04) compared to hybrid IR (0.36 ± 0.01). Radiologists rated DLR as superior in overall image quality, bone structure visualization, and diagnostic confidence compared to hybrid IR at ultra-low-dose CT. DLR significantly outperformed hybrid IR in ultra-low-dose PNS CT by reducing image noise, improving SNR and CNR, enhancing image sharpness, and maintaining critical anatomical visualization, demonstrating its potential for effective preoperative planning with minimal radiation exposure. 
Question: Ultra-low-dose CT for paranasal sinuses is essential for patients requiring repeated scans and functional endoscopic sinus surgery (FESS) planning to reduce cumulative radiation exposure.
Findings: DLR outperformed hybrid IR in ultra-low-dose paranasal sinus CT.
Clinical relevance: Ultra-low-dose CT with DLR delivers sufficient image quality for detailed surgical planning, effectively minimizing unnecessary radiation exposure to enhance patient safety.
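The objective metrics in this study (SNR, CNR) reduce to simple ROI statistics; a sketch under common definitions — the paper may use protocol-specific variants — with hypothetical HU values:

```python
import numpy as np

def snr(roi, noise_roi):
    """Signal-to-noise ratio: mean ROI signal over background noise SD."""
    return float(np.mean(roi) / np.std(noise_roi))

def cnr(roi_a, roi_b, noise_roi):
    """Contrast-to-noise ratio between two tissue ROIs."""
    return float(abs(np.mean(roi_a) - np.mean(roi_b)) / np.std(noise_roi))

rng = np.random.default_rng(0)
noise = rng.normal(0.0, 10.0, 1000)  # hypothetical background (air) ROI
bone = np.full(100, 500.0)           # hypothetical dense bone, HU
soft = np.full(100, 40.0)            # hypothetical soft tissue, HU
print(snr(bone, noise), cnr(bone, soft, noise))
```

Lower noise SD in the denominator is exactly why the DLR reconstructions, with noise of 28.62 HU versus 140.70 HU, show such large SNR and CNR gains.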

Prediction of NIHSS Scores and Acute Ischemic Stroke Severity Using a Cross-attention Vision Transformer Model with Multimodal MRI.

Tuxunjiang P, Huang C, Zhou Z, Zhao W, Han B, Tan W, Wang J, Kukun H, Zhao W, Xu R, Aihemaiti A, Subi Y, Zou J, Xie C, Chang Y, Wang Y

PubMed · Jun 13 2025
This study aimed to develop and evaluate models for classifying the severity of neurological impairment in acute ischemic stroke (AIS) patients using multimodal MRI data. A retrospective cohort of 1227 AIS patients was collected and categorized into mild (NIHSS<5) and moderate-to-severe (NIHSS≥5) stroke groups based on NIHSS scores. Eight baseline models were constructed for performance comparison, including a clinical model, radiomics models using DWI or multiple MRI sequences, and deep learning (DL) models with varying fusion strategies (early fusion, late fusion, full cross-fusion, and DWI-centered cross-fusion). All DL models were based on the Vision Transformer (ViT) framework. Model performance was evaluated using metrics such as AUC and ACC, and robustness was assessed through subgroup analyses and visualization using Grad-CAM. Among the eight models, the DL model using DWI as the primary sequence with cross-fusion of other MRI sequences (Model 8) achieved the best performance. In the test cohort, Model 8 demonstrated an AUC of 0.914, an ACC of 0.830, high specificity (0.818), and high sensitivity (0.853). Subgroup analysis showed that Model 8 was robust in most subgroups, with no significant difference in predictions (p > 0.05) and AUC values consistently exceeding 0.900; a significant predictive difference was observed only in the BMI subgroup (p < 0.001). In external validation, Model 8 reached AUCs of 0.910 and 0.912 at centers 2 and 3, respectively. Visualization using Grad-CAM emphasized the infarct core as the most critical region contributing to predictions, with consistent feature attention across DWI, T1WI, T2WI, and FLAIR sequences, further supporting the interpretability of the model. A ViT-based DL model with cross-modal fusion strategies provides a non-invasive and efficient tool for classifying AIS severity. Its robust performance across subgroups and interpretability make it a promising tool for personalized management and decision-making in clinical practice.
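The DWI-centered cross-fusion strategy lets DWI tokens query the other sequences via cross-attention; a minimal single-head NumPy sketch with toy token matrices (no learned projections, which a real ViT would include):

```python
import numpy as np

def cross_attention(q_tokens, kv_tokens):
    """Scaled dot-product cross-attention: query tokens from one MRI
    sequence (e.g. DWI) attend to key/value tokens from another."""
    d = q_tokens.shape[-1]
    scores = q_tokens @ kv_tokens.T / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over kv tokens
    return weights @ kv_tokens

rng = np.random.default_rng(0)
dwi = rng.normal(size=(4, 8))    # 4 hypothetical query tokens, dim 8
flair = rng.normal(size=(6, 8))  # 6 hypothetical key/value tokens
fused = cross_attention(dwi, flair)
print(fused.shape)  # → (4, 8)
```

Because the output keeps the query-side token count, DWI remains the "primary" sequence while absorbing information from T1WI, T2WI, and FLAIR.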

BreastDCEDL: Curating a Comprehensive DCE-MRI Dataset and developing a Transformer Implementation for Breast Cancer Treatment Response Prediction

Naomi Fridman, Bubby Solway, Tomer Fridman, Itamar Barnea, Anat Goldshtein

arXiv preprint · Jun 13 2025
Breast cancer remains a leading cause of cancer-related mortality worldwide, making early detection and accurate treatment response monitoring critical priorities. We present BreastDCEDL, a curated, deep learning-ready dataset comprising pre-treatment 3D Dynamic Contrast-Enhanced MRI (DCE-MRI) scans from 2,070 breast cancer patients drawn from the I-SPY1, I-SPY2, and Duke cohorts, all sourced from The Cancer Imaging Archive. The raw DICOM imaging data were rigorously converted into standardized 3D NIfTI volumes with preserved signal integrity, accompanied by unified tumor annotations and harmonized clinical metadata including pathologic complete response (pCR), hormone receptor (HR), and HER2 status. Although DCE-MRI provides essential diagnostic information and deep learning offers tremendous potential for analyzing such complex data, progress has been limited by the lack of accessible, public, multicenter datasets. BreastDCEDL addresses this gap by enabling development of advanced models, including state-of-the-art transformer architectures that require substantial training data. To demonstrate its capacity for robust modeling, we developed the first transformer-based model for breast DCE-MRI, leveraging Vision Transformer (ViT) architecture trained on RGB-fused images from three contrast phases (pre-contrast, early post-contrast, and late post-contrast). Our ViT model achieved state-of-the-art pCR prediction performance in HR+/HER2- patients (AUC 0.94, accuracy 0.93). BreastDCEDL includes predefined benchmark splits, offering a framework for reproducible research and enabling clinically meaningful modeling in breast cancer imaging.
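The RGB fusion described above maps the three contrast phases onto the three channels an RGB-pretrained ViT expects; a sketch assuming per-phase min-max normalization (the paper's exact normalization is not specified):

```python
import numpy as np

def fuse_phases(pre, early, late):
    """Stack three DCE-MRI phases into an RGB-like array after per-phase
    min-max normalization, so RGB-pretrained backbones can be reused."""
    def norm(x):
        x = x.astype(float)
        span = x.max() - x.min()
        return (x - x.min()) / span if span else np.zeros_like(x)
    return np.stack([norm(pre), norm(early), norm(late)], axis=-1)

# Hypothetical 64x64 slices for the three phases
pre, early, late = (np.random.default_rng(i).random((64, 64)) for i in range(3))
rgb = fuse_phases(pre, early, late)
print(rgb.shape)  # → (64, 64, 3)
```

Encoding temporal contrast kinetics as channels lets a standard 2D ViT see enhancement dynamics without any architectural change.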

InceptionMamba: Efficient Multi-Stage Feature Enhancement with Selective State Space Model for Microscopic Medical Image Segmentation

Daniya Najiha Abdul Kareem, Abdul Hannan, Mubashir Noman, Jean Lahoud, Mustansar Fiaz, Hisham Cholakkal

arXiv preprint · Jun 13 2025
Accurate microscopic medical image segmentation plays a crucial role in diagnosing various cancerous cells and identifying tumors. Driven by advancements in deep learning, convolutional neural networks (CNNs) and transformer-based models have been extensively studied to enhance receptive fields and improve medical image segmentation. However, they often struggle to capture complex cellular and tissue structures in challenging scenarios such as background clutter and object overlap. Moreover, their reliance on large datasets for improved performance, along with their high computational cost, limits their practicality. To address these issues, we propose an efficient framework for the segmentation task, named InceptionMamba, which encodes multi-stage rich features and offers both performance and computational efficiency. Specifically, we exploit semantic cues to capture both low-frequency and high-frequency regions to enrich the multi-stage features to handle blurred region boundaries (e.g., cell boundaries). These enriched features are input to a hybrid model that combines an Inception depth-wise convolution with a Mamba block, to maintain high efficiency and capture inherent variations in the scales and shapes of the regions of interest. These enriched features, along with low-resolution features, are fused to obtain the final segmentation mask. Our model achieves state-of-the-art performance on two challenging microscopic segmentation datasets (SegPC21 and GlaS) and two skin lesion segmentation datasets (ISIC2017 and ISIC2018), while reducing computational cost by about 5 times compared to the previous best performing method.
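The low/high-frequency enrichment can be illustrated with a simple Gaussian decomposition — a generic sketch, not the paper's actual module, which uses learned operations:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def split_frequencies(img, sigma=2.0):
    """Split an image into a smooth low-frequency band and a residual
    high-frequency band (edges such as cell boundaries)."""
    low = gaussian_filter(img.astype(float), sigma)
    high = img - low
    return low, high

img = np.random.default_rng(0).random((32, 32))  # toy microscopy patch
low, high = split_frequencies(img)
print(np.allclose(low + high, img))  # → True
```

Processing the two bands separately and re-fusing them is one standard way to keep blurred boundaries visible to later stages of a segmentation network.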

Recent Advances in sMRI and Artificial Intelligence for Presurgical Planning in Focal Cortical Dysplasia: A Systematic Review.

Mahmoudi A, Alizadeh A, Ganji Z, Zare H

PubMed · Jun 13 2025
Focal Cortical Dysplasia (FCD) is a leading cause of drug-resistant epilepsy, particularly in children and young adults, necessitating precise presurgical planning. Traditional structural MRI often fails to detect subtle FCD lesions, especially in MRI-negative cases. Recent advancements in Artificial Intelligence (AI), particularly Machine Learning (ML) and Deep Learning (DL), have the potential to enhance the sensitivity and specificity of FCD detection. This systematic review, following PRISMA guidelines, searched PubMed, Embase, Scopus, Web of Science, and Science Direct for articles published from 2020 onwards, using keywords related to "Focal Cortical Dysplasia," "MRI," and "Artificial Intelligence/Machine Learning/Deep Learning." Included were original studies employing AI and structural MRI (sMRI) for FCD detection in humans, reporting quantitative performance metrics, and published in English. Data extraction was performed independently by two reviewers, with discrepancies resolved by a third. Among 88 full-text articles reviewed, 27 met the inclusion criteria. The included studies demonstrated that AI significantly improved FCD detection, achieving sensitivities up to 97.1% and specificities up to 84.3% across various MRI sequences, including MPRAGE, MP2RAGE, and FLAIR. AI models, particularly deep learning models, matched or surpassed human radiologist performance, with combined AI-human expertise reaching detection rates up to 87%. The studies emphasized the importance of advanced MRI sequences and multimodal MRI for enhanced detection, though model performance varied with FCD type and training datasets. Recent advances in sMRI and AI, especially deep learning, offer substantial potential to improve FCD detection, leading to better presurgical planning and patient outcomes in drug-resistant epilepsy. These methods enable faster, more accurate, and automated FCD detection, potentially enhancing surgical decision-making. Further clinical validation and optimization of AI algorithms across diverse datasets are essential for broader clinical translation.
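Sensitivity and specificity, the metrics pooled in this review, come straight from confusion counts; a sketch with hypothetical counts chosen to land near the reported maxima:

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Sensitivity = TP/(TP+FN); specificity = TN/(TN+FP)."""
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical confusion counts for an FCD detector (not from any study)
sens, spec = sensitivity_specificity(tp=33, fn=1, tn=59, fp=11)
print(f"{sens:.1%}, {spec:.1%}")  # → 97.1%, 84.3%
```

In MRI-negative FCD, the trade-off between these two quantities is what makes the choice of operating point clinically consequential.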

The Machine Learning Models in Major Cardiovascular Adverse Events Prediction Based on Coronary Computed Tomography Angiography: Systematic Review.

Ma Y, Li M, Wu H

PubMed · Jun 13 2025
Coronary computed tomography angiography (CCTA) has emerged as the first-line noninvasive imaging test for patients at high risk of coronary artery disease (CAD). When combined with machine learning (ML), it provides more valid evidence for diagnosing major adverse cardiovascular events (MACEs). Radiomics provides informative multidimensional features that can help identify high-risk populations and improve the diagnostic performance of CCTA. However, its role in predicting MACEs remains highly debated. We evaluated the diagnostic value of ML models constructed using radiomic features extracted from CCTA in predicting MACEs, and compared the performance of different learning algorithms and models, thereby providing clinical recommendations for the diagnosis, treatment, and prognosis of MACEs. We comprehensively searched 5 online databases (Cochrane Library, Web of Science, Elsevier, CNKI, and PubMed) up to September 10, 2024, for original studies that used ML models in patients who underwent CCTA to predict MACEs and that reported related clinical outcomes and endpoints. Risk of bias in the ML models was assessed with the Prediction Model Risk of Bias Assessment Tool, while the radiomics quality score (RQS) was used to evaluate the methodological quality of radiomics prediction model development and validation. We also followed the TRIPOD (Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis) guidelines to ensure the transparency of the included ML models. Meta-analysis was performed using Meta-DiSc software (version 1.4), including the I² statistic and Cochran's Q test, along with StataMP 17 (StataCorp) to assess heterogeneity and publication bias. Due to the high heterogeneity observed, subgroup analysis was conducted based on different model groups. Ten studies were included in the analysis; 5 (50%) distinguished between training and testing groups, contributing 17 models to the training sets and 26 models to the testing sets. The pooled area under the receiver operating characteristic curve (AUROC) for ML models predicting MACEs was 0.7879 in the training set and 0.7981 in the testing set. Logistic regression (LR), the most commonly used algorithm, achieved AUROCs of 0.8229 in the testing group and 0.7983 in the training group. Non-LR models yielded AUROCs of 0.7390 in the testing set and 0.7648 in the training set, while random forest (RF) models reached an AUROC of 0.8444 in the training group. Study limitations included the small number of studies, high heterogeneity, and the types of included studies. The performance of ML models for predicting MACEs was superior to that of general models based on basic feature extraction and integration from CCTA. Specifically, LR-based ML diagnostic models demonstrated significant clinical potential, particularly when combined with clinical features, and merit further validation in clinical trials. PROSPERO CRD42024596364; https://www.crd.york.ac.uk/PROSPERO/view/CRD42024596364.
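The heterogeneity statistics used here (Cochran's Q, I²) follow from inverse-variance pooling; a fixed-effect sketch with hypothetical per-study AUROCs and variances (not the review's actual data):

```python
import numpy as np

def cochran_q_i2(effects, variances):
    """Cochran's Q and the I^2 heterogeneity statistic for a
    fixed-effect meta-analysis of per-study effect sizes."""
    effects = np.asarray(effects, float)
    w = 1.0 / np.asarray(variances, float)   # inverse-variance weights
    pooled = float(np.sum(w * effects) / np.sum(w))
    q = float(np.sum(w * (effects - pooled) ** 2))
    df = len(effects) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return pooled, q, i2

# Hypothetical per-study AUROCs and their variances
pooled, q, i2 = cochran_q_i2([0.78, 0.82, 0.74, 0.88],
                             [0.002, 0.003, 0.002, 0.004])
print(round(pooled, 3), round(i2, 1))
```

An I² well above 50% is the kind of result that motivates the subgroup analysis by model family (LR vs non-LR) that the review performed.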

Deep-Learning Based Contrast Boosting Improves Lesion Visualization and Image Quality: A Multi-Center Multi-Reader Study on Clinical Performance with Standard Contrast Enhanced MRI of Brain Tumors

Pasumarthi, S., Campbell Arnold, T., Colombo, S., Rudie, J. D., Andre, J. B., Elor, R., Gulaka, P., Shankaranarayanan, A., Erb, G., Zaharchuk, G.

medRxiv preprint · Jun 13 2025
Background: Gadolinium-based Contrast Agents (GBCAs) are used in brain MRI exams to improve the visualization of pathology and the delineation of lesions. Higher doses of GBCAs can improve lesion sensitivity but involve substantial deviation from standard-of-care procedures and may have safety implications, particularly in light of recent findings on gadolinium retention and deposition.
Purpose: To evaluate the clinical performance of an FDA-cleared deep-learning (DL) based contrast boosting algorithm in routine clinical brain MRI exams.
Methods: A multi-center retrospective database of contrast-enhanced brain MRI images (obtained from April 2017 to December 2023) was used to evaluate a DL-based contrast boosting algorithm. Pre-contrast and standard post-contrast (SC) images were processed with the algorithm to obtain contrast-boosted (CB) images. Quantitative performance of CB images in comparison to SC images was assessed using contrast-to-noise ratio (CNR), lesion-to-brain ratio (LBR), and contrast enhancement percentage (CEP). Three board-certified radiologists reviewed CB and SC images side-by-side for qualitative evaluation and rated them on a 4-point Likert scale for lesion contrast enhancement, border delineation, internal morphology, overall image quality, presence of artefacts, and changes in vessel conspicuity. The presence, cause, and severity of any false lesions were recorded. CB results were compared to SC using the Wilcoxon signed-rank test for statistical significance.
Results: Brain MRI images from 110 patients (47 ± 22 years; 52 females, 47 males, 11 N/A) were evaluated. CB images had superior quantitative performance to SC images in terms of CNR (+634%), LBR (+70%), and CEP (+150%). In the qualitative assessment, CB images showed better lesion visualization (3.73 vs 3.16) and better image quality (3.55 vs 3.07). Readers were able to rule out all false lesions on CB by using SC for comparison.
Conclusions: Deep-learning based contrast boosting improves lesion visualization and image quality without increasing contrast dosage.
Key Results:
- In a retrospective study of 110 patients, deep-learning based contrast-boosted (CB) images showed better lesion visualization than standard post-contrast (SC) brain MRI images (3.73 vs 3.16; mean reader scores on a 4-point Likert scale).
- CB images had better overall image quality than SC images (3.55 vs 3.07).
- Contrast-to-noise ratio, lesion-to-brain ratio, and contrast enhancement percentage for CB images were significantly higher than for SC images (+729%, +88%, and +165%; p < 0.001).
Summary Statement: Deep-learning based contrast boosting achieves better lesion visualization and overall image quality and provides more contrast information, without increasing the contrast dosage in contrast-enhanced brain MR protocols.
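The reader-score comparison relies on the Wilcoxon signed-rank test for paired ordinal ratings; a sketch with hypothetical Likert scores (not the study's data):

```python
from scipy.stats import wilcoxon

# Hypothetical paired 4-point Likert scores for the same lesions read on
# standard-contrast (SC) and contrast-boosted (CB) images
sc = [3, 3, 2, 4, 3, 3, 2, 3, 4, 3, 2, 3]
cb = [4, 4, 3, 4, 4, 3, 3, 4, 4, 4, 3, 4]
stat, p = wilcoxon(cb, sc)  # paired, non-parametric
print(p < 0.05)  # → True
```

A non-parametric paired test is the right choice here because Likert ratings are ordinal and the same lesions are scored under both conditions.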

Radiomic Analysis of Molecular Magnetic Resonance Imaging of Aortic Atherosclerosis in Rabbits.

Lee H

PubMed · Jun 13 2025
Atherosclerosis involves not only the narrowing of blood vessels and plaque accumulation but also changes in plaque composition and stability, all of which are critical for disease progression. Conventional imaging techniques such as magnetic resonance angiography (MRA) and digital subtraction angiography (DSA) primarily assess luminal narrowing and plaque size, but have limited capability in identifying plaque instability and inflammation within the vascular muscle wall. This study aimed to develop and evaluate a novel imaging approach using ligand-modified nanomagnetic contrast (lmNMC) nanoprobes in combination with molecular magnetic resonance imaging (mMRI) to visualize and quantify vascular inflammation and plaque characteristics in a rabbit model of atherosclerosis. A rabbit model of atherosclerosis was established and underwent mMRI before and after administration of lmNMC nanoprobes. Radiomic features were extracted from segmented images using discrete wavelet transform (DWT) to assess spatial frequency changes and gray-level co-occurrence matrix (GLCM) analysis to evaluate textural properties. Further radiomic analysis was performed using neural network-based regression and clustering, including the application of self-organizing maps (SOMs) to validate the consistency of radiomic patterns between training and testing data. Radiomic analysis revealed significant changes in spatial frequency between pre- and post-contrast images in both the horizontal and vertical directions. GLCM analysis showed an increase in contrast from 0.08463 to 0.1021 and a slight decrease in homogeneity from 0.9593 to 0.9540. Energy values declined from 0.2256 to 0.2019, while correlation increased marginally from 0.9659 to 0.9708. Neural network regression demonstrated strong convergence between target and output coordinates.
Additionally, SOM clustering revealed consistent weight locations and neighbor distances across datasets, supporting the reliability of the radiomic validation. The integration of lmNMC nanoprobes with mMRI enables detailed visualization of atherosclerotic plaques and surrounding vascular inflammation in a preclinical model. This method shows promise for enhancing the characterization of unstable plaques and may facilitate early detection of high-risk atherosclerotic lesions, potentially improving diagnostic and therapeutic strategies.
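The GLCM texture features reported above (contrast, homogeneity, energy, correlation) can be computed from a normalized co-occurrence matrix; a minimal horizontal-offset sketch on a random image (the quantization scheme is an assumption):

```python
import numpy as np

def glcm_features(img, levels=8):
    """Gray-level co-occurrence matrix (horizontal neighbor, distance 1)
    and four Haralick-style texture features."""
    q = np.minimum((img * levels).astype(int), levels - 1)  # quantize [0,1] image
    glcm = np.zeros((levels, levels))
    for a, b in zip(q[:, :-1].ravel(), q[:, 1:].ravel()):
        glcm[a, b] += 1
    glcm /= glcm.sum()                                      # joint probabilities
    i, j = np.indices(glcm.shape)
    mu_i, mu_j = (i * glcm).sum(), (j * glcm).sum()
    sd_i = np.sqrt(((i - mu_i) ** 2 * glcm).sum())
    sd_j = np.sqrt(((j - mu_j) ** 2 * glcm).sum())
    return {
        "contrast": float(((i - j) ** 2 * glcm).sum()),
        "homogeneity": float((glcm / (1.0 + np.abs(i - j))).sum()),
        "energy": float((glcm ** 2).sum()),
        "correlation": float(((i - mu_i) * (j - mu_j) * glcm).sum() / (sd_i * sd_j)),
    }

feats = glcm_features(np.random.default_rng(0).random((32, 32)))
print(sorted(feats))
```

Rising contrast with falling homogeneity and energy, as reported after nanoprobe administration, indicates a coarser, less uniform texture in the enhanced plaque regions.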
