Page 29 of 91907 results

Placenta segmentation redefined: review of deep learning integration of magnetic resonance imaging and ultrasound imaging.

Jittou A, Fazazy KE, Riffi J

PubMed · Jul 15 2025
Placental segmentation is critical for the quantitative analysis of prenatal imaging applications. However, segmenting the placenta using magnetic resonance imaging (MRI) and ultrasound is challenging because of variations in fetal position, dynamic placental development, and image quality. Most segmentation methods define regions of interest with different shapes and intensities, encompassing the entire placenta or specific structures. Recently, deep learning has emerged as a key approach that offers high segmentation performance across diverse datasets. This review focuses on recent advances in deep learning techniques for placental segmentation in medical imaging, specifically MRI and ultrasound modalities, and covers studies from 2019 to 2024. This review synthesizes recent research, expands knowledge in this innovative area, and highlights the potential of deep learning approaches to significantly enhance prenatal diagnostics. These findings emphasize the importance of selecting appropriate imaging modalities and model architectures tailored to specific clinical scenarios. In addition, integrating both MRI and ultrasound can enhance segmentation performance by leveraging complementary information. This review also discusses the challenges associated with the high costs and limited availability of advanced imaging technologies. It provides insights into the current state of placental segmentation techniques and their implications for improving maternal and fetal health outcomes, underscoring the transformative impact of deep learning on prenatal diagnostics.

An interpretable machine learning model for predicting bone marrow invasion in patients with lymphoma via <sup>18</sup>F-FDG PET/CT: a multicenter study.

Zhu X, Lu D, Wu Y, Lu Y, He L, Deng Y, Mu X, Fu W

PubMed · Jul 15 2025
Accurate identification of bone marrow invasion (BMI) is critical for determining the prognosis of and treatment strategies for lymphoma. Although bone marrow biopsy (BMB) is the current gold standard, its invasive nature and sampling errors highlight the necessity for noninvasive alternatives. We aimed to develop and validate an interpretable machine learning model that integrates clinical data, <sup>18</sup>F-fluorodeoxyglucose positron emission tomography/computed tomography (<sup>18</sup>F-FDG PET/CT) parameters, radiomic features, and deep learning features to predict BMI in lymphoma patients. We included 159 newly diagnosed lymphoma patients (118 from Center I and 41 from Center II), excluding those with prior treatments, incomplete data, or age under 18 years. Data from Center I were randomly allocated to training (n = 94) and internal test (n = 24) sets; Center II served as an external validation set (n = 41). Clinical parameters, PET/CT features, radiomic characteristics, and deep learning features were comprehensively analyzed and integrated into machine learning models. Model interpretability was elucidated via SHapley Additive exPlanations (SHAP). Additionally, a comparative diagnostic study evaluated reader performance with and without model assistance. BMI was confirmed in 70 (44%) patients. The key clinical predictors included B symptoms and platelet count. Among the tested models, the ExtraTrees classifier achieved the best performance. For external validation, the combined model (clinical + PET/CT + radiomics + deep learning) achieved an area under the receiver operating characteristic curve (AUC) of 0.886, outperforming models that use only clinical (AUC 0.798), radiomic (AUC 0.708), or deep learning features (AUC 0.662). SHAP analysis revealed that PET radiomic features (especially PET_lbp_3D_m1_glcm_DependenceEntropy), platelet count, and B symptoms were significant predictors of BMI.
Model assistance significantly enhanced junior reader performance (AUC improved from 0.663 to 0.818, p = 0.03) and improved senior reader accuracy, although not significantly (AUC 0.768 to 0.867, p = 0.10). Our interpretable machine learning model, which integrates clinical, imaging, radiomic, and deep learning features, demonstrated robust BMI prediction performance and notably enhanced physician diagnostic accuracy. These findings underscore the clinical potential of interpretable AI to complement medical expertise and potentially reduce the reliance on invasive BMB for lymphoma staging.
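
The model comparisons above hinge on the area under the ROC curve. As a minimal illustration (not the study's code, and with made-up labels and scores), AUC can be computed directly via the Mann-Whitney formulation: the probability that a randomly chosen positive case outranks a randomly chosen negative one.

```python
def roc_auc(labels, scores):
    """AUC via the Mann-Whitney U statistic: the probability that a
    randomly chosen positive case scores above a randomly chosen
    negative case (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical predicted probabilities for BMI-positive (1) / negative (0) cases
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_score = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.1, 0.7]
print(round(roc_auc(y_true, y_score), 4))  # 0.9375
```

A perfectly separating score list yields an AUC of exactly 1.0; random scores hover around 0.5.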

Motion artifacts and image quality in stroke MRI: associated factors and impact on AI and human diagnostic accuracy.

Krag CH, Müller FC, Gandrup KL, Andersen MB, Møller JM, Liu ML, Rud A, Krabbe S, Al-Farra L, Nielsen M, Kruuse C, Boesen MP

PubMed · Jul 15 2025
To assess the prevalence of motion artifacts and the factors associated with them in a cohort of suspected stroke patients, and to determine their impact on diagnostic accuracy for both AI and radiologists. This retrospective cross-sectional study included brain MRI scans of consecutive adult suspected stroke patients from a non-comprehensive Danish stroke center between January and April 2020. An expert neuroradiologist identified acute ischemic, hemorrhagic, and space-occupying lesions as references. Two blinded radiology residents rated MRI image quality and motion artifacts. The diagnostic accuracy of a CE-marked deep learning tool was compared to that of radiology reports. Multivariate analysis examined associations between patient characteristics and motion artifacts. 775 patients (68 years ± 16, 420 female) were included. Acute ischemic, hemorrhagic, and space-occupying lesions were found in 216 (27.9%), 12 (1.5%), and 20 (2.6%) patients, respectively. Motion artifacts were present in 57 patients (7.4%). Increasing age (OR per decade, 1.60; 95% CI: 1.26, 2.09; p < 0.001) and limb motor symptoms (OR, 2.36; 95% CI: 1.32, 4.20; p = 0.003) were independently associated with motion artifacts in multivariate analysis. Motion artifacts significantly reduced the accuracy of detecting hemorrhage. This reduction was greater for the AI tool (from 88 to 67%; p < 0.001) than for radiology reports (from 100 to 93%; p < 0.001). Ischemic and space-occupying lesion detection was not significantly affected. Motion artifacts are common in suspected stroke patients, particularly in the elderly and patients with motor symptoms, reducing accuracy for hemorrhage detection by both AI and radiologists. Question Motion artifacts reduce the quality of MRI scans, but it is unclear which factors are associated with them and how they impact diagnostic accuracy.
Findings Motion artifacts occurred in 7% of suspected stroke MRI scans, associated with higher patient age and motor symptoms, lowering hemorrhage detection by AI and radiologists. Clinical relevance Motion artifacts in stroke brain MRIs significantly reduce the diagnostic accuracy of human and AI detection of intracranial hemorrhages. Elderly patients and those with motor symptoms may benefit from a greater focus on motion artifact prevention and reduction.
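
A note on reading the age effect above: an odds ratio reported "per decade" compounds multiplicatively over larger age gaps. A small arithmetic sketch using the study's reported OR of 1.60 per decade (the 20-year figure is an illustration, not a result from the paper):

```python
import math

or_per_decade = 1.60  # reported OR for motion artifacts per decade of age
beta_per_year = math.log(or_per_decade) / 10  # log-odds increment per year

# Implied odds ratio for a 20-year age difference: 1.60^2
or_20_years = math.exp(beta_per_year * 20)
print(round(or_20_years, 2))  # 2.56
```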

Poincaré guided geometric UNet for left atrial epicardial adipose tissue segmentation in Dixon MRI images.

Firouznia M, Ylipää E, Henningsson M, Carlhäll CJ

PubMed · Jul 15 2025
Epicardial Adipose Tissue (EAT) is a recognized risk factor for cardiovascular diseases and plays a pivotal role in the pathophysiology of Atrial Fibrillation (AF). Accurate automatic segmentation of the EAT around the Left Atrium (LA) from Magnetic Resonance Imaging (MRI) data remains challenging. While Convolutional Neural Networks excel at multi-scale feature extraction using stacked convolutions, they struggle to capture long-range self-similarity and hierarchical relationships, which are essential in medical image segmentation. In this study, we present and validate PoinUNet, a deep learning model that integrates a Poincaré embedding layer into a 3D UNet to enhance LA wall and fat segmentation from Dixon MRI data. By using hyperbolic space learning, PoinUNet captures complex LA and EAT relationships and addresses class imbalance and fat geometry challenges using a new loss function. Sixty-six participants, including forty-eight AF patients, were scanned at 1.5T. The first network identified fat regions, while the second utilized Poincaré embeddings and convolutional layers for precise segmentation, enhanced by fat fraction maps. PoinUNet achieved a Dice Similarity Coefficient of 0.87 and a Hausdorff distance of 9.42 on the test set. This performance surpasses state-of-the-art methods, providing accurate quantification of the LA wall and LA EAT.
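
Two quantities underpin the results above: the Dice Similarity Coefficient used for evaluation, and geodesic distances in the Poincaré ball that hyperbolic embeddings rely on. A minimal sketch of both with toy inputs (not the PoinUNet implementation):

```python
import math

def dice(a, b):
    """Dice Similarity Coefficient between two flat binary masks."""
    inter = sum(x & y for x, y in zip(a, b))
    return 2 * inter / (sum(a) + sum(b))

def poincare_distance(u, v):
    """Geodesic distance between two points inside the unit Poincaré ball."""
    sq_norm = lambda p: sum(x * x for x in p)
    diff = sq_norm([a - b for a, b in zip(u, v)])
    arg = 1 + 2 * diff / ((1 - sq_norm(u)) * (1 - sq_norm(v)))
    return math.acosh(arg)

print(round(dice([1, 1, 1, 0, 0], [1, 1, 0, 0, 1]), 3))     # 0.667
print(round(poincare_distance([0.0, 0.0], [0.5, 0.0]), 4))  # 1.0986
```

Distances from the origin grow like 2·atanh(r), so points near the ball's boundary are exponentially far apart, which is what makes hyperbolic space suited to hierarchical relationships.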

Exploring the robustness of TractOracle methods in RL-based tractography

Jeremi Levesque, Antoine Théberge, Maxime Descoteaux, Pierre-Marc Jodoin

arXiv preprint · Jul 15 2025
Tractography algorithms leverage diffusion MRI to reconstruct the fibrous architecture of the brain's white matter. Among machine learning approaches, reinforcement learning (RL) has emerged as a promising framework for tractography, outperforming traditional methods in several key aspects. TractOracle-RL, a recent RL-based approach, reduces false positives by incorporating anatomical priors into the training process via a reward-based mechanism. In this paper, we investigate four extensions of the original TractOracle-RL framework by integrating recent advances in RL, and we evaluate their performance across five diverse diffusion MRI datasets. Results demonstrate that combining an oracle with the RL framework consistently leads to robust and reliable tractography, regardless of the specific method or dataset used. We also introduce a novel RL training scheme called Iterative Reward Training (IRT), inspired by the Reinforcement Learning from Human Feedback (RLHF) paradigm. Instead of relying on human input, IRT leverages bundle filtering methods to iteratively refine the oracle's guidance throughout training. Experimental results show that RL methods trained with oracle feedback significantly outperform widely used tractography techniques in terms of accuracy and anatomical validity.

Evaluation of Artificial Intelligence-based diagnosis for facial fractures, advantages compared with conventional imaging diagnosis: a systematic review and meta-analysis.

Ju J, Qu Z, Qing H, Ding Y, Peng L

PubMed · Jul 15 2025
Currently, the application of convolutional neural networks (CNNs) in artificial intelligence (AI) for medical imaging diagnosis has emerged as a highly promising tool. In particular, AI-assisted diagnosis holds significant potential for orthopedic and emergency department physicians by improving diagnostic efficiency and enhancing the overall patient experience. This systematic review and meta-analysis aims to assess the application of AI in diagnosing facial fractures and to evaluate its diagnostic performance. This study adhered to the guidelines of the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) and PRISMA-Diagnostic Test Accuracy (PRISMA-DTA). A comprehensive literature search was conducted in the PubMed, Cochrane Library, and Web of Science databases to identify original articles published up to December 2024. The risk of bias and applicability of the included studies were assessed using the QUADAS-2 tool. The results were analyzed using a Summary Receiver Operating Characteristic (SROC) curve. A total of 16 studies were included in the analysis, with contingency tables extracted from 11 of them. The pooled sensitivity was 0.889 (95% CI: 0.844-0.922), and the pooled specificity was 0.888 (95% CI: 0.834-0.926). The area under the SROC curve was 0.911. In the subgroup analysis of nasal and mandibular fractures, the pooled sensitivity for nasal fractures was 0.851 (95% CI: 0.806-0.887), and the pooled specificity was 0.883 (95% CI: 0.862-0.902). For mandibular fractures, the pooled sensitivity was 0.905 (95% CI: 0.836-0.947), and the pooled specificity was 0.895 (95% CI: 0.824-0.940). AI can be developed as an auxiliary tool to assist clinicians in diagnosing facial fractures. The results demonstrate high overall sensitivity and specificity, along with a robust performance reflected by the high area under the SROC curve.
This study has been prospectively registered on PROSPERO (ID: CRD42024618650; registration date: 10 Dec 2024). https://www.crd.york.ac.uk/PROSPERO/view/CRD42024618650
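
For readers unfamiliar with how pooled estimates arise from per-study contingency tables, here is a deliberately naive fixed-pooling sketch with hypothetical counts (a real DTA meta-analysis like this one would fit a bivariate random-effects model, which weights studies differently):

```python
# Hypothetical per-study contingency tables: (TP, FN, FP, TN)
studies = [(45, 5, 8, 42), (30, 4, 6, 60), (88, 12, 10, 90)]

tp = sum(s[0] for s in studies)
fn = sum(s[1] for s in studies)
fp = sum(s[2] for s in studies)
tn = sum(s[3] for s in studies)

sensitivity = tp / (tp + fn)  # fracture cases correctly flagged
specificity = tn / (tn + fp)  # non-fracture cases correctly cleared
print(round(sensitivity, 3), round(specificity, 3))  # 0.886 0.889
```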

Enhancing breast positioning quality through real-time AI feedback.

Sexauer R, Riehle F, Borkowski K, Ruppert C, Potthast S, Schmidt N

PubMed · Jul 15 2025
To enhance mammography quality and increase cancer detection, we implemented continuous AI-driven feedback mechanisms, ensuring reliable, consistent, and high-quality screening according to the 'Perfect', 'Good', 'Moderate', and 'Inadequate' (PGMI) criteria. To assess the impact of the AI software 'b-box<sup>TM</sup>' on mammography quality, we conducted a comparative analysis of PGMI scores. We evaluated scores 50 days before (A) and after the software's implementation in 2021 (B), along with assessments made in the first week of August 2022 (C1) and 2023 (C2), comparing them to evaluations conducted by two readers. Except for postsurgical patients, we included all diagnostic and screening mammograms from one tertiary hospital. A total of 4577 mammograms from 1220 women (mean age: 59, range: 21-94, standard deviation: 11.18) were included. 1728 images were obtained before (A) and 2330 images after the 2021 software implementation (B), along with 269 images in 2022 (C1) and 250 images in 2023 (C2). The results indicated a significant improvement in diagnostic image quality (p < 0.01). The percentage of 'Perfect' examinations rose from 22.34% to 32.27%, while 'Inadequate' images decreased from 13.31% to 5.41% in 2021, continuing the positive trend with 4.46% and 3.20% 'Inadequate' images in 2022 and 2023, respectively (p < 0.01). Using a reliable software platform to perform AI-driven quality evaluation in real-time has the potential to make lasting improvements in image quality, support radiographers' professional growth, and elevate institutional quality standards and documentation simultaneously. Question How can AI-powered quality assessment reduce inadequate mammographic quality, which is known to impact sensitivity and increase the risk of interval cancers? Findings AI implementation decreased 'Inadequate' mammograms from 13.31% to 3.20% and substantially improved parenchyma visualization, with consistent subgroup trends.
Clinical relevance By reducing 'inadequate' mammograms and enhancing imaging quality, AI-driven tools improve diagnostic reliability and support better outcomes in breast cancer screening.
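
The drop in 'Inadequate' rates reported above can be checked with a standard two-proportion z-test. A sketch using counts reconstructed from the reported percentages and cohort sizes (approximate, so labeled as such; not the study's analysis code):

```python
import math

def two_prop_z(x1, n1, x2, n2):
    """z statistic for the difference between two proportions (pooled SE)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

# Approximate counts: 13.31% of 1728 pre-implementation images vs.
# 5.41% of 2330 post-implementation images rated 'Inadequate'
z = two_prop_z(230, 1728, 126, 2330)
print(z > 2.576)  # beyond the two-sided 1% threshold, consistent with p < 0.01
```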

Deep-learning reconstruction for noise reduction in respiratory-triggered single-shot phase sensitive inversion recovery myocardial delayed enhancement cardiac magnetic resonance.

Tang M, Wang H, Wang S, Wali E, Gutbrod J, Singh A, Landeras L, Janich MA, Mor-Avi V, Patel AR, Patel H

PubMed · Jul 14 2025
Phase-sensitive inversion recovery late gadolinium enhancement (LGE) improves tissue contrast; however, it is challenging to combine with a free-breathing acquisition. Deep-learning (DL) algorithms have growing applications in cardiac magnetic resonance imaging (CMR) to improve image quality. We compared a novel combination of a free-breathing single-shot phase-sensitive LGE with respiratory triggering (FB-PS) sequence with DL noise reduction reconstruction algorithm to a conventional segmented phase-sensitive LGE acquired during breath holding (BH-PS). 61 adult subjects (29 male, age 51 ± 15) underwent clinical CMR (1.5 T) with the FB-PS sequence and the conventional BH-PS sequence. DL noise reduction was incorporated into the image reconstruction pipeline. Qualitative metrics included image quality, artifact severity, and diagnostic confidence. Quantitative metrics included septal-blood border sharpness, LGE sharpness, blood-myocardium apparent contrast-to-noise ratio (CNR), LGE-myocardium CNR, LGE apparent signal-to-noise ratio (SNR), and LGE burden. The sequences were compared via paired t-tests. 27 subjects had positive LGE. Average time to acquire a slice for FB-PS was 4-12 s versus ~32-38 s for BH-PS (including breath instructions and break time between breath holds). FB-PS with medium DL noise reduction had better image quality (FB-PS 3.0 ± 0.7 vs. BH-PS 1.5 ± 0.6, p < 0.0001), less artifact (4.8 ± 0.5 vs. 3.4 ± 1.1, p < 0.0001), and higher diagnostic confidence (4.0 ± 0.6 vs. 2.6 ± 0.8, p < 0.0001). Septum sharpness in FB-PS with DL reconstruction versus BH-PS was not significantly different. There was no significant difference in LGE sharpness or LGE burden. FB-PS had superior blood-myocardium CNR (17.2 ± 6.9 vs. 16.4 ± 6.0, p = 0.040), LGE-myocardium CNR (12.1 ± 7.2 vs. 10.4 ± 6.6, p = 0.054), and LGE SNR (59.8 ± 26.8 vs. 31.2 ± 24.1, p < 0.001); these metrics further improved with DL noise reduction.
A FB-PS sequence shortens scan time by over 5-fold and reduces motion artifact. Combined with a DL noise reduction algorithm, FB-PS provides better or similar image quality compared to BH-PS. This is a promising solution for patients who cannot hold their breath.
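
The apparent SNR and CNR figures above follow standard ROI-based definitions. A minimal sketch with hypothetical pixel values (not the study's measurement code):

```python
import statistics

def apparent_snr(signal_roi, noise_roi):
    """Mean signal in a tissue ROI over the SD of a background/noise ROI."""
    return statistics.mean(signal_roi) / statistics.stdev(noise_roi)

def apparent_cnr(roi_a, roi_b, noise_roi):
    """Contrast between two tissues (e.g., LGE vs. remote myocardium),
    normalized by the noise ROI's SD."""
    contrast = abs(statistics.mean(roi_a) - statistics.mean(roi_b))
    return contrast / statistics.stdev(noise_roi)

lge = [50, 52, 48, 50]          # hypothetical LGE pixel intensities
myocardium = [28, 32, 30, 30]   # hypothetical remote myocardium
background = [0, 2, 4, 6, 8]    # hypothetical air/background ROI

print(round(apparent_snr(lge, background), 2))              # 15.81
print(round(apparent_cnr(lge, myocardium, background), 2))  # 6.32
```

"Apparent" flags that parallel-imaging and DL reconstructions make the noise spatially non-uniform, so these ROI-based ratios are comparative rather than absolute measurements.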

Comparing large language models and text embedding models for automated classification of textual, semantic, and critical changes in radiology reports.

Lindholz M, Burdenski A, Ruppel R, Schulze-Weddige S, Baumgärtner GL, Schobert I, Haack AM, Eminovic S, Milnik A, Hamm CA, Frisch A, Penzkofer T

PubMed · Jul 14 2025
Radiology reports can change during workflows, especially when residents draft preliminary versions that attending physicians finalize. We explored how large language models (LLMs) and embedding techniques can categorize these changes into textual, semantic, or clinically actionable types. We evaluated 400 adult CT reports drafted by residents against finalized versions by attending physicians. Changes were rated on a five-point scale from no changes to critical ones. We examined open-source LLMs alongside traditional metrics like normalized word differences, Levenshtein and Jaccard similarity, and text embedding similarity. Model performance was assessed using quadratic weighted Cohen's kappa (κ), (balanced) accuracy, F<sub>1</sub>, precision, and recall. Inter-rater reliability among evaluators was excellent (κ = 0.990). Of the reports analyzed, 1.3% contained critical changes. The tested methods showed significant performance differences (P < 0.001). The Qwen3-235B-A22B model with a zero-shot prompt aligned most closely with human assessments of changes in clinical reports, achieving a κ of 0.822 (SD 0.031). The best conventional metric, word difference, had a κ of 0.732 (SD 0.048); the difference between the two showed statistical significance in unadjusted post-hoc tests (P = 0.038) but lost significance after adjusting for multiple testing (P = 0.064). Embedding models underperformed compared to LLMs and classical methods, showing statistical significance in most cases. Large language models like Qwen3-235B-A22B demonstrated moderate to strong alignment with expert evaluations of the clinical significance of changes in radiology reports. LLMs outperformed embedding methods and traditional string and word approaches, achieving statistical significance in most instances. This demonstrates their potential as tools to support peer review.
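
Among the conventional baselines mentioned above, Levenshtein distance and word-level Jaccard similarity are simple to state precisely. A minimal sketch (the report snippets are illustrative, not study data):

```python
def levenshtein(a, b):
    """Minimum number of single-character edits turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

def jaccard_words(a, b):
    """Word-set overlap between two texts."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb)

draft = "No acute intracranial hemorrhage"
final = "No acute intracranial abnormality"
print(levenshtein("kitten", "sitting"))      # 3
print(jaccard_words(draft, final))           # 0.6
```

String metrics like these capture surface edits well but are blind to meaning, which is exactly the gap the LLM-based ratings in the study are meant to close.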

Explainable AI for Precision Oncology: A Task-Specific Approach Using Imaging, Multi-omics, and Clinical Data

Park, Y., Park, S., Bae, E.

medRxiv preprint · Jul 14 2025
Despite continued advances in oncology, cancer remains a leading cause of global mortality, highlighting the need for diagnostic and prognostic tools that are both accurate and interpretable. Unimodal approaches often fail to capture the biological and clinical complexity of tumors. In this study, we present a suite of task-specific AI models that leverage CT imaging, multi-omics profiles, and structured clinical data to address distinct challenges in segmentation, classification, and prognosis. We developed three independent models across large public datasets. Task 1 applied a 3D U-Net to segment pancreatic tumors from CT scans, achieving a Dice Similarity Coefficient (DSC) of 0.7062. Task 2 employed a hierarchical ensemble of omics-based classifiers to distinguish tumor from normal tissue and classify six major cancer types with 98.67% accuracy. Task 3 benchmarked classical machine learning models on clinical data for prognosis prediction across three cancers (LIHC, KIRC, STAD), achieving strong performance (e.g., C-index of 0.820 in KIRC, AUC of 0.978 in LIHC). Across all tasks, explainable AI methods such as SHAP and attention-based visualization enabled transparent interpretation of model outputs. These results demonstrate the value of tailored, modality-aware models and underscore the clinical potential of such systems for precision oncology.
Technical Foundations:
Segmentation (Task 1): A custom 3D U-Net was trained using the Task07_Pancreas dataset from the Medical Segmentation Decathlon (MSD). CT images were preprocessed with MONAI-based pipelines, resampled to (64, 96, 96) voxels, and intensity-windowed to HU ranges of -100 to 240.
Classification (Task 2): Multi-omics data from TCGA (gene expression, methylation, miRNA, CNV, and mutation profiles) were log-transformed and normalized. Five modality-specific LightGBM classifiers generated meta-features for a late-fusion ensemble. Stratified 5-fold cross-validation was used for evaluation.
Prognosis (Task 3): Clinical variables from TCGA were curated and imputed (median/mode), with high-missing-rate columns removed. Survival models (e.g., Cox-PH, Random Forest, XGBoost) were trained with early stopping. No omics or imaging data were used in this task.
Interpretability: SHAP values were computed for all tree-based models, and attention-based overlays were used in imaging tasks to visualize salient regions.
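
The concordance index (C-index) reported for the prognosis task measures how often a model ranks patient risks consistently with observed survival. A simplified Harrell's C sketch for toy data (real use must also handle censoring, which libraries like lifelines do):

```python
def c_index(times, events, risks):
    """Harrell's concordance index: among comparable pairs, the fraction
    where the higher predicted risk belongs to the patient whose event
    occurred earlier (ties in predicted risk count as half)."""
    concordant = comparable = 0.0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a pair is comparable only if the earlier time is an observed event
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1
                elif risks[i] == risks[j]:
                    concordant += 0.5
    return concordant / comparable

# Toy cohort: earlier events should carry higher predicted risk
times = [2, 4, 6, 8]
events = [1, 1, 1, 1]
risks = [0.9, 0.7, 0.2, 0.4]
print(c_index(times, events, risks))  # 5/6: one discordant pair (times 6 vs. 8)
```

A C-index of 0.5 corresponds to random ranking and 1.0 to perfect concordance, so the 0.820 reported for KIRC indicates a strongly discriminative model.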
