Latest Papers on Radiology AI. Tags: In Silico, Order: Best Match, Limit: 10.

Deep learning model based on ultrasound images predicts BRAF V600E mutation in papillary thyroid carcinoma.

Yu Y, Zhao C, Guo R, Zhang Y, Li X, Liu N, Lu Y, Han X, Tang X, Mao R, Peng C, Yu J, Zhou J

•papers•May 16 2025

BRAF V600E mutation status detection facilitates prognosis prediction in papillary thyroid carcinoma (PTC). We developed a deep-learning model to determine the BRAF V600E status in PTC. PTC cases from three centers were collected as the training set (1341 patients), validation set (148 patients), and external test set (135 patients). After testing the performance of the ResNeSt-50, Vision Transformer, and Swin Transformer V2 (SwinT) models, SwinT was chosen as the optimal backbone. An integrated BrafSwinT model was developed by combining the backbone with a radiomics feature branch and a clinical parameter branch. BrafSwinT demonstrated an AUC of 0.869 in the external test set, outperforming the original SwinT, Vision Transformer, and ResNeSt-50 models (AUC: 0.782-0.824; p value: 0.017-0.041). BrafSwinT showed promising results in determining BRAF V600E mutation status in PTC based on routinely acquired ultrasound images and basic clinical information, thus facilitating risk stratification.
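As a reading aid for the AUC figures quoted above, here is a minimal sketch of how an AUC can be computed from binary labels and model scores using the pairwise-ranking definition. The function name `roc_auc` and the toy inputs are illustrative assumptions; this is not the evaluation code used in the study.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    Hypothetical helper for illustration only.
    """
    positives = [s for l, s in zip(labels, scores) if l == 1]
    negatives = [s for l, s in zip(labels, scores) if l == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Toy example: 1 = mutation present, 0 = absent (made-up scores).
auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

The pairwise form is quadratic in the number of cases, which is acceptable for a test set of a few hundred patients; a rank-based formulation would be preferable for larger sets.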

Ultrasound Classification Retrospective Clinical In Silico Academic Lab

Impact of test set composition on AI performance in pediatric wrist fracture detection in X-rays.

Till T, Scherkl M, Stranger N, Singer G, Hankel S, Flucher C, Hržić F, Štajduhar I, Tschauner S

•papers•May 16 2025

To evaluate how different test set sampling strategies (random selection and balanced sampling) affect the performance of artificial intelligence (AI) models in pediatric wrist fracture detection using radiographs, aiming to highlight the need for standardization in test set design. This retrospective study utilized the open-sourced GRAZPEDWRI-DX dataset of 6091 pediatric wrist radiographs. Two test sets, each containing 4588 images, were constructed: one using a balanced approach based on case difficulty, projection type, and fracture presence and the other a random selection. EfficientNet and YOLOv11 models were trained and validated on 18,762 radiographs and tested on both sets. Binary classification and object detection tasks were evaluated using metrics such as precision, recall, F1 score, AP50, and AP50-95. Statistical comparisons between test sets were performed using nonparametric tests. Performance metrics significantly decreased in the balanced test set with more challenging cases. For example, the precision for YOLOv11 models decreased from 0.95 in the random set to 0.83 in the balanced set. Similar trends were observed for recall, accuracy, and F1 score, indicating that models trained on easy-to-recognize cases performed poorly on more complex ones. These results were consistent across all model variants tested. AI models for pediatric wrist fracture detection exhibit reduced performance when tested on balanced datasets containing more difficult cases, compared to randomly selected cases. This highlights the importance of constructing representative and standardized test sets that account for clinical complexity to ensure robust AI performance in real-world settings.
Question: Do different sampling strategies based on sample complexity influence the performance of deep learning models in fracture detection? Findings: AI performance in pediatric wrist fracture detection significantly drops when tested on balanced datasets with more challenging cases, compared to randomly selected cases. Clinical relevance: Without standardized and validated test datasets for AI that reflect clinical complexities, performance metrics may be overestimated, limiting the utility of AI in real-world settings.
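The contrast between balanced and random test sets can be illustrated with a small sketch: one helper draws the same number of cases from every stratum, the other computes precision, recall, and F1 from binary predictions. The names `balanced_sample` and `precision_recall_f1`, the strata, and the group sizes are assumptions for illustration, not the sampling procedure used in the study.

```python
import random
from collections import defaultdict


def balanced_sample(items, key, per_group, seed=0):
    """Draw `per_group` items from every stratum defined by key(item)."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for item in items:
        groups[key(item)].append(item)
    sample = []
    for name in sorted(groups):  # sorted for a reproducible order
        members = groups[name]
        if len(members) < per_group:
            raise ValueError(f"stratum {name!r} has only {len(members)} items")
        sample.extend(rng.sample(members, per_group))
    return sample


def precision_recall_f1(y_true, y_pred):
    """Binary precision, recall, and F1, with 1 as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy pool dominated by easy cases; a random draw would mirror that imbalance.
cases = [("easy", i) for i in range(90)] + [("hard", i) for i in range(10)]
balanced = balanced_sample(cases, key=lambda c: c[0], per_group=5)
```

A random draw from the same pool would contain roughly nine easy cases for every hard one, which is the mechanism by which a random test set can report higher metrics than a balanced one.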

X-Ray Detection Musculoskeletal Retrospective Clinical In Silico Academic Lab Open Dataset

Characterizing ASD Subtypes Using Morphological Features from sMRI with Unsupervised Learning.

Raj A, Ratnaik R, Sengar SS, Fredo ARJ

•papers•May 15 2025

In this study, we attempted to identify subtypes of autism spectrum disorder (ASD) using machine learning tools and the anatomical alterations found in structural magnetic resonance imaging (sMRI) data of the ASD brain. Initially, the sMRI data were preprocessed using the FreeSurfer toolbox. The brain was then segmented into 148 regions of interest using the Destrieux atlas. Features such as volume, thickness, surface area, and mean curvature were extracted for each brain region. We performed principal component analysis independently on the volume, thickness, surface area, and mean curvature features and identified the top 10 features. We then applied k-means clustering to these top 10 features and validated the number of clusters using the elbow and silhouette methods. Our study identified two clusters in the dataset, suggesting the existence of two subtypes in ASD. We identified features such as the volume of scaled lh_G_front middle, thickness of scaled rh_S_temporal transverse, area of scaled lh_S_temporal sup, and mean curvature of scaled lh_G_precentral as the features discriminating the two clusters with statistical significance (p<0.05). Thus, our proposed method is effective for the identification of ASD subtypes and can also be useful for the screening of other similar neurological disorders.
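The clustering and validation steps can be sketched as follows: plain k-means (Lloyd's algorithm) followed by a mean silhouette score for the resulting labels. This is a toy, standard-library illustration; the FreeSurfer preprocessing and PCA steps are omitted, and the function names, seed, and 2-D points are assumptions rather than the authors' pipeline.

```python
import math
import random


def kmeans(points, k, iters=50, seed=0):
    """Lloyd's algorithm; returns (labels, centers). Points are tuples."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points
        ]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                new_centers.append(
                    tuple(sum(x) / len(members) for x in zip(*members))
                )
            else:
                new_centers.append(centers[c])  # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers


def silhouette(points, labels):
    """Mean silhouette coefficient; singleton clusters contribute 0."""
    clusters = sorted(set(labels))
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    scores = []
    for i, p in enumerate(points):
        own = [
            math.dist(p, q)
            for j, q in enumerate(points)
            if labels[j] == labels[i] and j != i
        ]
        if not own:
            scores.append(0.0)
            continue
        a = sum(own) / len(own)  # mean distance within the point's cluster
        b = min(  # mean distance to the nearest other cluster
            sum(math.dist(p, q) for q, l in zip(points, labels) if l == c)
            / labels.count(c)
            for c in clusters
            if c != labels[i]
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


# Two well-separated toy groups standing in for the reduced feature space.
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centers = kmeans(pts, k=2)
score = silhouette(pts, labels)
```

In practice the silhouette score would be compared across several candidate values of k, with the highest score supporting the chosen number of clusters.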

MRI Classification Neurological Retrospective Clinical In Silico Academic Lab

Joint resting state and structural networks characterize pediatric bipolar patients compared to healthy controls: a multimodal fusion approach.

Yi X, Ma M, Wang X, Zhang J, Wu F, Huang H, Xiao Q, Xie A, Liu P, Grecucci A

•papers•May 15 2025

Pediatric bipolar disorder (PBD) is a highly debilitating condition, characterized by alternating episodes of mania and depression, with intervening periods of remission. Limited information is available about the functional and structural abnormalities in PBD, particularly when comparing type I with type II subtypes. Resting-state brain activity and structural grey matter, assessed through MRI, may provide insight into the neurobiological biomarkers of this disorder. In this study, resting-state regional homogeneity (ReHo) and grey matter concentration (GMC) data of 58 PBD patients and 21 healthy controls (HC) matched for age, gender, education, and IQ were analyzed with a data fusion unsupervised machine learning approach known as transposed Independent Vector Analysis. Two networks significantly differed between PBD and HC. The first network included fronto-medial regions, such as the medial and superior frontal gyrus and the cingulate, and displayed higher ReHo and GMC values in PBD compared to HC. The second network included temporo-posterior regions, as well as the insula, the caudate, and the precuneus, and displayed lower ReHo and GMC values in PBD compared to HC. Additionally, two networks differed between type I and type II PBD: an occipito-cerebellar network with increased ReHo and GMC in type I compared to type II, and a fronto-parietal network with decreased ReHo and GMC in type I compared to type II. Of note, the first network positively correlated with depression scores. These findings shed new light on the functional and structural abnormalities displayed by pediatric bipolar patients.

MRI Classification Neurological Retrospective Clinical In Silico Academic Lab

Measuring the severity of knee osteoarthritis with an aberration-free fast line scanning Raman imaging system.

Jiao C, Ye J, Liao J, Li J, Liang J, He S

•papers•May 15 2025

Osteoarthritis (OA) is a major cause of disability worldwide, with symptoms like joint pain, limited functionality, and decreased quality of life, potentially leading to deformity and irreversible damage. Chemical changes in joint tissues precede imaging alterations, making early diagnosis challenging for conventional methods like X-rays. Although Raman imaging provides detailed chemical information, it is time-consuming. This paper aims to achieve rapid osteoarthritis diagnosis and grading using a self-developed Raman imaging system combined with deep learning denoising and acceleration algorithms. Our self-developed aberration-corrected line-scanning confocal Raman imaging device acquires a line of Raman spectra (hundreds of points) per scan using a galvanometer or displacement stage, achieving spatial and spectral resolutions of 2 μm and 0.2 nm, respectively. Deep learning algorithms enhance the imaging speed by over 4 times through effective spectrum denoising and signal-to-noise ratio (SNR) improvement. By leveraging the denoising capabilities of deep learning, we are able to acquire high-quality Raman spectral data with a reduced integration time, thereby accelerating the imaging process. Experiments on the tibial plateau of osteoarthritis patients compared three excitation wavelengths (532, 671, and 785 nm), with 671 nm chosen for optimal SNR and minimal fluorescence. Machine learning algorithms achieved 98% accuracy in distinguishing articular from calcified cartilage and 97% accuracy in differentiating osteoarthritis grades I to IV. Our fast Raman imaging system, combining an aberration-corrected line-scanning confocal Raman imager with deep learning denoising, offers improved imaging speed and enhanced spectral and spatial resolutions. It enables rapid, label-free detection of osteoarthritis severity and can identify early compositional changes before clinical imaging, allowing precise grading and tailored treatment, thus advancing orthopedic diagnostics and improving patient outcomes.

OCT Classification Musculoskeletal Retrospective Clinical In Silico Academic Lab Breakthrough

Application of deep learning with fractal images to sparse-view CT.

Kawaguchi R, Minagawa T, Hori K, Hashimoto T

•papers•May 15 2025

Deep learning has been widely used in research on sparse-view computed tomography (CT) image reconstruction. While sufficient training data can lead to high accuracy, collecting medical images is often challenging due to legal or ethical concerns, making it necessary to develop methods that perform well with limited data. To address this issue, we explored the use of nonmedical images for pre-training. Therefore, in this study, we investigated whether fractal images could improve the quality of sparse-view CT images, even with a reduced number of medical images. Fractal images generated by an iterated function system (IFS) were used for nonmedical images, and medical images were obtained from the CHAOS dataset. Sinograms were then generated using 36 projections in sparse-view and the images were reconstructed by filtered back-projection (FBP). FBPConvNet and WNet (first module: learning fractal images, second module: testing medical images, and third module: learning output) were used as networks. The effectiveness of pre-training was then investigated for each network. The quality of the reconstructed images was evaluated using two indices: structural similarity (SSIM) and peak signal-to-noise ratio (PSNR). The network parameters pre-trained with fractal images showed reduced artifacts compared to the network trained exclusively with medical images, resulting in improved SSIM. WNet outperformed FBPConvNet in terms of PSNR. Pre-training WNet with fractal images produced the best image quality, and the number of medical images required for main-training was reduced from 5000 to 1000 (80% reduction). Using fractal images for network training can reduce the number of medical images required for artifact reduction in sparse-view CT. Therefore, fractal images can improve accuracy even with a limited amount of training data in deep learning.
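For reference, peak signal-to-noise ratio (one of the two quality indices named above) can be sketched in a few lines. The function name `psnr`, the flat-list image representation, and the `data_range` default of 1.0 are assumptions for illustration; the study's exact normalization is not stated in the abstract.

```python
import math


def psnr(reference, reconstruction, data_range=1.0):
    """Peak signal-to-noise ratio in dB between two equally sized images.

    Images are given as flat sequences of pixel values; `data_range` is the
    assumed maximum possible pixel value span (1.0 for normalized images).
    """
    if len(reference) != len(reconstruction) or not reference:
        raise ValueError("images must be non-empty and the same size")
    mse = sum((r - x) ** 2 for r, x in zip(reference, reconstruction)) / len(
        reference
    )
    if mse == 0:
        return math.inf  # identical images
    return 10 * math.log10(data_range**2 / mse)


# Toy two-pixel "images": each pixel is off by 0.1, so MSE is about 0.01.
value = psnr([0.0, 1.0], [0.1, 0.9])
```

Higher values indicate a reconstruction closer to the reference; SSIM, the other index, additionally accounts for local structure and is not covered by this sketch.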

CT Reconstruction Abdominal Methodology In Silico Academic Lab

Deep learning MRI-based radiomic models for predicting recurrence in locally advanced nasopharyngeal carcinoma after neoadjuvant chemoradiotherapy: a multi-center study.

Hu C, Xu C, Chen J, Huang Y, Meng Q, Lin Z, Huang X, Chen L

•papers•May 15 2025

Local recurrence and distant metastasis are common manifestations of locoregionally advanced nasopharyngeal carcinoma (LA-NPC) after neoadjuvant chemoradiotherapy (NACT). This study aimed to validate the clinical value of deep learning-based MRI radiomic models for predicting recurrence in LA-NPC patients. A total of 328 NPC patients from four hospitals were retrospectively included and randomly divided into training (n = 229) and validation (n = 99) cohorts. A total of 975 traditional radiomic features and 1000 deep radiomic features were extracted from contrast-enhanced T1-weighted (T1WI + C) and T2-weighted (T2WI) sequences, respectively. Least absolute shrinkage and selection operator (LASSO) was applied for feature selection. Five machine learning classifiers were used to develop three models for LA-NPC prediction in the training cohort: Model I (traditional radiomic features), Model II (Model I combined with deep radiomic features), and Model III (Model II combined with clinical features). The predictive performance of these models was evaluated by receiver operating characteristic (ROC) curve analysis, area under the curve (AUC), accuracy, sensitivity, and specificity in both cohorts. The clinical characteristics of the two cohorts showed no significant differences. Fifteen radiomic features and 6 deep radiomic features were selected from T1WI + C, and 9 radiomic features and 6 deep radiomic features were selected from T2WI. On T2WI, Model II based on random forest (RF) (AUC = 0.87) performed best among the models in the validation cohort. The traditional radiomic model combined with deep radiomic features showed excellent predictive performance and could assist clinicians in predicting the curative effect for LA-NPC patients after NACT.

MRI Classification Retrospective Clinical In Silico Academic Lab

CheXGenBench: A Unified Benchmark For Fidelity, Privacy and Utility of Synthetic Chest Radiographs

Raman Dutt, Pedro Sanchez, Yongchen Yao, Steven McDonagh, Sotirios A. Tsaftaris, Timothy Hospedales

•preprint•May 15 2025

We introduce CheXGenBench, a rigorous and multifaceted evaluation framework for synthetic chest radiograph generation that simultaneously assesses fidelity, privacy risks, and clinical utility across state-of-the-art text-to-image generative models. Despite rapid advancements in generative AI for real-world imagery, medical domain evaluations have been hindered by methodological inconsistencies, outdated architectural comparisons, and disconnected assessment criteria that rarely address the practical clinical value of synthetic samples. CheXGenBench overcomes these limitations through standardised data partitioning and a unified evaluation protocol comprising over 20 quantitative metrics that systematically analyse generation quality, potential privacy vulnerabilities, and downstream clinical applicability across 11 leading text-to-image architectures. Our results reveal critical inefficiencies in the existing evaluation protocols, particularly in assessing generative fidelity, leading to inconsistent and uninformative comparisons. Our framework establishes a standardised benchmark for the medical AI community, enabling objective and reproducible comparisons while facilitating seamless integration of both existing and future generative models. Additionally, we release a high-quality, synthetic dataset, SynthCheX-75K, comprising 75K radiographs generated by the top-performing model (Sana 0.6B) in our benchmark to support further research in this critical domain. Through CheXGenBench, we establish a new state-of-the-art and release our framework, models, and SynthCheX-75K dataset at https://raman1121.github.io/CheXGenBench/

X-Ray Image Synthesis Chest Dataset Release In Silico Academic Lab Open Dataset Open Code Benchmark SOTA

Data-Agnostic Augmentations for Unknown Variations: Out-of-Distribution Generalisation in MRI Segmentation

Puru Vaish, Felix Meister, Tobias Heimann, Christoph Brune, Jelmer M. Wolterink

•preprint•May 15 2025

Medical image segmentation models are often trained on curated datasets, leading to performance degradation when deployed in real-world clinical settings due to mismatches between training and test distributions. While data augmentation techniques are widely used to address these challenges, traditional visually consistent augmentation strategies lack the robustness needed for diverse real-world scenarios. In this work, we systematically evaluate alternative augmentation strategies, focusing on MixUp and Auxiliary Fourier Augmentation. These methods mitigate the effects of multiple variations without explicitly targeting specific sources of distribution shifts. We demonstrate how these techniques significantly improve out-of-distribution generalization and robustness to imaging variations across a wide range of transformations in cardiac cine MRI and prostate MRI segmentation. We quantitatively find that these augmentation methods enhance learned feature representations by promoting separability and compactness. Additionally, we highlight how their integration into nnU-Net training pipelines provides an easy-to-implement, effective solution for enhancing the reliability of medical segmentation models in real-world applications.
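MixUp, one of the two augmentations evaluated, blends pairs of training samples and their labels with a weight drawn from a Beta distribution. The sketch below shows the idea on flat lists. The function name `mixup`, the `alpha=0.2` default, and the toy inputs are assumptions; the preprint's nnU-Net integration and its Auxiliary Fourier Augmentation are not reproduced here.

```python
import random


def mixup(x1, y1, x2, y2, alpha=0.2, rng=None):
    """Blend two samples and their (one-hot or soft) labels.

    lam ~ Beta(alpha, alpha); returns (mixed_x, mixed_y, lam).
    Inputs are flat sequences of equal length per pair.
    """
    rng = rng or random.Random()
    lam = rng.betavariate(alpha, alpha)

    def blend(a, b):
        # Convex combination, applied element-wise.
        return [lam * u + (1 - lam) * v for u, v in zip(a, b)]

    return blend(x1, x2), blend(y1, y2), lam


# Toy example: two 2-pixel "images" with one-hot labels for two classes.
mixed_x, mixed_y, lam = mixup(
    [0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0], rng=random.Random(0)
)
```

Because the labels are blended with the same weight as the inputs, the mixed label remains a valid probability vector, which is what lets a segmentation loss consume it directly.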

MRI Segmentation Cardiac Methodology In Silico Academic Lab Benchmark SOTA

Challenges in Implementing Artificial Intelligence in Breast Cancer Screening Programs: Systematic Review and Framework for Safe Adoption.

Goh S, Goh RSJ, Chong B, Ng QX, Koh GCH, Ngiam KY, Hartman M

•papers•May 15 2025

Artificial intelligence (AI) studies show promise in enhancing accuracy and efficiency in mammographic screening programs worldwide. However, its integration into clinical workflows faces several challenges, including unintended errors, the need for professional training, and ethical concerns. Notably, specific frameworks for AI imaging in breast cancer screening are still lacking. This study aims to identify the challenges associated with implementing AI in breast screening programs and to apply the Consolidated Framework for Implementation Research (CFIR) to discuss a practical governance framework for AI in this context. Three electronic databases (PubMed, Embase, and MEDLINE) were searched using combinations of the keywords "artificial intelligence," "regulation," "governance," "breast cancer," and "screening." Original studies evaluating AI in breast cancer detection or discussing challenges related to AI implementation in this setting were eligible for review. Findings were narratively synthesized and subsequently mapped directly onto the constructs within the CFIR. A total of 1240 results were retrieved, with 20 original studies ultimately included in this systematic review. The majority (n=19) focused on AI-enhanced mammography, while 1 addressed AI-enhanced ultrasound for women with dense breasts. Most studies originated from the United States (n=5) and the United Kingdom (n=4), with publication years ranging from 2019 to 2023. The quality of papers was rated as moderate to high. The key challenges identified were reproducibility, evidentiary standards, technological concerns, trust issues, ethical, legal, and societal concerns, and postadoption uncertainty. By aligning these findings with the CFIR constructs, action plans targeting the main challenges were incorporated into the framework, facilitating a structured approach to addressing these issues. This systematic review identifies key challenges in implementing AI in breast cancer screening, emphasizing the need for consistency, robust evidentiary standards, technological advancements, user trust, ethical frameworks, legal safeguards, and societal benefits. These findings can serve as a blueprint for policy makers, clinicians, and AI developers to collaboratively advance AI adoption in breast cancer screening. PROSPERO CRD42024553889; https://tinyurl.com/mu4nwcxt.

Mammography Detection Breast Review In Silico Academic Lab Policy Ethics Reproducibility
