Latest Papers on Radiology AI. Sources: medrxiv, Order: Best Match, Limit: 10.

Lack of children in public medical imaging data points to growing age bias in biomedical AI

Hua, S. B. Z., Heller, N., He, P., Towbin, A. J., Chen, I., Lu, A., Erdman, L.

•preprint•Jun 7 2025

Artificial intelligence (AI) is rapidly transforming healthcare, but its benefits are not reaching all patients equally. Children remain overlooked with only 17% of FDA-approved medical AI devices labeled for pediatric use. In this work, we demonstrate that this exclusion may stem from a fundamental data gap. Our systematic review of 181 public medical imaging datasets reveals that children represent just under 1% of available data, while the majority of machine learning imaging conference papers we surveyed utilized publicly available data for methods development. Much like systematic biases of other kinds in model development, past studies have demonstrated the manner in which pediatric representation in data used for models intended for the pediatric population is essential for model performance in that population. We add to these findings, showing that adult-trained chest radiograph models exhibit significant age bias when applied to pediatric populations, with higher false positive rates in younger children. This work underscores the urgent need for increased pediatric representation in publicly accessible medical datasets. We provide actionable recommendations for researchers, policymakers, and data curators to address this age equity gap and ensure AI benefits patients of all ages. 1-2 sentence summaryOur analysis reveals a critical healthcare age disparity: children represent less than 1% of public medical imaging datasets. This gap in representation leads to biased predictions across medical image foundation models, with the youngest patients facing the highest risk of misdiagnosis.

X-Ray Classification Chest Review In Silico Academic Lab Ethics Policy Benchmark SOTA

Dual-stage AI system for Pathologist-Free Tumor Detectionand subtyping in Oral Squamous Cell Carcinoma

Chaudhary, N., Muddemanavar, P., Singh, D. K., Rai, A., Mishra, D., SV, S., Augustine, J., Chandra, A., Chaurasia, A., Ahmad, T.

•preprint•Jun 6 2025

BackgroundAccurate histological grading of oral squamous cell carcinoma (OSCC) is critical for prognosis and treatment planning. Current methods lack automation for OSCC detection, subtyping, and differentiation from high-risk pre-malignant conditions like oral submucous fibrosis (OSMF). Further, analysis of whole-slide image (WSI) analysis is time-consuming and variable, limiting consistency. We present a clinically relevant deep learning framework that leverages weakly supervised learning and attention-based multiple instance learning (MIL) to enable automated OSCC grading and early prediction of malignant transformation from OSMF. MethodsWe conducted a multi-institutional retrospective cohort study using a curated dataset of 1,925 whole-slide images (WSIs), including 1,586 OSCC cases stratified into well-, moderately-, and poorly-differentiated subtypes (WD, MD, and PD), 128 normal controls, and 211 OSMF and OSMF with OSCC cases. We developed a two-stage deep learning pipeline named OralPatho. In stage one, an attention-based multiple instance learning (MIL) model was trained to perform binary classification (normal vs OSCC). In stage two, a gated attention mechanism with top-K patch selection was employed to classify the OSCC subtypes. Model performance was assessed using stratified 3-fold cross-validation and external validation on an independent dataset. FindingsThe binary classifier demonstrated robust performance with a mean F1-score exceeding 0.93 across all validation folds. The multiclass model achieved consistent macro-F1 scores of 0.72, 0.70, and 0.68, along with AUCs of 0.79 for WD, 0.71 for MD, and 0.61 for PD OSCC subtypes. Model generalizability was validated using an independent external dataset. Attention maps reliably highlighted clinically relevant histological features, supporting the systems interpretability and diagnostic alignment with expert pathological assessment. InterpretationThis study demonstrates the feasibility of attention-based, weakly supervised learning for accurate OSCC grading from whole-slide images. OralPatho combines high diagnostic performance with real-time interpretability, making it a scalable solution for both advanced pathology labs and resource-limited settings.

OCT Classification Retrospective Clinical In Silico Academic Lab GenAI

Magnetic resonance imaging and the evaluation of vestibular schwannomas: a systematic review

Lee, K. S., Wijetilake, N., Connor, S., Vercauteren, T., Shapey, J.

•preprint•Jun 6 2025

IntroductionThe assessment of vestibular schwannoma (VS) requires a standardized measurement approach as growth is a key element in defining treatment strategy for VS. Volumetric measurements offer higher sensitivity and precision, but existing methods of segmentation, are labour-intensive, lack standardisation and are prone to variability and subjectivity. A new core set of measurement indicators reported consistently, will support clinical decision-making and facilitate evidence synthesis. This systematic review aimed to identify indicators used in 1) magnetic resonance imaging (MRI) acquisition and 2) measurement or 3) growth of VS. This work is expected to inform a Delphi consensus. MethodsSystematic searches of Medline, Embase and Cochrane Central were undertaken on 4th October 2024. Studies that assessed the evaluation of VS with MRI, between 2014 and 2024 were included. ResultsThe final dataset consisted of 102 studies and 19001 patients. Eighty-six (84.3%) studies employed post contrast T1 as the MRI acquisition of choice for evaluating VS. Nine (8.8%) studies additionally employed heavily weighted T2 sequences such as constructive interference in steady state (CISS) and FIESTA-C. Only 45 (44.1%) studies reported the slice thickness with the majority 38 (84.4%) choosing <3mm in thickness. Fifty-eight (56.8%) studies measured volume whilst 49 (48.0%) measured the largest linear dimension; 14 (13.7%) studies used both measurements. Four studies employed semi-automated or automated segmentation processes to measure the volumes of VS. Of 68 studies investigating growth, 54 (79.4%) provided a threshold. Significant variation in volumetric growth was observed but the threshold for significant percentage change reported by most studies was 20% (n = 18). ConclusionSubstantial variation in MRI acquisition, and methods for evaluating measurement and growth of VS, exists across the literature. This lack of standardization is likely attributed to resource constraints and the fact that currently available volumetric segmentation methods are very labour-intensive. Following the identification of the indicators employed in the literature, this study aims to develop a Delphi consensus for the standardized measurement of VS and uptake in employing a data-driven artificial intelligence-based measuring tools.

MRI Segmentation Neurological Review Post Market Academic Lab Benchmark SOTA

Deep learning-enabled MRI phenotyping uncovers regional body composition heterogeneity and disease associations in two European population cohorts

Mertens, C. J., Haentze, H., Ziegelmayer, S., Kather, J. N., Truhn, D., Kim, S. H., Busch, F., Weller, D., Wiestler, B., Graf, M., Bamberg, F., Schlett, C. L., Weiss, J. B., Ringhof, S., Can, E., Schulz-Menger, J., Niendorf, T., Lammert, J., Molwitz, I., Kader, A., Hering, A., Meddeb, A., Nawabi, J., Schulze, M. B., Keil, T., Willich, S. N., Krist, L., Hadamitzky, M., Hannemann, A., Bassermann, F., Rueckert, D., Pischon, T., Hapfelmeier, A., Makowski, M. R., Bressem, K. K., Adams, L. C.

•preprint•Jun 6 2025

Body mass index (BMI) does not account for substantial inter-individual differences in regional fat and muscle compartments, which are relevant for the prevalence of cardiometabolic and cancer conditions. We applied a validated deep learning pipeline for automated segmentation of whole-body MRI scans in 45,851 adults from the UK Biobank and German National Cohort, enabling harmonized quantification of visceral (VAT), gluteofemoral (GFAT), and abdominal subcutaneous adipose tissue (ASAT), liver fat fraction (LFF), and trunk muscle volume. Associations with clinical conditions were evaluated using compartment measures adjusted for age, sex, height, and BMI. Our analysis demonstrates that regional adiposity and muscle volume show distinct associations with cardiometabolic and cancer prevalence, and that substantial disease heterogeneity exists within BMI strata. The analytic framework and reference data presented here will support future risk stratification efforts and facilitate the integration of automated MRI phenotyping into large-scale population and clinical research.

MRI Segmentation Whole Body Retrospective Clinical In Silico Academic Lab Open Dataset

Detecting neurodegenerative changes in glaucoma using deep mean kurtosis-curve-corrected tractometry

Kasa, L. W., Schierding, W., Kwon, E., Holdsworth, S., Danesh-Meyer, H. V.

•preprint•Jun 6 2025

Glaucoma is increasingly recognized as a neurodegenerative condition involving both retinal and central nervous system structures. Here, we present an integrated framework that combines MK-Curve-corrected diffusion kurtosis imaging (DKI), tractometry, and deep autoencoder-based normative modeling to detect localized white matter abnormalities associated with glaucoma. Using UK Biobank diffusion MRI data, we show that MK-Curve approach corrects anatomically implausible values and improves the reliability of DKI metrics - particularly mean (MK), radial (RK), and axial kurtosis (AK) - in regions of complex fiber architecture. Tractometry revealed reduced MK in glaucoma patients along the optic radiation, inferior longitudinal fasciculus, and inferior fronto-occipital fasciculus, but not in a non-visual control tract, supporting disease specificity. These abnormalities were spatially localized, with significant changes observed at multiple points along the tracts. MK demonstrated greater sensitivity than MD and exhibited altered distributional features, reflecting microstructural heterogeneity not captured by standard metrics. Node-wise MK values in the right optic radiation showed weak but significant correlations with retinal OCT measures (ganglion cell layer and retinal nerve fiber layer thickness), reinforcing the biological relevance of these findings. Deep autoencoder-based modeling further enabled subject-level anomaly detection that aligned spatially with group-level changes and outperformed traditional approaches. Together, our results highlight the potential of advanced diffusion modeling and deep learning for sensitive, individualized detection of glaucomatous neurodegeneration and support their integration into future multimodal imaging pipelines in neuro-ophthalmology.

MRI Detection Neurological Retrospective Clinical In Silico Academic Lab Benchmark SOTA

Clinically Interpretable Deep Learning via Sparse BagNets for Epiretinal Membrane and Related Pathology Detection

Ofosu Mensah, S., Neubauer, J., Ayhan, M. S., Djoumessi Donteu, K. R., Koch, L. M., Uzel, M. M., Gelisken, F., Berens, P.

•preprint•Jun 6 2025

Epiretinal membrane (ERM) is a vitreoretinal interface disease that, if not properly addressed, can lead to vision impairment and negatively affect quality of life. For ERM detection and treatment planning, Optical Coherence Tomography (OCT) has become the primary imaging modality, offering non-invasive, high-resolution cross-sectional imaging of the retina. Deep learning models have also led to good ERM detection performance on OCT images. Nevertheless, most deep learning models cannot be easily understood by clinicians, which limits their acceptance in clinical practice. Post-hoc explanation methods have been utilised to support the uptake of models, albeit, with partial success. In this study, we trained a sparse BagNet model, an inherently interpretable deep learning model, to detect ERM in OCT images. It performed on par with a comparable black-box model and generalised well to external data. In a multitask setting, it also accurately predicted other changes related to the ERM pathophysiology. Through a user study with ophthalmologists, we showed that the visual explanations readily provided by the sparse BagNet model for its decisions are well-aligned with clinical expertise. We propose potential directions for clinical implementation of the sparse BagNet model to guide clinical decisions in practice.

OCT Classification Methodology In Silico Academic Lab GenAI

Prediction of impulse control disorders in Parkinson's disease: a longitudinal machine learning study

Vamvakas, A., van Balkom, T., van Wingen, G., Booij, J., Weintraub, D., Berendse, H. W., van den Heuvel, O. A., Vriend, C.

•preprint•Jun 5 2025

BackgroundImpulse control disorders (ICD) in Parkinsons disease (PD) patients mainly occur as adverse effects of dopamine replacement therapy. Despite several known risk factors associated with ICD development, this cannot yet be accurately predicted at PD diagnosis. ObjectivesWe aimed to investigate the predictability of incident ICD by baseline measures of demographic, clinical, dopamine transporter single photon emission computed tomography (DAT-SPECT), and genetic variables. MethodsWe used demographic and clinical data of medication-free PD patients from two longitudinal datasets; Parkinsons Progression Markers Initiative (PPMI) (n=311) and Amsterdam UMC (n=72). We extracted radiomic and latent features from DAT-SPECT. We used single nucleotic polymorphisms (SNPs) from PPMIs NeuroX and Exome sequencing data. Four machine learning classifiers were trained on combinations of the input feature sets, to predict incident ICD at any follow-up assessment. Classification performance was measured with 10x5-fold cross-validation. ResultsICD prevalence at any follow-up was 0.32. The highest performance in predicting incident ICD (AUC=0.66) was achieved by the models trained on clinical features only. Anxiety severity and age of PD onset were identified as the most important features. Performance did not improve with adding features from DAT-SPECT or SNPs. We observed significantly higher performance (AUC=0.74) when classifying patients who developed ICD within four years from diagnosis compared with those tested negative for seven or more years. ConclusionsPrediction accuracy for later ICD development, at the time of PD diagnosis, is limited; however, it increases for shorter time-to-event predictions. Neither DAT-SPECT nor genetic data improve the predictability obtained using demographic and clinical variables alone.

SPECT Classification Neurological Retrospective Clinical In Silico Academic Lab

DWI and Clinical Characteristics Correlations in Acute Ischemic Stroke After Thrombolysis

Li, J., Huang, C., Liu, Y., Li, Y., Zhang, J., Xiao, M., yan, Z., zhao, H., Zeng, X., Mu, J.

•preprint•Jun 5 2025

ObjectiveMagnetic Resonance Diffusion-Weighted Imaging (DWI) is a crucial tool for diagnosing acute ischemic stroke, yet some patients present as DWI-negative. This study aims to analyze the imaging differences and associated clinical characteristics in acute ischemic stroke patients receiving intravenous thrombolysis, in order to enhance understanding of DWI-negative strokes. MethodsRetrospective collection of clinical data from acute ischemic stroke patients receiving intravenous thrombolysis at the Stroke Center of the First Affiliated Hospital of Chongqing Medical University from January 2017 to June 2023, categorized into DWI-positive and negative groups. Descriptive statistics, univariate analysis, binary logistic regression, and machine learning model were utilized to assess the predictive value of clinical features. Additionally, telephone follow-up was conducted for DWI-negative patients to record medication compliance, stroke recurrence, and mortality, with Fine-Gray competing risk model used to analyze recurrent risk factors. ResultsThe incidence rate of DWI-negative ischemic stroke is 22.74%. Factors positively associated with DWI-positive cases include onset to needle time (ONT), onset to first MRI time (OMT), NIHSS score at 1 week of hospitalization (NIHSS-1w), hyperlipidemia (HLP), and atrial fibrillation (AF) (p<0.05, OR>1). Conversely, recurrent ischemic stroke (RIS) and platelet count (PLT) are negatively correlated with DWI-positive cases (p<0.05, OR<1). Trial of Org 10172 in Acute Stroke Treatment (TOAST) classification significantly influences DWI presentation (p=0.01), but the specific impact of etiological subtypes remains unclear. Machine learning models suggest that the features with the highest predictive value, in descending order, are AF, HLP, OMT, ONT, NIHSS difference within 24 hours post-thrombolysis(NIHSS-d(0-24h)PT), and RIS. ConclusionsNIHSS-1w, OMT, ONT, HLP, and AF can predict DWI-positive findings, while platelet count and RIS are associated with DWI-negative cases. AF and HLP demonstrate the highest predictive value. DWI-negative patients have a higher risk of stroke recurrence than mortality in the short term, with a potential correlation between TOAST classification and recurrence risk.

MRI Classification Neurological Retrospective Clinical In Silico Academic Lab

Interpretable Machine Learning based Detection of Coeliac Disease

Jaeckle, F., Bryant, R., Denholm, J., Romero Diaz, J., Schreiber, B., Shenoy, V., Ekundayomi, D., Evans, S., Arends, M., Soilleux, E.

•preprint•Jun 4 2025

BackgroundCoeliac disease, an autoimmune disorder affecting approximately 1% of the global population, is typically diagnosed on a duodenal biopsy. However, inter-pathologist agreement on coeliac disease diagnosis is only around 80%. Existing machine learning solutions designed to improve coeliac disease diagnosis often lack interpretability, which is essential for building trust and enabling widespread clinical adoption. ObjectiveTo develop an interpretable AI model capable of segmenting key histological structures in duodenal biopsies, generating explainable segmentation masks, estimating intraepithelial lymphocyte (IEL)-to-enterocyte and villus-to-crypt ratios, and diagnosing coeliac disease. DesignSemantic segmentation models were trained to identify villi, crypts, IELs, and enterocytes using 49 annotated 2048x2048 patches at 40x magnification. IEL-to-enterocyte and villus-to-crypt ratios were calculated from segmentation masks, and a logistic regression model was trained on 172 images to diagnose coeliac disease based on these ratios. Evaluation was performed on an independent test set of 613 duodenal biopsy scans from a separate NHS Trust. ResultsThe villus-crypt segmentation model achieved a mean PR AUC of 80.5%, while the IEL-enterocyte model reached a PR AUC of 82%. The diagnostic model classified WSIs with 96% accuracy, 86% positive predictive value, and 98% negative predictive value on the independent test set. ConclusionsOur interpretable AI models accurately segmented key histological structures and diagnosed coeliac disease in unseen WSIs, demonstrating strong generalization performance. These models provide pathologists with reliable IEL-to-enterocyte and villus-to-crypt ratio estimates, enhancing diagnostic accuracy. Interpretable AI solutions like ours are essential for fostering trust among healthcare professionals and patients, complementing existing black-box methodologies. What is already known on this topicPathologist concordance in diagnosing coeliac disease from duodenal biopsies is consistently reported to be below 80%, highlighting diagnostic variability and the need for improved methods. Several recent studies have leveraged artificial intelligence (AI) to enhance coeliac disease diagnosis. However, most of these models operate as "black boxes," offering limited interpretability and transparency. The lack of explainability in AI-driven diagnostic tools prevents widespread adoption by healthcare professionals and reduces patient trust. What this study addsThis study presents an interpretable semantic segmentation algorithm capable of detecting the four key histological structures essential for diagnosing coeliac disease: crypts, villi, intraepithelial lymphocytes (IELs), and enterocytes. The model accurately estimates the IEL-to-enterocyte ratio and the villus-to-crypt ratio, the latter being an indicator of villous atrophy and crypt hyperplasia, thereby providing objective, reproducible metrics for diagnosis. The segmentation outputs allow for transparent, explainable decision-making, supporting pathologists in coeliac disease diagnosis with improved accuracy and confidence. This study presents an AI model that automates the estimation of the IEL-to-enterocyte ratio--a labour-intensive task currently performed manually by pathologists in limited biopsy regions. By minimising diagnostic variability and alleviating time constraints for pathologists, the model provides an efficient and practical solution to streamline the diagnostic workflow. Tested on an independent dataset from a previously unseen source, the model demonstrates explainability and generalizability, enhancing trust and encouraging adoption in routine clinical practice. Furthermore, this approach could set a new standard for AI-assisted duodenal biopsy evaluation, paving the way for the development of interpretable AI tools in pathology to address the critical challenges of limited pathologist availability and diagnostic inconsistencies.

Mixed Modality Segmentation Abdominal Retrospective Clinical In Silico Academic Lab GenAI

Rad-Path Correlation of Deep Learning Models for Prostate Cancer Detection on MRI

Verde, A. S. C., de Almeida, J. G., Mendes, F., Pereira, M., Lopes, R., Brito, M. J., Urbano, M., Correia, P. S., Gaivao, A. M., Firpo-Betancourt, A., Fonseca, J., Matos, C., Regge, D., Marias, K., Tsiknakis, M., ProCAncer-I Consortium,, Conceicao, R. C., Papanikolaou, N.

•preprint•Jun 4 2025

While Deep Learning (DL) models trained on Magnetic Resonance Imaging (MRI) have shown promise for prostate cancer detection, their lack of direct biological validation often undermines radiologists trust and hinders clinical adoption. Radiologic-histopathologic (rad-path) correlation has the potential to validate MRI-based lesion detection using digital histopathology. This study uses automated and manually annotated digital histopathology slides as a standard of reference to evaluate the spatial extent of lesion annotations derived from both radiologist interpretations and DL models previously trained on prostate bi-parametric MRI (bp-MRI). 117 histopathology slides were used as reference. Prospective patients with clinically significant prostate cancer performed a bp-MRI examination before undergoing a robotic radical prostatectomy, and each prostate specimen was sliced using a 3D-printed patient-specific mold to ensure a direct comparison between pre-operative imaging and histopathology slides. The histopathology slides and their corresponding T2-weighted MRI images were co-registered. We trained DL models for cancer detection on large retrospective datasets of T2-w MRI only, bp-MRI and histopathology images and did inference in a prospective patient cohort. We evaluated the spatial extent between detected lesions and between detected lesions and the histopathological and radiological ground-truth, using the Dice similarity coefficient (DSC). The DL models trained on digital histopathology tiles and MRI images demonstrated promising capabilities in lesion detection. A low overlap was observed between the lesion detection masks generated by the histopathology and bp-MRI models, with a DSC = 0.10. However, the overlap was equivalent (DSC = 0.08) between radiologist annotations and histopathology ground truth. A rad-path correlation pipeline was established in a prospective patient cohort with prostate cancer undergoing surgery. The correlation between rad-path DL models was low but comparable to the overlap between annotations. While DL models show promise in prostate cancer detection, challenges remain in integrating MRI-based predictions with histopathological findings.

MRI Detection Abdominal Prospective In Silico Consortium

Lack of children in public medical imaging data points to growing age bias in biomedical AI

Dual-stage AI system for Pathologist-Free Tumor Detectionand subtyping in Oral Squamous Cell Carcinoma

Magnetic resonance imaging and the evaluation of vestibular schwannomas: a systematic review

Deep learning-enabled MRI phenotyping uncovers regional body composition heterogeneity and disease associations in two European population cohorts

Detecting neurodegenerative changes in glaucoma using deep mean kurtosis-curve-corrected tractometry

Clinically Interpretable Deep Learning via Sparse BagNets for Epiretinal Membrane and Related Pathology Detection

Prediction of impulse control disorders in Parkinson's disease: a longitudinal machine learning study

DWI and Clinical Characteristics Correlations in Acute Ischemic Stroke After Thrombolysis

Interpretable Machine Learning based Detection of Coeliac Disease

Rad-Path Correlation of Deep Learning Models for Prostate Cancer Detection on MRI

Ready to Sharpen Your Edge?