Latest Papers on Radiology AI. Sources: medRxiv, Tags: None.

The HeartMagic prospective observational study protocol - characterizing subtypes of heart failure with preserved ejection fraction

Meyer, P., Rocca, A., Banus, J., Ogier, A. C., Georgantas, C., Calarnou, P., Fatima, A., Vallee, J.-P., Deux, J.-F., Thomas, A., Marquis, J., Monney, P., Lu, H., Ledoux, J.-B., Tillier, C., Crowe, L. A., Abdurashidova, T., Richiardi, J., Hullin, R., van Heeswijk, R. B.

•preprint•Sep 16 2025

Introduction: Heart failure (HF) is a life-threatening syndrome with significant morbidity and mortality. While evidence-based drug treatments have effectively reduced morbidity and mortality in HF with reduced ejection fraction (HFrEF), few therapies have been demonstrated to improve outcomes in HF with preserved ejection fraction (HFpEF). The multifaceted clinical presentation is one of the main reasons why the current understanding of HFpEF remains limited. This may be caused by the existence of several HFpEF disease subtypes that each need different treatments. There is therefore an unmet need for a holistic approach that combines comprehensive imaging with metabolomic, transcriptomic and genomic mapping to subtype HFpEF patients. This protocol details the approach employed in the HeartMagic study to address this gap in understanding.
Methods: This prospective multi-center observational cohort study will include 500 consecutive patients with current or recent hospitalization for treatment of HFpEF at two Swiss university hospitals, along with 50 age-matched HFrEF patients and 50 age-matched healthy controls. Diagnosis of heart failure is based on clinical signs and symptoms, and subgrouping of HF patients is based on the left-ventricular ejection fraction. In addition to routine clinical workup, participants undergo genomic, transcriptomic, and metabolomic analyses, while the anatomy, composition, and function of the heart are quantified by comprehensive echocardiography and magnetic resonance imaging (MRI). Quantitative MRI is also applied to characterize the kidney. The primary outcome is a composite of one-year cardiovascular mortality or rehospitalization. Machine learning (ML) based multi-modal clustering will be employed to identify distinct HFpEF subtypes in the holistic data. The clinical importance of these subtypes will be evaluated based on their association with the primary outcome. Statistical analysis will include group comparisons across modalities, survival analysis for the primary outcome, and integrative multi-modal clustering combining clinical, imaging, ECG, genomic, transcriptomic, and metabolomic data to identify and validate HFpEF subtypes.
Discussion: The integration of comprehensive MRI with extensive genomic and metabolomic profiling in this study will result in an unprecedented panoramic view of HFpEF and should enable us to distinguish functional subgroups of HFpEF patients. This approach has the potential to provide new insights into HFpEF and should provide a basis for personalized therapies. Beyond this, identifying HFpEF subtypes with specific molecular and structural characteristics could lead to new targeted pharmacological interventions, with the potential to improve patient outcomes.

MRI Classification Cardiac Prospective Clinical Pilot Academic Lab GenAI

Deep Learning for Breast Mass Discrimination: Integration of B-Mode Ultrasound & Nakagami Imaging with Automatic Lesion Segmentation

Hassan, M. W., Hossain, M. M.

•preprint•Sep 15 2025

Objective: This study aims to enhance breast cancer diagnosis by developing an automated deep learning framework for real-time, quantitative ultrasound imaging. Breast cancer is the second leading cause of cancer-related deaths among women, and early detection is crucial for improving survival rates. Conventional ultrasound, valued for its non-invasive nature and real-time capability, is limited by qualitative assessments and inter-observer variability. Quantitative ultrasound (QUS) methods, including Nakagami imaging--which models the statistical distribution of backscattered signals and lesion morphology--present an opportunity for more objective analysis.
Methods: The proposed framework integrates three convolutional neural networks (CNNs): (1) NakaSynthNet, synthesizing quantitative Nakagami parameter images from B-mode ultrasound; (2) SegmentNet, enabling automated lesion segmentation; and (3) FeatureNet, which combines anatomical and statistical features for classifying lesions as benign or malignant. Training utilized a diverse dataset of 110,247 images, comprising clinical B-mode scans and various simulated examples (fruit, mammographic lesions, digital phantoms). Quantitative performance was evaluated using mean squared error (MSE), structural similarity index (SSIM), segmentation accuracy, sensitivity, specificity, and area under the curve (AUC).
Results: NakaSynthNet achieved real-time synthesis at 21 frames/s, with MSE of 0.09% and SSIM of 98%. SegmentNet reached 98.4% accuracy, and FeatureNet delivered 96.7% overall classification accuracy, 93% sensitivity, 98% specificity, and an AUC of 98%.
Conclusion: The proposed multi-parametric deep learning pipeline enables accurate, real-time breast cancer diagnosis from ultrasound data using objective quantitative imaging.
Significance: This framework advances the clinical utility of ultrasound by reducing subjectivity and providing robust, multi-parametric information for improved breast cancer detection.
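For orientation, a Nakagami parameter image is conventionally built by estimating the Nakagami shape parameter m in a sliding window over the echo envelope. The sketch below shows the standard moment-based estimator for a single window. It is not NakaSynthNet, which according to the abstract learns to synthesize such images from B-mode input; the function names and the simulated Rayleigh envelope are assumptions made for illustration.

```python
import math
import random


def nakagami_moments(envelope):
    """Moment-based estimate of the Nakagami shape (m) and scale (omega).

    omega = E[R^2] and m = (E[R^2])^2 / Var(R^2), where R is the envelope.
    Raises ZeroDivisionError if every sample has the same value.
    """
    power = [r * r for r in envelope]            # R^2 for each sample
    n = len(power)
    omega = sum(power) / n                       # mean backscattered power
    variance = sum((p - omega) ** 2 for p in power) / n
    m = omega * omega / variance
    return m, omega


def rayleigh_envelope(n, sigma=1.0, seed=0):
    """Hypothetical test signal: a Rayleigh envelope, for which m should be near 1."""
    rng = random.Random(seed)
    return [math.hypot(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma)) for _ in range(n)]


if __name__ == "__main__":
    m, omega = nakagami_moments(rayleigh_envelope(20000))
    print(f"m = {m:.2f}, omega = {omega:.2f}")
```

Values of m near 1 correspond to Rayleigh (fully developed speckle) statistics; departures from 1 are what make the parameter useful as a tissue descriptor.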

Ultrasound Segmentation Breast Methodology In Silico Academic Lab

Normative Modelling of Brain Volume for Diagnostic and Prognostic Stratification in Multiple Sclerosis

Korbmacher, M., Lie, I. A., Wesnes, K., Westman, E., Espeseth, T., Andreassen, O., Westlye, L., Wergeland, S., Harbo, H. F., Nygaard, G. O., Myhr, K.-M., Hogestol, E. A., Torkildsen, O.

•preprint•Sep 15 2025

Background: Brain atrophy is a hallmark of multiple sclerosis (MS). For clinical translatability and individual-level predictions, brain atrophy needs to be put into the context of the broader population, using reference or normative models.
Methods: Reference models of MRI-derived brain volumes were established from a large healthy control (HC) multi-cohort dataset (N=63 115, 51% females). The reference models were applied to two independent MS cohorts (N=362, T1w-scans=953, follow-up time up to 12 years) to assess deviations from the reference, defined as Z-values. We assessed the overlap of deviation profiles and their stability over time using individual-level transitions towards or out of significant reference deviation states (|Z|>1·96). A negative binomial model was used for case-control comparisons of the number of extreme deviations. Linear models were used to assess differences in Z-score deviations between MS and propensity-matched HCs, and associations with clinical scores at baseline and over time. The utilized normative BrainReference models, scripts and usage instructions are freely available.
Findings: We identified a temporally stable, brain morphometric phenotype of MS. The right and left thalami most consistently showed significantly lower-than-reference volumes in MS (25% and 26% overlap across the sample). The number of such extreme smaller-than-reference values was 2·70 times higher in MS than in HC (4·51 versus 1·67). Additional deviations were associated with greater disability (Expanded Disability Status Scale [EDSS]: β=0·22, 95% CI 0·12 to 0·32), lower Paced Auditory Serial Addition Test score (β=-0·27, 95% CI -0·52 to -0·02), and higher Fatigue Severity Score (β=0·29, 95% CI 0·05 to 0·53) at baseline, and with EDSS over time (β=0·07, 95% CI 0·02 to 0·13). We additionally provide detailed maps of reference-deviations and their associations with clinical assessments.
Interpretation: We present a heterogeneous brain phenotype of MS which is associated with clinical manifestations and particularly implicates the thalamus. The findings offer potential to aid diagnosis and prognosis of MS.
Funding: Norwegian MS-union, Research Council of Norway (#223273; #324252); the South-Eastern Norway Regional Health Authority (#2022080); and the European Union's Horizon 2020 Research and Innovation Programme (#847776, #802998).
Research in context
Evidence before this study: Reference values and normative models have yet to be widely applied to neuroimaging assessments of neurological disorders such as multiple sclerosis (MS). We conducted a literature search in PubMed and Embase (Jan 1, 2000-September 12, 2025) using the terms "MRI" AND "multiple sclerosis", with and without the keywords "normative model*" and "atrophy", without language restrictions. While normative models have been applied in psychiatric and developmental disorders, few studies have addressed their use in neurological conditions. Existing MS research has largely focused on global atrophy and has not provided regional reference charts or established links to clinical and cognitive outcomes.
Added value of this study: We provide regionally detailed brain morphometry maps derived from a heterogeneous MS cohort spanning wide ranges of age, sex, clinical phenotype, disease duration, disability, and scanner characteristics. By leveraging normative modelling, our approach enables individualised brain phenotyping of MS in relation to a population-based normative sample. The analyses reveal clinically meaningful and spatially consistent patterns of smaller brain volumes, particularly in the thalamus and frontal cortical regions, which are linked to disability, cognitive impairment, and fatigue. Robustness across scanners, centres, and longitudinal follow-up supports the stability and generalisability of these findings to real-world MS populations.
Implications of all the available evidence: Normative modelling offers an individualised, sensitive, and interpretable approach to quantifying brain structure in MS by providing individual-specific reference values, supporting earlier detection of neurodegeneration and improved patient stratification. A consistent pattern of thalamic and fronto-parietal deviations defines a distinct morphometric profile of MS, with potential utility for early and personalised diagnosis and disease monitoring in clinical practice and clinical trials.
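The deviation scoring described in the Methods can be sketched as follows: a regional volume is converted to a Z-value against a reference mean and standard deviation, and regions beyond |Z| > 1.96 count as extreme deviations. This is a minimal illustration, not the BrainReference code. It assumes the reference mean and SD have already been predicted for the individual's covariates (for example age and sex), and the region names and volumes are hypothetical.

```python
Z_CRIT = 1.96  # threshold for a significant reference deviation, as in the abstract


def z_score(volume, ref_mean, ref_sd):
    """Deviation of one regional volume from its reference distribution."""
    return (volume - ref_mean) / ref_sd


def extreme_deviations(volumes, reference):
    """Split regions with |Z| > 1.96 into smaller- and larger-than-reference lists.

    volumes:   {region: measured volume}
    reference: {region: (reference mean, reference SD)} for this individual
    """
    smaller, larger = [], []
    for region, volume in volumes.items():
        ref_mean, ref_sd = reference[region]
        z = z_score(volume, ref_mean, ref_sd)
        if z < -Z_CRIT:
            smaller.append(region)
        elif z > Z_CRIT:
            larger.append(region)
    return smaller, larger


if __name__ == "__main__":
    # Hypothetical values in mm^3, chosen only to exercise the function.
    reference = {
        "left_thalamus": (7800.0, 600.0),
        "right_thalamus": (7700.0, 600.0),
        "hippocampus": (4000.0, 400.0),
    }
    patient = {"left_thalamus": 6500.0, "right_thalamus": 6400.0, "hippocampus": 3900.0}
    print(extreme_deviations(patient, reference))
```

Counting the entries in the first list per person gives the "number of extreme smaller-than-reference values" that the abstract compares between MS and HC.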

MRI Classification Neurological Retrospective Clinical In Silico Academic Lab Open Code Open Dataset

Multimodal Machine Learning for Diagnosis of Multiple Sclerosis Using Optical Coherence Tomography in Pediatric Cases

Chen, C., Soltanieh, S., Rajapaksa, S., Khalvati, F., Yeh, E. A.

•preprint•Sep 14 2025

Background and Objectives: Identifying MS in children early and distinguishing it from other neuroinflammatory conditions of childhood is critical, as early therapeutic intervention can improve outcomes. The anterior visual pathway has been demonstrated to be of central importance in diagnostic considerations for MS and has recently been identified as a fifth topography in the McDonald Diagnostic Criteria for MS. Optical coherence tomography (OCT) provides high-resolution retinal imaging and reflects the structural integrity of the retinal nerve fiber and ganglion cell inner plexiform layers. Whether multimodal deep learning models can use OCT alone to diagnose pediatric-onset MS (POMS) is unknown.
Methods: We analyzed 3D OCT scans collected prospectively through the Neuroinflammatory Registry of the Hospital for Sick Children (REB#1000005356). Raw macular and optic nerve head images and 52 automatically segmented features were included. We evaluated three classification approaches: (1) deep learning models (e.g. ResNet, DenseNet) for representation learning followed by classical ML classifiers, (2) ML models trained on OCT-derived features, and (3) multimodal models combining both via early and late fusion.
Results: Scans from individuals with POMS (onset 16.0 ± 3.1 years, 51.0%F; 211 scans) and 29 children with non-inflammatory neurological conditions (13.1 ± 4.0 years, 69.0%F, 52 scans) were included. The early fusion model achieved the highest performance (AUC: 0.87, F1: 0.87, Accuracy: 90%), outperforming both unimodal and late fusion models. The best unimodal feature-based model (SVC) yielded an AUC of 0.84, F1 of 0.85 and an accuracy of 85%, while the best image-based model (ResNet101 with Random Forest) achieved an AUC of 0.87, F1 of 0.79, and accuracy of 84%. Late fusion underperformed, reaching 82% accuracy but failing in the minority class.
Discussion: Multimodal learning with early fusion significantly enhances diagnostic performance by combining spatial retinal information with clinically relevant structural features. This approach captures complementary patterns associated with MS pathology and shows promise as an AI-driven tool to support pediatric neuroinflammatory diagnosis.
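The difference between the two fusion strategies compared in this abstract can be shown in a few lines. Early fusion joins the image embedding and the segmented OCT features into one vector that a single classifier sees. Late fusion trains one classifier per modality and combines their outputs. The sketch below is illustrative only: the abstract does not state how the late-fusion outputs were combined, so the weighted average, the function names, and the embedding length are assumptions.

```python
def early_fusion(image_embedding, oct_features):
    """Early fusion: concatenate both modalities into one classifier input."""
    return list(image_embedding) + list(oct_features)


def late_fusion(prob_image, prob_features, weight_image=0.5):
    """Late fusion: weighted average of two per-modality probabilities.

    The averaging rule is an assumption for illustration; other combiners
    (voting, stacking) are equally common.
    """
    if not 0.0 <= weight_image <= 1.0:
        raise ValueError("weight_image must lie in [0, 1]")
    return weight_image * prob_image + (1.0 - weight_image) * prob_features


if __name__ == "__main__":
    embedding = [0.0] * 4        # hypothetical CNN embedding, shortened for display
    features = [0.0] * 52        # the 52 segmented OCT features named in the abstract
    print(len(early_fusion(embedding, features)))   # length of the fused vector
    print(late_fusion(0.8, 0.6))                    # combined probability
```

With early fusion the classifier can learn interactions between image and feature inputs, which is one plausible reason it outperformed late fusion here.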

OCT Classification Neurological Retrospective Clinical In Silico Academic Lab

Dual-Branch Efficient Net Architecture for ACL Tear Detection in Knee MRI

Kota, T., Garofalaki, K., Whitely, F., Evdokimenko, E., Smartt, E.

•preprint•Sep 13 2025

We propose a deep learning approach for detecting anterior cruciate ligament (ACL) tears from knee MRI using a dual-branch convolutional architecture. The model independently processes sagittal and coronal MRI sequences using EfficientNet-B2 backbones with spatial attention modules, followed by a late fusion classifier for binary prediction. MRI volumes are standardized to a fixed number of slices, and domain-specific normalization and data augmentation are applied to enhance model robustness. Trained on a stratified 80/20 split of the MRNet dataset, our best model--using the Adam optimizer and a learning rate of 1e-4--achieved a validation AUC of 0.98 and a test AUC of 0.93. These results show strong predictive performance while maintaining computational efficiency. This work demonstrates that accurate diagnosis is achievable using only two anatomical planes and sets the stage for further improvements through architectural enhancements and broader data integration.
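One way to standardize MRI volumes to a fixed number of slices is to sample slice indices evenly across each volume, repeating slices when the volume is shorter than the target. The abstract does not say which method the authors used, so the sketch below is an assumption, and the function name is hypothetical.

```python
def select_slice_indices(num_slices, target):
    """Pick `target` slice indices spread evenly over a volume of `num_slices`.

    Volumes longer than `target` are subsampled; shorter volumes repeat slices.
    """
    if num_slices <= 0 or target <= 0:
        raise ValueError("num_slices and target must be positive")
    if target == 1:
        return [num_slices // 2]          # single slice: take the middle one
    step = (num_slices - 1) / (target - 1)
    return [round(i * step) for i in range(target)]


if __name__ == "__main__":
    print(select_slice_indices(44, 5))    # subsampling a longer volume
    print(select_slice_indices(3, 5))     # padding a shorter volume by repetition
```

Applying the same function to the sagittal and coronal series gives both branches inputs of identical depth, which a fixed-size late fusion classifier requires.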

MRI Classification Musculoskeletal Retrospective Clinical In Silico

Risk prediction for lung cancer screening: a systematic review and meta-regression

Rezaeianzadeh, R., Leung, C., Kim, S. J., Choy, K., Johnson, K. M., Kirby, M., Lam, S., Smith, B. M., Sadatsafavi, M.

•preprint•Sep 12 2025

Background: Lung cancer (LC) is the leading cause of cancer mortality, often diagnosed at advanced stages. Screening reduces mortality in high-risk individuals, but its efficiency can improve with pre- and post-screening risk stratification. With recent LC screening guideline updates in Europe and the US, numerous novel risk prediction models have emerged since the last systematic review of such models. We reviewed risk-based models for selecting candidates for CT screening and for post-CT stratification.
Methods: We systematically reviewed Embase and MEDLINE (2020-2024), identifying studies proposing new LC risk models for screening selection or nodule classification. Data extraction included study design, population, model type, risk horizon, and internal/external validation metrics. In addition, we performed an exploratory meta-regression of AUCs to assess whether sample size, model class, validation type, and biomarker use were associated with discrimination.
Results: Of 1987 records, 68 were included: 41 models were for screening selection (20 without biomarkers, 21 with), and 27 for nodule classification. Regression-based models predominated, though machine learning and deep learning approaches were increasingly common. Discrimination ranged from moderate (AUC≈0.70) to excellent (>0.90), with biomarker and imaging-enhanced models often outperforming traditional ones. Model calibration was inconsistently reported, and fewer than half underwent external validation. Meta-regression suggested that, among pre-screening models, larger sample sizes were modestly associated with higher AUC.
Conclusion: Seventy-five models had been identified prior to 2020, and we found 68 models since. This reflects growing interest in personalized LC screening. While many demonstrate strong discrimination, inconsistent calibration and limited external validation hinder clinical adoption. Future efforts should prioritize improving existing models over developing new ones, along with transparent evaluation, cost-effectiveness analysis, and real-world implementation.

CT Classification Chest Meta Analysis In Silico Benchmark SOTA

Artificial Intelligence in Early Detection of Autism Spectrum Disorder for Preschool ages: A Systematic Literature Review

Hasan, H. H.

•preprint•Sep 10 2025

Background: Early detection of autism spectrum disorder (ASD) improves outcomes, yet clinical assessment is time-intensive. Artificial intelligence (AI) may support screening in preschool children by analysing behavioural, neurophysiological, imaging, and biomarker data.
Aim: To synthesise studies that applied AI in ASD assessment and evaluate whether the underlying data and AI approaches can distinguish ASD characteristics in early childhood.
Methods: A systematic search of 15 databases was conducted on 30 November 2024 using predefined terms. Inclusion criteria were empirical studies applying AI to ASD detection in children aged 0-7 years. Reporting followed PRISMA 2020.
Results: Twelve studies met criteria. Reported performance (AUC) ranged from 0.65 to 0.997. Modalities included behavioural (eye-tracking, home videos), motor (tablet/reaching), EEG, diffusion MRI, and blood/epigenetic biomarkers. The largest archival dataset (M-CHAT-R) achieved near-perfect AUC with neural networks. Common limitations were small samples, male-skewed cohorts, and limited external validation.
Conclusions: AI can aid early ASD screening in infants and preschoolers, but larger and more diverse datasets, rigorous external validation, and multimodal integration are needed before clinical deployment.

MRI Classification Neurological Review In Silico

Radiologist-AI Collaboration for Ischemia Diagnosis in Small Bowel Obstruction: Multicentric Development and External Validation of a Multimodal Deep Learning Model

Vanderbecq, Q., Xia, W. F., Chouzenoux, E., Pesquet, J.-C., Zins, M., Wagner, M.

•preprint•Sep 8 2025

Purpose: To develop and externally validate a multimodal AI model for detecting ischemia complicating small-bowel obstruction (SBO).
Methods: We combined 3D CT data with routine laboratory markers (C-reactive protein, neutrophil count) and, optionally, radiology report text. From two centers, 1,350 CT examinations were curated; 771 confirmed SBO scans were used for model development with patient-level splits. Ischemia labels were defined by surgical confirmation within 24 hours of imaging. Models (MViT, ResNet-101, DaViT) were trained as unimodal and multimodal variants. External testing used 66 independent cases from a third center. Two radiologists (attending, resident) read the test set with and without AI assistance. Performance was assessed using AUC, sensitivity, specificity, and 95% bootstrap confidence intervals; predictions included a confidence score.
Results: The image-plus-laboratory model performed best on external testing (AUC 0.69 [0.59-0.79], sensitivity 0.89 [0.76-1.00], and specificity 0.44 [0.35-0.54]). Adding report text improved internal validation but did not generalize externally; image+text and full multimodal variants did not exceed image+laboratory performance. Without AI, the attending outperformed the resident (AUC 0.745 [0.617-0.845] vs 0.706 [0.581-0.818]); with AI, both improved (attending 0.752 [0.637-0.853], resident 0.752 [0.629-0.867]); with the confidence display, AUCs were 0.750 [0.631-0.839] and 0.773 [0.657-0.867], respectively. Differences were not statistically significant.
Conclusion: A multimodal AI model that combines CT images with routine laboratory markers outperforms single-modality approaches and improves radiologists' performance, notably for junior readers, supporting earlier, more consistent decisions within the first 24 hours.
Key Points: A multimodal artificial intelligence (AI) model that combines CT images with laboratory markers detected ischemia in small-bowel obstruction with AUC 0.69 (95% CI 0.59-0.79) and sensitivity 0.89 (0.76-1.00) on external testing, outperforming single-modality models. Adding report text did not generalize across sites: the image+text model fell from AUC 0.82 (internal) to 0.53 (external), and adding text to the image+laboratory model left external AUC unchanged (0.69) with similar specificity (0.43-0.44). With AI assistance both junior and senior readers improved; the junior reader's AUC rose from 0.71 to 0.77, reaching senior-level performance.
Summary Statement: A multicentric AI model combining CT and routine laboratory data (CRP and neutrophilia) improved radiologists' detection of ischemia in small-bowel obstruction. This tool supports earlier decision-making within the first 24 hours.
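The bracketed intervals reported above are 95% bootstrap confidence intervals around an AUC. A minimal percentile-bootstrap sketch is given below. The abstract does not specify the bootstrap variant, number of resamples, or software, so the percentile method, `n_boot=2000`, the function names, and the toy data are all assumptions. It also assumes one examination per case, so that resampling cases is the same as resampling patients.

```python
import random


def auc(labels, scores):
    """AUC as the probability that a positive case outscores a negative one.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels, scores, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the AUC."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]      # resample cases with replacement
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:                            # AUC undefined; draw again
            continue
        stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


if __name__ == "__main__":
    # Toy labels and scores, not study data.
    y = [0, 0, 1, 1, 0, 1, 0, 1, 0, 1]
    s = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.5, 0.6, 0.3, 0.9]
    print(auc(y, s), bootstrap_auc_ci(y, s))
```

With only 66 external cases, intervals of the width reported in the abstract are expected, which is consistent with the reader differences not reaching statistical significance.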

CT Detection Abdominal Retrospective Clinical In Silico Academic Lab Benchmark SOTA

Predicting Rejection Risk in Heart Transplantation: An Integrated Clinical-Histopathologic Framework for Personalized Post-Transplant Care

Kim, D. D., Madabhushi, A., Margulies, K. B., Peyster, E. G.

•preprint•Sep 8 2025

Background: Cardiac allograft rejection (CAR) remains the leading cause of early graft failure after heart transplantation (HT). Current diagnostics, including histologic grading of endomyocardial biopsy (EMB) and blood-based assays, lack accurate predictive power for future CAR risk. We developed a predictive model integrating routine clinical data with quantitative morphologic features extracted from routine EMBs to demonstrate the precision-medicine potential of mining existing data sources in post-HT care.
Methods: In a retrospective cohort of 484 HT recipients with 1,188 EMB encounters within 6 months post-transplant, we extracted 370 quantitative pathology features describing lymphocyte infiltration and stromal architecture from digitized H&E-stained slides. Longitudinal clinical data comprising 268 variables--including lab values, immunosuppression records, and prior rejection history--were aggregated per patient. Using the XGBoost algorithm with rigorous cross-validation, we compared models based on four different data sources: clinical-only, morphology-only, cross-sectional-only, and fully integrated longitudinal data. The top predictors informed the derivation of a simplified Integrated Rejection Risk Index (IRRI), which relies on just 4 clinical and 4 morphology risk factors. Model performance was evaluated by AUROC, AUPRC, and time-to-event hazard ratios.
Results: The fully integrated longitudinal model achieved superior predictive accuracy (AUROC 0.86, AUPRC 0.74). IRRI stratified patients into risk categories with distinct future CAR hazards: high-risk patients showed a markedly increased CAR risk (HR=6.15, 95% CI: 4.17-9.09), while low-risk patients had significantly reduced risk (HR=0.52, 95% CI: 0.33-0.84). This performance exceeded models based on just cross-sectional or single-domain data, demonstrating the value of multi-modal, temporal data integration.
Conclusions: By integrating longitudinal clinical and biopsy morphologic features, IRRI provides a scalable, interpretable tool for proactive CAR risk assessment. This precision-based approach could support risk-adaptive surveillance and immunosuppression management strategies, offering a promising pathway toward safer, more personalized post-HT care with the potential to reduce unnecessary procedures and improve outcomes.
Clinical Perspective
What is new?
- Current tools for cardiac allograft monitoring detect rejection only after it occurs and are not designed to forecast future risk. This leads to missed opportunities for early intervention, avoidable patient injury, unnecessary testing, and inefficiencies in care.
- We developed a machine learning-based risk index that integrates clinical features, quantitative biopsy morphology, and longitudinal temporal trends to create a robust predictive framework.
- The Integrated Rejection Risk Index (IRRI) provides highly accurate prediction of future allograft rejection, identifying both high- and low-risk patients up to 90 days in advance - a capability entirely absent from current transplant management.
What are the clinical implications?
- Integrating quantitative histopathology with clinical data provides a more precise, individualized estimate of rejection risk in heart transplant recipients.
- This framework has the potential to guide post-transplant surveillance intensity, immunosuppressive management, and patient counseling.
- Automated biopsy analysis could be incorporated into digital pathology workflows, enabling scalable, multicenter application in real-world transplant care.

Mixed Modality Classification Cardiac Retrospective Clinical In Silico Academic Lab GenAI

The Effect of Image Resolution on the Performance of Deep Learning Algorithms in Detecting Calcaneus Fractures on X-Ray

Yee, N. J., Taseh, A., Ghandour, S., Sirls, E., Halai, M., Whyne, C., DiGiovanni, C. W., Kwon, J. Y., Ashkani-Esfahani, S. J.

•preprint•Sep 7 2025

Purpose: To evaluate convolutional neural network (CNN) model training strategies that optimize the performance of calcaneus fracture detection on radiographs at different image resolutions.
Materials and Methods: This retrospective study included foot radiographs from a single hospital between 2015 and 2022, totaling 1,775 x-ray series (551 with fractures; 1,224 without), split into training (70%), validation (15%), and testing (15%) sets. ImageNet pre-trained ResNet models were fine-tuned on the dataset. Three training strategies were evaluated: 1) single size: trained exclusively on 128x128, 256x256, 512x512, 640x640, or 900x900 radiographs (5 model sets); 2) curriculum learning: trained exclusively on 128x128 radiographs then exclusively on 256x256, then 512x512, then 640x640, and finally on 900x900 (5 model sets); and 3) multi-scale augmentation: trained on x-ray images resized along continuous dimensions between 128x128 and 900x900 (1 model set). Inference time and training time were compared.
Results: Multi-scale augmentation trained models achieved the highest average area under the Receiver Operating Characteristic curve of 0.938 [95% CI: 0.936 - 0.939] for a single model across image resolutions compared to the other strategies without prolonging training or inference time. Using the optimal model sets, curriculum learning had the highest sensitivity on in-distribution low-resolution images (85.4% to 90.1%) and on out-of-distribution high-resolution images (78.2% to 89.2%). However, curriculum learning models took significantly longer to train (11.8 [IQR: 11.1-16.4] hours; P<.001).
Conclusion: While 512x512 images worked well for fracture identification, curriculum learning and multi-scale augmentation training strategies algorithmically improved model robustness towards different image resolutions without requiring additional annotated data.
Summary statement: Different deep learning training strategies affect performance in detecting calcaneus fractures on radiographs across in- and out-of-distribution image resolutions, with a multi-scale augmentation strategy conferring the greatest overall performance improvement in a single model.
Key points:
- Training strategies addressing differences in radiograph image resolution (or pixel dimensions) could improve deep learning performance.
- The highest average performance across different image resolutions in a single model was achieved by multi-scale augmentation, where the sampled training dataset is uniformly resized between square resolutions of 128x128 and 900x900.
- Compared to model training on a single image resolution, sequentially training on increasingly higher resolution images up to 900x900 (i.e., curriculum learning) resulted in higher fracture detection performance on image resolutions between 128x128 and 2048x2048.
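The two resolution-aware strategies differ only in how the training resolution is chosen at each step, which the sketch below makes explicit. The resolution list and the 128 to 900 range come from the abstract. The per-sample integer sampling, the epoch-based stage schedule, and the function names are assumptions for illustration, and the actual image resizing is left to whichever imaging library is in use.

```python
import random

MIN_SIDE, MAX_SIDE = 128, 900                 # range stated in the abstract
CURRICULUM = [128, 256, 512, 640, 900]        # stage order stated in the abstract


def multiscale_side(rng):
    """Multi-scale augmentation: draw one square side length uniformly per sample."""
    return rng.randint(MIN_SIDE, MAX_SIDE)    # randint is inclusive at both ends


def curriculum_side(epoch, epochs_per_stage):
    """Curriculum learning: train at each resolution in turn, lowest first.

    `epochs_per_stage` is a hypothetical schedule parameter; the final
    resolution is held once the last stage is reached.
    """
    if epoch < 0 or epochs_per_stage <= 0:
        raise ValueError("epoch must be >= 0 and epochs_per_stage must be > 0")
    stage = min(epoch // epochs_per_stage, len(CURRICULUM) - 1)
    return CURRICULUM[stage]


if __name__ == "__main__":
    rng = random.Random(0)
    print([multiscale_side(rng) for _ in range(5)])           # one size per sample
    print([curriculum_side(e, 10) for e in (0, 10, 25, 49)])  # one size per epoch
```

Because the curriculum passes through five stages in sequence, its longer training time reported in the Results follows directly from the schedule, whereas multi-scale augmentation trains a single model in one pass.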

X-Ray Detection Musculoskeletal Retrospective Clinical In Silico Academic Lab
