Latest Papers on Radiology AI. Category: preprint, Order: Best Match, Limit: 10.

Using Foundation Models as Pseudo-Label Generators for Pre-Clinical 4D Cardiac CT Segmentation

Anne-Marie Rickmann, Stephanie L. Thorn, Shawn S. Ahn, Supum Lee, Selen Uman, Taras Lysyy, Rachel Burns, Nicole Guerrera, Francis G. Spinale, Jason A. Burdick, Albert J. Sinusas, James S. Duncan

•preprint•May 14 2025

Cardiac image segmentation is an important step in many cardiac image analysis and modeling tasks such as motion tracking or simulations of cardiac mechanics. While deep learning has greatly advanced segmentation in clinical settings, there is limited work on pre-clinical imaging, notably in porcine models, which are often used due to their anatomical and physiological similarity to humans. However, differences between species create a domain shift that complicates direct model transfer from human to pig data. Recently, foundation models trained on large human datasets have shown promise for robust medical image segmentation; yet their applicability to porcine data remains largely unexplored. In this work, we investigate whether foundation models can generate sufficiently accurate pseudo-labels for pig cardiac CT and propose a simple self-training approach to iteratively refine these labels. Our method requires no manually annotated pig data, relying instead on iterative updates to improve segmentation quality. We demonstrate that this self-training process not only enhances segmentation accuracy but also smooths out temporal inconsistencies across consecutive frames. Although our results are encouraging, there remains room for improvement, for example by incorporating more sophisticated self-training strategies and by exploring additional foundation models and other cardiac imaging technologies.

CT Segmentation Cardiac Methodology In Silico GenAI

Zero-Shot Multi-modal Large Language Model v.s. Supervised Deep Learning: A Comparative Study on CT-Based Intracranial Hemorrhage Subtyping

Yinuo Wang, Yue Zeng, Kai Chen, Cai Meng, Chao Pan, Zhouping Tang

•preprint•May 14 2025

Introduction: Timely identification of intracranial hemorrhage (ICH) subtypes on non-contrast computed tomography is critical for prognosis prediction and therapeutic decision-making, yet remains challenging due to low contrast and blurring boundaries. This study evaluates the performance of zero-shot multi-modal large language models (MLLMs) compared to traditional deep learning methods in ICH binary classification and subtyping. Methods: We utilized a dataset provided by RSNA, comprising 192 NCCT volumes. The study compares various MLLMs, including GPT-4o, Gemini 2.0 Flash, and Claude 3.5 Sonnet V2, with conventional deep learning models, including ResNet50 and Vision Transformer. Carefully crafted prompts were used to guide MLLMs in tasks such as ICH presence, subtype classification, localization, and volume estimation. Results: The results indicate that in the ICH binary classification task, traditional deep learning models outperform MLLMs comprehensively. For subtype classification, MLLMs also exhibit inferior performance compared to traditional deep learning models, with Gemini 2.0 Flash achieving an macro-averaged precision of 0.41 and a macro-averaged F1 score of 0.31. Conclusion: While MLLMs excel in interactive capabilities, their overall accuracy in ICH subtyping is inferior to deep networks. However, MLLMs enhance interpretability through language interactions, indicating potential in medical imaging analysis. Future efforts will focus on model refinement and developing more precise MLLMs to improve performance in three-dimensional medical image processing.

CT Classification Neurological Retrospective Clinical In Silico GenAI Benchmark SOTA

Fed-ComBat: A Generalized Federated Framework for Batch Effect Harmonization in Collaborative Studies

Silva, S., Lorenzi, M., Altmann, A., Oxtoby, N.

•preprint•May 14 2025

In neuroimaging research, the utilization of multi-centric analyses is crucial for obtaining sufficient sample sizes and representative clinical populations. Data harmonization techniques are typically part of the pipeline in multi-centric studies to address systematic biases and ensure the comparability of the data. However, most multi-centric studies require centralized data, which may result in exposing individual patient information. This poses a significant challenge in data governance, leading to the implementation of regulations such as the GDPR and the CCPA, which attempt to address these concerns but also hinder data access for researchers. Federated learning offers a privacy-preserving alternative approach in machine learning, enabling models to be collaboratively trained on decentralized data without the need for data centralization or sharing. In this paper, we present Fed-ComBat, a federated framework for batch effect harmonization on decentralized data. Fed-ComBat extends existing centralized linear methods, such as ComBat and distributed as d-ComBat, and nonlinear approaches like ComBat-GAM in accounting for potentially nonlinear and multivariate covariate effects. By doing so, Fed-ComBat enables the preservation of nonlinear covariate effects without requiring centralization of data and without prior knowledge of which variables should be considered nonlinear or their interactions, differentiating it from ComBat-GAM. We assessed Fed-ComBat and existing approaches on simulated data and multiple cohorts comprising healthy controls (CN) and subjects with various disorders such as Parkinson's disease (PD), Alzheimer's disease (AD), and autism spectrum disorder (ASD). The results of our study show that Fed-ComBat performs better than centralized ComBat when dealing with nonlinear effects and is on par with centralized methods like ComBat-GAM. Through experiments using synthetic data, Fed-ComBat demonstrates a superior ability to reconstruct the target unbiased function, achieving a 35% improvement (RMSE=0.5952) compared to d-ComBat (RMSE=0.9162) and a 12% improvement compared to our proposal to federate ComBat-GAM, d-ComBat-GAM (RMSE=0.6751). Additionally, Fed-ComBat achieves comparable results to centralized methods like ComBat-GAM for MRI-derived phenotypes without requiring prior knowledge of potential nonlinearities.

MRI Reconstruction Neurological Methodology In Silico Academic Lab Reproducibility

Explainability Through Human-Centric Design for XAI in Lung Cancer Detection

Amy Rafferty, Rishi Ramaesh, Ajitha Rajan

•preprint•May 14 2025

Deep learning models have shown promise in lung pathology detection from chest X-rays, but widespread clinical adoption remains limited due to opaque model decision-making. In prior work, we introduced ClinicXAI, a human-centric, expert-guided concept bottleneck model (CBM) designed for interpretable lung cancer diagnosis. We now extend that approach and present XpertXAI, a generalizable expert-driven model that preserves human-interpretable clinical concepts while scaling to detect multiple lung pathologies. Using a high-performing InceptionV3-based classifier and a public dataset of chest X-rays with radiology reports, we compare XpertXAI against leading post-hoc explainability methods and an unsupervised CBM, XCBs. We assess explanations through comparison with expert radiologist annotations and medical ground truth. Although XpertXAI is trained for multiple pathologies, our expert validation focuses on lung cancer. We find that existing techniques frequently fail to produce clinically meaningful explanations, omitting key diagnostic features and disagreeing with radiologist judgments. XpertXAI not only outperforms these baselines in predictive accuracy but also delivers concept-level explanations that better align with expert reasoning. While our focus remains on explainability in lung cancer detection, this work illustrates how human-centric model design can be effectively extended to broader diagnostic contexts - offering a scalable path toward clinically meaningful explainable AI in medical diagnostics.

X-Ray Classification Chest Retrospective Clinical In Silico Academic Lab Ethics

Single View Echocardiographic Analysis for Left Ventricular Outflow Tract Obstruction Prediction in Hypertrophic Cardiomyopathy: A Deep Learning Approach

Kim, J., Park, J., Jeon, J., Yoon, Y. E., Jang, Y., Jeong, H., Lee, S.-A., Choi, H.-M., Hwang, I.-C., Cho, G.-Y., Chang, H.-J.

•preprint•May 14 2025

BackgroundAccurate left ventricular outflow tract obstruction (LVOTO) assessment is crucial for hypertrophic cardiomyopathy (HCM) management and prognosis. Traditional methods, requiring multiple views, Doppler, and provocation, is often infeasible, especially where resources are limited. This study aimed to develop and validate a deep learning (DL) model capable of predicting severe LVOTO in HCM patients using only the parasternal long-axis (PLAX) view from transthoracic echocardiography (TTE). MethodsA DL model was trained on PLAX videos extracted from TTE examinations (developmental dataset, n=1,007) to capture both morphological and dynamic motion features, generating a DL index for LVOTO (DLi-LVOTO, range 0-100). Performance was evaluated in an internal test dataset (ITDS, n=87) and externally validated in the distinct hospital dataset (DHDS, n=1,334) and the LVOTO reduction treatment dataset (n=156). ResultsThe model achieved high accuracy in detecting severe LVOTO (pressure gradient[≥] 50mmHg), with area under the receiver operating characteristics curve (AUROC) of 0.97 (95% confidence interval: 0.92-1.00) in ITDS and 0.93 (0.92-0.95) in DHDS. At a DLi-LVOTO threshold of 70, the model demonstrated a specificity of 97.3% and negative predictive value (NPV) of 96.1% in ITDS. In DHDS, a cutoff of 60 yielded a specificity of 94.6% and NPV of 95.5%. DLi-LVOTO also decreased significantly after surgical myectomy or Mavacamten treatment, correlating with reductions in peak pressure gradient (p<0.001 for all). ConclusionsOur DL-based approach predicts severe LVOTO using only the PLAX view from TTE, serving as a complementary tool, particularly in resource-limited settings or when Doppler is unavailable, and for monitoring treatment response.

Ultrasound Classification Cardiac Retrospective Clinical In Silico Academic Lab

Multi-Task Deep Learning for Predicting Metabolic Syndrome from Retinal Fundus Images in a Japanese Health Checkup Dataset

Itoh, T., Nishitsuka, K., Fukuma, Y., Wada, S.

•preprint•May 14 2025

BackgroundRetinal fundus images provide a noninvasive window into systemic health, offering opportunities for early detection of metabolic disorders such as metabolic syndrome (METS). ObjectiveThis study aimed to develop a deep learning model to predict METS from fundus images obtained during routine health checkups, leveraging a multi-task learning approach. MethodsWe retrospectively analyzed 5,000 fundus images from Japanese health checkup participants. Convolutional neural network (CNN) models were trained to classify METS status, incorporating fundus-specific data augmentation strategies and auxiliary regression tasks targeting clinical parameters such as abdominal circumference (AC). Model performance was evaluated using validation accuracy, test accuracy, and the area under the receiver operating characteristic curve (AUC). ResultsModels employing fundus-specific augmentation demonstrated more stable convergence and superior validation accuracy compared to general-purpose augmentation. Incorporating AC as an auxiliary task further enhanced performance across architectures. The final ensemble model with test-time augmentation achieved a test accuracy of 0.696 and an AUC of 0.73178. ConclusionCombining multi-task learning, fundus-specific data augmentation, and ensemble prediction substantially improves deep learning-based METS classification from fundus images. This approach may offer a practical, noninvasive screening tool for metabolic syndrome in general health checkup settings.

OCT Classification Retrospective Clinical In Silico Academic Lab

Calibration and Uncertainty for multiRater Volume Assessment in multiorgan Segmentation (CURVAS) challenge results

Meritxell Riera-Marin, Sikha O K, Julia Rodriguez-Comas, Matthias Stefan May, Zhaohong Pan, Xiang Zhou, Xiaokun Liang, Franciskus Xaverius Erick, Andrea Prenner, Cedric Hemon, Valentin Boussot, Jean-Louis Dillenseger, Jean-Claude Nunes, Abdul Qayyum, Moona Mazher, Steven A Niederer, Kaisar Kushibar, Carlos Martin-Isla, Petia Radeva, Karim Lekadir, Theodore Barfoot, Luis C. Garcia Peraza Herrera, Ben Glocker, Tom Vercauteren, Lucas Gago, Justin Englemann, Joy-Marie Kleiss, Anton Aubanell, Andreu Antolin, Javier Garcia-Lopez, Miguel A. Gonzalez Ballester, Adrian Galdran

•preprint•May 13 2025

Deep learning (DL) has become the dominant approach for medical image segmentation, yet ensuring the reliability and clinical applicability of these models requires addressing key challenges such as annotation variability, calibration, and uncertainty estimation. This is why we created the Calibration and Uncertainty for multiRater Volume Assessment in multiorgan Segmentation (CURVAS), which highlights the critical role of multiple annotators in establishing a more comprehensive ground truth, emphasizing that segmentation is inherently subjective and that leveraging inter-annotator variability is essential for robust model evaluation. Seven teams participated in the challenge, submitting a variety of DL models evaluated using metrics such as Dice Similarity Coefficient (DSC), Expected Calibration Error (ECE), and Continuous Ranked Probability Score (CRPS). By incorporating consensus and dissensus ground truth, we assess how DL models handle uncertainty and whether their confidence estimates align with true segmentation performance. Our findings reinforce the importance of well-calibrated models, as better calibration is strongly correlated with the quality of the results. Furthermore, we demonstrate that segmentation models trained on diverse datasets and enriched with pre-trained knowledge exhibit greater robustness, particularly in cases deviating from standard anatomical structures. Notably, the best-performing models achieved high DSC and well-calibrated uncertainty estimates. This work underscores the need for multi-annotator ground truth, thorough calibration assessments, and uncertainty-aware evaluations to develop trustworthy and clinically reliable DL-based medical image segmentation models.

Mixed Modality Segmentation Whole Body Retrospective Clinical In Silico Consortium Benchmark SOTA Reproducibility

Congenital Heart Disease recognition using Deep Learning/Transformer models

Aidar Amangeldi, Vladislav Yarovenko, Angsar Taigonyrov

•preprint•May 13 2025

Congenital Heart Disease (CHD) remains a leading cause of infant morbidity and mortality, yet non-invasive screening methods often yield false negatives. Deep learning models, with their ability to automatically extract features, can assist doctors in detecting CHD more effectively. In this work, we investigate the use of dual-modality (sound and image) deep learning methods for CHD diagnosis. We achieve 73.9% accuracy on the ZCHSound dataset and 80.72% accuracy on the DICOM Chest X-ray dataset.

X-Ray Classification Cardiac Retrospective Clinical In Silico

Unsupervised Out-of-Distribution Detection in Medical Imaging Using Multi-Exit Class Activation Maps and Feature Masking

Yu-Jen Chen, Xueyang Li, Yiyu Shi, Tsung-Yi Ho

•preprint•May 13 2025

Out-of-distribution (OOD) detection is essential for ensuring the reliability of deep learning models in medical imaging applications. This work is motivated by the observation that class activation maps (CAMs) for in-distribution (ID) data typically emphasize regions that are highly relevant to the model's predictions, whereas OOD data often lacks such focused activations. By masking input images with inverted CAMs, the feature representations of ID data undergo more substantial changes compared to those of OOD data, offering a robust criterion for differentiation. In this paper, we introduce a novel unsupervised OOD detection framework, Multi-Exit Class Activation Map (MECAM), which leverages multi-exit CAMs and feature masking. By utilizing mult-exit networks that combine CAMs from varying resolutions and depths, our method captures both global and local feature representations, thereby enhancing the robustness of OOD detection. We evaluate MECAM on multiple ID datasets, including ISIC19 and PathMNIST, and test its performance against three medical OOD datasets, RSNA Pneumonia, COVID-19, and HeadCT, and one natural image OOD dataset, iSUN. Comprehensive comparisons with state-of-the-art OOD detection methods validate the effectiveness of our approach. Our findings emphasize the potential of multi-exit networks and feature masking for advancing unsupervised OOD detection in medical imaging, paving the way for more reliable and interpretable models in clinical practice.

Mixed Modality Classification Methodology In Silico Benchmark SOTA

A Deep Learning-Driven Framework for Inhalation Injury Grading Using Bronchoscopy Images

Yifan Li, Alan W Pang, Jo Woon Chong

•preprint•May 13 2025

Inhalation injuries face a challenge in clinical diagnosis and grading due to the limitations of traditional methods, such as Abbreviated Injury Score (AIS), which rely on subjective assessments and show weak correlations with clinical outcomes. This study introduces a novel deep learning-based framework for grading inhalation injuries using bronchoscopy images with the duration of mechanical ventilation as an objective metric. To address the scarcity of medical imaging data, we propose enhanced StarGAN, a generative model that integrates Patch Loss and SSIM Loss to improve synthetic images' quality and clinical relevance. The augmented dataset generated by enhanced StarGAN significantly improved classification performance when evaluated using the Swin Transformer, achieving an accuracy of 77.78%, an 11.11% improvement over the original dataset. Image quality was assessed using the Fr\'echet Inception Distance (FID), where Enhanced StarGAN achieved the lowest FID of 30.06, outperforming baseline models. Burn surgeons confirmed the realism and clinical relevance of the generated images, particularly the preservation of bronchial structures and color distribution. These results highlight the potential of enhanced StarGAN in addressing data limitations and improving classification accuracy for inhalation injury grading.

Mixed Modality Classification Chest Methodology In Silico Open Code GenAI

Using Foundation Models as Pseudo-Label Generators for Pre-Clinical 4D Cardiac CT Segmentation

Zero-Shot Multi-modal Large Language Model v.s. Supervised Deep Learning: A Comparative Study on CT-Based Intracranial Hemorrhage Subtyping

Fed-ComBat: A Generalized Federated Framework for Batch Effect Harmonization in Collaborative Studies

Explainability Through Human-Centric Design for XAI in Lung Cancer Detection

Single View Echocardiographic Analysis for Left Ventricular Outflow Tract Obstruction Prediction in Hypertrophic Cardiomyopathy: A Deep Learning Approach

Multi-Task Deep Learning for Predicting Metabolic Syndrome from Retinal Fundus Images in a Japanese Health Checkup Dataset

Calibration and Uncertainty for multiRater Volume Assessment in multiorgan Segmentation (CURVAS) challenge results

Congenital Heart Disease recognition using Deep Learning/Transformer models

Unsupervised Out-of-Distribution Detection in Medical Imaging Using Multi-Exit Class Activation Maps and Feature Masking

A Deep Learning-Driven Framework for Inhalation Injury Grading Using Bronchoscopy Images

Ready to Sharpen Your Edge?