Latest Papers on Radiology AI.

CT-Based deep learning platform combined with clinical parameters for predicting different discharge outcome in spontaneous intracerebral hemorrhage.

Wu TC, Chan MH, Lin KH, Liu CF, Chen JH, Chang RF

•papers•Sep 16 2025

This study aims to enhance the prognostic prediction of spontaneous intracerebral hemorrhage (sICH) by comparing the accuracy of three models: a CT-based deep learning model, a clinical variable-based machine learning model, and a hybrid model that integrates both approaches. The goal is to evaluate their performance across different outcome thresholds, including poor outcome (mRS 3-6), loss of independence (mRS 4-6), and severe disability or death (mRS 5-6). A retrospective analysis was conducted on 1,853 sICH patients from a stroke center database (2008-2021). Patients were divided into two datasets: Dataset A (958 patients) for training/testing the clinical and hybrid models, and Dataset B (895 patients) for training the deep learning model. The imaging model used a 3D ResNet-50 architecture with attention modules, while the clinical model incorporated 19 clinical variables. The hybrid model combined clinical data with the prediction probability from the imaging model. Performance metrics were compared using the DeLong test. The hybrid model consistently outperformed the other models across all outcome thresholds. For predicting severe disability or death, loss of independence, and poor outcome, the hybrid model achieved accuracies of 82.6%, 79.5%, and 80.6%, with AUC values of 0.897, 0.871, and 0.873, respectively. GCS scores and the imaging model's prediction probability were the most significant predictors. The hybrid model, combining CT-based deep learning with clinical variables, offers superior prognostic prediction for sICH outcomes. This integrated approach shows promise for improving clinical decision-making, though further validation in prospective studies is needed. Trial registration: not applicable, because this is a retrospective study, not a clinical trial.
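
The three outcome thresholds and the AUC metric in the abstract above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the toy mRS scores, and the toy probabilities are all made up for illustration, and the AUC is computed with the plain pairwise (Mann-Whitney) definition rather than with any specific library.

```python
def dichotomize_mrs(mrs_scores, threshold):
    """Return 1 where mRS >= threshold (adverse outcome), else 0."""
    return [1 if score >= threshold else 0 for score in mrs_scores]


def auc(labels, scores):
    """AUC as the probability that a random positive case is scored
    higher than a random negative case (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Outcome definitions taken from the abstract; the cut-off is the lowest
# mRS value counted as an adverse outcome.
THRESHOLDS = {
    "poor outcome (mRS 3-6)": 3,
    "loss of independence (mRS 4-6)": 4,
    "severe disability or death (mRS 5-6)": 5,
}

# Hypothetical toy data, NOT from the study.
toy_mrs = [0, 1, 2, 3, 4, 5, 6, 2]
toy_prob = [0.05, 0.10, 0.40, 0.35, 0.70, 0.80, 0.95, 0.20]

for name, cutoff in THRESHOLDS.items():
    labels = dichotomize_mrs(toy_mrs, cutoff)
    print(name, round(auc(labels, toy_prob), 3))
```

The same predicted probability is scored against each dichotomization here only to keep the sketch short; the study reports separate results per threshold.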

CT Classification Neurological Retrospective Clinical In Silico

More performant and scalable: Rethinking contrastive vision-language pre-training of radiology in the LLM era

Yingtai Li, Haoran Lai, Xiaoqian Zhou, Shuai Ming, Wenxin Ma, Wei Wei, Shaohua Kevin Zhou

•preprint•Sep 16 2025

The emergence of Large Language Models (LLMs) presents unprecedented opportunities to revolutionize medical contrastive vision-language pre-training. In this paper, we show how LLMs can facilitate large-scale supervised pre-training, thereby advancing vision-language alignment. We begin by demonstrating that modern LLMs can automatically extract diagnostic labels from radiology reports with remarkable precision (>96% AUC in our experiments) without complex prompt engineering, enabling the creation of large-scale "silver-standard" datasets at a minimal cost (~$3 for 50k CT image-report pairs). Further, we find that a vision encoder trained on this "silver-standard" dataset achieves performance comparable to those trained on labels extracted by specialized BERT-based models, thereby democratizing access to large-scale supervised pre-training. Building on this foundation, we proceed to reveal that supervised pre-training fundamentally improves contrastive vision-language alignment. Our approach achieves state-of-the-art performance using only a 3D ResNet-18 with vanilla CLIP training, including 83.8% AUC for zero-shot diagnosis on CT-RATE, 77.3% AUC on RAD-ChestCT, and substantial improvements in cross-modal retrieval (MAP@50=53.7% for image-image, Recall@100=52.2% for report-image). These results demonstrate the potential of utilizing LLMs to facilitate more performant and scalable medical AI systems. Our code is available at https://github.com/SadVoxel/More-performant-and-scalable.
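
The "vanilla CLIP training" mentioned above rests on a symmetric contrastive loss over matched image-report pairs. The following is a minimal pure-Python sketch of that loss, not the authors' code (their repository is the reference); the function names are illustrative, and the temperature of 0.07 is an assumed value.

```python
import math


def l2_normalize(vec):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def diagonal_cross_entropy(logits):
    """Mean cross-entropy where the correct column for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        row_max = max(row)  # subtract the max for numerical stability
        log_sum_exp = row_max + math.log(sum(math.exp(x - row_max) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def clip_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric contrastive loss: pair i of images and texts is the
    positive; every other pairing in the batch is a negative."""
    imgs = [l2_normalize(v) for v in image_embs]
    txts = [l2_normalize(v) for v in text_embs]
    # Cosine similarity matrix, scaled by the temperature.
    logits = [[sum(a * b for a, b in zip(i, t)) / temperature for t in txts]
              for i in imgs]
    logits_t = [list(col) for col in zip(*logits)]  # text-to-image direction
    return 0.5 * (diagonal_cross_entropy(logits) + diagonal_cross_entropy(logits_t))


# Hypothetical 2-D embeddings: matched pairs give a low loss,
# swapped pairs give a high one.
aligned = [[1.0, 0.0], [0.0, 1.0]]
swapped = [[0.0, 1.0], [1.0, 0.0]]
print(clip_loss(aligned, aligned), clip_loss(aligned, swapped))
```

In practice the embeddings come from the vision and text encoders and the loss is computed on tensors with automatic differentiation; the list-based version only shows the arithmetic.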

CT Classification Chest Methodology In Silico Academic Lab Benchmark SOTA Open Code GenAI

Data Scaling Laws for Radiology Foundation Models

Maximilian Ilse, Harshita Sharma, Anton Schwaighofer, Sam Bond-Taylor, Fernando Pérez-García, Olesya Melnichenko, Anne-Marie G. Sykes, Kelly K. Horst, Ashish Khandelwal, Maxwell Reynolds, Maria T. Wetscherek, Noel C. F. Codella, Javier Alvarez-Valle, Korfiatis Panagiotis, Valentina Salvatelli

•preprint•Sep 16 2025

Foundation vision encoders such as CLIP and DINOv2, trained on web-scale data, exhibit strong transfer performance across tasks and datasets. However, medical imaging foundation models remain constrained by smaller datasets, limiting our understanding of how data scale and pretraining paradigms affect performance in this setting. In this work, we systematically study continual pretraining of two vision encoders, MedImageInsight (MI2) and RAD-DINO, representing the two major encoder paradigms CLIP and DINOv2, on up to 3.5M chest x-rays from a single institution, holding compute and evaluation protocols constant. We evaluate on classification (radiology findings, lines and tubes), segmentation (lines and tubes), and radiology report generation. While prior work has primarily focused on tasks related to radiology findings, we include lines and tubes tasks to counterbalance this bias and evaluate a model's ability to extract features that preserve continuity along elongated structures. Our experiments show that MI2 scales more effectively for finding-related tasks, while RAD-DINO is stronger on tube-related tasks. Surprisingly, continually pretraining MI2 with both reports and structured labels using UniCL improves performance, underscoring the value of structured supervision at scale. We further show that for some tasks, as few as 30k in-domain samples are sufficient to surpass open-weights foundation models. These results highlight the utility of center-specific continual pretraining, enabling medical institutions to derive significant performance gains by utilizing in-domain data.

X-Ray Classification Chest Methodology In Silico Academic Lab Benchmark SOTA GenAI

Neural Collapse-Inspired Multi-Label Federated Learning under Label-Distribution Skew

Can Peng, Yuyuan Liu, Yingyu Yang, Pramit Saha, Qianye Yang, J. Alison Noble

•preprint•Sep 16 2025

Federated Learning (FL) enables collaborative model training across distributed clients while preserving data privacy. However, the performance of deep learning often deteriorates in FL due to decentralized and heterogeneous data. This challenge is further amplified in multi-label scenarios, where data exhibit complex characteristics such as label co-occurrence, inter-label dependency, and discrepancies between local and global label relationships. While most existing FL research primarily focuses on single-label classification, many real-world applications, particularly in domains such as medical imaging, often involve multi-label settings. In this paper, we address this important yet underexplored scenario in FL, where clients hold multi-label data with skewed label distributions. Neural Collapse (NC) describes a geometric structure in the latent feature space where features of each class collapse to their class mean with vanishing intra-class variance, and the class means form a maximally separated configuration. Motivated by this theory, we propose a method to align feature distributions across clients and to learn high-quality, well-clustered representations. To make the NC-structure applicable to multi-label settings, where image-level features may contain multiple semantic concepts, we introduce a feature disentanglement module that extracts semantically specific features. The clustering of these disentangled class-wise features is guided by a predefined shared NC structure, which mitigates potential conflicts between client models due to diverse local data distributions. In addition, we design regularisation losses to encourage compact clustering in the latent feature space. Experiments conducted on four benchmark datasets across eight diverse settings demonstrate that our approach outperforms existing methods, validating its effectiveness in this challenging FL scenario.
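
The "maximally separated configuration" of class means described above is usually formalized as a simplex equiangular tight frame (ETF): K unit vectors whose pairwise cosine similarity is exactly -1/(K-1). A minimal sketch of that geometry follows; whether the paper's predefined shared structure is built exactly this way is an assumption, and the function names are illustrative.

```python
import math


def simplex_etf(num_classes):
    """Return K unit-norm prototype vectors in R^K forming a simplex ETF:
    sqrt(K / (K - 1)) * (I - (1/K) * ones), one row per class."""
    k = num_classes
    if k < 2:
        raise ValueError("a simplex ETF needs at least two classes")
    scale = math.sqrt(k / (k - 1))
    return [[scale * ((1.0 if i == j else 0.0) - 1.0 / k) for j in range(k)]
            for i in range(k)]


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


# With 4 classes, every pair of prototypes has cosine similarity -1/3.
prototypes = simplex_etf(4)
print(round(cosine(prototypes[0], prototypes[1]), 4))
```

Because the prototypes depend only on the number of classes, every client can construct the same targets without exchanging data, which is what makes a fixed structure of this kind attractive as a shared reference in federated training.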

Classification Methodology In Silico

A Computational Pipeline for Patient-Specific Modeling of Thoracic Aortic Aneurysm: From Medical Image to Finite Element Analysis

Jiasong Chen, Linchen Qian, Ruonan Gong, Christina Sun, Tongran Qin, Thuy Pham, Caitlin Martin, Mohammad Zafar, John Elefteriades, Wei Sun, Liang Liang

•preprint•Sep 16 2025

The aorta is the body's largest arterial vessel, serving as the primary pathway for oxygenated blood within the systemic circulation. Aortic aneurysms consistently rank among the top twenty causes of mortality in the United States. Thoracic aortic aneurysm (TAA) arises from abnormal dilation of the thoracic aorta and remains a clinically significant disease, ranking as one of the leading causes of death in adults. A thoracic aortic aneurysm ruptures when the integrity of all aortic wall layers is compromised due to elevated blood pressure. Currently, three-dimensional computed tomography (3D CT) is considered the gold standard for diagnosing TAA. The geometric characteristics of the aorta, which can be quantified from medical imaging, and stresses on the aortic wall, which can be obtained by finite element analysis (FEA), are critical in evaluating the risk of rupture and dissection. Deep learning-based image segmentation has emerged as a reliable method for extracting anatomical regions of interest from medical images. Voxel-based segmentation masks of anatomical structures are typically converted into structured mesh representations to enable accurate simulation. Hexahedral meshes are commonly used in finite element simulations of the aorta due to their computational efficiency and superior simulation accuracy. Due to anatomical variability, patient-specific modeling enables detailed assessment of individual anatomical and biomechanical behaviors, supporting precise simulations, accurate diagnoses, and personalized treatment strategies. Finite element (FE) simulations provide valuable insights into the biomechanical behaviors of tissues and organs in clinical studies. Developing accurate FE models represents a crucial initial step in establishing a patient-specific, biomechanically based framework for predicting the risk of TAA.

CT Segmentation Cardiac Methodology Prototype

The HeartMagic prospective observational study protocol - characterizing subtypes of heart failure with preserved ejection fraction

Meyer, P., Rocca, A., Banus, J., Ogier, A. C., Georgantas, C., Calarnou, P., Fatima, A., Vallee, J.-P., Deux, J.-F., Thomas, A., Marquis, J., Monney, P., Lu, H., Ledoux, J.-B., Tillier, C., Crowe, L. A., Abdurashidova, T., Richiardi, J., Hullin, R., van Heeswijk, R. B.

•preprint•Sep 16 2025

Introduction: Heart failure (HF) is a life-threatening syndrome with significant morbidity and mortality. While evidence-based drug treatments have effectively reduced morbidity and mortality in HF with reduced ejection fraction (HFrEF), few therapies have been demonstrated to improve outcomes in HF with preserved ejection fraction (HFpEF). The multifaceted clinical presentation is one of the main reasons why the current understanding of HFpEF remains limited. This may be caused by the existence of several HFpEF disease subtypes that each need different treatments. There is therefore an unmet need for a holistic approach that combines comprehensive imaging with metabolomic, transcriptomic and genomic mapping to subtype HFpEF patients. This protocol details the approach employed in the HeartMagic study to address this gap in understanding. Methods: This prospective multi-center observational cohort study will include 500 consecutive patients with current or recent hospitalization for treatment of HFpEF at two Swiss university hospitals, along with 50 age-matched HFrEF patients and 50 age-matched healthy controls. Diagnosis of heart failure is based on clinical signs and symptoms, and subgrouping of HF patients is based on the left-ventricular ejection fraction. In addition to routine clinical workup, participants undergo genomic, transcriptomic, and metabolomic analyses, while the anatomy, composition, and function of the heart are quantified by comprehensive echocardiography and magnetic resonance imaging (MRI). Quantitative MRI is also applied to characterize the kidney. The primary outcome is a composite of one-year cardiovascular mortality or rehospitalization. Machine learning (ML) based multi-modal clustering will be employed to identify distinct HFpEF subtypes in the holistic data. The clinical importance of these subtypes will be evaluated based on their association with the primary outcome.
Statistical analysis will include group comparisons across modalities, survival analysis for the primary outcome, and integrative multi-modal clustering combining clinical, imaging, ECG, genomic, transcriptomic, and metabolomic data to identify and validate HFpEF subtypes. Discussion: The integration of comprehensive MRI with extensive genomic and metabolomic profiling in this study will result in an unprecedented panoramic view of HFpEF and should enable us to distinguish functional subgroups of HFpEF patients. This approach has the potential to provide unprecedented insights into HFpEF disease and should provide a basis for personalized therapies. Beyond this, identifying HFpEF subtypes with specific molecular and structural characteristics could lead to new targeted pharmacological interventions, with the potential to improve patient outcomes.

MRI Classification Cardiac Prospective Clinical Pilot Academic Lab GenAI

Challenges and Limitations of Multimodal Large Language Models in Interpreting Pediatric Panoramic Radiographs.

Mine Y, Iwamoto Y, Okazaki S, Nishimura T, Tabata E, Takeda S, Peng TY, Nomura R, Kakimoto N, Murayama T

•papers•Sep 16 2025

Multimodal large language models (LLMs) have potential for medical image analysis, yet their reliability for pediatric panoramic radiographs remains uncertain. This study evaluated two multimodal LLMs (OpenAI o1, Claude 3.5 Sonnet) for detecting and counting teeth (including tooth germs) on pediatric panoramic radiographs. Eighty-seven pediatric panoramic radiographs from an open-source data set were analyzed. Two pediatric dentists annotated the presence or absence of each potential tooth position. Each image was processed five times by the LLMs using identical prompts, and the results were compared with the expert annotations. Standard performance metrics and Fleiss' kappa were calculated. Detailed examination revealed that subtle developmental stages and minor tooth loss were consistently misidentified. Claude 3.5 Sonnet had higher sensitivity but significantly lower specificity (29.8% ± 21.5%), resulting in many false positives. OpenAI o1 demonstrated superior specificity compared to Claude 3.5 Sonnet, but still failed to correctly detect subtle defects in certain mixed dentition cases. Both models showed large variability in repeated runs. Both LLMs failed to achieve clinically acceptable performance and cannot reliably identify nuanced discrepancies critical for pediatric dentistry. Further refinements and consistency improvements are essential before routine clinical use.
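
Fleiss' kappa, mentioned above, measures agreement among a fixed number of ratings per item, which suits a design where each image is processed five times. A minimal sketch of the standard formula follows; treating the five runs as five "raters" per tooth position is an assumption about how the study applied it, and the toy counts are made up.

```python
def fleiss_kappa(counts):
    """counts[i][j] = number of ratings that placed item i in category j.
    Every item must have the same total number of ratings (>= 2)."""
    n_items = len(counts)
    n_ratings = sum(counts[0])
    if n_ratings < 2 or any(sum(row) != n_ratings for row in counts):
        raise ValueError("every item needs the same number (>= 2) of ratings")
    # Observed agreement: fraction of agreeing rating pairs, averaged over items.
    p_items = [(sum(c * c for c in row) - n_ratings) / (n_ratings * (n_ratings - 1))
               for row in counts]
    p_observed = sum(p_items) / n_items
    # Chance agreement from the overall share of each category.
    n_categories = len(counts[0])
    shares = [sum(row[j] for row in counts) / (n_items * n_ratings)
              for j in range(n_categories)]
    p_chance = sum(s * s for s in shares)
    if p_chance == 1.0:
        raise ValueError("kappa is undefined when all ratings fall in one category")
    return (p_observed - p_chance) / (1.0 - p_chance)


# Hypothetical counts for four tooth positions over five runs:
# columns are [runs saying "present", runs saying "absent"].
toy_counts = [[5, 0], [4, 1], [0, 5], [2, 3]]
print(round(fleiss_kappa(toy_counts), 3))
```

A value of 1 means the repeated runs always agree; values near 0 mean agreement no better than chance.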

X-Ray Detection Retrospective Clinical In Silico Academic Lab Benchmark SOTA GenAI

AI-powered insights in pediatric nephrology: current applications and future opportunities.

Nada A, Ahmed Y, Hu J, Weidemann D, Gorman GH, Lecea EG, Sandokji IA, Cha S, Shin S, Bani-Hani S, Mannemuddhu SS, Ruebner RL, Kakajiwala A, Raina R, George R, Elchaki R, Moritz ML

•papers•Sep 16 2025

Artificial intelligence (AI) is rapidly emerging as a transformative force in pediatric nephrology, enabling improvements in diagnostic accuracy, therapeutic precision, and operational workflows. By integrating diverse datasets-including patient histories, genomics, imaging, and longitudinal clinical records-AI-driven tools can detect subtle kidney anomalies, predict acute kidney injury, and forecast disease progression. Deep learning models, for instance, have demonstrated the potential to enhance ultrasound interpretations, refine kidney biopsy assessments, and streamline pathology evaluations. Coupled with robust decision support systems, these innovations also optimize medication dosing and dialysis regimens, ultimately improving patient outcomes. AI-powered chatbots hold promise for improving patient engagement and adherence, while AI-assisted documentation solutions offer relief from administrative burdens, mitigating physician burnout. However, ethical and practical challenges remain. Healthcare professionals must receive adequate training to harness AI's capabilities, ensuring that such technologies bolster rather than erode the vital doctor-patient relationship. Safeguarding data privacy, minimizing algorithmic bias, and establishing standardized regulatory frameworks are critical for safe deployment. Beyond clinical care, AI can accelerate pediatric nephrology research by identifying biomarkers, enabling more precise patient recruitment, and uncovering novel therapeutic targets. As these tools evolve, interdisciplinary collaborations and ongoing oversight will be key to integrating AI responsibly. Harnessing AI's vast potential could revolutionize pediatric nephrology, championing a future of individualized, proactive, and empathetic care for children with kidney diseases. 
Through strategic collaboration and transparent development, these advanced technologies promise to minimize disparities, foster innovation, and sustain compassionate patient-centered care, shaping a new horizon in pediatric nephrology research and practice.

Ultrasound Classification Abdominal Review Prototype Academic Lab Ethics

Role of Artificial Intelligence in Lung Transplantation: Current State, Challenges, and Future Directions.

Duncheskie RP, Omari OA, Anjum F

•papers•Sep 16 2025

Lung transplantation remains a critical treatment for end-stage lung diseases, yet it continues to have one of the lowest survival rates among solid organ transplants. Despite its life-saving potential, the field faces several challenges, including organ shortages, suboptimal donor matching, and post-transplant complications. The rapidly advancing field of artificial intelligence (AI) offers significant promise in addressing these challenges. Traditional statistical models, such as linear and logistic regression, have been used to predict post-transplant outcomes but struggle to adapt to new trends and evolving data. In contrast, machine learning algorithms can evolve with new data, offering dynamic and updated predictions. AI holds the potential to enhance lung transplantation at multiple stages. In the pre-transplant phase, AI can optimize waitlist management, refine donor selection, improve donor-recipient matching, and enhance diagnostic imaging by harnessing vast datasets. Post-transplant, AI can help predict allograft rejection, improve immunosuppressive management, and better forecast long-term patient outcomes, including quality of life. However, the integration of AI in lung transplantation also presents challenges, including data privacy concerns, algorithmic bias, and the need for external clinical validation. This review explores the current state of AI in lung transplantation, summarizes key findings from recent studies, and discusses the potential benefits, challenges, and ethical considerations in this rapidly evolving field, highlighting future research directions.

CT Classification Chest Review Concept Academic Lab Ethics Policy

Prediction of cerebrospinal fluid intervention in fetal ventriculomegaly via AI-powered normative modelling.

Zhou M, Rajan SA, Nedelec P, Bayona JB, Glenn O, Gupta N, Gano D, George E, Rauschecker AM

•papers•Sep 16 2025

Fetal ventriculomegaly (VM) is common and largely benign when isolated. However, it can occasionally progress to hydrocephalus, a more severe condition associated with increased mortality and neurodevelopmental delay that may require postnatal surgical intervention. Accurate differentiation between VM and hydrocephalus is essential but remains challenging, relying on subjective assessment and limited two-dimensional measurements. Deep learning-based segmentation offers a promising solution for objective and reproducible volumetric analysis. This work presents an AI-powered method for segmentation, volume quantification, and classification of the ventricles in fetal brain MRI to predict the need for postnatal intervention. This retrospective study included 222 patients with singleton pregnancies. An nnUNet was trained to segment the fetal ventricles on 20 manually segmented, institutional fetal brain MRIs combined with 80 studies from a publicly available dataset. The validated model was then applied to 138 normal fetal brain MRIs to generate a normative reference range across gestational ages (18-36 weeks). Finally, it was applied to 64 fetal brains with VM (14 of which required postnatal intervention). ROC curves and AUC to predict VM and need for postnatal intervention were calculated. The nnUNet-predicted segmentations of the fetal ventricles in the reference dataset were of high quality and accurate (median Dice score 0.96, IQR 0.93-0.99). A normative reference range of ventricular volumes across gestational ages was developed using automated segmentation volumes. The optimal threshold for identifying VM was 2 standard deviations from normal, with sensitivity of 92% and specificity of 93% (AUC 0.97, 95% CI 0.91-0.98). When normalized to intracranial volume, fetal ventricular volume was higher and subarachnoid volume lower among those who required postnatal intervention (p<0.001, p=0.003).
The optimal threshold for identifying need for postnatal intervention was 11 standard deviations from normal, with sensitivity of 86% and specificity of 100% (AUC 0.97, 95% CI 0.86-1.00). This work introduces a deep learning-based method for fast and accurate quantification of ventricular volumes in fetal brain MRI. A normative reference standard derived using this method can predict VM and need for postnatal CSF intervention. Increased ventricular volume is a strong predictor for postnatal intervention. Abbreviations: VM = ventriculomegaly, 2D = two-dimensional, 3D = three-dimensional, ROC = receiver operating characteristics, AUC = area under curve.
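
Two quantities carry the result above: the Dice score used to check the segmentations, and the number of standard deviations by which a ventricular volume departs from the normative range. The sketch below illustrates both under stated assumptions: it takes "from normal" to mean above the gestational-age-matched mean, the normative mean and SD passed in are hypothetical inputs rather than values from the study, and the function names are illustrative. Only the 2 SD and 11 SD cut-offs come from the abstract.

```python
def dice_score(pred_mask, true_mask):
    """Dice overlap 2|A and B| / (|A| + |B|) for two flattened binary masks."""
    if len(pred_mask) != len(true_mask):
        raise ValueError("masks must have the same number of voxels")
    overlap = sum(1 for p, t in zip(pred_mask, true_mask) if p and t)
    total = sum(1 for p in pred_mask if p) + sum(1 for t in true_mask if t)
    return 1.0 if total == 0 else 2.0 * overlap / total


def z_score(volume_ml, norm_mean_ml, norm_sd_ml):
    """Deviation from the normative mean in units of normative SD."""
    return (volume_ml - norm_mean_ml) / norm_sd_ml


def classify(z, vm_cutoff=2.0, intervention_cutoff=11.0):
    """Apply the two cut-offs reported in the abstract (in SD units)."""
    if z >= intervention_cutoff:
        return "VM, postnatal intervention predicted"
    if z >= vm_cutoff:
        return "VM"
    return "within normative range"


# Hypothetical example: 14 mL measured against a made-up normative
# mean of 8 mL and SD of 2 mL for the matching gestational age.
z = z_score(14.0, 8.0, 2.0)
print(z, classify(z))
```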

MRI Segmentation Neurological Retrospective Clinical In Silico Academic Lab Benchmark SOTA Open Dataset
