
Data Scaling Laws for Radiology Foundation Models

Maximilian Ilse, Harshita Sharma, Anton Schwaighofer, Sam Bond-Taylor, Fernando Pérez-García, Olesya Melnichenko, Anne-Marie G. Sykes, Kelly K. Horst, Ashish Khandelwal, Maxwell Reynolds, Maria T. Wetscherek, Noel C. F. Codella, Javier Alvarez-Valle, Korfiatis Panagiotis, Valentina Salvatelli

arXiv preprint, Sep 16, 2025
Foundation vision encoders such as CLIP and DINOv2, trained on web-scale data, exhibit strong transfer performance across tasks and datasets. However, medical imaging foundation models remain constrained by smaller datasets, limiting our understanding of how data scale and pretraining paradigms affect performance in this setting. In this work, we systematically study continual pretraining of two vision encoders, MedImageInsight (MI2) and RAD-DINO, representing the two major encoder paradigms, CLIP and DINOv2, on up to 3.5M chest X-rays from a single institution, holding compute and evaluation protocols constant. We evaluate on classification (radiology findings, lines and tubes), segmentation (lines and tubes), and radiology report generation. While prior work has primarily focused on tasks related to radiology findings, we include lines and tubes tasks to counterbalance this bias and evaluate a model's ability to extract features that preserve continuity along elongated structures. Our experiments show that MI2 scales more effectively for finding-related tasks, while RAD-DINO is stronger on tube-related tasks. Surprisingly, continually pretraining MI2 with both reports and structured labels using UniCL improves performance, underscoring the value of structured supervision at scale. We further show that for some tasks, as few as 30k in-domain samples are sufficient to surpass open-weights foundation models. These results highlight the utility of center-specific continual pretraining, enabling medical institutions to derive significant performance gains by utilizing in-domain data.
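As a rough illustration of the UniCL-style supervision the abstract mentions, here is a minimal numpy sketch of a unified contrastive loss in which any pair sharing a structured label, not just the matched image-report pair, counts as a positive. The single-direction formulation and all names are illustrative; the paper's actual objective is bidirectional and trained at scale.

```python
import numpy as np

def unicl_loss(sim, pos_mask, temperature=0.07):
    """UniCL-style unified contrastive loss (image-to-text direction only).

    sim:      (N, N) cosine similarities between image and text embeddings.
    pos_mask: (N, N) boolean; True where pair (i, j) shares a label
              (the diagonal, i.e. the matched pair, is always a positive).
    """
    logits = sim / temperature
    # row-wise log-softmax, shifted for numerical stability
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # average the log-probability over all label-defined positives per anchor
    pos_counts = pos_mask.sum(axis=1)
    loss = -(log_prob * pos_mask).sum(axis=1) / pos_counts
    return loss.mean()
```

A quick check of the intended behavior: embeddings that separate the label groups should incur a lower loss than uninformative ones.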

Neural Collapse-Inspired Multi-Label Federated Learning under Label-Distribution Skew

Can Peng, Yuyuan Liu, Yingyu Yang, Pramit Saha, Qianye Yang, J. Alison Noble

arXiv preprint, Sep 16, 2025
Federated Learning (FL) enables collaborative model training across distributed clients while preserving data privacy. However, the performance of deep learning often deteriorates in FL due to decentralized and heterogeneous data. This challenge is further amplified in multi-label scenarios, where data exhibit complex characteristics such as label co-occurrence, inter-label dependency, and discrepancies between local and global label relationships. While most existing FL research primarily focuses on single-label classification, many real-world applications, particularly in domains such as medical imaging, often involve multi-label settings. In this paper, we address this important yet underexplored scenario in FL, where clients hold multi-label data with skewed label distributions. Neural Collapse (NC) describes a geometric structure in the latent feature space where features of each class collapse to their class mean with vanishing intra-class variance, and the class means form a maximally separated configuration. Motivated by this theory, we propose a method to align feature distributions across clients and to learn high-quality, well-clustered representations. To make the NC-structure applicable to multi-label settings, where image-level features may contain multiple semantic concepts, we introduce a feature disentanglement module that extracts semantically specific features. The clustering of these disentangled class-wise features is guided by a predefined shared NC structure, which mitigates potential conflicts between client models due to diverse local data distributions. In addition, we design regularisation losses to encourage compact clustering in the latent feature space. Experiments conducted on four benchmark datasets across eight diverse settings demonstrate that our approach outperforms existing methods, validating its effectiveness in this challenging FL scenario.
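The "predefined shared NC structure" referenced above can be made concrete: under Neural Collapse, the K class means form a simplex equiangular tight frame (ETF), in which every pair of prototypes has cosine similarity -1/(K-1), the maximally separated configuration. A small numpy sketch of the construction (not the paper's training code):

```python
import numpy as np

def simplex_etf(num_classes, dim, seed=0):
    """K unit-norm class prototypes arranged as a simplex ETF in R^dim.

    Every pair of prototypes has cosine similarity -1/(K-1), the
    maximally separated configuration predicted by Neural Collapse.
    Requires dim >= K - 1.
    """
    K = num_classes
    assert dim >= K - 1
    rng = np.random.default_rng(seed)
    # partial orthogonal matrix U (dim x K), with U^T U = I_K
    U, _ = np.linalg.qr(rng.standard_normal((dim, K)))
    # center the columns and rescale so each prototype is unit norm
    M = np.sqrt(K / (K - 1)) * U @ (np.eye(K) - np.ones((K, K)) / K)
    return M  # columns are the K prototypes
```

Fixing such a structure in advance gives every client the same classifier geometry to cluster its disentangled class-wise features around, which is how conflicts between heterogeneous local solutions are avoided.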

The HeartMagic prospective observational study protocol - characterizing subtypes of heart failure with preserved ejection fraction

Meyer, P., Rocca, A., Banus, J., Ogier, A. C., Georgantas, C., Calarnou, P., Fatima, A., Vallee, J.-P., Deux, J.-F., Thomas, A., Marquis, J., Monney, P., Lu, H., Ledoux, J.-B., Tillier, C., Crowe, L. A., Abdurashidova, T., Richiardi, J., Hullin, R., van Heeswijk, R. B.

medRxiv preprint, Sep 16, 2025
Introduction: Heart failure (HF) is a life-threatening syndrome with significant morbidity and mortality. While evidence-based drug treatments have effectively reduced morbidity and mortality in HF with reduced ejection fraction (HFrEF), few therapies have been demonstrated to improve outcomes in HF with preserved ejection fraction (HFpEF). The multifaceted clinical presentation is one of the main reasons why the current understanding of HFpEF remains limited. This may be caused by the existence of several HFpEF disease subtypes that each need different treatments. There is therefore an unmet need for a holistic approach that combines comprehensive imaging with metabolomic, transcriptomic and genomic mapping to subtype HFpEF patients. This protocol details the approach employed in the HeartMagic study to address this gap in understanding.

Methods: This prospective multi-center observational cohort study will include 500 consecutive patients with current or recent hospitalization for treatment of HFpEF at two Swiss university hospitals, along with 50 age-matched HFrEF patients and 50 age-matched healthy controls. Diagnosis of heart failure is based on clinical signs and symptoms, and HF patients are subgrouped by left-ventricular ejection fraction. In addition to the routine clinical workup, participants undergo genomic, transcriptomic, and metabolomic analyses, while the anatomy, composition, and function of the heart are quantified by comprehensive echocardiography and magnetic resonance imaging (MRI). Quantitative MRI is also applied to characterize the kidney. The primary outcome is a composite of one-year cardiovascular mortality or rehospitalization. Machine learning (ML) based multi-modal clustering will be employed to identify distinct HFpEF subtypes in the holistic data. The clinical importance of these subtypes will be evaluated based on their association with the primary outcome. Statistical analysis will include group comparisons across modalities, survival analysis for the primary outcome, and integrative multi-modal clustering combining clinical, imaging, ECG, genomic, transcriptomic, and metabolomic data to identify and validate HFpEF subtypes.

Discussion: The integration of comprehensive MRI with extensive genomic and metabolomic profiling in this study will result in an unprecedented panoramic view of HFpEF and should enable us to distinguish functional subgroups of HFpEF patients. This approach has the potential to provide unprecedented insights into HFpEF and should provide a basis for personalized therapies. Beyond this, identifying HFpEF subtypes with specific molecular and structural characteristics could lead to new targeted pharmacological interventions, with the potential to improve patient outcomes.

AI-powered insights in pediatric nephrology: current applications and future opportunities.

Nada A, Ahmed Y, Hu J, Weidemann D, Gorman GH, Lecea EG, Sandokji IA, Cha S, Shin S, Bani-Hani S, Mannemuddhu SS, Ruebner RL, Kakajiwala A, Raina R, George R, Elchaki R, Moritz ML

PubMed, Sep 16, 2025
Artificial intelligence (AI) is rapidly emerging as a transformative force in pediatric nephrology, enabling improvements in diagnostic accuracy, therapeutic precision, and operational workflows. By integrating diverse datasets-including patient histories, genomics, imaging, and longitudinal clinical records-AI-driven tools can detect subtle kidney anomalies, predict acute kidney injury, and forecast disease progression. Deep learning models, for instance, have demonstrated the potential to enhance ultrasound interpretations, refine kidney biopsy assessments, and streamline pathology evaluations. Coupled with robust decision support systems, these innovations also optimize medication dosing and dialysis regimens, ultimately improving patient outcomes. AI-powered chatbots hold promise for improving patient engagement and adherence, while AI-assisted documentation solutions offer relief from administrative burdens, mitigating physician burnout. However, ethical and practical challenges remain. Healthcare professionals must receive adequate training to harness AI's capabilities, ensuring that such technologies bolster rather than erode the vital doctor-patient relationship. Safeguarding data privacy, minimizing algorithmic bias, and establishing standardized regulatory frameworks are critical for safe deployment. Beyond clinical care, AI can accelerate pediatric nephrology research by identifying biomarkers, enabling more precise patient recruitment, and uncovering novel therapeutic targets. As these tools evolve, interdisciplinary collaborations and ongoing oversight will be key to integrating AI responsibly. Harnessing AI's vast potential could revolutionize pediatric nephrology, championing a future of individualized, proactive, and empathetic care for children with kidney diseases. Through strategic collaboration and transparent development, these advanced technologies promise to minimize disparities, foster innovation, and sustain compassionate patient-centered care, shaping a new horizon in pediatric nephrology research and practice.

Role of Artificial Intelligence in Lung Transplantation: Current State, Challenges, and Future Directions.

Duncheskie RP, Omari OA, Anjum F

PubMed, Sep 16, 2025
Lung transplantation remains a critical treatment for end-stage lung diseases, yet it continues to have one of the lowest survival rates among solid organ transplants. Despite its life-saving potential, the field faces several challenges, including organ shortages, suboptimal donor matching, and post-transplant complications. The rapidly advancing field of artificial intelligence (AI) offers significant promise in addressing these challenges. Traditional statistical models, such as linear and logistic regression, have been used to predict post-transplant outcomes but struggle to adapt to new trends and evolving data. In contrast, machine learning algorithms can evolve with new data, offering dynamic and updated predictions. AI holds the potential to enhance lung transplantation at multiple stages. In the pre-transplant phase, AI can optimize waitlist management, refine donor selection, improve donor-recipient matching, and enhance diagnostic imaging by harnessing vast datasets. Post-transplant, AI can help predict allograft rejection, improve immunosuppressive management, and better forecast long-term patient outcomes, including quality of life. However, the integration of AI in lung transplantation also presents challenges, including data privacy concerns, algorithmic bias, and the need for external clinical validation. This review explores the current state of AI in lung transplantation, summarizes key findings from recent studies, and discusses the potential benefits, challenges, and ethical considerations in this rapidly evolving field, highlighting future research directions.

Predicting cardiovascular events from routine mammograms using machine learning.

Barraclough JY, Gandomkar Z, Fletcher RA, Barbieri S, Kuo NI, Rodgers A, Douglas K, Poppe KK, Woodward M, Luxan BG, Neal B, Jorm L, Brennan P, Arnott C

PubMed, Sep 16, 2025
Cardiovascular risk is underassessed in women. Many women undergo screening mammography in midlife, when the risk of cardiovascular disease rises. Mammographic features such as breast arterial calcification and tissue density are associated with cardiovascular risk. We developed and tested a deep learning algorithm for cardiovascular risk prediction based on routine mammography images. Lifepool is a cohort of women with at least one screening mammogram linked to hospitalisation and death databases. A deep learning model based on the DeepSurv architecture was developed to predict major cardiovascular events from mammography images. Model performance was compared against standard risk prediction models using the concordance index (Harrell's C-statistic). There were 49,196 women included, with a median follow-up of 8.8 years (IQR 7.7-10.6), among whom 3392 experienced a first major cardiovascular event. The DeepSurv model using mammography features and participant age had a concordance index of 0.72 (95% CI 0.71 to 0.73), with similar performance to modern models containing age and clinical variables, including the New Zealand 'PREDICT' tool and the American Heart Association 'PREVENT' equations. A deep learning algorithm based on only mammographic features and age predicted cardiovascular risk with performance comparable to traditional cardiovascular risk equations. Risk assessments based on mammography may be a novel opportunity for improving cardiovascular risk screening in women.
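The concordance index used to evaluate the DeepSurv model can be computed directly from pairwise comparisons. A minimal O(n²) numpy-free sketch with right-censoring (illustrative only, not the study's implementation; production code such as lifelines handles ties in time as well):

```python
def concordance_index(time, event, risk):
    """Harrell's C: fraction of usable pairs in which the higher-risk
    subject fails first.  event=1 means observed event, 0 means censored."""
    num, den = 0.0, 0
    n = len(time)
    for i in range(n):
        for j in range(n):
            # a pair is usable only if subject i has an observed event
            # strictly before subject j's (event or censoring) time
            if event[i] == 1 and time[i] < time[j]:
                den += 1
                if risk[i] > risk[j]:
                    num += 1.0      # concordant
                elif risk[i] == risk[j]:
                    num += 0.5      # tied risk counts half
    return num / den
```

A model that ranks everyone perfectly scores 1.0, a reversed ranking scores 0.0, and 0.5 is chance, which is why the reported 0.72 is read against the 0.70-0.75 range typical of clinical risk equations.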

Cross-modality transformer model leveraging DCE-MRI and pathological images for predicting pathological complete response and lymph node metastasis in breast cancer.

Fan M, Zhu Z, Yu Z, Du J, Xie S, Pan X, Chen S, Li L

PubMed, Sep 16, 2025
Pathological diagnosis remains the gold standard for diagnosing breast cancer and is highly accurate and sensitive, which is crucial for assessing pathological complete response (pCR) and lymph node metastasis (LNM) following neoadjuvant chemotherapy (NACT). Dynamic contrast-enhanced MRI (DCE-MRI) is a noninvasive technique that provides detailed morphological and functional insights into tumors. The optimal complementarity of these two modalities, particularly in situations where one is unavailable, and their integration to enhance therapeutic predictions have not been fully explored. To this end, we propose a cross-modality image transformer (CMIT) model designed for feature synthesis and fusion to predict pCR and LNM in breast cancer. This model enables interaction and integration between the two modalities via a transformer's cross-attention module. A modality information transfer module is developed to produce synthetic pathological image features (sPIFs) from DCE-MRI data and synthetic DCE-MRI features (sMRIs) from pathological images. During training, the model leverages both real and synthetic imaging features to improve predictive performance. In the prediction phase, the synthetic imaging features are fused with the corresponding real imaging features to make predictions. The experimental results demonstrate that the proposed CMIT model, which integrates DCE-MRI with sPIFs or histopathological images with sMRIs, outperforms the use of MRI or pathological images alone in predicting pCR to NACT, with AUCs of 0.809 and 0.852, respectively. Similar improvements were observed in LNM prediction: the DCE-MRI model's performance improved from an AUC of 0.637 to 0.712, while the DCE-MRI-guided histopathological model achieved an AUC of 0.792. Notably, our proposed model can predict treatment response effectively via DCE-MRI, regardless of the availability of actual histopathological images.
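The cross-attention module that lets one modality query the other is the standard transformer building block; a single-head numpy sketch under illustrative names (the CMIT model itself is multi-layer and learned end to end):

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(query_feats, context_feats, Wq, Wk, Wv):
    """Single-head cross-attention: one modality attends over the other.

    query_feats:   (Nq, d) tokens from modality A (e.g. DCE-MRI features)
    context_feats: (Nc, d) tokens from modality B (e.g. pathology features)
    Wq, Wk, Wv:    (d, d_k) learned projection matrices
    """
    Q = query_feats @ Wq
    K = context_feats @ Wk
    V = context_feats @ Wv
    scores = Q @ K.T / np.sqrt(Q.shape[-1])   # (Nq, Nc) scaled dot products
    return softmax(scores, axis=-1) @ V       # (Nq, d_k) fused features
```

Because the queries come from one modality and the keys/values from the other, each output token is a context-weighted summary of the second modality aligned to the first, which is what makes fusion (and synthesis of missing-modality features) possible.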

Diagnostic Performance of Large Language Models in Multimodal Analysis of Radiolucent Jaw Lesions.

Kim K, Kim BC

PubMed, Sep 16, 2025
Large language models (LLMs), such as ChatGPT and Gemini, are increasingly being used in medical domains, including dental diagnostics. Despite advancements in image-based deep learning systems, the diagnostic capabilities of LLMs in oral and maxillofacial surgery (OMFS), particularly for processing multimodal imaging inputs, remain underexplored. Radiolucent jaw lesions represent a particularly challenging diagnostic category due to their varied presentations and overlapping radiographic features. This study evaluated the diagnostic performance of ChatGPT 4o and Gemini 2.5 Pro on real-world OMFS radiolucent jaw lesion cases, presented in multiple-choice (MCQ) and short-answer (SAQ) formats across 3 imaging conditions: panoramic radiography only, panoramic + CT, and panoramic + CT + pathology. Data from 100 anonymized patients at Wonkwang University Daejeon Dental Hospital were analyzed, including demographics, panoramic radiographs, CBCT images, histopathology slides, and confirmed diagnoses. Sample size was determined based on institutional case availability and statistical power requirements for comparative analysis. ChatGPT and Gemini diagnosed each case under 6 conditions using 3 imaging modalities (P, P+C, P+C+B) in MCQ and SAQ formats. Model accuracy was scored against expert-confirmed diagnoses by 2 independent evaluators. McNemar's and Cochran's Q tests evaluated statistical differences across models and imaging modalities. For MCQ tasks, ChatGPT achieved 66%, 73%, and 82% accuracies across the P, P+C, and P+C+B conditions, respectively, while Gemini achieved 57%, 62%, and 63%. In SAQ tasks, ChatGPT achieved 34%, 45%, and 48%; Gemini achieved 15%, 24%, and 28%. Accuracy improved significantly with additional imaging data for ChatGPT, and ChatGPT consistently outperformed Gemini across all conditions (P < .001 for MCQ; P = .008 to < .001 for SAQ). The MCQ format, which incorporates a human-in-the-loop (HITL) structure, showed higher overall performance than SAQ. ChatGPT demonstrated superior diagnostic performance compared to Gemini in OMFS diagnostic tasks when provided with richer multimodal inputs. Diagnostic accuracy increased with additional imaging data, especially in MCQ formats, suggesting LLMs can effectively synthesize radiographic and pathological data. LLMs have potential as diagnostic support tools for OMFS, especially in settings with limited specialist access. Presenting clinical cases in structured formats using curated imaging data enhances LLM accuracy and underscores the value of HITL integration. Although current LLMs show promising results, further validation using larger datasets and hybrid AI systems is necessary for broader clinical adoption.
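The McNemar test used above compares two models on the same paired cases and depends only on the two discordant counts; a small exact-binomial sketch (not the study's statistical code, which may use the chi-square approximation instead):

```python
from math import comb

def mcnemar_exact(b, c):
    """Exact McNemar test for paired binary outcomes (b + c > 0 assumed).

    b: cases model A classified correctly and model B incorrectly
    c: cases model B classified correctly and model A incorrectly
    Returns the two-sided p-value under the null Binomial(b + c, 0.5).
    """
    n, k = b + c, min(b, c)
    # double the one-sided tail probability, capped at 1
    p = 2 * sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, p)
```

The concordant cases (both models right or both wrong) cancel out, so with, say, 15 cases where one model alone was correct against 3 for the other, the difference is already significant at the 5% level.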

Multi-filter stacking in Inception V3 for enhanced Alzheimer's severity classification.

Iqbal A, Iqbal K, Shah YA, Ullah F, Khan J, Yaqoob S

PubMed, Sep 16, 2025
Alzheimer's disease, a progressive neurodegenerative disorder, is characterized by a decline in brain volume and neuronal loss, with early symptoms often presenting as short-term memory impairment. Automated classification of Alzheimer's disease remains a significant challenge due to inter-patient variability in brain morphology, aging effects, and overlapping anatomical features across different stages. While traditional machine learning techniques, such as Support Vector Machines (SVMs) and various Deep Neural Network (DNN) models, have been explored, the need for more accurate and efficient classification techniques persists. In this study, we propose a novel approach that integrates Multi-Filter Stacking with the Inception V3 architecture, referred to as CASFI (Classifying Alzheimer's Severity using Filter Integration). This method leverages diverse convolutional filter sizes to capture multiscale spatial features, enhancing the model's ability to detect subtle structural variations associated with different Alzheimer's disease stages. Applied to MRI data, CASFI achieved an accuracy of 97.27%, outperforming baseline deep learning models and traditional classifiers in both accuracy and robustness. This approach supports early diagnosis and informed clinical decision-making, providing a valuable tool to assist healthcare professionals in managing and planning treatment for Alzheimer's patients.
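The core of multi-filter stacking, applying convolution kernels of several sizes in parallel and stacking the responses as channels, can be sketched in plain numpy (naive loops for clarity, averaging kernels as a stand-in for learned weights; the actual CASFI model builds this into Inception V3):

```python
import numpy as np

def conv2d_same(img, kernel):
    """Naive 2D convolution with zero padding ('same' output size).
    Assumes odd kernel dimensions."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(img, ((ph, ph), (pw, pw)))
    out = np.empty_like(img, dtype=float)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = (padded[i:i + kh, j:j + kw] * kernel).sum()
    return out

def multi_filter_stack(img, kernel_sizes=(1, 3, 5)):
    """Inception-style branch: filters of several sizes run in parallel
    on the same input, and their responses are stacked channel-wise,
    so downstream layers see multiscale spatial features at once."""
    maps = [conv2d_same(img, np.ones((k, k)) / k**2) for k in kernel_sizes]
    return np.stack(maps, axis=0)   # (n_filters, H, W)
```

Small kernels preserve fine structural detail while larger ones summarize neighborhood context, which is the multiscale property the abstract credits for detecting subtle stage-dependent variations.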

Development and Validation of a Multimodal-based Machine Learning Model for Diagnosis of Usual Interstitial Pneumonia: a Prospective Multicenter Study.

Wang H, Liu A, Ni Y, Wang J, Du J, Xi L, Qiang Y, Xie B, Ren Y, Wang S, Geng J, Deng Y, Huang S, Zhang R, Liu M, Dai H

PubMed, Sep 16, 2025
Usual Interstitial Pneumonia (UIP) indicates poor prognosis, and there is significant heterogeneity in the diagnosis of UIP, necessitating an auxiliary diagnostic tool. Can a machine learning (ML) classifier using radiomics features and clinical data accurately identify UIP among patients with interstitial lung diseases (ILD)? This dataset from a prospective cohort consists of 5321 sets of high-resolution computed tomography (HRCT) images from 2901 patients with ILD (male: 63.5%, age: 61.7 ± 10.8 years) across three medical centers. Multimodal data, including whole-lung radiomics features on HRCT and demographics, smoking, lung function, and comorbidity data, were extracted. An eXtreme Gradient Boosting (XGBoost) model and logistic regression were used to design a nomogram predicting UIP status. The area under the receiver operating characteristic curve (AUC) and Cox regression for all-cause mortality were used to assess the diagnostic performance and prognostic value of the models, respectively. 5213 HRCT image datasets were divided into the training group (n=3639), the internal testing group (n=785), and the external validation group (n=789). UIP prevalence was 43.7% across the whole dataset, with 42.7% and 41.3% for the internal testing set and external validation set, respectively. The radiomics-based classifier had an AUC of 0.790 in the internal testing set and 0.786 in the external validation dataset. Integrating multimodal data improved the AUCs to 0.802 and 0.794, respectively. The performance of the integration model was comparable to that of pulmonologists with over 10 years of experience in ILD. Among the 522 patients who died during a median follow-up period of 3.37 years, the multimodal-based ML model-predicted UIP status was associated with high all-cause mortality risk (hazard ratio: 2.52, p<0.001). The classifier combining radiomics and clinical features showed strong performance across varied UIP prevalence. This multimodal-based ML model could serve as an adjunct in the diagnosis of UIP.
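The AUC reported throughout equals the Mann-Whitney probability that a randomly chosen UIP case is ranked above a randomly chosen non-UIP case; a minimal numpy sketch of that pairwise definition (illustrative, not the study's evaluation code):

```python
import numpy as np

def auc(labels, scores):
    """AUC as the normalized Mann-Whitney U statistic: the probability
    that a random positive scores higher than a random negative."""
    labels = np.asarray(labels)
    scores = np.asarray(scores)
    pos, neg = scores[labels == 1], scores[labels == 0]
    # count pairwise wins; tied scores count half
    wins = (pos[:, None] > neg[None, :]).sum() \
         + 0.5 * (pos[:, None] == neg[None, :]).sum()
    return wins / (len(pos) * len(neg))
```

Because it is a pure ranking measure, AUC is insensitive to class prevalence, which is why the abstract can claim "strong performance across varied UIP prevalence" from the same statistic.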