Page 13 of 2332330 results

Automated and Interpretable Survival Analysis from Multimodal Data

Mafalda Malafaia, Peter A. N. Bosman, Coen Rasch, Tanja Alderliesten

arXiv preprint · Sep 25 2025
Accurate and interpretable survival analysis remains a core challenge in oncology. With growing multimodal data and the clinical need for transparent models to support validation and trust, this challenge increases in complexity. We propose an interpretable multimodal AI framework to automate survival analysis by integrating clinical variables and computed tomography imaging. Our MultiFIX-based framework uses deep learning to infer survival-relevant features that are further explained: imaging features are interpreted via Grad-CAM, while clinical variables are modeled as symbolic expressions through genetic programming. Risk estimation employs a transparent Cox regression, enabling stratification into groups with distinct survival outcomes. Using the open-source RADCURE dataset for head and neck cancer, MultiFIX achieves a C-index of 0.838 (prediction) and 0.826 (stratification), outperforming the clinical and academic baseline approaches and aligning with known prognostic markers. These results highlight the promise of interpretable multimodal AI for precision oncology with MultiFIX.
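The C-index values quoted above (0.838 for prediction, 0.826 for stratification) measure concordance between predicted risk and observed survival. A minimal numpy sketch of Harrell's C-index on toy data (not the RADCURE cohort, and not the authors' implementation):

```python
import numpy as np

def harrell_c_index(risk, time, event):
    """Harrell's concordance index: fraction of comparable pairs in
    which the higher-risk subject fails earlier. A pair (i, j) is
    comparable when the earlier time belongs to an observed event."""
    risk, time, event = map(np.asarray, (risk, time, event))
    concordant, comparable = 0.0, 0
    n = len(time)
    for i in range(n):
        for j in range(n):
            # i must have an observed event strictly before j's time
            if event[i] == 1 and time[i] < time[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / comparable

# Toy check: risk perfectly ordered against survival time -> C = 1.0
time = [5.0, 3.0, 9.0, 7.0]
event = [1, 1, 1, 1]
risk = [0.4, 0.9, 0.1, 0.2]
print(harrell_c_index(risk, time, event))  # -> 1.0
```

In practice a library routine (e.g. from lifelines or scikit-survival) would replace this O(n²) loop, but the pairwise definition is what the reported figures mean.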

Machine Learning-Based Classification of White Matter Functional Changes in Stroke Patients Using Resting-State fMRI.

Liu LH, Wang CX, Huang X, Chen RB

PubMed · Sep 25 2025
Neuroimaging studies of brain function are important research methods widely applied to stroke patients. Currently, a large number of studies have focused on functional imaging of the gray matter cortex. Relevant research indicates that certain areas of the gray matter cortex in stroke patients exhibit abnormal brain activity during the resting state. However, studies of brain function based on white matter remain insufficient. The changes in functional connectivity caused by stroke in white matter, as well as the repair or compensation mechanisms of white matter function after stroke, are still unclear. The aim of this study is to investigate changes in functional connectivity within the white matter of stroke patients, to reveal the reorganization characteristics of white matter functional networks after stroke, and thereby to provide potential biomarkers and new clinical insights for the rehabilitation and treatment of stroke patients. We recruited 36 stroke patients and 36 healthy controls for resting-state functional magnetic resonance imaging (rs-fMRI). Regional Homogeneity (ReHo) and Degree Centrality (DC), which are sensitive to white matter functional abnormalities, were selected as feature vectors. ReHo reflects local neuronal synchrony, while DC quantifies global network hub properties. The combination of both effectively characterizes functional changes in white matter. ReHo evaluates the functional consistency of different white matter regions by calculating the activity similarity between adjacent brain regions. Additionally, DC analysis of white matter was used to investigate the connectivity patterns and organizational principles of functional networks between white matter regions. This was achieved by calculating the number of connections in each brain region to identify changes in neural activation of white matter regions that significantly impact the brain network.
Furthermore, ReHo and DC metrics were used as feature vectors for classification using machine learning algorithms. The results indicated significant differences in white matter DC and ReHo values between stroke patients and healthy controls. In the two-sample t-test analysis of white matter DC, stroke patients showed a significant reduction in DC values in the corpus callosum genu (GCC), corpus callosum body (BCC), and left anterior corona radiata (ACR_L) regions (GCC: 0.143 vs. 1.024; BCC: 0.238 vs. 1.143; ACR_L: 0.143 vs. 0.821, p < 0.001). However, an increase in DC values was observed in the left superior longitudinal fasciculus (SLF_L) region (1.190 vs. 0.190, p < 0.001). In the two-sample t-test analysis of white matter ReHo, stroke patients exhibited a decrease in ReHo values in the GCC and BCC regions (GCC: 0.859 vs. 1.375; BCC: 1.156 vs. 1.687, p < 0.001), indicating values lower than those in the healthy control group. Using leave-one-out cross-validation (LOOCV) to evaluate the white matter DC and ReHo feature values through SVM classification models for stroke patients and healthy controls, the DC classification AUC was 0.89, and the ReHo classification AUC reached 0.98. These results suggest that the features possess validity and discriminative power. These findings suggest alterations in functional connectivity in specific white matter regions following stroke. Specifically, we observed a weakening of functional connectivity in the genu of the corpus callosum (GCC), the body of the corpus callosum (BCC), and the left anterior corona radiata (ACR_L) regions, while compensatory functional connectivity was enhanced in the left superior longitudinal fasciculus (SLF_L) region. These findings reveal the reorganization characteristics of white matter functional networks after stroke, which may provide potential biomarkers for the rehabilitation treatment of stroke patients and offer new clinical insights for their rehabilitation and treatment.
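ReHo is conventionally computed as Kendall's coefficient of concordance (KCC) over the ranked time series of a voxel and its neighbours. A minimal numpy sketch of that statistic, assuming tie-free synthetic time series rather than real rs-fMRI data:

```python
import numpy as np

def kendalls_w(timeseries):
    """Kendall's coefficient of concordance (the statistic behind ReHo).
    timeseries: array of shape (k voxels, n timepoints); ties ignored."""
    ts = np.asarray(timeseries, dtype=float)
    k, n = ts.shape
    # rank each voxel's time series 1..n (assumes no tied values)
    ranks = ts.argsort(axis=1).argsort(axis=1) + 1
    col_sums = ranks.sum(axis=0)              # rank sum at each timepoint
    s = ((col_sums - k * (n + 1) / 2) ** 2).sum()
    return 12 * s / (k ** 2 * (n ** 3 - n))

# Identical (perfectly synchronised) time series give W = 1
base = np.array([0.3, 1.7, 0.9, 2.5, 1.1])
print(kendalls_w(np.tile(base, (4, 1))))  # -> 1.0
```

Desynchronised neighbours drive W toward 0, which is the sense in which lower ReHo in GCC and BCC indicates reduced local synchrony.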

Robust Disease Prognosis via Diagnostic Knowledge Preservation: A Sequential Learning Approach

Rajamohan, H. R., Xu, Y., Zhu, W., Kijowski, R., Cho, K., Geras, K., Razavian, N., Deniz, C. M.

medRxiv preprint · Sep 25 2025
Accurate disease prognosis is essential for patient care but is often hindered by the lack of long-term data. This study explores deep learning training strategies that utilize large, accessible diagnostic datasets to pretrain models aimed at predicting future disease progression in knee osteoarthritis (OA), Alzheimer's disease (AD), and breast cancer (BC). While diagnostic pretraining improves prognostic task performance, naive fine-tuning for prognosis can cause catastrophic forgetting, where the model's original diagnostic accuracy degrades, a significant patient safety concern in real-world settings. To address this, we propose a sequential learning strategy with experience replay. We used cohorts with knee radiographs, brain MRIs, and digital mammograms to predict 4-year structural worsening in OA, 2-year cognitive decline in AD, and 5-year cancer diagnosis in BC. Our results showed that diagnostic pretraining on larger datasets improved prognosis model performance compared to standard baselines, boosting both the Area Under the Receiver Operating Characteristic curve (AUROC) (e.g., Knee OA external: 0.77 vs 0.747; Breast Cancer: 0.874 vs 0.848) and the Area Under the Precision-Recall Curve (AUPRC) (e.g., Alzheimer's Disease: 0.752 vs 0.683). Additionally, a sequential learning approach with experience replay achieved prognostic performance comparable to dedicated single-task models (e.g., Breast Cancer AUROC 0.876 vs 0.874) while also preserving diagnostic ability. This method maintained high diagnostic accuracy (e.g., Breast Cancer Balanced Accuracy 50.4% vs 50.9% for a dedicated diagnostic model), unlike simpler multitask methods prone to catastrophic forgetting (e.g., 37.7%). Our findings show that leveraging large diagnostic datasets is a reliable and data-efficient way to enhance prognostic models while maintaining essential diagnostic skills.
Author Summary: In our research, we addressed a common problem in medical AI: how to accurately predict the future course of a disease when long-term patient data is rare. We focused on knee osteoarthritis, Alzheimer's disease, and breast cancer. We found that we could significantly improve a model's ability to predict disease progression by first training it on a much larger, more common type of data - diagnostic images used to assess a patient's current disease state. We then developed a specialized training method that allows a single AI model to perform both diagnosis and prognosis tasks effectively. A key challenge is that models often "forget" their original diagnostic skills when they learn a new prognostic task. In a clinical setting, this poses a safety risk, as it could lead to missed diagnoses. We utilize experience replay to overcome this by continually refreshing the model's diagnostic knowledge. This creates a more robust and efficient model that mirrors a clinician's workflow, offering the potential to improve patient care with a limited amount of hard-to-get longitudinal data.
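The core of experience replay is simple: while fine-tuning on the new prognostic task, each batch is topped up with stored diagnostic examples so the earlier task keeps being rehearsed. A stdlib-only sketch of that buffering logic (illustrative only; the hypothetical `ReplayBuffer` class and its parameters are not the authors' implementation):

```python
import random

class ReplayBuffer:
    """Minimal experience-replay buffer for sequential learning:
    stores diagnostic-task examples during pretraining, then mixes
    them into prognostic fine-tuning batches to limit forgetting."""
    def __init__(self, capacity=1000, seed=0):
        self.capacity = capacity
        self.storage = []
        self.rng = random.Random(seed)

    def add(self, example):
        if len(self.storage) >= self.capacity:
            # overwrite a random slot once full (reservoir-style)
            self.storage[self.rng.randrange(self.capacity)] = example
        else:
            self.storage.append(example)

    def mixed_batch(self, new_examples, replay_fraction=0.5):
        n_replay = int(len(new_examples) * replay_fraction)
        replayed = self.rng.sample(self.storage, min(n_replay, len(self.storage)))
        return list(new_examples) + replayed

buffer = ReplayBuffer()
for i in range(100):
    buffer.add(("diagnostic", i))
batch = buffer.mixed_batch([("prognostic", i) for i in range(8)])
print(len(batch))  # 8 new prognostic examples + 4 replayed diagnostic -> 12
```

A training loop would then compute the diagnostic loss on the replayed items and the prognostic loss on the new ones, keeping both objectives active.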

A radiomics nomogram utilizing T2-weighted MRI for accurate diagnosis of rectocele.

Lai W, Wang G, Zhao Z

PubMed · Sep 25 2025
Rectocele (RC) is a common pelvic organ prolapse (POP) that can cause obstructed defecation and reduced quality of life. Magnetic resonance defecography (MRD) offers high-resolution, radiation-free visualization of pelvic floor anatomy but relies on time-consuming, observer-dependent manual measurements. Our research constructs a nomogram model incorporating intra-ROI and habitat radiomics features to improve MRD-based RC diagnosis. We retrospectively analyzed 222 MRD patients (155 training, 67 testing). Clinical features were selected via univariate and multivariate logistic regression. The least absolute shrinkage and selection operator (LASSO) algorithm was applied, and features with non-zero coefficients were retained to construct the radiomics signatures. A support vector machine (SVM) learning algorithm was used to construct the intra-ROI combined with the habitat radiomics model. Clinical features were then combined with radiomics features using a multivariable logistic regression algorithm to generate a clinical-radiomics nomogram. Model performance was assessed using receiver operating characteristic curve (ROC) and decision curve analysis (DCA). The combined intra-ROI and habitat radiomics model outperformed intra-ROI or habitat radiomics models alone, achieving areas under the curve (AUCs) of 0.913 (training) and 0.805 (testing). The nomogram integrating radiomics features and gender showed strong calibration and discrimination, with AUCs of 0.930 and 0.852 in the training and testing cohorts, respectively. Our findings suggest that integrating intra-ROI with habitat radiomics features can aid RC assessment. While the clinical-radiomics nomogram showed the highest internal performance, this single-center retrospective study lacks external validation and includes a relatively small test cohort. Therefore, risk of model overfitting cannot be excluded. Prospective, multi-center validation and larger cohorts are warranted before routine clinical deployment.
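Decision curve analysis, used above to assess the nomogram, plots net benefit against threshold probability: the benefit of true positives minus the harm of false positives, weighted by the odds of the threshold. A numpy sketch of that quantity on made-up labels and probabilities (not the study's data):

```python
import numpy as np

def net_benefit(y_true, y_prob, threshold):
    """Net benefit at one probability threshold, the quantity plotted
    in decision curve analysis (DCA). Predicts positive if the model
    probability meets the threshold."""
    y_true = np.asarray(y_true)
    pred_pos = np.asarray(y_prob) >= threshold
    n = len(y_true)
    tp = np.sum(pred_pos & (y_true == 1))
    fp = np.sum(pred_pos & (y_true == 0))
    return tp / n - fp / n * threshold / (1 - threshold)

y_true = [1, 1, 0, 0, 1, 0]
y_prob = [0.9, 0.7, 0.6, 0.2, 0.8, 0.1]
print(net_benefit(y_true, y_prob, 0.5))  # -> 0.333... (3 TP, 1 FP over 6)
```

Sweeping the threshold and comparing the curve against the treat-all and treat-none strategies gives the full DCA plot.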

A Deep Learning-Based EffConvNeXt Model for Automatic Classification of Cystic Bronchiectasis: An Explainable AI Approach.

Tekin V, Tekinhatun M, Özçelik STA, Fırat H, Üzen H

PubMed · Sep 25 2025
Cystic bronchiectasis and pneumonia are respiratory conditions that significantly impact morbidity and mortality worldwide. Diagnosing these diseases accurately is crucial, as early detection can greatly improve patient outcomes. Both conditions present with overlapping features on chest X-rays (CXR), making accurate diagnosis challenging. Recent advancements in deep learning (DL) have improved diagnostic accuracy in medical imaging. This study proposes the EffConvNeXt model, a hybrid approach combining EfficientNetB1 and ConvNeXtTiny, designed to enhance classification accuracy for cystic bronchiectasis, pneumonia, and normal cases in CXRs. The model effectively balances EfficientNetB1's efficiency with ConvNeXtTiny's advanced feature extraction, allowing for better identification of complex patterns in CXR images. Additionally, the EffConvNeXt model addresses the limitations of each model individually: EfficientNetB1's SE blocks improve focus on critical image areas while keeping the model lightweight and fast, and ConvNeXtTiny enhances detection of subtle abnormalities, making the combined model highly effective for rapid and accurate CXR image analysis in clinical settings. For the performance analysis of the EffConvNeXt model, experimental studies were conducted using 5899 CXR images collected from Dicle University Medical Faculty. When used individually, ConvNeXtTiny achieved an accuracy rate of 97.12%, while EfficientNetB1 reached 97.79%. By combining both models, EffConvNeXt raised the accuracy to 98.25%, a 0.46-percentage-point improvement that outperformed the other DL models tested. These findings indicate that EffConvNeXt provides a reliable, automated solution for distinguishing cystic bronchiectasis and pneumonia, supporting clinical decision-making with enhanced diagnostic accuracy.
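The abstract does not spell out how the two backbones are fused, so as a generic illustration of hybrid classification, here is a simple late-fusion sketch that averages the softmax outputs of two models and takes the argmax (an assumption for illustration, not the EffConvNeXt architecture):

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def ensemble_predict(logits_a, logits_b, weight_a=0.5):
    """Late fusion of two classifiers: average their class
    probabilities, then pick the highest-probability class."""
    probs = weight_a * softmax(logits_a) + (1 - weight_a) * softmax(logits_b)
    return probs.argmax(axis=1)

# Two models disagree on the second sample; the more confident one wins
logits_a = np.array([[2.0, 0.1, 0.1], [0.2, 0.3, 0.1]])
logits_b = np.array([[1.5, 0.2, 0.0], [0.1, 2.5, 0.2]])
print(ensemble_predict(logits_a, logits_b))  # -> [0 1]
```

True hybrid models typically fuse intermediate feature maps rather than final probabilities, but the probability-level version shows why combining complementary backbones can beat either alone.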

CNN-based prediction using early post-radiotherapy MRI as a proxy for toxicity in the murine head and neck.

Huynh BN, Kakar M, Zlygosteva O, Juvkam IS, Edin N, Tomic O, Futsaether CM, Malinen E

PubMed · Sep 25 2025
Radiotherapy (RT) of head and neck cancer can cause severe toxicities. Early identification of individuals at risk could enable personalized treatment. This study evaluated whether convolutional neural networks (CNNs) applied to Magnetic Resonance (MR) images acquired early after irradiation can predict radiation-induced tissue changes associated with toxicity in mice. Patient/material and methods: Twenty-nine C57BL/6JRj mice were included (irradiated: n = 14; control: n = 15). Irradiated mice received 65 Gy of fractionated RT to the oral cavity, swallowing muscles and salivary glands. T2-weighted MR images were acquired 3-5 days post-irradiation. CNN models (VGG, MobileNet, ResNet, EfficientNet) were trained to classify sagittal slices as irradiated or control (n = 586 slices). Predicted class probabilities were correlated with five toxicity endpoints assessed 8-105 days post-irradiation. Model explainability was assessed with VarGrad heatmaps, to verify that predictions relied on clinically relevant image regions. The best-performing model (EfficientNet B3) achieved 83% slice-level accuracy (ACC) and correctly classified 28 of 29 mice. Higher predicted probabilities of the irradiated class were strongly associated with oral mucositis, dermatitis, reduced saliva production, late submandibular gland fibrosis and atrophy of salivary gland acinar cells. Explainability heatmaps confirmed that CNNs focused on irradiated regions. The high CNN classification ACC, the regions highlighted by the explainability analysis and the strong correlations between model predictions and toxicity suggest that CNNs, together with post-irradiation magnetic resonance imaging, may identify individuals at risk of developing toxicity.
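The study correlates predicted class probabilities with later toxicity endpoints; a rank correlation such as Spearman's ρ is a natural choice for ordinal toxicity grades, though the abstract does not name the exact statistic. A tie-free numpy sketch with invented values (not the mouse data):

```python
import numpy as np

def spearman_rho(x, y):
    """Spearman rank correlation without tie handling: Pearson
    correlation of the ranks of x and y."""
    rx = np.argsort(np.argsort(x)).astype(float)
    ry = np.argsort(np.argsort(y)).astype(float)
    rx -= rx.mean()
    ry -= ry.mean()
    return float((rx * ry).sum() / np.sqrt((rx ** 2).sum() * (ry ** 2).sum()))

p_irradiated = [0.95, 0.10, 0.80, 0.30, 0.60]  # hypothetical CNN outputs
mucositis    = [3.0, 0.0, 2.0, 1.0, 1.5]       # hypothetical toxicity grades
print(spearman_rho(p_irradiated, mucositis))   # -> 1.0 (identical ordering)
```

A strong positive ρ of this kind is what "higher predicted probabilities ... were strongly associated with oral mucositis" describes.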

Single-centre, prospective cohort to predict optimal individualised treatment response in multiple sclerosis (POINT-MS): a cohort profile.

Christensen R, Cruciani A, Al-Araji S, Bianchi A, Chard D, Fourali S, Hamed W, Hammam A, He A, Kanber B, Maccarrone D, Moccia M, Mohamud S, Nistri R, Passalis A, Pozzilli V, Prados Carrasco F, Samdanidou E, Song J, Wingrove J, Yam C, Yiannakas M, Thompson AJ, Toosy A, Hacohen Y, Barkhof F, Ciccarelli O

PubMed · Sep 25 2025
Multiple sclerosis (MS) is a chronic neurological condition that affects approximately 150 000 people in the UK and presents a significant healthcare burden, including the high costs of disease-modifying treatments (DMTs). DMTs have substantially reduced the risk of relapse and moderately reduced disability progression. Patients exhibit a wide range of responses to available DMTs. The Predicting Optimal INdividualised Treatment response in MS (POINT-MS) cohort was established to predict the individual treatment response by integrating comprehensive clinical phenotyping with imaging, serum and genetic biomarkers of disease activity and progression. Here, we present the baseline characteristics of the cohort and provide an overview of the study design, laying the groundwork for future analyses. POINT-MS is a prospective, observational research cohort and biobank of 781 adult participants with a diagnosis of MS who consented to study enrolment on initiation of a DMT at the Queen Square MS Centre (National Hospital of Neurology and Neurosurgery, University College London Hospital NHS Trust, London) between 01/07/2019 and 31/07/2024. All patients were invited for clinical assessments, including the expanded disability status scale (EDSS) score, brief international cognitive assessment for MS and various patient-reported outcome measures (PROMs). They additionally underwent MRI at 3T, optical coherence tomography and blood tests (for genotyping and serum biomarkers quantification), at baseline (i.e., within 3 months from commencing a DMT), and between 6-12 (re-baseline), 18-24, 30-36, 42-48 and 54-60 months after DMT initiation. 748 participants provided baseline data. They were mostly female (68%) and White (75%) participants, with relapsing-remitting MS (94.3%), and with an average age of 40.8 (±10.9) years and a mean disease duration of 7.9 (±7.4) years since symptom onset. 
Despite low disability (median EDSS 2.0), cognitive impairment was observed in 40% of participants. Most patients (98.4%) had at least one comorbidity. At study entry, 59.2% were treatment naïve, and 83.2% initiated a high-efficacy DMT. Most patients (76.4%) were in either full- or part-time employment. PROMs indicated heterogeneous impairments in physical and mental health, with a greater psychological than physical impact and with low levels of fatigue. When baseline MRI scans were compared with previous scans (available in 668 (89%) patients; mean time since last scan 9±8 months), 26% and 8.5% of patients had at least one new brain or spinal cord lesion at study entry, respectively. Patients showed a median volume of brain lesions of 6.14 cm<sup>3</sup>, with significant variability among patients (CI 1.1 to 34.1). When brain tissue volume z-scores were obtained using healthy subjects (N=113; mean age 42.3 (±11.8) years; 61.9% female) from a local MRI database, patients showed a slight reduction in the volumes of the whole grey matter (-0.16 (-0.22 to -0.09)), driven by the deep grey matter (-0.47 (-0.55 to -0.40)), and of the whole white matter (-0.18 (-0.28 to -0.09)), but normal cortical grey matter volumes (0.10 (0.05 to 0.15)). The mean upper cervical spinal cord cross-sectional area (CSA), as measured from volumetric brain scans, was 62.3 (SD 7.5) mm<sup>2</sup>. When CSA z-scores were obtained from the same healthy subjects used for brain measures, patients showed a slight reduction in CSA (-0.15 (-0.24 to -0.10)). Modelling with both standard statistics and machine learning approaches is currently planned to predict individualised treatment response by integrating all the demographic, socioeconomic, and clinical data with imaging, genetic and serum biomarkers. The long-term output of this research is a stratification tool that will guide the selection of DMTs in clinical practice on the basis of the individual prognostic profile.
We will complete long-term follow-up data in 4 years (January 2029). The biobank and MRI repository will be used for collaborative research on the mechanisms of disability in MS.
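The z-scores reported above express each patient's tissue volume relative to the healthy-control distribution from the local MRI database. A minimal numpy sketch of that normalisation with illustrative values (not POINT-MS data):

```python
import numpy as np

def volume_z_scores(patient_volumes, control_volumes):
    """Express each patient measurement as a z-score against the
    healthy-control mean and standard deviation."""
    mu = np.mean(control_volumes)
    sigma = np.std(control_volumes, ddof=1)  # sample SD of controls
    return (np.asarray(patient_volumes) - mu) / sigma

controls = [60.0, 62.0, 64.0, 58.0, 66.0]  # e.g. cord CSA values in mm^2
patients = [62.0, 55.0]
print(volume_z_scores(patients, controls))  # ~[0.0, -2.21]
```

A z-score near 0 means the patient matches the control mean; the cohort-level values such as -0.47 for deep grey matter are means of such scores across patients.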

Multimodal text guided network for chest CT pneumonia classification.

Feng Y, Huang G, Ju F, Cui H

PubMed · Sep 25 2025
Pneumonia is a prevalent and serious respiratory disease, responsible for a significant number of cases globally. With advancements in deep learning, the automatic diagnosis of pneumonia has attracted significant research attention in medical image classification. However, current methods still face several challenges. First, since lesions are often visible in only a few slices, slice-based classification algorithms may overlook critical spatial contextual information in CT sequences, and slice-level annotations are labor-intensive. Moreover, chest CT sequence-based pneumonia classification algorithms that rely solely on sequence-level coarse-grained labels remain limited, especially in integrating multi-modal information. To address these challenges, we propose a Multi-modal Text-Guided Network (MTGNet) for pneumonia classification using chest CT sequences. In this model, we design a sequential graph pooling network to encode the CT sequences by gradually selecting important slice features to obtain a sequence-level representation. Additionally, a CT description encoder is developed to learn representations from textual reports. To simulate the clinical diagnostic process, we employ multi-modal training and single-modal testing. A modal transfer module is proposed to generate simulated textual features from CT sequences. Cross-modal attention is then employed to fuse the sequence-level and simulated textual representations, thereby enhancing feature learning within the CT sequences by incorporating semantic information from textual descriptions. Furthermore, contrastive learning is applied to learn discriminative features by maximizing the similarity of positive sample pairs and minimizing the similarity of negative sample pairs. Extensive experiments on a self-constructed pneumonia CT sequences dataset demonstrate that the proposed model significantly improves classification performance.
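The fusion step described above, where sequence-level CT features attend over simulated textual features, is typically built on scaled dot-product attention. A numpy sketch of that primitive with random toy features (a stand-in for the paper's module, whose exact dimensions and parameterisation are not given in the abstract):

```python
import numpy as np

def cross_modal_attention(queries, keys, values):
    """Scaled dot-product attention: each query (CT-sequence token)
    forms a softmax-weighted combination of the values (text tokens)."""
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)  # stable softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ values, weights

rng = np.random.default_rng(0)
ct_feats = rng.standard_normal((4, 8))    # 4 CT-sequence tokens, dim 8
text_feats = rng.standard_normal((6, 8))  # 6 simulated text tokens, dim 8
fused, w = cross_modal_attention(ct_feats, text_feats, text_feats)
print(fused.shape, np.allclose(w.sum(axis=1), 1.0))  # (4, 8) True
```

Each fused CT token is thus enriched with whichever textual semantics it attends to, which is the mechanism behind "enhancing feature learning within the CT sequences by incorporating semantic information".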

AI demonstrates comparable diagnostic performance to radiologists in MRI detection of anterior cruciate ligament tears: a systematic review and meta-analysis.

Gill SS, Haq T, Zhao Y, Ristic M, Amiras D, Gupte CM

PubMed · Sep 25 2025
Anterior cruciate ligament (ACL) injuries are among the most common knee injuries, affecting 1 in 3500 people annually. With rising rates of ACL tears, particularly in children, timely diagnosis is critical. This study evaluates artificial intelligence (AI) effectiveness in diagnosing and classifying ACL tears on MRI through a systematic review and meta-analysis, comparing AI performance with clinicians and assessing radiomic and non-radiomic models. Major databases were searched for AI models diagnosing ACL tears via MRIs. 36 studies, representing 52 models, were included. Accuracy, sensitivity, and specificity metrics were extracted. Pooled estimates were calculated using a random-effects model. Subgroup analyses compared MRI sequences, ground truths, AI versus clinician performance, and radiomic versus non-radiomic models. This study was conducted in line with Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) protocols. AI demonstrated strong diagnostic performance, with pooled accuracy, sensitivity, and specificity of 87.37%, 90.73%, and 91.34%, respectively. Classification models achieved pooled metrics of 90.46%, 88.68%, and 94.08%. Radiomic models outperformed non-radiomic models, and AI demonstrated comparable performance to clinicians in key metrics. Three-dimensional (3D) proton density fat suppression (PDFS) sequences with < 2 mm slice depth yielded the most promising results, despite small sample sizes, favouring arthroscopic benchmarks. Despite high heterogeneity (I² > 90%), AI models demonstrate diagnostic performance comparable to clinicians and may serve as valuable adjuncts in ACL tear detection, pending prospective validation. However, substantial heterogeneity and limited interpretability remain key challenges. Further research and standardised evaluation frameworks are needed to support clinical integration.
Question: Is AI effective and accurate in diagnosing and classifying anterior cruciate ligament (ACL) tears on MRI?
Findings: AI demonstrated high accuracy (87.37%), sensitivity (90.73%), and specificity (91.34%) in ACL tear diagnosis, matching or surpassing clinicians. Radiomic models outperformed non-radiomic approaches.
Clinical relevance: AI can enhance the accuracy of ACL tear diagnosis, reducing misdiagnoses and supporting clinicians, especially in resource-limited settings. Its integration into clinical workflows may streamline MRI interpretation, reduce diagnostic delays, and improve patient outcomes by optimising management.
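The pooled estimates above come from a random-effects model; the classic DerSimonian-Laird estimator is the most common choice (the review does not state which estimator was used, so this is a representative sketch, not a reproduction of its analysis):

```python
import numpy as np

def dersimonian_laird(effects, variances):
    """DerSimonian-Laird random-effects pooling: estimate
    between-study variance tau^2 from Cochran's Q, then reweight."""
    y, v = np.asarray(effects, float), np.asarray(variances, float)
    w = 1.0 / v                                  # fixed-effect weights
    y_fe = (w * y).sum() / w.sum()
    q = (w * (y - y_fe) ** 2).sum()              # Cochran's Q
    k = len(y)
    tau2 = max(0.0, (q - (k - 1)) / (w.sum() - (w ** 2).sum() / w.sum()))
    w_re = 1.0 / (v + tau2)                      # random-effects weights
    pooled = (w_re * y).sum() / w_re.sum()
    return pooled, tau2

# Identical study estimates -> no heterogeneity, pooled = common value
print(dersimonian_laird([0.87, 0.87, 0.87], [0.01, 0.02, 0.01]))
# pooled ≈ 0.87, tau^2 = 0.0
```

With heterogeneous inputs (the I² > 90% case noted above), tau² grows and the weights flatten, widening the pooled estimate's uncertainty.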

Mammo-CLIP Dissect: A Framework for Analysing Mammography Concepts in Vision-Language Models

Suaiba Amina Salahuddin, Teresa Dorszewski, Marit Almenning Martiniussen, Tone Hovda, Antonio Portaluri, Solveig Thrun, Michael Kampffmeyer, Elisabeth Wetzer, Kristoffer Wickstrøm, Robert Jenssen

arXiv preprint · Sep 25 2025
Understanding what deep learning (DL) models learn is essential for the safe deployment of artificial intelligence (AI) in clinical settings. While previous work has focused on pixel-based explainability methods, less attention has been paid to the textual concepts learned by these models, which may better reflect the reasoning used by clinicians. We introduce Mammo-CLIP Dissect, the first concept-based explainability framework for systematically dissecting DL vision models trained for mammography. Leveraging a mammography-specific vision-language model (Mammo-CLIP) as a "dissector," our approach labels neurons at specified layers with human-interpretable textual concepts and quantifies their alignment to domain knowledge. Using Mammo-CLIP Dissect, we investigate three key questions: (1) how concept learning differs between DL vision models trained on general image datasets versus mammography-specific datasets; (2) how fine-tuning for downstream mammography tasks affects concept specialisation; and (3) which mammography-relevant concepts remain underrepresented. We show that models trained on mammography data capture more clinically relevant concepts and align more closely with radiologists' workflows than models not trained on mammography data. Fine-tuning for task-specific classification enhances the capture of certain concept categories (e.g., benign calcifications) but can reduce coverage of others (e.g., density-related features), indicating a trade-off between specialisation and generalisation. Our findings show that Mammo-CLIP Dissect provides insights into how convolutional neural networks (CNNs) capture mammography-specific knowledge. By comparing models across training data and fine-tuning regimes, we reveal how domain-specific training and task-specific adaptation shape concept learning. Code and concept set are available: https://github.com/Suaiba/Mammo-CLIP-Dissect.
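At its core, dissection-style neuron labelling matches each neuron's activation profile over a probe set to the most similar concept profile derived from the vision-language model. A toy numpy sketch of that matching step (the vectors below are illustrative, not Mammo-CLIP outputs, and the real framework operates on activation/embedding profiles over many probe images):

```python
import numpy as np

def label_neurons(neuron_acts, concept_profiles, concept_names):
    """Assign each neuron the concept whose profile is most
    cosine-similar to the neuron's activation profile."""
    a = neuron_acts / np.linalg.norm(neuron_acts, axis=1, keepdims=True)
    c = concept_profiles / np.linalg.norm(concept_profiles, axis=1, keepdims=True)
    sims = a @ c.T                      # (n_neurons, n_concepts)
    return [concept_names[i] for i in sims.argmax(axis=1)]

concepts = ["mass", "calcification", "density"]
concept_profiles = np.eye(3)            # toy one-hot concept profiles
neuron_acts = np.array([[0.1, 0.9, 0.0],   # fires with 'calcification'
                        [0.8, 0.1, 0.2]])  # fires with 'mass'
print(label_neurons(neuron_acts, concept_profiles, concepts))
# -> ['calcification', 'mass']
```

Aggregating such labels across a layer is what lets the framework quantify which concept categories (e.g., benign calcifications vs. density-related features) a model has captured.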