Page 224 of 233 (2330 results)

Preoperative radiomics models using CT and MRI for microsatellite instability in colorectal cancer: a systematic review and meta-analysis.

Capello Ingold G, Martins da Fonseca J, Kolenda Zloić S, Verdan Moreira S, Kago Marole K, Finnegan E, Yoshikawa MH, Daugėlaitė S, Souza E Silva TX, Soato Ratti MA

pubmed · May 10, 2025
Microsatellite instability (MSI) is a novel predictive biomarker for chemotherapy and immunotherapy response, as well as a prognostic indicator, in colorectal cancer (CRC). The current standard for MSI identification is polymerase chain reaction (PCR) testing or immunohistochemical analysis of tumor biopsy samples. However, tumor heterogeneity and procedural complications pose challenges to these techniques. CT- and MRI-based radiomics models offer a promising non-invasive alternative. A systematic search of PubMed, Embase, the Cochrane Library, and Scopus was conducted to identify studies evaluating the diagnostic performance of CT- and MRI-based radiomics models for detecting MSI status in CRC. Pooled area under the curve (AUC), sensitivity, and specificity were calculated in RStudio using a random-effects model. Forest plots and a summary ROC curve were generated. Heterogeneity was assessed using I² statistics and explored through sensitivity analyses, threshold effect assessment, subgroup analyses, and meta-regression. Seventeen studies with a total of 6,045 subjects were included in the analysis. All studies extracted radiomic features from CT or MRI images of CRC patients with confirmed MSI status to train machine learning models. The pooled AUC was 0.815 (95% CI: 0.784-0.840) for CT-based studies and 0.900 (95% CI: 0.819-0.943) for MRI-based studies. Significant heterogeneity was identified and addressed through extensive analysis. Radiomics models represent a novel and promising tool for predicting MSI status in CRC patients. These findings may serve as a foundation for future studies aimed at developing and validating improved models, ultimately enhancing the diagnosis, treatment, and prognosis of colorectal cancer.
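The random-effects pooling step described above can be sketched as a minimal DerSimonian-Laird estimator. The per-study logit-AUCs and variances below are invented for illustration; they are not the review's data, and the review's own analysis was done in RStudio:

```python
import math

def dersimonian_laird(effects, variances):
    """Pool per-study effects with DerSimonian-Laird random-effects weights."""
    w = [1.0 / v for v in variances]
    fixed = sum(wi * ei for wi, ei in zip(w, effects)) / sum(w)
    q = sum(wi * (ei - fixed) ** 2 for wi, ei in zip(w, effects))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(effects) - 1)) / c)   # between-study variance
    w_re = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * ei for wi, ei in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    return pooled, se, tau2

# Hypothetical per-study effects on the logit-AUC scale (not the review's data)
logit_auc = [1.4, 1.6, 1.2, 1.9, 1.5]
var = [0.04, 0.06, 0.05, 0.09, 0.03]
pooled, se, tau2 = dersimonian_laird(logit_auc, var)
pooled_auc = 1.0 / (1.0 + math.exp(-pooled))        # back-transform to AUC
```

Pooling on the logit scale and back-transforming keeps the confidence bounds inside (0, 1), which is why many AUC meta-analyses work on that scale.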

Machine learning approaches for classifying major depressive disorder using biological and neuropsychological markers: A meta-analysis.

Zhang L, Jian L, Long Y, Ren Z, Calhoun VD, Passos IC, Tian X, Xiang Y

pubmed · May 10, 2025
Traditional diagnostic methods for major depressive disorder (MDD), which rely on subjective assessments, may compromise diagnostic accuracy. In contrast, machine learning models have the potential to classify and diagnose MDD more effectively, reducing the risk of misdiagnosis associated with conventional methods. The aim of this meta-analysis is to evaluate the overall classification accuracy of machine learning models in MDD and to examine the effects of machine learning algorithms, biomarkers, diagnostic comparison groups, validation procedures, and participant age on classification performance. As of September 2024, 176 studies were included in the meta-analysis, encompassing 60,926 participants. A random-effects model was applied to the extracted data, yielding an overall classification accuracy of 0.825 (95% CI [0.810; 0.839]). Convolutional neural networks significantly outperformed support vector machines (SVM) when using electroencephalography and magnetoencephalography data. Additionally, SVM performed significantly better with functional magnetic resonance imaging data than graph neural networks and Gaussian process classification. Sample size was negatively correlated with classification accuracy. Furthermore, evidence of publication bias was detected. Therefore, while this study indicates that machine learning models achieve high accuracy in distinguishing MDD from healthy controls and other psychiatric disorders, further research is required before these findings can be generalized to large-scale clinical practice.
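The publication-bias check mentioned above is commonly done with Egger's regression for funnel-plot asymmetry: the standardized effect is regressed on precision, and an intercept far from zero suggests small-study bias. A minimal sketch on invented study effects and standard errors, not the meta-analysis's data:

```python
import math

def egger_intercept(effects, ses):
    """Egger's test: regress z_i = effect/se on precision 1/se.
    A non-zero intercept suggests small-study (publication-bias) asymmetry."""
    z = [e / s for e, s in zip(effects, ses)]
    prec = [1.0 / s for s in ses]
    n = len(z)
    mx, my = sum(prec) / n, sum(z) / n
    sxx = sum((x - mx) ** 2 for x in prec)
    sxy = sum((x - mx) * (y - my) for x, y in zip(prec, z))
    slope = sxy / sxx
    intercept = my - slope * mx
    resid = [y - (intercept + slope * x) for x, y in zip(prec, z)]
    s2 = sum(r ** 2 for r in resid) / (n - 2)        # residual variance
    se_int = math.sqrt(s2 * (1.0 / n + mx ** 2 / sxx))
    return intercept, se_int

# Hypothetical studies where smaller (higher-SE) studies report larger effects
eff = [1.2, 1.5, 1.8, 2.1, 2.6]
se = [0.10, 0.15, 0.20, 0.30, 0.45]
b0, b0_se = egger_intercept(eff, se)
```

With this deliberately asymmetric toy input, the intercept comes out well above zero, the pattern the test is designed to flag.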

Radiomics prediction of surgery in ulcerative colitis refractory to medical treatment.

Sakamoto K, Okabayashi K, Seishima R, Shigeta K, Kiyohara H, Mikami Y, Kanai T, Kitagawa Y

pubmed · May 10, 2025
The decision to operate in drug-resistant ulcerative colitis (UC) is determined by complex factors. This study evaluated whether radiomics analysis of CT at admission could predict whether hospitalized patients with UC would be in the surgical or the medical treatment group by discharge. This single-center retrospective cohort study used admission CT scans of patients with UC admitted from 2015 to 2022. The target of prediction was whether the patient would undergo surgery by the time of discharge. Radiomics features were extracted using the rectal wall at the level of the coccygeal tip on CT as the region of interest. CT data were randomly split into a training cohort and a validation cohort, and LASSO regression was performed on the training cohort to derive a formula for calculating the radiomics score. A total of 147 patients were selected, and data from 184 CT scans were collected. Data from 157 CT scans met the selection criteria and were included. Five features were used for the radiomics score. Univariate logistic regression analysis of clinical information detected a significant influence of severity (p < 0.001), number of drugs used until surgery (p < 0.001), Lichtiger score (p = 0.024), and hemoglobin (p = 0.010). Using a nomogram combining these items, the discriminatory power between the surgery and medical treatment groups was AUC 0.822 (95% confidence interval (CI) 0.841-0.951) for the training cohort and AUC 0.868 (95% CI 0.729-1.000) for the validation cohort, indicating good discrimination. Radiomics analysis of admission CT images of patients with UC, combined with clinical data, showed high predictive ability for a treatment course of surgery versus medical treatment.
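The LASSO step above, deriving a sparse radiomics score from the training cohort, can be sketched with an L1-penalised logistic regression as a stand-in. Everything below (the feature matrix, the labels, the penalty strength C) is invented for illustration; the study's actual features came from the segmented rectal wall:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Hypothetical stand-in for rectal-wall radiomic features: 100 scans x 20
# features, where only the first 3 carry signal about surgery vs. medical care
X = rng.normal(size=(100, 20))
y = (X[:, 0] + 0.8 * X[:, 1] - 0.6 * X[:, 2]
     + rng.normal(scale=0.5, size=100) > 0).astype(int)

# L1-penalised logistic regression plays the role of LASSO feature selection
lasso = LogisticRegression(penalty="l1", solver="liblinear", C=0.5).fit(X, y)
selected = np.flatnonzero(lasso.coef_[0])     # features kept by the penalty

def radiomics_score(x):
    """Radiomics score = linear combination of the penalised coefficients."""
    return float(lasso.intercept_[0] + x @ lasso.coef_[0])

train_acc = lasso.score(X, y)
```

The L1 penalty zeroes out uninformative coefficients, which is how the study ends up with a five-feature score formula.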

A novel framework for esophageal cancer grading: combining CT imaging, radiomics, reproducibility, and deep learning insights.

Alsallal M, Ahmed HH, Kareem RA, Yadav A, Ganesan S, Shankhyan A, Gupta S, Joshi KK, Sameer HN, Yaseen A, Athab ZH, Adil M, Farhood B

pubmed · May 10, 2025
This study aims to create a reliable framework for grading esophageal cancer. The framework combines feature extraction, deep learning with attention mechanisms, and radiomics to ensure accuracy, interpretability, and practical use in tumor analysis. This retrospective study used data from 2,560 esophageal cancer patients across multiple clinical centers, collected from 2018 to 2023. The dataset included CT scan images and clinical information, representing a variety of cancer grades and types. Standardized CT imaging protocols were followed, and experienced radiologists manually segmented the tumor regions. Only high-quality data were used in the study. A total of 215 radiomic features were extracted using the SERA platform. The study used two deep learning models (DenseNet121 and EfficientNet-B0) enhanced with attention mechanisms to improve accuracy. A combined classification approach used both radiomic and deep learning features, and machine learning models such as Random Forest, XGBoost, and CatBoost were applied. These models were validated with strict training and testing procedures to ensure effective cancer grading. This study analyzed the reliability and performance of radiomic and deep learning features for grading esophageal cancer. Radiomic features were classified into four reliability levels based on their intraclass correlation coefficient (ICC) values. Most of the features had excellent (ICC > 0.90) or good (0.75 < ICC ≤ 0.90) reliability. Deep learning features extracted from DenseNet121 and EfficientNet-B0 were also categorized, and some of them showed poor reliability. The machine learning models, including XGBoost and CatBoost, were tested for their ability to grade cancer. XGBoost with Recursive Feature Elimination (RFE) gave the best results for radiomic features, with an area under the curve (AUC) of 91.36%. For deep learning features, XGBoost with Principal Component Analysis (PCA) gave the best results using DenseNet121, while CatBoost with RFE performed best with EfficientNet-B0, achieving an AUC of 94.20%. Combining radiomic and deep features led to significant improvements, with XGBoost achieving the highest AUC of 96.70%, accuracy of 96.71%, and sensitivity of 95.44%. Combining the DenseNet121 and EfficientNet-B0 models in an ensemble achieved the best overall performance, with an AUC of 95.14% and accuracy of 94.88%. This study improves esophageal cancer grading by combining radiomics and deep learning. It enhances diagnostic accuracy, reproducibility, and interpretability, while also supporting personalized treatment planning through better tumor characterization. Trial registration: not applicable.
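The ICC-based reliability binning described above can be sketched with a consistency ICC(3,1) on hypothetical repeated feature measurements. The excellent/good thresholds are the ones quoted in the abstract; the moderate/poor cut at 0.50 and the simulated data are my assumptions:

```python
import numpy as np

def icc_3_1(x):
    """Two-way mixed, consistency ICC(3,1) for an (n_subjects, n_raters) array."""
    n, k = x.shape
    g = x.mean()
    r = x.mean(axis=1)                       # per-subject means
    c = x.mean(axis=0)                       # per-rater means
    msr = k * ((r - g) ** 2).sum() / (n - 1)
    mse = ((x - r[:, None] - c[None, :] + g) ** 2).sum() / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse)

def reliability_level(icc):
    if icc > 0.90: return "excellent"
    if icc > 0.75: return "good"
    if icc > 0.50: return "moderate"         # assumed lower cut-offs
    return "poor"

rng = np.random.default_rng(1)
truth = rng.normal(size=50)                  # hypothetical per-lesion values
stable = np.column_stack([truth + rng.normal(scale=0.1, size=50),
                          truth + rng.normal(scale=0.1, size=50)])
noisy = np.column_stack([truth + rng.normal(scale=2.0, size=50),
                         truth + rng.normal(scale=2.0, size=50)])
icc_stable, icc_noisy = icc_3_1(stable), icc_3_1(noisy)
```

A feature re-measured with small perturbations lands in the excellent band, while one dominated by measurement noise falls to poor, which is the filtering logic the study applies before modeling.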

Evaluating an information theoretic approach for selecting multimodal data fusion methods.

Zhang T, Ding R, Luong KD, Hsu W

pubmed logopapersMay 10 2025
Interest has grown in combining radiology, pathology, genomic, and clinical data to improve the accuracy of diagnostic and prognostic predictions toward precision health. However, most existing works choose their datasets and modeling approaches empirically and in an ad hoc manner. A prior study proposed four partial information decomposition (PID)-based metrics to provide a theoretical understanding of multimodal data interactions: redundancy, uniqueness of each modality, and synergy. However, these metrics have only been evaluated on a limited collection of biomedical data, and the existing work does not elucidate the effect of parameter selection when calculating the PID metrics. In this work, we evaluate PID metrics on a wider range of biomedical data, including clinical, radiology, pathology, and genomic data, and propose potential improvements to the PID metrics. We apply the PID metrics to seven different modality pairs across four distinct cohorts (datasets). We compare and interpret trends in the resulting PID metrics and downstream model performance in these multimodal cohorts. The downstream tasks evaluated include predicting the prognosis (either overall survival or recurrence) of patients with non-small cell lung cancer, prostate cancer, and glioblastoma. We found that, while PID metrics are informative, solely relying on these metrics to decide on a fusion approach does not always yield a machine learning model with optimal performance. Of the seven modality pairs, three had poor (0%), three had moderate (66%-89%), and only one had perfect (100%) consistency between the PID values and model performance. We propose two improvements to the PID metrics (determining the optimal parameters and uncertainty estimation) and identify areas where the metrics could be further improved. The current PID metrics are not accurate enough for estimating multimodal data interactions and need to be improved before they can serve as a reliable tool. We propose improvements and provide suggestions for future work. Code: https://github.com/zhtyolivia/pid-multimodal.
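A simple baseline version of the PID quantities takes redundancy as the minimum of the two single-modality mutual informations (the minimum-mutual-information PID); uniqueness and synergy then follow from the lattice identities. This is an illustrative baseline, not the decomposition used in the paper. The classic XOR example below shows a purely synergistic interaction:

```python
import math
from collections import Counter
from itertools import product

def mutual_info(pairs):
    """I(A;B) in bits from a list of (a, b) samples (plug-in estimate)."""
    n = len(pairs)
    pab = Counter(pairs)
    pa = Counter(a for a, _ in pairs)
    pb = Counter(b for _, b in pairs)
    return sum((c / n) * math.log2((c / n) / ((pa[a] / n) * (pb[b] / n)))
               for (a, b), c in pab.items())

def mmi_pid(x1, x2, y):
    """Minimum-mutual-information PID: redundancy = min(I(X1;Y), I(X2;Y))."""
    i1 = mutual_info(list(zip(x1, y)))
    i2 = mutual_info(list(zip(x2, y)))
    i12 = mutual_info(list(zip(zip(x1, x2), y)))
    red = min(i1, i2)
    return {"redundancy": red, "unique1": i1 - red, "unique2": i2 - red,
            "synergy": i12 - i1 - i2 + red}

# XOR: each "modality" alone says nothing about y; together they determine it
x1, x2 = zip(*product([0, 1], repeat=2))
y = [a ^ b for a, b in zip(x1, x2)]
parts = mmi_pid(x1, x2, y)
```

For XOR the decomposition puts the entire 1 bit of joint information into synergy, the regime where fusing modalities should pay off most.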

Performance of fully automated deep-learning-based coronary artery calcium scoring in ECG-gated calcium CT and non-gated low-dose chest CT.

Kim S, Park EA, Ahn C, Jeong B, Lee YS, Lee W, Kim JH

pubmed · May 10, 2025
This study aimed to validate the agreement and diagnostic performance of a deep-learning-based coronary artery calcium scoring (DL-CACS) system for ECG-gated and non-gated low-dose chest CT (LDCT) across multivendor datasets. In this retrospective study, datasets from Seoul National University Hospital (SNUH, 652 paired ECG-gated and non-gated CT scans) and the Stanford public dataset (425 ECG-gated and 199 non-gated CT scans) were analyzed. Agreement metrics included intraclass correlation coefficient (ICC), coefficient of determination (R²), and categorical agreement (κ). Diagnostic performance was assessed using categorical accuracy and the area under the receiver operating characteristic curve (AUROC). DL-CACS demonstrated excellent performance for ECG-gated CT in both datasets (SNUH: R² = 0.995, ICC = 0.997, κ = 0.97, AUROC = 0.99; Stanford: R² = 0.989, ICC = 0.990, κ = 0.97, AUROC = 0.99). For non-gated CT using manual LDCT CAC scores as a reference, performance was similarly high (R² = 0.988, ICC = 0.994, κ = 0.96, AUROC = 0.98-0.99). When using ECG-gated CT scores as the reference, performance for non-gated CT was slightly lower but remained robust (SNUH: R² = 0.948, ICC = 0.968, κ = 0.88, AUROC = 0.98-0.99; Stanford: R² = 0.949, ICC = 0.948, κ = 0.71, AUROC = 0.89-0.98). DL-CACS provides a reliable and automated solution for CACS, potentially reducing workload while maintaining robust performance in both ECG-gated and non-gated CT settings.
Question: How accurate and reliable is deep-learning-based coronary artery calcium scoring (DL-CACS) in ECG-gated CT and non-gated low-dose chest CT (LDCT) across multivendor datasets?
Findings: DL-CACS showed near-perfect performance for ECG-gated CT. For non-gated LDCT, performance was excellent using manual scores as the reference and lower but reliable when using ECG-gated CT scores.
Clinical relevance: DL-CACS provides a reliable and automated solution for CACS, potentially reducing workload and improving diagnostic workflow. It supports cardiovascular risk stratification and broader clinical adoption, especially in settings where ECG-gated CT is unavailable.
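The categorical agreement (κ) reported above is conventionally computed after binning Agatston scores into the standard risk categories (0, 1-99, 100-399, ≥400). A small sketch with invented paired scores, not the SNUH or Stanford data:

```python
from collections import Counter

def cac_category(score):
    """Standard Agatston risk categories."""
    if score == 0: return 0      # none
    if score < 100: return 1     # mild
    if score < 400: return 2     # moderate
    return 3                     # severe

def cohens_kappa(a, b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(a)
    labels = sorted(set(a) | set(b))
    po = sum(x == y for x, y in zip(a, b)) / n
    ca, cb = Counter(a), Counter(b)
    pe = sum(ca[l] * cb[l] for l in labels) / (n * n)
    return (po - pe) / (1 - pe)

# Hypothetical paired scores: reference ECG-gated vs. automated non-gated CT
ref = [0, 0, 12, 85, 150, 320, 410, 900, 55, 0, 210, 640]
auto = [0, 3, 10, 95, 160, 290, 450, 880, 60, 0, 180, 700]
kappa = cohens_kappa([cac_category(s) for s in ref],
                     [cac_category(s) for s in auto])
```

Note the one disagreement in the toy data: a reference score of 0 scored as 3 by the automated reader crosses the 0 vs. 1-99 boundary, the most error-prone cut in practice.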

Reproducing and Improving CheXNet: Deep Learning for Chest X-ray Disease Classification

Daniel Strick, Carlos Garcia, Anthony Huang

arxiv preprint · May 10, 2025
Deep learning for radiologic image analysis is a rapidly growing field in biomedical research and is likely to become standard practice in modern medicine. On the publicly available NIH ChestX-ray14 dataset, containing X-ray images classified by the presence or absence of 14 different diseases, we reproduced the CheXNet algorithm and explored other architectures that outperform its baseline metrics. Model performance was primarily evaluated using the F1 score and AUC-ROC, both critical metrics for imbalanced, multi-label classification tasks in medical imaging. The best model achieved an average AUC-ROC of 0.85 and an average F1 score of 0.39 across all 14 disease classifications in the dataset.
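The averaged metrics above are macro averages: AUC-ROC and F1 are computed per disease label and then averaged. A minimal sketch on an invented two-label toy example (ChestX-ray14 has 14 such columns):

```python
def auc_roc(y_true, scores):
    """AUC via the Mann-Whitney rank statistic (ties get half credit)."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def f1(y_true, y_pred):
    tp = sum(t and p for t, p in zip(y_true, y_pred))
    fp = sum((not t) and p for t, p in zip(y_true, y_pred))
    fn = sum(t and (not p) for t, p in zip(y_true, y_pred))
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

# Hypothetical multi-label ground truth and model probabilities (2 labels)
labels = [[1, 0], [0, 1], [1, 1], [0, 0], [1, 0]]
probs = [[0.9, 0.2], [0.3, 0.8], [0.7, 0.6], [0.5, 0.1], [0.4, 0.3]]
per_class_auc = [auc_roc([r[c] for r in labels], [r[c] for r in probs])
                 for c in range(2)]
macro_auc = sum(per_class_auc) / 2
preds = [[int(p >= 0.5) for p in row] for row in probs]
macro_f1 = sum(f1([r[c] for r in labels], [r[c] for r in preds])
               for c in range(2)) / 2
```

A low macro F1 alongside a high macro AUC, as in the abstract (0.39 vs. 0.85), is typical when positive labels are rare: ranking is good, but a fixed 0.5 threshold yields many false negatives.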

Shortcut learning leads to sex bias in deep learning models for photoacoustic tomography.

Knopp M, Bender CJ, Holzwarth N, Li Y, Kempf J, Caranovic M, Knieling F, Lang W, Rother U, Seitel A, Maier-Hein L, Dreher KK

pubmed · May 9, 2025
Shortcut learning has been identified as a source of algorithmic unfairness in medical imaging artificial intelligence (AI), but its impact on photoacoustic tomography (PAT), particularly concerning sex bias, remains underexplored. This study investigates this issue using peripheral artery disease (PAD) diagnosis as a specific clinical application. To examine the potential for sex bias due to shortcut learning in convolutional neural networks (CNNs) and assess how such biases might affect diagnostic predictions, we created training and test datasets with varying PAD prevalence between sexes. Using these datasets, we explored (1) whether CNNs can classify the sex from imaging data, (2) how sex-specific prevalence shifts impact PAD diagnosis performance and underdiagnosis disparity between sexes, and (3) how similarly CNNs encode sex and PAD features. Our study with 147 individuals demonstrates that CNNs can classify the sex from calf muscle PAT images, achieving an AUROC of 0.75. For PAD diagnosis, models trained on data with imbalanced sex-specific disease prevalence experienced significant performance drops (up to 0.21 AUROC) when applied to balanced test sets. Additionally, greater imbalances in sex-specific prevalence within the training data exacerbated underdiagnosis disparities between sexes. Finally, we identify evidence of shortcut learning by demonstrating the effective reuse of learned feature representations between PAD diagnosis and sex classification tasks. CNN-based models trained on PAT data may engage in shortcut learning by leveraging sex-related features, leading to biased and unreliable diagnostic predictions. Addressing demographic-specific prevalence imbalances and preventing shortcut learning is critical for developing models in the medical field that are both accurate and equitable across diverse patient populations.
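The prevalence-shift experiments described above hinge on constructing cohorts with controlled sex-specific disease prevalence. A minimal sketch of such a sampler; the record layout, target fractions, and the balanced pool are invented stand-ins for the PAT cohort:

```python
import random

def subsample_prevalence(records, targets, per_sex=50, seed=0):
    """Build a cohort with a chosen disease prevalence per sex.
    records: dicts with 'sex' in {'F', 'M'} and 'pad' in {0, 1}.
    targets: e.g. {'F': 0.2, 'M': 0.8} = fraction of PAD-positive per sex."""
    rng = random.Random(seed)
    cohort = []
    for sex, prev in targets.items():
        pos = [r for r in records if r["sex"] == sex and r["pad"] == 1]
        neg = [r for r in records if r["sex"] == sex and r["pad"] == 0]
        n_pos = round(per_sex * prev)
        cohort += rng.sample(pos, n_pos) + rng.sample(neg, per_sex - n_pos)
    return cohort

# Hypothetical balanced pool: 100 subjects per (sex, disease) combination
pool = [{"sex": s, "pad": d} for s in "FM" for d in (0, 1) for _ in range(100)]
skewed = subsample_prevalence(pool, {"F": 0.2, "M": 0.8})
prev_f = sum(r["pad"] for r in skewed if r["sex"] == "F") / 50
prev_m = sum(r["pad"] for r in skewed if r["sex"] == "M") / 50
```

Training on such a skewed cohort and testing on a balanced one is exactly the manipulation that exposed the 0.21 AUROC drop and the underdiagnosis disparity.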

Robust & Precise Knowledge Distillation-based Novel Context-Aware Predictor for Disease Detection in Brain and Gastrointestinal

Saif Ur Rehman Khan, Muhammad Nabeel Asim, Sebastian Vollmer, Andreas Dengel

arxiv preprint · May 9, 2025
Medical disease prediction, particularly through imaging, remains a challenging task due to the complexity and variability of medical data, including noise, ambiguity, and differing image quality. Recent deep learning models, including Knowledge Distillation (KD) methods, have shown promising results in brain tumor image identification but still face limitations in handling uncertainty and generalizing across diverse medical conditions. Traditional KD methods often rely on a context-unaware temperature parameter to soften teacher model predictions, which does not adapt effectively to the varying uncertainty levels present in medical images. To address this issue, we propose a novel framework that integrates Ant Colony Optimization (ACO) for optimal teacher-student model selection and a novel context-aware predictor for temperature scaling. The proposed context-aware framework adjusts the temperature based on factors such as image quality, disease complexity, and teacher model confidence, allowing for more robust knowledge transfer. Additionally, ACO efficiently selects the most appropriate teacher-student model pair from a set of pre-trained models, outperforming current optimization methods by exploring a broader solution space and better handling complex, non-linear relationships within the data. The proposed framework is evaluated using three publicly available benchmark datasets, each corresponding to a distinct medical imaging task. The results demonstrate that the proposed framework significantly outperforms current state-of-the-art methods, achieving top accuracy rates of 98.01% on the MRI brain tumor (Kaggle) dataset, 92.81% on the Figshare MRI dataset, and 96.20% on the GastroNet dataset, surpassing the existing benchmarks of 97.24%, 91.43%, and 95.00%, respectively.
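The core idea of a context-aware temperature, adapting how much the teacher's predictions are softened to the teacher's own uncertainty, can be sketched as follows. The entropy-based mapping and the temperature range are illustrative assumptions, not the paper's actual predictor, which also conditions on image quality and disease complexity:

```python
import math

def softmax(logits, t=1.0):
    m = max(logits)
    exps = [math.exp((z - m) / t) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def entropy(p):
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def context_temperature(teacher_logits, t_min=1.0, t_max=8.0):
    """Map normalised teacher entropy in [0, 1] to a softening temperature:
    confident teacher -> low T (sharp targets), uncertain -> high T (soft)."""
    p = softmax(teacher_logits)
    u = entropy(p) / math.log(len(p))
    return t_min + (t_max - t_min) * u

def kd_loss(student_logits, teacher_logits):
    t = context_temperature(teacher_logits)
    p_t = softmax(teacher_logits, t)
    p_s = softmax(student_logits, t)
    # KL(teacher || student), scaled by T^2 as in standard distillation
    return t * t * sum(pt * math.log(pt / ps)
                       for pt, ps in zip(p_t, p_s) if pt > 0)

confident = [8.0, 0.5, 0.3]    # near-certain teacher -> low temperature
uncertain = [1.1, 1.0, 0.9]    # ambiguous case -> high temperature
t_lo, t_hi = context_temperature(confident), context_temperature(uncertain)
```

Replacing the fixed global temperature of classic KD with this per-sample function is the design change the paper argues matters for noisy, variable-quality medical images.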

Multiparameter MRI-based model integrating radiomics and deep learning for preoperative staging of laryngeal squamous cell carcinoma.

Xie K, Jiang H, Chen X, Ning Y, Yu Q, Lv F, Liu R, Zhou Y, Xu L, Yue Q, Peng J

pubmed · May 9, 2025
The accurate preoperative staging of laryngeal squamous cell carcinoma (LSCC) provides valuable guidance for clinical decision-making. The objective of this study was to establish a multiparametric MRI model using radiomics and deep learning (DL) to preoperatively distinguish between Stages I-II and III-IV of LSCC. Data from 401 histologically confirmed LSCC patients were collected from two centers (training set: 213; internal test set: 91; external test set: 97). Radiomics features were extracted from the MRI images, and seven radiomics models based on single and combined sequences were developed via random forest (RF). A DL model was constructed via ResNet 18, with DL features extracted from its final fully connected layer. These features were fused with the crucial radiomics features to create a combined model. The performance of the models was assessed using the area under the receiver operating characteristic (ROC) curve (AUC) and compared with radiologists' performance. The predictive capability of the combined model for progression-free survival (PFS) was evaluated via Kaplan-Meier survival analysis and Harrell's concordance index (C-index). In the external test set, the combined model had an AUC of 0.877 (95% CI 0.807-0.946), outperforming the DL model (AUC: 0.811) and the optimal radiomics model (AUC: 0.835). The combined model significantly outperformed both the DL model (p = 0.017), the optimal radiomics model (p = 0.039), and the radiologists (both p < 0.05). Moreover, the combined model demonstrated strong prognostic value in patients with LSCC, achieving a C-index of 0.624 for PFS. This combined model enhances preoperative LSCC staging, aiding in making more informed clinical decisions.
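Harrell's C-index, used above to evaluate PFS prediction, is the fraction of comparable patient pairs in which the higher-risk patient progresses first. A minimal sketch on invented follow-up data, not the study's cohort:

```python
def harrell_c_index(times, events, risk):
    """Harrell's C: among comparable pairs, the fraction where the higher-risk
    subject has the shorter observed time (risk ties get half credit)."""
    conc = comp = 0.0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # a pair is comparable if i has an observed event before j's time
            if events[i] and times[i] < times[j]:
                comp += 1
                if risk[i] > risk[j]:
                    conc += 1
                elif risk[i] == risk[j]:
                    conc += 0.5
    return conc / comp

# Hypothetical PFS data: months, event flag (1 = progression), model risk score
t = [5, 8, 12, 20, 30]
e = [1, 1, 0, 1, 0]
score = [0.9, 0.6, 0.7, 0.4, 0.2]
c = harrell_c_index(t, e, score)
```

A C-index of 0.5 is chance-level ordering and 1.0 is perfect; the 0.624 reported above indicates modest but real prognostic signal.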