
Identifying Primary Sites of Spinal Metastases: Expert-Derived Features vs. ResNet50 Model Using Nonenhanced MRI.

Liu K, Ning J, Qin S, Xu J, Hao D, Lang N

PubMed | Jul 1, 2025
The spinal column is a frequent site for metastases, affecting over 30% of solid tumor patients. Identifying the primary tumor is essential for guiding clinical decisions but often requires resource-intensive diagnostics. This retrospective study aimed to develop and validate artificial intelligence (AI) models that use noncontrast MRI to identify primary sites of spinal metastases and thereby improve diagnostic efficiency. A total of 514 patients with pathologically confirmed spinal metastases (mean age, 59.3 ± 11.2 years; 294 males) were included and split into a development set (n = 360) and a test set (n = 154). Noncontrast sagittal MRI sequences (T1-weighted, T2-weighted, and fat-suppressed T2) were acquired on 1.5 T and 3 T scanners. Two models were evaluated for identifying primary sites of spinal metastases: an expert-derived features (EDF) model using radiologist-identified imaging features and a ResNet50-based deep learning (DL) model trained on noncontrast MRI. Performance was assessed using accuracy, precision, recall, F1 score, and the area under the receiver operating characteristic curve (ROC-AUC) for top-1, top-2, and top-3 predictions. Statistical analyses included Shapiro-Wilk tests, t tests, Mann-Whitney U tests, and chi-squared tests. ROC-AUCs were compared via DeLong tests, with 95% confidence intervals from 1000 bootstrap replications and significance at P < 0.05. The EDF model outperformed the DL model in top-3 accuracy (0.88 vs. 0.69) and AUC (0.80 vs. 0.71). Subgroup analysis showed superior EDF performance for common sites such as lung and kidney (e.g., kidney F1: 0.94 vs. 0.76), while the DL model had higher recall for rare sites such as thyroid (0.80 vs. 0.20). SHapley Additive exPlanations (SHAP) analysis identified sex (SHAP: -0.57 to 0.68), age (-0.48 to 0.98), T1WI signal intensity (-0.29 to 0.72), and pathological fractures (-0.76 to 0.25) as key features. AI techniques using noncontrast MRI improve diagnostic efficiency for spinal metastases; the EDF model outperformed the DL model, showing greater clinical potential. Spinal metastases, or cancer spreading to the spine, are common in patients with advanced cancer, often requiring extensive tests to determine the original tumor site. Our study explored whether artificial intelligence could make this process faster and more accurate using noncontrast MRI scans. We tested two methods: one based on radiologists' expertise in identifying imaging features and another using a deep learning model trained to analyze MRI images. The expert-based method was more reliable, correctly identifying the tumor site in 88% of cases when considering the top three likely diagnoses. This approach may help doctors reduce diagnostic time and improve patient care. Level of Evidence: 3. Technical Efficacy: Stage 2.
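For context, the top-1/top-2/top-3 reporting described above can be computed directly from class probabilities. Below is a minimal sketch assuming a generic scikit-learn multiclass classifier on toy stand-in data; the 8-site task, feature counts, and classifier choice are illustrative assumptions, not the authors' code.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score, top_k_accuracy_score
from sklearn.model_selection import train_test_split

# Toy stand-in task: 8 candidate primary sites, 514 hypothetical patients.
X, y = make_classification(n_samples=514, n_features=20, n_informative=10,
                           n_classes=8, random_state=0)
X_dev, X_test, y_dev, y_test = train_test_split(X, y, test_size=154, random_state=0)

clf = RandomForestClassifier(random_state=0).fit(X_dev, y_dev)
proba = clf.predict_proba(X_test)             # (n_patients, n_candidate_sites)

# Top-k accuracy: correct if the true primary site is among the k most probable.
for k in (1, 2, 3):
    acc_k = top_k_accuracy_score(y_test, proba, k=k, labels=clf.classes_)
    print(f"top-{k} accuracy: {acc_k:.2f}")

# Macro-averaged one-vs-rest ROC-AUC across candidate primary sites
auc = roc_auc_score(y_test, proba, multi_class="ovr", average="macro", labels=clf.classes_)
print(f"ROC-AUC (OvR, macro): {auc:.2f}")
```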

A multimodal deep-learning model based on multichannel CT radiomics for predicting pathological grade of bladder cancer.

Zhao T, He J, Zhang L, Li H, Duan Q

PubMed | Jul 1, 2025
To construct a predictive model using deep-learning radiomics and clinical risk factors for assessing the preoperative histopathological grade of bladder cancer based on computed tomography (CT) images. A retrospective analysis was conducted of 201 bladder cancer patients with definite pathological grading results after surgical excision at our institution between January 2019 and June 2023. The cohort was split into a training set of 120 cases and a test set of 81 cases. Hand-crafted radiomics (HCR) features and deep-learning (DL) features were extracted from the CT images. Prediction models were built using 12 machine-learning classifiers that integrate HCR features, DL features, and clinical data. Model performance was evaluated using the area under the curve (AUC), calibration curves, and decision-curve analysis (DCA). Among the classifiers tested, the logistic regression model that combined DL and HCR features demonstrated the best performance, with AUC values of 0.912 (training set) and 0.777 (test set). The clinical model achieved AUC values of 0.850 (training set) and 0.804 (test set). The combined model reached AUC values of 0.933 (training set) and 0.824 (test set), outperforming both the clinical and HCR-only models. The CT-based combined model demonstrated considerable diagnostic capability in differentiating high-grade from low-grade bladder cancer, serving as a valuable noninvasive instrument for preoperative pathological evaluation.
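As a rough illustration of the fusion idea above (not the authors' implementation), hand-crafted radiomics features, deep features, and clinical variables can simply be concatenated before fitting a logistic-regression classifier. The feature dimensions and random stand-in data below are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Toy stand-in features: hand-crafted radiomics (HCR), deep-learning (DL)
# embeddings, and clinical variables for 201 hypothetical patients.
rng = np.random.default_rng(0)
n = 201
hcr = rng.normal(size=(n, 50))
dl = rng.normal(size=(n, 128))
clinical = rng.normal(size=(n, 5))
y = rng.integers(0, 2, n)                     # 1 = high-grade, 0 = low-grade

X = np.hstack([hcr, dl, clinical])            # feature-level fusion
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=81, random_state=0)

model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)
print("test AUC:", roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]))
```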

Does alignment alone predict mechanical complications after adult spinal deformity surgery? A machine learning comparison of alignment, bone quality, and soft tissue.

Sundrani S, Doss DJ, Johnson GW, Jain H, Zakieh O, Wegner AM, Lugo-Pico JG, Abtahi AM, Stephens BF, Zuckerman SL

PubMed | Jul 1, 2025
Mechanical complications are a vexing occurrence after adult spinal deformity (ASD) surgery. While achieving ideal spinal alignment in ASD surgery is critical, alignment alone may not fully explain all mechanical complications. The authors sought to determine which combination of inputs produced the most sensitive and specific machine learning model to predict mechanical complications using postoperative alignment, bone quality, and soft tissue data. A retrospective cohort study was performed in patients undergoing ASD surgery from 2009 to 2021. Inclusion criteria were a fusion ≥ 5 levels, sagittal/coronal deformity, and at least 2 years of follow-up. The primary exposure variables were 1) alignment, evaluated in both the sagittal and coronal planes using the L1-pelvic angle ± 3°, L4-S1 lordosis, sagittal vertical axis, pelvic tilt, and coronal vertical axis; 2) bone quality, evaluated by the T-score from a dual-energy x-ray absorptiometry scan; and 3) soft tissue, evaluated by the paraspinal muscle-to-vertebral body ratio and fatty infiltration. The primary outcome was mechanical complications. Seven machine learning models covering all combinations of the three domains (alignment, bone quality, and soft tissue), each also including demographic data, were trained, and the positive predictive value (PPV) was calculated for each model. Of 231 patients (24% male) undergoing ASD surgery with a mean age of 64 ± 17 years, 147 (64%) developed at least one mechanical complication. The model with alignment alone performed poorly, with a PPV of 0.85. However, the model with alignment, bone quality, and soft tissue achieved a higher PPV of 0.90, with a sensitivity of 0.67 and a specificity of 0.84. Moreover, the model with alignment alone failed to predict 15 of every 100 complications, whereas the model with all three domains failed to predict only 10 of 100. These results support the notion that not every mechanical failure is explained by alignment alone. The authors found that a combination of alignment, bone quality, and soft tissue provided the most accurate prediction of mechanical complications after ASD surgery. While achieving optimal alignment is essential, additional data including bone and soft tissue are necessary to minimize mechanical complications.
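The seven-model design above (every non-empty combination of the three domains, plus demographics) can be expressed compactly. The sketch below uses hypothetical column names, toy stand-in data, and a random-forest classifier standing in for whichever learner the study used.

```python
from itertools import combinations

import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import precision_score, recall_score

# Hypothetical feature names for each domain; demographics enter every model.
domains = {
    "alignment": ["l1pa_offset", "l4s1_lordosis", "sva", "pelvic_tilt", "cva"],
    "bone": ["dexa_t_score"],
    "soft_tissue": ["psoas_vb_ratio", "fatty_infiltration"],
}
demographics = ["age", "sex"]
all_cols = demographics + [c for cols in domains.values() for c in cols]

# Toy stand-in cohort; replace with the real training/testing splits.
rng = np.random.default_rng(0)
def toy_cohort(n):
    X = pd.DataFrame(rng.normal(size=(n, len(all_cols))), columns=all_cols)
    X["mech_complication"] = (X["dexa_t_score"] + X["fatty_infiltration"]
                              + rng.normal(size=n) > 0).astype(int)
    return X
train, test = toy_cohort(160), toy_cohort(71)

# One model per non-empty combination of domains (7 models in total).
for r in range(1, len(domains) + 1):
    for combo in combinations(domains, r):
        cols = demographics + [c for d in combo for c in domains[d]]
        clf = RandomForestClassifier(random_state=0).fit(train[cols], train["mech_complication"])
        pred = clf.predict(test[cols])
        ppv = precision_score(test["mech_complication"], pred, zero_division=0)
        sens = recall_score(test["mech_complication"], pred, zero_division=0)
        print(f"{'+'.join(combo)}: PPV={ppv:.2f}, sensitivity={sens:.2f}")
```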

Association between antithrombotic medications and intracranial hemorrhage among older patients with mild traumatic brain injury: a multicenter cohort study.

Benhamed A, Crombé A, Seux M, Frassin L, L'Huillier R, Mercier E, Émond M, Millon D, Desmeules F, Tazarourte K, Gorincour G

PubMed | Jul 1, 2025
To measure the association between antithrombotic (AT) medications (anticoagulants and antiplatelets) and the risk of traumatic intracranial hemorrhage (ICH) in older adults with a mild traumatic brain injury (mTBI). We conducted a retrospective multicenter study across 103 emergency departments affiliated with a teleradiology company dedicated to emergency imaging between 2020 and 2022. Older adults (≥65 years old) with mTBI who underwent a head computed tomography scan were included. Natural language processing models were used to label the free text of emergency physician forms and radiology reports, and a multivariable logistic regression model was used to measure the association between AT medications and the occurrence of ICH. A total of 5948 patients [median age 84.6 (74.3-89.1) years, 58.1% female] were included, of whom 781 (13.1%) had an ICH. Among them, 3177 (53.4%) patients were treated with at least one AT agent. No AT medication was associated with a higher risk of ICH: antiplatelet agents, odds ratio 0.98, 95% confidence interval 0.81-1.18; direct oral anticoagulants, 0.82 (0.60-1.09); and vitamin K antagonists, 0.66 (0.37-1.10). Conversely, a high-level fall [1.68 (1.15-2.4)], a Glasgow Coma Scale score of 14 [1.83 (1.22-2.68)], a cutaneous head impact [1.5 (1.17-1.92)], vomiting [1.59 (1.18-2.14)], amnesia [1.35 (1.02-1.79)], a suspected skull vault fracture [9.3 (14.2-26.5)], and a suspected facial bone fracture [1.34 (1.02-1.75)] were associated with a higher risk of ICH. This study found no association between AT medications and an increased risk of ICH among older patients with mTBI, suggesting that routine neuroimaging in this population may offer limited benefit and that additional variables should be considered in the imaging decision.
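A multivariable logistic regression reporting odds ratios with 95% confidence intervals, as used above, might look like the following minimal sketch. The DataFrame, its binary indicator columns, and the simulated outcome are all hypothetical stand-ins, not the study's data or code.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Toy stand-in cohort with hypothetical binary indicator columns; replace with
# the real NLP-labelled variables.
rng = np.random.default_rng(1)
n = 2000
df = pd.DataFrame({c: rng.integers(0, 2, n) for c in
                   ["antiplatelet", "doac", "vka", "high_level_fall",
                    "gcs_14", "head_impact", "vomiting", "amnesia"]})
logit_p = -2.0 + 0.5 * df["high_level_fall"] + 0.6 * df["gcs_14"] + 0.4 * df["vomiting"]
df["ich"] = (rng.random(n) < 1 / (1 + np.exp(-logit_p))).astype(int)

model = smf.logit("ich ~ antiplatelet + doac + vka + high_level_fall + gcs_14 "
                  "+ head_impact + vomiting + amnesia", data=df).fit(disp=0)

odds_ratios = np.exp(model.params)          # OR per covariate
conf_int = np.exp(model.conf_int())         # 95% CI bounds per covariate
print(pd.concat([odds_ratios.rename("OR"), conf_int], axis=1))
```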

Orbital CT deep learning models in thyroid eye disease rival medical specialists' performance in optic neuropathy prediction in a quaternary referral center and revealed impact of the bony walls.

Kheok SW, Hu G, Lee MH, Wong CP, Zheng K, Htoon HM, Lei Z, Tan ASM, Chan LL, Ooi BC, Seah LL

PubMed | Jul 1, 2025
To develop and evaluate orbital CT deep learning (DL) models for optic neuropathy (ON) prediction in patients diagnosed with thyroid eye disease (TED), using partial versus entire and 2D versus 3D images as input. Patients with TED, with or without ON, diagnosed at a quaternary-level practice and who underwent orbital CT between 2002 and 2017 were included. DL models were developed using annotated CT data and evaluated on a hold-out test set. ON classification performance was compared between the models and medical specialists, and saliency maps were applied to randomized cases. 36 of 252 orbits in 126 TED patients (mean age, 51 years; 81 women) had clinically confirmed ON. With 2D image input for ON prediction, the models achieved (a) a sensitivity of 89% and an AUC of 0.86 on the entire coronal orbital apex including the bony walls, and (b) a specificity of 92% and an AUC of 0.79 on partial annotations of the axial lateral orbital wall only. ON classification performance was similar (p = 0.58) between the DL model and medical specialists. DL models trained on 2D CT annotations rival medical specialists in ON classification, with the potential to objectively enhance clinical triage for sight-saving intervention and to incorporate model variants into the workflow to harness their differential performance metrics.
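Saliency maps like those mentioned above are often gradients of the model output with respect to the input image. Here is a minimal input-gradient sketch in PyTorch, a generic stand-in rather than the authors' exact method, assuming a trained binary ON classifier `model` that emits a single logit per image.

```python
import torch

def saliency_map(model, image):
    """Return |d(logit)/d(pixel)| for one CT slice tensor of shape (1, C, H, W)."""
    model.eval()
    image = image.clone().requires_grad_(True)   # track gradients w.r.t. pixels
    logit = model(image)[0, 0]                   # single-logit binary output assumed
    logit.backward()
    # Collapse channels to a single (H, W) heat map of absolute pixel influence.
    return image.grad.abs().squeeze(0).max(dim=0).values
```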

Development and validation of a fusion model based on multi-phase contrast CT radiomics combined with clinical features for predicting Ki-67 expression in gastric cancer.

Song T, Xue B, Liu M, Chen L, Cao A, Du P

PubMed | Jul 1, 2025
The present study aimed to develop and validate a fusion model based on multi-phase contrast-enhanced computed tomography (CECT) radiomics features combined with clinical features to preoperatively predict the expression level of Ki-67 in patients with gastric cancer (GC). A total of 164 patients with GC who underwent surgical treatment at our hospital between September 2015 and September 2023 were retrospectively included and randomly divided into a training set (n=114) and a testing set (n=50). Using PyRadiomics, radiomics features were extracted from multi-phase CECT images and combined with significant clinical features through various machine learning algorithms [support vector machine (SVM), random forest (RandomForest), K-nearest neighbors (KNN), LightGBM and XGBoost] to build a fusion model. Receiver operating characteristic (ROC) curves, the area under the curve (AUC), calibration curves and decision curve analysis (DCA) were used to evaluate, validate and compare the predictive performance and clinical utility of the models. Among the three single-phase models, for the arterial phase, the SVM radiomics model had the highest AUC in the training set (0.697) and the RandomForest radiomics model had the highest AUC in the testing set (0.658); for the venous phase, the SVM radiomics model had the highest AUC in the training set (0.783) and the LightGBM radiomics model had the highest AUC in the testing set (0.747); and for the delayed phase, the KNN radiomics model had the highest AUC in the training set (0.772) and the SVM radiomics model had the highest AUC in the testing set (0.719). The clinical feature model had the lowest AUC values in both the training and testing sets (0.614 and 0.520, respectively). Notably, the multi-phase model and the fusion model, constructed by combining the clinical features with the multi-phase features, demonstrated excellent discriminative performance, with the fusion model achieving AUC values of 0.933 and 0.817 in the training and testing sets, outperforming the other models (DeLong test, both P<0.05). The calibration curve showed that the fusion model had good fit (Hosmer-Lemeshow test, P>0.5 in the training and validation sets). The DCA showed that the net benefit of the fusion model in identifying high expression of Ki-67 was improved compared with that of the other models. Furthermore, the fusion model achieved an AUC value of 0.805 in external validation data from The Cancer Imaging Archive. In conclusion, the fusion model established in the present study showed excellent performance and is expected to serve as a non-invasive tool for predicting Ki-67 status and guiding clinical treatment.
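For orientation, multi-phase feature extraction with PyRadiomics usually amounts to one `execute` call per phase, with phase-prefixed feature names ready for fusion with clinical variables. The sketch below uses default extractor settings and hypothetical image/mask paths, not the study's actual configuration.

```python
from radiomics import featureextractor

extractor = featureextractor.RadiomicsFeatureExtractor()   # default settings

def extract_phase(image_path, mask_path, prefix):
    """Extract PyRadiomics features for one CECT phase, prefixing feature names."""
    raw = extractor.execute(image_path, mask_path)
    # Drop the "diagnostics_*" bookkeeping entries, keep the actual features.
    return {f"{prefix}_{k}": v for k, v in raw.items() if not k.startswith("diagnostics")}

def extract_case(phase_paths):
    """phase_paths: dict mapping phase name -> (image_path, mask_path), e.g. for
    'arterial', 'venous', 'delayed'. Returns one fused feature dict per patient."""
    features = {}
    for phase, (img, msk) in phase_paths.items():
        features.update(extract_phase(img, msk, phase))
    return features

# The fused dict can then be joined with clinical variables and passed to the
# SVM / RandomForest / KNN / LightGBM / XGBoost classifiers described above.
```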

Identifying threshold of CT-defined muscle loss after radiotherapy for survival in oral cavity cancer using machine learning.

Lee J, Lin JB, Lin WC, Jan YT, Leu YS, Chen YJ, Wu KP

PubMed | Jul 1, 2025
Muscle loss after radiotherapy is associated with poorer survival in patients with oral cavity squamous cell carcinoma (OCSCC). However, the threshold of muscle loss remains unclear. This study aimed to use explainable artificial intelligence to identify the threshold of muscle loss associated with survival in OCSCC. We enrolled 1087 patients with OCSCC treated with surgery and adjuvant radiotherapy at two tertiary centers (660 in the derivation cohort and 427 in the external validation cohort). Skeletal muscle index (SMI) was measured using pre- and post-radiotherapy computed tomography (CT) at the C3 vertebral level. Random forest (RF), eXtreme Gradient Boosting (XGBoost), and Categorical Boosting (CatBoost) models were developed to predict all-cause mortality, and their performance was evaluated using the area under the curve (AUC). The muscle loss threshold was identified using the SHapley Additive exPlanations (SHAP) method and validated using Cox regression analysis. In the external validation cohort, the RF, XGBoost, and CatBoost models achieved favorable performance in predicting all-cause mortality (AUC: 0.898, 0.859, and 0.842). The SHAP method demonstrated that SMI change after radiotherapy was the most important feature for predicting all-cause mortality and consistently identified SMI loss ≥ 4.2% as the threshold in all three models. In multivariable analysis, SMI loss ≥ 4.2% was independently associated with increased all-cause mortality risk in both cohorts (derivation cohort: hazard ratio 6.66, p < 0.001; external validation cohort: hazard ratio 8.46, p < 0.001). This study can assist clinicians in identifying patients with considerable muscle loss after treatment and guide interventions to improve muscle mass. Question: Muscle loss after radiotherapy is associated with poorer survival in patients with oral cavity cancer; however, the threshold of muscle loss remains unclear. Findings: Explainable artificial intelligence identified muscle loss ≥ 4.2% as the threshold of increased all-cause mortality risk in both the derivation and external validation cohorts. Clinical relevance: Muscle loss ≥ 4.2% may be the optimal threshold for survival in patients who receive adjuvant radiotherapy for oral cavity cancer. This threshold can guide clinicians in improving muscle mass after radiotherapy.
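A SHAP-based threshold search of the kind described above can be sketched as follows with an XGBoost model: sort patients by the SMI-change feature and find where its SHAP contribution to predicted mortality risk turns positive. The column names (e.g. `smi_change_pct`), the toy data, and this single-feature sign-flip rule are simplifying assumptions, not the authors' exact procedure.

```python
import numpy as np
import pandas as pd
import shap
from xgboost import XGBClassifier

# Toy stand-in data: SMI loss (%) plus two other covariates, with mortality
# made (noisily) more likely above ~4% muscle loss.
rng = np.random.default_rng(0)
n = 500
X_train = pd.DataFrame({
    "smi_change_pct": rng.normal(3, 4, n),
    "age": rng.normal(60, 10, n),
    "tumor_stage": rng.integers(1, 5, n),
})
y_train = ((X_train["smi_change_pct"] > 4) & (rng.random(n) > 0.3)).astype(int)

model = XGBClassifier(n_estimators=200, max_depth=3).fit(X_train, y_train)

explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_train)                # (n_samples, n_features)

# Sort patients by SMI change and locate where its SHAP contribution becomes
# positive, i.e. where muscle loss starts pushing the prediction toward mortality.
col = list(X_train.columns).index("smi_change_pct")
order = np.argsort(X_train["smi_change_pct"].values)
x_sorted = X_train["smi_change_pct"].values[order]
s_sorted = shap_values[order, col]
threshold = x_sorted[np.argmax(s_sorted > 0)]
print(f"estimated SMI-loss threshold: {threshold:.1f}%")
```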

Improving Tuberculosis Detection in Chest X-Ray Images Through Transfer Learning and Deep Learning: Comparative Study of Convolutional Neural Network Architectures.

Mirugwe A, Tamale L, Nyirenda J

PubMed | Jul 1, 2025
Tuberculosis (TB) remains a significant global health challenge, as current diagnostic methods are often resource-intensive, time-consuming, and inaccessible in many high-burden communities, necessitating more efficient and accurate diagnostic methods to improve early detection and treatment outcomes. This study aimed to evaluate the performance of 6 convolutional neural network architectures, namely Visual Geometry Group-16 (VGG16), VGG19, Residual Network-50 (ResNet50), ResNet101, ResNet152, and Inception-ResNet-V2, in classifying chest x-ray (CXR) images as either normal or TB-positive. The impact of data augmentation on model performance, training times, and parameter counts was also assessed. The dataset of 4200 CXR images, comprising 700 labeled as TB-positive and 3500 as normal cases, was used to train and test the models. Evaluation metrics included accuracy, precision, recall, F1-score, and area under the receiver operating characteristic curve. The computational efficiency of each model was analyzed by comparing training times and parameter counts. VGG16 outperformed the other architectures, achieving an accuracy of 99.4%, precision of 97.9%, recall of 98.6%, F1-score of 98.3%, and area under the receiver operating characteristic curve of 98.25%. This superior performance is significant because it demonstrates that a simpler model can deliver exceptional diagnostic accuracy while requiring fewer computational resources. Surprisingly, data augmentation did not improve performance, suggesting that the original dataset's diversity was sufficient. Models with large numbers of parameters, such as ResNet152 and Inception-ResNet-V2, required longer training times without yielding proportionally better performance. Simpler models like VGG16 offer a favorable balance between diagnostic accuracy and computational efficiency for TB detection in CXR images. These findings highlight the need to tailor model selection to task-specific requirements, providing valuable insights for future research and clinical implementations in medical image classification.
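A typical VGG16 transfer-learning setup for this binary TB-vs-normal task looks like the sketch below (Keras, ImageNet initialization, frozen convolutional base). The classification head and hyperparameters are illustrative assumptions rather than the paper's exact configuration.

```python
import tensorflow as tf

# ImageNet-pretrained VGG16 backbone with the original classifier removed.
base = tf.keras.applications.VGG16(weights="imagenet", include_top=False,
                                   input_shape=(224, 224, 3))
base.trainable = False                      # freeze the pretrained features

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(1, activation="sigmoid"),   # P(TB-positive)
])
model.compile(optimizer=tf.keras.optimizers.Adam(1e-4),
              loss="binary_crossentropy",
              metrics=["accuracy", tf.keras.metrics.AUC(name="auc")])

# model.fit(train_ds, validation_data=val_ds, epochs=20)  # hypothetical tf.data datasets
```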

Lessons learned from RadiologyNET foundation models for transfer learning in medical radiology.

Napravnik M, Hržić F, Urschler M, Miletić D, Štajduhar I

PubMed | Jul 1, 2025
Deep learning models require large amounts of annotated data, which are hard to obtain in the medical field, as the annotation process is laborious and depends on expert knowledge. This data scarcity hinders a model's ability to generalise effectively on unseen data, and recently, foundation models pretrained on large datasets have been proposed as a promising solution. RadiologyNET is a custom medical dataset that comprises 1,902,414 medical images covering various body parts and modalities of image acquisition. We used the RadiologyNET dataset to pretrain several popular architectures (ResNet18, ResNet34, ResNet50, VGG16, EfficientNetB3, EfficientNetB4, InceptionV3, DenseNet121, MobileNetV3Small and MobileNetV3Large). We compared the performance of ImageNet and RadiologyNET foundation models against training from randomly initialised weights on several publicly available medical datasets: (i) segmentation (LUng Nodule Analysis challenge), (ii) regression (RSNA Pediatric Bone Age Challenge), (iii) binary classification (GRAZPEDWRI-DX and COVID-19 datasets), and (iv) multiclass classification (Brain Tumor MRI dataset). Our results indicate that RadiologyNET-pretrained models generally perform similarly to ImageNet models, with some advantages in resource-limited settings. However, ImageNet-pretrained models showed competitive performance when fine-tuned on sufficient data. The impact of modality diversity on model performance was tested, with results varying across tasks, highlighting the importance of aligning pretraining data with downstream applications. Based on our findings, we provide guidelines for using foundation models in medical applications and publicly release our RadiologyNET-pretrained models to support further research and development in the field. The models are available at https://github.com/AIlab-RITEH/RadiologyNET-TL-models .
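Reusing such foundation weights typically means loading a published checkpoint into a torchvision backbone and replacing the task head before fine-tuning. The sketch below is a generic illustration under assumptions: the checkpoint filename, head, and class count are hypothetical, and the released RadiologyNET models may be packaged differently.

```python
import torch
import torch.nn as nn
from torchvision import models

def build_backbone(init="imagenet", num_classes=4):
    """Return a ResNet18 initialised from ImageNet or a domain-pretrained checkpoint."""
    if init == "imagenet":
        net = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
    else:
        net = models.resnet18(weights=None)
        # Hypothetical checkpoint path for domain-pretrained (e.g. RadiologyNET-style) weights.
        state = torch.load("radiologynet_resnet18.pth", map_location="cpu")
        net.load_state_dict(state, strict=False)        # load backbone weights only
    net.fc = nn.Linear(net.fc.in_features, num_classes)  # fresh head for the downstream task
    return net
```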

Accuracy of machine learning models for pre-diagnosis and diagnosis of pancreatic ductal adenocarcinoma in contrast-CT images: a systematic review and meta-analysis.

Lopes Costa GL, Tasca Petroski G, Machado LG, Eulalio Santos B, de Oliveira Ramos F, Feuerschuette Neto LM, De Luca Canto G

PubMed | Jul 1, 2025
To evaluate the diagnostic ability and methodological quality of machine learning (ML) models in detecting pancreatic ductal adenocarcinoma (PDAC) in contrast-enhanced CT images. Included studies assessed adults diagnosed with PDAC, confirmed by histopathology, with the index test interpreted by ML algorithms, and provided data on sensitivity and specificity. Segmentation-focused studies, studies of multiple classifiers, non-diagnostic studies, and studies that did not meet the inclusion criteria were excluded. PubMed, the Cochrane Central Register of Controlled Trials, and Embase were searched without restrictions. Risk of bias was assessed using QUADAS-2, and methodological quality was evaluated using the Radiomics Quality Score (RQS) and a Checklist for AI in Medical Imaging (CLAIM). Bivariate random-effects models were used for the meta-analysis of sensitivity and specificity, with I² values and subgroup analysis used to assess heterogeneity. Nine studies were included and 12,788 participants were evaluated, of whom 3,997 were included in the meta-analysis. AI models based on CT scans showed an accuracy of 88.7% (95% CI, 87.7%-89.7%), sensitivity of 87.9% (95% CI, 82.9%-91.6%), and specificity of 92.2% (95% CI, 86.8%-95.5%). The average score of the six radiomics studies was 17.83 RQS points, and the nine ML methods had an average CLAIM score of 30.55 points. Our study is the first to quantitatively synthesize this independent research, offering insights for clinical application. Despite favorable sensitivity and specificity results, the included studies were of low quality, limiting definitive conclusions. Further research is necessary to validate these models before widespread adoption.
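To illustrate the pooling step, the sketch below applies a DerSimonian-Laird random-effects model to logit-transformed per-study sensitivities. This is a simplified univariate stand-in for the bivariate model used in the review, and the 2x2 counts are hypothetical.

```python
import numpy as np

# Hypothetical per-study counts (true positives and false negatives).
tp = np.array([90, 45, 120, 60])
fn = np.array([12, 8, 15, 9])

sens = (tp + 0.5) / (tp + fn + 1.0)                 # continuity-corrected sensitivity
y = np.log(sens / (1 - sens))                       # logit sensitivity per study
var = 1 / (tp + 0.5) + 1 / (fn + 0.5)               # approximate within-study variance

# DerSimonian-Laird estimate of between-study variance (tau^2).
w = 1 / var
q = np.sum(w * (y - np.sum(w * y) / np.sum(w)) ** 2)
tau2 = max(0.0, (q - (len(y) - 1)) / (np.sum(w) - np.sum(w**2) / np.sum(w)))

# Random-effects pooled estimate with 95% CI, back-transformed to a proportion.
w_star = 1 / (var + tau2)
pooled = np.sum(w_star * y) / np.sum(w_star)
se = np.sqrt(1 / np.sum(w_star))
lo, hi = pooled - 1.96 * se, pooled + 1.96 * se
inv_logit = lambda x: 1 / (1 + np.exp(-x))
print(f"pooled sensitivity: {inv_logit(pooled):.3f} "
      f"(95% CI {inv_logit(lo):.3f}-{inv_logit(hi):.3f})")
```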