Latest Papers on Radiology AI. Tags: Classification, Order: Best Match, Limit: 10.

Machine learning for grading prediction and survival analysis in high grade glioma.

Li X, Huang X, Shen Y, Yu S, Zheng L, Cai Y, Yang Y, Zhang R, Zhu L, Wang E

•papers•May 15 2025

We developed and validated a magnetic resonance imaging (MRI)-based radiomics model for the classification of high-grade glioma (HGG) and determined the optimal machine learning (ML) approach. This retrospective analysis included 184 patients (59 grade III lesions and 125 grade IV lesions). Radiomics features were extracted from MRI with T1-weighted imaging (T1WI). The least absolute shrinkage and selection operator (LASSO) feature selection method and seven classification methods including logistic regression, XGBoost, Decision Tree, Random Forest (RF), Adaboost, Gradient Boosting Decision Tree, and Stacking fusion model were used to differentiate HGG. Performance was compared on AUC, sensitivity, accuracy, precision and specificity. In the non-fusion models, the best performance was achieved by using the XGBoost classifier, and using SMOTE to deal with the data imbalance to improve the performance of all the classifiers. The Stacking fusion model performed the best, with an AUC = 0.95 (sensitivity of 0.84; accuracy of 0.85; F1 score of 0.85). MRI-based quantitative radiomics features have good performance in identifying the classification of HGG. The XGBoost method outperforms the classifiers in the non-fusion model and the Stacking fusion model outperforms the non-fusion model.

MRI Classification Neurological Retrospective Clinical In Silico Academic Lab

Enhancing medical explainability in deep learning for age-related macular degeneration diagnosis.

Shi L

•papers•May 15 2025

Deep learning models hold significant promise for disease diagnosis but often lack transparency in their decision-making processes, limiting trust and hindering clinical adoption. This study introduces a novel multi-task learning framework to enhance the medical explainability of deep learning models for diagnosing age-related macular degeneration (AMD) using fundus images. The framework simultaneously performs AMD classification and lesion segmentation, allowing the model to support its diagnoses with AMD-associated lesions identified through segmentation. In addition, we perform an in-depth interpretability analysis of the model, proposing the Medical Explainability Index (MXI), a novel metric that quantifies the medical relevance of the generated heatmaps by comparing them with the model's lesion segmentation output. This metric provides a measurable basis to evaluate whether the model's decisions are grounded in clinically meaningful information. The proposed method was trained and evaluated on the Automatic Detection Challenge on Age-Related Macular Degeneration (ADAM) dataset. Experimental results demonstrate robust performance, achieving an area under the curve (AUC) of 0.96 for classification and a Dice similarity coefficient (DSC) of 0.59 for segmentation, outperforming single-task models. By offering interpretable and clinically relevant insights, our approach aims to foster greater trust in AI-driven disease diagnosis and facilitate its adoption in clinical practice.

OCT Classification Other Methodology In Silico Academic Lab Benchmark SOTA

Accuracy and Reliability of Multimodal Imaging in Diagnosing Knee Sports Injuries.

Zhu D, Zhang Z, Li W

•papers•May 15 2025

Due to differences in subjective experience and professional level among doctors, as well as inconsistent diagnostic criteria, there are issues with the accuracy and reliability of single imaging diagnosis results for knee joint injuries. To address these issues, magnetic resonance imaging (MRI), computed tomography (CT) and ultrasound (US) are adopted in this article for ensemble learning, and deep learning (DL) is combined for automatic analysis. By steps such as image enhancement, noise elimination, and tissue segmentation, the quality of image data is improved, and then convolutional neural networks (CNN) are used to automatically identify and classify injury types. The experimental results show that the DL model exhibits high sensitivity and specificity in the diagnosis of different types of injuries, such as anterior cruciate ligament tear, meniscus injury, cartilage injury, and fracture. The diagnostic accuracy of anterior cruciate ligament tear exceeds 90%, and the highest diagnostic accuracy of cartilage injury reaches 95.80%. In addition, compared with traditional manual image interpretation, the DL model has significant advantages in time efficiency, with a significant reduction in average interpretation time per case. The diagnostic consistency experiment shows that the DL model has high consistency with doctors' diagnosis results, with an overall error rate of less than 2%. The model has high accuracy and strong generalization ability when dealing with different types of joint injuries. These data indicate that combining multiple imaging technologies and the DL algorithm can effectively improve the accuracy and efficiency of diagnosing sports injuries of knee joints.

Mixed Modality Classification Musculoskeletal Retrospective Clinical In Silico Academic Lab

Deep learning MRI-based radiomic models for predicting recurrence in locally advanced nasopharyngeal carcinoma after neoadjuvant chemoradiotherapy: a multi-center study.

Hu C, Xu C, Chen J, Huang Y, Meng Q, Lin Z, Huang X, Chen L

•papers•May 15 2025

Local recurrence and distant metastasis were a common manifestation of locoregionally advanced nasopharyngeal carcinoma (LA-NPC) after neoadjuvant chemoradiotherapy (NACT). To validate the clinical value of MRI radiomic models based on deep learning for predicting the recurrence of LA-NPC patients. A total of 328 NPC patients from four hospitals were retrospectively included and divided into the training(n = 229) and validation (n = 99) cohorts randomly. Extracting 975 traditional radiomic features and 1000 deep radiomic features from contrast enhanced T1-weighted (T1WI + C) and T2-weighted (T2WI) sequences, respectively. Least absolute shrinkage and selection operator (LASSO) was applied for feature selection. Five machine learning classifiers were conducted to develop three models for LA-NPC prediction in training cohort, namely Model I: traditional radiomic features, Model II: combined the deep radiomic features with Model I, and Model III: combined Model II with clinical features. The predictive performance of these models were evaluated by receive operating characteristic (ROC) curve analysis, area under the curve (AUC), accuracy, sensitivity and specificity in both cohorts. The clinical characteristics in two cohorts showed no significant differences. Choosing 15 radiomic features and 6 deep radiomic features from T1WI + C. Choosing 9 radiomic features and 6 deep radiomic features from T2WI. In T2WI, the Model II based on Random forest (RF) (AUC = 0.87) performed best compared with other models in validation cohort. Traditional radiomic model combined with deep radiomic features shows excellent predictive performance. It could be used assist clinical doctors to predict curative effect for LA-NPC patients after NACT.

MRI Classification Other Retrospective Clinical In Silico Academic Lab

A computed tomography-based radiomics prediction model for BRAF mutation status in colorectal cancer.

Zhou B, Tan H, Wang Y, Huang B, Wang Z, Zhang S, Zhu X, Wang Z, Zhou J, Cao Y

•papers•May 15 2025

The aim of this study was to develop and validate CT venous phase image-based radiomics to predict BRAF gene mutation status in preoperative colorectal cancer patients. In this study, 301 patients with pathologically confirmed colorectal cancer were retrospectively enrolled, comprising 225 from Centre I (73 mutant and 152 wild-type) and 76 from Centre II (36 mutant and 40 wild-type). The Centre I cohort was randomly divided into a training set (n = 158) and an internal validation set (n = 67) in a 7:3 ratio, while Centre II served as an independent external validation set (n = 76). The whole tumor region of interest was segmented, and radiomics characteristics were extracted. To explore whether tumor expansion could improve the performance of the study objectives, the tumor contour was extended by 3 mm in this study. Finally, a t-test, Pearson correlation, and LASSO regression were used to screen out features strongly associated with BRAF mutations. Based on these features, six classifiers-Support Vector Machine (SVM), Decision Tree (DT), Random Forest (RF), Logistic Regression (LR), K-Nearest Neighbors (KNN), and Extreme Gradient Boosting (XGBoost)-were constructed. The model performance and clinical utility were evaluated using receiver operating characteristic (ROC) curves, decision curve analysis, accuracy, sensitivity, and specificity. Gender was an independent predictor of BRAF mutations. The unexpanded RF model, constructed using 11 imaging histologic features, demonstrated the best predictive performance. For the training cohort, it achieved an AUC of 0.814 (95% CI 0.732-0.895), an accuracy of 0.810, and a sensitivity of 0.620. For the internal validation cohort, it achieved an AUC of 0.798 (95% CI 0.690-0.907), an accuracy of 0.761, and a sensitivity of 0.609. For the external validation cohort, it achieved an AUC of 0.737 (95% CI 0.616-0.847), an accuracy of 0.658, and a sensitivity of 0.667. A machine learning model based on CT radiomics can effectively predict BRAF mutations in patients with colorectal cancer. The unexpanded RF model demonstrated optimal predictive performance.

CT Classification Abdominal Retrospective Clinical In Silico Academic Lab

MRI-derived deep learning models for predicting 1p/19q codeletion status in glioma patients: a systematic review and meta-analysis of diagnostic test accuracy studies.

Ahmadzadeh AM, Broomand Lomer N, Ashoobi MA, Elyassirad D, Gheiji B, Vatanparast M, Rostami A, Abouei Mehrizi MA, Tabari A, Bathla G, Faghani S

•papers•May 15 2025

We conducted a systematic review and meta-analysis to evaluate the performance of magnetic resonance imaging (MRI)-derived deep learning (DL) models in predicting 1p/19q codeletion status in glioma patients. The literature search was performed in four databases: PubMed, Web of Science, Embase, and Scopus. We included the studies that evaluated the performance of end-to-end DL models in predicting the status of glioma 1p/19q codeletion. The quality of the included studies was assessed by the Quality assessment of diagnostic accuracy studies-2 (QUADAS-2) METhodological RadiomICs Score (METRICS). We calculated diagnostic pooled estimates and heterogeneity was evaluated using I<sup>2</sup>. Subgroup analysis and sensitivity analysis were conducted to explore sources of heterogeneity. Publication bias was evaluated by Deeks' funnel plots. Twenty studies were included in the systematic review. Only two studies had a low quality. A meta-analysis of the ten studies demonstrated a pooled sensitivity of 0.77 (95% CI: 0.63-0.87), a specificity of 0.85 (95% CI: 0.74-0.92), a positive diagnostic likelihood ratio (DLR) of 5.34 (95% CI: 2.88-9.89), a negative DLR of 0.26 (95% CI: 0.16-0.45), a diagnostic odds ratio of 20.24 (95% CI: 8.19-50.02), and an area under the curve of 0.89 (95% CI: 0.86-0.91). The subgroup analysis identified a significant difference between groups depending on the segmentation method used. DL models can predict glioma 1p/19q codeletion status with high accuracy and may enhance non-invasive tumor characterization and aid in the selection of optimal therapeutic strategies.

MRI Classification Neurological Meta Analysis In Silico Academic Lab Benchmark SOTA

Recent advancements in personalized management of prostate cancer biochemical recurrence after radical prostatectomy.

Falkenbach F, Ekrutt J, Maurer T

•papers•May 15 2025

Biochemical recurrence (BCR) after radical prostatectomy exhibits heterogeneous prognostic implications. Recent advancements in imaging and biomarkers have high potential for personalizing care. Prostate-specific membrane antigen imaging (PSMA)-PET/CT has revolutionized the BCR management in prostate cancer by detecting microscopic lesions earlier than conventional staging, leading to improved cancer control outcomes and changes in treatment plans in approximately two-thirds of cases. Salvage radiotherapy, often combined with androgen deprivation therapy, remains the standard treatment for high-risk BCR postprostatectomy, with PSMA-PET/CT guiding treatment adjustments, such as the radiation field, and improving progression-free survival. Advancements in biomarkers, genomic classifiers, and artificial intelligence-based models have enhanced risk stratification and personalized treatment planning, resulting in both treatment intensification and de-escalation. While conventional risk grouping relying on Gleason score and PSA level and kinetics remain the foundation for BCR management, PSMA-PET/CT, novel biomarkers, and artificial intelligence may enable more personalized treatment strategies.

PET Classification Abdominal Review In Silico Academic Lab

Comparison of lumbar disc degeneration grading between deep learning model SpineNet and radiologist: a longitudinal study with a 14-year follow-up.

Murto N, Lund T, Kautiainen H, Luoma K, Kerttula L

•papers•May 15 2025

To assess the agreement between lumbar disc degeneration (DD) grading by the convolutional neural network model SpineNet and radiologist's visual grading. In a 14-year follow-up MRI study involving 19 male volunteers, lumbar DD was assessed by SpineNet and two radiologists using the Pfirrmann classification at baseline (age 37) and after 14 years (age 51). Pfirrmann summary scores (PSS) were calculated by summing individual disc grades. The agreement between the first radiologist and SpineNet was analyzed, with the second radiologist's grading used for inter-observer agreement. Significant differences were observed in the Pfirrmann grades and PSS assigned by the radiologist and SpineNet at both time points. SpineNet assigned Pfirrmann grade 1 to several discs and grade 5 to more discs compared to the radiologists. The concordance correlation coefficients (CCC) of PSS between the radiologist and SpineNet were 0.54 (95% CI: 0.28 to 0.79) at baseline and 0.54 (0.27 to 0.80) at follow-up. The average kappa (κ) values of 0.74 (0.68 to 0.81) at baseline and 0.68 (0.58 to 0.77) at follow-up. CCC of PSS between the radiologists was 0.83 (0.69 to 0.97) at baseline and 0.78 (0.61 to 0.95) at follow-up, with κ values ranging from 0.73 to 0.96. We found fair to substantial agreement in DD grading between SpineNet and the radiologist, albeit with notable discrepancies. These findings indicate that AI-based systems like SpineNet hold promise as complementary tools in radiological evaluation, including in longitudinal studies, but emphasize the need for ongoing refinement of AI algorithms.

MRI Classification Musculoskeletal Retrospective Clinical In Silico Academic Lab

Predicting Immunotherapy Response in Unresectable Hepatocellular Carcinoma: A Comparative Study of Large Language Models and Human Experts.

Xu J, Wang J, Li J, Zhu Z, Fu X, Cai W, Song R, Wang T, Li H

•papers•May 15 2025

Hepatocellular carcinoma (HCC) is an aggressive cancer with limited biomarkers for predicting immunotherapy response. Recent advancements in large language models (LLMs) like GPT-4, GPT-4o, and Gemini offer the potential for enhancing clinical decision-making through multimodal data analysis. However, their effectiveness in predicting immunotherapy response, especially compared to human experts, remains unclear. This study assessed the performance of GPT-4, GPT-4o, and Gemini in predicting immunotherapy response in unresectable HCC, compared to radiologists and oncologists of varying expertise. A retrospective analysis of 186 patients with unresectable HCC utilized multimodal data (clinical and CT images). LLMs were evaluated with zero-shot prompting and two strategies: the 'voting method' and the 'OR rule method' for improved sensitivity. Performance metrics included accuracy, sensitivity, area under the curve (AUC), and agreement across LLMs and physicians.GPT-4o, using the 'OR rule method,' achieved 65% accuracy and 47% sensitivity, comparable to intermediate physicians but lower than senior physicians (accuracy: 72%, p = 0.045; sensitivity: 70%, p < 0.0001). Gemini-GPT, combining GPT-4, GPT-4o, and Gemini, achieved an AUC of 0.69, similar to senior physicians (AUC: 0.72, p = 0.35), with 68% accuracy, outperforming junior and intermediate physicians while remaining comparable to senior physicians (p = 0.78). However, its sensitivity (58%) was lower than senior physicians (p = 0.0097). LLMs demonstrated higher inter-model agreement (κ = 0.59-0.70) than inter-physician agreement, especially among junior physicians (κ = 0.15). This study highlights the potential of LLMs, particularly Gemini-GPT, as valuable tools in predicting immunotherapy response for HCC.

CT Classification Abdominal Retrospective Clinical In Silico Academic Lab GenAI

Interobserver agreement between artificial intelligence models in the thyroid imaging and reporting data system (TIRADS) assessment of thyroid nodules.

Leoncini A, Trimboli P

•papers•May 15 2025

As ultrasound (US) is the most accurate tool for assessing the thyroid nodule (TN) risk of malignancy (RoM), international societies have published various Thyroid Imaging and Reporting Data Systems (TIRADSs). With the recent advent of artificial intelligence (AI), clinicians and researchers should ask themselves how AI could interpret the terminology of the TIRADSs and whether or not AIs agree in the risk assessment of TNs. The study aim was to analyze the interobserver agreement (IOA) between AIs in assessing the RoM of TNs across various TIRADSs categories using a cases series created combining TIRADSs descriptors. ChatGPT, Google Gemini, and Claude were compared. ACR-TIRADS, EU-TIRADS, and K-TIRADS, were employed to evaluate the AI assessment. Multiple written scenarios for the three TIRADS were created, the cases were evaluated by the three AIs, and their assessments were analyzed and compared. The IOA was estimated by comparing the kappa (κ) values. Ninety scenarios were created. With ACR-TIRADS the IOA analysis gave κ = 0.58 between ChatGPT and Gemini, 0.53 between ChatGPT and Claude, and 0.90 between Gemini and Claude. With EU-TIRADS it was observed κ value = 0.73 between ChatGPT and Gemini, 0.62 between ChatGPT and Claude, and 0.72 between Gemini and Claude. With K-TIRADS it was found κ = 0.88 between ChatGPT and Gemini, 0.70 between ChatGPT and Claude, and 0.61 between Gemini and Claude. This study found that there were non-negligible variability between the three AIs. Clinicians and patients should be aware of these new findings.

Ultrasound Classification Abdominal Retrospective Clinical In Silico Academic Lab

Machine learning for grading prediction and survival analysis in high grade glioma.

Enhancing medical explainability in deep learning for age-related macular degeneration diagnosis.

Accuracy and Reliability of Multimodal Imaging in Diagnosing Knee Sports Injuries.

Deep learning MRI-based radiomic models for predicting recurrence in locally advanced nasopharyngeal carcinoma after neoadjuvant chemoradiotherapy: a multi-center study.

A computed tomography-based radiomics prediction model for BRAF mutation status in colorectal cancer.

MRI-derived deep learning models for predicting 1p/19q codeletion status in glioma patients: a systematic review and meta-analysis of diagnostic test accuracy studies.

Recent advancements in personalized management of prostate cancer biochemical recurrence after radical prostatectomy.

Comparison of lumbar disc degeneration grading between deep learning model SpineNet and radiologist: a longitudinal study with a 14-year follow-up.

Predicting Immunotherapy Response in Unresectable Hepatocellular Carcinoma: A Comparative Study of Large Language Models and Human Experts.

Interobserver agreement between artificial intelligence models in the thyroid imaging and reporting data system (TIRADS) assessment of thyroid nodules.

Ready to Sharpen Your Edge?