
Predicting Immunotherapy Response in Unresectable Hepatocellular Carcinoma: A Comparative Study of Large Language Models and Human Experts.

Xu J, Wang J, Li J, Zhu Z, Fu X, Cai W, Song R, Wang T, Li H

PubMed · May 15, 2025
Hepatocellular carcinoma (HCC) is an aggressive cancer with limited biomarkers for predicting immunotherapy response. Recent advancements in large language models (LLMs) like GPT-4, GPT-4o, and Gemini offer the potential to enhance clinical decision-making through multimodal data analysis. However, their effectiveness in predicting immunotherapy response, especially compared to human experts, remains unclear. This study assessed the performance of GPT-4, GPT-4o, and Gemini in predicting immunotherapy response in unresectable HCC, compared to radiologists and oncologists of varying expertise. A retrospective analysis of 186 patients with unresectable HCC utilized multimodal data (clinical and CT images). LLMs were evaluated with zero-shot prompting and two strategies: the 'voting method' and the 'OR rule method' for improved sensitivity. Performance metrics included accuracy, sensitivity, area under the curve (AUC), and agreement across LLMs and physicians. GPT-4o, using the 'OR rule method,' achieved 65% accuracy and 47% sensitivity, comparable to intermediate physicians but lower than senior physicians (accuracy: 72%, p = 0.045; sensitivity: 70%, p < 0.0001). Gemini-GPT, an ensemble combining GPT-4, GPT-4o, and Gemini, achieved an AUC of 0.69, similar to senior physicians (AUC: 0.72, p = 0.35), with 68% accuracy, outperforming junior and intermediate physicians while remaining comparable to senior physicians (p = 0.78). However, its sensitivity (58%) was lower than that of senior physicians (p = 0.0097). LLMs demonstrated higher inter-model agreement (κ = 0.59-0.70) than inter-physician agreement, especially among junior physicians (κ = 0.15). This study highlights the potential of LLMs, particularly Gemini-GPT, as valuable tools in predicting immunotherapy response for HCC.
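The two ensembling strategies named in this abstract can be sketched as follows; the function names and the three-model example calls are illustrative assumptions, not the study's implementation:

```python
# Hypothetical sketch of the two ensembling strategies described above:
# majority 'voting' vs. the sensitivity-oriented 'OR rule'.

def majority_vote(predictions):
    """Call a patient a responder (1) only when most models agree."""
    return 1 if sum(predictions) > len(predictions) / 2 else 0

def or_rule(predictions):
    """Call a patient a responder (1) if ANY model does.

    This trades specificity for sensitivity: one positive call is
    enough to flip the ensemble positive."""
    return 1 if any(p == 1 for p in predictions) else 0

# Invented binary calls from three models (e.g. GPT-4, GPT-4o, Gemini).
calls = [1, 0, 0]
print(majority_vote(calls))  # 0
print(or_rule(calls))        # 1
```

The OR rule's higher sensitivity at the cost of specificity matches the trade-off the abstract reports for GPT-4o.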

Interobserver agreement between artificial intelligence models in the thyroid imaging and reporting data system (TIRADS) assessment of thyroid nodules.

Leoncini A, Trimboli P

PubMed · May 15, 2025
As ultrasound (US) is the most accurate tool for assessing the thyroid nodule (TN) risk of malignancy (RoM), international societies have published various Thyroid Imaging and Reporting Data Systems (TIRADSs). With the recent advent of artificial intelligence (AI), clinicians and researchers should ask how AI interprets the terminology of the TIRADSs and whether different AIs agree in the risk assessment of TNs. The study aim was to analyze the interobserver agreement (IOA) between AIs in assessing the RoM of TNs across various TIRADS categories using a case series created by combining TIRADS descriptors. ChatGPT, Google Gemini, and Claude were compared. ACR-TIRADS, EU-TIRADS, and K-TIRADS were employed to evaluate the AI assessments. Multiple written scenarios for the three TIRADSs were created, the cases were evaluated by the three AIs, and their assessments were analyzed and compared. The IOA was estimated by comparing kappa (κ) values. Ninety scenarios were created. With ACR-TIRADS, the IOA analysis gave κ = 0.58 between ChatGPT and Gemini, 0.53 between ChatGPT and Claude, and 0.90 between Gemini and Claude. With EU-TIRADS, κ = 0.73 was observed between ChatGPT and Gemini, 0.62 between ChatGPT and Claude, and 0.72 between Gemini and Claude. With K-TIRADS, κ = 0.88 was found between ChatGPT and Gemini, 0.70 between ChatGPT and Claude, and 0.61 between Gemini and Claude. This study found non-negligible variability between the three AIs. Clinicians and patients should be aware of these findings.
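Cohen's kappa, the agreement statistic reported here, corrects observed agreement for the agreement expected by chance. A minimal stdlib sketch (the rating arrays are invented for illustration, not the study's data):

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: chance-corrected agreement between two raters."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Agreement expected if the raters labelled cases independently.
    expected = sum((counts_a[c] / n) * (counts_b[c] / n)
                   for c in set(counts_a) | set(counts_b))
    return (observed - expected) / (1 - expected)

# Two hypothetical AI raters assigning TIRADS categories to 5 nodules.
print(round(cohens_kappa([1, 1, 2, 2, 3], [1, 1, 2, 3, 3]), 2))  # 0.71
```

A κ near 0.9, as between Gemini and Claude on ACR-TIRADS, indicates near-perfect agreement; values near 0.5 indicate only moderate agreement.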

Deep Learning-Based Chronic Obstructive Pulmonary Disease Exacerbation Prediction Using Flow-Volume and Volume-Time Curve Imaging: Retrospective Cohort Study.

Jeon ET, Park H, Lee JK, Heo EY, Lee CH, Kim DK, Kim DH, Lee HW

PubMed · May 15, 2025
Chronic obstructive pulmonary disease (COPD) is a common and progressive respiratory condition characterized by persistent airflow limitation and symptoms such as dyspnea, cough, and sputum production. Acute exacerbations of COPD (AE-COPD) are key determinants of disease progression, yet existing predictive models relying mainly on spirometric measurements, such as forced expiratory volume in 1 second, reflect only a fraction of the physiological information embedded in respiratory function tests. Recent advances in artificial intelligence (AI) have enabled more sophisticated analyses of full spirometric curves, including flow-volume loops and volume-time curves, facilitating the identification of complex patterns associated with increased exacerbation risk. This study aimed to determine whether a predictive model that integrates clinical data and spirometry images using AI improves accuracy in predicting moderate-to-severe and severe AE-COPD events compared to a clinical-only model. A retrospective cohort study was conducted using COPD registry data from 2 teaching hospitals from January 2004 to December 2020. The study included a total of 10,492 COPD cases, divided into a development cohort (6870 cases) and an external validation cohort (3622 cases). The AI-enhanced model (AI-PFT-Clin) used a combination of clinical variables (eg, history of AE-COPD, dyspnea, and inhaled treatments) and spirometry image data (flow-volume loops and volume-time curves). In contrast, the Clin model used only clinical variables. The primary outcomes were moderate-to-severe and severe AE-COPD events within a year of spirometry. In the external validation cohort, the AI-PFT-Clin model outperformed the Clin model, showing an area under the receiver operating characteristic curve of 0.755 versus 0.730 (P<.05) for moderate-to-severe AE-COPD and 0.713 versus 0.675 (P<.05) for severe AE-COPD. The AI-PFT-Clin model demonstrated reliable predictive capability across subgroups, including younger patients and those without previous exacerbations. Higher AI-PFT-Clin scores correlated with elevated AE-COPD risk (adjusted hazard ratio for Q4 vs Q1: 4.21, P<.001), with sustained predictive stability over a 10-year follow-up period. The AI-PFT-Clin model, by integrating clinical data with spirometry images, offers enhanced predictive accuracy for AE-COPD events compared to a clinical-only approach. This AI-based framework facilitates the early identification of high-risk individuals through the detection of physiological abnormalities not captured by conventional metrics. The model's robust performance and long-term predictive stability suggest its potential utility in proactive COPD management and personalized intervention planning. These findings highlight the promise of incorporating advanced AI techniques into routine COPD management, particularly in populations traditionally seen as lower risk, supporting tailored patient care.
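The headline metric here, AUROC, equals the probability that a randomly chosen positive case scores above a randomly chosen negative one. A minimal sketch under that Mann-Whitney formulation (labels and risk scores are invented):

```python
def auroc(labels, scores):
    """AUROC as the fraction of positive/negative score pairs ranked
    correctly, counting ties as half (Mann-Whitney U formulation)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Invented risk scores for 2 exacerbators (1) and 2 non-exacerbators (0).
print(auroc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))  # 0.75
```

On this reading, the reported 0.755 means the AI-PFT-Clin model ranks a future exacerbator above a non-exacerbator about 75% of the time.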

Measuring the severity of knee osteoarthritis with an aberration-free fast line scanning Raman imaging system.

Jiao C, Ye J, Liao J, Li J, Liang J, He S

PubMed · May 15, 2025
Osteoarthritis (OA) is a major cause of disability worldwide, with symptoms such as joint pain, limited functionality, and decreased quality of life, potentially leading to deformity and irreversible damage. Chemical changes in joint tissues precede imaging alterations, making early diagnosis challenging for conventional methods like X-rays. Although Raman imaging provides detailed chemical information, it is time-consuming. This paper aims to achieve rapid osteoarthritis diagnosis and grading using a self-developed Raman imaging system combined with deep learning denoising and acceleration algorithms. Our self-developed aberration-corrected line-scanning confocal Raman imaging device acquires a line of Raman spectra (hundreds of points) per scan using a galvanometer or displacement stage, achieving spatial and spectral resolutions of 2 μm and 0.2 nm, respectively. Deep learning algorithms enhance imaging speed by more than fourfold through effective spectrum denoising and signal-to-noise ratio (SNR) improvement, allowing high-quality Raman spectral data to be acquired with a reduced integration time. Experiments on the tibial plateau of osteoarthritis patients compared three excitation wavelengths (532, 671, and 785 nm), with 671 nm chosen for optimal SNR and minimal fluorescence. Machine learning algorithms achieved 98% accuracy in distinguishing articular from calcified cartilage and 97% accuracy in differentiating osteoarthritis grades I to IV. Our fast Raman imaging system, combining an aberration-corrected line-scanning confocal Raman imager with deep learning denoising, offers improved imaging speed and enhanced spectral and spatial resolutions. It enables rapid, label-free detection of osteoarthritis severity and can identify early compositional changes before clinical imaging, allowing precise grading and tailored treatment, thus advancing orthopedic diagnostics and improving patient outcomes.

Joint resting state and structural networks characterize pediatric bipolar patients compared to healthy controls: a multimodal fusion approach.

Yi X, Ma M, Wang X, Zhang J, Wu F, Huang H, Xiao Q, Xie A, Liu P, Grecucci A

pubmed logopapersMay 15 2025
Pediatric bipolar disorder (PBD) is a highly debilitating condition, characterized by alternating episodes of mania and depression with intervening periods of remission. Limited information is available about the functional and structural abnormalities in PBD, particularly when comparing type I with type II subtypes. Resting-state brain activity and structural grey matter, assessed through MRI, may provide insight into the neurobiological biomarkers of this disorder. In this study, resting-state regional homogeneity (ReHo) and grey matter concentration (GMC) data of 58 PBD patients and 21 healthy controls (HC) matched for age, gender, education, and IQ were analyzed in a data-fusion unsupervised machine learning approach known as transposed independent vector analysis. Two networks significantly differed between PBD and HC. The first network included fronto-medial regions, such as the medial and superior frontal gyrus and the cingulate, and displayed higher ReHo and GMC values in PBD compared to HC. The second network included temporo-posterior regions, as well as the insula, the caudate, and the precuneus, and displayed lower ReHo and GMC values in PBD compared to HC. Additionally, two networks differed between type I and type II PBD: an occipito-cerebellar network with increased ReHo and GMC in type I compared to type II, and a fronto-parietal network with decreased ReHo and GMC in type I compared to type II. Of note, the first network positively correlated with depression scores. These findings shed new light on the functional and structural abnormalities displayed by pediatric bipolar patients.

Single View Echocardiographic Analysis for Left Ventricular Outflow Tract Obstruction Prediction in Hypertrophic Cardiomyopathy: A Deep Learning Approach

Kim, J., Park, J., Jeon, J., Yoon, Y. E., Jang, Y., Jeong, H., Lee, S.-A., Choi, H.-M., Hwang, I.-C., Cho, G.-Y., Chang, H.-J.

medRxiv preprint · May 14, 2025
Background: Accurate left ventricular outflow tract obstruction (LVOTO) assessment is crucial for hypertrophic cardiomyopathy (HCM) management and prognosis. Traditional methods, requiring multiple views, Doppler, and provocation, are often infeasible, especially where resources are limited. This study aimed to develop and validate a deep learning (DL) model capable of predicting severe LVOTO in HCM patients using only the parasternal long-axis (PLAX) view from transthoracic echocardiography (TTE). Methods: A DL model was trained on PLAX videos extracted from TTE examinations (developmental dataset, n=1,007) to capture both morphological and dynamic motion features, generating a DL index for LVOTO (DLi-LVOTO, range 0-100). Performance was evaluated in an internal test dataset (ITDS, n=87) and externally validated in a distinct hospital dataset (DHDS, n=1,334) and an LVOTO reduction treatment dataset (n=156). Results: The model achieved high accuracy in detecting severe LVOTO (pressure gradient ≥ 50 mmHg), with an area under the receiver operating characteristic curve (AUROC) of 0.97 (95% confidence interval: 0.92-1.00) in ITDS and 0.93 (0.92-0.95) in DHDS. At a DLi-LVOTO threshold of 70, the model demonstrated a specificity of 97.3% and a negative predictive value (NPV) of 96.1% in ITDS. In DHDS, a cutoff of 60 yielded a specificity of 94.6% and an NPV of 95.5%. DLi-LVOTO also decreased significantly after surgical myectomy or mavacamten treatment, correlating with reductions in peak pressure gradient (p<0.001 for all). Conclusions: Our DL-based approach predicts severe LVOTO using only the PLAX view from TTE, serving as a complementary tool, particularly in resource-limited settings or when Doppler is unavailable, and for monitoring treatment response.
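Specificity and NPV at a score cutoff, as reported for the DLi-LVOTO thresholds, reduce to counting negatives below and above the cutoff. A sketch with invented labels and scores (not the study's data):

```python
def specificity_npv(labels, scores, threshold):
    """Specificity and negative predictive value when scores >= threshold
    are called positive (here: severe LVOTO)."""
    tn = sum(y == 0 and s < threshold for y, s in zip(labels, scores))
    fp = sum(y == 0 and s >= threshold for y, s in zip(labels, scores))
    fn = sum(y == 1 and s < threshold for y, s in zip(labels, scores))
    return tn / (tn + fp), tn / (tn + fn)

# Invented DLi-LVOTO scores: 3 non-severe (0) and 2 severe (1) patients.
spec, npv = specificity_npv([0, 0, 0, 1, 1], [30, 50, 80, 90, 40], 70)
```

A high NPV at the chosen cutoff is what makes the index useful as a rule-out tool when Doppler is unavailable.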

A multi-layered defense against adversarial attacks in brain tumor classification using ensemble adversarial training and feature squeezing.

Yinusa A, Faezipour M

PubMed · May 14, 2025
Deep learning, particularly convolutional neural networks (CNNs), has proven valuable for brain tumor classification, aiding diagnostic and therapeutic decisions in medical imaging. Despite their accuracy, these models are vulnerable to adversarial attacks, compromising their reliability in clinical settings. In this research, we utilized a VGG16-based CNN model to classify brain tumors, achieving 96% accuracy on clean magnetic resonance imaging (MRI) data. To assess robustness, we exposed the model to Fast Gradient Sign Method (FGSM) and Projected Gradient Descent (PGD) attacks, which reduced accuracy to 32% and 13%, respectively. We then applied a multi-layered defense strategy, including adversarial training with FGSM and PGD examples and feature squeezing techniques such as bit-depth reduction and Gaussian blurring. This approach improved model resilience, achieving 54% accuracy on FGSM and 47% on PGD adversarial examples. Our results highlight the importance of proactive defense strategies for maintaining the reliability of AI in medical imaging under adversarial conditions.
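Of the defenses above, feature squeezing is the simplest to illustrate: bit-depth reduction collapses the small pixel perturbations that FGSM-style attacks rely on. A stdlib sketch, with the FGSM step shown only as a sign-of-gradient perturbation (the gradients are invented, not computed from a model):

```python
def fgsm_perturb(pixels, grads, eps):
    """FGSM-style step: shift each pixel by eps in the sign of its loss
    gradient (real attacks obtain these gradients from the model)."""
    sign = lambda g: (g > 0) - (g < 0)
    return [x + eps * sign(g) for x, g in zip(pixels, grads)]

def reduce_bit_depth(pixels, bits):
    """Feature squeezing: quantize pixels in [0, 1] to 2**bits levels,
    erasing perturbations smaller than one quantization step."""
    levels = 2 ** bits - 1
    return [round(x * levels) / levels for x in pixels]

clean = [0.0, 1.0]
adv = fgsm_perturb(clean, [1.0, -1.0], eps=0.1)   # small adversarial shift
squeezed = reduce_bit_depth(adv, bits=1)          # perturbation removed
```

Comparing model outputs on the original and squeezed inputs is the standard way feature squeezing flags adversarial examples.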

Total radius BMD correlates with the hip and lumbar spine BMD among post-menopausal patients with fragility wrist fracture in a machine learning model.

Ruotsalainen T, Panfilov E, Thevenot J, Tiulpin A, Saarakkala S, Niinimäki J, Lehenkari P, Valkealahti M

PubMed · May 14, 2025
Osteoporosis screening should be systematic in the group of over-50-year-old females with a radius fracture. We tested a phantom combined with a machine learning model and studied osteoporosis-related variables. This machine learning model for screening osteoporosis using plain radiographs requires further investigation in larger cohorts to assess its potential as a replacement for DXA measurements in settings where DXA is not available. The main purpose of this study was to improve osteoporosis screening, especially in post-menopausal patients with fragility wrist fractures. The secondary objective was to increase understanding of the connection between osteoporosis and aging, as well as other risk factors. We collected data on 83 females > 50 years old with a distal radius fracture treated at Oulu University Hospital in 2019-2020. The data included basic patient information, the WHO FRAX tool, blood tests, X-ray imaging of the fractured wrist, and DXA scanning of the non-fractured forearm, both hips, and the lumbar spine. Machine learning was used in combination with a custom phantom. Eighty-five percent of the study population had osteopenia or osteoporosis. Only 28.4% of patients had increased bone resorption activity measured by ICTP values. Total radius BMD correlated with other osteoporosis-related variables (age r = -0.494, BMI r = 0.273, FRAX osteoporotic fracture risk r = -0.419, FRAX hip fracture risk r = -0.433, hip BMD r = 0.435, and lumbar spine BMD r = 0.645), but the ultra-distal (UD) radius BMD did not. Our custom phantom combined with a machine learning model showed potential for screening osteoporosis, with class-wise accuracies of 76% and 75% for "osteoporotic" versus "osteopenic & normal bone", respectively. We suggest osteoporosis screening for all females over 50 years old with wrist fractures. We found that the total radius BMD correlates with the central BMD. Due to the limited sample size in the phantom and machine learning parts of the study, further research is needed to develop a clinically useful tool for screening osteoporosis.
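The r values quoted above are Pearson correlation coefficients; a minimal stdlib sketch (the BMD numbers are invented for illustration):

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

# Invented total-radius vs. lumbar-spine BMD values for 5 patients.
r = pearson_r([0.51, 0.55, 0.60, 0.66, 0.70],
              [0.80, 0.85, 0.88, 0.95, 1.01])
```

A moderate positive r, like the reported 0.645 for lumbar spine BMD, is what supports using the peripheral measurement as a proxy for central BMD.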

Zero-Shot Multi-modal Large Language Model vs. Supervised Deep Learning: A Comparative Study on CT-Based Intracranial Hemorrhage Subtyping

Yinuo Wang, Yue Zeng, Kai Chen, Cai Meng, Chao Pan, Zhouping Tang

arXiv preprint · May 14, 2025
Introduction: Timely identification of intracranial hemorrhage (ICH) subtypes on non-contrast computed tomography (NCCT) is critical for prognosis prediction and therapeutic decision-making, yet remains challenging due to low contrast and blurred boundaries. This study evaluates the performance of zero-shot multi-modal large language models (MLLMs) compared to traditional deep learning methods in ICH binary classification and subtyping. Methods: We utilized a dataset provided by RSNA, comprising 192 NCCT volumes. The study compares various MLLMs, including GPT-4o, Gemini 2.0 Flash, and Claude 3.5 Sonnet V2, with conventional deep learning models, including ResNet50 and Vision Transformer. Carefully crafted prompts were used to guide the MLLMs in tasks such as ICH presence, subtype classification, localization, and volume estimation. Results: In the ICH binary classification task, traditional deep learning models outperform MLLMs comprehensively. For subtype classification, MLLMs also exhibit inferior performance compared to traditional deep learning models, with Gemini 2.0 Flash achieving a macro-averaged precision of 0.41 and a macro-averaged F1 score of 0.31. Conclusion: While MLLMs excel in interactive capabilities, their overall accuracy in ICH subtyping is inferior to that of deep networks. However, MLLMs enhance interpretability through language interactions, indicating potential in medical imaging analysis. Future efforts will focus on model refinement and developing more precise MLLMs to improve performance in three-dimensional medical image processing.
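Macro-averaged precision and F1, the metrics quoted for Gemini 2.0 Flash, average per-class scores so that minority subtypes count equally. A sketch with invented labels (only two of the ICH subtypes shown):

```python
def macro_precision_f1(y_true, y_pred, labels):
    """Macro-averaged precision and F1: unweighted mean of per-class
    scores, so rare subtypes weigh as much as common ones."""
    precisions, f1s = [], []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        f1s.append(f1)
    return sum(precisions) / len(labels), sum(f1s) / len(labels)

# Invented subtype calls for 4 scans over 2 subtypes.
p, f = macro_precision_f1(["epidural", "epidural", "subdural", "subdural"],
                          ["epidural", "subdural", "subdural", "subdural"],
                          ["epidural", "subdural"])
```

Because every subtype contributes equally, a model that misses a rare subtype entirely is penalized heavily, which helps explain the low macro scores reported for the MLLMs.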

DCSNet: A Lightweight Knowledge Distillation-Based Model with Explainable AI for Lung Cancer Diagnosis from Histopathological Images

Sadman Sakib Alif, Nasim Anzum Promise, Fiaz Al Abid, Aniqua Nusrat Zereen

arXiv preprint · May 14, 2025
Lung cancer is a leading cause of cancer-related deaths globally, where early detection and accurate diagnosis are critical for improving survival rates. While deep learning, particularly convolutional neural networks (CNNs), has revolutionized medical image analysis by detecting subtle patterns indicative of early-stage lung cancer, its adoption faces challenges. These models are often computationally expensive and require significant resources, making them unsuitable for resource-constrained environments. Additionally, their lack of transparency hinders trust and broader adoption in sensitive fields like healthcare. Knowledge distillation addresses these challenges by transferring knowledge from large, complex models (teachers) to smaller, lightweight models (students). We propose a knowledge distillation-based approach for lung cancer detection, incorporating explainable AI (XAI) techniques to enhance model transparency. Eight CNNs, including ResNet50, EfficientNetB0, EfficientNetB3, and VGG16, are evaluated as teacher models. We developed and trained a lightweight student model, the Distilled Custom Student Network (DCSNet), using ResNet50 as the teacher. This approach not only ensures high diagnostic performance in resource-constrained settings but also addresses transparency concerns, facilitating the adoption of AI-driven diagnostic tools in healthcare.
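The soft-target term of Hinton-style knowledge distillation, which a DCSNet-like student would minimize against its teacher's outputs, can be sketched as follows (the logits are invented; real training adds a hard-label cross-entropy term, and the abstract does not specify this exact loss):

```python
from math import exp, log

def softmax(logits, temperature):
    """Temperature-softened softmax: higher T spreads probability mass
    so the student can learn from the teacher's full distribution."""
    exps = [exp(z / temperature) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=4.0):
    """Cross-entropy between softened teacher and student distributions,
    scaled by T**2 to keep gradient magnitudes comparable."""
    teacher = softmax(teacher_logits, temperature)
    student = softmax(student_logits, temperature)
    return -temperature ** 2 * sum(t * log(s)
                                   for t, s in zip(teacher, student))

# Invented 3-class logits: the loss shrinks as the student matches the teacher.
matched = distillation_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
mismatched = distillation_loss([3.0, 2.0, 1.0], [1.0, 2.0, 3.0])
```

The loss is minimized when the student reproduces the teacher's softened distribution, which is how the lightweight model inherits the larger network's decision boundaries.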