Latest Papers on Radiology AI. Tags: Classification, Order: Best Match, Limit: 10.

Interobserver agreement between artificial intelligence models in the thyroid imaging and reporting data system (TIRADS) assessment of thyroid nodules.

Leoncini A, Trimboli P

•papers•May 15 2025

As ultrasound (US) is the most accurate tool for assessing the thyroid nodule (TN) risk of malignancy (RoM), international societies have published various Thyroid Imaging and Reporting Data Systems (TIRADSs). With the recent advent of artificial intelligence (AI), clinicians and researchers should ask themselves how AI could interpret the terminology of the TIRADSs and whether or not AIs agree in the risk assessment of TNs. The study aim was to analyze the interobserver agreement (IOA) between AIs in assessing the RoM of TNs across various TIRADS categories using a case series created by combining TIRADS descriptors. ChatGPT, Google Gemini, and Claude were compared. ACR-TIRADS, EU-TIRADS, and K-TIRADS were employed to evaluate the AI assessment. Multiple written scenarios for the three TIRADSs were created, the cases were evaluated by the three AIs, and their assessments were analyzed and compared. The IOA was estimated by comparing the kappa (κ) values. Ninety scenarios were created. With ACR-TIRADS, the IOA analysis gave κ = 0.58 between ChatGPT and Gemini, 0.53 between ChatGPT and Claude, and 0.90 between Gemini and Claude. With EU-TIRADS, κ was 0.73 between ChatGPT and Gemini, 0.62 between ChatGPT and Claude, and 0.72 between Gemini and Claude. With K-TIRADS, κ was 0.88 between ChatGPT and Gemini, 0.70 between ChatGPT and Claude, and 0.61 between Gemini and Claude. This study found non-negligible variability between the three AIs. Clinicians and patients should be aware of these new findings.

Ultrasound Classification Abdominal Retrospective Clinical In Silico Academic Lab
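
The pairwise κ values reported in the thyroid TIRADS abstract above measure agreement corrected for chance. The following is a minimal sketch of unweighted Cohen's kappa for two raters, not the authors' code. The abstract does not say whether a weighted or unweighted variant was used, so the unweighted form is an assumption, and the category lists `model_x` and `model_y` are made-up illustrative data.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two equal-length lists of category labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of cases")
    n = len(rater_a)
    # Observed agreement: fraction of cases where both raters chose the same category.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: sum over categories of the product of marginal frequencies.
    count_a = Counter(rater_a)
    count_b = Counter(rater_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        # Both raters used a single identical category; agreement is trivially perfect.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical TIRADS categories assigned by two models to eight scenarios.
model_x = [1, 2, 3, 3, 4, 5, 5, 2]
model_y = [1, 2, 3, 4, 4, 5, 4, 2]
kappa = cohen_kappa(model_x, model_y)
```

With these illustrative lists the raters agree on 6 of 8 cases (0.75) while chance agreement is 0.1875, so κ comes out near 0.69.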

Predicting Immunotherapy Response in Unresectable Hepatocellular Carcinoma: A Comparative Study of Large Language Models and Human Experts.

Xu J, Wang J, Li J, Zhu Z, Fu X, Cai W, Song R, Wang T, Li H

•papers•May 15 2025

Hepatocellular carcinoma (HCC) is an aggressive cancer with limited biomarkers for predicting immunotherapy response. Recent advancements in large language models (LLMs) like GPT-4, GPT-4o, and Gemini offer the potential for enhancing clinical decision-making through multimodal data analysis. However, their effectiveness in predicting immunotherapy response, especially compared to human experts, remains unclear. This study assessed the performance of GPT-4, GPT-4o, and Gemini in predicting immunotherapy response in unresectable HCC, compared to radiologists and oncologists of varying expertise. A retrospective analysis of 186 patients with unresectable HCC utilized multimodal data (clinical and CT images). LLMs were evaluated with zero-shot prompting and two strategies: the 'voting method' and the 'OR rule method' for improved sensitivity. Performance metrics included accuracy, sensitivity, area under the curve (AUC), and agreement across LLMs and physicians. GPT-4o, using the 'OR rule method,' achieved 65% accuracy and 47% sensitivity, comparable to intermediate physicians but lower than senior physicians (accuracy: 72%, p = 0.045; sensitivity: 70%, p < 0.0001). Gemini-GPT, combining GPT-4, GPT-4o, and Gemini, achieved an AUC of 0.69, similar to senior physicians (AUC: 0.72, p = 0.35), with 68% accuracy, outperforming junior and intermediate physicians while remaining comparable to senior physicians (p = 0.78). However, its sensitivity (58%) was lower than that of senior physicians (p = 0.0097). LLMs demonstrated higher inter-model agreement (κ = 0.59-0.70) than inter-physician agreement, especially among junior physicians (κ = 0.15). This study highlights the potential of LLMs, particularly Gemini-GPT, as valuable tools in predicting immunotherapy response for HCC.

CT Classification Abdominal Retrospective Clinical In Silico Academic Lab GenAI
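
The hepatocellular carcinoma abstract above names a 'voting method' and an 'OR rule method' without defining them. One plausible reading, sketched here purely as an assumption, is that several binary predictions (from repeated runs or from different models) are combined either by strict majority or by flagging a responder whenever any single prediction is positive. The function names and the sample predictions are hypothetical.

```python
def majority_vote(predictions):
    """Return True when more than half of the binary predictions are positive."""
    if not predictions:
        raise ValueError("at least one prediction is required")
    positives = sum(1 for p in predictions if p)
    return positives * 2 > len(predictions)


def or_rule(predictions):
    """Return True when at least one binary prediction is positive."""
    if not predictions:
        raise ValueError("at least one prediction is required")
    return any(predictions)


# Hypothetical outputs of three runs for one patient (True = predicted responder).
runs = [True, False, False]
voted = majority_vote(runs)   # one of three positives is not a majority
flagged = or_rule(runs)       # a single positive is enough under the OR rule
```

Under this reading the OR rule can only add positive calls relative to a majority vote, which is consistent with the abstract describing it as a strategy for improved sensitivity, at the likely cost of more false positives.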

Accuracy and Reliability of Multimodal Imaging in Diagnosing Knee Sports Injuries.

Zhu D, Zhang Z, Li W

•papers•May 15 2025

Due to differences in subjective experience and professional level among doctors, as well as inconsistent diagnostic criteria, there are issues with the accuracy and reliability of single imaging diagnosis results for knee joint injuries. To address these issues, magnetic resonance imaging (MRI), computed tomography (CT) and ultrasound (US) are adopted in this article for ensemble learning, and deep learning (DL) is combined for automatic analysis. By steps such as image enhancement, noise elimination, and tissue segmentation, the quality of image data is improved, and then convolutional neural networks (CNN) are used to automatically identify and classify injury types. The experimental results show that the DL model exhibits high sensitivity and specificity in the diagnosis of different types of injuries, such as anterior cruciate ligament tear, meniscus injury, cartilage injury, and fracture. The diagnostic accuracy of anterior cruciate ligament tear exceeds 90%, and the highest diagnostic accuracy of cartilage injury reaches 95.80%. In addition, compared with traditional manual image interpretation, the DL model has significant advantages in time efficiency, with a significant reduction in average interpretation time per case. The diagnostic consistency experiment shows that the DL model has high consistency with doctors' diagnosis results, with an overall error rate of less than 2%. The model has high accuracy and strong generalization ability when dealing with different types of joint injuries. These data indicate that combining multiple imaging technologies and the DL algorithm can effectively improve the accuracy and efficiency of diagnosing sports injuries of knee joints.

Mixed Modality Classification Musculoskeletal Retrospective Clinical In Silico Academic Lab

Measuring the severity of knee osteoarthritis with an aberration-free fast line scanning Raman imaging system.

Jiao C, Ye J, Liao J, Li J, Liang J, He S

•papers•May 15 2025

Osteoarthritis (OA) is a major cause of disability worldwide, with symptoms like joint pain, limited functionality, and decreased quality of life, potentially leading to deformity and irreversible damage. Chemical changes in joint tissues precede imaging alterations, making early diagnosis challenging for conventional methods like X-rays. Although Raman imaging provides detailed chemical information, it is time-consuming. This paper aims to achieve rapid osteoarthritis diagnosis and grading using a self-developed Raman imaging system combined with deep learning denoising and acceleration algorithms. Our self-developed aberration-corrected line-scanning confocal Raman imaging device acquires a line of Raman spectra (hundreds of points) per scan using a galvanometer or displacement stage, achieving spatial and spectral resolutions of 2 μm and 0.2 nm, respectively. Deep learning algorithms enhance the imaging speed by over 4 times through effective spectrum denoising and signal-to-noise ratio (SNR) improvement. By leveraging the denoising capabilities of deep learning, we are able to acquire high-quality Raman spectral data with a reduced integration time, thereby accelerating the imaging process. Experiments on the tibial plateau of osteoarthritis patients compared three excitation wavelengths (532, 671, and 785 nm), with 671 nm chosen for optimal SNR and minimal fluorescence. Machine learning algorithms achieved a 98% accuracy in distinguishing articular from calcified cartilage and a 97% accuracy in differentiating osteoarthritis grades I to IV. Our fast Raman imaging system, combining an aberration-corrected line-scanning confocal Raman imager with deep learning denoising, offers improved imaging speed and enhanced spectral and spatial resolutions. It enables rapid, label-free detection of osteoarthritis severity and can identify early compositional changes before clinical imaging, allowing precise grading and tailored treatment, thus advancing orthopedic diagnostics and improving patient outcomes.

OCT Classification Musculoskeletal Retrospective Clinical In Silico Academic Lab Breakthrough

Joint resting state and structural networks characterize pediatric bipolar patients compared to healthy controls: a multimodal fusion approach.

Yi X, Ma M, Wang X, Zhang J, Wu F, Huang H, Xiao Q, Xie A, Liu P, Grecucci A

•papers•May 15 2025

Pediatric bipolar disorder (PBD) is a highly debilitating condition, characterized by alternating episodes of mania and depression, with intervening periods of remission. Limited information is available about the functional and structural abnormalities in PBD, particularly when comparing type I with type II subtypes. Resting-state brain activity and structural grey matter, assessed through MRI, may provide insight into the neurobiological biomarkers of this disorder. In this study, resting-state Regional Homogeneity (ReHo) and grey matter concentration (GMC) data of 58 PBD patients and 21 healthy controls (HC) matched for age, gender, education and IQ were analyzed in a data fusion unsupervised machine learning approach known as transposed Independent Vector Analysis. Two networks significantly differed between PBD and HC. The first network included fronto-medial regions, such as the medial and superior frontal gyrus and the cingulate, and displayed higher ReHo and GMC values in PBD compared to HC. The second network included temporo-posterior regions, as well as the insula, the caudate and the precuneus, and displayed lower ReHo and GMC values in PBD compared to HC. Additionally, two networks differed between type I and type II PBD: an occipito-cerebellar network with increased ReHo and GMC in type I compared to type II, and a fronto-parietal network with decreased ReHo and GMC in type I compared to type II. Of note, the first network positively correlated with depression scores. These findings shed new light on the functional and structural abnormalities displayed by pediatric bipolar patients.

MRI Classification Neurological Retrospective Clinical In Silico Academic Lab

Single View Echocardiographic Analysis for Left Ventricular Outflow Tract Obstruction Prediction in Hypertrophic Cardiomyopathy: A Deep Learning Approach

Kim, J., Park, J., Jeon, J., Yoon, Y. E., Jang, Y., Jeong, H., Lee, S.-A., Choi, H.-M., Hwang, I.-C., Cho, G.-Y., Chang, H.-J.

•preprint•May 14 2025

Background: Accurate left ventricular outflow tract obstruction (LVOTO) assessment is crucial for hypertrophic cardiomyopathy (HCM) management and prognosis. Traditional methods, requiring multiple views, Doppler, and provocation, are often infeasible, especially where resources are limited. This study aimed to develop and validate a deep learning (DL) model capable of predicting severe LVOTO in HCM patients using only the parasternal long-axis (PLAX) view from transthoracic echocardiography (TTE). Methods: A DL model was trained on PLAX videos extracted from TTE examinations (developmental dataset, n=1,007) to capture both morphological and dynamic motion features, generating a DL index for LVOTO (DLi-LVOTO, range 0-100). Performance was evaluated in an internal test dataset (ITDS, n=87) and externally validated in the distinct hospital dataset (DHDS, n=1,334) and the LVOTO reduction treatment dataset (n=156). Results: The model achieved high accuracy in detecting severe LVOTO (pressure gradient ≥ 50 mmHg), with area under the receiver operating characteristic curve (AUROC) of 0.97 (95% confidence interval: 0.92-1.00) in ITDS and 0.93 (0.92-0.95) in DHDS. At a DLi-LVOTO threshold of 70, the model demonstrated a specificity of 97.3% and negative predictive value (NPV) of 96.1% in ITDS. In DHDS, a cutoff of 60 yielded a specificity of 94.6% and NPV of 95.5%. DLi-LVOTO also decreased significantly after surgical myectomy or Mavacamten treatment, correlating with reductions in peak pressure gradient (p<0.001 for all). Conclusions: Our DL-based approach predicts severe LVOTO using only the PLAX view from TTE, serving as a complementary tool, particularly in resource-limited settings or when Doppler is unavailable, and for monitoring treatment response.

Ultrasound Classification Cardiac Retrospective Clinical In Silico Academic Lab
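
The echocardiography abstract above reports specificity and negative predictive value (NPV) at fixed cutoffs of a 0-100 index. The sketch below shows how those two figures follow from a cutoff, assuming a score at or above the cutoff counts as a positive call (the abstract does not state whether the comparison is inclusive). The scores and labels are invented for illustration and are not study data.

```python
def specificity_and_npv(scores, labels, threshold):
    """Compute specificity and NPV when score >= threshold is called positive.

    labels: 1 for severe obstruction present, 0 for absent.
    """
    tn = fp = fn = 0
    for score, label in zip(scores, labels):
        predicted_positive = score >= threshold
        if label == 0 and not predicted_positive:
            tn += 1  # correctly ruled out
        elif label == 0 and predicted_positive:
            fp += 1  # false alarm
        elif label == 1 and not predicted_positive:
            fn += 1  # missed case
    # Specificity: share of true negatives among all disease-free cases.
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    # NPV: share of true negatives among all negative calls.
    npv = tn / (tn + fn) if (tn + fn) else float("nan")
    return specificity, npv


# Hypothetical index values and ground-truth labels for eight examinations.
scores = [10, 35, 55, 65, 72, 80, 90, 40]
labels = [0, 0, 0, 1, 0, 1, 1, 0]
spec, npv = specificity_and_npv(scores, labels, threshold=70)
```

In this toy set a cutoff of 70 gives four true negatives, one false positive, and one missed case, so both specificity and NPV equal 0.8. Unlike specificity, NPV depends on how common the condition is in the evaluated population, which is one reason the two datasets in the abstract can show different values at similar cutoffs.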

Zero-Shot Multi-modal Large Language Model v.s. Supervised Deep Learning: A Comparative Study on CT-Based Intracranial Hemorrhage Subtyping

Yinuo Wang, Yue Zeng, Kai Chen, Cai Meng, Chao Pan, Zhouping Tang

•preprint•May 14 2025

Introduction: Timely identification of intracranial hemorrhage (ICH) subtypes on non-contrast computed tomography (NCCT) is critical for prognosis prediction and therapeutic decision-making, yet remains challenging due to low contrast and blurred boundaries. This study evaluates the performance of zero-shot multi-modal large language models (MLLMs) compared to traditional deep learning methods in ICH binary classification and subtyping. Methods: We utilized a dataset provided by RSNA, comprising 192 NCCT volumes. The study compares various MLLMs, including GPT-4o, Gemini 2.0 Flash, and Claude 3.5 Sonnet V2, with conventional deep learning models, including ResNet50 and Vision Transformer. Carefully crafted prompts were used to guide MLLMs in tasks such as ICH presence, subtype classification, localization, and volume estimation. Results: The results indicate that in the ICH binary classification task, traditional deep learning models outperform MLLMs comprehensively. For subtype classification, MLLMs also exhibit inferior performance compared to traditional deep learning models, with Gemini 2.0 Flash achieving a macro-averaged precision of 0.41 and a macro-averaged F1 score of 0.31. Conclusion: While MLLMs excel in interactive capabilities, their overall accuracy in ICH subtyping is inferior to deep networks. However, MLLMs enhance interpretability through language interactions, indicating potential in medical imaging analysis. Future efforts will focus on model refinement and developing more precise MLLMs to improve performance in three-dimensional medical image processing.

CT Classification Neurological Retrospective Clinical In Silico GenAI Benchmark SOTA
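
The intracranial hemorrhage abstract above reports macro-averaged precision and F1 for subtype classification. Macro averaging computes each metric per class and then takes the unweighted mean, so rare subtypes count as much as common ones. A minimal sketch follows, assuming single-label predictions and treating undefined per-class values as 0. The label strings and the example predictions are illustrative only and are not taken from the study.

```python
def macro_precision_f1(y_true, y_pred):
    """Return (macro precision, macro F1) for single-label multi-class predictions."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    classes = sorted(set(y_true) | set(y_pred))
    precisions, f1_scores = [], []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        # Undefined ratios (no predictions or no true cases) are scored as 0.
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        f1_scores.append(f1)
    # Unweighted mean across classes is what makes the average "macro".
    return sum(precisions) / len(classes), sum(f1_scores) / len(classes)


# Hypothetical subtype labels for six scans.
y_true = ["EDH", "SDH", "SAH", "IPH", "IVH", "SDH"]
y_pred = ["SDH", "SDH", "SAH", "IPH", "IPH", "SDH"]
macro_p, macro_f1 = macro_precision_f1(y_true, y_pred)
```

In this toy example two classes are never predicted correctly and each contributes 0 to both averages, pulling macro precision to about 0.43 and macro F1 to about 0.49 even though four of six scans are labeled correctly.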

Clinical utility of ultrasound and MRI in rheumatoid arthritis: An expert review.

Kellner DA, Morris NT, Lee SM, Baker JF, Chu P, Ranganath VK, Kaeley GS, Yang HH

•papers•May 14 2025

Musculoskeletal ultrasound (MSUS) and magnetic resonance imaging (MRI) are advanced imaging techniques that are increasingly important in the diagnosis and management of rheumatoid arthritis (RA) and have significantly enhanced the rheumatologist's ability to assess RA disease activity and progression. This review serves as a five-year update to our previous publication on the contemporary role of imaging in RA, emphasizing the continued importance of MSUS and MRI in clinical practice and their expanding utility. The review examines the role of MSUS in diagnosing RA, differentiating RA from mimickers, scoring systems and quality control measures, novel longitudinal approaches to disease monitoring, and patient populations that may benefit most from MSUS. It also examines the role of MRI in diagnosing pre-clinical and early RA, disease activity monitoring, research and clinical trials, and development of alternative scoring approaches utilizing artificial intelligence. Finally, the role of MRI in RA diagnosis and management is summarized, and selected practice points offer key tips for integrating MSUS and MRI into clinical practice.

Mixed Modality Classification Musculoskeletal Review Concept Academic Lab

The Future of Urodynamics: Innovations, Challenges, and Possibilities.

Chew LE, Hannick JH, Woo LL, Weaver JK, Damaser MS

•papers•May 14 2025

Urodynamic studies (UDS) are essential for evaluating lower urinary tract function but are limited by patient discomfort, lack of standardization and diagnostic variability. Advances in technology aim to address these challenges and improve diagnostic accuracy and patient comfort. Ambulatory urodynamic monitoring (AUM) offers physiological assessment by allowing natural bladder filling and monitoring during daily activities. Compared to conventional UDS, AUM demonstrates higher sensitivity for detecting detrusor overactivity and underlying pathophysiology. However, it faces challenges like motion artifacts, catheter-related discomfort, and difficulty measuring continuous bladder volume. Emerging devices such as Urodynamics Monitor and UroSound offer more patient-friendly alternatives. These tools have the potential to improve diagnostic accuracy for bladder pressure and voiding metrics but remain limited and still require further validation and testing. Ultrasound-based modalities, including dynamic ultrasonography and shear wave elastography, provide real-time, noninvasive assessment of bladder structure and function. These modalities are promising but will require further development of standardized protocols. AI and machine learning models enhance diagnostic accuracy and reduce variability in UDS interpretation. Applications include detecting detrusor overactivity and distinguishing bladder outlet obstruction from detrusor underactivity. However, further validation is required for clinical adoption. Advances in AUM, wearable technologies, ultrasonography, and AI demonstrate potential for transforming UDS into a more accurate, patient-centered tool. Despite significant progress, challenges like technical complexity, standardization, and cost-effectiveness must be addressed to integrate these innovations into routine practice. Nonetheless, these technologies provide the possibility of a future of improved diagnosis and treatment of lower urinary tract dysfunction.

Ultrasound Classification Abdominal Review Concept Academic Lab

Predicting response to anti-VEGF therapy in neovascular age-related macular degeneration using random forest and SHAP algorithms.

Zhang P, Duan J, Wang C, Li X, Su J, Shang Q

•papers•May 14 2025

This study aimed to establish and validate a prediction model based on machine learning methods and the SHAP algorithm to predict response to anti-vascular endothelial growth factor (VEGF) therapy in neovascular age-related macular degeneration (AMD). In this retrospective study, we extracted data including demographic characteristics, laboratory test results, and imaging features from optical coherence tomography (OCT) and optical coherence tomography angiography (OCTA). Eight machine learning methods, including Logistic Regression, Gradient Boosting Decision Tree, Random Forest, CatBoost, Support Vector Machine, XGBoost, LightGBM, and K-Nearest Neighbors, were employed to develop the predictive model. The machine learning method with optimal performance was selected for further interpretation. Finally, the SHAP algorithm was applied to explain the model's predictions. The study included 145 patients with neovascular AMD. Among the eight models developed, the Random Forest model demonstrated the best overall performance, achieving an accuracy of 75.86% and the highest area under the receiver operating characteristic curve (AUC) value of 0.91. In this model, important features identified as significant contributors to the response to anti-VEGF therapy in neovascular AMD patients included fractal dimension, total number of end points, total number of junctions, total vessel length, vessel area, average lacunarity, choroidal neovascularization (CNV) type, age, duration and logMAR BCVA. SHAP analysis and visualization provided interpretation at both the factor level and individual level. The Random Forest model for predicting response to anti-VEGF therapy in neovascular AMD using the SHAP algorithm proved to be feasible and effective. OCTA imaging features, such as fractal dimension and total number of end points, were the most effective predictive factors.

OCT Classification Retrospective Clinical In Silico Academic Lab
