Page 2 of 23225 results

Interobserver agreement between artificial intelligence models in the thyroid imaging and reporting data system (TIRADS) assessment of thyroid nodules.

Leoncini A, Trimboli P

PubMed | paper | May 15 2025
As ultrasound (US) is the most accurate tool for assessing the thyroid nodule (TN) risk of malignancy (RoM), international societies have published various Thyroid Imaging and Reporting Data Systems (TIRADSs). With the recent advent of artificial intelligence (AI), clinicians and researchers should ask how AI interprets the terminology of the TIRADSs and whether AIs agree in the risk assessment of TNs. The study aimed to analyze the interobserver agreement (IOA) between AIs in assessing the RoM of TNs across various TIRADS categories, using a case series created by combining TIRADS descriptors. ChatGPT, Google Gemini, and Claude were compared. ACR-TIRADS, EU-TIRADS, and K-TIRADS were employed to evaluate the AI assessment. Multiple written scenarios for the three TIRADSs were created, the cases were evaluated by the three AIs, and their assessments were analyzed and compared. The IOA was estimated by comparing the kappa (κ) values. Ninety scenarios were created. With ACR-TIRADS, the IOA analysis gave κ = 0.58 between ChatGPT and Gemini, 0.53 between ChatGPT and Claude, and 0.90 between Gemini and Claude. With EU-TIRADS, κ was 0.73 between ChatGPT and Gemini, 0.62 between ChatGPT and Claude, and 0.72 between Gemini and Claude. With K-TIRADS, κ was 0.88 between ChatGPT and Gemini, 0.70 between ChatGPT and Claude, and 0.61 between Gemini and Claude. This study found non-negligible variability between the three AIs. Clinicians and patients should be aware of these new findings.
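
The pairwise κ values above are agreement statistics corrected for chance. The abstract does not say whether a weighted or unweighted kappa was used, so the following is only a minimal sketch of unweighted Cohen's kappa for two raters; the function name and the six example categories are hypothetical and are not data from the study.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters labelling the same cases."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of cases")
    n = len(rater_a)
    # Observed agreement: fraction of cases with identical labels.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical category throughout
    return (observed - expected) / (1 - expected)


# Hypothetical TIRADS categories for six scenarios (illustration only).
model_x = [3, 4, 4, 5, 2, 3]
model_y = [3, 4, 5, 5, 2, 4]
kappa = cohen_kappa(model_x, model_y)
print(f"kappa = {kappa:.3f}")
```

Here the two hypothetical raters agree on four of six cases (observed agreement 0.667) while chance agreement is 0.25, which is why kappa is lower than the raw agreement rate.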

Predicting Immunotherapy Response in Unresectable Hepatocellular Carcinoma: A Comparative Study of Large Language Models and Human Experts.

Xu J, Wang J, Li J, Zhu Z, Fu X, Cai W, Song R, Wang T, Li H

PubMed | paper | May 15 2025
Hepatocellular carcinoma (HCC) is an aggressive cancer with limited biomarkers for predicting immunotherapy response. Recent advancements in large language models (LLMs) like GPT-4, GPT-4o, and Gemini offer the potential for enhancing clinical decision-making through multimodal data analysis. However, their effectiveness in predicting immunotherapy response, especially compared to human experts, remains unclear. This study assessed the performance of GPT-4, GPT-4o, and Gemini in predicting immunotherapy response in unresectable HCC, compared to radiologists and oncologists of varying expertise. A retrospective analysis of 186 patients with unresectable HCC utilized multimodal data (clinical and CT images). LLMs were evaluated with zero-shot prompting and two strategies: the 'voting method' and the 'OR rule method' for improved sensitivity. Performance metrics included accuracy, sensitivity, area under the curve (AUC), and agreement across LLMs and physicians. GPT-4o, using the 'OR rule method,' achieved 65% accuracy and 47% sensitivity, comparable to intermediate physicians but lower than senior physicians (accuracy: 72%, p = 0.045; sensitivity: 70%, p < 0.0001). Gemini-GPT, combining GPT-4, GPT-4o, and Gemini, achieved an AUC of 0.69, similar to senior physicians (AUC: 0.72, p = 0.35), with 68% accuracy, outperforming junior and intermediate physicians while remaining comparable to senior physicians (p = 0.78). However, its sensitivity (58%) was lower than senior physicians (p = 0.0097). LLMs demonstrated higher inter-model agreement (κ = 0.59-0.70) than inter-physician agreement, especially among junior physicians (κ = 0.15). This study highlights the potential of LLMs, particularly Gemini-GPT, as valuable tools in predicting immunotherapy response for HCC.
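
The abstract names a 'voting method' and an 'OR rule method' without defining them. One plausible reading, sketched below, is that several binary predictions per patient are combined either by majority vote or by flagging a responder whenever any single prediction is positive. The function names, the three-predictions-per-patient layout, and the example data are all assumptions made for illustration.

```python
def majority_vote(predictions):
    """Return 1 (responder) when more than half of the predictions are 1."""
    return int(sum(predictions) * 2 > len(predictions))


def or_rule(predictions):
    """Return 1 when at least one prediction is 1 (favours sensitivity)."""
    return int(any(predictions))


def sensitivity(y_true, y_pred):
    """True-positive rate: TP / (TP + FN)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    if tp + fn == 0:
        raise ValueError("no positive cases in y_true")
    return tp / (tp + fn)


# Hypothetical data: three binary predictions for each of four patients.
per_patient = [[1, 0, 0], [0, 0, 0], [1, 1, 0], [0, 1, 0]]
y_true = [1, 0, 1, 1]

voted = [majority_vote(p) for p in per_patient]
ored = [or_rule(p) for p in per_patient]
print("vote:", voted, "sensitivity:", sensitivity(y_true, voted))
print("OR:  ", ored, "sensitivity:", sensitivity(y_true, ored))
```

In this toy data the OR rule recovers every responder while the majority vote misses two of three, which illustrates why an OR-style rule would be chosen when sensitivity matters more than specificity.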

Comparison of lumbar disc degeneration grading between deep learning model SpineNet and radiologist: a longitudinal study with a 14-year follow-up.

Murto N, Lund T, Kautiainen H, Luoma K, Kerttula L

PubMed | paper | May 15 2025
To assess the agreement between lumbar disc degeneration (DD) grading by the convolutional neural network model SpineNet and a radiologist's visual grading. In a 14-year follow-up MRI study involving 19 male volunteers, lumbar DD was assessed by SpineNet and two radiologists using the Pfirrmann classification at baseline (age 37) and after 14 years (age 51). Pfirrmann summary scores (PSS) were calculated by summing individual disc grades. The agreement between the first radiologist and SpineNet was analyzed, with the second radiologist's grading used for inter-observer agreement. Significant differences were observed in the Pfirrmann grades and PSS assigned by the radiologist and SpineNet at both time points. SpineNet assigned Pfirrmann grade 1 to several discs and grade 5 to more discs compared to the radiologists. The concordance correlation coefficients (CCC) of PSS between the radiologist and SpineNet were 0.54 (95% CI: 0.28 to 0.79) at baseline and 0.54 (0.27 to 0.80) at follow-up. The average kappa (κ) values were 0.74 (0.68 to 0.81) at baseline and 0.68 (0.58 to 0.77) at follow-up. CCC of PSS between the radiologists was 0.83 (0.69 to 0.97) at baseline and 0.78 (0.61 to 0.95) at follow-up, with κ values ranging from 0.73 to 0.96. We found fair to substantial agreement in DD grading between SpineNet and the radiologist, albeit with notable discrepancies. These findings indicate that AI-based systems like SpineNet hold promise as complementary tools in radiological evaluation, including in longitudinal studies, but emphasize the need for ongoing refinement of AI algorithms.
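
The concordance correlation coefficient reported above penalizes systematic offsets between two raters, which plain Pearson correlation ignores. The abstract does not specify the estimator, so this is a minimal sketch assuming Lin's CCC with population (1/n) variances; the function name and the summary scores are hypothetical.

```python
from statistics import fmean


def lin_ccc(x, y):
    """Lin's concordance correlation coefficient with 1/n variances."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x, mean_y = fmean(x), fmean(y)
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    denom = var_x + var_y + (mean_x - mean_y) ** 2
    if denom == 0:
        raise ValueError("CCC is undefined for two identical constant series")
    return 2 * cov / denom


# Hypothetical summary scores: rater_2 is always 2 points above rater_1.
rater_1 = [8, 10, 12, 14]
rater_2 = [10, 12, 14, 16]
ccc = lin_ccc(rater_1, rater_2)
print(f"CCC = {ccc:.3f}")
```

The two hypothetical raters are perfectly correlated (Pearson r = 1), yet the constant 2-point offset pulls the CCC down to about 0.714, which is the kind of discrepancy the statistic is meant to expose.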

[Orthodontics in the CBCT era: 25 years later, what are the guidelines?].

Foucart JM, Papelard N, Bourriau J

PubMed | paper | May 15 2025
CBCT has become an essential tool in orthodontics, although its use must remain judicious and evidence-based. This study provides an updated analysis of international recommendations concerning the use of CBCT in orthodontics, with a particular focus on clinical indications, radiation dose reduction, and recent technological advancements. A systematic review of guidelines published between 2015 and 2025 was conducted following the PRISMA methodology. Inclusion criteria comprised official directives from recognized scientific societies and clinical studies evaluating low-dose protocols in orthodontics. The analysis of the 19 retained recommendations reveals a consensus regarding the primary indications for CBCT in orthodontics, particularly for impacted teeth, skeletal anomalies, and periodontal and upper airway assessment. Dose optimization and the integration of artificial intelligence emerge as major advancements, enabling significant radiation reduction while preserving diagnostic accuracy. The development of low-dose protocols and advanced reconstruction algorithms presents promising perspectives for safer and more efficient imaging, increasingly replacing conventional 2D radiographic techniques. However, international harmonization of recommendations for these new imaging sequences is imperative to standardize clinical practices and enhance patient radioprotection.

Recent advancements in personalized management of prostate cancer biochemical recurrence after radical prostatectomy.

Falkenbach F, Ekrutt J, Maurer T

PubMed | paper | May 15 2025
Biochemical recurrence (BCR) after radical prostatectomy exhibits heterogeneous prognostic implications. Recent advancements in imaging and biomarkers have high potential for personalizing care. Prostate-specific membrane antigen imaging (PSMA)-PET/CT has revolutionized BCR management in prostate cancer by detecting microscopic lesions earlier than conventional staging, leading to improved cancer control outcomes and changes in treatment plans in approximately two-thirds of cases. Salvage radiotherapy, often combined with androgen deprivation therapy, remains the standard treatment for high-risk BCR postprostatectomy, with PSMA-PET/CT guiding treatment adjustments, such as the radiation field, and improving progression-free survival. Advancements in biomarkers, genomic classifiers, and artificial intelligence-based models have enhanced risk stratification and personalized treatment planning, resulting in both treatment intensification and de-escalation. While conventional risk grouping relying on Gleason score and PSA level and kinetics remains the foundation for BCR management, PSMA-PET/CT, novel biomarkers, and artificial intelligence may enable more personalized treatment strategies.

Automated high precision PCOS detection through a segment anything model on super resolution ultrasound ovary images.

Reka S, Praba TS, Prasanna M, Reddy VNN, Amirtharajan R

PubMed | paper | May 15 2025
PCOS (Poly-Cystic Ovary Syndrome) is a multifaceted disorder that often affects the ovarian morphology of women of reproductive age, resulting in the development of numerous cysts on the ovaries. PCOS is typically diagnosed with ultrasound imaging, which helps clinicians assess the size, shape, and existence of cysts in the ovaries. Nevertheless, manual ultrasound image analysis is often challenging and time-consuming, resulting in inter-observer variability. To effectively treat PCOS and prevent its long-term effects, prompt and accurate diagnosis is crucial. In such cases, a prediction model based on deep learning can help physicians by streamlining the diagnosis procedure, reducing time and potential errors. This article proposes a novel integrated approach, QEI-SAM (Quality Enhanced Image - Segment Anything Model), for enhancing image quality and ovarian cyst segmentation for accurate prediction. GAN (Generative Adversarial Networks) and CNN (Convolutional Neural Networks) are the most recent cutting-edge innovations that have supported the system in attaining the expected result. The proposed QEI-SAM model used Enhanced Super Resolution Generative Adversarial Networks (ESRGAN) for image enhancement to increase the resolution, sharpen the edges, and restore the finer structure of the ultrasound ovary images, and achieved an SSIM of 0.938, a PSNR value of 38.60, and an LPIPS value of 0.0859. Then, it incorporates the Segment Anything Model (SAM) to segment ovarian cysts, achieving the highest Dice coefficient of 0.9501 and IoU score of 0.9050. Furthermore, the convolutional neural networks ResNet 50, ResNet 101, VGG 16, VGG 19, AlexNet, and Inception v3 have been implemented to diagnose PCOS promptly. Finally, VGG 19 achieved the highest accuracy of 99.31%.
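
The Dice coefficient and IoU quoted above are both overlap measures between a predicted and a reference mask, and for a single pair of masks they are linked by Dice = 2·IoU / (1 + IoU). The sketch below computes both from masks represented as sets of pixel coordinates; that representation, the function names, and the toy masks are assumptions for illustration, and the abstract does not say how its scores were averaged across images.

```python
def dice_and_iou(pred, target):
    """Dice and IoU for two masks given as sets of (row, col) pixels."""
    if not pred and not target:
        raise ValueError("both masks are empty; overlap is undefined")
    intersection = len(pred & target)
    union = len(pred | target)
    dice = 2 * intersection / (len(pred) + len(target))
    iou = intersection / union
    return dice, iou


def dice_from_iou(iou):
    """Convert IoU to Dice for a single mask pair: 2*IoU / (1 + IoU)."""
    return 2 * iou / (1 + iou)


# Toy masks (hypothetical): 2 shared pixels, 5 pixels in the union.
pred_mask = {(0, 0), (0, 1), (1, 0), (1, 1)}
true_mask = {(0, 1), (1, 1), (1, 2)}
dice, iou = dice_and_iou(pred_mask, true_mask)
print(f"Dice = {dice:.4f}, IoU = {iou:.4f}")
print(f"Dice implied by IoU 0.9050: {dice_from_iou(0.9050):.4f}")
```

Applying the conversion to the reported IoU of 0.9050 gives a Dice of about 0.9501, so the two reported segmentation scores are consistent with each other under this identity.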

MRI-derived deep learning models for predicting 1p/19q codeletion status in glioma patients: a systematic review and meta-analysis of diagnostic test accuracy studies.

Ahmadzadeh AM, Broomand Lomer N, Ashoobi MA, Elyassirad D, Gheiji B, Vatanparast M, Rostami A, Abouei Mehrizi MA, Tabari A, Bathla G, Faghani S

PubMed | paper | May 15 2025
We conducted a systematic review and meta-analysis to evaluate the performance of magnetic resonance imaging (MRI)-derived deep learning (DL) models in predicting 1p/19q codeletion status in glioma patients. The literature search was performed in four databases: PubMed, Web of Science, Embase, and Scopus. We included the studies that evaluated the performance of end-to-end DL models in predicting the status of glioma 1p/19q codeletion. The quality of the included studies was assessed by the Quality Assessment of Diagnostic Accuracy Studies-2 (QUADAS-2) and the METhodological RadiomICs Score (METRICS). We calculated diagnostic pooled estimates, and heterogeneity was evaluated using I². Subgroup analysis and sensitivity analysis were conducted to explore sources of heterogeneity. Publication bias was evaluated by Deeks' funnel plots. Twenty studies were included in the systematic review. Only two studies were of low quality. A meta-analysis of the ten studies demonstrated a pooled sensitivity of 0.77 (95% CI: 0.63-0.87), a specificity of 0.85 (95% CI: 0.74-0.92), a positive diagnostic likelihood ratio (DLR) of 5.34 (95% CI: 2.88-9.89), a negative DLR of 0.26 (95% CI: 0.16-0.45), a diagnostic odds ratio of 20.24 (95% CI: 8.19-50.02), and an area under the curve of 0.89 (95% CI: 0.86-0.91). The subgroup analysis identified a significant difference between groups depending on the segmentation method used. DL models can predict glioma 1p/19q codeletion status with high accuracy and may enhance non-invasive tumor characterization and aid in the selection of optimal therapeutic strategies.
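
The likelihood ratios and diagnostic odds ratio listed above follow from sensitivity and specificity through standard definitions: LR+ = sens / (1 − spec), LR− = (1 − sens) / spec, and DOR = LR+ / LR−. The sketch below applies those definitions to the pooled point estimates; the function name is hypothetical, and this is not the pooling model used in the meta-analysis.

```python
def diagnostic_ratios(sens, spec):
    """Return (LR+, LR-, DOR) from a sensitivity/specificity pair."""
    if not (0 < sens < 1 and 0 < spec < 1):
        raise ValueError("sens and spec must lie strictly between 0 and 1")
    lr_pos = sens / (1 - spec)    # how much a positive test raises the odds
    lr_neg = (1 - sens) / spec    # how much a negative test lowers the odds
    dor = lr_pos / lr_neg         # diagnostic odds ratio
    return lr_pos, lr_neg, dor


# Pooled point estimates quoted in the abstract.
lr_pos, lr_neg, dor = diagnostic_ratios(0.77, 0.85)
print(f"LR+ = {lr_pos:.2f}, LR- = {lr_neg:.2f}, DOR = {dor:.2f}")
```

Plugging in the pooled sensitivity and specificity gives roughly 5.13, 0.27, and 18.97, close to but not identical with the reported 5.34, 0.26, and 20.24. A gap of this size is expected if each pooled ratio was estimated from the study-level data rather than derived from the pooled sensitivity and specificity, which is presumably the case here.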

A computed tomography-based radiomics prediction model for BRAF mutation status in colorectal cancer.

Zhou B, Tan H, Wang Y, Huang B, Wang Z, Zhang S, Zhu X, Wang Z, Zhou J, Cao Y

PubMed | paper | May 15 2025
The aim of this study was to develop and validate CT venous phase image-based radiomics to predict BRAF gene mutation status in preoperative colorectal cancer patients. In this study, 301 patients with pathologically confirmed colorectal cancer were retrospectively enrolled, comprising 225 from Centre I (73 mutant and 152 wild-type) and 76 from Centre II (36 mutant and 40 wild-type). The Centre I cohort was randomly divided into a training set (n = 158) and an internal validation set (n = 67) in a 7:3 ratio, while Centre II served as an independent external validation set (n = 76). The whole tumor region of interest was segmented, and radiomics characteristics were extracted. To explore whether expanding the tumor region could improve model performance, the tumor contour was extended by 3 mm. Finally, a t-test, Pearson correlation, and LASSO regression were used to select features strongly associated with BRAF mutations. Based on these features, six classifiers were constructed: Support Vector Machine (SVM), Decision Tree (DT), Random Forest (RF), Logistic Regression (LR), K-Nearest Neighbors (KNN), and Extreme Gradient Boosting (XGBoost). The model performance and clinical utility were evaluated using receiver operating characteristic (ROC) curves, decision curve analysis, accuracy, sensitivity, and specificity. Gender was an independent predictor of BRAF mutations. The unexpanded RF model, constructed using 11 imaging histologic features, demonstrated the best predictive performance. For the training cohort, it achieved an AUC of 0.814 (95% CI 0.732-0.895), an accuracy of 0.810, and a sensitivity of 0.620. For the internal validation cohort, it achieved an AUC of 0.798 (95% CI 0.690-0.907), an accuracy of 0.761, and a sensitivity of 0.609. For the external validation cohort, it achieved an AUC of 0.737 (95% CI 0.616-0.847), an accuracy of 0.658, and a sensitivity of 0.667. A machine learning model based on CT radiomics can effectively predict BRAF mutations in patients with colorectal cancer. The unexpanded RF model demonstrated optimal predictive performance.

Deep learning MRI-based radiomic models for predicting recurrence in locally advanced nasopharyngeal carcinoma after neoadjuvant chemoradiotherapy: a multi-center study.

Hu C, Xu C, Chen J, Huang Y, Meng Q, Lin Z, Huang X, Chen L

PubMed | paper | May 15 2025
Local recurrence and distant metastasis are common manifestations of locoregionally advanced nasopharyngeal carcinoma (LA-NPC) after neoadjuvant chemoradiotherapy (NACT). This study aimed to validate the clinical value of deep learning-based MRI radiomic models for predicting recurrence in LA-NPC patients. A total of 328 NPC patients from four hospitals were retrospectively included and randomly divided into training (n = 229) and validation (n = 99) cohorts. A total of 975 traditional radiomic features and 1000 deep radiomic features were extracted from contrast-enhanced T1-weighted (T1WI + C) and T2-weighted (T2WI) sequences, respectively. Least absolute shrinkage and selection operator (LASSO) was applied for feature selection. Five machine learning classifiers were used to develop three models for LA-NPC prediction in the training cohort: Model I, based on traditional radiomic features; Model II, combining the deep radiomic features with Model I; and Model III, combining Model II with clinical features. The predictive performance of these models was evaluated by receiver operating characteristic (ROC) curve analysis, area under the curve (AUC), accuracy, sensitivity, and specificity in both cohorts. The clinical characteristics of the two cohorts showed no significant differences. Fifteen radiomic features and 6 deep radiomic features were selected from T1WI + C, and 9 radiomic features and 6 deep radiomic features were selected from T2WI. On T2WI, Model II based on random forest (RF) (AUC = 0.87) performed best among the models in the validation cohort. The traditional radiomic model combined with deep radiomic features shows excellent predictive performance and could assist clinicians in predicting the curative effect for LA-NPC patients after NACT.

On the Interplay of Human-AI Alignment, Fairness, and Performance Trade-offs in Medical Imaging

Haozhe Luo, Ziyu Zhou, Zixin Shu, Aurélie Pahud de Mortanges, Robert Berke, Mauricio Reyes

arXiv | preprint | May 15 2025
Deep neural networks excel in medical imaging but remain prone to biases, leading to fairness gaps across demographic groups. We provide the first systematic exploration of Human-AI alignment and fairness in this domain. Our results show that incorporating human insights consistently reduces fairness gaps and enhances out-of-domain generalization, though excessive alignment can introduce performance trade-offs, emphasizing the need for calibrated strategies. These findings highlight Human-AI alignment as a promising approach for developing fair, robust, and generalizable medical AI systems, striking a balance between expert guidance and automated efficiency. Our code is available at https://github.com/Roypic/Aligner.