
Performance of GPT-4 Turbo and GPT-4o in Korean Society of Radiology In-Training Examinations.

Choi A, Kim HG, Choi MH, Ramasamy SK, Kim Y, Jung SE

PubMed · Jun 1, 2025
Despite the potential of large language models for radiology training, their ability to handle image-based radiological questions remains poorly understood. This study aimed to evaluate the performance of GPT-4 Turbo and GPT-4o on radiology resident examinations, to analyze differences across question types, and to compare their results with those of residents at different levels. A total of 776 multiple-choice questions from the Korean Society of Radiology In-Training Examinations were used, forming two question sets: one originally written in Korean and the other translated into English. We evaluated the performance of GPT-4 Turbo (gpt-4-turbo-2024-04-09) and GPT-4o (gpt-4o-2024-11-20) on these questions with the temperature set to zero, determining accuracy from the majority vote of five independent trials. We analyzed the results by question type (text-only vs. image-based) and benchmarked them against nationwide radiology residents' performance. The impact of the input language (Korean or English) on model performance was also examined. GPT-4o outperformed GPT-4 Turbo on both image-based (48.2% vs. 41.8%, P = 0.002) and text-only questions (77.9% vs. 69.0%, P = 0.031). On image-based questions, GPT-4 Turbo and GPT-4o performed comparably to 1st-year residents (41.8% and 48.2%, respectively, vs. 43.3%; P = 0.608 and 0.079, respectively) but worse than 2nd- to 4th-year residents (vs. 56.0%-63.9%, all P ≤ 0.005). For text-only questions, GPT-4 Turbo and GPT-4o performed better than residents across all years (69.0% and 77.9%, respectively, vs. 44.7%-57.5%, all P ≤ 0.039). Performance on the English- and Korean-version questions showed no significant differences for either model (all P ≥ 0.275). GPT-4o outperformed GPT-4 Turbo on both question types. On image-based questions, both models matched the performance of 1st-year residents but fell below that of higher-year residents. Both models outperformed residents on text-only questions and showed consistent performance across English and Korean inputs.
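
The evaluation protocol in this abstract (temperature fixed at zero, accuracy taken as the majority vote of five independent trials) can be sketched with the OpenAI Python SDK. The prompt wording, answer format, and helper names below are illustrative assumptions, not the authors' exact setup.

```python
# Minimal sketch of a temperature-0, five-trial majority-vote evaluation,
# assuming the OpenAI Python SDK (>= 1.0) and a text-only multiple-choice question.
from collections import Counter
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def answer_once(question: str, model: str = "gpt-4o-2024-11-20") -> str:
    """Ask the model for a single answer choice (A-E) at temperature 0."""
    response = client.chat.completions.create(
        model=model,
        temperature=0,
        messages=[
            {"role": "system",
             "content": "Answer the multiple-choice question with a single letter (A-E) only."},
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content.strip()[:1].upper()

def majority_vote_answer(question: str, n_trials: int = 5) -> str:
    """Accuracy in the study was based on the majority vote of five independent trials."""
    votes = Counter(answer_once(question) for _ in range(n_trials))
    return votes.most_common(1)[0][0]
```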

Leveraging GPT-4 enables patient comprehension of radiology reports.

van Driel MHE, Blok N, van den Brand JAJG, van de Sande D, de Vries M, Eijlers B, Smits F, Visser JJ, Gommers D, Verhoef C, van Genderen ME, Grünhagen DJ, Hilling DE

PubMed · Jun 1, 2025
To assess the feasibility of using GPT-4 to simplify radiology reports into B1-level Dutch for enhanced patient comprehension. This study utilised GPT-4, optimised through prompt engineering in Microsoft Azure. The researchers iteratively refined prompts to ensure accurate and comprehensive translations of radiology reports. Two radiologists assessed the simplified outputs for accuracy, completeness, and patient suitability. A third radiologist independently validated the final versions. Twelve colorectal cancer patients were recruited from two hospitals in the Netherlands. Semi-structured interviews were conducted to evaluate patients' comprehension of and satisfaction with the AI-generated reports. The optimised GPT-4 tool produced simplified reports with high accuracy (mean score 3.33/4). Patient comprehension improved significantly from 2.00 (original reports) to 3.28 (simplified reports) and 3.50 (summaries). Correct classification of report outcomes increased from 63.9% to 83.3%. Patient satisfaction was high (mean 8.30/10), with most patients preferring the long simplified report. The resulting tool, RADiANT, successfully enhances patient understanding and satisfaction through automated AI-driven report simplification, offering a scalable solution for patient-centred communication in clinical practice. It reduces clinician workload and supports informed patient decision-making, demonstrating the potential of LLMs beyond English-based healthcare contexts.
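
The report-simplification step described here (GPT-4 accessed through Microsoft Azure with an engineered prompt targeting B1-level Dutch) could look roughly like the sketch below. The endpoint, deployment name, API version, and prompt text are placeholders, not the study's actual values.

```python
# Hedged sketch of prompt-engineered report simplification via Azure OpenAI;
# all configuration values and the system prompt are illustrative placeholders.
import os
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-02-01",
)

SYSTEM_PROMPT = (
    "Rewrite the following radiology report in simple Dutch at CEFR B1 level. "
    "Keep every medical finding; do not omit or add information."
)  # placeholder prompt; the study iteratively refined its own prompts

def simplify_report(report_text: str, deployment: str = "gpt-4") -> str:
    response = client.chat.completions.create(
        model=deployment,  # Azure deployment name (assumed)
        temperature=0,
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": report_text},
        ],
    )
    return response.choices[0].message.content
```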

Radiomics across modalities: a comprehensive review of neurodegenerative diseases.

Inglese M, Conti A, Toschi N

PubMed · Jun 1, 2025
Radiomics allows the extraction of quantitative features from medical images that can reveal tissue patterns generally invisible to human observers. Despite the challenges in visually interpreting radiomic features and the computational resources required to generate them, they hold significant value in downstream automated processing. For instance, in statistical or machine learning frameworks, radiomic features enhance sensitivity and specificity, making them indispensable for tasks such as diagnosis, prognosis, prediction, monitoring, image-guided interventions, and evaluating therapeutic responses. This review explores the application of radiomics in neurodegenerative diseases, with a focus on Alzheimer's disease, Parkinson's disease, Huntington's disease, and multiple sclerosis. While the radiomics literature often focuses on magnetic resonance imaging (MRI) and computed tomography (CT), this review also covers its broader application in nuclear medicine, with use cases of positron emission tomography (PET) and single-photon emission computed tomography (SPECT) radiomics. Additionally, we review integrated radiomics, where features from multiple imaging modalities are fused to improve model performance. This review also highlights the growing integration of radiomics with artificial intelligence and the need for feature standardisation and reproducibility to facilitate its translation into clinical practice.
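
For readers unfamiliar with radiomic feature extraction, the open-source pyradiomics package illustrates the basic workflow referenced throughout this review: an image and a segmentation mask go in, and first-order, shape, and texture features come out. The file names and settings below are placeholders.

```python
# Minimal radiomic feature extraction for one image/mask pair with pyradiomics;
# file paths and the "enable everything" configuration are illustrative.
from radiomics import featureextractor

extractor = featureextractor.RadiomicsFeatureExtractor()
extractor.enableAllFeatures()  # first-order, shape, and texture feature classes

# image and segmentation mask in any SimpleITK-readable format (e.g., NIfTI)
features = extractor.execute("subject01_T1.nii.gz", "subject01_hippocampus_mask.nii.gz")

for name, value in features.items():
    if not name.startswith("diagnostics_"):  # skip metadata entries
        print(name, value)
```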

An explainable adaptive channel weighting-based deep convolutional neural network for classifying renal disorders in computed tomography images.

Loganathan G, Palanivelan M

PubMed · Jun 1, 2025
Renal disorders are a significant public health concern and a cause of mortality related to renal failure. Manual diagnosis is subjective, labor-intensive, and depends on the expertise of nephrologists in renal anatomy. To improve workflow efficiency and enhance diagnostic accuracy, we propose an automated deep learning model, called EACWNet, which incorporates an adaptive channel weighting-based deep convolutional neural network and explainable artificial intelligence. The proposed model categorizes renal computed tomography images into classes such as cyst, normal, tumor, and stone. The adaptive channel weighting module uses both global and local contextual information to refine the channel weights of the final feature map by integrating a scale-adaptive channel attention module into the higher convolutional blocks of the VGG-19 backbone. The efficacy of the EACWNet model has been assessed on a publicly available renal CT image dataset, attaining an accuracy of 98.87% and demonstrating a 1.75% improvement over the backbone model. However, the model exhibits class-wise precision variation, achieving higher precision for the cyst, normal, and tumor classes but lower precision for the stone class owing to its inherent variability and heterogeneity. Furthermore, the model predictions were analyzed with an explainable artificial intelligence method, local interpretable model-agnostic explanations (LIME), to better visualize and understand the model's decisions.
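
The exact scale-adaptive channel attention module is not described in this abstract, but the general idea of learned channel weighting on a VGG-19 backbone can be sketched with a generic squeeze-and-excitation-style block; the code below is an illustrative analogue, not the authors' EACWNet implementation.

```python
# Generic channel-attention block on a VGG-19 feature extractor (PyTorch);
# shown only to illustrate learned channel weighting, not the paper's module.
import torch
import torch.nn as nn
from torchvision import models

class ChannelAttention(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid(),
        )

    def forward(self, x):                       # x: (B, C, H, W)
        w = self.fc(x.mean(dim=(2, 3)))         # global average pool -> per-channel weights
        return x * w[:, :, None, None]          # reweight each feature-map channel

class VGG19WithChannelAttention(nn.Module):
    def __init__(self, num_classes: int = 4):   # cyst, normal, tumor, stone
        super().__init__()
        self.backbone = models.vgg19(weights=models.VGG19_Weights.IMAGENET1K_V1).features
        self.attention = ChannelAttention(512)  # VGG-19 feature maps end with 512 channels
        self.head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                  nn.Linear(512, num_classes))

    def forward(self, x):
        return self.head(self.attention(self.backbone(x)))
```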

Tailoring ventilation and respiratory management in pediatric critical care: optimizing care with precision medicine.

Beauchamp FO, Thériault J, Sauthier M

PubMed · Jun 1, 2025
Critically ill children admitted to the intensive care unit frequently need respiratory care to support lung function. Mechanical ventilation is a complex field with multiple parameters to set. The development of precision medicine will allow clinicians to personalize respiratory care and improve patient outcomes. Lung and diaphragmatic ultrasound, electrical impedance tomography, neurally adjusted ventilatory assist ventilation, as well as the use of monitoring data in machine learning models, are increasingly used to tailor care. Each modality offers insights into a different aspect of the patient's respiratory function and enables treatment to be adjusted to better support the patient's physiology. Precision medicine in respiratory care has been associated with decreased ventilation time, increased extubation and ventilation weaning success, and an improved ability to identify phenotypes to guide treatment and predict outcomes. This review focuses on the use of precision medicine in pediatric acute respiratory distress syndrome, asthma, bronchiolitis, extubation readiness trials and ventilation weaning, ventilator-associated pneumonia, and other respiratory tract infections. Precision medicine is revolutionizing respiratory care and will decrease complications associated with ventilation. More research is needed to standardize its use and better evaluate its impact on patient outcomes.

Broadening the Net: Overcoming Challenges and Embracing Novel Technologies in Lung Cancer Screening.

Czerlanis CM, Singh N, Fintelmann FJ, Damaraju V, Chang AEB, White M, Hanna N

PubMed · Jun 1, 2025
Lung cancer is one of the leading causes of cancer-related mortality worldwide, with most cases diagnosed at advanced stages where curative treatment options are limited. Low-dose computed tomography (LDCT) for lung cancer screening (LCS) of individuals selected based on age and smoking history has shown a significant reduction in lung cancer-specific mortality. The number needed to screen to prevent one death from lung cancer is lower than that for breast cancer, cervical cancer, and colorectal cancer. Despite the substantial impact on reducing lung cancer-related mortality and proof that LCS with LDCT is effective, uptake of LCS has been low and LCS eligibility criteria remain imperfect. While LCS programs have historically faced patient recruitment challenges, research suggests that there are novel opportunities to both identify and improve screening for at-risk populations. In this review, we discuss the global obstacles to implementing LCS programs and strategies to overcome barriers in resource-limited settings. We explore successful approaches to promote LCS through robust engagement with community partners. Finally, we examine opportunities to enhance LCS in at-risk populations not captured by current eligibility criteria, including never smokers and individuals with a family history of lung cancer, with a focus on early detection through novel artificial intelligence technologies.

Artificial intelligence in pediatric osteopenia diagnosis: evaluating deep network classification and model interpretability using wrist X-rays.

Harris CE, Liu L, Almeida L, Kassick C, Makrogiannis S

PubMed · Jun 1, 2025
Osteopenia is a bone disorder that causes low bone density and affects millions of people worldwide. Diagnosis of this condition is commonly achieved through clinical assessment of bone mineral density (BMD). State-of-the-art machine learning (ML) techniques, such as convolutional neural networks (CNNs) and transformer models, have gained increasing popularity in medicine. In this work, we employ six deep networks for osteopenia vs. healthy bone classification using X-ray images from the pediatric wrist dataset GRAZPEDWRI-DX. We apply two explainable AI techniques to analyze and interpret visual explanations for network decisions. Experimental results show that deep networks are able to effectively learn osteopenic and healthy bone features, achieving high classification accuracy. Among the six evaluated networks, DenseNet201 with transfer learning yielded the top classification accuracy of 95.2%. Furthermore, visual explanations of CNN decisions provide valuable insight into the black-box inner workings and present interpretable results. Our evaluation highlights the capability of deep networks to accurately differentiate between osteopenic and healthy bones in pediatric wrist X-rays. The combination of high classification accuracy and interpretable visual explanations underscores the promise of incorporating machine learning techniques into clinical workflows for the early and accurate diagnosis of osteopenia.
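
The best-performing configuration reported here, DenseNet201 with transfer learning, corresponds to a standard torchvision recipe: load ImageNet weights and replace the classifier head for the two-class task. The sketch below assumes this generic setup; preprocessing, optimizer, and training schedule are not taken from the paper.

```python
# Hedged sketch of a DenseNet-201 transfer-learning classifier for
# osteopenia vs. healthy bone; training details are assumptions.
import torch.nn as nn
from torchvision import models

def build_osteopenia_classifier(num_classes: int = 2) -> nn.Module:
    model = models.densenet201(weights=models.DenseNet201_Weights.IMAGENET1K_V1)
    in_features = model.classifier.in_features      # 1920 for DenseNet-201
    model.classifier = nn.Linear(in_features, num_classes)
    return model

# A typical transfer-learning choice: freeze the convolutional features initially
model = build_osteopenia_classifier()
for param in model.features.parameters():
    param.requires_grad = False
```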

Toward Noninvasive High-Resolution In Vivo pH Mapping in Brain Tumors by ³¹P-Informed deepCEST MRI.

Schüre JR, Rajput J, Shrestha M, Deichmann R, Hattingen E, Maier A, Nagel AM, Dörfler A, Steidl E, Zaiss M

PubMed · Jun 1, 2025
The intracellular pH (pHᵢ) is critical for understanding various pathologies, including brain tumors. While conventional pHᵢ measurement through ³¹P-MRS suffers from low spatial resolution and long scan times, ¹H-based APT-CEST imaging offers higher resolution with shorter scan times. This study aims to directly predict ³¹P-pHᵢ maps from CEST data using a fully connected neural network. Fifteen tumor patients were scanned on a 3-T Siemens PRISMA scanner and underwent ¹H-based CEST and T1 measurements as well as ³¹P-MRS. A neural network was trained voxel-wise on CEST and T1 data to predict ³¹P-pHᵢ values, using data from 11 patients for training and 4 for testing. The predicted pHᵢ maps were additionally down-sampled to the original ³¹P-pHᵢ resolution to allow calculation of the RMSE and analysis of the correlation, while the higher-resolution predictions were compared with conventional CEST metrics. The results demonstrated a general correspondence between the predicted deepCEST pHᵢ maps and the measured ³¹P-pHᵢ in test patients. However, slight discrepancies were also observed, with an RMSE of 0.04 pH units in tumor regions. High-resolution predictions revealed tumor heterogeneity and features not visible in conventional CEST data, suggesting that the model captures unique pH information and is not simply a T1 segmentation. The deepCEST pHᵢ neural network reveals the pH sensitivity hidden in APT-CEST data and offers pHᵢ maps with higher spatial resolution in a shorter scan time than ³¹P-MRS. Although this approach is constrained by the limitations of the acquired data, it can be extended with additional CEST features in future studies, offering a promising approach for 3D pH imaging in a clinical environment.
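
The model described is a voxel-wise fully connected regressor from CEST and T1 inputs to a single pH value. A minimal PyTorch sketch of such a network is shown below; the number of input features and the layer sizes are assumptions, since the paper's exact architecture is not given in this abstract.

```python
# Minimal voxel-wise regression network mapping per-voxel CEST features plus T1
# to an intracellular pH value; architecture sizes are placeholders.
import torch
import torch.nn as nn

class DeepCESTpH(nn.Module):
    def __init__(self, n_features: int = 57):        # e.g., CEST spectrum points + T1 (assumed)
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, 128), nn.ReLU(),
            nn.Linear(128, 64), nn.ReLU(),
            nn.Linear(64, 1),                         # predicted intracellular pH
        )

    def forward(self, x):                             # x: (n_voxels, n_features)
        return self.net(x).squeeze(-1)

model = DeepCESTpH()
loss_fn = nn.MSELoss()                                # trained voxel-wise against 31P-pHi targets
```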

Large Language Models for Diagnosing Focal Liver Lesions From CT/MRI Reports: A Comparative Study With Radiologists.

Sheng L, Chen Y, Wei H, Che F, Wu Y, Qin Q, Yang C, Wang Y, Peng J, Bashir MR, Ronot M, Song B, Jiang H

PubMed · Jun 1, 2025
Whether large language models (LLMs) can be integrated into the diagnostic workflow for focal liver lesions (FLLs) remains unclear. We aimed to investigate the diagnostic accuracy of two general-purpose LLMs (ChatGPT-4o and Gemini) based on CT/MRI reports, compared with and combined with radiologists of different experience levels. From April 2022 to April 2024, this single-center retrospective study included consecutive adult patients who underwent contrast-enhanced CT/MRI for a single FLL and subsequent histopathologic examination. The LLMs were prompted with clinical information and the "findings" section of the radiology report three times to provide differential diagnoses in descending order of likelihood, with the first considered the final diagnosis. In the research setting, six radiologists (three junior and three middle-level) independently reviewed the CT/MRI images and clinical information in two rounds (first alone, then with LLM assistance). In the clinical setting, diagnoses were retrieved from the "impressions" section of the radiology reports. Diagnostic accuracy was assessed against histopathology. A total of 228 patients (median age, 59 years; 155 males) with 228 FLLs (median size, 3.6 cm) were included. For the final diagnosis, the accuracy of two-step ChatGPT-4o (78.9%) was higher than that of single-step ChatGPT-4o (68.0%, p < 0.001) and single-step Gemini (73.2%, p = 0.004), similar to real-world radiology reports (80.0%, p = 0.34) and junior radiologists (78.9%-82.0%; p-values, 0.21 to > 0.99), but lower than that of middle-level radiologists (84.6%-85.5%; p-values, 0.001 to 0.02). No incremental diagnostic value of ChatGPT-4o was observed for any radiologist (p-values, 0.63 to > 0.99). Two-step ChatGPT-4o matched the accuracy of real-world radiology reports and junior radiologists in diagnosing FLLs but was less accurate than middle-level radiologists and added little incremental diagnostic value.
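
The prompting protocol (clinical information plus the report "findings" section, with differential diagnoses returned in descending order of likelihood and the first taken as final) and the accuracy calculation against histopathology can be sketched as below; the prompt wording and parsing are illustrative, not the study's exact text.

```python
# Hedged sketch: assemble the prompt described in the abstract and score
# top-1 (final-diagnosis) accuracy against histopathologic reference labels.
def build_prompt(clinical_info: str, findings: str) -> str:
    return (
        "Clinical information:\n" + clinical_info + "\n\n"
        "CT/MRI findings:\n" + findings + "\n\n"
        "List the most likely diagnoses for this focal liver lesion in descending "
        "order of likelihood, one per line."
    )

def top1_accuracy(predicted_rankings: list[list[str]], histopathology: list[str]) -> float:
    """Diagnostic accuracy: first-listed diagnosis vs. histopathologic reference."""
    correct = sum(
        ranking[0].strip().lower() == truth.strip().lower()
        for ranking, truth in zip(predicted_rankings, histopathology)
    )
    return correct / len(histopathology)
```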

Evaluation of large language models in generating pulmonary nodule follow-up recommendations.

Wen J, Huang W, Yan H, Sun J, Dong M, Li C, Qin J

PubMed · Jun 1, 2025
To evaluate the performance of large language models (LLMs) in generating clinical follow-up recommendations for pulmonary nodules by leveraging radiological report findings and management guidelines. This retrospective study included CT follow-up reports of pulmonary nodules documented by senior radiologists from September 1, 2023, to April 30, 2024. An additional sixty reports were collected for prompt engineering based on few-shot learning and chain-of-thought methodology. Radiological findings of pulmonary nodules, along with the final prompt, were input into GPT-4o-mini or ERNIE-4.0-Turbo-8K to generate follow-up recommendations. The AI-generated recommendations were evaluated against radiologist-defined, guideline-based standards through binary classification, assessing nodule risk classification, follow-up intervals, and harmfulness. Performance metrics included sensitivity, specificity, positive/negative predictive values, and F1 score. On 1009 reports from 996 patients (median age, 50.0 years; IQR, 39.0-60.0 years; 511 male patients), ERNIE-4.0-Turbo-8K and GPT-4o-mini demonstrated comparable performance in both the accuracy of follow-up recommendations (94.6% vs 92.8%, P = 0.07) and harmfulness rates (2.9% vs 3.5%, P = 0.48). In nodule classification, ERNIE-4.0-Turbo-8K and GPT-4o-mini performed similarly, with accuracy of 99.8% vs 99.9%, sensitivity of 96.9% vs 100.0%, specificity of 99.9% vs 99.9%, positive predictive value of 96.9% vs 96.9%, negative predictive value of 100.0% vs 99.9%, and F1 score of 96.9% vs 98.4%, respectively. LLMs show promise in providing guideline-based follow-up recommendations for pulmonary nodules but require rigorous validation and supervision to mitigate potential clinical risks. This study offers insights into their potential role in automated radiological decision support.
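
The reported performance metrics follow directly from a 2x2 confusion matrix. The helper below computes sensitivity, specificity, positive and negative predictive values, F1 score, and accuracy from binary labels; the label encoding (1 = guideline-concordant recommendation) is an assumption for illustration.

```python
# Compute the study's binary-classification metrics from a confusion matrix;
# the 0/1 label convention here is assumed for illustration.
from sklearn.metrics import confusion_matrix

def binary_metrics(y_true, y_pred) -> dict:
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)
    return {"sensitivity": sensitivity, "specificity": specificity,
            "ppv": ppv, "npv": npv, "f1": f1,
            "accuracy": (tp + tn) / (tp + tn + fp + fn)}
```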