
Leveraging GPT-4 enables patient comprehension of radiology reports.

van Driel MHE, Blok N, van den Brand JAJG, van de Sande D, de Vries M, Eijlers B, Smits F, Visser JJ, Gommers D, Verhoef C, van Genderen ME, Grünhagen DJ, Hilling DE

PubMed · Jun 1, 2025
To assess the feasibility of using GPT-4 to simplify radiology reports into B1-level Dutch for enhanced patient comprehension. This study utilised GPT-4, optimised through prompt engineering in Microsoft Azure. The researchers iteratively refined prompts to ensure accurate and comprehensive translations of radiology reports. Two radiologists assessed the simplified outputs for accuracy, completeness, and patient suitability. A third radiologist independently validated the final versions. Twelve colorectal cancer patients were recruited from two hospitals in the Netherlands. Semi-structured interviews were conducted to evaluate patients' comprehension of and satisfaction with the AI-generated reports. The optimised GPT-4 tool, RADiANT, produced simplified reports with high accuracy (mean score 3.33/4). Patient comprehension improved significantly from 2.00 (original reports) to 3.28 (simplified reports) and 3.50 (summaries). Correct classification of report outcomes increased from 63.9% to 83.3%. Patient satisfaction was high (mean 8.30/10), with most patients preferring the long simplified report. RADiANT successfully enhances patient understanding and satisfaction through automated AI-driven report simplification, offering a scalable solution for patient-centred communication in clinical practice. This tool reduces clinician workload and supports informed patient decision-making, demonstrating the potential of LLMs beyond English-based healthcare contexts.
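A minimal sketch of the kind of prompt-engineered simplification pipeline described above, using the Azure OpenAI Python SDK. The endpoint, deployment name, and prompt wording are hypothetical stand-ins; the study's iteratively refined, radiologist-validated prompts are not reproduced here.

```python
from openai import AzureOpenAI

# Hypothetical credentials and deployment; the study ran GPT-4 in Microsoft Azure.
client = AzureOpenAI(
    api_key="YOUR_KEY",
    api_version="2024-02-01",
    azure_endpoint="https://example-resource.openai.azure.com",
)

# Illustrative instruction only; the actual study prompts were iteratively
# refined and reviewed by radiologists.
SYSTEM_PROMPT = (
    "Rewrite the radiology report below in B1-level Dutch for a patient "
    "without medical training. Do not omit findings and do not add "
    "information that is not in the report."
)

def simplify_report(report_text: str) -> str:
    """Return a patient-friendly B1-level Dutch version of a radiology report."""
    response = client.chat.completions.create(
        model="gpt-4",  # name of the Azure deployment
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": report_text},
        ],
        temperature=0.2,  # low temperature to limit paraphrase drift
    )
    return response.choices[0].message.content
```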

PRECISE framework: Enhanced radiology reporting with GPT for improved readability, reliability, and patient-centered care.

Tripathi S, Mutter L, Muppuri M, Dheer S, Garza-Frias E, Awan K, Jha A, Dezube M, Tabari A, Bizzo BC, Dreyer KJ, Bridge CP, Daye D

PubMed · Jun 1, 2025
The PRECISE framework, defined as Patient-Focused Radiology Reports with Enhanced Clarity and Informative Summaries for Effective Communication, leverages GPT-4 to create patient-friendly summaries of radiology reports at a sixth-grade reading level. The purpose of the study was to evaluate the effectiveness of the PRECISE framework in improving the readability, reliability, and understandability of radiology reports. We hypothesized that the PRECISE framework improves the readability and patient understanding of radiology reports compared to the original versions. The PRECISE framework was assessed using 500 chest X-ray reports. Readability was evaluated using the Flesch Reading Ease, Gunning Fog Index, and Automated Readability Index (ARI). Reliability was gauged by clinical volunteers, while understandability was assessed by non-medical volunteers. Statistical analyses including t-tests, regression analyses, and Mann-Whitney U tests were conducted to determine the significance of the differences in readability scores between the original and PRECISE-generated reports. Readability scores improved significantly, with the mean Flesch Reading Ease score increasing from 38.28 to 80.82 (p < 0.001), the Gunning Fog Index decreasing from 13.04 to 6.99 (p < 0.001), and the ARI score improving from 13.33 to 5.86 (p < 0.001). Clinical volunteer assessments found 95% of the summaries reliable, and non-medical volunteers rated 97% of the PRECISE-generated summaries as fully understandable. The PRECISE approach thus shows promise for enhancing patient understanding and communication without adding significant burden to radiologists. With reliable, patient-friendly summaries, it can foster patient engagement and informed decision-making, representing a pivotal step towards more inclusive, patient-centric care delivery.
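The three readability metrics are standard formula-based scores and straightforward to reproduce. The sketch below uses the textstat package, one common implementation (the paper does not specify which was used); the example sentences are invented placeholders.

```python
import textstat

# Invented placeholder texts standing in for an original report and its summary.
original = (
    "The cardiomediastinal silhouette demonstrates no acute abnormality; "
    "pulmonary vasculature is unremarkable."
)
simplified = "Your heart and lungs look normal. We found no problems."

for label, text in [("original", original), ("PRECISE", simplified)]:
    fre = textstat.flesch_reading_ease(text)          # higher = easier, ~0-100
    fog = textstat.gunning_fog(text)                  # approx. years of schooling
    ari = textstat.automated_readability_index(text)  # approx. US grade level
    print(f"{label:>9}: FRE={fre:.1f}  Fog={fog:.1f}  ARI={ari:.1f}")
```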

Evaluating artificial intelligence chatbots for patient education in oral and maxillofacial radiology.

Helvacioglu-Yigit D, Demirturk H, Ali K, Tamimi D, Koenig L, Almashraqi A

PubMed · Jun 1, 2025
This study aimed to compare the quality and readability of the responses generated by 3 publicly available artificial intelligence (AI) chatbots in answering frequently asked questions (FAQs) related to Oral and Maxillofacial Radiology (OMR), to assess their suitability for patient education. Fifteen OMR-related questions were selected from professional patient information websites. These questions were posed to ChatGPT-3.5 by OpenAI, Gemini 1.5 Pro by Google, and Copilot by Microsoft to generate responses. Three board-certified OMR specialists evaluated the responses regarding scientific adequacy, ease of understanding, and overall reader satisfaction. Readability was assessed using the Flesch-Kincaid Grade Level (FKGL) and Flesch Reading Ease (FRE) scores. The Wilcoxon signed-rank test was conducted to compare the scores assigned by the evaluators to the responses from the chatbots and professional websites. Interevaluator agreement was examined by calculating the Fleiss kappa coefficient. There were no significant differences between groups in terms of scientific adequacy. In terms of readability, chatbots had overall mean FKGL and FRE scores of 12.97 and 34.11, respectively. Interevaluator agreement was generally high. Although the chatbots were relatively good at responding to FAQs, validating AI-generated information with input from healthcare professionals can enhance patient care and safety. The text content of both the chatbots and the professional websites, however, requires high reading levels.
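A sketch of the two statistical procedures named above, with invented scores purely for illustration (the study's actual ratings are not public): the Wilcoxon signed-rank test via SciPy and the Fleiss kappa via statsmodels.

```python
import numpy as np
from scipy.stats import wilcoxon
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

# Invented paired quality scores for the 15 FAQs (chatbot vs. website answers).
chatbot_scores = np.array([4, 3, 5, 4, 3, 4, 5, 3, 4, 4, 3, 5, 4, 3, 4])
website_scores = np.array([4, 4, 4, 5, 3, 4, 4, 4, 4, 5, 3, 4, 4, 4, 4])

stat, p = wilcoxon(chatbot_scores, website_scores)
print(f"Wilcoxon signed-rank: W={stat}, p={p:.3f}")

# Inter-evaluator agreement: rows = rated items, columns = the 3 specialists.
ratings = np.array([
    [4, 4, 5],
    [3, 3, 3],
    [5, 4, 5],
    [4, 4, 4],
    [2, 3, 2],
    [5, 5, 5],
])
table, _ = aggregate_raters(ratings)  # per-item counts for each rating category
print("Fleiss kappa:", fleiss_kappa(table, method="fleiss"))
```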

Influence of prior probability information on large language model performance in radiological diagnosis.

Fukushima T, Kurokawa R, Hagiwara A, Sonoda Y, Asari Y, Kurokawa M, Kanzawa J, Gonoi W, Abe O

PubMed · Jun 1, 2025
Large language models (LLMs) show promise in radiological diagnosis, but their performance may be affected by the context in which cases are presented. We investigated how providing information about prior probabilities influences the diagnostic performance of an LLM on radiological quiz cases. We analyzed 322 consecutive cases from Radiology's "Diagnosis Please" quiz using Claude 3.5 Sonnet under three conditions: without context (Condition 1), identified as quiz cases (Condition 2), and presented as primary care cases (Condition 3). Diagnostic accuracy was compared using McNemar's test. The overall accuracy rate improved significantly in Condition 2 compared to Condition 1 (70.2% vs. 64.9%, p = 0.029). Conversely, the accuracy rate decreased significantly in Condition 3 compared to Condition 1 (59.9% vs. 64.9%, p = 0.027). Providing information that may influence prior probabilities thus significantly affects the diagnostic performance of the LLM on radiological cases. This suggests that LLMs may incorporate Bayesian-like principles and adjust the weighting of their diagnostic responses based on prior information, highlighting the potential for optimizing LLM performance in clinical settings by providing relevant contextual information.
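McNemar's test compares paired correct/incorrect outcomes on the same cases. A minimal sketch with a hypothetical 2x2 table whose margins merely match the reported Condition 1 and Condition 2 accuracies; the discordant-pair counts are invented, so the p-value will not exactly match the reported 0.029.

```python
import numpy as np
from statsmodels.stats.contingency_tables import mcnemar

# Hypothetical paired outcomes over the 322 cases. Only the margins are
# constrained by the abstract (Condition 1: 209/322 = 64.9% correct;
# Condition 2: 226/322 = 70.2%); the off-diagonal counts are invented.
#                    Cond. 2 correct   Cond. 2 wrong
table = np.array([[195,              14],    # Cond. 1 correct
                  [ 31,              82]])   # Cond. 1 wrong

result = mcnemar(table, exact=False, correction=True)
print(f"chi2 = {result.statistic:.2f}, p = {result.pvalue:.3f}")
```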

Empowering PET imaging reporting with retrieval-augmented large language models and reading reports database: a pilot single center study.

Choi H, Lee D, Kang YK, Suh M

PubMed · Jun 1, 2025
Large language models (LLMs) show potential for enhancing a variety of natural-language tasks in clinical fields, including medical imaging reporting. This pilot study examines the efficacy of a retrieval-augmented generation (RAG) LLM system, which leverages the zero-shot learning capability of LLMs and is integrated with a comprehensive database of PET reading reports, in improving reference to prior reports and decision making. We developed a custom LLM framework with retrieval capabilities, leveraging a database of over 10 years of PET imaging reports from a single center. The system uses vector-space embedding to facilitate similarity-based retrieval. Queries prompt the system to generate context-based answers and identify similar cases or differential diagnoses. From routine clinical PET readings, experienced nuclear medicine physicians evaluated the performance of the system in terms of the relevance of queried similar cases and the appropriateness of suggested potential diagnoses. The system efficiently organized embedded vectors from PET reports: imaging reports were accurately clustered within the embedded vector space according to diagnosis or PET study type. Based on this system, a proof-of-concept chatbot was developed and showed the framework's potential in referencing reports of previous similar cases and identifying exemplary cases for various purposes. In routine clinical PET readings, 84.1% of the cases retrieved relevant similar cases, as agreed upon by all three readers. Using the RAG system, the appropriateness score of the suggested potential diagnoses was significantly better than that of the LLM without RAG. Additionally, the system demonstrated the capability to offer differential diagnoses, leveraging the vast database to enhance the completeness and precision of generated reports. The integration of a RAG LLM with a large database of PET imaging reports suggests the potential to support the clinical practice of nuclear medicine image reading through various AI tasks, including finding similar cases and deriving potential diagnoses from them. This study underscores the potential of advanced AI tools in transforming medical imaging reporting practices.
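The core retrieval step is straightforward to sketch. Below, TF-IDF vectors stand in for the learned embedding model (the paper does not disclose which embedding was used), and the miniature "reports" are invented; in a real RAG pipeline the retrieved reports would be packed into the LLM prompt as context.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Invented miniature report database; the study indexed >10 years of PET reports.
reports = [
    "FDG PET/CT: hypermetabolic right upper lobe nodule, suspicious for malignancy.",
    "FDG PET/CT: diffuse bone marrow uptake, consistent with reactive change.",
    "Amyloid PET: no significant cortical amyloid deposition.",
]

# TF-IDF stands in for the learned embedding model, purely to keep the
# sketch dependency-light.
vectorizer = TfidfVectorizer()
report_vectors = vectorizer.fit_transform(reports)

def retrieve_similar(query: str, k: int = 2):
    """Return the k most similar prior reports with their cosine scores."""
    query_vec = vectorizer.transform([query])
    scores = cosine_similarity(query_vec, report_vectors)[0]
    top = scores.argsort()[::-1][:k]
    return [(reports[i], float(scores[i])) for i in top]

# The retrieved reports would be passed to the LLM as RAG context.
for text, score in retrieve_similar("hypermetabolic pulmonary nodule on FDG PET"):
    print(f"{score:.2f}  {text}")
```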

Multi-modal large language models in radiology: principles, applications, and potential.

Shen Y, Xu Y, Ma J, Rui W, Zhao C, Heacock L, Huang C

PubMed · Jun 1, 2025
Large language models (LLMs) and multi-modal large language models (MLLMs) represent the cutting edge in artificial intelligence. This review provides a comprehensive overview of their capabilities and potential impact on radiology. Unlike most existing literature reviews focusing solely on LLMs, this work examines both LLMs and MLLMs, highlighting their potential to support radiology workflows such as report generation, image interpretation, electronic health record (EHR) summarization, differential diagnosis generation, and patient education. By streamlining these tasks, LLMs and MLLMs could reduce radiologist workload, improve diagnostic accuracy, support interdisciplinary collaboration, and ultimately enhance patient care. We also discuss key limitations, such as the limited capacity of current MLLMs to interpret 3D medical images and to integrate information from both image and text data, as well as the lack of effective evaluation methods. Ongoing efforts to address these challenges are introduced.

Evaluating Large Language Models for Enhancing Radiology Specialty Examination: A Comparative Study with Human Performance.

Liu HY, Chen SJ, Wang W, Lee CH, Hsu HH, Shen SH, Chiou HJ, Lee WJ

PubMed · May 27, 2025
The radiology specialty examination assesses clinical decision-making, image interpretation, and diagnostic reasoning. As medical knowledge expands, traditional test design faces challenges in maintaining accuracy and relevance. Large language models (LLMs) demonstrate potential in medical education. This study evaluates LLM performance on radiology specialty exams, explores their role in assessing question difficulty, and investigates their reasoning processes, aiming to develop a more objective and efficient framework for exam design. We compared the performance of LLMs and human examinees on a radiology specialty examination. Three LLMs (GPT-4o, o1-preview, and GPT-3.5-turbo-1106) were evaluated under zero-shot conditions. Exam accuracy, examinee accuracy, the discrimination index, and the point-biserial correlation were used to assess the LLMs' ability to predict question difficulty and their reasoning processes. Data provided by the Taiwan Radiological Society ensure comparability between AI and human performance. In terms of accuracy, GPT-4o (88.0%) and o1-preview (90.9%) outperformed human examinees (76.3%), whereas GPT-3.5-turbo-1106 showed significantly lower accuracy (50.2%). Question-difficulty analysis revealed that the newer LLMs excel at complex questions, while GPT-3.5-turbo-1106 exhibited greater performance variability. Discrimination index and point-biserial correlation analyses demonstrated that GPT-4o and o1-preview accurately identified key differentiating questions, closely mirroring human reasoning patterns. These findings suggest that advanced LLMs can assess medical examination difficulty, offering potential applications in exam standardization and question evaluation, and that LLMs can be utilized as tools for assessing exam question difficulty and assisting in the standardized development of medical examinations.
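The two item-analysis statistics used above are quick to compute. A sketch with synthetic examinee data (the real score distributions were not released):

```python
import numpy as np
from scipy.stats import pointbiserialr

rng = np.random.default_rng(42)

# Synthetic item-analysis data: 40 examinees with total exam scores, plus a
# 0/1 flag for whether each answered one particular question correctly.
total_scores = rng.normal(75, 10, size=40)
# Stronger examinees are made more likely to answer correctly, so the item
# should discriminate well.
item_correct = (total_scores + rng.normal(0, 8, size=40) > 75).astype(int)

# Point-biserial correlation: item correctness vs. total score.
r, p = pointbiserialr(item_correct, total_scores)
print(f"point-biserial r = {r:.2f}, p = {p:.3f}")

# Discrimination index: proportion correct in the top 25% of examinees
# minus the proportion correct in the bottom 25%.
order = np.argsort(total_scores)
n = len(order) // 4
d = item_correct[order[-n:]].mean() - item_correct[order[:n]].mean()
print(f"discrimination index D = {d:.2f}")
```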