
Exploring Radiologists' Use of AI Chatbots for Assistance in Image Interpretation: Patterns of Use and Trust Evaluation.

Alarifi M

pubmed logopapersAug 13 2025
This study investigated radiologists' perceptions of AI-generated, patient-friendly radiology reports across three modalities: MRI, CT, and mammogram/ultrasound. The evaluation focused on report correctness, completeness, terminology complexity, and emotional impact. Seventy-nine radiologists from four major Saudi Arabian hospitals assessed AI-simplified versions of clinical radiology reports. Each participant reviewed one report from each modality and completed a structured questionnaire covering factual correctness, completeness, terminology complexity, and emotional impact. A structured and detailed prompt was used to guide ChatGPT-4 in generating the reports, which included clear findings, a lay summary, glossary, and clarification of ambiguous elements. Statistical analyses included descriptive summaries, Friedman tests, and Pearson correlations. Radiologists rated mammogram reports highest for correctness (M = 4.22), followed by CT (4.05) and MRI (3.95). Completeness scores followed a similar trend. Statistically significant differences were found in correctness (χ<sup>2</sup>(2) = 17.37, p < 0.001) and completeness (χ<sup>2</sup>(2) = 13.13, p = 0.001). Anxiety and complexity ratings were moderate, with MRI reports linked to slightly higher concern. A weak positive correlation emerged between radiologists' experience and mammogram correctness ratings (r = .235, p = .037). Radiologists expressed overall support for AI-generated simplified radiology reports when created using a structured prompt that includes summaries, glossaries, and clarification of ambiguous findings. While mammography and CT reports were rated favorably, MRI reports showed higher emotional impact, highlighting a need for clearer and more emotionally supportive language.
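The statistical analyses named above (Friedman tests across paired modality ratings, Pearson correlations with experience) can be sketched as follows. This is a minimal illustration with invented ratings, not the study's data; only the test choices come from the abstract.

```python
# Sketch of the analyses named in the abstract: a Friedman test comparing
# paired ratings across three modalities, and a Pearson correlation between
# experience and ratings. All numbers are simulated for illustration.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_raters = 79  # matches the study's sample size; ratings are invented

# Hypothetical 1-5 correctness ratings per radiologist for each modality.
mri = rng.integers(3, 6, n_raters)
ct = rng.integers(3, 6, n_raters)
mammo = rng.integers(4, 6, n_raters)

# Friedman test: non-parametric repeated-measures comparison across modalities.
chi2, p = stats.friedmanchisquare(mri, ct, mammo)
print(f"Friedman chi2={chi2:.2f}, p={p:.4f}")

# Pearson correlation between years of experience and mammogram ratings.
experience = rng.integers(1, 30, n_raters)
r, p_r = stats.pearsonr(experience, mammo)
print(f"Pearson r={r:.3f}, p={p_r:.3f}")
```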

Quantitative Prostate MRI, From the <i>AJR</i> Special Series on Quantitative Imaging.

Margolis DJA, Chatterjee A, deSouza NM, Fedorov A, Fennessy F, Maier SE, Obuchowski N, Punwani S, Purysko AS, Rakow-Penner R, Shukla-Dave A, Tempany CM, Boss M, Malyarenko D

pubmed logopapersAug 13 2025
Prostate MRI has traditionally relied on qualitative interpretation. However, quantitative components hold the potential to markedly improve performance. The ADC from DWI is probably the most widely recognized quantitative MRI biomarker and has shown strong discriminatory value for clinically significant prostate cancer as well as for recurrent cancer after treatment. Advanced diffusion techniques, including intravoxel incoherent motion imaging, diffusion kurtosis imaging, diffusion-tensor imaging, and specific implementations such as restriction spectrum imaging, purport even better discrimination but are more technically challenging. The inherent T1 and T2 of tissue also provide diagnostic value, with more advanced techniques deriving luminal water fraction and hybrid multidimensional MRI metrics. Dynamic contrast-enhanced imaging, primarily using a modified Tofts model, also shows independent discriminatory value. Finally, quantitative lesion size and shape features can be combined with the aforementioned techniques and can be further refined using radiomics, texture analysis, and artificial intelligence. Which technique will ultimately find widespread clinical use will depend on validation across a myriad of platforms and use cases.
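The ADC mentioned above comes from the standard mono-exponential DWI model, S(b) = S0·exp(-b·ADC), so two acquisitions at different b-values suffice to recover it per voxel. A minimal sketch with simulated signals (b-values and signal levels are illustrative, not from the article):

```python
# Per-voxel ADC from two diffusion-weighted acquisitions under the
# mono-exponential model S(b) = S0 * exp(-b * ADC).
import numpy as np

def adc_map(s_low, s_high, b_low=0.0, b_high=800.0, eps=1e-8):
    """ADC (mm^2/s) from low- and high-b signals; eps guards against log(0)."""
    ratio = np.clip(s_low, eps, None) / np.clip(s_high, eps, None)
    return np.log(ratio) / (b_high - b_low)

# Simulated voxel with true ADC = 1.0e-3 mm^2/s.
s0 = np.array([1000.0])
sb = s0 * np.exp(-800.0 * 1.0e-3)
print(adc_map(s0, sb))  # recovers ~0.001
```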

A Chain of Diagnosis Framework for Accurate and Explainable Radiology Report Generation

Haibo Jin, Haoxuan Che, Sunan He, Hao Chen

arxiv logopreprintAug 13 2025
Despite the progress of radiology report generation (RRG), existing works face two challenges: 1) clinical efficacy remains unsatisfactory, especially for descriptions of lesion attributes; 2) the generated text lacks explainability, making it difficult for radiologists to trust the results. To address these challenges, we focus on a trustworthy RRG model, which not only generates accurate descriptions of abnormalities but also provides the basis for its predictions. To this end, we propose a framework named chain of diagnosis (CoD), which maintains a chain of diagnostic steps for clinically accurate and explainable RRG. It first generates question-answer (QA) pairs via diagnostic conversation to extract key findings, then prompts a large language model with QA diagnoses for accurate generation. To enhance explainability, a diagnosis grounding module is designed to match QA diagnoses and generated sentences, where the diagnoses act as a reference. Moreover, a lesion grounding module is designed to locate abnormalities in the image, further improving the working efficiency of radiologists. To facilitate label-efficient training, we propose an omni-supervised learning strategy with clinical consistency to leverage various types of annotations from different datasets. Our efforts lead to 1) an omni-labeled RRG dataset with QA pairs and lesion boxes; 2) an evaluation tool for assessing the accuracy of reports in describing lesion location and severity; and 3) extensive experiments demonstrating the effectiveness of CoD, which consistently outperforms both specialist and generalist models on two RRG benchmarks and shows promising explainability by accurately grounding generated sentences to QA diagnoses and images.

Explainable AI Technique in Lung Cancer Detection Using Convolutional Neural Networks

Nishan Rai, Sujan Khatri, Devendra Risal

arxiv logopreprintAug 13 2025
Early detection of lung cancer is critical to improving survival outcomes. We present a deep learning framework for automated lung cancer screening from chest computed tomography (CT) images with integrated explainability. Using the IQ-OTH/NCCD dataset (1,197 scans across Normal, Benign, and Malignant classes), we evaluate a custom convolutional neural network (CNN) and three fine-tuned transfer learning backbones: DenseNet121, ResNet152, and VGG19. Models are trained with cost-sensitive learning to mitigate class imbalance and evaluated via accuracy, precision, recall, F1-score, and ROC-AUC. While ResNet152 achieved the highest accuracy (97.3%), DenseNet121 provided the best overall balance in precision, recall, and F1 (up to 92%, 90%, 91%, respectively). We further apply Shapley Additive Explanations (SHAP) to visualize evidence contributing to predictions, improving clinical transparency. Results indicate that CNN-based approaches augmented with explainability can provide fast, accurate, and interpretable support for lung cancer screening, particularly in resource-limited settings.
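The cost-sensitive learning step mentioned above is commonly implemented with inverse-frequency class weights that upweight rare classes during training. A minimal sketch; the class counts below are illustrative, not the exact IQ-OTH/NCCD split:

```python
# Inverse-frequency class weights for cost-sensitive training: rare classes
# (e.g. Benign) receive larger weights. Counts are invented for illustration.
import numpy as np

def inverse_frequency_weights(counts):
    """Weight each class by n_total / (n_classes * n_class)."""
    counts = np.asarray(counts, dtype=float)
    return counts.sum() / (len(counts) * counts)

counts = {"Normal": 400, "Benign": 120, "Malignant": 560}  # illustrative
weights = inverse_frequency_weights(list(counts.values()))
print(dict(zip(counts, np.round(weights, 3))))
```

These weights can be passed to a loss function (e.g. a weighted cross-entropy) so misclassifying a rare class costs more than misclassifying a common one.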

Multi-organ AI Endophenotypes Chart the Heterogeneity of Pan-disease in the Brain, Eye, and Heart

Consortium, T. M., Boquet-Pujadas, A., Anagnostakis, F., Yang, Z., Tian, Y. E., Duggan, M., Erus, G., Srinivasan, D., Joynes, C., Bai, W., Patel, P., Walker, K. A., Zalesky, A., Davatzikos, C., Wen, J.

medrxiv logopreprintAug 13 2025
Disease heterogeneity and commonality pose significant challenges to precision medicine, as traditional approaches frequently focus on single disease entities and overlook shared mechanisms across conditions1. Inspired by pan-cancer2 and multi-organ research3, we introduce the concept of "pan-disease" to investigate the heterogeneity and shared etiology of brain, eye, and heart diseases. Leveraging individual-level data from 129,340 participants, as well as summary-level data from the MULTI consortium, we applied a weakly-supervised deep learning model (Surreal-GAN4,5) to multi-organ imaging, genetic, proteomic, and RNA-seq data, identifying 11 AI-derived biomarkers, called Multi-organ AI Endophenotypes (MAEs), for the brain (Brain 1-6), eye (Eye 1-3), and heart (Heart 1-2). We found Brain 3 to be a risk factor for Alzheimer's disease (AD) progression and mortality, whereas Brain 5 was protective against AD progression. Crucially, in data from an anti-amyloid AD drug trial (solanezumab6), heterogeneity in cognitive decline trajectories was observed across treatment groups: at week 240, patients with lower Brain 1-3 expression had slower cognitive decline, whereas patients with higher expression had faster cognitive decline. A multi-layer causal pathway pinpointed Brain 1 as a mediational endophenotype7 linking the FLRT2 protein to migraine, exemplifying novel therapeutic targets and pathways. Additionally, genes associated with Eye 1 and Eye 3 were enriched in cancer drug-related gene sets with causal links to specific cancer types and proteins. Finally, Heart 1 and Heart 2 had the highest mortality risk and distinct medication history profiles, with Heart 1 showing favorable responses to antihypertensive medications and Heart 2 to digoxin treatment. The 11 MAEs provide novel AI-derived dimensional representations for precision medicine and highlight the potential of AI-driven patient stratification for disease risk monitoring, clinical trials, and drug discovery.

Exploring GPT-4o's multimodal reasoning capabilities with panoramic radiograph: the role of prompt engineering.

Xiong YT, Lian WJ, Sun YN, Liu W, Guo JX, Tang W, Liu C

pubmed logopapersAug 12 2025
The aim of this study was to evaluate GPT-4o's multimodal reasoning ability to review panoramic radiographs (PRs) and verify their radiologic findings, and to explore the role of prompt engineering in enhancing its performance. The study included 230 PRs from West China Hospital of Stomatology in 2024, which were interpreted to generate the PR findings. A total of 300 interpretation errors were manually inserted into the PR findings. An ablation study was conducted to assess whether GPT-4o can reason over PRs under a zero-shot prompt. Prompt engineering was employed to enhance GPT-4o's ability to identify interpretation errors in PRs. The prompt strategies included chain-of-thought, self-consistency, in-context learning, multimodal in-context learning, and their systematic integration into a meta-prompt. Recall, accuracy, and F1 score were used to evaluate the outputs. Subsequently, the localization capability of GPT-4o and its influence on reasoning capability were evaluated. In the ablation study, GPT-4o's recall increased significantly from 2.67% to 43.33% upon acquiring PRs (P < 0.001). GPT-4o with the meta-prompt demonstrated improvements in recall (43.33% vs. 52.67%, P = 0.022), accuracy (39.95% vs. 68.75%, P < 0.001), and F1 score (0.42 vs. 0.60, P < 0.001) compared to the zero-shot prompt and other prompt strategies. The localization accuracy of GPT-4o was 45.67% (137 out of 300, 95% CI: 40.00 to 51.34). A significant correlation was observed between localization accuracy and reasoning capability under the meta-prompt (φ coefficient = 0.33, P < 0.001). Providing accurate localization cues within the meta-prompt increased the model's recall by 5.49% (P = 0.031). GPT-4o demonstrated a degree of multimodal capability for PRs, with performance enhanced through prompt engineering; nevertheless, its performance remains inadequate for clinical requirements. Future efforts should identify additional factors influencing the model's reasoning capability or develop more advanced models before clinical application in assisting radiological assessments.
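The φ coefficient reported above measures association between two binary variables (here, localization correct vs. error identified) from a 2×2 contingency table. A minimal sketch; the table counts below are invented for illustration and do not reproduce the study's φ = 0.33:

```python
# Phi coefficient for a 2x2 contingency table [[a, b], [c, d]]:
# phi = (ad - bc) / sqrt((a+b)(c+d)(a+c)(b+d)).
import math

def phi_coefficient(a, b, c, d):
    """Association between two binary variables; ranges from -1 to 1."""
    denom = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return (a * d - b * c) / denom if denom else 0.0

# Hypothetical counts: rows = localization correct/incorrect,
# columns = error identified yes/no.
print(round(phi_coefficient(90, 47, 68, 95), 2))
```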

Current imaging applications, radiomics, and machine learning modalities of CNS demyelinating disorders and its mimickers.

Alam Z, Maddali A, Patel S, Weber N, Al Rikabi S, Thiemann D, Desai K, Monoky D

pubmed logopapersAug 12 2025
Distinguishing among neuroinflammatory demyelinating diseases of the central nervous system can present a significant diagnostic challenge due to substantial overlap in clinical presentations and imaging features. Collaboration between specialists, novel antibody testing, and dedicated magnetic resonance imaging protocols have helped to narrow the diagnostic gap, but challenging cases remain. Machine learning algorithms have proven to be able to identify subtle patterns that escape even the most experienced human eye. Indeed, machine learning and the subfield of radiomics have demonstrated exponential growth and improvement in diagnosis capacity within the past decade. The sometimes daunting diagnostic overlap of various demyelinating processes thus provides a unique opportunity: can the elite pattern recognition powers of machine learning close the gap in making the correct diagnosis? This review specifically focuses on neuroinflammatory demyelinating diseases, exploring the role of artificial intelligence in the detection, diagnosis, and differentiation of the most common pathologies: multiple sclerosis (MS), neuromyelitis optica spectrum disorder (NMOSD), acute disseminated encephalomyelitis (ADEM), Sjogren's syndrome, MOG antibody-associated disorder (MOGAD), and neuropsychiatric systemic lupus erythematosus (NPSLE). Understanding how these tools enhance diagnostic precision may lead to earlier intervention, improved outcomes, and optimized management strategies.

The performance of large language models in dentomaxillofacial radiology: a systematic review.

Liu Z, Nalley A, Hao J, H Ai QY, Kan Yeung AW, Tanaka R, Hung KF

pubmed logopapersAug 12 2025
This study aimed to systematically review the current performance of large language models (LLMs) in dentomaxillofacial radiology (DMFR). Five electronic databases were searched to identify studies that developed, fine-tuned, or evaluated LLMs for DMFR-related tasks. Data extracted included study purpose, LLM type, image/text source, language, dataset characteristics, input and output, performance outcomes, evaluation methods, and reference standards. Customized assessment criteria adapted from the TRIPOD-LLM reporting guideline were used to evaluate risk of bias in the included studies, specifically the clarity of dataset origin, the robustness of performance evaluation methods, and the validity of the reference standards. The initial search yielded 1621 titles, and nineteen studies were included. These studies investigated the use of LLMs for tasks including the production and answering of DMFR-related qualification exams and educational questions (n = 8), diagnosis and treatment recommendations (n = 7), and radiology report generation and patient communication (n = 4). LLMs demonstrated varied performance in diagnosing dental conditions, with accuracy ranging from 37% to 92.5% and expert ratings for differential diagnosis and treatment planning between 3.6 and 4.7 on a 5-point scale. For DMFR-related qualification exams and board-style questions, LLMs achieved correctness rates between 33.3% and 86.1%. Automated radiology report generation showed moderate performance, with accuracy ranging from 70.4% to 81.3%. LLMs demonstrate promising potential in DMFR, particularly for diagnostic, educational, and report generation tasks. However, their current accuracy, completeness, and consistency remain variable. Further development, validation, and standardization are needed before LLMs can be reliably integrated as supportive tools in clinical workflows and educational settings.

ChatRadio-Valuer: A Chat Large Language Model for Generalizable Radiology Impression Generation on Multi-institution and Multi-system Data.

Zhong T, Zhao W, Zhang Y, Pan Y, Dong P, Jiang Z, Jiang H, Zhou Y, Kui X, Shang Y, Zhao L, Yang L, Wei Y, Li Z, Zhang J, Yang L, Chen H, Zhao H, Liu Y, Zhu N, Li Y, Wang Y, Yao J, Wang J, Zeng Y, He L, Zheng C, Zhang Z, Li M, Liu Z, Dai H, Wu Z, Zhang L, Zhang S, Cai X, Hu X, Zhao S, Jiang X, Zhang X, Liu W, Li X, Zhu D, Guo L, Shen D, Han J, Liu T, Liu J, Zhang T

pubmed logopapersAug 11 2025
Achieving clinical-level performance and widespread deployment in generating radiology impressions poses a major challenge for conventional artificial intelligence models tailored to specific diseases and organs. With the increasing accessibility of radiology reports and advances in modern general AI techniques, the prospects for deployable radiology AI have strengthened. Here, we present ChatRadio-Valuer, the first general radiology diagnosis large language model designed for localized deployment within hospitals and approaching clinical use for multi-institution and multi-system diseases. ChatRadio-Valuer achieved 15 state-of-the-art results across five human systems and six institutions in clinical-level events (n = 332,673) through rigorous, full-spectrum assessment, including engineering metrics, clinical validation, and efficiency evaluation. Notably, it exceeded OpenAI's GPT-3.5 and GPT-4 models, achieving superior performance in comprehensive disease diagnosis compared to the average level of radiology experts. In addition, ChatRadio-Valuer supports zero-shot transfer learning, greatly boosting its effectiveness as a radiology assistant, while adhering to privacy standards and remaining readily usable for large-scale patient populations. Our findings suggest that developing localized LLMs will be an important avenue for hospital applications.

Decoding fetal motion in 4D ultrasound with DeepLabCut.

Inubashiri E, Kaishi Y, Miyake T, Yamaguchi R, Hamaguchi T, Inubashiri M, Ota H, Watanabe Y, Deguchi K, Kuroki K, Maeda N

pubmed logopapersAug 11 2025
This study aimed to objectively and quantitatively analyze fetal motor behavior using DeepLabCut (DLC), a markerless posture estimation tool based on deep learning, applied to four-dimensional ultrasound (4DUS) data collected during the second trimester. We propose a novel clinical method for precise assessment of fetal neurodevelopment. Fifty 4DUS video recordings of normal singleton fetuses aged 12 to 22 gestational weeks were analyzed. Eight fetal joints were manually labeled in 2% of each video to train a customized DLC model. The model's accuracy was evaluated using likelihood scores. Intra- and inter-rater reliability of manual labeling were assessed using intraclass correlation coefficients (ICC). Angular velocity time series derived from joint coordinates were analyzed to quantify fetal movement patterns and developmental coordination. Manual labeling demonstrated excellent reproducibility (inter-rater ICC = 0.990, intra-rater ICC = 0.961). The trained DLC model achieved a mean likelihood score of 0.960, confirming high tracking accuracy. Kinematic analysis revealed developmental trends: localized rapid limb movements were common at 12-13 weeks; movements became more coordinated and systemic by 18-20 weeks, reflecting advancing neuromuscular maturation. Although a modest increase in tracking accuracy was observed with gestational age, this trend did not reach statistical significance (p < 0.001). DLC enables precise quantitative analysis of fetal motor behavior from 4DUS recordings. This AI-driven approach offers a promising, noninvasive alternative to conventional qualitative assessments, providing detailed insights into early fetal neurodevelopmental trajectories and potential early screening for neurodevelopmental disorders.
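The kinematic step described above, deriving angular-velocity time series from tracked joint coordinates, can be sketched as follows. The keypoint trajectories are simulated here; in the study, DeepLabCut would supply the per-frame (x, y) coordinates:

```python
# Derive a joint angle from three tracked 2D keypoints per frame, then
# differentiate to obtain an angular-velocity time series.
import numpy as np

def joint_angle(p_prox, p_joint, p_dist):
    """Angle (radians) at p_joint formed by the two limb segments."""
    u = p_prox - p_joint
    v = p_dist - p_joint
    cos = (u * v).sum(axis=-1) / (
        np.linalg.norm(u, axis=-1) * np.linalg.norm(v, axis=-1)
    )
    return np.arccos(np.clip(cos, -1.0, 1.0))

fps = 30.0
t = np.arange(0, 2, 1 / fps)

# Simulated elbow-like motion: the distal point swings around a fixed joint.
shoulder = np.tile([0.0, 1.0], (len(t), 1))
elbow = np.tile([0.0, 0.0], (len(t), 1))
theta = 1.0 + 0.5 * np.sin(2 * np.pi * t)       # flexion angle over time
hand = np.stack([np.sin(theta), -np.cos(theta)], axis=1) + elbow

angles = joint_angle(shoulder, elbow, hand)
angular_velocity = np.gradient(angles, 1 / fps)  # rad/s
print(angles.shape, angular_velocity.shape)
```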