
Aphasia severity prediction using a multi-modal machine learning approach.

Hu X, Varkanitsa M, Kropp E, Betke M, Ishwar P, Kiran S

PubMed · Aug 15 2025
The present study examined an integrated multimodal neuroimaging approach (T1-weighted structural MRI, diffusion tensor imaging (DTI), and resting-state fMRI (rsfMRI)) to predict aphasia severity, measured by the Western Aphasia Battery-Revised Aphasia Quotient (WAB-R AQ), in 76 individuals with post-stroke aphasia. We employed Support Vector Regression (SVR) and Random Forest (RF) models with supervised feature selection and a stacked feature prediction approach. The SVR model outperformed RF, achieving an average root mean square error (RMSE) of 16.38±5.57, Pearson's correlation coefficient (r) of 0.70±0.13, and mean absolute error (MAE) of 12.67±3.27, compared to RF's RMSE of 18.41±4.34, r of 0.66±0.15, and MAE of 14.64±3.04. Resting-state neural activity and structural integrity emerged as crucial predictors of aphasia severity, appearing in the top 20% of predictor combinations for both SVR and RF. Finally, the feature selection method revealed that functional connectivity within both hemispheres and between homologous language areas is critical for predicting language outcomes in patients with aphasia. The statistically significant difference in performance between models using only a single modality and the optimal multimodal SVR/RF model (which included both resting-state connectivity and structural information) underscores that aphasia severity is influenced by factors beyond lesion location and volume. These findings suggest that integrating multiple neuroimaging modalities enhances the prediction of language outcomes in aphasia beyond lesion characteristics alone, offering insights that could inform personalized rehabilitation strategies.
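
As a rough illustration of the evaluation loop implied here, the sketch below compares SVR and RF under cross-validation and reports RMSE, Pearson's r, and MAE. It is a minimal sketch assuming a scikit-learn workflow: the feature matrix, AQ scores, and hyperparameters are synthetic placeholders, and the study's supervised feature selection and stacked-prediction steps are omitted.

```python
import numpy as np
from scipy.stats import pearsonr
from sklearn.svm import SVR
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold
from sklearn.metrics import mean_squared_error, mean_absolute_error

rng = np.random.default_rng(0)
X = rng.normal(size=(76, 50))        # placeholder multimodal features (rsfMRI, DTI, lesion)
y = rng.uniform(20, 100, size=76)    # placeholder WAB-R AQ scores (0-100 scale)

for name, model in [("SVR", SVR(kernel="rbf", C=10.0)),
                    ("RF", RandomForestRegressor(n_estimators=500, random_state=0))]:
    rmses, rs, maes = [], [], []
    for train, test in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
        model.fit(X[train], y[train])
        pred = model.predict(X[test])
        rmses.append(mean_squared_error(y[test], pred) ** 0.5)  # RMSE
        rs.append(pearsonr(y[test], pred)[0])                   # Pearson's r
        maes.append(mean_absolute_error(y[test], pred))
    print(f"{name}: RMSE {np.mean(rmses):.2f}±{np.std(rmses):.2f}, "
          f"r {np.mean(rs):.2f}, MAE {np.mean(maes):.2f}")
```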

Artificial intelligence-based fractional flow reserve.

Bednarek A, Gąsior P, Jaguszewski M, Buszman PP, Milewski K, Hawranek M, Gil R, Wojakowski W, Kochman J, Tomaniak M

PubMed · Aug 14 2025
Fractional flow reserve (FFR) - a physiological indicator of coronary stenosis significance - has become a widely used parameter in the guidance of percutaneous coronary intervention (PCI). Several studies have shown the superiority of FFR over visual assessment, contributing to a reduction in clinical endpoints. However, the current approach to FFR assessment requires coronary instrumentation with a dedicated pressure wire, thereby increasing the invasiveness, cost, and duration of the procedure. Alternative, noninvasive methods of FFR assessment based on computational fluid dynamics are being widely tested; these approaches are generally not fully automated and may require substantial computational power. Meanwhile, one of the most rapidly expanding fields in medicine is the use of artificial intelligence (AI) in therapy optimization, diagnosis, treatment, and risk stratification. AI contributes to the development of more sophisticated methods of image analysis and allows clinically important parameters to be derived faster and more accurately. In recent years, AI utility in deriving FFR noninvasively has been increasingly reported. In this review, we critically summarize current knowledge in the field of AI-derived FFR based on data from computed tomography angiography, invasive angiography, optical coherence tomography, and intravascular ultrasound. Available solutions, possible future directions in optimizing cathlab performance, including the use of mixed reality, and the current limitations standing in the way of wide adoption of these techniques are overviewed.
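
For readers unfamiliar with the underlying quantity, the sketch below encodes the standard definition: FFR is the ratio of mean distal coronary pressure (Pd) to mean aortic pressure (Pa) under maximal hyperemia, with values at or below 0.80 conventionally treated as hemodynamically significant. AI-based approaches aim to estimate this ratio from imaging rather than from a pressure wire; the numbers here are illustrative only.

```python
def fractional_flow_reserve(pd_mmhg: float, pa_mmhg: float) -> float:
    """FFR = Pd / Pa: mean distal coronary pressure over mean aortic
    pressure under maximal hyperemia (dimensionless, roughly 0-1)."""
    return pd_mmhg / pa_mmhg

# Illustrative values only; <= 0.80 is the conventional cutoff for
# hemodynamically significant stenosis.
ffr = fractional_flow_reserve(pd_mmhg=68.0, pa_mmhg=90.0)
print(f"FFR = {ffr:.2f} ->", "significant" if ffr <= 0.80 else "non-significant")
```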

Performance of GPT-5 in Brain Tumor MRI Reasoning

Mojtaba Safari, Shansong Wang, Mingzhe Hu, Zach Eidex, Qiang Li, Xiaofeng Yang

arXiv preprint · Aug 14 2025
Accurate differentiation of brain tumor types on magnetic resonance imaging (MRI) is critical for guiding treatment planning in neuro-oncology. Recent advances in large language models (LLMs) have enabled visual question answering (VQA) approaches that integrate image interpretation with natural language reasoning. In this study, we evaluated GPT-4o, GPT-5-nano, GPT-5-mini, and GPT-5 on a curated brain tumor VQA benchmark derived from 3 Brain Tumor Segmentation (BraTS) datasets - glioblastoma (GLI), meningioma (MEN), and brain metastases (MET). Each case included multi-sequence MRI triplanar mosaics and structured clinical features transformed into standardized VQA items. Models were assessed in a zero-shot chain-of-thought setting for accuracy on both visual and reasoning tasks. Results showed that GPT-5-mini achieved the highest macro-average accuracy (44.19%), followed by GPT-5 (43.71%), GPT-4o (41.49%), and GPT-5-nano (35.85%). Performance varied by tumor subtype, with no single model dominating across all cohorts. These findings suggest that GPT-5 family models can achieve moderate accuracy in structured neuro-oncological VQA tasks, but not at a level acceptable for clinical use.
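
A minimal sketch of how the headline metric could be computed, assuming "macro-average accuracy" means averaging per-cohort (GLI/MEN/MET) accuracies so that no single tumor type dominates the score; the `results` records are hypothetical placeholders, not benchmark data.

```python
from collections import defaultdict

# Hypothetical per-item outcomes: (cohort, was_the_model_correct)
results = [("GLI", True), ("GLI", False), ("GLI", True),
           ("MEN", True), ("MEN", True),
           ("MET", False), ("MET", True)]

by_cohort = defaultdict(list)
for cohort, correct in results:
    by_cohort[cohort].append(correct)

per_cohort = {c: sum(v) / len(v) for c, v in by_cohort.items()}
macro_acc = sum(per_cohort.values()) / len(per_cohort)  # unweighted mean over cohorts
print(per_cohort, f"macro-average accuracy = {macro_acc:.2%}")
```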

Quantitative Prostate MRI, From the AJR Special Series on Quantitative Imaging.

Margolis DJA, Chatterjee A, deSouza NM, Fedorov A, Fennessy F, Maier SE, Obuchowski N, Punwani S, Purysko AS, Rakow-Penner R, Shukla-Dave A, Tempany CM, Boss M, Malyarenko D

PubMed · Aug 13 2025
Prostate MRI has traditionally relied on qualitative interpretation. However, quantitative components hold the potential to markedly improve performance. The apparent diffusion coefficient (ADC) from diffusion-weighted imaging (DWI) is probably the most widely recognized quantitative MRI biomarker and has shown strong discriminatory value for clinically significant prostate cancer as well as for recurrent cancer after treatment. Advanced diffusion techniques, including intravoxel incoherent motion imaging, diffusion kurtosis imaging, diffusion-tensor imaging, and specific implementations such as restriction spectrum imaging, promise even better discrimination but are more technically challenging. The inherent T1 and T2 of tissue also provide diagnostic value, with more advanced techniques deriving luminal water fraction and hybrid multidimensional MRI metrics. Dynamic contrast-enhanced imaging, primarily using a modified Tofts model, also shows independent discriminatory value. Finally, quantitative lesion size and shape features can be combined with the aforementioned techniques and can be further refined using radiomics, texture analysis, and artificial intelligence. Which technique will ultimately find widespread clinical use will depend on validation across a myriad of platforms and use cases.
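
A minimal numpy sketch of the most established of these parameters: the ADC from a two-point DWI acquisition, computed per voxel as ADC = -ln(S_b/S_0)/b. The arrays and b-value are synthetic placeholders.

```python
import numpy as np

b = 800.0                        # s/mm^2, high b-value of the pair
S0 = np.full((4, 4), 1000.0)     # placeholder signal at b = 0
Sb = np.full((4, 4), 450.0)      # placeholder signal at b = 800

adc = -np.log(Sb / S0) / b       # per-voxel ADC in mm^2/s
print(f"mean ADC = {adc.mean():.2e} mm^2/s")  # ~1.0e-3 here; markedly lower
                                              # values raise suspicion for
                                              # clinically significant cancer
```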

Exploring Radiologists' Use of AI Chatbots for Assistance in Image Interpretation: Patterns of Use and Trust Evaluation.

Alarifi M

PubMed · Aug 13 2025
This study investigated radiologists' perceptions of AI-generated, patient-friendly radiology reports across three modalities: MRI, CT, and mammogram/ultrasound. The evaluation focused on report correctness, completeness, terminology complexity, and emotional impact. Seventy-nine radiologists from four major Saudi Arabian hospitals assessed AI-simplified versions of clinical radiology reports. Each participant reviewed one report from each modality and completed a structured questionnaire covering factual correctness, completeness, terminology complexity, and emotional impact. A structured and detailed prompt was used to guide ChatGPT-4 in generating the reports, which included clear findings, a lay summary, glossary, and clarification of ambiguous elements. Statistical analyses included descriptive summaries, Friedman tests, and Pearson correlations. Radiologists rated mammogram reports highest for correctness (M = 4.22), followed by CT (4.05) and MRI (3.95). Completeness scores followed a similar trend. Statistically significant differences were found in correctness (χ²(2) = 17.37, p < 0.001) and completeness (χ²(2) = 13.13, p = 0.001). Anxiety and complexity ratings were moderate, with MRI reports linked to slightly higher concern. A weak positive correlation emerged between radiologists' experience and mammogram correctness ratings (r = .235, p = .037). Radiologists expressed overall support for AI-generated simplified radiology reports when created using a structured prompt that includes summaries, glossaries, and clarification of ambiguous findings. While mammography and CT reports were rated favorably, MRI reports showed higher emotional impact, highlighting a need for clearer and more emotionally supportive language.
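
A minimal sketch, assuming a standard scipy workflow, of the two statistical tests named above: a Friedman test across the three repeated ratings per radiologist and a Pearson correlation between experience and mammogram correctness ratings. All arrays are synthetic stand-ins, not the study's data.

```python
import numpy as np
from scipy.stats import friedmanchisquare, pearsonr

rng = np.random.default_rng(1)
n = 79                                        # radiologists
mri, ct, mammo = (rng.integers(1, 6, n) for _ in range(3))  # 1-5 ratings
years = rng.uniform(1, 30, n)                 # years of experience

# Friedman test: repeated measures (three report types) per radiologist
chi2, p = friedmanchisquare(mri, ct, mammo)
print(f"Friedman χ²(2) = {chi2:.2f}, p = {p:.3f}")

# Pearson correlation: experience vs. mammogram correctness ratings
r, p = pearsonr(years, mammo)
print(f"Pearson r = {r:.3f}, p = {p:.3f}")
```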

Explainable AI Technique in Lung Cancer Detection Using Convolutional Neural Networks

Nishan Rai, Sujan Khatri, Devendra Risal

arXiv preprint · Aug 13 2025
Early detection of lung cancer is critical to improving survival outcomes. We present a deep learning framework for automated lung cancer screening from chest computed tomography (CT) images with integrated explainability. Using the IQ-OTH/NCCD dataset (1,197 scans across Normal, Benign, and Malignant classes), we evaluate a custom convolutional neural network (CNN) and three fine-tuned transfer learning backbones: DenseNet121, ResNet152, and VGG19. Models are trained with cost-sensitive learning to mitigate class imbalance and evaluated via accuracy, precision, recall, F1-score, and ROC-AUC. While ResNet152 achieved the highest accuracy (97.3%), DenseNet121 provided the best overall balance of precision, recall, and F1 (up to 92%, 90%, and 91%, respectively). We further apply Shapley Additive Explanations (SHAP) to visualize evidence contributing to predictions, improving clinical transparency. Results indicate that CNN-based approaches augmented with explainability can provide fast, accurate, and interpretable support for lung cancer screening, particularly in resource-limited settings.
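
Two of the ingredients described above lend themselves to a compact sketch: inverse-frequency class weights for cost-sensitive learning, and SHAP attributions over CT inputs. This is an assumed Keras + SHAP usage with a toy CNN and random data standing in for the IQ-OTH/NCCD pipeline, not the authors' implementation.

```python
import numpy as np
import tensorflow as tf
import shap

# Toy stand-ins for preprocessed CT patches and 3-class labels
X = np.random.rand(60, 64, 64, 1).astype("float32")
y = np.random.randint(0, 3, 60)        # 0=Normal, 1=Benign, 2=Malignant

# Cost-sensitive learning: class weights inversely proportional to frequency
counts = np.bincount(y, minlength=3)
class_weight = {i: len(y) / (3 * max(c, 1)) for i, c in enumerate(counts)}

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(64, 64, 1)),
    tf.keras.layers.Conv2D(8, 3, activation="relu"),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(3, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.fit(X, y, epochs=1, class_weight=class_weight, verbose=0)

# SHAP: per-pixel attributions toward each class prediction
explainer = shap.GradientExplainer(model, X[:20])   # background sample
shap_values = explainer.shap_values(X[:4])          # values to visualize
```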

A Chain of Diagnosis Framework for Accurate and Explainable Radiology Report Generation

Haibo Jin, Haoxuan Che, Sunan He, Hao Chen

arXiv preprint · Aug 13 2025
Despite the progress of radiology report generation (RRG), existing works face two challenges: 1) performance in terms of clinical efficacy is unsatisfactory, especially for descriptions of lesion attributes; and 2) the generated text lacks explainability, making it difficult for radiologists to trust the results. To address these challenges, we focus on a trustworthy RRG model, which not only generates accurate descriptions of abnormalities but also provides the basis for its predictions. To this end, we propose a framework named chain of diagnosis (CoD), which maintains a chain of diagnostic reasoning for clinically accurate and explainable RRG. It first generates question-answer (QA) pairs via diagnostic conversation to extract key findings, then prompts a large language model with the QA diagnoses for accurate generation. To enhance explainability, a diagnosis grounding module is designed to match QA diagnoses with generated sentences, where the diagnoses act as a reference. Moreover, a lesion grounding module is designed to locate abnormalities in the image, further improving the working efficiency of radiologists. To facilitate label-efficient training, we propose an omni-supervised learning strategy with clinical consistency to leverage various types of annotations from different datasets. Our efforts lead to 1) an omni-labeled RRG dataset with QA pairs and lesion boxes; 2) an evaluation tool for assessing the accuracy of reports in describing lesion location and severity; and 3) extensive experiments demonstrating the effectiveness of CoD, which outperforms both specialist and generalist models consistently on two RRG benchmarks and shows promising explainability by accurately grounding generated sentences to QA diagnoses and images.
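
The two-stage flow the abstract describes can be sketched schematically as below: extract key findings as QA pairs via a diagnostic conversation, then condition report generation on those diagnoses. `ask_llm`, the question list, and the prompt wording are hypothetical; this is not the authors' code, and the grounding modules are not reproduced.

```python
FINDING_QUESTIONS = [
    "Is there a pleural effusion? If so, which side and how severe?",
    "Is there consolidation? If so, where?",
    "Is the cardiac silhouette enlarged?",
]

def chain_of_diagnosis(image_findings: str, ask_llm) -> dict:
    # Stage 1: diagnostic conversation -> QA pairs (the "chain")
    qa_pairs = [(q, ask_llm(f"Findings: {image_findings}\nQ: {q}"))
                for q in FINDING_QUESTIONS]
    # Stage 2: prompt the LLM with the QA diagnoses for grounded generation
    qa_block = "\n".join(f"Q: {q}\nA: {a}" for q, a in qa_pairs)
    report = ask_llm("Using only these diagnoses, write a radiology report:\n"
                     + qa_block)
    return {"qa_pairs": qa_pairs, "report": report}
```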

Exploring GPT-4o's multimodal reasoning capabilities with panoramic radiograph: the role of prompt engineering.

Xiong YT, Lian WJ, Sun YN, Liu W, Guo JX, Tang W, Liu C

PubMed · Aug 12 2025
The aim of this study was to evaluate GPT-4o's multimodal reasoning ability to review panoramic radiographs (PRs) and verify their radiologic findings, while exploring the role of prompt engineering in enhancing its performance. The study included 230 PRs from West China Hospital of Stomatology in 2024, which were interpreted to generate the PR findings. A total of 300 interpretation errors were manually inserted into the PR findings. An ablation study was conducted to assess whether GPT-4o can reason over PRs under a zero-shot prompt. Prompt engineering was employed to enhance GPT-4o's ability to identify interpretation errors in PRs. The prompt strategies included chain-of-thought, self-consistency, in-context learning, multimodal in-context learning, and their systematic integration into a meta-prompt. Recall, accuracy, and F1 score were employed to evaluate the outputs. Subsequently, the localization capability of GPT-4o and its influence on reasoning capability were evaluated. In the ablation study, GPT-4o's recall increased significantly from 2.67% to 43.33% upon acquiring PRs (P < 0.001). GPT-4o with the meta-prompt demonstrated improvements in recall (43.33% vs. 52.67%, P = 0.022), accuracy (39.95% vs. 68.75%, P < 0.001), and F1 score (0.42 vs. 0.60, P < 0.001) compared to the zero-shot prompt and other prompt strategies. The localization accuracy of GPT-4o was 45.67% (137 out of 300, 95% CI: 40.00 to 51.34). A significant correlation was observed between its localization accuracy and reasoning capability under the meta-prompt (φ coefficient = 0.33, P < 0.001). The model's recall increased by 5.49% (P = 0.031) when accurate localization cues were provided within the meta-prompt. GPT-4o demonstrated a degree of multimodal capability on PRs, with performance enhanced through prompt engineering; nevertheless, its performance remains inadequate for clinical requirements. Future efforts will be necessary to identify additional factors influencing the model's reasoning capability, or to develop more advanced models, before such systems can assist radiological assessment of PRs in clinical practice.
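
As an illustration only, the sketch below shows one way the named strategies could be composed into a meta-prompt, with self-consistency implemented as majority voting over repeated samples. The wording, the `sample` callable, and the example store are hypothetical, not the study's actual prompts.

```python
from collections import Counter

def build_meta_prompt(pr_findings: str, examples: list[str]) -> str:
    """Combine in-context examples (ICL) with a chain-of-thought instruction."""
    icl_block = "\n\n".join(examples)   # worked error-detection examples
    return ("You are reviewing a panoramic radiograph and its reported findings.\n"
            f"Reported findings:\n{pr_findings}\n\n"
            f"Examples of verified reviews:\n{icl_block}\n\n"
            "Think step by step: check each finding against the image, then "
            "list any interpretation errors with their locations.")

def self_consistent_errors(prompt: str, sample, n: int = 5) -> set:
    """Self-consistency: sample the model n times and keep errors that a
    majority of samples flag. `sample` is a callable returning a list of
    error strings from one model run."""
    votes = Counter(err for _ in range(n) for err in sample(prompt))
    return {err for err, c in votes.items() if c > n // 2}
```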

Current imaging applications, radiomics, and machine learning modalities of CNS demyelinating disorders and its mimickers.

Alam Z, Maddali A, Patel S, Weber N, Al Rikabi S, Thiemann D, Desai K, Monoky D

PubMed · Aug 12 2025
Distinguishing among neuroinflammatory demyelinating diseases of the central nervous system can present a significant diagnostic challenge due to substantial overlap in clinical presentations and imaging features. Collaboration between specialists, novel antibody testing, and dedicated magnetic resonance imaging protocols have helped to narrow the diagnostic gap, but challenging cases remain. Machine learning algorithms have proven able to identify subtle patterns that escape even the most experienced human eye. Indeed, machine learning and the subfield of radiomics have demonstrated exponential growth and improvement in diagnostic capacity within the past decade. The sometimes daunting diagnostic overlap of various demyelinating processes thus presents a unique opportunity: can the elite pattern-recognition powers of machine learning close the gap in making the correct diagnosis? This review focuses specifically on neuroinflammatory demyelinating diseases, exploring the role of artificial intelligence in the detection, diagnosis, and differentiation of the most common pathologies: multiple sclerosis (MS), neuromyelitis optica spectrum disorder (NMOSD), acute disseminated encephalomyelitis (ADEM), Sjögren's syndrome, MOG antibody-associated disorder (MOGAD), and neuropsychiatric systemic lupus erythematosus (NPSLE). Understanding how these tools enhance diagnostic precision may lead to earlier intervention, improved outcomes, and optimized management strategies.

The performance of large language models in dentomaxillofacial radiology: a systematic review.

Liu Z, Nalley A, Hao J, H Ai QY, Kan Yeung AW, Tanaka R, Hung KF

PubMed · Aug 12 2025
This study aimed to systematically review the current performance of large language models (LLMs) in dentomaxillofacial radiology (DMFR). Five electronic databases were searched to identify studies that developed, fine-tuned, or evaluated LLMs for DMFR-related tasks. Data extracted included study purpose, LLM type, image/text source, language, dataset characteristics, input and output, performance outcomes, evaluation methods, and reference standards. Customized assessment criteria adapted from the TRIPOD-LLM reporting guideline were used to evaluate the risk of bias in the included studies, specifically regarding the clarity of dataset origin, the robustness of performance evaluation methods, and the validity of the reference standards. The initial search yielded 1621 titles, and nineteen studies were included. These studies investigated the use of LLMs for tasks including producing and answering DMFR-related qualification-exam and educational questions (n = 8), diagnosis and treatment recommendations (n = 7), and radiology report generation and patient communication (n = 4). LLMs demonstrated varied performance in diagnosing dental conditions, with accuracy ranging from 37% to 92.5% and expert ratings for differential diagnosis and treatment planning between 3.6 and 4.7 on a 5-point scale. For DMFR-related qualification exams and board-style questions, LLMs achieved correctness rates between 33.3% and 86.1%. Automated radiology report generation showed moderate performance, with accuracy ranging from 70.4% to 81.3%. LLMs demonstrate promising potential in DMFR, particularly for diagnostic, educational, and report-generation tasks. However, their current accuracy, completeness, and consistency remain variable. Further development, validation, and standardization are needed before LLMs can be reliably integrated as supportive tools into clinical workflows and educational settings.
