Sort by:
Page 59 of 66652 results

Patient Reactions to Artificial Intelligence-Clinician Discrepancies: Web-Based Randomized Experiment.

Madanay F, O'Donohue LS, Zikmund-Fisher BJ

pubmed logopapersMay 22 2025
As the US Food and Drug Administration (FDA)-approved use of artificial intelligence (AI) for medical imaging rises, radiologists are increasingly integrating AI into their clinical practices. In lung cancer screening, diagnostic AI offers a second set of eyes with the potential to detect cancer earlier than human radiologists. Despite AI's promise, a potential problem with its integration is the erosion of patient confidence in clinician expertise when there is a discrepancy between the radiologist's and the AI's interpretation of the imaging findings. We examined how discrepancies between AI-derived recommendations and radiologists' recommendations affect patients' agreement with radiologists' recommendations and satisfaction with their radiologists. We also analyzed how patients' medical maximizing-minimizing preferences moderate these relationships. We conducted a randomized, between-subjects experiment with 1606 US adult participants. Assuming the role of patients, participants imagined undergoing a low-dose computerized tomography scan for lung cancer screening and receiving results and recommendations from (1) a radiologist only, (2) AI and a radiologist in agreement, (3) a radiologist who recommended more testing than AI (ie, radiologist overcalled AI), or (4) a radiologist who recommended less testing than AI (ie, radiologist undercalled AI). Participants rated the radiologist on three criteria: agreement with the radiologist's recommendation, how likely they would be to recommend the radiologist to family and friends, and how good of a provider they perceived the radiologist to be. We measured medical maximizing-minimizing preferences and categorized participants as maximizers (ie, those who seek aggressive intervention), minimizers (ie, those who prefer no or passive intervention), and neutrals (ie, those in the middle). Participants' agreement with the radiologist's recommendation was significantly lower when the radiologist undercalled AI (mean 4.01, SE 0.07, P<.001) than in the other 3 conditions, with no significant differences among them (radiologist overcalled AI [mean 4.63, SE 0.06], agreed with AI [mean 4.55, SE 0.07], or had no AI [mean 4.57, SE 0.06]). Similarly, participants were least likely to recommend (P<.001) and positively rate (P<.001) the radiologist who undercalled AI, with no significant differences among the other conditions. Maximizers agreed with the radiologist who overcalled AI (β=0.82, SE 0.14; P<.001) and disagreed with the radiologist who undercalled AI (β=-0.47, SE 0.14; P=.001). However, whereas minimizers disagreed with the radiologist who overcalled AI (β=-0.43, SE 0.18, P=.02), they did not significantly agree with the radiologist who undercalled AI (β=0.14, SE 0.17, P=.41). Radiologists who recommend less testing than AI may face decreased patient confidence in their expertise, but they may not face this same penalty for giving more aggressive recommendations than AI. Patients' reactions may depend in part on whether their general preferences to maximize or minimize align with the radiologists' recommendations. Future research should test communication strategies for radiologists' disclosure of AI discrepancies to patients.

Seeing the Trees for the Forest: Rethinking Weakly-Supervised Medical Visual Grounding

Ta Duc Huy, Duy Anh Huynh, Yutong Xie, Yuankai Qi, Qi Chen, Phi Le Nguyen, Sen Kim Tran, Son Lam Phung, Anton van den Hengel, Zhibin Liao, Minh-Son To, Johan W. Verjans, Vu Minh Hieu Phan

arxiv logopreprintMay 21 2025
Visual grounding (VG) is the capability to identify the specific regions in an image associated with a particular text description. In medical imaging, VG enhances interpretability by highlighting relevant pathological features corresponding to textual descriptions, improving model transparency and trustworthiness for wider adoption of deep learning models in clinical practice. Current models struggle to associate textual descriptions with disease regions due to inefficient attention mechanisms and a lack of fine-grained token representations. In this paper, we empirically demonstrate two key observations. First, current VLMs assign high norms to background tokens, diverting the model's attention from regions of disease. Second, the global tokens used for cross-modal learning are not representative of local disease tokens. This hampers identifying correlations between the text and disease tokens. To address this, we introduce simple, yet effective Disease-Aware Prompting (DAP) process, which uses the explainability map of a VLM to identify the appropriate image features. This simple strategy amplifies disease-relevant regions while suppressing background interference. Without any additional pixel-level annotations, DAP improves visual grounding accuracy by 20.74% compared to state-of-the-art methods across three major chest X-ray datasets.

Lung Nodule-SSM: Self-Supervised Lung Nodule Detection and Classification in Thoracic CT Images

Muniba Noreen, Furqan Shaukat

arxiv logopreprintMay 21 2025
Lung cancer remains among the deadliest types of cancer in recent decades, and early lung nodule detection is crucial for improving patient outcomes. The limited availability of annotated medical imaging data remains a bottleneck in developing accurate computer-aided diagnosis (CAD) systems. Self-supervised learning can help leverage large amounts of unlabeled data to develop more robust CAD systems. With the recent advent of transformer-based architecture and their ability to generalize to unseen tasks, there has been an effort within the healthcare community to adapt them to various medical downstream tasks. Thus, we propose a novel "LungNodule-SSM" method, which utilizes selfsupervised learning with DINOv2 as a backbone to enhance lung nodule detection and classification without annotated data. Our methodology has two stages: firstly, the DINOv2 model is pre-trained on unlabeled CT scans to learn robust feature representations, then secondly, these features are fine-tuned using transformer-based architectures for lesionlevel detection and accurate lung nodule diagnosis. The proposed method has been evaluated on the challenging LUNA 16 dataset, consisting of 888 CT scans, and compared with SOTA methods. Our experimental results show the superiority of our proposed method with an accuracy of 98.37%, explaining its effectiveness in lung nodule detection. The source code, datasets, and pre-processed data can be accessed using the link:https://github.com/EMeRALDsNRPU/Lung-Nodule-SSM-Self-Supervised-Lung-Nodule-Detection-and-Classification/tree/main

Benchmarking Chest X-ray Diagnosis Models Across Multinational Datasets

Qinmei Xu, Yiheng Li, Xianghao Zhan, Ahmet Gorkem Er, Brittany Dashevsky, Chuanjun Xu, Mohammed Alawad, Mengya Yang, Liu Ya, Changsheng Zhou, Xiao Li, Haruka Itakura, Olivier Gevaert

arxiv logopreprintMay 21 2025
Foundation models leveraging vision-language pretraining have shown promise in chest X-ray (CXR) interpretation, yet their real-world performance across diverse populations and diagnostic tasks remains insufficiently evaluated. This study benchmarks the diagnostic performance and generalizability of foundation models versus traditional convolutional neural networks (CNNs) on multinational CXR datasets. We evaluated eight CXR diagnostic models - five vision-language foundation models and three CNN-based architectures - across 37 standardized classification tasks using six public datasets from the USA, Spain, India, and Vietnam, and three private datasets from hospitals in China. Performance was assessed using AUROC, AUPRC, and other metrics across both shared and dataset-specific tasks. Foundation models outperformed CNNs in both accuracy and task coverage. MAVL, a model incorporating knowledge-enhanced prompts and structured supervision, achieved the highest performance on public (mean AUROC: 0.82; AUPRC: 0.32) and private (mean AUROC: 0.95; AUPRC: 0.89) datasets, ranking first in 14 of 37 public and 3 of 4 private tasks. All models showed reduced performance on pediatric cases, with average AUROC dropping from 0.88 +/- 0.18 in adults to 0.57 +/- 0.29 in children (p = 0.0202). These findings highlight the value of structured supervision and prompt design in radiologic AI and suggest future directions including geographic expansion and ensemble modeling for clinical deployment. Code for all evaluated models is available at https://drive.google.com/drive/folders/1B99yMQm7bB4h1sVMIBja0RfUu8gLktCE

Comprehensive Lung Disease Detection Using Deep Learning Models and Hybrid Chest X-ray Data with Explainable AI

Shuvashis Sarker, Shamim Rahim Refat, Faika Fairuj Preotee, Tanvir Rouf Shawon, Raihan Tanvir

arxiv logopreprintMay 21 2025
Advanced diagnostic instruments are crucial for the accurate detection and treatment of lung diseases, which affect millions of individuals globally. This study examines the effectiveness of deep learning and transfer learning models using a hybrid dataset, created by merging four individual datasets from Bangladesh and global sources. The hybrid dataset significantly enhances model accuracy and generalizability, particularly in detecting COVID-19, pneumonia, lung opacity, and normal lung conditions from chest X-ray images. A range of models, including CNN, VGG16, VGG19, InceptionV3, Xception, ResNet50V2, InceptionResNetV2, MobileNetV2, and DenseNet121, were applied to both individual and hybrid datasets. The results showed superior performance on the hybrid dataset, with VGG16, Xception, ResNet50V2, and DenseNet121 each achieving an accuracy of 99%. This consistent performance across the hybrid dataset highlights the robustness of these models in handling diverse data while maintaining high accuracy. To understand the models implicit behavior, explainable AI techniques were employed to illuminate their black-box nature. Specifically, LIME was used to enhance the interpretability of model predictions, especially in cases of misclassification, contributing to the development of reliable and interpretable AI-driven solutions for medical imaging.

Fusing radiomics and deep learning features for automated classification of multi-type pulmonary nodule.

Du L, Tang G, Che Y, Ling S, Chen X, Pan X

pubmed logopapersMay 20 2025
The accurate classification of lung nodules is critical to achieving personalized lung cancer treatment and prognosis prediction. The treatment options for lung cancer and the prognosis of patients are closely related to the type of lung nodules, but there are many types of lung nodules, and the distinctions between certain types are subtle, making accurate classification based on traditional medical imaging technology and doctor experience challenging. In this study, a novel method was used to analyze quantitative features in CT images using CT radiomics to reveal the characteristics of pulmonary nodules, and then feature fusion was used to integrate radiomics features and deep learning features to improve the accuracy of classification. This paper proposes a fusion feature pulmonary nodule classification method that fuses radiomics features with deep learning neural network features, aiming to automatically classify different types of pulmonary nodules (such as Malignancy, Calcification, Spiculation, Lobulation, Margin, and Texture). By introducing the Discriminant Correlation Analysis feature fusion algorithm, the method maximizes the complementarity between the two types of features and the differences between different classes. This ensures interaction between the information, effectively utilizing the complementary characteristics of the features. The LIDC-IDRI dataset is used for training, and the fusion feature model has been validated for its advantages and effectiveness in classifying multiple types of pulmonary nodules. The experimental results show that the fusion feature model outperforms the single-feature model in all classification tasks. The AUCs for the tasks of classifying Calcification, Lobulation, Margin, Spiculation, Texture, and Malignancy reached 0.9663, 0.8113, 0.8815, 0.8140, 0.9010, and 0.9316, respectively. In tasks such as nodule calcification and texture classification, the fusion feature model significantly improved the recognition ability of minority classes. The fusion of radiomics features and deep learning neural network features can effectively enhance the overall performance of pulmonary nodule classification models while also improving the recognition of minority classes when there is a significant class imbalance.

Federated learning in low-resource settings: A chest imaging study in Africa -- Challenges and lessons learned

Jorge Fabila, Lidia Garrucho, Víctor M. Campello, Carlos Martín-Isla, Karim Lekadir

arxiv logopreprintMay 20 2025
This study explores the use of Federated Learning (FL) for tuberculosis (TB) diagnosis using chest X-rays in low-resource settings across Africa. FL allows hospitals to collaboratively train AI models without sharing raw patient data, addressing privacy concerns and data scarcity that hinder traditional centralized models. The research involved hospitals and research centers in eight African countries. Most sites used local datasets, while Ghana and The Gambia used public ones. The study compared locally trained models with a federated model built across all institutions to evaluate FL's real-world feasibility. Despite its promise, implementing FL in sub-Saharan Africa faces challenges such as poor infrastructure, unreliable internet, limited digital literacy, and weak AI regulations. Some institutions were also reluctant to share model updates due to data control concerns. In conclusion, FL shows strong potential for enabling AI-driven healthcare in underserved regions, but broader adoption will require improvements in infrastructure, education, and regulatory support.

RADAR: Enhancing Radiology Report Generation with Supplementary Knowledge Injection

Wenjun Hou, Yi Cheng, Kaishuai Xu, Heng Li, Yan Hu, Wenjie Li, Jiang Liu

arxiv logopreprintMay 20 2025
Large language models (LLMs) have demonstrated remarkable capabilities in various domains, including radiology report generation. Previous approaches have attempted to utilize multimodal LLMs for this task, enhancing their performance through the integration of domain-specific knowledge retrieval. However, these approaches often overlook the knowledge already embedded within the LLMs, leading to redundant information integration and inefficient utilization of learned representations. To address this limitation, we propose RADAR, a framework for enhancing radiology report generation with supplementary knowledge injection. RADAR improves report generation by systematically leveraging both the internal knowledge of an LLM and externally retrieved information. Specifically, it first extracts the model's acquired knowledge that aligns with expert image-based classification outputs. It then retrieves relevant supplementary knowledge to further enrich this information. Finally, by aggregating both sources, RADAR generates more accurate and informative radiology reports. Extensive experiments on MIMIC-CXR, CheXpert-Plus, and IU X-ray demonstrate that our model outperforms state-of-the-art LLMs in both language quality and clinical accuracy

Mask of Truth: Model Sensitivity to Unexpected Regions of Medical Images.

Sourget T, Hestbek-Møller M, Jiménez-Sánchez A, Junchi Xu J, Cheplygina V

pubmed logopapersMay 20 2025
The development of larger models for medical image analysis has led to increased performance. However, it also affected our ability to explain and validate model decisions. Models can use non-relevant parts of images, also called spurious correlations or shortcuts, to obtain high performance on benchmark datasets but fail in real-world scenarios. In this work, we challenge the capacity of convolutional neural networks (CNN) to classify chest X-rays and eye fundus images while masking out clinically relevant parts of the image. We show that all models trained on the PadChest dataset, irrespective of the masking strategy, are able to obtain an area under the curve (AUC) above random. Moreover, the models trained on full images obtain good performance on images without the region of interest (ROI), even superior to the one obtained on images only containing the ROI. We also reveal a possible spurious correlation in the Chákṣu dataset while the performances are more aligned with the expectation of an unbiased model. We go beyond the performance analysis with the usage of the explainability method SHAP and the analysis of embeddings. We asked a radiology resident to interpret chest X-rays under different masking to complement our findings with clinical knowledge.

Diagnostic value of fully automated CT pulmonary angiography in patients with chronic thromboembolic pulmonary hypertension and chronic thromboembolic disease.

Lin Y, Li M, Xie S

pubmed logopapersMay 20 2025
To evaluate the value of employing artificial intelligence (AI)-assisted CT pulmonary angiography (CTPA) for patients with chronic thromboembolic pulmonary hypertension (CTEPH) and chronic thromboembolic disease (CTED). A single-center, retrospective analysis of 350 sequential patients with right heart catheterization (RHC)-confirmed CTEPH, CTED, and normal controls was conducted. Parameters such as the main pulmonary artery diameter (MPAd), the ratio of MPA to ascending aorta diameter (MPAd/AAd), the ratio of right to left ventricle diameter (RVd/LVd), and the ratio of RV to LV volume (RVv/LVv) were evaluated using automated AI software and compared with manual analysis. The reliability was assessed through an intraclass correlation coefficient (ICC) analysis. The diagnostic accuracy was determined using receiver-operating characteristic (ROC) curves. Compared to CTED and control groups, CTEPH patients were significantly more likely to have elevated automatic CTPA metrics (all p < 0.001, respectively). Automated MPAd, MPAd/Aad, and RVv/LVv had a strong correlation with mPAP (r = 0.952, 0.904, and 0.815, respectively, all p < 0.001). The automated and manual CTPA analyses showed strong concordance. For the CTEPH and CTED categories, the optimal area under the curve (AU-ROC) reached 0.939 (CI: 0.908-0.969). In the CTEPH and control groups, the best AU-ROC was 0.970 (CI: 0.953-0.988). In the CTED and control groups, the best AU-ROC was 0.782 (CI: 0.724-0.840). Automated AI-driven CTPA analysis provides a dependable approach for evaluating patients with CTEPH, CTED, and normal controls, demonstrating excellent consistency and efficiency. Question Guidelines do not advocate for applying treatment protocols for CTEPH to patients with CTED; early detection of the condition is crucial. Findings Automated CTPA analysis was feasible in 100% of patients with good agreement and would have added information for early detection and identification. Clinical relevance Automated AI-driven CTPA analysis provides a reliable approach demonstrating excellent consistency and efficiency. Additionally, these noninvasive imaging findings may aid in treatment stratification and determining optimal intervention directed by RHC.
Page 59 of 66652 results
Show
per page

Ready to Sharpen Your Edge?

Join hundreds of your peers who rely on RadAI Slice. Get the essential weekly briefing that empowers you to navigate the future of radiology.

We respect your privacy. Unsubscribe at any time.