Page 1 of 19 results

The Effectiveness of Large Language Models in Providing Automated Feedback in Medical Imaging Education: A Protocol for a Systematic Review

Al-Mashhadani, M., Ajaz, F., Guraya, S. S., Ennab, F.

medRxiv preprint, Aug 6 2025
Background: Large Language Models (LLMs) represent a rapidly evolving generative artificial intelligence (AI) modality with promising developments in the field of medical education. LLMs can provide automated feedback services to medical trainees (e.g., medical students, residents, and fellows) and possibly serve a role in medical imaging education. Aim: This systematic review aims to comprehensively explore the current applications and educational outcomes of LLMs in providing automated feedback on medical imaging reports. Methods: This study employs a comprehensive systematic review strategy, involving an extensive search of the literature (PubMed, Scopus, Embase, and Cochrane), data extraction, and synthesis of the data. Conclusion: This systematic review will highlight the best practices of LLM use in automated feedback of medical imaging reports and guide further development of these models.

Interpreting convolutional neural network explainability for head-and-neck cancer radiotherapy organ-at-risk segmentation

Strijbis, V. I. J., Gurney-Champion, O. J., Grama, D. I., Slotman, B. J., Verbakel, W. F. A. R.

medRxiv preprint, Jul 31 2025
Background: Convolutional neural networks (CNNs) have emerged to reduce clinical resources and standardize auto-contouring of organs-at-risk (OARs). Although CNNs perform adequately for most patients, understanding when the CNN might fail is critical for effective and safe clinical deployment. However, the limitations of CNNs are poorly understood because of their black-box nature. Explainable artificial intelligence (XAI) can expose CNNs' inner mechanisms for classification. Here, we investigate the inner mechanisms of CNNs for segmentation and explore a novel computational approach to flag potentially insufficient parotid gland (PG) contours a priori. Methods: First, 3D UNets were trained in three PG segmentation situations using (1) synthetic cases; (2) 1925 clinical computed tomography (CT) scans with typical contours; and (3) more consistent contours curated through a previously validated auto-curation step. Then, we generated attribution maps for seven XAI methods and qualitatively assessed them for congruency between simulated and clinical contours, and for how much XAI agreed with expert reasoning. To objectify observations, we explored persistent homology intensity filtrations to capture essential topological characteristics of XAI attributions. Principal component (PC) eigenvalues of Euler characteristic profiles were correlated with spatial agreement (Dice-Sørensen similarity coefficient; DSC). Evaluation was done using sensitivity, specificity and the area under the receiver operating characteristic curve (AUROC) on an external AAPM dataset, where, as a proof of principle, we regarded the lowest 15% of DSC values as insufficient. Results: PatternNet attributions (PNet-A) focused on soft-tissue structures, whereas guided backpropagation (GBP) highlighted both soft-tissue and high-density structures (e.g., mandible bone), which was congruent with synthetic situations. Both methods typically had higher/denser activations in better auto-contoured medial and anterior lobes.
Curated models produced "cleaner" gradient class-activation mapping (GCAM) attributions. Quantitative analysis showed that PC λ1 of the guided GCAM (GGCAM) Euler characteristic (EC) profile had good predictive value (sensitivity > 0.85, specificity > 0.9) for DSC in AAPM cases, with AUROC = 0.66, 0.74, 0.94, and 0.83 for GBP, GCAM, GGCAM, and PNet-A, respectively. For λ1 < -1.8e3 of the GGCAM EC profile, 87% of cases were insufficient. Conclusions: GBP and PNet-A qualitatively agreed most with expert reasoning on features that are directly (structure borders) and indirectly (proxies used for identifying structure borders) important for PG segmentation. Additionally, this work investigated, as a proof of principle, how topological data analysis could possibly be used for quantitative XAI signal analysis to mark potentially inadequate CNN segmentations a priori, using only features from inside the predicted PG. This work used PG as a well-understood segmentation paradigm and may extend to target volumes and other organs-at-risk.
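The evaluation criterion in the abstract above (spatial agreement measured by DSC, with the lowest 15% of cases treated as insufficient) can be illustrated with a minimal sketch. It is not the authors' code: the function names, the representation of contours as sets of voxel indices, and the example values are all assumptions made for illustration.

```python
def dice_coefficient(predicted, reference):
    """Dice-Sorensen similarity coefficient between two sets of voxel indices."""
    predicted, reference = set(predicted), set(reference)
    if not predicted and not reference:
        return 1.0  # convention chosen here: two empty contours agree perfectly
    overlap = len(predicted & reference)
    return 2 * overlap / (len(predicted) + len(reference))


def flag_lowest_fraction(dsc_by_case, fraction=0.15):
    """Return the case IDs whose DSC falls in the lowest `fraction` of cases."""
    n_flagged = max(1, round(fraction * len(dsc_by_case)))
    ranked = sorted(dsc_by_case, key=dsc_by_case.get)  # ascending DSC
    return set(ranked[:n_flagged])


# Hypothetical DSC values for 20 made-up cases (not data from the study).
example_dsc = {f"case_{i:02d}": 0.60 + 0.02 * i for i in range(20)}
insufficient = flag_lowest_fraction(example_dsc)
```

With 20 hypothetical cases and a 15% cut-off, the three lowest-scoring cases are labelled insufficient; a predictor such as the EC-profile eigenvalue would then be scored against those labels.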

The impacts of artificial intelligence on the workload of diagnostic radiology services: A rapid review and stakeholder contextualisation

Sutton, C., Prowse, J., Elshehaly, M., Randell, R.

medRxiv preprint, Jul 24 2025
Background: Advancements in imaging technology, alongside increasing longevity and co-morbidities, have led to heightened demand for diagnostic radiology services. However, there is a shortfall in radiology and radiography staff to acquire, read and report on such imaging examinations. Artificial intelligence (AI) has been identified, notably by AI developers, as a potential way to reduce the diagnostic workload of radiology services and so address this staffing shortfall. Methods: A rapid review complemented with data from interviews with UK radiology service stakeholders was undertaken. ArXiv, Cochrane Library, Embase, Medline and Scopus databases were searched for publications in English published between 2007 and 2022. Following screening, 110 full texts were included. Interviews with 15 radiology service managers, clinicians and academics were carried out between May and September 2022. Results: Most literature was published in 2021 and 2022, with a distinct focus on AI for diagnostics of lung and chest disease (n = 25), notably COVID-19 and respiratory system cancers, closely followed by AI for breast screening (n = 23). AI contributions to streamlining the workload of radiology services were categorised as autonomous, augmentative and assistive. However, percentage estimates of workload reduction varied considerably, with the most significant reduction identified in national screening programmes. AI was also recognised as aiding radiology services by providing a second opinion, assisting in prioritisation of images for reading, and improving quantification in diagnostics. Stakeholders saw AI as having the potential to remove some of the laborious work and contribute to service resilience. Conclusions: This review has shown there is limited data on real-world experiences from radiology services for the implementation of AI in clinical production.
As noted in the article, autonomous, augmentative and assistive AI can decrease workload and aid reading and reporting; however, the governance surrounding these advancements lags behind.

DREAM: A framework for discovering mechanisms underlying AI prediction of protected attributes

Gadgil, S. U., DeGrave, A. J., Janizek, J. D., Xu, S., Nwandu, L., Fonjungo, F., Lee, S.-I., Daneshjou, R.

medRxiv preprint, Jul 21 2025
Recent advances in Artificial Intelligence (AI) have started disrupting the healthcare industry, especially medical imaging, and AI devices are increasingly being deployed into clinical practice. Such classifiers have previously demonstrated the ability to discern a range of protected demographic attributes (such as race, age, and sex) from medical images with unexpectedly high performance, a sensitive task which is difficult even for trained physicians. In this study, we motivate and introduce a general explainable AI (XAI) framework called DREAM (DiscoveRing and Explaining AI Mechanisms) for interpreting how AI models trained on medical images predict protected attributes. Focusing on two modalities, radiology and dermatology, we successfully train high-performing classifiers for predicting race from chest x-rays (ROC-AUC score of ~0.96) and sex from dermoscopic lesions (ROC-AUC score of ~0.78). We highlight how incorrect use of these demographic shortcuts can have a detrimental effect on the performance of a clinically relevant downstream task like disease diagnosis under a domain shift. Further, we employ various XAI techniques to identify specific signals which can be leveraged to predict sex. Finally, we propose a technique, which we call removal via balancing, to quantify how much a signal contributes to the classification performance. Using this technique and the signals identified, we are able to explain ~15% of the total performance for radiology and ~42% of the total performance for dermatology. We envision DREAM to be broadly applicable to other modalities and demographic attributes. This analysis not only underscores the importance of cautious AI application in healthcare but also opens avenues for improving the transparency and reliability of AI-driven diagnostic tools.

Artificial Intelligence for Early Detection and Prognosis Prediction of Diabetic Retinopathy

Budi Susilo, Y. K., Yuliana, D., Mahadi, M., Abdul Rahman, S., Ariffin, A. E.

medRxiv preprint, Jun 20 2025
This review explores the transformative role of artificial intelligence (AI) in the early detection and prognosis prediction of diabetic retinopathy (DR), a leading cause of vision loss in diabetic patients. AI, particularly deep learning and convolutional neural networks (CNNs), has demonstrated remarkable accuracy in analyzing retinal images, identifying early-stage DR with high sensitivity and specificity. These advancements address critical challenges such as intergrader variability in manual screening and the limited availability of specialists, especially in underserved regions. The integration of AI with telemedicine has further enhanced accessibility, enabling remote screening through portable devices and smartphone-based imaging. Economically, AI-based systems reduce healthcare costs by optimizing resource allocation and minimizing unnecessary referrals. Key findings highlight the dominance of Medicine (819 documents) and Computer Science (613 documents) in research output, reflecting the interdisciplinary nature of this field. Geographically, China, the United States, and India lead in contributions, underscoring global efforts to combat DR. Despite these successes, challenges such as algorithmic bias, data privacy, and the need for explainable AI (XAI) remain. Future research should focus on multi-center validation, diverse AI methodologies, and clinician-friendly tools to ensure equitable adoption. By addressing these gaps, AI can revolutionize DR management, reducing the global burden of diabetes-related blindness through early intervention and scalable solutions.

Radiologist-AI workflow can be modified to reduce the risk of medical malpractice claims

Bernstein, M., Sheppard, B., Bruno, M. A., Lay, P. S., Baird, G. L.

medRxiv preprint, Jun 16 2025
Background. Artificial Intelligence (AI) is rapidly changing the legal landscape of radiology. Results from a previous experiment suggested that providing AI error rates can reduce perceived radiologist culpability, as judged by mock jury members (4). The current study advances this work by examining whether the radiologist's behavior also impacts perceptions of liability. Methods. Participants (n=282) read about a hypothetical malpractice case in which a 50-year-old who visited the Emergency Department with acute neurological symptoms received a brain CT scan to determine if bleeding was present. An AI system was used by the radiologist who interpreted the imaging. The AI system correctly flagged the case as abnormal. Nonetheless, the radiologist concluded there was no evidence of bleeding, and the blood-thinner t-PA was administered. Participants were randomly assigned to either (1) a single-read condition, where the radiologist interpreted the CT once after seeing AI feedback, or (2) a double-read condition, where the radiologist interpreted the CT twice, first without AI and then with AI feedback. Participants were then told the patient suffered irreversible brain damage due to the missed brain bleed, resulting in the patient (plaintiff) suing the radiologist (defendant). Participants indicated whether the radiologist met their duty of care to the patient (yes/no). Results. Hypothetical jurors were more likely to side with the plaintiff in the single-read condition (106/142, 74.7%) than in the double-read condition (74/140, 52.9%), p=0.0002. Conclusion. This suggests that the penalty for disagreeing with correct AI can be mitigated when images are interpreted twice, or at least if a radiologist gives an interpretation before AI is used.
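The comparison of the two conditions can be sketched as a test on a 2x2 table built from the counts reported above. The abstract does not state which test the authors used; the sketch below assumes a Pearson chi-square test with Yates' continuity correction, and the function name is hypothetical.

```python
import math


def yates_chi_square_2x2(a, b, c, d):
    """Return (chi2, p) for the 2x2 table [[a, b], [c, d]] with Yates' correction."""
    n = a + b + c + d
    # Yates' correction shrinks |ad - bc| by n/2 (never below zero).
    corrected = max(0.0, abs(a * d - b * c) - n / 2)
    chi2 = n * corrected ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 degree of freedom, the chi-square upper tail equals erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


# Counts from the abstract: (sided with plaintiff, did not) in each condition.
single_read = (106, 142 - 106)
double_read = (74, 140 - 74)

chi2, p = yates_chi_square_2x2(*single_read, *double_read)
print(f"chi2 = {chi2:.2f}, p = {p:.4f}")
```

Under this assumed test the p-value is of the same order as the reported p=0.0002; a different choice (for example Fisher's exact test or an uncorrected chi-square) would give a somewhat different value.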

Lack of children in public medical imaging data points to growing age bias in biomedical AI

Hua, S. B. Z., Heller, N., He, P., Towbin, A. J., Chen, I., Lu, A., Erdman, L.

medRxiv preprint, Jun 7 2025
Artificial intelligence (AI) is rapidly transforming healthcare, but its benefits are not reaching all patients equally. Children remain overlooked, with only 17% of FDA-approved medical AI devices labeled for pediatric use. In this work, we demonstrate that this exclusion may stem from a fundamental data gap. Our systematic review of 181 public medical imaging datasets reveals that children represent just under 1% of available data, while the majority of machine learning imaging conference papers we surveyed utilized publicly available data for methods development. As with other systematic biases in model development, past studies have shown that pediatric representation in training data is essential for model performance in pediatric populations. We add to these findings, showing that adult-trained chest radiograph models exhibit significant age bias when applied to pediatric populations, with higher false positive rates in younger children. This work underscores the urgent need for increased pediatric representation in publicly accessible medical datasets. We provide actionable recommendations for researchers, policymakers, and data curators to address this age equity gap and ensure AI benefits patients of all ages. 1-2 sentence summary: Our analysis reveals a critical healthcare age disparity: children represent less than 1% of public medical imaging datasets. This gap in representation leads to biased predictions across medical image foundation models, with the youngest patients facing the highest risk of misdiagnosis.

Evaluating the performance and potential bias of predictive models for the detection of transthyretin cardiac amyloidosis

Hourmozdi, J., Easton, N., Benigeri, S., Thomas, J. D., Narang, A., Ouyang, D., Duffy, G., Upton, R., Hawkes, W., Akerman, A., Okwuosa, I., Kline, A., Kho, A. N., Luo, Y., Shah, S. J., Ahmad, F. S.

medRxiv preprint, Jun 2 2025
Background: Delays in the diagnosis of transthyretin amyloid cardiomyopathy (ATTR-CM) contribute to the significant morbidity of the condition, especially in the era of disease-modifying therapies. Screening for ATTR-CM with AI and other algorithms may improve timely diagnosis, but these algorithms have not been directly compared. Objectives: The aim of this study was to compare the performance of four algorithms for ATTR-CM detection in a heart failure population and assess the risk for harms due to model bias. Methods: We identified patients in an integrated health system from 2010-2022 with ATTR-CM and age- and sex-matched them to controls with heart failure to target 5% prevalence. We compared the performance of a claims-based random forest model (Huda et al. model), a regression-based score (Mayo ATTR-CM), and two deep learning echo models (EchoNet-LVH and EchoGo® Amyloidosis). We evaluated for bias using standard fairness metrics. Results: The analytical cohort included 176 confirmed cases of ATTR-CM and 3192 control patients, with 79.2% self-identified as White and 9.0% as Black. The Huda et al. model performed poorly (AUC 0.49). Both deep learning echo models had a higher AUC when compared to the Mayo ATTR-CM Score (EchoNet-LVH 0.88; EchoGo Amyloidosis 0.92; Mayo ATTR-CM Score 0.79; DeLong P<0.001 for both). Bias auditing met fairness criteria for equal opportunity among patients who identified as Black. Conclusions: Deep learning, echo-based models to detect ATTR-CM demonstrated the best overall discrimination when compared to two other models in external validation, with low risk of harms due to racial bias.
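Equal opportunity, the fairness criterion mentioned in the results, is commonly defined as equal true positive rates across groups. The sketch below shows one way such a gap could be computed. It is an illustration only: the record layout, function names, group labels, and data are hypothetical, and the abstract does not say what tolerance the authors applied.

```python
def true_positive_rate(labels, predictions):
    """Fraction of truly positive cases (label 1) that the model flagged (prediction 1)."""
    flagged = [pred for label, pred in zip(labels, predictions) if label == 1]
    if not flagged:
        raise ValueError("group has no positive cases")
    return sum(flagged) / len(flagged)


def equal_opportunity_gaps(records, reference_group):
    """Return each group's TPR minus the reference group's TPR.

    `records` is an iterable of (group, label, prediction) tuples with 0/1 values.
    """
    by_group = {}
    for group, label, prediction in records:
        labels, predictions = by_group.setdefault(group, ([], []))
        labels.append(label)
        predictions.append(prediction)
    tpr = {g: true_positive_rate(l, p) for g, (l, p) in by_group.items()}
    return {g: rate - tpr[reference_group] for g, rate in tpr.items()}


# Made-up records (group, true label, model prediction); not study data.
example_records = [
    ("A", 1, 1), ("A", 1, 1), ("A", 1, 0), ("A", 1, 1), ("A", 0, 0),
    ("B", 1, 1), ("B", 1, 0), ("B", 0, 1),
]
gaps = equal_opportunity_gaps(example_records, reference_group="A")
```

A gap near zero for a group suggests the model detects true cases in that group about as often as in the reference group; how small the gap must be to "meet" the criterion is a policy choice.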

Artificial Intelligence-Driven Innovations in Diabetes Care and Monitoring

Abdul Rahman, S., Mahadi, M., Yuliana, D., Budi Susilo, Y. K., Ariffin, A. E., Amgain, K.

medRxiv preprint, Jun 2 2025
This study explores the transformative role of Artificial Intelligence (AI) in diabetes care and monitoring, focusing on innovations that optimize patient outcomes. AI, particularly machine learning and deep learning, significantly enhances early detection of complications like diabetic retinopathy and improves screening efficacy. The methodology employs a bibliometric analysis using Scopus, VOSviewer, and Publish or Perish, analyzing 235 articles from 2023-2025. Results indicate a strong interdisciplinary focus, with Computer Science and Medicine being the dominant subject areas (36.9% and 12.9%, respectively). Bibliographic coupling reveals robust international collaborations led by the U.S. (1558.52 link strength), UK, and China, with key influential documents by Zhu (2023c) and Annuzzi (2023). This research highlights AI's impact on enhancing monitoring, personalized treatment, and proactive care, while acknowledging challenges in data privacy and ethical deployment. Future work should bridge technological advancements with real-world implementation to create equitable and efficient diabetes care systems.