Latest Papers on Radiology AI. Tags: Ethics

Exposing and Mitigating Calibration Biases and Demographic Unfairness in MLLM Few-Shot In-Context Learning for Medical Image Classification

Xing Shen, Justin Szeto, Mingyang Li, Hengguan Huang, Tal Arbel

•preprint•Jun 29 2025

Multimodal large language models (MLLMs) have enormous potential to perform few-shot in-context learning for medical image analysis. However, safe deployment of these models into real-world clinical practice requires an in-depth analysis of the accuracies of their predictions and their associated calibration errors, particularly across different demographic subgroups. In this work, we present the first investigation into the calibration biases and demographic unfairness of MLLMs' predictions and confidence scores in few-shot in-context learning for medical image classification. We introduce CALIN, an inference-time calibration method designed to mitigate the associated biases. Specifically, CALIN estimates the amount of calibration needed, represented by calibration matrices, using a bi-level procedure that progresses from the population level to the subgroup level prior to inference. It then applies this estimation to calibrate the predicted confidence scores during inference. Experimental results on three medical imaging datasets (PAPILA for fundus image classification, HAM10000 for skin cancer classification, and MIMIC-CXR for chest X-ray classification) demonstrate CALIN's effectiveness at ensuring fair confidence calibration in its predictions, while improving overall prediction accuracy and exhibiting a minimal fairness-utility trade-off.
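
To make "calibration error across demographic subgroups" concrete, the sketch below computes a standard binned expected calibration error (ECE) separately for each subgroup. It is not CALIN and does not reproduce the paper's calibration matrices; the function names, the 10-bin default, and the toy data are assumptions chosen for illustration.

```python
from collections import defaultdict


def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: sample-weighted mean of |accuracy - mean confidence| per bin."""
    bins = defaultdict(list)
    for conf, ok in zip(confidences, correct):
        # Clamp the index so that a confidence of exactly 1.0 lands in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))
    total = len(confidences)
    ece = 0.0
    for items in bins.values():
        avg_conf = sum(c for c, _ in items) / len(items)
        accuracy = sum(1 for _, ok in items if ok) / len(items)
        ece += (len(items) / total) * abs(accuracy - avg_conf)
    return ece


def subgroup_ece(confidences, correct, groups, n_bins=10):
    """Compute ECE separately for each subgroup label (hypothetical helper)."""
    by_group = defaultdict(lambda: ([], []))
    for conf, ok, group in zip(confidences, correct, groups):
        by_group[group][0].append(conf)
        by_group[group][1].append(ok)
    return {g: expected_calibration_error(c, k, n_bins) for g, (c, k) in by_group.items()}


# Toy data (made up): group B is more over-confident than group A.
confs = [0.9, 0.9, 0.8, 0.8]
hits = [True, True, True, False]
groups = ["A", "A", "B", "B"]

per_group = subgroup_ece(confs, hits, groups)
calibration_gap = max(per_group.values()) - min(per_group.values())
```

A large `calibration_gap` between subgroups is one simple signal of the kind of calibration bias the abstract describes; the paper itself may use different metrics.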

Mixed Modality Classification Methodology In Silico Academic Lab Ethics

Artificial intelligence in coronary CT angiography: transforming the diagnosis and risk stratification of atherosclerosis.

Irannejad K, Mafi M, Krishnan S, Budoff MJ

•papers•Jun 27 2025

Coronary CT Angiography (CCTA) is essential for assessing atherosclerosis and coronary artery disease, aiding in early detection, risk prediction, and clinical assessment. However, traditional CCTA interpretation is limited by observer variability, time inefficiency, and inconsistent plaque characterization. AI has emerged as a transformative tool, enhancing diagnostic accuracy, workflow efficiency, and risk prediction for major adverse cardiovascular events (MACE). Studies show that AI improves stenosis detection by 27%, inter-reader agreement by 30%, and reduces reporting times by 40%, thereby addressing key limitations of manual interpretation. Integrating AI with multimodal imaging (e.g., FFR-CT, PET-CT) further enhances ischemia detection by 28% and lesion classification by 35%, providing a more comprehensive cardiovascular evaluation. This review synthesizes recent advancements in CCTA-AI automation, risk stratification, and precision diagnostics while critically analyzing data quality, generalizability, ethics, and regulation challenges. Future directions, including real-time AI-assisted triage, cloud-based diagnostics, and AI-driven personalized medicine, are explored for their potential to revolutionize clinical workflows and optimize patient outcomes.

CT Classification Cardiac Review In Silico Academic Lab Ethics Policy

Leadership in radiology in the era of technological advancements and artificial intelligence.

Wichtmann BD, Paech D, Pianykh OS, Huang SY, Seltzer SE, Brink J, Fennessy FM

•papers•Jun 27 2025

Radiology has evolved from the pioneering days of X-ray imaging to a field rich in advanced technologies on the cusp of a transformative future driven by artificial intelligence (AI). As imaging workloads grow in volume and complexity, and economic as well as environmental pressures intensify, visionary leadership is needed to navigate the unprecedented challenges and opportunities ahead. Leveraging its strengths in automation, accuracy and objectivity, AI will profoundly impact all aspects of radiology practice, from workflow management to imaging, diagnostics, reporting and data-driven analytics, freeing radiologists to focus on value-driven tasks that improve patient care. However, successful AI integration requires strong leadership and robust governance structures to oversee algorithm evaluation, deployment, and ongoing maintenance, steering the transition from static to continuous learning systems. The vision of a "diagnostic cockpit" that integrates multidimensional data for quantitative precision diagnoses depends on visionary leadership that fosters innovation and interdisciplinary collaboration. Through administrative automation, precision medicine, and predictive analytics, AI can enhance operational efficiency, reduce administrative burden, and optimize resource allocation, leading to substantial cost reductions. Leaders need to understand not only the technical aspects but also the complex human, administrative, and organizational challenges of AI's implementation. Establishing sound governance and organizational frameworks will be essential to ensure ethical compliance and appropriate oversight of AI algorithms. As radiology advances toward this AI-driven future, leaders must cultivate an environment where technology enhances rather than replaces human skills, upholding an unwavering commitment to human-centered care. Their vision will define radiology's pioneering role in AI-enabled healthcare transformation.
KEY POINTS: Question: Artificial intelligence (AI) will transform radiology, improving workflow efficiency, reducing administrative burden, and optimizing resource allocation to meet imaging workloads' increasing complexity and volume. Findings: Strong leadership and governance ensure ethical deployment of AI, steering the transition from static to continuous learning systems while fostering interdisciplinary innovation and collaboration. Clinical relevance: Visionary leaders must harness AI to enhance, rather than replace, the role of professionals in radiology, advancing human-centered care while pioneering healthcare transformation.

Review Policy Ethics

Enhancing Diagnostic Precision: Utilising a Large Language Model to Extract U Scores from Thyroid Sonography Reports.

Watts E, Pournik O, Allington R, Ding X, Boelaert K, Sharma N, Ghalichi L, Arvanitis TN

•papers•Jun 26 2025

This study evaluates the performance of ChatGPT-4, a Large Language Model (LLM), in automatically extracting U scores from free-text thyroid ultrasound reports collected from University Hospitals Birmingham (UHB), UK, between 2014 and 2024. The LLM was provided with guidelines on the U classification system and extracted U scores independently from 14,248 de-identified reports, without access to human-assigned scores. The LLM-extracted scores were compared to initial clinician-assigned and refined U scores provided by expert reviewers. The LLM achieved 97.7% agreement with refined human U scores, successfully identifying the highest U score in 98.1% of reports with multiple nodules. Most discrepancies (2.5%) were linked to ambiguous descriptions, multi-nodule reports, and cases with human-documented uncertainty. While the results demonstrate the potential for LLMs to improve reporting consistency and reduce manual workload, ethical and governance challenges such as transparency, privacy, and bias must be addressed before routine clinical deployment. Embedding LLMs into reporting workflows, such as Online Analytical Processing (OLAP) tools, could further enhance reporting quality and consistency.
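
The comparison described above (taking the highest U score per report, then measuring agreement with a reference) can be sketched as follows. This is only an illustration of the scoring arithmetic, not the study's pipeline: the helper names and the toy reports are hypothetical, and score labels are assumed to have the form `U1` to `U5`.

```python
def highest_u_score(scores):
    """Return the highest U score from labels like ['U2', 'U3'], or None if the list is empty."""
    if not scores:
        return None
    # Compare on the numeric part that follows the leading 'U'.
    return max(scores, key=lambda s: int(s[1:]))


def percent_agreement(predicted, reference):
    """Percentage of positions where predicted and reference labels match exactly."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not predicted:
        return 0.0
    matches = sum(p == r for p, r in zip(predicted, reference))
    return 100.0 * matches / len(predicted)


# Toy example (made-up data): per-report nodule scores as extracted by a model.
extracted = [["U2", "U3"], ["U1"], ["U4", "U2", "U5"], ["U2"]]
reference = ["U3", "U1", "U4", "U2"]

predicted = [highest_u_score(scores) for scores in extracted]
agreement = percent_agreement(predicted, reference)
```

Exact-match agreement is the simplest choice; a chance-corrected statistic such as Cohen's kappa would be a reasonable alternative when score distributions are skewed.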

Ultrasound LLM Radiology Report Abdominal Retrospective Clinical In Silico Academic Lab GenAI Ethics

Implementation of an Intelligent System for Detecting Breast Cancer Cells from Histological Images, and Evaluation of Its Results at CHU Bogodogo.

Nikiema WC, Ouattara TA, Barro SG, Ouedraogo AS

•papers•Jun 26 2025

Early detection of breast cancer is a major challenge in the fight against this disease. Artificial intelligence (AI), particularly through medical imaging, offers promising prospects for improving diagnostic accuracy. This article focuses on evaluating the effectiveness of an intelligent electronic system deployed at the CHU of Bogodogo in Burkina Faso, designed to detect breast cancer cells from histological images. The system aims to reduce diagnosis time and enhance screening reliability. The article also discusses the challenges, innovations, and prospects for integrating the system into the conventional laboratory examination process, while considering the associated ethical and technical issues.

Mixed Modality Detection Breast Retrospective Clinical Clinical Pilot Academic Lab Ethics

[AI-enabled clinical decision support systems: challenges and opportunities].

Tschochohei M, Adams LC, Bressem KK, Lammert J

•papers•Jun 25 2025

Clinical decision-making is inherently complex, time-sensitive, and prone to error. AI-enabled clinical decision support systems (CDSS) offer promising solutions by leveraging large datasets to provide evidence-based recommendations. These systems range from rule-based and knowledge-based to increasingly AI-driven approaches. However, key challenges persist, particularly concerning data quality, seamless integration into clinical workflows, and clinician trust and acceptance. Ethical and legal considerations, especially data privacy, are also paramount. AI-CDSS have demonstrated success in fields like radiology (e.g., pulmonary nodule detection, mammography interpretation) and cardiology, where they enhance diagnostic accuracy and improve patient outcomes. Looking ahead, chat and voice interfaces powered by large language models (LLMs) could support shared decision-making (SDM) by fostering better patient engagement and understanding. To fully realize the potential of AI-CDSS in advancing efficient, patient-centered care, it is essential to ensure their responsible development. This includes grounding AI models in domain-specific data, anonymizing user inputs, and implementing rigorous validation of AI-generated outputs before presentation. Thoughtful design and ethical oversight will be critical to integrating AI safely and effectively into clinical practice.

Mixed Modality Detection Review In Silico Academic Lab Ethics Policy

Interventional Radiology Reporting Standards and Checklist for Artificial Intelligence Research Evaluation (iCARE).

Anibal JT, Huth HB, Boeken T, Daye D, Gichoya J, Muñoz FG, Chapiro J, Wood BJ, Sze DY, Hausegger K

•papers•Jun 25 2025

As artificial intelligence (AI) becomes increasingly prevalent within interventional radiology (IR) research and clinical practice, steps must be taken to ensure the robustness of novel technological systems presented in peer-reviewed journals. This report introduces comprehensive standards and an evaluation checklist (iCARE) that covers the application of modern AI methods in IR-specific contexts. The iCARE checklist encompasses the full "code-to-clinic" pipeline of AI development, including dataset curation, pre-training, task-specific training, explainability, privacy protection, bias mitigation, reproducibility, and model deployment. The iCARE checklist aims to support the development of safe, generalizable technologies for enhancing IR workflows, the delivery of care, and patient outcomes.

Mixed Modality Detection Vascular Review Concept Consortium Ethics Policy Reproducibility

Multimodal deep learning for predicting neoadjuvant treatment outcomes in breast cancer: a systematic review.

Krasniqi E, Filomeno L, Arcuri T, Ferretti G, Gasparro S, Fulvi A, Roselli A, D'Onofrio L, Pizzuti L, Barba M, Maugeri-Saccà M, Botti C, Graziano F, Puccica I, Cappelli S, Pelle F, Cavicchi F, Villanucci A, Paris I, Calabrò F, Rea S, Costantini M, Perracchio L, Sanguineti G, Takanen S, Marucci L, Greco L, Kayal R, Moscetti L, Marchesini E, Calonaci N, Blandino G, Caravagna G, Vici P

•papers•Jun 23 2025

Pathological complete response (pCR) to neoadjuvant systemic therapy (NAST) is an established prognostic marker in breast cancer (BC). Multimodal deep learning (DL), integrating diverse data sources (radiology, pathology, omics, clinical), holds promise for improving pCR prediction accuracy. This systematic review synthesizes evidence on multimodal DL for pCR prediction and compares its performance against unimodal DL. Following PRISMA, we searched PubMed, Embase, and Web of Science (January 2015-April 2025) for studies applying DL to predict pCR in BC patients receiving NAST, using data from radiology, digital pathology (DP), multi-omics, and/or clinical records, and reporting AUC. Data on study design, DL architectures, and performance (AUC) were extracted. A narrative synthesis was conducted due to heterogeneity. Fifty-one studies, mostly retrospective (90.2%, median cohort 281), were included. Magnetic resonance imaging and DP were common primary modalities. Multimodal approaches were used in 52.9% of studies, often combining imaging with clinical data. Convolutional neural networks were the dominant architecture (88.2%). Longitudinal imaging improved prediction over baseline-only (median AUC 0.91 vs. 0.82). Overall, the median AUC across studies was 0.88, with 35.3% achieving AUC ≥ 0.90. Multimodal models showed a modest but consistent improvement over unimodal approaches (median AUC 0.88 vs. 0.83). Omics and clinical text were rarely primary DL inputs. DL models demonstrate promising accuracy for pCR prediction, especially when integrating multiple modalities and longitudinal imaging. However, significant methodological heterogeneity, reliance on retrospective data, and limited external validation hinder clinical translation. Future research should prioritize prospective validation, integration of underutilized data (multi-omics, clinical), and explainable AI to advance DL predictors to the clinical setting.

MRI Classification Breast Review In Silico Academic Lab Benchmark SOTA Ethics

Enabling PSO-Secure Synthetic Data Sharing Using Diversity-Aware Diffusion Models

Mischa Dombrowski, Bernhard Kainz

•preprint•Jun 22 2025

Synthetic data has recently reached a level of visual fidelity that makes it nearly indistinguishable from real data, offering great promise for privacy-preserving data sharing in medical imaging. However, fully synthetic datasets still suffer from significant limitations. First and foremost, the legal aspect of sharing synthetic data is often neglected, and data regulations such as the GDPR are largely ignored. Secondly, synthetic models fall short of matching the performance of real data, even for in-domain downstream applications. Recent methods for image generation have focused on maximising image diversity instead of fidelity solely to improve the mode coverage and therefore the downstream performance of synthetic data. In this work, we shift perspective and highlight how maximizing diversity can also be interpreted as protecting natural persons from being singled out, which leads to predicate singling-out (PSO) secure synthetic datasets. Specifically, we propose a generalisable framework for training diffusion models on personal data which leads to unpersonal synthetic datasets achieving performance within one percentage point of real-data models while significantly outperforming state-of-the-art methods that do not ensure privacy. Our code is available at https://github.com/MischaD/Trichotomy.

Image Synthesis Methodology In Silico Academic Lab Open Code Ethics

AI in radiology: Powerful, promising… but alarmingly hackable.

Lecler A, Soyer P

•papers•Jun 21 2025

Review Ethics
