Page 10 of 42415 results

An autonomous agent for auditing and improving the reliability of clinical AI models

Lukas Kuhn, Florian Buettner

arxiv logopreprintJul 8 2025
The deployment of AI models in clinical practice faces a critical challenge: models achieving expert-level performance on benchmarks can fail catastrophically when confronted with real-world variations in medical imaging. Minor shifts in scanner hardware, lighting, or demographics can erode accuracy, yet reliability auditing to identify such catastrophic failure cases before deployment is currently a bespoke and time-consuming process. Practitioners lack accessible and interpretable tools to expose and repair hidden failure modes. Here we introduce ModelAuditor, a self-reflective agent that converses with users, selects task-specific metrics, and simulates context-dependent, clinically relevant distribution shifts. ModelAuditor then generates interpretable reports explaining how much performance is likely to degrade during deployment, discussing specific likely failure modes, and identifying root causes and mitigation strategies. Our comprehensive evaluation across three real-world clinical scenarios - inter-institutional variation in histopathology, demographic shifts in dermatology, and equipment heterogeneity in chest radiography - demonstrates that ModelAuditor can correctly identify context-specific failure modes of state-of-the-art models such as the established SIIM-ISIC melanoma classifier. Its targeted recommendations recover 15-25% of the performance lost under real-world distribution shift, substantially outperforming both baseline models and state-of-the-art augmentation methods. These improvements are achieved through a multi-agent architecture, execute on consumer hardware in under 10 minutes, and cost less than US$0.50 per audit.
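ModelAuditor's implementation is not shown in the abstract, but the core idea of a shift-simulation audit can be sketched in a few lines: apply simulated acquisition shifts to a held-out set and compare accuracy against the clean baseline. The shift functions, the toy thresholding "model", and all names below are illustrative assumptions, not the paper's code:

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_shift(images, kind):
    """Apply a crude, illustrative distribution shift to a batch of images."""
    if kind == "brightness":          # lighting variation
        return np.clip(images * 1.4, 0.0, 1.0)
    if kind == "noise":               # scanner/sensor noise
        return np.clip(images + rng.normal(0, 0.1, images.shape), 0.0, 1.0)
    if kind == "blur":                # resolution change (1-D box blur per row)
        k = np.ones(3) / 3.0
        return np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), -1, images)
    raise ValueError(kind)

def audit(model_fn, images, labels, shifts):
    """Report accuracy on clean data and under each simulated shift."""
    report = {"clean": float(np.mean(model_fn(images) == labels))}
    for s in shifts:
        report[s] = float(np.mean(model_fn(simulate_shift(images, s)) == labels))
    return report

# Toy stand-in "model": thresholds mean intensity (deliberately brightness-sensitive).
model_fn = lambda x: (x.mean(axis=(1, 2)) > 0.5).astype(int)
images = rng.uniform(0.3, 0.7, size=(64, 8, 8))
labels = (images.mean(axis=(1, 2)) > 0.5).astype(int)
report = audit(model_fn, images, labels, ["brightness", "noise", "blur"])
print(report)
```

A real audit would replace the toy shifts with clinically grounded ones (stain variation, demographic resampling, scanner protocols) and the stand-in model with the classifier under test.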

Performance of a deep-learning-based lung nodule detection system using 0.25-mm thick ultra-high-resolution CT images.

Higashibori H, Fukumoto W, Kusuda S, Yokomachi K, Mitani H, Nakamura Y, Awai K

pubmed logopapersJul 7 2025
Artificial intelligence (AI) algorithms for lung nodule detection assist radiologists. As their performance on ultra-high-resolution CT (U-HRCT) images has not been evaluated, we investigated the usefulness of 0.25-mm slices at U-HRCT using a commercially available deep-learning-based lung nodule detection (DL-LND) system. We enrolled 63 patients who underwent U-HRCT for lung cancer or suspected lung cancer. Two board-certified radiologists identified nodules more than 4 mm in diameter on 1-mm HRCT slices and set the reference standard consensually. They recorded all lesions detected on 5-, 1-, and 0.25-mm slices by the DL-LND system; unidentified nodules were included in the reference standard. To examine the performance of the DL-LND system, the sensitivity, positive predictive value (PPV), and number of false-positive (FP) nodules were recorded. The mean number of lesions detected on 5-, 1-, and 0.25-mm slices was 5.1, 7.8, and 7.2 per CT scan. On 5-mm slices the sensitivity and PPV were 79.8% and 46.4%; on 1-mm slices they were 91.5% and 34.8%; and on 0.25-mm slices they were 86.7% and 36.1%. The sensitivity was significantly higher on 1-mm than on 5-mm slices (p < 0.01), while the PPV was significantly lower on 1-mm than on 5-mm slices (p < 0.01). A slice thickness of 0.25 mm failed to improve performance further. The mean number of FP nodules on 5-, 1-, and 0.25-mm slices was 2.8, 5.2, and 4.7 per CT scan. We conclude that 1 mm was the best slice thickness for U-HRCT images using this commercially available DL-LND system.
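The sensitivity and PPV figures above follow directly from detection counts. A small helper makes the relationship explicit; the counts below are made up for illustration and are not the study's raw data:

```python
def detection_metrics(tp, fp, fn):
    """Sensitivity (recall) and positive predictive value (precision)
    from detection counts: TP / (TP + FN) and TP / (TP + FP)."""
    sensitivity = tp / (tp + fn)
    ppv = tp / (tp + fp)
    return sensitivity, ppv

# Hypothetical counts: 86 reference nodules, 73 detected, 136 false-positive marks.
sens, ppv = detection_metrics(tp=73, fp=136, fn=13)
print(f"sensitivity={sens:.1%}, PPV={ppv:.1%}")
```

Note the trade-off visible in the study: thinner slices raise sensitivity (fewer FNs) but also generate more FP marks, which drags down PPV.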

MedGemma Technical Report

Andrew Sellergren, Sahar Kazemzadeh, Tiam Jaroensri, Atilla Kiraly, Madeleine Traverse, Timo Kohlberger, Shawn Xu, Fayaz Jamil, Cían Hughes, Charles Lau, Justin Chen, Fereshteh Mahvar, Liron Yatziv, Tiffany Chen, Bram Sterling, Stefanie Anna Baby, Susanna Maria Baby, Jeremy Lai, Samuel Schmidgall, Lu Yang, Kejia Chen, Per Bjornsson, Shashir Reddy, Ryan Brush, Kenneth Philbrick, Howard Hu, Howard Yang, Richa Tiwari, Sunny Jansen, Preeti Singh, Yun Liu, Shekoofeh Azizi, Aishwarya Kamath, Johan Ferret, Shreya Pathak, Nino Vieillard, Ramona Merhej, Sarah Perrin, Tatiana Matejovicova, Alexandre Ramé, Morgane Riviere, Louis Rouillard, Thomas Mesnard, Geoffrey Cideron, Jean-bastien Grill, Sabela Ramos, Edouard Yvinec, Michelle Casbon, Elena Buchatskaya, Jean-Baptiste Alayrac, Dmitry Lepikhin, Vlad Feinberg, Sebastian Borgeaud, Alek Andreev, Cassidy Hardin, Robert Dadashi, Léonard Hussenot, Armand Joulin, Olivier Bachem, Yossi Matias, Katherine Chou, Avinatan Hassidim, Kavi Goel, Clement Farabet, Joelle Barral, Tris Warkentin, Jonathon Shlens, David Fleet, Victor Cotruta, Omar Sanseviero, Gus Martins, Phoebe Kirk, Anand Rao, Shravya Shetty, David F. Steiner, Can Kirmizibayrak, Rory Pilgrim, Daniel Golden, Lin Yang

arxiv logopreprintJul 7 2025
Artificial intelligence (AI) has significant potential in healthcare applications, but its training and deployment faces challenges due to healthcare's diverse data, complex tasks, and the need to preserve privacy. Foundation models that perform well on medical tasks and require less task-specific tuning data are critical to accelerate the development of healthcare AI applications. We introduce MedGemma, a collection of medical vision-language foundation models based on Gemma 3 4B and 27B. MedGemma demonstrates advanced medical understanding and reasoning on images and text, significantly exceeding the performance of similar-sized generative models and approaching the performance of task-specific models, while maintaining the general capabilities of the Gemma 3 base models. For out-of-distribution tasks, MedGemma achieves 2.6-10% improvement on medical multimodal question answering, 15.5-18.1% improvement on chest X-ray finding classification, and 10.8% improvement on agentic evaluations compared to the base models. Fine-tuning MedGemma further improves performance in subdomains, reducing errors in electronic health record information retrieval by 50% and reaching comparable performance to existing specialized state-of-the-art methods for pneumothorax classification and histopathology patch classification. We additionally introduce MedSigLIP, a medically-tuned vision encoder derived from SigLIP. MedSigLIP powers the visual understanding capabilities of MedGemma and as an encoder achieves comparable or better performance than specialized medical image encoders. Taken together, the MedGemma collection provides a strong foundation of medical image and text capabilities, with potential to significantly accelerate medical research and development of downstream applications. The MedGemma collection, including tutorials and model weights, can be found at https://goo.gle/medgemma.

RADAI: A Deep Learning-Based Classification of Lung Abnormalities in Chest X-Rays.

Aljuaid H, Albalahad H, Alshuaibi W, Almutairi S, Aljohani TH, Hussain N, Mohammad F

pubmed logopapersJul 7 2025
<b>Background:</b> Chest X-rays are rapidly gaining prominence as a prevalent diagnostic tool, as recognized by the World Health Organization (WHO). However, interpreting chest X-rays can be demanding and time-consuming, even for experienced radiologists, leading to potential misinterpretations and delays in treatment. <b>Method:</b> The purpose of this research is the development of the RadAI model, which can accurately detect four types of lung abnormalities in chest X-rays and generate a report on each identified abnormality. Deep learning algorithms, particularly convolutional neural networks (CNNs), have demonstrated remarkable potential in automating medical image analysis, including chest X-rays. This work addresses the challenge of chest X-ray interpretation by fine-tuning three advanced deep learning models: the Feature-selective and Spatial Receptive Fields Network (FSRFNet50), ResNext50, and ResNet50, which are compared on accuracy, precision, recall, and F1-score. <b>Results:</b> The strong performance of RadAI shows its potential to assist radiologists in interpreting detected chest abnormalities accurately. <b>Conclusions:</b> RadAI enhances the accuracy and efficiency of chest X-ray interpretation, supporting the timely and reliable diagnosis of lung abnormalities.

FB-Diff: Fourier Basis-guided Diffusion for Temporal Interpolation of 4D Medical Imaging

Xin You, Runze Yang, Chuyan Zhang, Zhongliang Jiang, Jie Yang, Nassir Navab

arxiv logopreprintJul 6 2025
The temporal interpolation task for 4D medical imaging plays a crucial role in the clinical practice of respiratory motion modeling. Following a simplified linear-motion hypothesis, existing approaches adopt optical-flow-based models to interpolate intermediate frames. However, realistic respiratory motions are nonlinear and quasi-periodic, with specific frequencies. Motivated by this property, we resolve the temporal interpolation task from the frequency perspective and propose a Fourier basis-guided Diffusion model, termed FB-Diff. Specifically, owing to the regular pattern of respiratory motion, physiological motion priors are introduced to describe general characteristics of temporal data distributions. A Fourier motion operator is then devised to extract Fourier bases by incorporating physiological motion priors and case-specific spectral information in the feature space of a Variational Autoencoder. Well-learned Fourier bases can better simulate respiratory motions with motion patterns of specific frequencies. Conditioned on starting and ending frames, the diffusion model further leverages these Fourier bases via a basis interaction operator, which promotes the temporal interpolation task in a generative manner. Extensive results demonstrate that FB-Diff achieves state-of-the-art (SOTA) perceptual performance with better temporal consistency while maintaining promising reconstruction metrics. Codes are available.
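FB-Diff learns its Fourier bases inside a VAE feature space, which is beyond a short sketch, but the underlying intuition, that quasi-periodic motion is captured well by a truncated Fourier series, can be illustrated with a plain least-squares fit. The function names, the toy one-voxel signal, and the harmonic count are assumptions, not the paper's method:

```python
import numpy as np

def fourier_interpolate(frames, times, t_query, n_harmonics=2, period=1.0):
    """Fit a truncated Fourier series through observed frames (least squares
    per voxel) and evaluate it at an intermediate time."""
    def design(ts):
        cols = [np.ones_like(ts)]
        for k in range(1, n_harmonics + 1):
            w = 2 * np.pi * k * ts / period
            cols += [np.cos(w), np.sin(w)]
        return np.stack(cols, axis=1)              # (T, 2K+1) design matrix

    A = design(np.asarray(times, dtype=float))
    flat = frames.reshape(len(times), -1)          # (T, n_voxels)
    coef, *_ = np.linalg.lstsq(A, flat, rcond=None)
    out = design(np.asarray([t_query], dtype=float)) @ coef
    return out.reshape(frames.shape[1:])

# Toy one-voxel "respiratory" trace: nonlinear but periodic (two harmonics).
times = np.linspace(0, 1, 8, endpoint=False)
frames = (np.sin(2 * np.pi * times) + 0.3 * np.sin(4 * np.pi * times)).reshape(-1, 1)
mid = fourier_interpolate(frames, times, t_query=0.0625)
print(mid)
```

Because the toy signal contains only the first two harmonics, the fit here is exact; a linear interpolant between neighbouring frames would not be, which is the motivation the abstract gives for moving beyond the linear-motion hypothesis.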

Deep-Learning-Assisted Highly-Accurate COVID-19 Diagnosis on Lung Computed Tomography Images

Yinuo Wang, Juhyun Bae, Ka Ho Chow, Shenyang Chen, Shreyash Gupta

arxiv logopreprintJul 6 2025
COVID-19 is a severe and acute viral disease that can cause symptoms consistent with pneumonia, in which inflammation in the alveolar regions of the lungs leads to a build-up of fluid and breathing difficulties. Diagnosis of COVID-19 from CT scans has therefore been effective in assisting RT-PCR diagnosis and severity classification. In this paper, we propose a new data quality control pipeline, based on GANs and sliding windows, to refine the quality of CT images. We also use class-sensitive cost functions, including the label-distribution-aware margin (LDAM) loss and the class-balanced (CB) loss, to address the long-tail problem in these datasets. Our model reaches more than 0.983 MCC on the benchmark test dataset.
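The class-balanced (CB) loss mentioned above reweights each class by the inverse of its "effective number of samples", (1 - beta^n) / (1 - beta), so rare classes contribute more to the loss. A minimal numpy version of the weighting follows; the class counts and beta value are illustrative, not the paper's:

```python
import numpy as np

def class_balanced_weights(counts, beta=0.999):
    """Per-class weights from the effective number of samples
    (1 - beta**n) / (1 - beta), as used by class-balanced losses."""
    counts = np.asarray(counts, dtype=float)
    effective_num = (1.0 - beta ** counts) / (1.0 - beta)
    weights = 1.0 / effective_num
    return weights / weights.sum() * len(counts)   # normalise to mean 1

# Long-tailed toy split: many normal scans, few COVID-positive ones.
w = class_balanced_weights([900, 100], beta=0.999)
print(w)   # the rare class receives the larger weight
```

These weights would multiply the per-class cross-entropy (or LDAM) terms during training, counteracting the long-tail imbalance the abstract describes.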

Early warning and stratification of the elderly cardiopulmonary dysfunction-related diseases: multicentre prospective study protocol.

Zhou X, Jin Q, Xia Y, Guan Y, Zhang Z, Guo Z, Liu Z, Li C, Bai Y, Hou Y, Zhou M, Liao WH, Lin H, Wang P, Liu S, Fan L

pubmed logopapersJul 5 2025
In China, there is a lack of standardised clinical imaging databases for multidimensional evaluation of cardiopulmonary diseases. To address this gap, this study protocol launched a project to build an integrated clinical imaging platform and a multicentre database for early warning and stratification of cardiopulmonary dysfunction in the elderly. This study employs a cross-sectional design, enrolling over 6000 elderly participants from five regions across China to evaluate cardiopulmonary function and related diseases. Based on clinical criteria, participants are categorised into three groups: a healthy cardiopulmonary function group, a functional decrease group and an established cardiopulmonary diseases group. All subjects will undergo comprehensive assessments including chest CT scans, echocardiography and laboratory examinations. Additionally, at least 50 subjects will undergo cardiopulmonary exercise testing (CPET). By leveraging artificial intelligence technology, multimodal data will be integrated to establish reference ranges for cardiopulmonary function in the elderly population, as well as to develop early-warning models and severity grading standards. The study has been approved by the local ethics committee of Shanghai Changzheng Hospital (approval number: 2022SL069A). All participants will sign informed consent. The results will be disseminated through peer-reviewed publications and conferences.

Bridging Vision and Language: Optimal Transport-Driven Radiology Report Generation via LLMs

Haifeng Zhao, Yufei Zhang, Leilei Ma, Shuo Xu, Dengdi Sun

arxiv logopreprintJul 5 2025
Radiology report generation is a significant application of medical AI and has achieved impressive results. Concurrently, large language models (LLMs) have demonstrated remarkable performance across various domains. However, empirical validation indicates that general LLMs tend to focus more on linguistic fluency than clinical effectiveness, and cannot effectively capture the relationship between X-ray images and their corresponding texts, resulting in poor clinical practicability. To address these challenges, we propose Optimal Transport-Driven Radiology Report Generation (OTDRG), a novel framework that leverages Optimal Transport (OT) to align image features with disease labels extracted from reports, effectively bridging the cross-modal gap. The core component of OTDRG is Alignment & Fine-Tuning, in which OT uses the encoded label features and image visual features to minimize cross-modal distances, and then integrates image and text features for LLM fine-tuning. Additionally, we design a novel disease prediction module to predict the disease labels contained in X-ray images during validation and testing. Evaluated on the MIMIC-CXR and IU X-Ray datasets, OTDRG achieves state-of-the-art performance in both natural language generation (NLG) and clinical efficacy (CE) metrics, delivering reports that are not only linguistically coherent but also clinically accurate.
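The abstract does not specify how OTDRG solves the OT problem; a common choice for differentiable OT alignment is entropy-regularised transport via Sinkhorn iterations, sketched below. The feature dimensions, cosine cost, and regularisation strength are assumptions for illustration, not the paper's configuration:

```python
import numpy as np

def sinkhorn(cost, a, b, eps=0.5, n_iter=1000):
    """Entropy-regularised optimal transport (Sinkhorn iterations):
    returns a coupling matrix aligning two discrete distributions."""
    K = np.exp(-cost / eps)
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)          # match column marginals
        u = a / (K @ v)            # match row marginals
    return u[:, None] * K * v[None, :]

rng = np.random.default_rng(0)
img_feats = rng.normal(size=(4, 8))    # e.g. pooled visual-region features
lbl_feats = rng.normal(size=(3, 8))    # e.g. encoded disease-label features

# Cosine-distance cost between every image feature and label feature.
norm = lambda x: x / np.linalg.norm(x, axis=1, keepdims=True)
cost = 1.0 - norm(img_feats) @ norm(lbl_feats).T

a = np.full(4, 1 / 4)                  # uniform mass over image features
b = np.full(3, 1 / 3)                  # uniform mass over label features
plan = sinkhorn(cost, a, b)
ot_distance = float((plan * cost).sum())   # alignment objective to minimise
print(plan.round(3), ot_distance)
```

In a training loop, `ot_distance` (or a differentiable variant of it) would serve as the cross-modal alignment loss minimised alongside the LLM fine-tuning objective.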

Att-BrainNet: Attention-based BrainNet for lung cancer segmentation network.

Xiao X, Wang Z, Yao J, Wei J, Zhang B, Chen W, Geng Z, Song E

pubmed logopapersJul 5 2025
Most current medical image segmentation models employ a unified feature modeling strategy for all target regions. However, they overlook the significant heterogeneity in lesion structure, boundary characteristics, and semantic texture, which frequently restricts their ability to accurately segment morphologically diverse lesions in complex imaging contexts, reducing segmentation accuracy and robustness. To address this issue, we propose a brain-inspired segmentation framework named BrainNet, which adopts a tri-level architecture of backbone encoder, Brain Network, and decoder. This architecture enables globally guided, locally differentiated feature modeling. We further instantiate the framework with an attention-enhanced segmentation model, termed Att-BrainNet. In this model, a Thalamus Gating Module (TGM) dynamically selects and activates structurally identical but functionally diverse Encephalic Region Networks (ERNs) to collaboratively extract lesion-specific features. In addition, an S-F image enhancement module is incorporated to improve sensitivity to boundaries and fine structures, and multi-head self-attention is embedded in the encoder to strengthen global semantic modeling and regional coordination. Experiments on two lung cancer CT segmentation datasets and the Synapse multi-organ dataset demonstrate that Att-BrainNet outperforms existing mainstream segmentation models in both accuracy and generalization. Further ablation studies and mechanism visualizations confirm the effectiveness of the BrainNet architecture and its dynamic scheduling strategy. This research provides a novel structural paradigm for medical image segmentation and holds promise for extension to other complex segmentation scenarios.
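The Thalamus Gating Module is described only at a high level; its select-and-activate behaviour resembles a softly gated mixture of parallel expert branches, sketched below with numpy. The gate form, the toy expert functions, and all shapes are assumptions, not the paper's architecture:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def gated_experts(features, gate_w, experts):
    """Soft gating over parallel expert branches: a gate scores each
    expert from a global feature and mixes their outputs accordingly."""
    scores = softmax(features @ gate_w)                         # (B, n_experts)
    outputs = np.stack([f(features) for f in experts], axis=1)  # (B, E, D)
    return (scores[..., None] * outputs).sum(axis=1)            # (B, D)

rng = np.random.default_rng(0)
feats = rng.normal(size=(2, 6))                    # batch of global features
gate_w = rng.normal(size=(6, 3))                   # gate for 3 expert branches
experts = [lambda x, s=s: x * s for s in (0.5, 1.0, 2.0)]  # toy ERN stand-ins
fused = gated_experts(feats, gate_w, experts)
print(fused.shape)   # (2, 6)
```

A hard top-k variant of the same gate would "select and activate" only a subset of branches per input, which is closer to the dynamic scheduling the abstract describes.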
