Latest Papers on Radiology AI. Tags: Chest

MTMedFormer: multi-task vision transformer for medical imaging with federated learning.

Nath A, Shukla S, Gupta P

•papers•Jul 8 2025

Deep learning has revolutionized medical imaging, improving tasks like image segmentation, detection, and classification, often surpassing human accuracy. However, the training of effective diagnostic models is hindered by two major challenges: the need for large datasets for each task and privacy laws restricting the sharing of medical data. Multi-task learning (MTL) addresses the first challenge by enabling a single model to perform multiple tasks, though convolution-based MTL models struggle with contextualizing global features. Federated learning (FL) helps overcome the second challenge by allowing models to train collaboratively without sharing data, but traditional methods struggle to aggregate stable feature maps due to the permutation-invariant nature of neural networks. To tackle these issues, we propose MTMedFormer, a transformer-based multi-task medical imaging model. We leverage the transformers' ability to learn task-agnostic features using a shared encoder and utilize task-specific decoders for robust feature extraction. By combining MTL with a hybrid loss function, MTMedFormer learns distinct diagnostic tasks in a synergistic manner. Additionally, we introduce a novel Bayesian federation method for aggregating multi-task imaging models. Our results show that MTMedFormer outperforms traditional single-task and MTL models on mammogram and pneumonia datasets, while our Bayesian federation method surpasses traditional methods in image segmentation.

Mixed Modality Segmentation Chest Methodology In Silico Academic Lab Benchmark SOTA
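
For context on the MTMedFormer abstract above: the "traditional" federated aggregation it contrasts against is, in many FL systems, a coordinate-wise weighted average of client parameters in the style of FedAvg. The sketch below assumes that baseline; it is not the paper's Bayesian federation method, and the names `federated_average`, `hospital_a`, and `hospital_b` are hypothetical. Because the average is taken position by position, it implicitly assumes that neurons line up across clients, which is the weakness the abstract attributes to permutation invariance.

```python
from typing import Dict, List

# A toy "model": parameter name -> flat list of weights (illustrative only).
Weights = Dict[str, List[float]]


def federated_average(client_weights: List[Weights],
                      client_sizes: List[int]) -> Weights:
    """Average client parameters, weighting each client by its sample count."""
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need exactly one sample count per client")
    total = sum(client_sizes)
    averaged: Weights = {}
    for name in client_weights[0]:
        length = len(client_weights[0][name])
        # Coordinate-wise average: position i on one client is assumed to
        # mean the same thing as position i on every other client.
        averaged[name] = [
            sum(w[name][i] * n for w, n in zip(client_weights, client_sizes)) / total
            for i in range(length)
        ]
    return averaged


# Hypothetical clients: two hospitals holding 100 and 300 training images.
hospital_a = {"encoder": [1.0, 2.0]}
hospital_b = {"encoder": [3.0, 6.0]}
global_weights = federated_average([hospital_a, hospital_b], [100, 300])
print(global_weights)
```

Only parameters leave each site in this scheme; the images themselves stay local, which is the privacy property the abstract relies on.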

An autonomous agent for auditing and improving the reliability of clinical AI models

Lukas Kuhn, Florian Buettner

•preprint•Jul 8 2025

The deployment of AI models in clinical practice faces a critical challenge: models achieving expert-level performance on benchmarks can fail catastrophically when confronted with real-world variations in medical imaging. Minor shifts in scanner hardware, lighting or demographics can erode accuracy, but reliability auditing to identify such catastrophic failure cases before deployment is currently a bespoke and time-consuming process. Practitioners lack accessible and interpretable tools to expose and repair hidden failure modes. Here we introduce ModelAuditor, a self-reflective agent that converses with users, selects task-specific metrics, and simulates context-dependent, clinically relevant distribution shifts. ModelAuditor then generates interpretable reports explaining how much performance likely degrades during deployment, discussing specific likely failure modes and identifying root causes and mitigation strategies. Our comprehensive evaluation across three real-world clinical scenarios - inter-institutional variation in histopathology, demographic shifts in dermatology, and equipment heterogeneity in chest radiography - demonstrates that ModelAuditor correctly identifies context-specific failure modes of state-of-the-art models such as the established SIIM-ISIC melanoma classifier. Its targeted recommendations recover 15-25% of performance lost under real-world distribution shift, substantially outperforming both baseline models and state-of-the-art augmentation methods. These improvements are achieved through a multi-agent architecture that executes on consumer hardware in under 10 minutes, costing less than US$0.50 per audit.

X-Ray Classification Chest Methodology In Silico Academic Lab Benchmark SOTA Reproducibility

MedGemma Technical Report

Andrew Sellergren, Sahar Kazemzadeh, Tiam Jaroensri, Atilla Kiraly, Madeleine Traverse, Timo Kohlberger, Shawn Xu, Fayaz Jamil, Cían Hughes, Charles Lau, Justin Chen, Fereshteh Mahvar, Liron Yatziv, Tiffany Chen, Bram Sterling, Stefanie Anna Baby, Susanna Maria Baby, Jeremy Lai, Samuel Schmidgall, Lu Yang, Kejia Chen, Per Bjornsson, Shashir Reddy, Ryan Brush, Kenneth Philbrick, Howard Hu, Howard Yang, Richa Tiwari, Sunny Jansen, Preeti Singh, Yun Liu, Shekoofeh Azizi, Aishwarya Kamath, Johan Ferret, Shreya Pathak, Nino Vieillard, Ramona Merhej, Sarah Perrin, Tatiana Matejovicova, Alexandre Ramé, Morgane Riviere, Louis Rouillard, Thomas Mesnard, Geoffrey Cideron, Jean-bastien Grill, Sabela Ramos, Edouard Yvinec, Michelle Casbon, Elena Buchatskaya, Jean-Baptiste Alayrac, Dmitry Lepikhin, Vlad Feinberg, Sebastian Borgeaud, Alek Andreev, Cassidy Hardin, Robert Dadashi, Léonard Hussenot, Armand Joulin, Olivier Bachem, Yossi Matias, Katherine Chou, Avinatan Hassidim, Kavi Goel, Clement Farabet, Joelle Barral, Tris Warkentin, Jonathon Shlens, David Fleet, Victor Cotruta, Omar Sanseviero, Gus Martins, Phoebe Kirk, Anand Rao, Shravya Shetty, David F. Steiner, Can Kirmizibayrak, Rory Pilgrim, Daniel Golden, Lin Yang

•preprint•Jul 7 2025

Artificial intelligence (AI) has significant potential in healthcare applications, but its training and deployment face challenges due to healthcare's diverse data, complex tasks, and the need to preserve privacy. Foundation models that perform well on medical tasks and require less task-specific tuning data are critical to accelerate the development of healthcare AI applications. We introduce MedGemma, a collection of medical vision-language foundation models based on Gemma 3 4B and 27B. MedGemma demonstrates advanced medical understanding and reasoning on images and text, significantly exceeding the performance of similar-sized generative models and approaching the performance of task-specific models, while maintaining the general capabilities of the Gemma 3 base models. For out-of-distribution tasks, MedGemma achieves 2.6-10% improvement on medical multimodal question answering, 15.5-18.1% improvement on chest X-ray finding classification, and 10.8% improvement on agentic evaluations compared to the base models. Fine-tuning MedGemma further improves performance in subdomains, reducing errors in electronic health record information retrieval by 50% and reaching comparable performance to existing specialized state-of-the-art methods for pneumothorax classification and histopathology patch classification. We additionally introduce MedSigLIP, a medically tuned vision encoder derived from SigLIP. MedSigLIP powers the visual understanding capabilities of MedGemma and, as an encoder, achieves comparable or better performance than specialized medical image encoders. Taken together, the MedGemma collection provides a strong foundation of medical image and text capabilities, with potential to significantly accelerate medical research and development of downstream applications. The MedGemma collection, including tutorials and model weights, can be found at https://goo.gle/medgemma.

X-Ray Classification Chest Methodology In Silico Big Tech Benchmark SOTA Open Code

Performance of a deep-learning-based lung nodule detection system using 0.25-mm thick ultra-high-resolution CT images.

Higashibori H, Fukumoto W, Kusuda S, Yokomachi K, Mitani H, Nakamura Y, Awai K

•papers•Jul 7 2025

Artificial intelligence (AI) algorithms for lung nodule detection assist radiologists. As their performance using ultra-high-resolution CT (U-HRCT) images has not been evaluated, we investigated the usefulness of 0.25-mm slices at U-HRCT using a commercially available deep-learning-based lung nodule detection (DL-LND) system. We enrolled 63 patients who underwent U-HRCT for lung cancer or suspected lung cancer. Two board-certified radiologists identified nodules more than 4 mm in diameter on 1-mm HRCT slices and set the reference standard by consensus. They recorded all lesions detected on 5-, 1-, and 0.25-mm slices by the DL-LND system. Unidentified nodules were included in the reference standard. To examine the performance of the DL-LND system, the sensitivity, positive predictive value (PPV), and number of false-positive (FP) nodules were recorded. The mean number of lesions detected on 5-, 1-, and 0.25-mm slices was 5.1, 7.8, and 7.2 per CT scan. On 5-mm slices the sensitivity and PPV were 79.8% and 46.4%; on 1-mm slices they were 91.5% and 34.8%; and on 0.25-mm slices they were 86.7% and 36.1%. The sensitivity was significantly higher on 1-mm than on 5-mm slices (p < 0.01), while the PPV was significantly lower on 1-mm than on 5-mm slices (p < 0.01). A slice thickness of 0.25 mm failed to improve the system's performance. The mean number of FP nodules on 5-, 1-, and 0.25-mm slices was 2.8, 5.2, and 4.7 per CT scan. We found that 1 mm was the best slice thickness for U-HRCT images using the commercially available DL-LND system.

CT Detection Chest Retrospective Clinical Clinical Pilot Academic Lab
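
The two headline metrics in the nodule-detection abstract above pull in opposite directions: thinner slices find more true nodules (higher sensitivity) but also flag more false positives (lower PPV). A minimal sketch of the standard definitions follows; the counts are hypothetical and chosen only to illustrate the arithmetic, not taken from the study.

```python
def sensitivity(true_positives: int, false_negatives: int) -> float:
    """Fraction of reference-standard nodules that the system detected."""
    return true_positives / (true_positives + false_negatives)


def positive_predictive_value(true_positives: int, false_positives: int) -> float:
    """Fraction of the system's detections that are real nodules."""
    return true_positives / (true_positives + false_positives)


# Hypothetical counts (not from the paper): 100 reference nodules,
# 80 of them detected, plus 120 false-positive marks.
tp, fn, fp = 80, 20, 120
print(f"sensitivity = {sensitivity(tp, fn):.1%}")
print(f"PPV         = {positive_predictive_value(tp, fp):.1%}")
```

Adding detections can raise the first number while lowering the second, which matches the pattern the abstract reports when moving from 5-mm to 1-mm slices.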

RADAI: A Deep Learning-Based Classification of Lung Abnormalities in Chest X-Rays.

Aljuaid H, Albalahad H, Alshuaibi W, Almutairi S, Aljohani TH, Hussain N, Mohammad F

•papers•Jul 7 2025

Background: Chest X-rays are rapidly gaining prominence as a prevalent diagnostic tool, as recognized by the World Health Organization (WHO). However, interpreting chest X-rays can be demanding and time-consuming, even for experienced radiologists, leading to potential misinterpretations and delays in treatment. Method: The purpose of this research is the development of a RadAI model. The RadAI model can accurately detect four types of lung abnormalities in chest X-rays and generate a report on each identified abnormality. Deep learning algorithms, particularly convolutional neural networks (CNNs), have demonstrated remarkable potential in automating medical image analysis, including chest X-rays. This work addresses the challenge of chest X-ray interpretation by fine-tuning the following three advanced deep learning models: Feature-selective and Spatial Receptive Fields Network (FSRFNet50), ResNext50, and ResNet50. These models are compared based on accuracy, precision, recall, and F1-score. Results: The outstanding performance of RadAI shows its potential to assist radiologists in interpreting the detected chest abnormalities accurately. Conclusions: RadAI is beneficial in enhancing the accuracy and efficiency of chest X-ray interpretation, ultimately supporting the timely and reliable diagnosis of lung abnormalities.

X-Ray Classification Chest Methodology In Silico

Deep-Learning-Assisted Highly-Accurate COVID-19 Diagnosis on Lung Computed Tomography Images

Yinuo Wang, Juhyun Bae, Ka Ho Chow, Shenyang Chen, Shreyash Gupta

•preprint•Jul 6 2025

COVID-19 is a severe and acute viral disease that can cause symptoms consistent with pneumonia, in which inflammation of the alveolar regions of the lungs leads to a build-up of fluid and breathing difficulties. Thus, the diagnosis of COVID-19 using CT scans has been effective in assisting with RT-PCR diagnosis and severity classification. In this paper, we propose a new data quality control pipeline to refine the quality of CT images based on GAN and sliding windows. We also use class-sensitive cost functions, including Label-Distribution-Aware Margin Loss (LDAM Loss) and Class-Balanced (CB) Loss, to address the long-tail problem in the datasets. Our model reaches more than 0.983 MCC on the benchmark test dataset.

CT Classification Chest Methodology In Silico
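
The Class-Balanced Loss mentioned in the COVID-19 abstract above re-weights each class by the inverse of its "effective number" of samples, (1 - beta^n) / (1 - beta), so that rare classes contribute more to the loss. The sketch below computes only those per-class weights; the function name, the class counts, and the choice to normalize the weights so they sum to the number of classes are assumptions for illustration, not details from the paper.

```python
from typing import List


def class_balanced_weights(samples_per_class: List[int],
                           beta: float = 0.999) -> List[float]:
    """Per-class weights proportional to (1 - beta) / (1 - beta ** n)."""
    if not 0.0 <= beta < 1.0:
        raise ValueError("beta must be in [0, 1)")
    if any(n <= 0 for n in samples_per_class):
        raise ValueError("every class needs at least one sample")
    raw = [(1.0 - beta) / (1.0 - beta ** n) for n in samples_per_class]
    # Assumed normalization: rescale so the weights sum to the class count.
    scale = len(raw) / sum(raw)
    return [w * scale for w in raw]


# Hypothetical long-tailed dataset: 5000 scans of one class, 50 of another.
weights = class_balanced_weights([5000, 50])
print(weights)  # the rare class receives the larger weight
```

With beta = 0 every class gets the same weight; as beta approaches 1 the weights approach inverse class frequency.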

FB-Diff: Fourier Basis-guided Diffusion for Temporal Interpolation of 4D Medical Imaging

Xin You, Runze Yang, Chuyan Zhang, Zhongliang Jiang, Jie Yang, Nassir Navab

•preprint•Jul 6 2025

The temporal interpolation task for 4D medical imaging plays a crucial role in the clinical practice of respiratory motion modeling. Following the simplified linear-motion hypothesis, existing approaches adopt optical flow-based models to interpolate intermediate frames. However, realistic respiratory motions are nonlinear and quasi-periodic with specific frequencies. Motivated by this property, we address the temporal interpolation task from the frequency perspective and propose a Fourier basis-guided Diffusion model, termed FB-Diff. Specifically, because respiration follows a regular motion pattern, physiological motion priors are introduced to describe general characteristics of temporal data distributions. A Fourier motion operator is then devised to extract Fourier bases by incorporating physiological motion priors and case-specific spectral information in the feature space of a variational autoencoder. Well-learned Fourier bases can better simulate respiratory motions with motion patterns of specific frequencies. Conditioned on starting and ending frames, the diffusion model further leverages well-learned Fourier bases via the basis interaction operator, which promotes the temporal interpolation task in a generative manner. Extensive results demonstrate that FB-Diff achieves state-of-the-art (SOTA) perceptual performance with better temporal consistency while maintaining promising reconstruction metrics. Codes are available.

CT Reconstruction Chest Methodology In Silico Academic Lab Benchmark SOTA Open Code

Bridging Vision and Language: Optimal Transport-Driven Radiology Report Generation via LLMs

Haifeng Zhao, Yufei Zhang, Leilei Ma, Shuo Xu, Dengdi Sun

•preprint•Jul 5 2025

Radiology report generation represents a significant application within medical AI and has achieved impressive results. Concurrently, large language models (LLMs) have demonstrated remarkable performance across various domains. However, empirical validation indicates that general LLMs tend to focus more on linguistic fluency than on clinical effectiveness, and lack the ability to effectively capture the relationship between X-ray images and their corresponding texts, resulting in poor clinical practicability. To address these challenges, we propose Optimal Transport-Driven Radiology Report Generation (OTDRG), a novel framework that leverages Optimal Transport (OT) to align image features with disease labels extracted from reports, effectively bridging the cross-modal gap. The core component of OTDRG is Alignment & Fine-Tuning, where OT uses the encoded label features and image visual features to minimize cross-modal distances, then integrates image and text features for LLM fine-tuning. Additionally, we design a novel disease prediction module to predict disease labels contained in X-ray images during validation and testing. Evaluated on the MIMIC-CXR and IU X-Ray datasets, OTDRG achieves state-of-the-art performance in both natural language generation (NLG) and clinical efficacy (CE) metrics, delivering reports that are not only linguistically coherent but also clinically accurate.

X-Ray Report Generation Chest Methodology In Silico GenAI Benchmark SOTA
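
To make the optimal-transport alignment in the OTDRG abstract above more concrete, the sketch below computes a soft matching between two small sets of items from a cost matrix. It assumes entropy-regularized OT solved with Sinkhorn iterations, a widely used approach; the abstract does not say which OT formulation or solver the authors use, and the function name, cost values, and uniform marginals here are all hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def sinkhorn_plan(cost: Matrix, a: List[float], b: List[float],
                  epsilon: float = 0.1, iterations: int = 200) -> Matrix:
    """Entropy-regularized transport plan with row marginals a, column marginals b."""
    n, m = len(a), len(b)
    # Gibbs kernel: cheap pairs (small cost) receive large kernel values.
    kernel = [[math.exp(-cost[i][j] / epsilon) for j in range(m)] for i in range(n)]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(iterations):
        # Alternately rescale rows and columns to match the target marginals.
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]


# Hypothetical cost: rows are two image regions, columns are two disease
# labels; a lower cost means the feature vectors are closer.
cost = [[0.0, 1.0],
        [1.0, 0.0]]
plan = sinkhorn_plan(cost, a=[0.5, 0.5], b=[0.5, 0.5])
for row in plan:
    print([round(p, 4) for p in row])
```

Most of the transported mass lands on the low-cost pairs, which is the sense in which a transport plan "aligns" the two modalities. In a training setup the total transport cost would typically serve as a loss term.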

Performance of open-source and proprietary large language models in generating patient-friendly radiology chest CT reports.

Prucker P, Busch F, Dorfner F, Mertens CJ, Bayerl N, Makowski MR, Bressem KK, Adams LC

•papers•Jul 5 2025

Large language models (LLMs) show promise for generating patient-friendly radiology reports, but the performance of open-source versus proprietary LLMs needs assessment. This study compares open-source and proprietary LLMs in generating patient-friendly radiology reports from chest CTs using quantitative readability metrics and qualitative assessments by radiologists. Fifty chest CT reports were processed by seven LLMs: three open-source models (Llama-3-70b, Mistral-7b, Mixtral-8x7b) and four proprietary models (GPT-4, GPT-3.5-Turbo, Claude-3-Opus, Gemini-Ultra). Simplification was evaluated using five quantitative readability metrics. Three radiologists rated patient-friendliness on a five-point Likert scale across five criteria. Content and coherence errors were counted. Inter-rater reliability and differences among models were statistically assessed. Inter-rater reliability was substantial to near perfect (κ = 0.76-0.86). Qualitatively, Llama-3-70b was non-inferior to leading proprietary models in 4/5 categories. GPT-3.5-Turbo showed the best overall readability, outperforming GPT-4 in two metrics. Llama-3-70b outperformed GPT-3.5-Turbo on the CLI (p = 0.006). Claude-3-Opus and Gemini-Ultra scored lower on readability but were rated highly in qualitative assessments. Claude-3-Opus maintained perfect factual accuracy. Claude-3-Opus and GPT-4 outperformed Llama-3-70b in emotional sensitivity (90.0% vs 46.0%, p < 0.001). Llama-3-70b shows strong potential in generating quality, patient-friendly radiology reports, challenging proprietary models. With further adaptation, open-source LLMs could advance patient-friendly reporting technology.

CT LLM Radiology Report Chest Retrospective Clinical In Silico Academic Lab GenAI
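
One of the readability metrics named in the abstract above is the CLI, assumed here to mean the Coleman-Liau Index, which estimates a US school grade level from letters per 100 words (L) and sentences per 100 words (S) as 0.0588 L - 0.296 S - 15.8. The sketch below is a rough implementation with a simple tokenizer; the two example sentences are invented and are not drawn from the study's reports.

```python
import re


def coleman_liau_index(text: str) -> float:
    """Approximate US grade level; lower values mean easier text."""
    words = re.findall(r"[A-Za-z]+(?:['-][A-Za-z]+)*", text)
    letters = sum(ch.isalpha() for ch in text)
    # Simplification: any run of ., ! or ? counts as one sentence end.
    sentences = len(re.findall(r"[.!?]+", text))
    if not words or sentences == 0:
        raise ValueError("text needs at least one word and one sentence")
    letters_per_100_words = letters / len(words) * 100
    sentences_per_100_words = sentences / len(words) * 100
    return 0.0588 * letters_per_100_words - 0.296 * sentences_per_100_words - 15.8


# Invented examples: the same finding in plain and in technical wording.
plain = "The scan shows a small spot in your lung. It is likely harmless."
technical = ("Computed tomography demonstrates a subcentimeter pulmonary nodule. "
             "Malignancy is considered improbable.")
print(round(coleman_liau_index(plain), 1), round(coleman_liau_index(technical), 1))
```

The index depends only on word and sentence length, so it says nothing about factual accuracy or tone, which is why the study pairs it with radiologist ratings.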
