
Population-scale cross-sectional observational study for AI-powered TB screening on one million CXRs.

Munjal P, Mahrooqi AA, Rajan R, Jeremijenko A, Ahmad I, Akhtar MI, Pimentel MAF, Khan S

PubMed · Jul 9 2025
Traditional tuberculosis (TB) screening involves radiologists manually reviewing chest X-rays (CXR), which is time-consuming, error-prone, and limited by workforce shortages. Our AI model, AIRIS-TB (AI Radiology In Screening TB), aims to address these challenges by automating the reporting of all X-rays in which no findings are detected. AIRIS-TB was evaluated on over one million CXRs, achieving an AUC of 98.51% and an overall false negative rate (FNR) of 1.57%, outperforming radiologists (1.85%) while maintaining a 0% TB-FNR. By selectively deferring only cases with findings to radiologists, the model has the potential to automate up to 80% of routine CXR reporting. Subgroup analysis revealed insignificant performance disparities across age, sex, HIV status, and region of origin, with sputum tests for suspected TB showing a strong correlation with model predictions. This large-scale validation demonstrates AIRIS-TB's safety and efficiency in high-volume TB screening programs, reducing radiologist workload without compromising diagnostic accuracy.
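
The deferral rule described above is essentially a thresholded triage on the model's finding probability. A minimal sketch of that logic (the threshold value and variable names are our own, not AIRIS-TB's):

```python
# Hypothetical triage sketch: auto-report studies the model calls normal,
# defer the rest to radiologists. Not the authors' implementation.
import numpy as np

def triage(p_any_finding: np.ndarray, threshold: float = 0.05) -> np.ndarray:
    """Return a boolean mask: True = defer this study to a radiologist."""
    return p_any_finding >= threshold

def false_negative_rate(deferred: np.ndarray, has_finding: np.ndarray) -> float:
    # FNR = abnormal studies auto-reported as normal / all abnormal studies.
    missed = (~deferred) & has_finding
    return missed.sum() / max(has_finding.sum(), 1)

# Toy usage: measure automation fraction and FNR at a chosen threshold.
rng = np.random.default_rng(0)
p = rng.random(1000)
labels = p > 0.7  # toy ground truth, for illustration only
deferred = triage(p)
print(f"automated: {100 * (~deferred).mean():.1f}%  "
      f"FNR: {false_negative_rate(deferred, labels):.3f}")
```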

Cross-Modality Masked Learning for Survival Prediction in ICI Treated NSCLC Patients

Qilong Xing, Zikai Song, Bingxin Gong, Lian Yang, Junqing Yu, Wei Yang

arXiv preprint · Jul 9 2025
Accurate prognosis of non-small cell lung cancer (NSCLC) patients undergoing immunotherapy is essential for personalized treatment planning, enabling informed patient decisions, and improving both treatment outcomes and quality of life. However, the lack of large, relevant datasets and effective multi-modal feature fusion strategies pose significant challenges in this domain. To address these challenges, we present a large-scale dataset and introduce a novel framework for multi-modal feature fusion aimed at enhancing the accuracy of survival prediction. The dataset comprises 3D CT images and corresponding clinical records from NSCLC patients treated with immune checkpoint inhibitors (ICI), along with progression-free survival (PFS) and overall survival (OS) data. We further propose a cross-modality masked learning approach for medical feature fusion, consisting of two distinct branches, each tailored to its respective modality: a Slice-Depth Transformer for extracting 3D features from CT images and a graph-based Transformer for learning node features and relationships among clinical variables in tabular data. The fusion process is guided by a masked modality learning strategy, wherein the model utilizes the intact modality to reconstruct missing components. This mechanism improves the integration of modality-specific features, fostering more effective inter-modality relationships and feature interactions. Our approach demonstrates superior performance in multi-modal integration for NSCLC survival prediction, surpassing existing methods and setting a new benchmark for prognostic models in this context.
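
The masked-modality objective can be pictured as hiding one modality's tokens and reconstructing them from the intact modality. A toy sketch under that reading (dimensions and module choices are illustrative, not the paper's architecture):

```python
# Toy masked-modality fusion: mask clinical tokens, reconstruct them from
# CT tokens via cross-attention, and penalize the reconstruction error.
import torch
import torch.nn as nn

class MaskedModalityFusion(nn.Module):
    def __init__(self, dim: int = 256, heads: int = 4):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.mask_token = nn.Parameter(torch.zeros(1, 1, dim))

    def forward(self, ct_tokens, clin_tokens, mask):
        # mask: (B, N_clin) boolean, True = hide this clinical token.
        target = clin_tokens
        masked = torch.where(mask.unsqueeze(-1),
                             self.mask_token.expand_as(clin_tokens),
                             clin_tokens)
        # Reconstruct masked clinical tokens from the intact CT modality.
        recon, _ = self.cross_attn(query=masked, key=ct_tokens, value=ct_tokens)
        loss = ((recon - target) ** 2)[mask].mean()
        return recon, loss

fusion = MaskedModalityFusion()
ct = torch.randn(2, 64, 256)    # stand-in for Slice-Depth Transformer output
clin = torch.randn(2, 16, 256)  # stand-in for graph Transformer node features
mask = torch.rand(2, 16) < 0.3
_, loss = fusion(ct, clin, mask)
loss.backward()
```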

MCA-RG: Enhancing LLMs with Medical Concept Alignment for Radiology Report Generation

Qilong Xing, Zikai Song, Youjia Zhang, Na Feng, Junqing Yu, Wei Yang

arXiv preprint · Jul 9 2025
Despite significant advancements in adapting Large Language Models (LLMs) for radiology report generation (RRG), clinical adoption remains challenging due to difficulties in accurately mapping pathological and anatomical features to their corresponding text descriptions. Additionally, semantics-agnostic feature extraction further hampers the generation of accurate diagnostic reports. To address these challenges, we introduce Medical Concept Aligned Radiology Report Generation (MCA-RG), a knowledge-driven framework that explicitly aligns visual features with distinct medical concepts to enhance the report generation process. MCA-RG utilizes two curated concept banks: a pathology bank containing lesion-related knowledge, and an anatomy bank with anatomical descriptions. The visual features are aligned with these medical concepts and undergo tailored enhancement. We further propose an anatomy-based contrastive learning procedure to improve the generalization of anatomical features, coupled with a matching loss for pathological features to prioritize clinically relevant regions. Additionally, a feature gating mechanism is employed to filter out low-quality concept features. Finally, the visual features, each corresponding to an individual medical concept, are leveraged to guide the report generation process. Experiments on two public benchmarks (MIMIC-CXR and CheXpert Plus) demonstrate that MCA-RG achieves superior performance, highlighting its effectiveness in radiology report generation.
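
Concept alignment of this kind is often implemented as cosine similarity against a bank of concept embeddings, with a gate discarding weak matches. A hedged sketch (the concept banks are random stand-ins here, and the names are hypothetical):

```python
# Sketch of concept-bank alignment with a simple gating filter.
import torch
import torch.nn.functional as F

def align_to_concepts(visual, concept_bank, gate_threshold: float = 0.2):
    """visual: (B, D) image features; concept_bank: (C, D) concept embeddings."""
    v = F.normalize(visual, dim=-1)
    c = F.normalize(concept_bank, dim=-1)
    sim = v @ c.T                                    # cosine similarity per concept
    gate = sim.max(dim=-1).values > gate_threshold   # drop low-quality matches
    aligned = sim.softmax(dim=-1) @ concept_bank     # concept-weighted feature
    return aligned, gate

pathology_bank = torch.randn(32, 512)  # stand-in lesion-related concepts
anatomy_bank = torch.randn(24, 512)    # stand-in anatomical concepts
feats = torch.randn(4, 512)
aligned, keep = align_to_concepts(feats, torch.cat([pathology_bank, anatomy_bank]))
```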

Dataset and Benchmark for Enhancing Critical Retained Foreign Object Detection

Yuli Wang, Victoria R. Shi, Liwei Zhou, Richard Chin, Yuwei Dai, Yuanyun Hu, Cheng-Yi Li, Haoyue Guan, Jiashu Cheng, Yu Sun, Cheng Ting Lin, Ihab Kamel, Premal Trivedi, Pamela Johnson, John Eng, Harrison Bai

arXiv preprint · Jul 9 2025
Critical retained foreign objects (RFOs), including surgical instruments like sponges and needles, pose serious patient safety risks and carry significant financial and legal implications for healthcare institutions. Detecting critical RFOs using artificial intelligence remains challenging due to their rarity and the limited availability of chest X-ray datasets that specifically feature critical RFO cases. Existing datasets only contain non-critical RFOs, such as necklaces or zippers, further limiting their utility for developing clinically impactful detection algorithms. To address these limitations, we introduce "Hopkins RFOs Bench", the first and largest dataset of its kind, containing 144 chest X-ray images of critical RFO cases collected over 18 years from the Johns Hopkins Health System. Using this dataset, we benchmark several state-of-the-art object detection models, highlighting the need for enhanced detection methodologies for critical RFO cases. Recognizing data scarcity challenges, we further explore synthetic image methods to bridge this gap. We evaluate two advanced synthetic image methods, DeepDRR-RFO, a physics-based method, and RoentGen-RFO, a diffusion-based method, for creating realistic radiographs featuring critical RFOs. Our comprehensive analysis identifies the strengths and limitations of each synthetic method, providing insights into effectively utilizing synthetic data to enhance model training. The Hopkins RFOs Bench and our findings significantly advance the development of reliable, generalizable AI-driven solutions for detecting critical RFOs in clinical chest X-rays.
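
For readers who want a baseline on such a dataset, a minimal inference pass with an off-the-shelf detector might look like the following (the file path and confidence cutoff are hypothetical, and the paper's exact models and protocol may differ):

```python
# Sketch of a single-image detection pass with a stock torchvision model.
import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn
from torchvision.io import read_image
from torchvision.transforms.functional import convert_image_dtype

model = fasterrcnn_resnet50_fpn(weights="DEFAULT").eval()

# Hypothetical path into a local copy of the dataset.
img = convert_image_dtype(read_image("hopkins_rfo/example_cxr.png"), torch.float)
with torch.no_grad():
    pred = model([img])[0]           # dict of boxes, labels, scores
keep = pred["scores"] > 0.5          # simple confidence cutoff
print(pred["boxes"][keep], pred["labels"][keep])
```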

Airway Segmentation Network for Enhanced Tubular Feature Extraction

Qibiao Wu, Yagang Wang, Qian Zhang

arXiv preprint · Jul 9 2025
Manual annotation of airway regions in computed tomography images is a time-consuming and expertise-dependent task. Automatic airway segmentation is therefore a prerequisite for enabling rapid bronchoscopic navigation and the clinical deployment of bronchoscopic robotic systems. Although convolutional neural network methods have gained considerable attention in airway segmentation, the unique tree-like structure of airways poses challenges for conventional and deformable convolutions, which often fail to focus on fine airway structures, leading to missed segments and discontinuities. To address this issue, this study proposes a novel tubular feature extraction network, named TfeNet. TfeNet introduces a novel direction-aware convolution operation that first applies spatial rotation transformations to adjust the sampling positions of linear convolution kernels. The deformed kernels are then represented as line segments or polylines in 3D space. Furthermore, a tubular feature fusion module (TFFM) is designed based on asymmetric convolution and residual connection strategies, enhancing the network's focus on subtle airway structures. Extensive experiments conducted on one public dataset and two datasets used in airway segmentation challenges demonstrate that the proposed TfeNet achieves more accurate and continuous airway structure predictions compared with existing methods. In particular, TfeNet achieves the highest overall score of 94.95% on the current largest airway segmentation dataset, Airway Tree Modeling (ATM22), and demonstrates advanced performance on the lung fibrosis dataset (AIIB23). The code is available at https://github.com/QibiaoWu/TfeNet.
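
The direction-aware operation can be approximated as sampling the volume along a rotated line segment and mixing the samples with a 1D kernel. A toy sketch under that simplification (reach, parameterization, and sizes are illustrative only; see the authors' repository for the real operator):

```python
# Toy "line" convolution: sample a 3D volume along a direction vector and
# combine the samples with a 1D kernel, via grid_sample offsets.
import torch
import torch.nn.functional as F

def line_conv3d(x, direction, weights):
    """x: (B, C, D, H, W); direction: unit 3-vector in (d, h, w) order;
    weights: (K,) 1D kernel applied along the line."""
    B = x.shape[0]
    K = weights.numel()
    # K sampling points spaced along `direction`, in normalized [-1, 1]
    # grid coordinates (grid_sample expects (x, y, z) order, hence flip).
    steps = torch.linspace(-1, 1, K).view(K, 1) * 0.1  # small spatial reach
    offsets = steps * direction.flip(0).view(1, 3)      # (K, 3)
    # Identity sampling grid of shape (B, D, H, W, 3).
    base = F.affine_grid(torch.eye(3, 4).unsqueeze(0).repeat(B, 1, 1),
                         x.shape, align_corners=False)
    out = torch.zeros_like(x)
    for k in range(K):
        grid = base + offsets[k].view(1, 1, 1, 1, 3)
        out = out + weights[k] * F.grid_sample(x, grid, align_corners=False)
    return out

x = torch.randn(1, 8, 16, 32, 32)
y = line_conv3d(x, direction=torch.tensor([0., 0., 1.]), weights=torch.ones(5) / 5)
```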

MTMedFormer: multi-task vision transformer for medical imaging with federated learning.

Nath A, Shukla S, Gupta P

PubMed · Jul 8 2025
Deep learning has revolutionized medical imaging, improving tasks like image segmentation, detection, and classification, often surpassing human accuracy. However, the training of effective diagnostic models is hindered by two major challenges: the need for large datasets for each task and privacy laws restricting the sharing of medical data. Multi-task learning (MTL) addresses the first challenge by enabling a single model to perform multiple tasks, though convolution-based MTL models struggle with contextualizing global features. Federated learning (FL) helps overcome the second challenge by allowing models to train collaboratively without sharing data, but traditional methods struggle to aggregate stable feature maps due to the permutation-invariant nature of neural networks. To tackle these issues, we propose MTMedFormer, a transformer-based multi-task medical imaging model. We leverage the transformers' ability to learn task-agnostic features using a shared encoder and utilize task-specific decoders for robust feature extraction. By combining MTL with a hybrid loss function, MTMedFormer learns distinct diagnostic tasks in a synergistic manner. Additionally, we introduce a novel Bayesian federation method for aggregating multi-task imaging models. Our results show that MTMedFormer outperforms traditional single-task and MTL models on mammogram and pneumonia datasets, while our Bayesian federation method surpasses traditional methods in image segmentation.
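
The abstract does not spell out the Bayesian federation step; one common reading is a precision-weighted average of client parameters, sketched below (the function name and the source of the precision estimates are assumptions, not the paper's method):

```python
# Hedged sketch of precision-weighted aggregation in the spirit of a
# Bayesian federation step. Each client supplies parameters plus a
# per-parameter precision (inverse-variance) estimate.
import torch

def bayesian_aggregate(client_states, client_precisions):
    """client_states: list of {name: tensor}; client_precisions: matching
    list of {name: tensor} with non-negative precision estimates."""
    agg = {}
    for name in client_states[0]:
        num = sum(p[name] * s[name] for s, p in zip(client_states, client_precisions))
        den = sum(p[name] for p in client_precisions)
        agg[name] = num / den.clamp_min(1e-8)  # guard against zero precision
    return agg

# Toy usage with two "clients" sharing a single weight matrix: the second
# client's parameters count twice as much due to higher precision.
s1, s2 = {"w": torch.randn(4, 4)}, {"w": torch.randn(4, 4)}
p1, p2 = {"w": torch.ones(4, 4)}, {"w": 2 * torch.ones(4, 4)}
global_state = bayesian_aggregate([s1, s2], [p1, p2])
```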

An autonomous agent for auditing and improving the reliability of clinical AI models

Lukas Kuhn, Florian Buettner

arXiv preprint · Jul 8 2025
The deployment of AI models in clinical practice faces a critical challenge: models achieving expert-level performance on benchmarks can fail catastrophically when confronted with real-world variations in medical imaging. Minor shifts in scanner hardware, lighting or demographics can erode accuracy, but currently reliability auditing to identify such catastrophic failure cases before deployment is a bespoke and time-consuming process. Practitioners lack accessible and interpretable tools to expose and repair hidden failure modes. Here we introduce ModelAuditor, a self-reflective agent that converses with users, selects task-specific metrics, and simulates context-dependent, clinically relevant distribution shifts. ModelAuditor then generates interpretable reports explaining how much performance likely degrades during deployment, discussing specific likely failure modes and identifying root causes and mitigation strategies. Our comprehensive evaluation across three real-world clinical scenarios - inter-institutional variation in histopathology, demographic shifts in dermatology, and equipment heterogeneity in chest radiography - demonstrates that ModelAuditor is able to correctly identify context-specific failure modes of state-of-the-art models such as the established SIIM-ISIC melanoma classifier. Its targeted recommendations recover 15-25% of performance lost under real-world distribution shift, substantially outperforming both baseline models and state-of-the-art augmentation methods. These improvements are achieved through a multi-agent architecture, and audits execute on consumer hardware in under 10 minutes, costing less than US$0.50 per audit.
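
The core audit loop amounts to applying a clinically plausible shift and measuring the accuracy drop. A minimal sketch (the shift and the model below are placeholders, not ModelAuditor itself):

```python
# Sketch of a shift audit: compare accuracy before and after a perturbation.
import torch

def audit_shift(model, images, labels, shift):
    """Return (clean_accuracy, shifted_accuracy) for a given perturbation."""
    model.eval()
    with torch.no_grad():
        clean_acc = (model(images).argmax(1) == labels).float().mean().item()
        shifted_acc = (model(shift(images)).argmax(1) == labels).float().mean().item()
    return clean_acc, shifted_acc

# Example shift: a scanner-dependent contrast change on [0, 1] images.
contrast_shift = lambda x: (0.6 * (x - 0.5) + 0.5).clamp(0, 1)

# Toy usage with a placeholder classifier.
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 64 * 64, 2))
images, labels = torch.rand(8, 3, 64, 64), torch.randint(0, 2, (8,))
print(audit_shift(model, images, labels, contrast_shift))
```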

Performance of a deep-learning-based lung nodule detection system using 0.25-mm thick ultra-high-resolution CT images.

Higashibori H, Fukumoto W, Kusuda S, Yokomachi K, Mitani H, Nakamura Y, Awai K

PubMed · Jul 7 2025
Artificial intelligence (AI) algorithms for lung nodule detection assist radiologists. As their performance using ultra-high-resolution CT (U-HRCT) images has not been evaluated, we investigated the usefulness of 0.25-mm slices at U-HRCT using the commercially available deep-learning-based lung nodule detection (DL-LND) system. We enrolled 63 patients who underwent U-HRCT for lung cancer and suspected lung cancer. Two board-certified radiologists identified nodules more than 4 mm in diameter on 1-mm HRCT slices and set the reference standard consensually. They recorded all lesions detected on 5-, 1-, and 0.25-mm slices by the DL-LND system. Unidentified nodules were included in the reference standard. To examine the performance of the DL-LND system, the sensitivity, the positive predictive value (PPV), and the number of false positive (FP) nodules were recorded. The mean number of lesions detected on 5-, 1-, and 0.25-mm slices was 5.1, 7.8, and 7.2 per CT scan. On 5-mm slices the sensitivity and PPV were 79.8% and 46.4%; on 1-mm slices they were 91.5% and 34.8%, and on 0.25-mm slices they were 86.7% and 36.1%. The sensitivity was significantly higher on 1- than 5-mm slices (p < 0.01) while the PPV was significantly lower on 1- than 5-mm slices (p < 0.01). A slice thickness of 0.25 mm failed to improve the system's performance. The mean number of FP nodules on 5-, 1-, and 0.25-mm slices was 2.8, 5.2, and 4.7 per CT scan. We found that 1 mm was the best slice thickness for U-HRCT images using the commercially available DL-LND system.
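
The reported metrics follow the usual detection definitions; the sketch below computes them from matched detection counts (the counts are toy numbers chosen to roughly reproduce the 1-mm row, not the study's raw data):

```python
# Standard detection metrics from true/false positive and false negative counts.
def detection_metrics(tp: int, fp: int, fn: int, n_scans: int):
    sensitivity = tp / (tp + fn)   # fraction of reference nodules found
    ppv = tp / (tp + fp)           # fraction of detections that are real nodules
    fp_per_scan = fp / n_scans     # false positives per CT scan
    return sensitivity, ppv, fp_per_scan

# Toy counts approximating the 1-mm slice results over 63 scans.
print(detection_metrics(tp=175, fp=328, fn=16, n_scans=63))
```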

MedGemma Technical Report

Andrew Sellergren, Sahar Kazemzadeh, Tiam Jaroensri, Atilla Kiraly, Madeleine Traverse, Timo Kohlberger, Shawn Xu, Fayaz Jamil, Cían Hughes, Charles Lau, Justin Chen, Fereshteh Mahvar, Liron Yatziv, Tiffany Chen, Bram Sterling, Stefanie Anna Baby, Susanna Maria Baby, Jeremy Lai, Samuel Schmidgall, Lu Yang, Kejia Chen, Per Bjornsson, Shashir Reddy, Ryan Brush, Kenneth Philbrick, Howard Hu, Howard Yang, Richa Tiwari, Sunny Jansen, Preeti Singh, Yun Liu, Shekoofeh Azizi, Aishwarya Kamath, Johan Ferret, Shreya Pathak, Nino Vieillard, Ramona Merhej, Sarah Perrin, Tatiana Matejovicova, Alexandre Ramé, Morgane Riviere, Louis Rouillard, Thomas Mesnard, Geoffrey Cideron, Jean-bastien Grill, Sabela Ramos, Edouard Yvinec, Michelle Casbon, Elena Buchatskaya, Jean-Baptiste Alayrac, Dmitry Lepikhin, Vlad Feinberg, Sebastian Borgeaud, Alek Andreev, Cassidy Hardin, Robert Dadashi, Léonard Hussenot, Armand Joulin, Olivier Bachem, Yossi Matias, Katherine Chou, Avinatan Hassidim, Kavi Goel, Clement Farabet, Joelle Barral, Tris Warkentin, Jonathon Shlens, David Fleet, Victor Cotruta, Omar Sanseviero, Gus Martins, Phoebe Kirk, Anand Rao, Shravya Shetty, David F. Steiner, Can Kirmizibayrak, Rory Pilgrim, Daniel Golden, Lin Yang

arXiv preprint · Jul 7 2025
Artificial intelligence (AI) has significant potential in healthcare applications, but its training and deployment face challenges due to healthcare's diverse data, complex tasks, and the need to preserve privacy. Foundation models that perform well on medical tasks and require less task-specific tuning data are critical to accelerate the development of healthcare AI applications. We introduce MedGemma, a collection of medical vision-language foundation models based on Gemma 3 4B and 27B. MedGemma demonstrates advanced medical understanding and reasoning on images and text, significantly exceeding the performance of similar-sized generative models and approaching the performance of task-specific models, while maintaining the general capabilities of the Gemma 3 base models. For out-of-distribution tasks, MedGemma achieves 2.6-10% improvement on medical multimodal question answering, 15.5-18.1% improvement on chest X-ray finding classification, and 10.8% improvement on agentic evaluations compared to the base models. Fine-tuning MedGemma further improves performance in subdomains, reducing errors in electronic health record information retrieval by 50% and reaching comparable performance to existing specialized state-of-the-art methods for pneumothorax classification and histopathology patch classification. We additionally introduce MedSigLIP, a medically-tuned vision encoder derived from SigLIP. MedSigLIP powers the visual understanding capabilities of MedGemma and as an encoder achieves comparable or better performance than specialized medical image encoders. Taken together, the MedGemma collection provides a strong foundation of medical image and text capabilities, with potential to significantly accelerate medical research and development of downstream applications. The MedGemma collection, including tutorials and model weights, can be found at https://goo.gle/medgemma.
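
For orientation, loading the released weights typically goes through the Hugging Face transformers pipeline; a hedged sketch (the model id and gated-access details are assumed from the public release, so defer to the link above for authoritative instructions):

```python
# Sketch of querying a MedGemma checkpoint with a chest X-ray prompt.
# The weights are gated on Hugging Face and require accepting the terms.
from transformers import pipeline
from PIL import Image

pipe = pipeline("image-text-to-text", model="google/medgemma-4b-it")
messages = [{
    "role": "user",
    "content": [
        {"type": "image", "image": Image.open("chest_xray.png")},  # hypothetical file
        {"type": "text", "text": "Describe the findings on this chest X-ray."},
    ],
}]
out = pipe(text=messages, max_new_tokens=128)
# For chat input, the last message of the returned conversation is the reply.
print(out[0]["generated_text"][-1]["content"])
```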
