
Prediction of Pulmonary Ground-Glass Nodule Progression State on Initial Screening CT Using a Radiomics-Based Model.

Jin L, Liu Z, Sun Y, Gao P, Ma Z, Ye H, Liu Z, Dong X, Sun Y, Han J, Lv L, Guan D, Li M

PubMed · Sep 7 2025
Diagnosing pulmonary ground-glass nodules (GGNs) on chest CT remains challenging in clinical practice, and different stages of GGNs may require different clinical treatments. Hence, we sought to predict the progression state of pulmonary GGNs (absorption or persistence) to support accurate clinical treatment and decision-making. We retrospectively enrolled 672 patients (absorption group: 299; control group: 373) from two medical centres between January 2017 and March 2023. Clinical information and radiomic features, extracted from regions of interest on chest CT images, were collected for all patients. Patients were randomly divided into training and test sets at a ratio of 7:3. Three models were constructed to identify GGN progression: a Rad-score model (Model 1), a clinical-factors model (Model 2), and a combined clinical-factors and Rad-score model (Model 3). On the test set, two radiologists (each with over 8 years of experience in chest imaging) evaluated the models' performance. Receiver operating characteristic curves, accuracy, sensitivity, and specificity were analysed. In the test set, the areas under the curve (AUCs) of Model 1 and Model 2 were 0.907 [0.868-0.946] and 0.918 [0.88-0.955], respectively. Model 3 achieved the best predictive performance, with an AUC of 0.959 [0.936-0.982], an accuracy of 0.881, a sensitivity of 0.902, and a specificity of 0.856. The intraclass correlation coefficient of Model 3 (0.86) was higher than those of the radiologists (0.83 and 0.71). We developed and validated a radiomics-based machine-learning method that performed well in predicting the progression state of GGNs on initial screening CT; the model may improve follow-up management of GGNs.
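As a rough illustration of the three-model comparison, here is a hedged Python sketch: synthetic stand-ins for the Rad-score and clinical factors (the study's actual features are not listed in the abstract) are combined with logistic regression and scored by AUC on a 7:3 split.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Hypothetical sketch of the Model 1-3 comparison: a radiomics signature
# (Rad-score), clinical factors, and their combination, each scored by AUC.
# The synthetic data and feature names are illustrative, not the study's.
rng = np.random.default_rng(0)
n = 672
rad_score = rng.normal(size=(n, 1))   # aggregated radiomic signature
clinical = rng.normal(size=(n, 3))    # e.g. age, nodule size, smoking status (assumed)
y = (rad_score[:, 0] + clinical[:, 0] + rng.normal(size=n) > 0).astype(int)

X_sets = {
    "Model 1 (Rad-score)": rad_score,
    "Model 2 (clinical)": clinical,
    "Model 3 (combined)": np.hstack([rad_score, clinical]),
}
for name, X in X_sets.items():
    # The 7:3 split mirrors the paper's training/test protocol.
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=42)
    clf = LogisticRegression().fit(X_tr, y_tr)
    auc = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])
    print(f"{name}: AUC = {auc:.3f}")
```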

Prediction of bronchopulmonary dysplasia using machine learning from chest X-rays of premature infants in the neonatal intensive care unit.

Ozcelik G, Erol S, Korkut S, Kose Cetinkaya A, Ozcelik H

PubMed · Sep 5 2025
Bronchopulmonary dysplasia (BPD) is a significant morbidity in premature infants. This study aimed to develop an artificial intelligence model that predicts BPD from chest radiographs and to assess the accuracy of its predictions against clinical outcomes. Medical records of premature infants born at ≤ 28 weeks and < 1250 g between January 1, 2020, and December 31, 2021, in the neonatal intensive care unit were obtained. In this retrospective model development and validation study, an artificial intelligence model was developed using the DenseNet121 deep learning architecture. The training and test sets consisted of chest radiographs obtained on postnatal day 1 as well as during the 2nd, 3rd, and 4th weeks. The model predicted the likelihood of developing no BPD, or mild, moderate, or severe BPD. The accuracy of the model was tested against the clinical outcomes of the patients. The study included 122 premature infants with a birth weight of 990 g (range: 840-1120 g). Of these, 33 (27%) patients did not develop BPD, 24 (19.7%) had mild BPD, 28 (23%) had moderate BPD, and 37 (30%) had severe BPD. A total of 395 chest radiographs from these patients were used to develop the artificial intelligence (AI) model for predicting BPD. Area under the curve values, representing the accuracy of predicting severe, moderate, mild, and no BPD, were 0.79, 0.75, 0.82, and 0.82 for day-1 radiographs; 0.88, 0.82, 0.74, and 0.94 for week-2 radiographs; 0.87, 0.83, 0.88, and 0.96 for week-3 radiographs; and 0.90, 0.82, 0.86, and 0.97 for week-4 radiographs. The AI model successfully identified BPD on chest radiographs and classified its severity. The model's accuracy could be improved further using larger control and external validation datasets.
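A minimal sketch of the described model, assuming a torchvision DenseNet121 with its classifier head replaced for the four severity classes; the dummy batch and training step are illustrative assumptions, not the study's code.

```python
import torch
import torch.nn as nn
from torchvision import models

# Sketch of a DenseNet121 fine-tuned for 4-way BPD severity prediction
# (none / mild / moderate / severe), as the abstract describes.
model = models.densenet121(weights=models.DenseNet121_Weights.IMAGENET1K_V1)
model.classifier = nn.Linear(model.classifier.in_features, 4)  # 4 severity classes

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

# One illustrative training step on a dummy batch of chest radiographs
# (grayscale films replicated to 3 channels for the ImageNet backbone).
x = torch.randn(8, 3, 224, 224)
y = torch.randint(0, 4, (8,))
loss = criterion(model(x), y)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```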

Detecting, Characterizing, and Mitigating Implicit and Explicit Racial Biases in Health Care Datasets With Subgroup Learnability: Algorithm Development and Validation Study.

Gulamali F, Sawant AS, Liharska L, Horowitz C, Chan L, Hofer I, Singh K, Richardson L, Mensah E, Charney A, Reich D, Hu J, Nadkarni G

PubMed · Sep 4 2025
The growing adoption of diagnostic and prognostic algorithms in health care has raised concerns about the perpetuation of algorithmic bias against disadvantaged groups. Deep learning methods to detect and mitigate bias have revolved around modifying models, optimization strategies, and threshold calibration, with varying levels of success and tradeoffs. However, there have been limited substantive efforts to address bias at the level of the data used to generate algorithms in health care datasets. The aim of this study is to create a simple metric (AEquity) that uses a learning-curve approximation to detect and mitigate bias via guided dataset collection or relabeling. We demonstrate this metric in 2 well-known examples, chest X-rays and health care cost utilization, and detect novel biases in the National Health and Nutrition Examination Survey. Using AEquity to guide data-centric collection for each diagnostic finding in the chest radiograph dataset decreased bias by between 29% and 96.5% when measured by differences in area under the curve. Next, we examined (1) whether AEquity works on intersectional populations and (2) whether AEquity is invariant to different fairness metrics beyond area under the curve. We therefore examined the effect of AEquity on mitigating bias, as measured by false negative rate, precision, and false discovery rate, for Black patients on Medicaid. At this intersection of race and socioeconomic status, AEquity-based interventions reduced bias across several fairness metrics: overall false negative rate by 33.3% (absolute bias reduction = 1.88×10⁻¹, 95% CI 1.4×10⁻¹ to 2.5×10⁻¹; relative reduction 33.3%, 95% CI 26.6%-40%), precision bias by 94.6% (absolute reduction = 7.50×10⁻², 95% CI 7.48×10⁻² to 7.51×10⁻²; relative reduction 95% CI 94.5%-94.7%), and false discovery rate by 94.5% (absolute reduction = 3.50×10⁻², 95% CI 3.49×10⁻² to 3.50×10⁻²). Similarly, AEquity-guided data collection reduced bias by up to 80% on mortality prediction with the National Health and Nutrition Examination Survey (absolute bias reduction = 0.08, 95% CI 0.07-0.09). We then benchmarked AEquity against state-of-the-art data-guided debiasing measures, balanced empirical risk minimization and calibration, and showed that AEquity-guided data collection outperforms both. Moreover, AEquity works across model families: fully connected networks; convolutional neural networks such as ResNet-50; transformer architectures such as ViT-B-16, a vision transformer with 86 million parameters; and nonparametric methods such as Light Gradient-Boosting Machine. In short, we demonstrated that AEquity is a robust tool by applying it to different datasets, algorithms, and intersectional analyses and measuring its effectiveness with respect to a range of traditional fairness metrics.
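The abstract does not define AEquity's computation, but a learning-curve approximation of the kind it names can be sketched: per-subgroup held-out performance as a function of training-set size, where a still-steep curve suggests the subgroup is under-sampled and would benefit from targeted data collection. All data and sizes below are synthetic assumptions, not the published metric.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

def subgroup_learning_curve(X, y, sizes=(50, 100, 200, 400)):
    """Held-out AUC as a function of training-set size for one subgroup."""
    X_tr, y_tr, X_te, y_te = X[:-200], y[:-200], X[-200:], y[-200:]
    curve = []
    for n in sizes:
        clf = LogisticRegression().fit(X_tr[:n], y_tr[:n])
        curve.append(roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]))
    return np.array(curve)

for group in ("group_a", "group_b"):
    X = rng.normal(size=(800, 10))
    y = (X[:, 0] + rng.normal(scale=2.0, size=800) > 0).astype(int)
    curve = subgroup_learning_curve(X, y)
    # A large remaining slope hints that more data for this subgroup
    # would narrow the performance gap; collection can be guided by it.
    print(group, "AUC curve:", np.round(curve, 3),
          "final slope:", round(curve[-1] - curve[-2], 3))
```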

Chest X-ray Pneumothorax Segmentation Using EfficientNet-B4 Transfer Learning in a U-Net Architecture

Alvaro Aranibar Roque, Helga Sebastian

arXiv preprint · Sep 4 2025
Pneumothorax, the abnormal accumulation of air in the pleural space, can be life-threatening if undetected. Chest X-rays are the first-line diagnostic tool, but small pneumothoraces may be subtle. We propose an automated deep-learning pipeline using a U-Net with an EfficientNet-B4 encoder to segment pneumothorax regions. Trained on the SIIM-ACR dataset with data augmentation and a combined binary cross-entropy plus Dice loss, the model achieved an IoU of 0.7008 and a Dice score of 0.8241 on the independent PTX-498 dataset. These results demonstrate that the model can accurately localize pneumothoraces and support radiologists.
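A sketch of the described setup, assuming the segmentation_models_pytorch package; the dummy batch and hyperparameters are placeholders for the paper's SIIM-ACR training pipeline.

```python
import torch
import segmentation_models_pytorch as smp

# U-Net with an EfficientNet-B4 encoder, trained with BCE + Dice,
# as the abstract describes.
model = smp.Unet(
    encoder_name="efficientnet-b4",
    encoder_weights="imagenet",
    in_channels=1,   # single-channel chest X-ray
    classes=1,       # binary pneumothorax mask
)

bce = torch.nn.BCEWithLogitsLoss()
dice = smp.losses.DiceLoss(mode="binary")

x = torch.randn(2, 1, 512, 512)                       # dummy radiograph batch
target = torch.randint(0, 2, (2, 1, 512, 512)).float()  # dummy masks
logits = model(x)
loss = bce(logits, target) + dice(logits, target)     # combined loss from the abstract
loss.backward()
```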

A Generative Foundation Model for Chest Radiography

Yuanfeng Ji, Dan Lin, Xiyue Wang, Lu Zhang, Wenhui Zhou, Chongjian Ge, Ruihang Chu, Xiaoli Yang, Junhan Zhao, Junsong Chen, Xiangde Luo, Sen Yang, Jin Fang, Ping Luo, Ruijiang Li

arXiv preprint · Sep 4 2025
The scarcity of well-annotated, diverse medical images is a major hurdle for developing reliable AI models in healthcare. Substantial technical advances have been made in generative foundation models for natural images. Here we develop ChexGen, a generative vision-language foundation model that introduces a unified framework for text-, mask-, and bounding-box-guided synthesis of chest radiographs. Built upon the latent diffusion transformer architecture, ChexGen was pretrained on the largest curated chest X-ray dataset to date, consisting of 960,000 radiograph-report pairs. Expert evaluations and quantitative metrics show that ChexGen synthesizes radiographs accurately. We demonstrate the utility of ChexGen for training data augmentation and supervised pretraining, which led to performance improvements across disease classification, detection, and segmentation tasks using a small fraction of the training data. Further, our model enables the creation of diverse patient cohorts that enhance model fairness by detecting and mitigating demographic biases. Our study supports the transformative role of generative foundation models in building more accurate, data-efficient, and equitable medical AI systems.
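ChexGen's weights are not shown here, but the text-guided mechanism it builds on, classifier-free guidance inside a latent diffusion sampler, can be sketched generically. The tiny stand-in denoiser and embeddings below exist only so the loop runs end to end; the real model pairs a diffusion transformer with a text encoder and a VAE decoder.

```python
import torch

steps = 50
betas = torch.linspace(1e-4, 0.02, steps)   # linear noise schedule (assumption)
alphas = 1.0 - betas
alpha_bar = torch.cumprod(alphas, dim=0)

denoiser = torch.nn.Conv2d(4, 4, 3, padding=1)  # placeholder for the diffusion transformer
text_emb = torch.randn(1, 16)                   # stand-in embedding of a report prompt
null_emb = torch.zeros(1, 16)                   # empty-prompt embedding for guidance

def predict_eps(z, t, emb, guidance=4.0):
    # Classifier-free guidance: push the conditional prediction away from
    # the unconditional one. The placeholder denoiser ignores t and emb.
    eps_c = denoiser(z)
    eps_u = denoiser(z)
    return eps_u + guidance * (eps_c - eps_u)

z = torch.randn(1, 4, 64, 64)  # latent-space noise; a VAE would decode the result
with torch.no_grad():
    for t in reversed(range(steps)):
        eps = predict_eps(z, t, text_emb)
        # Standard DDPM posterior-mean update.
        coef = betas[t] / torch.sqrt(1.0 - alpha_bar[t])
        z = (z - coef * eps) / torch.sqrt(alphas[t])
        if t > 0:
            z = z + torch.sqrt(betas[t]) * torch.randn_like(z)
```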

A Foundation Model for Chest X-ray Interpretation with Grounded Reasoning via Online Reinforcement Learning

Qika Lin, Yifan Zhu, Bin Pu, Ling Huang, Haoran Luo, Jingying Ma, Zhen Peng, Tianzhe Zhao, Fangzhi Xu, Jian Zhang, Kai He, Zhonghong Ou, Swapnil Mishra, Mengling Feng

arXiv preprint · Sep 4 2025
Medical foundation models (FMs) have shown tremendous promise amid the rapid advancements in artificial intelligence (AI) technologies. However, current medical FMs typically generate answers in a black-box manner, lacking transparent reasoning processes and locally grounded interpretability, which hinders practical clinical deployment. To this end, we introduce DeepMedix-R1, a holistic medical FM for chest X-ray (CXR) interpretation. It leverages a sequential training pipeline: it is initially fine-tuned on curated CXR instruction data to equip it with fundamental CXR interpretation capabilities, then exposed to high-quality synthetic reasoning samples to enable cold-start reasoning, and finally refined via online reinforcement learning to enhance both grounded reasoning quality and generation performance. Thus, for each query, the model produces both an answer and reasoning steps tied to local regions of the image. Quantitative evaluation demonstrates substantial improvements on report generation (e.g., 14.54% and 31.32% over LLaVA-Rad and MedGemma) and visual question answering (e.g., 57.75% and 23.06% over MedGemma and CheXagent) tasks. To facilitate robust assessment, we propose Report Arena, a benchmarking framework that uses advanced language models to evaluate answer quality, further highlighting the superiority of DeepMedix-R1. Expert review of the generated reasoning steps reveals greater interpretability and clinical plausibility compared to the established Qwen2.5-VL-7B model (0.7416 vs. 0.2584 overall preference). Collectively, our work advances medical FM development toward holistic, transparent, and clinically actionable modeling for CXR interpretation.
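The abstract does not spell out the RL objective. As a purely hypothetical illustration of rewarding grounded reasoning, an online-RL reward might combine answer correctness with a box-grounding term; the weighting and IoU formulation below are assumptions, not the paper's design.

```python
def box_iou(a, b):
    """IoU of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def reward(pred_answer, gold_answer, pred_boxes, gold_boxes, w_ground=0.5):
    # Answer term: exact-match correctness (a real system would use a
    # learned or rubric-based scorer, e.g. the Report Arena judge).
    answer_score = float(pred_answer.strip().lower() == gold_answer.strip().lower())
    # Grounding term: how well each cited region matches an annotated one.
    grounding = 0.0
    if pred_boxes and gold_boxes:
        grounding = sum(max(box_iou(p, g) for g in gold_boxes)
                        for p in pred_boxes) / len(pred_boxes)
    return (1 - w_ground) * answer_score + w_ground * grounding

# e.g. reward("pneumothorax", "pneumothorax", [(10, 10, 50, 60)], [(12, 8, 48, 62)])
```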

Lung lobe segmentation: performance of open-source MOOSE, TotalSegmentator, and LungMask models compared to a local in-house model.

Amini E, Klein R

PubMed · Sep 4 2025
Lung lobe segmentation is required to assess lobar function with nuclear imaging before surgical interventions. We evaluated the performance of open-source deep learning-based lung lobe segmentation tools against a similar nnU-Net model trained on a smaller but more clinically representative dataset. We collated and semi-automatically segmented an internal dataset of 164 computed tomography scans and classified each case by task difficulty as easy, moderate, or hard. The performance of three open-source models, multi-organ objective segmentation (MOOSE), TotalSegmentator, and LungMask, was assessed using the Dice similarity coefficient (DSC), robust Hausdorff distance (rHd95), and normalized surface distance (NSD). Additionally, we trained, validated, and tested an nnU-Net model on our local dataset and compared its performance with that of the other software on the test subset. All models were evaluated for generalizability on an external competition dataset (LOLA11, n = 55). TotalSegmentator outperformed MOOSE in DSC and NSD across all difficulty levels (p < 0.001), but not in rHd95 (p = 1.000). MOOSE and TotalSegmentator surpassed LungMask across all metrics and difficulty classes (p < 0.001). Our model exceeded all other models on the internal test subset (n = 33) in all metrics, across all difficulty classes (p < 0.001), and on the external dataset. Of 7 cases with missing lobes, only our model and LungMask identified them correctly, in 3 and 1 cases, respectively. Open-source segmentation tools perform well in straightforward cases but struggle in unfamiliar, complex ones. Training on diverse, specialized datasets can improve generalizability, emphasizing representative data over sheer quantity. Training lung lobe segmentation models on a locally varied caseload improves accuracy, thus enhancing presurgical planning, ventilation-perfusion analysis, and disease localization, potentially impacting treatment decisions and patient outcomes in respiratory and thoracic care. Deep learning models trained on non-specialized datasets struggle with complex lung anomalies, yet their real-world limitations are insufficiently assessed; training an identical model on a smaller yet clinically diverse and representative cohort improved performance in challenging cases. Data diversity outweighs quantity in deep learning-based segmentation models. Accurate lung lobe segmentation may enhance presurgical assessment of lobar ventilation and perfusion, optimizing clinical decision-making and patient outcomes.
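Two of the reported metrics can be computed from binary masks with NumPy and SciPy; this voxel-surface formulation of DSC and rHd95 is a common approximation and may differ from the paper's exact tooling.

```python
import numpy as np
from scipy.ndimage import binary_erosion, distance_transform_edt

def dice(pred, gt):
    """Dice similarity coefficient of two boolean masks."""
    inter = np.logical_and(pred, gt).sum()
    return 2.0 * inter / (pred.sum() + gt.sum())

def surface(mask):
    """One-voxel-thick boundary of a boolean mask."""
    return mask & ~binary_erosion(mask)

def rhd95(pred, gt, spacing=(1.0, 1.0, 1.0)):
    """Robust (95th-percentile) symmetric Hausdorff distance in mm."""
    d_to_gt = distance_transform_edt(~gt, sampling=spacing)    # distance to gt
    d_to_pred = distance_transform_edt(~pred, sampling=spacing)
    dists = np.concatenate([d_to_gt[surface(pred)], d_to_pred[surface(gt)]])
    return np.percentile(dists, 95)

# Toy check on two overlapping spheres standing in for a lobe mask pair.
z, y, x = np.ogrid[:64, :64, :64]
gt = (z - 32) ** 2 + (y - 32) ** 2 + (x - 32) ** 2 < 20 ** 2
pred = (z - 30) ** 2 + (y - 32) ** 2 + (x - 32) ** 2 < 19 ** 2
print(f"DSC = {dice(pred, gt):.3f}, rHd95 = {rhd95(pred, gt):.1f} mm")
```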

Convolutional neural network application for automated lung cancer detection on chest CT using Google AI Studio.

Aljneibi Z, Almenhali S, Lanca L

PubMed · Sep 3 2025
This study aimed to evaluate the diagnostic performance of an artificial intelligence (AI)-enhanced model for detecting lung cancer on chest computed tomography (CT) images, assessing diagnostic accuracy, sensitivity, specificity, and interpretative consistency across normal, benign, and malignant cases. An exploratory analysis was performed using the publicly available IQ-OTH/NCCD dataset, comprising 110 CT cases (55 normal, 15 benign, 40 malignant). A pre-trained convolutional neural network in Google AI Studio was fine-tuned using 25 training images and tested on a separate image from each case. Quantitative evaluation of diagnostic accuracy and qualitative content analysis of the AI-generated reports were conducted to assess diagnostic patterns and interpretative behavior. The AI model achieved an overall accuracy of 75.5%, with a sensitivity of 74.5% and a specificity of 76.4%. The area under the ROC curve (AUC) across all cases was 0.824 (95% CI: 0.745-0.897), indicating strong discriminative power. Malignant cases had the highest classification performance (AUC = 0.902), while benign cases were more challenging to classify (AUC = 0.615). Qualitative analysis showed that the AI used consistent radiological terminology but was oversensitive to ground-glass opacities, contributing to false positives in non-malignant cases. The AI model showed promising diagnostic potential, particularly in identifying malignancies. However, limited specificity and interpretative errors in benign and normal cases underscore the need for human oversight and continued model refinement. AI-enhanced CT interpretation can improve efficiency in high-volume settings but should serve as a decision-support tool rather than a replacement for expert image review.
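The reported figures reduce to standard confusion-matrix arithmetic; a short scikit-learn sketch with dummy labels and scores (not the study's data):

```python
import numpy as np
from sklearn.metrics import confusion_matrix, roc_auc_score

# Dummy binary reading: 1 = malignant, 0 = not malignant.
y_true = np.array([0, 0, 1, 1, 1, 0, 1, 0])
y_score = np.array([0.2, 0.4, 0.8, 0.7, 0.3, 0.1, 0.9, 0.6])
y_pred = (y_score >= 0.5).astype(int)   # assumed operating threshold

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"accuracy    = {(tp + tn) / len(y_true):.3f}")
print(f"sensitivity = {tp / (tp + fn):.3f}")   # recall on malignant cases
print(f"specificity = {tn / (tn + fp):.3f}")   # recall on non-malignant cases
print(f"AUC         = {roc_auc_score(y_true, y_score):.3f}")
```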

MetaPredictomics: A Comprehensive Approach to Predict Postsurgical Non-Small Cell Lung Cancer Recurrence Using Clinicopathologic, Radiomics, and Organomics Data.

Amini M, Hajianfar G, Salimi Y, Mansouri Z, Zaidi H

PubMed · Sep 3 2025
Non-small cell lung cancer (NSCLC) is a complex disease characterized by diverse clinical, genetic, and histopathologic traits, necessitating personalized treatment approaches. While numerous biomarkers have been introduced for NSCLC prognostication, no single source of information provides a comprehensive understanding of the disease; integrating biomarkers from multiple sources, however, may offer a holistic view that enables more accurate predictions. In this study, we present MetaPredictomics, a framework that integrates clinicopathologic data with PET/CT radiomics from the primary tumor and from presumed healthy organs (referred to as "organomics") to predict postsurgical recurrence. A fully automated deep learning-based segmentation model was employed to delineate 19 organs, both affected (the whole lung and the affected lobe) and presumed healthy, from the CT images of presurgical PET/CT scans of 145 NSCLC patients sourced from a publicly available dataset. Using PyRadiomics, 214 features (107 from CT, 107 from PET) were extracted from the gross tumor volume (GTV) and from each segmented organ. In addition, a clinicopathologic feature set was constructed, incorporating clinical characteristics, histopathologic data, gene mutation status, conventional PET imaging biomarkers, and patients' treatment history. The GTV radiomics, each of the organomics, and the clinicopathologic feature sets were each fed to a glmboost-based time-to-event prediction model to establish first-level models. The risk scores obtained from the first-level models were then used as inputs for meta models developed using a stacked ensemble approach. To optimize performance, we assessed meta models built upon all combinations of first-level models with a concordance index (C-index) ≥0.6. The performance of all models was evaluated using the average C-index across a unique 3-fold cross-validation scheme for fair comparison. The clinicopathologic model outperformed the other first-level models with a C-index of 0.67, followed closely by the GTV radiomics model with a C-index of 0.65. Among the organomics models, the whole-lung and aorta models achieved the top performance with a C-index of 0.65, while 12 organomics models achieved C-indices ≥0.6. Meta models significantly outperformed the first-level models, with the top 100 achieving C-indices between 0.703 and 0.731. The clinicopathologic, whole-lung, esophagus, pancreas, and GTV models appeared most frequently in the top 100 meta models, with frequencies of 98, 71, 69, 62, and 61, respectively. This study highlights the value of maximizing the use of medical imaging for NSCLC recurrence prognostication by incorporating data from various organs rather than focusing solely on the tumor and its immediate surroundings. This multisource integration proved particularly beneficial in the meta models, where combining clinicopathologic data with tumor radiomics and organomics significantly enhanced recurrence prediction.
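The stacking idea can be sketched in Python with Cox models standing in for glmboost (an R/mboost learner): first-level models, one per feature set, emit risk scores that a meta model then combines. The data below are synthetic and the fit is in-sample, unlike the paper's 3-fold scheme.

```python
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter
from lifelines.utils import concordance_index

rng = np.random.default_rng(0)
n = 145
# Stand-in feature sets: clinicopathologic, GTV radiomics, one organomics set.
feature_sets = {
    "clinical": pd.DataFrame(rng.normal(size=(n, 4))).add_prefix("clin_"),
    "gtv_radiomics": pd.DataFrame(rng.normal(size=(n, 6))).add_prefix("gtv_"),
    "lung_organomics": pd.DataFrame(rng.normal(size=(n, 6))).add_prefix("lung_"),
}
time = rng.exponential(24, n)      # synthetic months to recurrence
event = rng.integers(0, 2, n)      # synthetic recurrence indicator

# First-level models: one time-to-event model per feature set.
risk_scores = {}
for name, X in feature_sets.items():
    df = X.assign(time=time, event=event)
    cph = CoxPHFitter(penalizer=0.1).fit(df, duration_col="time", event_col="event")
    risk_scores[name] = cph.predict_partial_hazard(X).values

# Meta model: stack the first-level risk scores as features.
meta = pd.DataFrame(risk_scores).assign(time=time, event=event)
meta_cph = CoxPHFitter(penalizer=0.1).fit(meta, duration_col="time", event_col="event")
risk = meta_cph.predict_partial_hazard(meta.drop(columns=["time", "event"]))
print(f"meta-model C-index: {concordance_index(time, -risk, event):.3f}")
```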

Anisotropic Fourier Features for Positional Encoding in Medical Imaging

Nabil Jabareen, Dongsheng Yuan, Dingming Liu, Foo-Wei Ten, Sören Lukassen

arXiv preprint · Sep 2 2025
The adoption of Transformer-based architectures in the medical domain is growing rapidly. In medical imaging, the analysis of complex shapes - such as organs, tissues, or other anatomical structures - combined with the often anisotropic nature of high-dimensional images complicates this adoption. In this study, we critically examine the role of Positional Encodings (PEs), arguing that commonly used approaches may be suboptimal for the specific challenges of medical imaging. Sinusoidal Positional Encodings (SPEs) have proven effective in vision tasks, but they struggle to preserve Euclidean distances in higher-dimensional spaces. Isotropic Fourier Feature Positional Encodings (IFPEs) have been proposed to better preserve Euclidean distances, but they cannot account for anisotropy in images. To address these limitations, we propose the Anisotropic Fourier Feature Positional Encoding (AFPE), a generalization of IFPE that incorporates anisotropic, class-specific, and domain-specific spatial dependencies. We systematically benchmark AFPE against commonly used PEs on multi-label classification in chest X-rays, organ classification in CT images, and ejection-fraction regression in echocardiography. Our results demonstrate that choosing the right PE can significantly improve model performance, and that the optimal PE depends on the shape of the structure of interest and the anisotropy of the data. Finally, our proposed AFPE significantly outperforms state-of-the-art PEs in all tested anisotropic settings. We conclude that, for anisotropic medical images and videos, it is of paramount importance to choose an anisotropic PE that fits the data and the shape of interest.
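A sketch of the core idea under stated assumptions: random Fourier features whose frequency matrix is scaled per axis, so the encoding can reflect, for example, the coarse slice spacing of CT volumes. The per-axis scales are the anisotropy parameters; the exact class- and domain-specific parameterization of AFPE follows the paper only in spirit here.

```python
import torch

def anisotropic_fourier_features(coords, n_feats=64, scales=(0.2, 1.0, 1.0), seed=0):
    """coords: (N, 3) voxel positions; returns (N, 2 * n_feats) encodings.

    `scales` stretches or compresses each axis before the random projection,
    e.g. a lower frequency along a coarsely sampled z axis (assumed values).
    """
    g = torch.Generator().manual_seed(seed)
    B = torch.randn(coords.shape[1], n_feats, generator=g)  # random frequency matrix
    Lam = torch.diag(torch.tensor(scales))                  # anisotropic axis scaling
    proj = 2 * torch.pi * (coords @ Lam @ B)
    return torch.cat([torch.sin(proj), torch.cos(proj)], dim=-1)

# Voxel coordinates of a 4x4x4 grid, imagined with 5 mm slices (z) and
# 1 mm in-plane spacing (y, x).
zz, yy, xx = torch.meshgrid(
    torch.arange(4.0), torch.arange(4.0), torch.arange(4.0), indexing="ij"
)
coords = torch.stack([zz, yy, xx], dim=-1).reshape(-1, 3)
pe = anisotropic_fourier_features(coords)
print(pe.shape)  # torch.Size([64, 128])
```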