Page 41 of 66652 results

Malignancy risk stratification for pulmonary nodules: comparing a deep learning approach to multiparametric statistical models in different disease groups.

Piskorski L, Debic M, von Stackelberg O, Schlamp K, Welzel L, Weinheimer O, Peters AA, Wielpütz MO, Frauenfelder T, Kauczor HU, Heußel CP, Kroschke J

pubmed · Jul 1 2025
Incidentally detected pulmonary nodules present a challenge in clinical routine, creating demand for reliable support systems for risk classification. We aimed to evaluate the performance of the lung-cancer-prediction convolutional neural network (LCP-CNN), a deep learning-based approach, in comparison to multiparametric statistical methods (Brock model and Lung-RADS®) for risk classification of nodules in cohorts with different risk profiles and underlying pulmonary diseases. Retrospective analysis was conducted on non-contrast and contrast-enhanced CT scans containing pulmonary nodules measuring 5-30 mm. Ground truth was defined by histology or follow-up stability. The final analysis was performed on 297 patients with 422 eligible nodules, of which 105 nodules were malignant. Classification performance of the LCP-CNN, Brock model, and Lung-RADS® was evaluated in terms of diagnostic accuracy measurements including ROC analysis for different subcohorts (total, screening, emphysema, and interstitial lung disease). LCP-CNN demonstrated superior performance compared to the Brock model in the total and screening cohorts (AUC 0.92 (95% CI: 0.89-0.94) and 0.93 (95% CI: 0.89-0.96)). Superior sensitivity of LCP-CNN was demonstrated compared to the Brock model and Lung-RADS® in the total, screening, and emphysema cohorts at a risk threshold of 5%. Superior sensitivity of LCP-CNN was also shown across all disease groups compared to the Brock model at a threshold of 65%; compared to Lung-RADS®, sensitivity was better or equal. No significant differences in the performance of LCP-CNN were found between subcohorts. This study offers further evidence of the potential to integrate deep learning-based decision support systems into pulmonary nodule classification workflows, irrespective of the individual patient risk profile and underlying pulmonary disease.
Question Is a deep-learning approach (LCP-CNN) superior to multiparametric models (Brock model, Lung-RADS®) in classifying pulmonary nodule risk across varied patient profiles? Findings LCP-CNN shows superior performance in risk classification of pulmonary nodules compared to multiparametric models with no significant impact on risk profiles and structural pulmonary diseases. Clinical relevance LCP-CNN offers efficiency and accuracy, addressing limitations of traditional models, such as variations in manual measurements or lack of patient data, while producing robust results. Such approaches may therefore impact clinical work by complementing or even replacing current approaches.
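The 5% and 65% risk thresholds above amount to dichotomizing a continuous malignancy-risk score and reading off sensitivity and specificity at each cut-off. A minimal sketch of that step, with invented toy scores rather than study data:

```python
def classify_at_threshold(risk_scores, labels, threshold):
    """Dichotomize continuous malignancy-risk scores at a threshold and
    return (sensitivity, specificity). Toy illustration, not the LCP-CNN."""
    tp = sum(1 for r, y in zip(risk_scores, labels) if r >= threshold and y == 1)
    fn = sum(1 for r, y in zip(risk_scores, labels) if r < threshold and y == 1)
    tn = sum(1 for r, y in zip(risk_scores, labels) if r < threshold and y == 0)
    fp = sum(1 for r, y in zip(risk_scores, labels) if r >= threshold and y == 0)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return sensitivity, specificity

# Invented scores in [0, 1]; label 1 = malignant nodule
scores = [0.02, 0.10, 0.70, 0.90, 0.04, 0.55]
labels = [0,    0,    1,    1,    0,    1]
print(classify_at_threshold(scores, labels, 0.05))  # low cut-off favors sensitivity
print(classify_at_threshold(scores, labels, 0.65))  # high cut-off favors specificity
```

Lowering the threshold trades specificity for sensitivity, which is why the comparisons in the abstract are reported at two distinct operating points.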

Repeatability of AI-based, automatic measurement of vertebral and cardiovascular imaging biomarkers in low-dose chest CT: the ImaLife cohort.

Hamelink I, van Tuinen M, Kwee TC, van Ooijen PMA, Vliegenthart R

pubmed · Jul 1 2025
To evaluate the repeatability of AI-based automatic measurement of vertebral and cardiovascular markers on low-dose chest CT. We included participants of the population-based Imaging in Lifelines (ImaLife) study with low-dose chest CT at baseline and 3-4 month follow-up. An AI system (AI-Rad Companion chest CT prototype) performed automatic segmentation and quantification of vertebral height and density, aortic diameters, heart volume (cardiac chambers plus pericardial fat), and coronary artery calcium volume (CACV). A trained researcher visually checked segmentation accuracy. We evaluated the repeatability of adequate AI-based measurements at baseline and repeat scan using Intraclass Correlation Coefficient (ICC), relative differences, and change in CACV risk categorization, assuming no physiological change. Overall, 632 participants (63 ± 11 years; 56.6% men) underwent short-term repeat CT (mean interval, 3.9 ± 1.8 months). Visual assessment showed adequate segmentation in both baseline and repeat scan for 98.7% of vertebral measurements, 80.1-99.4% of aortic measurements (except for the sinotubular junction (65.2%)), and 86.0% of CACV. For heart volume, 53.5% of segmentations were adequate at baseline and repeat scans. ICC for adequately segmented cases showed excellent agreement for all biomarkers (ICC > 0.9). Relative difference between baseline and repeat measurements was < 4% for vertebral and aortic measurements, 7.5% for heart volume, and 28.5% for CACV. There was high concordance in CACV risk categorization (81.2%). In low-dose chest CT, segmentation accuracy of AI-based software was high for vertebral, aortic, and CACV evaluation and relatively low for heart volume. There was excellent repeatability of vertebral and aortic measurements and high concordance in overall CACV risk categorization. 
Question Can AI algorithms for opportunistic screening in chest CT obtain an accurate and repeatable result when applied to multiple CT scans of the same participant? Findings Vertebral and aortic analysis showed accurate segmentation and excellent repeatability; coronary calcium segmentation was generally accurate but showed modest repeatability due to a non-electrocardiogram-triggered protocol. Clinical relevance Opportunistic screening for diseases outside the primary purpose of the CT scan is time-consuming. AI allows automated vertebral, aortic, and coronary artery calcium (CAC) assessment, with highly repeatable outcomes of vertebral and aortic biomarkers and high concordance in overall CAC categorization.
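The "relative difference < 4%" repeatability criterion above amounts to comparing each baseline/repeat measurement pair against its per-pair mean. A minimal sketch with invented numbers, not the ImaLife pipeline:

```python
def mean_relative_difference(baseline, repeat):
    """Mean absolute relative difference (%) between paired baseline and
    repeat measurements, relative to the per-pair mean. A simple
    repeatability summary for toy illustration."""
    diffs = [abs(a - b) / ((a + b) / 2) * 100 for a, b in zip(baseline, repeat)]
    return sum(diffs) / len(diffs)

# Invented vertebral-height-like measurements (mm), baseline vs. 3-4 month repeat
baseline = [100.0, 200.0]
repeat = [102.0, 196.0]
print(round(mean_relative_difference(baseline, repeat), 2))  # 2.0
```

A small mean relative difference (here ~2%) together with a high ICC is what the abstract treats as evidence of repeatability under the no-physiological-change assumption.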

CXR-LLaVA: a multimodal large language model for interpreting chest X-ray images.

Lee S, Youn J, Kim H, Kim M, Yoon SH

pubmed · Jul 1 2025
This study aimed to develop an open-source multimodal large language model (CXR-LLaVA) for interpreting chest X-ray images (CXRs), leveraging recent advances in large language models (LLMs) to potentially replicate the image interpretation skills of human radiologists. For training, we collected 592,580 publicly available CXRs, of which 374,881 had labels for certain radiographic abnormalities (Dataset 1) and 217,699 provided free-text radiology reports (Dataset 2). After pre-training a vision transformer with Dataset 1, we integrated it with an LLM influenced by the LLaVA network. Then, the model was fine-tuned, primarily using Dataset 2. The model's diagnostic performance for major pathological findings was evaluated, along with the acceptability of radiologic reports by human radiologists, to gauge its potential for autonomous reporting. The model demonstrated impressive performance in test sets, achieving an average F1 score of 0.81 for six major pathological findings in the MIMIC internal test set and 0.56 for six major pathological findings in the external test set. The model's F1 scores surpassed those of GPT-4-vision and Gemini-Pro-Vision in both test sets. In human radiologist evaluations of the external test set, the model achieved a 72.7% success rate in autonomous reporting, slightly below the 84.0% rate of ground truth reports. This study highlights the significant potential of multimodal LLMs for CXR interpretation, while also acknowledging the performance limitations. Despite these challenges, we believe that making our model open-source will catalyze further research, expanding its effectiveness and applicability in various clinical contexts.
Question How can a multimodal large language model be adapted to interpret chest X-rays and generate radiologic reports? Findings The developed CXR-LLaVA model effectively detects major pathological findings in chest X-rays and generates radiologic reports with a higher accuracy compared to general-purpose models. Clinical relevance This study demonstrates the potential of multimodal large language models to support radiologists by autonomously generating chest X-ray reports, potentially reducing diagnostic workloads and improving radiologist efficiency.

Automatic recognition and differentiation of pulmonary contusion and bacterial pneumonia based on deep learning and radiomics.

Deng T, Feng J, Le X, Xia Y, Shi F, Yu F, Zhan Y, Liu X, Li C

pubmed · Jul 1 2025
In clinical work, it is difficult to distinguish pulmonary contusion (PC) from bacterial pneumonia (BP) on CT images by the naked eye alone when the history of trauma is unknown. Artificial intelligence is widely used in medical imaging, but its diagnostic performance for pulmonary contusion is unclear. In this study, artificial intelligence was used for the first time to identify pulmonary contusion and bacterial pneumonia, and its diagnostic performance was compared with that of radiologists. In this retrospective study, 2179 patients seen between April 2016 and July 2022 at two hospitals were collected and divided into a training set, an internal validation set, and an external validation set. PC and BP were automatically recognized and segmented using VB-Net, and radiomics features were automatically extracted. Four machine learning algorithms, including Decision Trees, Logistic Regression, Random Forests, and Support Vector Machines (SVM), were used to build the models. The DeLong test was used to compare performance among the models. The best-performing model and four radiologists diagnosed the external validation set, to compare the diagnostic efficacy of humans and artificial intelligence. VB-Net automatically detected and segmented PC and BP. Among the four machine learning models we built, the DeLong test showed that the SVM model had the best performance, with AUC, accuracy, sensitivity, and specificity of 0.998 (95% CI: 0.995-1), 0.980, 0.979, and 0.982 in the training set; 0.891 (95% CI: 0.854-0.928), 0.979, 0.750, and 0.860 in the internal validation set; and 0.885 (95% CI: 0.850-0.920), 0.903, 0.976, and 0.794 in the external validation set. The diagnostic ability of the SVM model was superior to that of the radiologists (P < 0.05). Our VB-Net automatically recognizes and segments PC and BP in chest CT images. The SVM model based on radiomics features can quickly and accurately differentiate between them, with higher accuracy than experienced radiologists.
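The final classification stage described above is an SVM trained on extracted radiomics feature vectors. A minimal sketch of that stage using scikit-learn; the two-dimensional "features" and labels below are invented toy data, not the study's radiomics features:

```python
# Sketch: SVM classifier over per-case feature vectors (toy data).
from sklearn.svm import SVC

# Two well-separated toy clusters standing in for PC vs. BP feature vectors
X_train = [[0, 0], [0, 1], [1, 0], [1, 1],
           [5, 5], [5, 6], [6, 5], [6, 6]]
y_train = [0, 0, 0, 0, 1, 1, 1, 1]  # 0 = contusion, 1 = pneumonia (toy labels)

clf = SVC(kernel="linear")
clf.fit(X_train, y_train)
print(clf.predict([[0.5, 0.5], [5.5, 5.5]]))  # → [0 1]
```

In the real pipeline the inputs would be high-dimensional radiomics features from the VB-Net segmentations, and model comparison would be done with the DeLong test on the resulting ROC curves.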

Artificial Intelligence Iterative Reconstruction for Dose Reduction in Pediatric Chest CT: A Clinical Assessment via Below 3 Years Patients With Congenital Heart Disease.

Zhang F, Peng L, Zhang G, Xie R, Sun M, Su T, Ge Y

pubmed · Jul 1 2025
To assess the performance of a newly introduced deep learning-based reconstruction algorithm, the artificial intelligence iterative reconstruction (AIIR), in reducing the dose of pediatric chest CT, using image data from patients below 3 years of age with congenital heart disease (CHD). The lung images available from routine-dose cardiac CT angiography (CTA) in patients below 3 years of age with CHD were used as a reference for evaluating the paired low-dose chest CT. A total of 191 subjects were prospectively enrolled; the dose for chest CT was reduced to ~0.1 mSv while the cardiac CTA protocol was kept unchanged. The low-dose chest CT images, obtained with the AIIR and with hybrid iterative reconstruction (HIR), were compared in image quality, i.e., overall image quality and lung structure depiction, and in diagnostic performance, i.e., severity assessment of pneumonia and airway stenosis. Compared with the reference, lung image quality was not significantly different on low-dose AIIR images (all P > 0.05) but was clearly inferior with HIR (all P < 0.05). Compared with HIR, low-dose AIIR images also achieved a pneumonia severity index (AIIR 4.32±3.82 vs. Ref 4.37±3.84, P > 0.05; HIR 5.12±4.06 vs. Ref 4.37±3.84, P < 0.05) and airway stenosis grading (consistently graded: AIIR 88.5% vs. HIR 56.5%) closer to the reference. AIIR has the potential to enable large dose reductions in chest CT of patients below 3 years of age while preserving image quality and achieving diagnostic results nearly equivalent to routine-dose scans.

ToolCAP: Novel Tools to improve management of paediatric Community-Acquired Pneumonia - a randomized controlled trial - Statistical Analysis Plan

Cicconi, S., Glass, T., Du Toit, J., Bresser, M., Dhalla, F., Faye, P. M., Lal, L., Langet, H., Manji, K., Moser, A., Ndao, M. A., Palmer, M., Tine, J. A. D., Van Hoving, N., Keitel, K.

medrxiv preprint · Jun 30 2025
The ToolCAP cohort study is a prospective, observational, multi-site platform study designed to collect harmonized, high-quality clinical, imaging, and biological data on children with IMCI-defined pneumonia in low- and middle-income countries (LMICs). The primary objective is to inform the development and validation of diagnostic and prognostic tools, including lung ultrasound (LUS), point-of-care biomarkers, and AI-based models, to improve pneumonia diagnosis, management, and antimicrobial stewardship. This statistical analysis plan (SAP) outlines the analytic strategy for describing the study population, assessing the performance of candidate diagnostic tools, and enabling data sharing in support of secondary research questions and AI model development. Children under 12 years presenting with suspected pneumonia are enrolled within 24 hours of presentation and undergo clinical assessment, digital auscultation, LUS, and optional biological sampling. Follow-up occurs on Day 8 and Day 29 to assess outcomes including recovery, treatment response, and complications. The SAP details variable definitions, data management strategies, and pre-specified analyses, including descriptive summaries, sensitivity and specificity of diagnostic tools against clinical reference standards, and exploratory subgroup analyses.

Genetically Optimized Modular Neural Networks for Precision Lung Cancer Diagnosis

Agrawal, V. L., Agrawal, T.

medrxiv preprint · Jun 30 2025
Lung cancer remains one of the leading causes of cancer mortality, and while low-dose CT screening reduces mortality, radiological detection is challenging given the growing shortage of radiologists. Artificial intelligence can significantly improve the procedure and decrease the overall workload of the healthcare department. Building upon existing work applying genetic algorithms, this study aims to create a novel algorithm for lung cancer diagnosis with utmost precision. We included a total of 156 CT scans of patients, divided into two databases, followed by feature extraction using image statistics, histograms, and 2D transforms (FFT, DCT, WHT). Optimal feature vectors were formed and organized into Excel-based knowledge bases. Genetically trained classifiers such as MLP, GFF-NN, MNN, and SVM were then optimized, experimenting with different combinations of parameters, activation functions, and data-partitioning percentages. Evaluation metrics included classification accuracy, mean squared error (MSE), area under the receiver operating characteristic (ROC) curve, and computational efficiency. Computer simulations demonstrated that the MNN (Topology II) classifier, specifically when trained with FFT coefficients and a momentum learning rule, consistently achieved 100% average classification accuracy on the cross-validation dataset for both Database I and Database II, outperforming MLP-based classifiers. This genetically optimized and trained MNN (Topology II) classifier is therefore recommended as the optimal solution for lung cancer diagnosis from CT scan images.
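As a rough illustration of the 2D-transform feature extraction mentioned above (FFT coefficients used as a feature vector), here is a minimal sketch; the patch size and the low-frequency k × k crop are assumptions for illustration, not the paper's exact recipe:

```python
import numpy as np

def fft_feature_vector(image, k=8):
    """Hypothetical recipe: take the 2D FFT of an image patch and keep the
    k x k block of low-frequency magnitude coefficients, flattened into a
    feature vector for a downstream classifier."""
    spectrum = np.abs(np.fft.fft2(image))
    return spectrum[:k, :k].ravel()

patch = np.ones((16, 16))            # toy stand-in for a CT patch
features = fft_feature_vector(patch, k=4)
print(features.shape)                # (16,)
print(features[0])                   # DC term = sum of all pixels = 256.0
```

DCT or Walsh-Hadamard coefficients could be substituted for the FFT in the same pattern, which is presumably how the three transforms were compared.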

Efficient Chest X-Ray Feature Extraction and Feature Fusion for Pneumonia Detection Using Lightweight Pretrained Deep Learning Models

Chandola, Y., Uniyal, V., Bachheti, Y.

medrxiv preprint · Jun 30 2025
Pneumonia is a respiratory condition characterized by inflammation of the alveolar sacs in the lungs, which disrupts normal oxygen exchange. The disease disproportionately impacts vulnerable populations, including young children (under five years of age) and elderly individuals (over 65 years), primarily due to their compromised immune systems. The mortality rate associated with pneumonia remains alarmingly high, particularly in low-resource settings where healthcare access is limited. Although effective prevention strategies exist, pneumonia continues to claim the lives of approximately one million children each year, earning its reputation as a "silent killer." Globally, an estimated 500 million cases are documented annually, underscoring its widespread public health burden. This study explores the design and evaluation of CNN-based computer-aided diagnostic (CAD) systems aimed at efficient and resource-conscious classification of chest radiographs into binary classes (Normal, Pneumonia). An augmented Kaggle dataset of 18,200 chest radiographs, split between normal and pneumonia cases, was utilized. A series of experiments evaluated lightweight CNN models (ShuffleNet, NASNet-Mobile, and EfficientNet-b0) using transfer learning, achieving accuracies of 90%, 88%, and 89%; deep features were then extracted from each network and fused before being paired with an SVM classifier and an XGBoost classifier, achieving accuracies of 97% and 98%, respectively. The proposed research emphasizes the crucial role of CAD systems in advancing radiological diagnostics, delivering effective solutions that aid radiologists in distinguishing between diagnoses by applying feature fusion and feature selection along with various machine learning algorithms and deep learning architectures.
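The feature-fusion step described above is, at its core, per-sample concatenation of the deep-feature matrices from the different backbones before handing them to the SVM/XGBoost stage. A minimal sketch with invented dimensions (the real feature widths depend on the chosen networks):

```python
import numpy as np

def fuse_features(feat_a, feat_b):
    """Concatenate per-sample feature vectors extracted by two backbone
    networks (one row per radiograph) into a single fused representation
    that a downstream SVM or XGBoost classifier can consume."""
    return np.concatenate([feat_a, feat_b], axis=1)

# Toy shapes: 4 images, 10-dim features from net A, 20-dim from net B
fused = fuse_features(np.zeros((4, 10)), np.ones((4, 20)))
print(fused.shape)  # (4, 30)
```

Feature selection, if applied, would then prune columns of the fused matrix before classifier training.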

Using a large language model for post-deployment monitoring of FDA approved AI: pulmonary embolism detection use case.

Sorin V, Korfiatis P, Bratt AK, Leiner T, Wald C, Butler C, Cook CJ, Kline TL, Collins JD

pubmed · Jun 30 2025
Artificial intelligence (AI) is increasingly integrated into clinical workflows, and the performance of AI in production can diverge from initial evaluations. Post-deployment monitoring (PDM) remains a challenging component of ongoing quality assurance once AI is deployed in clinical production. We developed and evaluated a PDM framework that uses large language models (LLMs) for free-text classification of radiology reports, combined with human oversight, and demonstrate its application to monitor a commercially vended pulmonary embolism (PE) detection AI (CVPED). We retrospectively analyzed 11,999 CT pulmonary angiography (CTPA) studies performed between 04/30/2023 and 06/17/2024. Ground truth was determined by combining LLM-based radiology-report classification and the CVPED outputs, with human review of discrepancies. We simulated a daily monitoring framework to track discrepancies between CVPED and the LLM. Drift was defined as the discrepancy rate exceeding a fixed 95% confidence interval (CI) for seven consecutive days; the CI and the optimal retrospective assessment period were determined from a stable dataset with consistent performance. We simulated drift by systematically altering CVPED or LLM sensitivity and specificity, and we modeled an approach to detect data shifts. We incorporated a human-in-the-loop selective alerting framework for continuous prospective evaluation and to investigate the potential for incremental detection. Of 11,999 CTPAs, 1,285 (10.7%) had PE. Overall, 373 (3.1%) had discrepant classifications between CVPED and the LLM. Among 111 CVPED-positive, LLM-negative cases, 29 would have triggered an alert because the radiologist did not interact with CVPED. Of those, 24 were CVPED false positives, one was an LLM false negative, and the framework ultimately identified four true alerts for incremental PE cases. The optimal retrospective assessment period for drift detection was determined to be two months. A 2-3% decline in model specificity caused a 2-3-fold increase in discrepancies, while a 10% drop in sensitivity was required to produce a similar effect. For example, a 2.5% drop in LLM specificity led to a 1.7-fold increase in CVPED-negative, LLM-positive discrepancies, which would have taken 22 days to detect using the proposed framework. A PDM framework combining LLM-based free-text classification with a human-in-the-loop alerting system can continuously track an image-based AI's performance, alert for performance drift, and provide incremental clinical value.
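The drift rule described above (discrepancy rate above a fixed 95% CI bound for seven consecutive days) can be sketched in a few lines. The mean + 1.96·SD upper-bound form is an assumption about how the CI was constructed from the stable reference period:

```python
def detect_drift(daily_rates, baseline_mean, baseline_sd, days_required=7, z=1.96):
    """Flag drift when the daily discrepancy rate exceeds the upper bound of a
    fixed 95% CI (assumed here to be mean + z*SD over a stable reference
    period) for `days_required` consecutive days. Returns the 0-based index
    of the day the alarm fires, or None if no drift is detected."""
    upper = baseline_mean + z * baseline_sd
    streak = 0
    for i, rate in enumerate(daily_rates):
        streak = streak + 1 if rate > upper else 0
        if streak >= days_required:
            return i
    return None

# Toy series: 5 stable days at a 3% discrepancy rate, then a sustained jump to 6%
rates = [0.03] * 5 + [0.06] * 8
print(detect_drift(rates, baseline_mean=0.03, baseline_sd=0.005))  # 11
```

The consecutive-day requirement is what trades detection latency (e.g., the 22 days quoted above for a small specificity drop) for robustness against one-day noise.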

Improving Robustness and Reliability in Medical Image Classification with Latent-Guided Diffusion and Nested-Ensembles.

Shen X, Huang H, Nichyporuk B, Arbel T

pubmed · Jun 30 2025
Once deployed, medical image analysis methods are often faced with unexpected image corruptions and noise perturbations. These unknown covariate shifts present significant challenges to deep learning based methods trained on "clean" images. This often results in unreliable predictions and poorly calibrated confidence, hence hindering clinical applicability. While recent methods have been developed to address specific issues such as confidence calibration or adversarial robustness, no single framework effectively tackles all these challenges simultaneously. To bridge this gap, we propose LaDiNE, a novel ensemble learning method combining the robustness of Vision Transformers with diffusion-based generative models for improved reliability in medical image classification. Specifically, transformer encoder blocks are used as hierarchical feature extractors that learn invariant features from images for each ensemble member, resulting in features that are robust to input perturbations. In addition, diffusion models are used as flexible density estimators to estimate member densities conditioned on the invariant features, leading to improved modeling of complex data distributions while retaining properly calibrated confidence. Extensive experiments on tuberculosis chest X-rays and melanoma skin cancer datasets demonstrate that LaDiNE achieves superior performance compared to a wide range of state-of-the-art methods by simultaneously improving prediction accuracy and confidence calibration under unseen noise, adversarial perturbations, and resolution degradation.
